Infrastructure as Code Tutorial

What in the world is Infrastructure as Code?

Infrastructure as Code is one of the key practices in DevOps.

Back in the day, when the DevOps movement started, people realized that the work of IT operations (system administrators) was very chaotic. Most system administration tasks were done manually or via self-written scripts. Neither approach worked well.

Manual work created a real mess in a team setting, because no one knew what the managed infrastructure looked like, how the running machines were configured, or what changes had been made. When something failed, no one could tell which change might have brought the system down. In this situation, the system administrator’s job turned into wild guessing.

Scripts didn’t work well for infrastructure and configuration management either. They were better than manual work because they automated and standardized tasks to some degree, and they could be kept in a source control system like Git to track the changes made to them. But scripts had significant problems of their own.

First, people can write a script for the same task in many different ways. As a result, every script turns out to be unique and often takes a lot of time and effort to understand.

Second, as the system configuration grows over time, the scripts become larger and more complex. This is especially true when multiple people work on the same scripts. Besides, people may come and go without leaving proper documentation for their scripts. At some point, there may be no one left on the team who really understands what a particular script does.

Finally, scripts are not good for long-term configuration and infrastructure management, because they don’t provide idempotence. Idempotence means that running a script multiple times gives the same result as running it once. Most scripts can’t guarantee that when they are run more than once. Idempotence lets us change the system over time and keep it properly configured by editing and re-running the same configuration script. It is a key condition for long-term configuration and infrastructure management. Even though it is possible to make scripts idempotent, it’s very hard to do because of the first and second problems described above.
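To make the idea concrete, here is a minimal Python sketch (not taken from any real tool) that models a configuration file as a list of lines. The hypothetical `append_line` behaves like a typical script step and adds the line on every run, while the hypothetical `ensure_line` checks the current state first and only changes it when needed, so running it any number of times gives the same result as running it once.

```python
def append_line(lines: list[str], line: str) -> list[str]:
    """Script-style step: always appends, so every run changes the file."""
    return lines + [line]


def ensure_line(lines: list[str], line: str) -> list[str]:
    """Idempotent step: appends only if the line is not already present."""
    if line in lines:
        return lines  # desired state already reached, nothing to do
    return lines + [line]


config = ["PermitRootLogin no"]

once = append_line(config, "PasswordAuthentication no")
twice = append_line(once, "PasswordAuthentication no")
print(len(once), len(twice))  # 2 3: the second run added a duplicate line

once = ensure_line(config, "PasswordAuthentication no")
twice = ensure_line(once, "PasswordAuthentication no")
print(once == twice)  # True: the second run changed nothing
```

The key design choice is that the idempotent version describes a desired state ("this line is present") rather than an action ("add this line").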

So smart folks realized that neither manual work nor scripts worked well for IT operations, and they suggested a new approach: Infrastructure as Code.

They found it really helpful to keep all the configuration scripts in source control, because it made it possible to track and control every change. This is the key to working in a team environment.

They also realized that most configuration and infrastructure management tasks are actually very common and well defined: starting a VM with specific characteristics, creating a firewall rule, installing system packages, copying files, starting a service, etc. So they thought: “Why don’t we write modules/functions that perform those common tasks for the systems we’re working with? We would keep them tested, make sure they work across our systems, and make them idempotent.” Now, instead of writing our own script every time we need to do something, we use these modules/functions (higher-level abstractions built on top of scripts).
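The idea of reusable, idempotent modules can be sketched in Python as a toy model. This is not how Ansible is implemented: each hypothetical module (`package` and `service` here) takes the current state plus some parameters and returns the new state and a `changed` flag, and a “playbook” is simply a list of tasks naming a module and its parameters. The dict-based machine state is an assumption made for illustration.

```python
from typing import Callable

State = dict[str, set[str]]
Module = Callable[..., tuple[State, bool]]


def package(state: State, name: str) -> tuple[State, bool]:
    """Hypothetical module: ensure a package is installed."""
    if name in state["packages"]:
        return state, False  # already installed, no change
    return {**state, "packages": state["packages"] | {name}}, True


def service(state: State, name: str) -> tuple[State, bool]:
    """Hypothetical module: ensure a service is running."""
    if name in state["services"]:
        return state, False  # already running, no change
    return {**state, "services": state["services"] | {name}}, True


MODULES: dict[str, Module] = {"package": package, "service": service}


def run_playbook(state: State, tasks: list[dict]) -> tuple[State, int]:
    """Run each task through its module and count how many made changes."""
    changed_count = 0
    for task in tasks:
        params = {k: v for k, v in task.items() if k != "module"}
        state, changed = MODULES[task["module"]](state, **params)
        changed_count += changed
    return state, changed_count


playbook = [
    {"module": "package", "name": "nginx"},
    {"module": "service", "name": "nginx"},
]

server: State = {"packages": set(), "services": set()}
server, changes = run_playbook(server, playbook)
print(changes)  # 2 on the first run
server, changes = run_playbook(server, playbook)
print(changes)  # 0 on the second run: every module is idempotent
```

Because the modules carry the idempotence logic, the playbook itself only has to declare what should be true.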

For example, compare this Bash script to install system packages:

to this Ansible playbook, which provides an idempotent abstraction for the same task:

Why is it called Infrastructure as Code if there is no real code?

True, this example doesn’t look like the code you’re probably used to seeing:

But behind the apt module there is actual Python code that does the job. You can think of apt as the name of a function that accepts parameters, e.g. a flag saying whether to install or remove a package, the names of the packages, etc.
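Here is a rough Python analogy of that idea. The function below is hypothetical and only mimics the general shape of an apt task: it takes package names and a `state` of `"present"` or `"absent"`, compares them against an in-memory set standing in for the packages installed on a machine, and returns a result with a `changed` flag. The real apt module calls the system package manager and supports many more parameters.

```python
def apt(installed: set[str], name: list[str], state: str = "present") -> dict:
    """Toy stand-in for a package module (not Ansible's real apt module).

    `installed` is a set standing in for the packages on a machine; it is
    modified in place, the way a real package manager changes the system.
    """
    if state not in ("present", "absent"):
        raise ValueError(f"unsupported state: {state!r}")

    if state == "present":
        # Only packages that are missing need to be installed.
        to_change = [pkg for pkg in name if pkg not in installed]
        installed.update(to_change)
    else:
        # Only packages that are actually installed need to be removed.
        to_change = [pkg for pkg in name if pkg in installed]
        installed.difference_update(to_change)

    return {"changed": bool(to_change), "packages": to_change}


machine = {"git"}
print(apt(machine, name=["git", "nginx"]))
# {'changed': True, 'packages': ['nginx']}
print(apt(machine, name=["git", "nginx"]))
# {'changed': False, 'packages': []}
print(apt(machine, name=["git"], state="absent"))
# {'changed': True, 'packages': ['git']}
```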

Why do we need it?

The Infrastructure as Code approach brings order to the work of system administration. Some of the key benefits are:

It allows us to control the current state of our managed systems. Because our configuration is idempotent, we can manage the whole life cycle of a managed system through the same scripts. For example, suppose we want to install a package on a system that was initially configured with the following Ansible playbook:

We can then add the name of another package to that playbook and run it to ensure the package is installed:

If we run the playbook again, we can be confident that nothing will be installed if the system already has the right packages, and that any missing packages (for example, ones removed manually by some bad employee) will be installed. Managing the whole life cycle of the system through the same scripts means that the configuration code in our source control repo reflects the current state of the managed system.
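The life cycle described above can be modeled in a few lines of Python. This is only a sketch under simplifying assumptions: the playbook is reduced to the list of package names it declares, the machine to a set of installed packages, and `converge` is a hypothetical helper that installs only what is missing and returns what it did. Running it against version 1 of the playbook, then version 2 with one more package, then again after a manual removal shows that each run only fixes the difference between the code and the machine.

```python
def converge(installed: set[str], desired: list[str]) -> list[str]:
    """Install whatever the playbook declares but the machine lacks."""
    missing = [pkg for pkg in desired if pkg not in installed]
    installed.update(missing)
    return missing


playbook_v1 = ["git"]
playbook_v2 = playbook_v1 + ["nginx"]  # the change we commit to source control

machine: set[str] = set()
print(converge(machine, playbook_v1))  # ['git']: initial configuration
print(converge(machine, playbook_v2))  # ['nginx']: only the new package
print(converge(machine, playbook_v2))  # []: nothing to do on a re-run

machine.discard("git")                 # someone removes a package by hand
print(converge(machine, playbook_v2))  # ['git']: the drift is repaired
```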

We can apply practices common in the development world to infrastructure management. Keeping code in source control, using peer reviews, and testing code are very common practices in software development. They let us keep an eye on every change, make sure changes are approved through peer review, and catch failures at an early stage by covering our code with tests. These practices are valuable not only in development but in operations as well.
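As one example of testing infrastructure code, here is a minimal Python sketch of an automated idempotence check: apply a module once, apply it again, and assert that the second run reports no change and leaves the state untouched. The `ensure_user` module and the `assert_idempotent` helper are hypothetical; the `test_` function could be collected by a test runner such as pytest, but here it is simply called directly.

```python
def ensure_user(users: frozenset[str], name: str) -> tuple[frozenset[str], bool]:
    """Hypothetical module: make sure a user account exists."""
    if name in users:
        return users, False
    return users | {name}, True


def assert_idempotent(module, state, **params):
    """Apply `module` twice; the second run must be a no-op."""
    first_state, _ = module(state, **params)
    second_state, changed = module(first_state, **params)
    assert not changed, "second run reported a change"
    assert second_state == first_state, "second run altered the state"


def test_ensure_user_is_idempotent():
    # Check both a fresh machine and one where the user already exists.
    assert_idempotent(ensure_user, frozenset(), name="deploy")
    assert_idempotent(ensure_user, frozenset({"deploy"}), name="deploy")


test_ensure_user_is_idempotent()
print("ok")
```

A check like this, run on every proposed change before it is merged, catches a non-idempotent module long before it reaches a real server.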

I’m a developer, why in the world should I learn about Infrastructure as Code?

Aren’t you curious about how to run the application that you develop on a remote server? And what tools operations folks use to do that?

Moreover, if you’re keen on writing code, Infrastructure as Code should be easy for you to learn. You’ll just have to learn a new language syntax, which is usually simpler than the programming languages you already know. That should be fun :)

What is Infrastructure as Code Tutorial?

I tried to create a simple, easy-to-follow tutorial about Infrastructure as Code. It’s practice-based, meaning there’s not much theory in it, but lots of hands-on practice to give you a feel for what Infrastructure as Code is.

It covers lots of tools common in the modern operations world: Packer, Terraform, Ansible, Vagrant, Docker and Docker Compose. I also plan on adding more examples in the future.

By the end of the tutorial, you’ll get your own repository with a test application and infrastructure code to manage the environment for running that application. Here is an example of the repository you’ll get.

Why was this tutorial made?

I remember my first encounter with Infrastructure as Code and all the different tools covered in this tutorial. It took me some time to understand what was going on, what problems these tools solve, and how they work together. I made this tutorial to help folks who are new to all this or who feel confused about the tools.

It’s open source and I really hope to hear any suggestions on how to make it better :)