Reducing the fear of introducing breaking changes in the cloud
--------------------------------------------------------------

Infrastructure as Code is one of the hot topics right now. Every DevOps-related conference in the past two years has had a talk or two about the subject, and that's a good thing. In the wake of the DevOps movement, [HashiCorp](https://www.hashicorp.com) emerged as one of the most respected companies in that space. Today I'm going to talk about one of their products: [**Terraform**](https://www.hashicorp.com/products/terraform).

### What is Terraform?

Terraform is a tool that lets you manage cloud resources in a declarative way. Using a simple configuration language, it lets you define pretty much the entire shape of a cloud infrastructure, including VPCs, Subnets, Compute Instances, Load Balancers, DNS Records and so on. It works with every major cloud provider, but it's not cloud-agnostic. That means you can create, for example, a Load Balancer in AWS or Google Cloud, but the code will be slightly different for each of them.

### What is Blue/Green deployment?

Blue/Green deployment is a DevOps practice that aims to reduce downtime during updates by creating a new copy of the desired component while keeping the current one running. You end up with two versions of the system: one running the current version (_blue_) and another running the new one (_green_). When the new version is up and running, you can seamlessly switch traffic to it. This is useful not only to reduce downtime, but also to improve rollback time when something bad happens.

### Blue/Green Infrastructure

While Blue/Green deployment is a technique more commonly used for application deployments, the reduced costs of the cloud, together with the tools we have today, make it possible to keep two copies of an entire cloud infrastructure with little to no pain. It is important to note that a Blue/Green deployment of an entire cloud infrastructure is not a silver bullet, and it is certainly overkill for small changes (for example, adding a new EC2 Instance to your stack). But for major or breaking changes it is a win, and I personally recommend it.

### Terraform to the rescue!

I'll be using Amazon Web Services for this tutorial, but the code won't vary too much with another provider. After finishing it, you will be able to create an infrastructure containing:

* A Virtual Private Cloud
* Three Subnets, each one in a different Availability Zone
* A Security Group
* Three EC2 Instances serving NGINX on port 80 (each one in a different subnet)
* A Load Balancer pointing to those Instances

Then, you will be able to:

* Make changes to the Infrastructure
* Create an entirely new Infrastructure with that change
* Switch traffic to the new Infrastructure
* Destroy the old Infrastructure
* Profit

The full example can be seen at [https://github.com/santiagopoli/terraform-examples/tree/master/blue-green](https://github.com/santiagopoli/terraform-examples/tree/master/blue-green)

To follow this tutorial, you need to have your AWS credentials configured in your environment, with at least the **AmazonEC2FullAccess** policy attached.

#### Creating a VPC (Virtual Private Cloud)

I know this is a Terraform tutorial, but a recommended practice is to have a manually created VPC. You can create VPCs with Terraform, but there are a lot of external services that rely on knowing your VPC ID beforehand, so it is better not to create a new one on every Blue/Green deployment. Also, you may have security groups that are created externally by another team in your organization. For that reason, we will create the VPC using the AWS Console. You can also create a VPC from the command line (change the CIDR block to anything you like):

```
> aws ec2 create-vpc --cidr-block 10.0.0.0/16
{
    "Vpc": {
        "VpcId": "vpc-ff7bbf86",
        "InstanceTenancy": "default",
        "Tags": [],
        "CidrBlockAssociations": [
            {
                "AssociationId": "vpc-cidr-assoc-6e42b505",
                "CidrBlock": "10.0.0.0/16",
                "CidrBlockState": {
                    "State": "associated"
                }
            }
        ],
        "Ipv6CidrBlockAssociationSet": [],
        "State": "pending",
        "DhcpOptionsId": "dopt-38f7a057",
        "CidrBlock": "10.0.0.0/16",
        "IsDefault": false
    }
}
```

Note your **VpcId**, as you will need it in a second.

#### Installing Terraform

You can download Terraform from [this link](https://www.terraform.io/downloads.html) or install it with any package manager (brew, apt).

#### Creating the project

Create a new folder in your workspace and name it **terraform\_blue\_green**. Then initialize a Git repository, add a simple .gitignore that ignores the .terraform folder, and open the folder with your favorite text editor. In my case, I'll be using [Visual Studio Code](https://code.visualstudio.com/).

```
> mkdir terraform_blue_green
> cd terraform_blue_green
> git init
> echo .terraform >> .gitignore
> code .
```

#### Initializing the Terraform State

Terraform stores the state of the infrastructure in a JSON file. It is recommended (and required for this tutorial) to store that file on an external backend like Amazon S3. As I'm using AWS for this tutorial, I'll stick to S3, but [Terraform supports an equivalent in each provider](https://www.terraform.io/docs/backends/types/index.html).

First of all, you need to create the S3 bucket in which the state will reside. You can do this either from the S3 Console or by running:

```
> aws s3api create-bucket --bucket terraform-bluegreen
```

Then, create a file named **bootstrap.tf** inside the project folder (a sketch follows the list below). In this file we define:

* The **version** of the Infrastructure
* The **Cloud provider** we will be using (in this case, AWS)
* The **Backend** in which the state will be saved (in this case, S3), and its configuration
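As a minimal sketch (the region, the state key layout, and storing the version as a plain number that later gets prefixed with "v" in resource names are my assumptions), **bootstrap.tf** could look like this:

```hcl
# bootstrap.tf -- minimal sketch; adjust region, bucket and key to your setup.

# The version of the infrastructure. It is interpolated into resource
# names and subnet CIDR blocks so that two versions can coexist.
variable "infrastructure_version" {
  default = "1"
}

# The cloud provider we will be using.
provider "aws" {
  region = "us-west-2"
}

# The backend in which the state will be saved. Backend configuration
# cannot interpolate variables, so the key is hardcoded per version.
terraform {
  backend "s3" {
    bucket = "terraform-bluegreen"
    key    = "v1/terraform.tfstate"
    region = "us-west-2"
  }
}
```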
With that file in place, run this command in your project folder:

```
> terraform init

Initializing the backend...
Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.

Initializing provider plugins...
- Checking for available provider plugins on https://releases.hashicorp.com...
- Downloading plugin for provider "aws" (1.11.0)...

Terraform has been successfully initialized!
```

#### Using existing resources in Terraform

As we'll need the ID of the previously created VPC to do anything in our infrastructure, we will store it in a variable. To do this, create a file named **vpc.tf** (a sketch is shown below, together with the subnets file).

#### Creating our first resource: Subnets

To do anything useful, we first need subnets. We will create three of them, each in a different Availability Zone. Create a file named **subnets.tf** (a sketch follows the list below). In this file we create three subnets, specifying:

* **Count:** the number of subnets we want to create.
* **Availability zone:** here we use the _element()_ function, which takes a list and an index and returns the corresponding element, wrapping around if the index is greater than the number of elements. This is useful to assign a different availability zone to each subnet.
* **VPC ID**
* **CIDR Block:** this is probably the most confusing part. We interpolate the previously defined **infrastructure\_version** variable into the CIDR block, so the subnets of the two versions never overlap. This will help in the future, when creating the second version. You may need to change the CIDR block to match the one you defined for the VPC.
* Assign a **Public IP** by default to any network interface placed in the subnet.
* **Name:** we append the infrastructure version to it.
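A sketch of both files, written in the pre-0.12 interpolation style the post uses. The us-west-2 availability zones and the CIDR scheme (the version digit plus the subnet index form the third octet, so v1 uses 10.0.10.0/24 through 10.0.12.0/24 and v2 uses 10.0.20.0/24 through 10.0.22.0/24) are assumptions:

```hcl
# vpc.tf -- the ID of the manually created VPC, kept in a variable.
variable "vpc_id" {
  default = "vpc-ff7bbf86" # replace with your own VpcId
}

# subnets.tf -- three subnets, one per availability zone.
variable "availability_zones" {
  default = ["us-west-2a", "us-west-2b", "us-west-2c"]
}

resource "aws_subnet" "terraform-blue-green" {
  # The number of subnets we want to create.
  count = 3

  # element() wraps around, so each subnet gets a different AZ.
  availability_zone = "${element(var.availability_zones, count.index)}"

  vpc_id = "${var.vpc_id}"

  # The infrastructure version is interpolated into the CIDR block so
  # that v1 and v2 subnets do not overlap.
  cidr_block = "10.0.${var.infrastructure_version}${count.index}.0/24"

  # Assign a public IP by default to any network interface in this subnet.
  map_public_ip_on_launch = true

  tags = {
    Name = "terraform-blue-green-v${var.infrastructure_version}"
  }
}
```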
With those files in place, first do:

```
> terraform plan

+ aws_subnet.terraform-blue-green[0]
  ...
+ aws_subnet.terraform-blue-green[1]
  ...
+ aws_subnet.terraform-blue-green[2]
  ...

Plan: 3 to add, 0 to change, 0 to destroy.
```

The **plan** command does a dry run and tells you what changes will be made. It is important to plan before doing anything, as it lets you spot errors. In this case, the plan tells us that it will add three subnets, which is exactly what we wanted. So now we can run:

```
> terraform apply

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

aws_subnet.terraform-blue-green[0]: Creating...
aws_subnet.terraform-blue-green[1]: Creating...
aws_subnet.terraform-blue-green[2]: Creating...
aws_subnet.terraform-blue-green[0]: Creation complete after 5s
aws_subnet.terraform-blue-green[1]: Creation complete after 5s
aws_subnet.terraform-blue-green[2]: Creation complete after 5s

Apply complete! Resources: 3 added, 0 changed, 0 destroyed.
```

Now, if you go to the AWS Console (under the _VPC/Subnets_ section), your subnets should appear.

#### Creating a Security Group

To be able to access our resources later on, we need to create a Security Group in our VPC. For the sake of simplicity, we will create a Security Group that allows all inbound traffic from everywhere. Create a file named **security\_groups.tf** in your project (a sketch follows the list below). In this file we create a security group for our VPC (using its VPC ID) and two rules: one for inbound traffic and one for outbound traffic. The most important parts are:

* **From/To Port:** the port range the rule applies to. In this case, we target all possible ports.
* **Protocol:** you can use TCP, UDP, ICMP or "-1", which means all protocols.
* **CIDR Blocks:** a list of CIDR blocks the rule allows. In our case, we allow all IPv4 traffic.
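A sketch of **security\_groups.tf** along those lines. The resource names mirror the ones that show up in the destroy plan later in the post, while the exact port bounds are an assumption (and the rules are wide open on purpose, so don't reuse them beyond a demo):

```hcl
# security_groups.tf -- a wide-open security group, for demo purposes only.

resource "aws_security_group" "terraform-blue-green" {
  name   = "terraform-blue-green-v${var.infrastructure_version}"
  vpc_id = "${var.vpc_id}"
}

resource "aws_security_group_rule" "terraform-blue-green-inbound" {
  type              = "ingress"
  security_group_id = "${aws_security_group.terraform-blue-green.id}"

  # All ports, all protocols, from any IPv4 address.
  from_port   = 0
  to_port     = 65535
  protocol    = "-1"
  cidr_blocks = ["0.0.0.0/0"]
}

resource "aws_security_group_rule" "terraform-blue-green-outbound" {
  type              = "egress"
  security_group_id = "${aws_security_group.terraform-blue-green.id}"

  # All ports, all protocols, to any IPv4 address.
  from_port   = 0
  to_port     = 65535
  protocol    = "-1"
  cidr_blocks = ["0.0.0.0/0"]
}
```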
With the file in place, run a **terraform plan** and **terraform apply**. After that, we should be able to see our security group in the AWS Console (under _EC2/Security Groups_).

#### Creating an SSH Key

To be able to access an AWS instance later on, we need to assign an SSH key to it. First, create a key pair using **ssh-keygen**:

```
> mkdir keypairs
> ssh-keygen -f keypairs/keypair -P ""
Generating public/private rsa key pair.
Your identification has been saved in keypairs/keypair.
Your public key has been saved in keypairs/keypair.pub.
```

Since this is a tutorial, don't bother moving the private key to a secure place (but in a real setup you definitely should). Then, create a file named **keypairs.tf** in the root folder of the project (a sketch is shown below, together with the instances file). Then do:

```
> terraform plan

+ aws_key_pair.terraform-blue-green
  ...

Plan: 1 to add, 0 to change, 0 to destroy.

> terraform apply

Terraform will perform the following actions:

+ aws_key_pair.terraform-blue-green
  ...

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
```

Now the key appears in the AWS Console (under _EC2/Key Pairs_).

#### Creating (at last!) EC2 Instances

Create a file named **instances.tf** (a sketch follows the list below). Let's explain this file a little bit: we create a resource of type **aws\_instance** with these parameters:

* **Count:** the number of resources of this type. In this case, we will create 3 instances.
* **AMI:** the Amazon Machine Image for the instances. Here we choose the official AWS ECS-optimized image, as we will run a Docker container inside the instances. Keep in mind that this AMI ID only works in the us-west-2 region, so [check this link](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-optimized_AMI.html) if you are in another region.
* **Instance Type:** the EC2 instance type of the instances (t2.micro to start with).
* **Subnet Id:** as explained above, the _element()_ function takes a list and an index and returns the corresponding element, wrapping around if needed. This allows us to assign a different subnet to each instance.
* **Key Name:** the name of the key pair. We use the previously created one.
* **User Data:** this lets us assign an initialization script to the instance. In our case, we run an NGINX Docker container and expose it on port 80. There are better ways to define user data scripts, but we'll keep it simple for now.
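A sketch of both files. The AMI ID is a placeholder you would replace with the ECS-optimized image for your region, and the user data script is an assumption (any script that starts an NGINX container on port 80 will do):

```hcl
# keypairs.tf -- registers the public key generated with ssh-keygen.
resource "aws_key_pair" "terraform-blue-green" {
  key_name   = "terraform-blue-green-v${var.infrastructure_version}"
  public_key = "${file("keypairs/keypair.pub")}"
}

# instances.tf -- three EC2 instances, one per subnet, running NGINX in Docker.
resource "aws_instance" "terraform-blue-green" {
  count = 3

  # ECS-optimized AMI (Docker pre-installed); placeholder, use the ID for your region.
  ami           = "ami-xxxxxxxx"
  instance_type = "t2.micro"

  # element() wraps around, so each instance lands in a different subnet.
  subnet_id = "${element(aws_subnet.terraform-blue-green.*.id, count.index)}"

  key_name               = "${aws_key_pair.terraform-blue-green.key_name}"
  vpc_security_group_ids = ["${aws_security_group.terraform-blue-green.id}"]

  # Run an NGINX container and expose it on port 80.
  user_data = <<EOF
#!/bin/bash
docker run -d -p 80:80 nginx
EOF

  tags = {
    Name = "terraform-blue-green-v${var.infrastructure_version}"
  }
}

output "instance_public_ips" {
  value = "${aws_instance.terraform-blue-green.*.public_ip}"
}
```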
With the file in place, run:

```
> terraform plan

Terraform will perform the following actions:

+ aws_instance.terraform-blue-green[0]
  ...
+ aws_instance.terraform-blue-green[1]
  ...
+ aws_instance.terraform-blue-green[2]
  ...

Plan: 3 to add, 0 to change, 0 to destroy.

> terraform apply

Terraform will perform the following actions:

+ aws_instance.terraform-blue-green[0]
  ...
+ aws_instance.terraform-blue-green[1]
  ...
+ aws_instance.terraform-blue-green[2]
  ...

Plan: 3 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

....

Apply complete! Resources: 3 added, 0 changed, 0 destroyed.

Outputs:

instance_public_ips = [ ip1, ip2, ip3 ]
```

When the command finishes, you will be able to see the instances in your AWS Console. As you can see, each of them is in a different availability zone. Accessing any of the instances from a browser (via its public IP) should display the NGINX welcome page.

#### Adding a Load Balancer

Create a file named **load\_balancers.tf** (a sketch follows the list below). In this file we create a Load Balancer with:

* **Name:** self-explanatory.
* **Subnets:** the subnets the load balancer is available in.
* **Security Groups:** we attach the previously created security group to be able to access it.
* **Instances:** we register the previously created instances.
* **Listeners:** a single listener that listens on port 80 of the load balancer and forwards to port 80 of the instances.
* **Healthcheck:** a simple HTTP health check that targets port 80 of each instance.
* **Outputs:** an output that displays the load balancer's public DNS.
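A sketch of **load\_balancers.tf** following that description, using a classic ELB; the health check intervals and thresholds are assumptions:

```hcl
# load_balancers.tf -- a classic ELB in front of the three instances.
resource "aws_elb" "terraform-blue-green" {
  name            = "terraform-blue-green-v${var.infrastructure_version}"
  subnets         = ["${aws_subnet.terraform-blue-green.*.id}"]
  security_groups = ["${aws_security_group.terraform-blue-green.id}"]
  instances       = ["${aws_instance.terraform-blue-green.*.id}"]

  # Listen on port 80 and forward to port 80 on the instances.
  listener {
    lb_port           = 80
    lb_protocol       = "http"
    instance_port     = 80
    instance_protocol = "http"
  }

  # Simple HTTP health check against port 80 of each instance.
  health_check {
    target              = "HTTP:80/"
    interval            = 30
    timeout             = 5
    healthy_threshold   = 2
    unhealthy_threshold = 2
  }
}

# The public DNS name of the load balancer.
output "load_balancer_dns" {
  value = "${aws_elb.terraform-blue-green.dns_name}"
}
```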
With the file in place, do a **terraform plan** and a **terraform apply**. When the execution ends, it should output the load balancer's public DNS:

```
> terraform apply
....
Outputs:
...
load_balancer_dns = terraform-blue-green-v1-xxxxxx.us-west-2.elb.amazonaws.com
```

Accessing the public DNS of the load balancer from a browser should display the NGINX page (yes, I've used the same screenshot twice). You should also be able to see the load balancer in the AWS Console (under _EC2/Load Balancers_).

**(Optional) Assign A DNS Record to the Load Balancer V1**

I'm not going to cover this case in depth, but what I've ended up doing in production is creating a DNS record (an **aws\_route53\_record** resource with an Alias) that points to a specific version of the load balancer.

#### Commit your changes

Commit your changes so far:

```
> git add .
> git commit -m "Version 1"
```

#### Manually Pointing a DNS record to the Load Balancer

**DevOps is not all about automation.** In some cases, it's good practice to keep a minimal amount of human interaction. In our case, we will assign a DNS record to the desired version of the infrastructure (via its load balancer). To be able to perform this step you'll need a registered domain and the corresponding Route 53 Hosted Zone. Enter the desired Hosted Zone and create an **A record** with an **Alias** to the previously created **load balancer** (terraform-blue-green-v1…). This is the entry point of your system and what your clients will be accessing.

#### Creating the Infrastructure V2

First, create a new branch in your repository (and I seriously recommend removing the .terraform folder):

```
> git checkout -b v2
> rm -rf .terraform
```

Now, modify **bootstrap.tf**: you need to change both the **infrastructure\_version** variable and the **key of the S3 bucket** (in the sketch above, that would mean version "2" and key "v2/terraform.tfstate"). It would be nice if Terraform allowed interpolating the infrastructure\_version variable into the backend key, but for now that's not possible (there is an open issue on GitHub, though).

Now, as you deleted the .terraform folder, you need to reinitialize the state:

```
> terraform init
```

Then modify your **instances.tf**, changing the **instance type from t2.micro to t2.medium** (you can choose whatever you like). Doing a **terraform plan** will reveal that Terraform will indeed create all the resources again:

```
> terraform plan
...
Plan: 11 to add, 0 to change, 0 to destroy.
```

After running **terraform apply**, you should end up with an entirely new infrastructure, without changing the old one. You can verify this in the AWS Console: the Instances, Subnets, Security Groups and Load Balancers sections now show both versions.

#### Routing traffic through the new Infrastructure

As we did previously with Version 1, point your DNS record to the new load balancer using an Alias.

#### Deleting the old infrastructure

When all traffic is going to the new load balancer, it's time to delete Version 1 of the infrastructure. To do this, first commit all the changes of Version 2, then check out the old version again, delete the .terraform folder and initialize the state once more:

```
> git add .
> git commit -m "Version 2"
> git checkout master
> rm -rf .terraform
> terraform init
```

Then, simply do:

```
> terraform destroy

Terraform will perform the following actions:

- aws_elb.terraform-blue-green
- aws_instance.terraform-blue-green[0]
- aws_instance.terraform-blue-green[1]
- aws_instance.terraform-blue-green[2]
- aws_key_pair.terraform-blue-green
- aws_security_group.terraform-blue-green
- aws_security_group_rule.terraform-blue-green-inbound
- aws_security_group_rule.terraform-blue-green-outbound
- aws_subnet.terraform-blue-green[0]
- aws_subnet.terraform-blue-green[1]
- aws_subnet.terraform-blue-green[2]

Plan: 0 to add, 0 to change, 11 to destroy.

Do you really want to destroy?
  Terraform will destroy all your managed infrastructure, as shown above.
  There is no undo. Only 'yes' will be accepted to confirm.

  Enter a value: yes

...

Destroy complete! Resources: 11 destroyed.
```

Now, if you go to the AWS Console, you should see only the V2 resources (for example, only the V2 instances under _EC2/Instances_). You can now merge the v2 branch into master. If you are doing this just for fun, please run a **terraform destroy** for v2 as well ;)

Remember that you can see the full example at [https://github.com/santiagopoli/terraform-examples/tree/master/blue-green](https://github.com/santiagopoli/terraform-examples/tree/master/blue-green)

#### A note of caution

In this guide, we used a DNS record to select which infrastructure version is the production one. While this works most of the time, some client-side libraries cache DNS entries, so you should wait some time for traffic to drain from the old load balancer. You can avoid this by maintaining a single, manually created load balancer and changing the instances registered to it.

#### Conclusion

Terraform provides a clean and declarative way of defining Infrastructure as Code. Thanks to that, we can do things that seemed impossible a few years ago. I want to end this article by saying that this approach has a couple of downsides, and I will write an article in the future explaining how to achieve the same results using Terraform Modules (which, in fact, provide better flexibility overall).

> Thanks for reading!