Blue/Green Infrastructure with Terraform

Reducing the fear of introducing breaking changes in the cloud

Infrastructure as Code is one of the cool things right now. Every DevOps-related conference in the past two years had a talk or two about the subject, and that’s a good thing.

In the wake of the DevOps movement, HashiCorp emerged as one of the most respected companies in that space. Today I’m going to talk about one of their products: Terraform.

What is Terraform?

Terraform is a tool that lets you manage cloud resources declaratively. Using a simple configuration language (HCL), you define the shape of a cloud infrastructure: VPCs, Subnets, Compute Instances, Load Balancers, DNS Records and so on. It works with every major cloud provider, but it’s not cloud-agnostic: you can create, for example, a Load Balancer in AWS or in Google Cloud, but the code will differ for each of them.

What is Blue/Green deployment?

Blue/Green deployment is a DevOps practice that aims to reduce downtime during updates by creating a new copy of the component being updated while keeping the current one running.
You end up with two versions of the system: one with the current version (blue) and another with the newer one (green). When the new version is up and running, you can seamlessly switch traffic to it. This not only reduces downtime, it also shortens rollback time when something goes wrong.

Blue/Green Infrastructure

While Blue/Green deployment is more commonly used for application deployments, the reduced costs of the cloud, together with the tools we have right now, make it possible to have two copies of an entire cloud infrastructure with little to no pain.

It is important to note that doing a Blue/Green deployment of an entire Cloud Infrastructure is not a silver bullet, and it is certainly too much for small changes (for example, adding a new EC2 Instance to your stack). But for major or breaking changes it is a win, and I personally recommend it.

Terraform to the rescue!

I’ll be using Amazon Web Services for this tutorial, but the code won’t vary too much with another provider.

After finishing this, you will be able to create an infrastructure containing:

  • A Virtual Private Cloud
  • Three Subnets, each one in a different Availability Zone
  • A Security Group
  • Three EC2 Instances serving an NGINX Server on port 80 (each one in a different subnet)
  • A Load Balancer pointing to those Instances

Then, you will be able to:

  • Make changes in the Infrastructure
  • Create an entire new Infrastructure with that change
  • Switch traffic to the new Infrastructure
  • Destroy the Old Infrastructure
  • Profit

The full example can be seen on https://github.com/santiagopoli/terraform-examples/tree/master/blue-green

To follow this tutorial, you need to have your AWS Credentials configured in your Environment, with at least the AmazonEC2FullAccess policy attached (plus access to the S3 bucket used for the remote state later on).

Creating a VPC (Virtual Private Cloud)

I know this is a Terraform tutorial, but a recommended practice is to have a manually created VPC. You can create VPCs with Terraform, but a lot of external services rely on knowing your VPC ID beforehand, so it is better not to create a new one on every Blue/Green deployment.

Also, you may have security groups that are created externally by another team in your organization. For that reason, we will be creating the VPC using the AWS Console. You can also create a VPC from the command line by doing:

(change the CIDR block to anything you like)
> aws ec2 create-vpc --cidr-block 10.0.0.0/16
{
    "Vpc": {
        "VpcId": "vpc-ff7bbf86",
        "InstanceTenancy": "default",
        "Tags": [],
        "CidrBlockAssociations": [
            {
                "AssociationId": "vpc-cidr-assoc-6e42b505",
                "CidrBlock": "10.0.0.0/16",
                "CidrBlockState": {
                    "State": "associated"
                }
            }
        ],
        "Ipv6CidrBlockAssociationSet": [],
        "State": "pending",
        "DhcpOptionsId": "dopt-38f7a057",
        "CidrBlock": "10.0.0.0/16",
        "IsDefault": false
    }
}

Note your VpcId, as you will need it in a second.

Installing Terraform

You can download Terraform either from the official downloads page or by using a Package Manager (brew, apt).

Creating the project

Create a new folder in your workspace and name it terraform_blue_green. Then, initialize a Git repository, add a simple .gitignore that ignores the .terraform folder, and open the folder with your favorite text editor. In my case, I’ll be using Visual Studio Code.

> mkdir terraform_blue_green
> cd terraform_blue_green
> git init
> echo .terraform >> .gitignore
> code .

Initializing the Terraform State

Terraform stores the state of the infrastructure in a JSON File. It is recommended (and required for this tutorial) to store that file on an external backend like Amazon S3. As I’m using AWS for this Tutorial, I’ll stick to S3, but Terraform supports the equivalent in each provider.

First of all, you need to create the S3 bucket in which the state will reside (bucket names are globally unique, so pick your own name). You can do this either by going to the S3 Console or by doing:

> aws s3api create-bucket --bucket terraform-bluegreen

Then, create a file named bootstrap.tf inside the project folder, with this content.
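
A minimal sketch of what bootstrap.tf could look like. The region (us-west-2, the one used later in this tutorial), the state key and the default version value are my assumptions, and the syntax targets Terraform 0.12 or later, so adjust as needed:

```hcl
# bootstrap.tf (sketch)

# The version of the Infrastructure; bumped for every Blue/Green deployment.
variable "infrastructure_version" {
  default = "1"
}

# The Cloud provider. The region is an assumption.
provider "aws" {
  region = "us-west-2"
}

# The backend in which the state is saved.
terraform {
  backend "s3" {
    bucket = "terraform-bluegreen"  # the bucket created above
    key    = "v1/terraform.tfstate" # assumed key layout: one state file per version
    region = "us-west-2"            # assumption: the region where the bucket lives
  }
}
```

The backend block cannot reference variables, which is why the key is spelled out literally instead of being built from infrastructure_version.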

In this file we have defined

  • The version of the Infrastructure
  • The Cloud provider we will be using (in this case, AWS)
  • The Backend in which the state will be saved (in this case, S3), and the configuration attached to it.

With that file in place, run this command in your project folder:

> terraform init
Initializing the backend...
Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.
Initializing provider plugins...
- Checking for available provider plugins on https://releases.hashicorp.com...
- Downloading plugin for provider "aws" (1.11.0)...
Terraform has been successfully initialized!

Using existing resources in Terraform

As we’ll need the ID of the previously created VPC to do anything in our infrastructure, we will be storing it in a variable. To do this, create a file named vpc.tf with this content:
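
A minimal sketch of vpc.tf, assuming a variable named vpc_id (the name is my choice for illustration) whose default is the VpcId returned when the VPC was created:

```hcl
# vpc.tf (sketch)

# ID of the manually created VPC. Replace the default with your own VpcId.
variable "vpc_id" {
  default = "vpc-ff7bbf86"
}
```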

Creating our first resource: Subnets

To do anything useful, we first need subnets. We will create three of them, each in a different Availability Zone. Create a file named subnets.tf, with this content:
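
A minimal sketch of subnets.tf in Terraform 0.12+ syntax. The availability_zones variable, the vpc_id variable name and the /24 CIDR layout are my assumptions; only the resource name and the infrastructure_version variable come from this tutorial:

```hcl
# subnets.tf (sketch)

# Hypothetical list of zones; use the ones available in your region.
variable "availability_zones" {
  default = ["us-west-2a", "us-west-2b", "us-west-2c"]
}

resource "aws_subnet" "terraform-blue-green" {
  count = 3

  # element() wraps around, so any index maps to a valid zone.
  availability_zone = element(var.availability_zones, count.index)
  vpc_id            = var.vpc_id # assumed variable holding the VPC ID

  # Version 1 yields 10.0.10.0/24, 10.0.11.0/24 and 10.0.12.0/24;
  # version 2 yields 10.0.20.0/24 and so on, so the versions never overlap.
  cidr_block              = "10.0.${var.infrastructure_version}${count.index}.0/24"
  map_public_ip_on_launch = true

  tags = {
    Name = "terraform-blue-green-v${var.infrastructure_version}-${count.index}"
  }
}
```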

In this file we created three subnets specifying:

  • Count: The number of Subnets we want to create
  • Availability zone: In this case we are using the element() function, which takes a list and an index and returns the element at that index, wrapping around to the start of the list if the index is greater than the number of elements. This is useful to assign a different availability zone to each subnet.
  • VPC ID
  • CIDR Block: This is probably the most confusing part. We interpolated the previously defined infrastructure_version variable into the CIDR block, so each version of the infrastructure gets its own address range. This will help later, when creating the second version. You may need to change the CIDR Block to fit the one you defined in the VPC.
  • Map Public IP on Launch: Assigns a Public IP by default to any Network Interface created in this subnet
  • Name: We’ve appended the Infrastructure version into it

With the file in place, first do:

> terraform plan
+ aws_subnet.terraform-blue-green[0]
...
+ aws_subnet.terraform-blue-green[1]
...
+ aws_subnet.terraform-blue-green[2]
...
Plan: 3 to add, 0 to change, 0 to destroy.

The plan command does a dry run and tells you what changes will be made. It is important to plan before applying anything, as it lets you spot errors early. In this case, the plan tells us that it will add three subnets, and that’s what we wanted.

So now we can run this:

> terraform apply
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
aws_subnet.terraform-blue-green[0]: Creating...
...
aws_subnet.terraform-blue-green[1]: Creating...
...
aws_subnet.terraform-blue-green[2]: Creating...
...
aws_subnet.terraform-blue-green[0]: Creation complete after 5s
aws_subnet.terraform-blue-green[1]: Creation complete after 5s
aws_subnet.terraform-blue-green[2]: Creation complete after 5s
Apply complete! Resources: 3 added, 0 changed, 0 destroyed.

Now you can go to the AWS Console (under the VPC/Subnets section) and your subnets should appear.

Creating a Security Group

To be able to access our resources in the future, we need to create a Security Group in our VPC. For the sake of simplicity, we will be creating a Security Group that enables all inbound traffic from everywhere.

Create a file named security_groups.tf in your project, with this content:
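
A minimal sketch of security_groups.tf in Terraform 0.12+ syntax. The resource names match the ones Terraform prints later in this tutorial; the group name, the description and the vpc_id variable name are my assumptions:

```hcl
# security_groups.tf (sketch)

resource "aws_security_group" "terraform-blue-green" {
  name        = "terraform-blue-green-v${var.infrastructure_version}"
  description = "Terraform Blue/Green"
  vpc_id      = var.vpc_id # assumed variable holding the VPC ID
}

# Allow all inbound traffic from everywhere (fine for a tutorial, not for production).
resource "aws_security_group_rule" "terraform-blue-green-inbound" {
  type              = "ingress"
  security_group_id = aws_security_group.terraform-blue-green.id
  from_port         = 0
  to_port           = 0
  protocol          = "-1" # all protocols; ports must be 0 in this case
  cidr_blocks       = ["0.0.0.0/0"]
}

# Allow all outbound traffic.
resource "aws_security_group_rule" "terraform-blue-green-outbound" {
  type              = "egress"
  security_group_id = aws_security_group.terraform-blue-green.id
  from_port         = 0
  to_port           = 0
  protocol          = "-1"
  cidr_blocks       = ["0.0.0.0/0"]
}
```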

In this file, we’ve created a security group for our VPC (using its VPC ID) and two Rules: One for Inbound traffic and one for Outbound traffic. The most important parts are:

  • From/To Port: The port range the rule applies to. In this case, we target all ports
  • Protocol: You can use a specific protocol such as “tcp”, “udp” or “icmp”, or “-1”, which applies to all protocols
  • CIDR Blocks: A list of CIDR blocks that are allowed by the rule. In our case, we allowed all IPv4 traffic.

With the file in place, run a terraform plan and terraform apply. After that, we should be able to see our security group in the AWS Console (under EC2/Security Groups).

Creating an SSH Key

To be able to access an AWS Instance later, we need to assign an SSH Key to it.

First, create a Key Pair by using ssh-keygen:

> mkdir keypairs
> ssh-keygen -f keypairs/keypair -P ""
Generating public/private rsa key pair.
Your identification has been saved in keypairs/keypair.
Your public key has been saved in keypairs/keypair.pub.

Since this is just a tutorial, we won’t bother moving the Private Key to a secure place (in a real project, you definitely should).

Then, create a file named keypairs.tf in the root folder of the project. Give it this content:
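
A minimal sketch of keypairs.tf in Terraform 0.12+ syntax. The key_name value is my assumption; the resource name and the public key path come from the steps in this tutorial:

```hcl
# keypairs.tf (sketch)

resource "aws_key_pair" "terraform-blue-green" {
  # Assumed naming scheme, so each infrastructure version gets its own key pair.
  key_name = "terraform-blue-green-v${var.infrastructure_version}"

  # Public half of the key generated with ssh-keygen above.
  public_key = file("${path.module}/keypairs/keypair.pub")
}
```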

Then do:

> terraform plan
+ aws_key_pair.terraform-blue-green
...
Plan: 1 to add, 0 to change, 0 to destroy.
> terraform apply
Terraform will perform the following actions:
+ aws_key_pair.terraform-blue-green
...
Plan: 1 to add, 0 to change, 0 to destroy.
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

Now, the Key appears on the AWS Console (under EC2/Key Pairs)

Creating (at last!) EC2 Instances

Create a file named instances.tf, and paste the following:
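
A minimal sketch of instances.tf in Terraform 0.12+ syntax. The ecs_ami_id variable is hypothetical (I am not hardcoding an AMI ID here, so Terraform will prompt for it), and attaching the security group and the Name tag are my assumptions:

```hcl
# instances.tf (sketch)

# Hypothetical variable: the ID of the ECS-optimized AMI for your region.
variable "ecs_ami_id" {}

resource "aws_instance" "terraform-blue-green" {
  count         = 3
  ami           = var.ecs_ami_id
  instance_type = "t2.micro"

  # One instance per subnet; element() wraps around if count exceeds the list.
  subnet_id = element(aws_subnet.terraform-blue-green[*].id, count.index)
  key_name  = aws_key_pair.terraform-blue-green.key_name

  # Assumption: attach the security group created earlier so port 80 is reachable.
  vpc_security_group_ids = [aws_security_group.terraform-blue-green.id]

  # Initialization script: run NGINX in Docker and expose it on port 80.
  user_data = <<-EOF
    #!/bin/bash
    docker run -d -p 80:80 nginx
  EOF

  tags = {
    Name = "terraform-blue-green-v${var.infrastructure_version}"
  }
}

# Printed at the end of terraform apply.
output "instance_public_ips" {
  value = aws_instance.terraform-blue-green[*].public_ip
}
```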

Let’s explain a little bit about this file:

We have created a resource of type aws_instance, with these parameters:

  • Count: The number of resources of this type. In this case, we will create 3 instances
  • AMI: The Amazon Machine Image for the instances. In this case, we chose the official AWS ECS-optimized image, as we will run a Docker container inside it. Keep in mind that an AMI ID only works in one region (US-WEST-2 in this example), so look up the equivalent image if you are in another region.
  • Instance Type: The instance type of the instances
  • Subnet Id: As explained above, the element() function takes a list and an index and returns the element at that index, wrapping around if the index is greater than the number of elements. This allows us to assign a different subnet ID to each instance.
  • Key Name: The name of the key pair. We chose the previously created one
  • User Data: This allows us to assign an initialization script to the instance. In our case, we are running an NGINX Docker Container and exposing it on port 80. There are better ways to define User Data Scripts, but we’ll keep it simple for now.

With the file in place, run:

> terraform plan
Terraform will perform the following actions:
+ aws_instance.terraform-blue-green[0]
...
+ aws_instance.terraform-blue-green[1]
...
+ aws_instance.terraform-blue-green[2]
...
Plan: 3 to add, 0 to change, 0 to destroy.
> terraform apply
Terraform will perform the following actions:
+ aws_instance.terraform-blue-green[0]
...
+ aws_instance.terraform-blue-green[1]
...
+ aws_instance.terraform-blue-green[2]
...
Plan: 3 to add, 0 to change, 0 to destroy.
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
....
Apply complete! Resources: 3 added, 0 changed, 0 destroyed.
Outputs:
instance_public_ips = [
ip1,
ip2,
ip3
]

When the command finishes, you will be able to see the instances in your AWS Console.

As you can see, each of them is in a different Availability Zone.

Accessing any of your instances (via its public IP) from a browser should display the NGINX page.

Adding a Load Balancer

Create a file named load_balancers.tf with this content:
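
A minimal sketch of load_balancers.tf in Terraform 0.12+ syntax, using a Classic Load Balancer (aws_elb). The health check thresholds and timings are my assumptions; the resource name, the output name and the load balancer name follow what Terraform prints in this tutorial:

```hcl
# load_balancers.tf (sketch)

resource "aws_elb" "terraform-blue-green" {
  name            = "terraform-blue-green-v${var.infrastructure_version}"
  subnets         = aws_subnet.terraform-blue-green[*].id
  security_groups = [aws_security_group.terraform-blue-green.id]
  instances       = aws_instance.terraform-blue-green[*].id

  # Forward port 80 of the load balancer to port 80 of the instances.
  listener {
    lb_port           = 80
    lb_protocol       = "http"
    instance_port     = 80
    instance_protocol = "http"
  }

  # Simple HTTP health check against port 80 (values are assumptions).
  health_check {
    target              = "HTTP:80/"
    interval            = 30
    timeout             = 3
    healthy_threshold   = 2
    unhealthy_threshold = 2
  }
}

# Public DNS name of the load balancer.
output "load_balancer_dns" {
  value = aws_elb.terraform-blue-green.dns_name
}
```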

In this file we’ve created a Load Balancer with

  • Name: Self explanatory
  • Subnets: The subnets the Load Balancer is Available In
  • Security Groups: We’ve added the previously created Security Group to be able to access it
  • Instances: We’ve added the previously created instances
  • Listeners: We’ve added a single listener that listens on port 80 of the load balancer and forwards to port 80 of the instances
  • Healthcheck: We’ve added a simple HTTP Healthcheck that targets the port 80 of the instance.
  • Outputs: We’ve added an output that displays the Load Balancer Public DNS

With the file in place, do a terraform plan and a terraform apply. When the execution ends, it should output the Load Balancer’s public DNS name.

> terraform apply
....
Outputs:
...
load_balancer_dns = terraform-blue-green-v1-xxxxxx.us-west-2.elb.amazonaws.com

Accessing the public DNS of the Load Balancer from a browser should display the NGINX page.

You should also be able to see it in the AWS Console (under EC2/Load Balancers)

(Optional) Assign A DNS Record to the Load Balancer V1

I’m not going to cover too much of this case, but what I’ve ended up doing in production is creating a DNS Record that points to a specific version of the Load Balancer. An example of this in terraform could be:
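
A rough sketch of such a record in Terraform 0.12+ syntax. The hosted_zone_id and domain_name variables are hypothetical names of mine, and this resource is not part of the files used in the rest of the tutorial (adding it would change the resource counts shown later):

```hcl
# dns.tf (sketch, optional)

# Hypothetical variables: your Route 53 Hosted Zone ID and the record name.
variable "hosted_zone_id" {}
variable "domain_name" {}

resource "aws_route53_record" "terraform-blue-green" {
  zone_id = var.hosted_zone_id
  name    = var.domain_name
  type    = "A"

  # Alias record pointing at this version's load balancer.
  alias {
    name                   = aws_elb.terraform-blue-green.dns_name
    zone_id                = aws_elb.terraform-blue-green.zone_id
    evaluate_target_health = true
  }
}
```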

Commit your changes

Commit your changes so far

> git add .
> git commit -m "Version 1"

Manually Pointing a DNS record to the Load Balancer

DevOps is not all about automation. In some cases, it’s good practice to keep a small amount of human interaction. In our case, we will manually point a DNS record to the desired version of the infrastructure (via the load balancer).

To be able to perform this step you’ll need to have a registered domain and the corresponding Route 53 Hosted Zone.

Enter the desired Hosted Zone and create an A Record with an Alias of the previously created load balancer (terraform-blue-green-v1…).

This is the entry point of your system and what your clients will be accessing.

Creating the Infrastructure V2

First, create a new branch in your repository (and I seriously recommend removing the .terraform folder):

> git checkout -b v2
> rm -rf .terraform

Now, modify bootstrap.tf with this:
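
A sketch of the modified file in Terraform 0.12+ syntax, assuming a hypothetical v<N>/terraform.tfstate key layout and us-west-2 as the region (both are my assumptions):

```hcl
# bootstrap.tf for Version 2 (sketch)

variable "infrastructure_version" {
  default = "2" # was "1"
}

provider "aws" {
  region = "us-west-2" # assumption
}

terraform {
  backend "s3" {
    bucket = "terraform-bluegreen"
    key    = "v2/terraform.tfstate" # assumed layout; a new key means a new, empty state
    region = "us-west-2"            # assumption: the region where the bucket lives
  }
}
```

Because the new key points at a state file that does not exist yet, Terraform treats Version 2 as a brand new infrastructure and leaves the Version 1 resources untouched.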

As you can see, you need to modify both the infrastructure_version variable and the key of the state file in the S3 backend. It would be nice if Terraform allowed interpolating the infrastructure_version variable in the key, but for now that’s not possible. There is an issue on GitHub, though.

Now, as you deleted the .terraform folder, you need to reinitialize the state:

> terraform init

Now modify your instances.tf with this content:

(We’ve changed the instance size from t2.micro to t2.medium. You can choose whatever you like.)

Doing a terraform plan will reveal that in fact, terraform will create all resources again.

> terraform plan
...
Plan: 11 to add, 0 to change, 0 to destroy.

After doing terraform apply, you should end up with an entirely new infrastructure, without having changed the old one.

You can check the new resources in the AWS Console under Instances, Subnets, Security Groups and Load Balancers.

Routing traffic through the new Infrastructure

As we did previously with Version 1, point your DNS record to the new load balancer using an ALIAS.

Deleting the old infrastructure

Once all traffic is going to the new Load Balancer, it’s time to delete Version 1 of the infrastructure.

To do this, first commit all the changes in Version 2, and then checkout the old version again. Delete the .terraform folder and initialize the state again.

> git add .
> git commit -m "Version 2"
> git checkout master
> rm -rf .terraform
> terraform init

Then, simply do:

> terraform destroy
Terraform will perform the following actions:
- aws_elb.terraform-blue-green
- aws_instance.terraform-blue-green[0]
- aws_instance.terraform-blue-green[1]
- aws_instance.terraform-blue-green[2]
- aws_key_pair.terraform-blue-green
- aws_security_group.terraform-blue-green
- aws_security_group_rule.terraform-blue-green-inbound
- aws_security_group_rule.terraform-blue-green-outbound
- aws_subnet.terraform-blue-green[0]
- aws_subnet.terraform-blue-green[1]
- aws_subnet.terraform-blue-green[2]
Plan: 0 to add, 0 to change, 11 to destroy.
Do you really want to destroy?
Terraform will destroy all your managed infrastructure, as shown above.
There is no undo. Only 'yes' will be accepted to confirm.
Enter a value: yes
...
Destroy complete! Resources: 11 destroyed.

Now, if you go to the AWS console you should see only the V2 resources. For example, only the Version 2 instances should still be running after destroying Version 1.

You can now merge the v2 branch into master. If you are doing this just for fun, please do a terraform destroy for v2 ;)

Remember that you can see this full example on https://github.com/santiagopoli/terraform-examples/tree/master/blue-green

A note of caution

In this guide, we used a DNS Record to select which infrastructure version is the production one. While this works most of the time, some client-side libraries cache DNS entries, so you should wait for traffic to drain from the old load balancer before destroying it. You can avoid this by maintaining a manually created load balancer and changing the instances attached to it.

Conclusion

Terraform provides a clean and declarative way of defining Infrastructure as Code. Thanks to that, we can use it to do things that seemed impossible a few years ago.

I want to end this article by saying that this approach has a couple of downsides, and I will write an article in the future explaining how to achieve the same results by using Terraform Modules (those in fact provide better flexibility overall).

Thanks for reading!

More by Santiago Ignacio Poli
