[Source: https://www.pexels.com/photo/black-and-white-city-electric-train-electrical-wires-414931/]
I have been looking at tools for introducing chaos into applications (Chaos Engineering) so that I can test whether those applications are resilient. While exploring different tools, I wondered why I couldn't leverage the Amazon EC2 Systems Manager suite of tools, which is already available in AWS, to introduce chaos into applications.
Amazon EC2 Systems Manager is a collection of capabilities that helps you automate management tasks such as collecting system inventory, applying operating system patches, automating the creation of Amazon Machine Images (AMIs), and configuring operating systems and applications at scale. Systems Manager lets you remotely and securely manage the configuration of your managed instances.
More Info at: http://docs.aws.amazon.com/systems-manager/latest/userguide/what-is-systems-manager.html
With this idea in mind, I started to look at the different ways I could introduce chaos, and at which tools EC2 already has that I could use instead of building my own.
These are the tools that are part of Amazon EC2 Systems Manager that I picked to perform Chaos Engineering:
An Amazon EC2 Systems Manager Document defines the actions that Systems Manager performs on your managed instances. Systems Manager includes more than a dozen pre-configured documents that you can use by specifying parameters at runtime. Documents use JavaScript Object Notation (JSON), and they include steps and parameters that you specify. Steps execute in sequential order.
More info at: http://docs.aws.amazon.com/systems-manager/latest/userguide/sysman-ssm-docs.html
Systems Manager Run Command lets you remotely and securely manage the configuration of your managed instances. A managed instance is any Amazon EC2 instance or on-premises machine in your hybrid environment that has been configured for Systems Manager. Run Command enables you to automate common administrative tasks and perform ad hoc configuration changes at scale. You can use Run Command from the EC2 console, the AWS Command Line Interface, Windows PowerShell, or the AWS SDKs.
More info at: https://aws.amazon.com/ec2/run-command/
Let’s walk through the setup that is required for us to run the Chaos Engineering experiment.
To get started with Amazon EC2 Systems Manager, verify prerequisites, configure AWS Identity and Access Management (IAM) roles, and install the SSM Agent on managed instances.
This document talks about how to configure the IAM roles and the installation steps for SSM Agent: http://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-setting-up.html
Using the steps above, install the SSM Agent on the Amazon Linux EC2 instances on which you want to perform the Chaos Experiment.
We will create an SSM Document to run our different Chaos Engineering experiments. Let’s start by creating one document that blackholes an instance on a specified port for a specified amount of time. I will follow up with additional blog posts with different SSM Document templates to perform different Chaos Engineering experiments.
This page talks about how you can create an SSM Document: http://docs.aws.amazon.com/systems-manager/latest/userguide/create-ssm-doc.html
For our use case of blackholing a port on an instance, create the Document using the information below:
Name: Chaos-Blackhole
Document Type: Command
Content:
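A minimal sketch of what the document content could look like, generated with Python so the JSON can be printed and pasted into the Content field. The schema version (2.2), the step name, the descriptions and the `allowedPattern` checks are assumptions for illustration; only the `aws:runShellScript` action, the three shell commands and the `port` and `duration` parameter names come from this post.

```python
import json


def build_blackhole_document() -> dict:
    """Return a Command document that drops inbound TCP traffic on a port."""
    return {
        # Assumption: schema version 2.2, which uses "mainSteps".
        "schemaVersion": "2.2",
        "description": "Blackhole a TCP port on the instance for a fixed duration",
        "parameters": {
            "port": {
                "type": "String",
                "description": "TCP port to blackhole",
                # Assumption: restrict input to digits to keep shell input safe.
                "allowedPattern": "^[0-9]{1,5}$",
            },
            "duration": {
                "type": "String",
                "description": "Seconds to keep the port blackholed",
                "allowedPattern": "^[0-9]+$",
            },
        },
        "mainSteps": [
            {
                "action": "aws:runShellScript",
                "name": "blackholePort",  # hypothetical step name
                "inputs": {
                    "runCommand": [
                        "iptables -A INPUT -p tcp --destination-port {{ port }} -j DROP",
                        "sleep {{ duration }}",
                        "iptables -D INPUT -p tcp --destination-port {{ port }} -j DROP",
                    ]
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_blackhole_document(), indent=2))
```

Keeping the document as a Python dict and serializing it with `json.dumps` avoids hand-editing quotes and commas when more experiments are added later.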
As you can see above, I am using the aws:runShellScript action to execute commands on the instance to blackhole a port. Let’s walk through the commands that I’m using.
iptables -A INPUT -p tcp --destination-port {{ port }} -j DROP
Adds an iptables rule to drop the packets on the specified port
sleep {{ duration }}
Waits for the specified duration
iptables -D INPUT -p tcp --destination-port {{ port }} -j DROP
Deletes the iptables rule that drops the packets on the specified port
Port and Duration are both parameters to the Document and these parameters are filled with values when the Run Command is executed.
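To make the substitution concrete, here is a small Python sketch that fills the `{{ port }}` and `{{ duration }}` placeholders the way the commands would read once values are supplied. It only illustrates the idea; the `render` helper is hypothetical and is not how Systems Manager implements parameter substitution internally.

```python
import re

# Matches placeholders such as {{ port }} or {{duration}}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(command: str, params: dict) -> str:
    """Replace each {{ name }} placeholder with the matching parameter value."""
    return PLACEHOLDER.sub(lambda m: str(params[m.group(1)]), command)


commands = [
    "iptables -A INPUT -p tcp --destination-port {{ port }} -j DROP",
    "sleep {{ duration }}",
    "iptables -D INPUT -p tcp --destination-port {{ port }} -j DROP",
]

# Example values: blackhole port 80 for 60 seconds.
rendered = [render(c, {"port": "80", "duration": "60"}) for c in commands]
for line in rendered:
    print(line)
```

With port 80 and a duration of 60, the middle command becomes `sleep 60`, so the DROP rule stays in place for one minute before it is deleted.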
Now that the Systems Manager Agent is installed on an EC2 instance and the SSM Document is created, it is time to run our Chaos Engineering experiment. Let’s look at how to run the Chaos Blackhole experiment. For our experiment we will install nginx on an EC2 instance that has the SSM Agent installed, and then run our Chaos Experiment to blackhole port 80, which nginx uses on the instance. While the experiment is running, we should be unable to access nginx on that instance from a browser on port 80. If port 80 on the instance is unreachable from the browser, the experiment is successful.
yum install nginx -y
Installs nginx on the instance
service nginx restart
Starts nginx
curl http://localhost
Checks that nginx responds locally; see if you get a response back
Now that the EC2 instance has nginx running on port 80, we can run our Chaos Blackhole experiment and see if it succeeds. For this walkthrough I’m using the AWS Management Console; however, all the steps mentioned in this post can also be run using the AWS CLI or an AWS SDK.
Open Run Command in the EC2 console and click the Run a command button.
In the Run a command window, filter the available commands to Owned by me and select the Chaos-Blackhole document.
Selecting Chaos-Blackhole Document from the list
Click the Run button at the bottom to start our experiment.
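The same run can be started from an AWS SDK instead of the console. Below is a minimal sketch using boto3's `send_command`, assuming boto3 is installed, credentials and a region are configured, and the document was saved as Chaos-Blackhole with `port` and `duration` parameters. The function names and the instance ID in the usage comment are placeholders.

```python
def build_send_command_request(instance_id: str, port: int, duration_seconds: int) -> dict:
    """Build the keyword arguments for the SSM send_command call."""
    return {
        "InstanceIds": [instance_id],
        "DocumentName": "Chaos-Blackhole",
        # Run Command parameters are passed as lists of strings.
        "Parameters": {
            "port": [str(port)],
            "duration": [str(duration_seconds)],
        },
    }


def run_blackhole(instance_id: str, port: int, duration_seconds: int) -> str:
    """Start the experiment and return the Run Command ID."""
    # Third-party dependency; imported here so the request builder
    # above can be used and inspected without boto3 installed.
    import boto3

    ssm = boto3.client("ssm")
    response = ssm.send_command(
        **build_send_command_request(instance_id, port, duration_seconds)
    )
    return response["Command"]["CommandId"]


# Usage (placeholder instance ID; requires AWS credentials):
# command_id = run_blackhole("i-0123456789abcdef0", port=80, duration_seconds=60)
```

Separating the request builder from the API call makes it easy to review exactly which instances and parameter values an experiment will target before anything is sent.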
Nginx was not accessible during the execution of the run command
Run Command Execution
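Instead of checking in a browser, reachability can also be probed from another machine with a short script. This is a sketch, assuming the probe runs outside the instance; the host name in the usage comment is a placeholder. Because the iptables rule uses DROP, a blocked connection attempt is expected to time out rather than be refused, so the timeout matters.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts (dropped packets) and refused connections alike.
        return False


# Usage (placeholder host name):
# print(port_is_reachable("ec2-instance.example.com", 80))
```

Running the probe before, during and after the experiment should show the port going from reachable to unreachable and back once the rule is deleted.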
Feel free to provide feedback on the approach and on ways to improve it.