Do As I Say, Not As I Do: Get your EC2 Instance Name Without Breaking Your Infrastructure

by Alexandra Johnson, December 29th, 2017

A Step by Step Guide with Python Code Snippets

My first attempt to log EC2 instance names to PagerDuty and Airbrake broke most of our infrastructure. I failed to account for unpublished AWS rate limits. When an unexpected volume of errors caused my code to hit those limits, insufficient error handling led to an infinite loop: errors were being thrown inside our exception loggers.

I hope that this tutorial can save you some of my headache. I’ll walk you through how to use the boto3 Python client to access the name of a running EC2 instance from that instance, and along the way I’ll include caveats and gotchas that will help you avoid some of my mistakes.

Pre-Requisites

Boto3. This tutorial assumes that you are familiar with using AWS’s boto3 Python client, and that you have followed AWS’s instructions to configure your AWS credentials.
Requests, a Python HTTP library.

Get the Instance Id and Region

Most information about the instance is accessible with the boto3 Instance resource. To create that resource, we first need to retrieve the instance id and instance region.

AWS provides Instance Metadata and User Data via the URL http://169.254.169.254, which you can request from within any running EC2 instance. In particular, we are interested in the Instance Identity Document, which is accessible at http://169.254.169.254/latest/dynamic/instance-identity/document.

import requests

r = requests.get("http://169.254.169.254/latest/dynamic/instance-identity/document")
response_json = r.json()
region = response_json.get('region')
instance_id = response_json.get('instanceId')

If you are not familiar with the requests library, I would recommend checking out Response Status Codes, particularly the raise_for_status function, as a starting point for error handling.
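A minimal sketch of that error handling might look like the following. The helper name fetch_identity_document and the two-second timeout are illustrative assumptions, not part of the original code; adjust both to fit your service.

```python
import requests

IDENTITY_URL = "http://169.254.169.254/latest/dynamic/instance-identity/document"


def fetch_identity_document(timeout=2.0):
    """Return (region, instance_id), or (None, None) if the lookup fails.

    Hypothetical helper: the name and the default timeout are assumptions.
    """
    try:
        r = requests.get(IDENTITY_URL, timeout=timeout)
        r.raise_for_status()  # turn 4xx/5xx responses into HTTPError
        doc = r.json()
    except (requests.exceptions.RequestException, ValueError):
        # RequestException covers timeouts, connection errors and HTTPError;
        # ValueError covers a response body that is not valid JSON.
        return None, None
    return doc.get("region"), doc.get("instanceId")
```

Returning None values instead of raising keeps a failed metadata lookup from becoming yet another exception for your error logger to handle.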

Get the Instance Resource

We can then use the instance id and region to retrieve the boto3 Instance resource.

import boto3

ec2 = boto3.resource('ec2', region_name=region)
instance = ec2.Instance(instance_id)

Validate region and instance_id before passing them to boto3

The first step of boto3 error handling is to catch ClientError and BotoCoreError, both found in the botocore.exceptions package.

In my experience, the boto3 client has pretty confusing error handling for invalid or None regions and instance ids. In addition to the errors mentioned above, None values in either field will raise the Python built-in ValueError. I would recommend that you do not attempt to use the boto3 client unless both region and instance_id are set.
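One way to express that guard is a small check that runs before any boto3 call. The helper name can_query_ec2 is hypothetical, and treating whitespace-only strings as missing is an extra assumption of this sketch.

```python
def can_query_ec2(region, instance_id):
    """Return True only when both values are non-empty strings.

    Hypothetical helper: None, empty strings, whitespace-only strings,
    and non-string values are all treated as "do not call boto3".
    """
    return all(
        isinstance(value, str) and value.strip() != ""
        for value in (region, instance_id)
    )
```

Call it with the values read from the identity document, and skip the EC2 lookup entirely when it returns False.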

Get the Name

An instance’s “Name” is really an instance tag with the key “Name”. You can retrieve tags from the instance resource, and filter for Name tags.

tags = instance.tags or []
names = [tag.get('Value') for tag in tags if tag.get('Key') == 'Name']
name = names[0] if names else None

Because attributes are lazy-loaded, some invalid instance ids throw errors here

According to the boto3 documentation, resource attributes are lazy-loaded, meaning that the first API call is made when an attribute is first accessed. So while None and empty strings are rejected when the ec2.Instance resource is created, non-empty string ids that are the right type but the wrong value are only checked here, when the first DescribeInstances call is made. To handle this, you’ll want to catch the botocore.exceptions errors from the last section (ClientError and BotoCoreError) around this attribute access as well.

Gotcha!

From the Open Guide To AWS section on EC2 gotchas and limitations:

❗If the EC2 API itself is a critical dependency of your infrastructure (e.g. for automated server replacement, custom scaling algorithms, etc.) and you are running at a large scale or making many EC2 API calls, make sure that you understand when they might fail (calls to it are rate limited and the limits are not published and subject to change) and code and test against that possibility.

The boto3 client loads information about an instance with the DescribeInstances API call. If you make this API call to retrieve the instance name every time you log an error, for example, you could easily hit the DescribeInstances rate limit.

In addition to the error handling mentioned above, you will want to consolidate your calls to the AWS API to avoid hitting the unpublished AWS rate limits. Our solution was to fetch the instance name once at the startup of the API server, and cache the result in a global data structure. Instead of calling the EC2 API every time we need to log an error, we now call it only once when deploying new code to a machine.
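As a rough sketch of that approach (not our production code), the cache can be a module-level variable that is filled once at startup. The names init_instance_name and instance_name are hypothetical, and fetch_name stands for whatever function performs the actual EC2 lookup.

```python
_INSTANCE_NAME = None  # module-level cache, filled once at startup


def init_instance_name(fetch_name):
    """Call fetch_name() once and remember the result.

    fetch_name is a hypothetical zero-argument callable that performs the
    EC2 lookup. Any failure is swallowed so that a rate-limited or broken
    lookup can never stop the server from starting.
    """
    global _INSTANCE_NAME
    try:
        _INSTANCE_NAME = fetch_name()
    except Exception:
        _INSTANCE_NAME = None


def instance_name():
    """Return the cached name (or None) without touching the AWS API."""
    return _INSTANCE_NAME
```

Run init_instance_name once when the server boots, and have the error loggers read instance_name(). A burst of errors then costs zero additional EC2 API calls.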