Analyze your Amazon Web Services (AWS) Bills w/Google BigQuery & Data Studio

Written by mediocrity | Published 2017/04/30
Tech Story Tags: aws | public-cloud | google-cloud-platform | google-cloud-storage | cloud-computing


Why go outside when you are stuck inside writing macros for a 12GB spreadsheet, trying to figure out why the unused EBS volumes in your R&D environment are costing you $74.95/month?

Before you head down the path of evaluating AWS cost analysis tools (to the tune of a couple percent of your monthly spend) or, even crazier, trying to build it yourself: let’s give Google’s BigQuery and Data Studio a try.

By the end of this tutorial you will build an interactive report with Google Data Studio:

Data Studio Report

To preface, I have already written about analyzing Google Cloud Platform (GCP) bills here. This will go deeper, and walk through the full setup and configuration necessary to accomplish this multi-cloud task (you can then tell your CTO that you are going multi-cloud too!).

Let’s get one big question out of the way: Why are you not doing this w/X on AWS? The short answer is you cannot do it as cheaply (some conditions apply, see cost summary at the bottom) on AWS as you can on GCP. For example, the cheapest Redshift instance is $180/m and you need to go out and buy a separate BI tool (even the simplest, AWS Quicksight is $9-18/user/month + $0.25–0.38/GB/month). This solution will be far cheaper and more extensible for most use cases and we will dive into the details of that statement below.

Data Flow

The data flow will look like this:

  • S3 Billing Bucket: The bucket where your detailed billing reports are written.
  • Transfer Service: The hands-off Storage Transfer Service offered by Google to schedule S3 > GCS bucket replication.
  • Unzip: A Google Cloud Function to unzip the replicated, zipped detailed billing reports.
  • Cloud Storage: Google’s S3-equivalent where the detailed billing reports will be copied to by the Transfer Service.
  • AppEngine: Cron entry that will pull from Cloud Storage and load the data into BigQuery daily.
  • BigQuery: The data warehouse for your billing data at Google.
  • Data Studio: The interactive BI tool to analyze the data in BigQuery.

Do not worry if you haven’t used these tools before. I promise this will be fairly painless to learn (at least the basics and initial implementation).

This is somewhat overdone to be able to purposely use many tools available on GCP. If you were to set this up “for real” you may choose to remove some of the components here. Then again, the value in using services such as the Transfer Service, Cloud Functions, and KMS is that you will likely have use cases for them in the future and becoming familiar with their APIs and capabilities at a smaller scale could be useful. Also, I do not see enough tutorials on using many of these services so this is my attempt to add one more.

Assumptions

This tutorial makes the following assumptions:

  1. You have an AWS account set up.
  2. You have Detailed Billing enabled on AWS.
  3. You have a Google Cloud account set up.

Re: #1, you probably wouldn’t be here if you didn’t have an AWS account set up, but nonetheless: go to https://aws.amazon.com/ and find the Create an AWS Account button to create one. Once you are spending money, this tutorial could potentially be useful.

Re: #2, if you do not have Cost and Usage reports enabled (aka. Billing Reports, aka. Detailed Billing), follow the guide from AWS here. Make sure under Billing > Preferences > Receive Billing Reports that you are configured as follows (substituting your billing bucket name):

Billing Preference Panel

For this tutorial we will be using the Detailed billing report with resources and tags. The other checkboxes are not strictly needed but may be useful to you.

Enabling these checkboxes costs money because new objects will be written to S3 by AWS. How much money depends on how big your bill is which is a function of service utilization and most critically, tags. If you have hundreds, thousands, or millions of tags, it could potentially be significant. Pay attention to how and why you tag things: a topic for another time.

Re: #3, if you don’t have a Google Cloud account set up, head over to https://cloud.google.com/ and find the Try It Free button; you can sign in with the Google account you likely already have. Create a new Project and call it AWS Bill Analyzer (or something similar).

S3 Billing Bucket

S3 Billing Bucket

Take note of your S3 Billing Bucket name as defined previously. We will need to set up a user and permissions (via IAM) to be used by Google’s Cloud Transfer service, which will replicate the bucket content to GCS.

This is a good time to set up the AWS CLI tools if you have not done so already. The setup documentation is here: https://aws.amazon.com/cli/. Once completed, you should be able to list your buckets:
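A minimal check, something like (output will vary with your account):

aws s3 ls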

And you should also be able to list your files in the bucket:
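For example (the bucket name here is a placeholder; substitute your own billing bucket):

aws s3 ls s3://<YOUR_BILLING_BUCKET>/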

The file with the longest name is the one we care about today.

User & Permissions Setup

AWS IAM & Google IAM

There are two “users” we will need to set up.

  1. AWS IAM User to allow pulling of Billing Files on S3 from Google’s Cloud Transfer Service.
  2. Google Key Management Service (KMS) Service Account to allow storing/fetching of AWS credentials from #1 for potential use later on.

We will use the AWS CLI and Google Cloud CLI to perform both of these tasks.

AWS User Setup

Let’s create the user, create an access key / secret key for API access, create a group for the user (if one doesn’t exist), and add the user to the group.
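A sketch of those steps with the AWS CLI might look like the following; the user and group names (gcp-billing-reader, billing-readers) are placeholders I made up, so substitute your own. Save the AccessKeyId and SecretAccessKey that create-access-key prints, since we will encrypt and store them later.

# Create the user that the Google Storage Transfer Service will use
aws iam create-user --user-name gcp-billing-reader

# Create an access key / secret key pair for API access (note the output!)
aws iam create-access-key --user-name gcp-billing-reader

# Create a group (if one doesn't exist) and add the user to it
aws iam create-group --group-name billing-readers
aws iam add-user-to-group --group-name billing-readers --user-name gcp-billing-reader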

Finally, we will add a policy to the group, in turn granting the new user the permissions required.

We need to create a basic policy that grants Get and List access. I am sure this could be further locked down but this will suffice for now:
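A minimal version along these lines should do it (scope the Resource to your own billing bucket; the bucket name below is a placeholder):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:Get*",
        "s3:List*"
      ],
      "Resource": [
        "arn:aws:s3:::<YOUR_BILLING_BUCKET>",
        "arn:aws:s3:::<YOUR_BILLING_BUCKET>/*"
      ]
    }
  ]
}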

Save this file locally, to something like billing_ro_access.json and run the following command to create a Managed Policy:
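Roughly (the policy name is up to you):

aws iam create-policy --policy-name billing-ro-access --policy-document file://billing_ro_access.json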

Why did we create a managed policy and not an inline policy with the put-group-policy command? A managed policy is versioned (and thus supports rollback), allows delegation, conforms to AWS standards, is reusable, and will be easier to manage in the future. I highly recommend only using managed policies at this point. Read more here.

And lastly, add the policy to the group and list the policies attached to the group to confirm (using the “Arn” from the previous command):
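Assuming the placeholder names from above, that looks something like:

# Attach the managed policy to the group (use the Arn returned by create-policy)
aws iam attach-group-policy --group-name billing-readers \
    --policy-arn arn:aws:iam::<ACCOUNT_ID>:policy/billing-ro-access

# Confirm it is attached
aws iam list-attached-group-policies --group-name billing-readers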

To test that everything was done properly, you can add the user’s access key and secret access key as a profile in your local CLI and confirm it has the necessary access.

You can append the necessary profile and of course remove that from your history:
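One way to do that (the profile name gcp-transfer is arbitrary; paste the keys from create-access-key):

cat >> ~/.aws/credentials <<'EOF'
[gcp-transfer]
aws_access_key_id = <ACCESS_KEY_ID>
aws_secret_access_key = <SECRET_ACCESS_KEY>
EOF
history -c   # or edit your shell history file so the keys don't linger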

And perform the test:
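For example:

aws s3 ls s3://<YOUR_BILLING_BUCKET>/ --profile gcp-transfer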

Now, for Google credentials.

Google Service Account Setup

If you have not installed the Google Cloud SDK, install it from https://cloud.google.com/sdk/downloads.

Once completed, run through the following to create a service account, a key file (JSON), a keyring and cryptokey (for KMS), and enable it to encrypt/decrypt credentials.
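A rough sketch of those gcloud commands follows; the project, service account, keyring, and cryptokey names are ones I made up for illustration (and depending on your SDK version some kms commands may live under gcloud beta):

# Service account + JSON key file
gcloud iam service-accounts create aws-bill-analyzer-sa --display-name "AWS Bill Analyzer"
gcloud iam service-accounts keys create key.json \
    --iam-account aws-bill-analyzer-sa@aws-bill-analyzer.iam.gserviceaccount.com

# Keyring and cryptokey for KMS
gcloud kms keyrings create aws-bill-analyzer --location global
gcloud kms keys create aws-credentials --location global \
    --keyring aws-bill-analyzer --purpose encryption

# Allow the service account to encrypt/decrypt with that cryptokey
gcloud kms keys add-iam-policy-binding aws-credentials --location global \
    --keyring aws-bill-analyzer \
    --member serviceAccount:aws-bill-analyzer-sa@aws-bill-analyzer.iam.gserviceaccount.com \
    --role roles/cloudkms.cryptoKeyEncrypterDecrypter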

Make sure to change the project, service account, keyring, and cryptokey to your preferred names.

User & Permissions Setup Summary

We created a user, group, and managed policy on AWS that enables a user with the API keys (that we also created) to get and list our AWS bills on S3.

We created a service account and necessary policy on Google KMS to encrypt/decrypt the API keys for our AWS account. In the real world you would likely separate the encrypt and decrypt functionality for separation of privileges but in this case I am one person with one computer to protect.

The last step is to do the encryption of our AWS API keys and store the blobs in Google Cloud Storage for Google KMS to be able to decrypt them.

Google KMS

Google’s Key Management Service allows us to securely store secrets and fetch them as needed. The details of this service are here and the cost is $0.06/m for active key versions, and $0.03 per 10,000 operations. This will be a total of $0.09/m: seems like a reasonable cost to store secrets.

I don’t know why, but Amazon charges an order of magnitude more: $1.00 for the CMK (details) and $1.00/m for each yearly rotated CMK, plus usage, which seems exorbitant.

To make the encryption easy, clone the Python samples for GCP (which will be useful in other contexts) from here.

Hop into the kms > api-client directory and create a doit.sh file with the following content:
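As an alternative sketch of what such a doit.sh needs to do (this one skips the Python sample and uses the gcloud CLI directly; the keyring/key names are the ones assumed in the previous section, and on older SDKs the command may live under gcloud beta kms):

#!/bin/bash
# Encrypt the saved AWS access key and secret key with KMS.
gcloud kms encrypt --location global --keyring aws-bill-analyzer --key aws-credentials \
    --plaintext-file aws_access_key_id.txt --ciphertext-file aws_access_key_id.enc
gcloud kms encrypt --location global --keyring aws-bill-analyzer --key aws-credentials \
    --plaintext-file aws_secret_access_key.txt --ciphertext-file aws_secret_access_key.enc

Note that gcloud authenticates via your gcloud login; the service-account key file below applies if you run the Python sample instead.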

Before you can run it, you need to set the location of the JSON key file that you generated for the service account that is authorized to encrypt and decrypt key materials:

export GOOGLE_APPLICATION_CREDENTIALS=<SOME_LOCATION>/key.json

Fill in the AWS keys that you saved earlier and run it now:

We will now store the encrypted keys in Google Cloud Storage to make it easy to fetch later from App Engine.
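For example (the metadata bucket name here is a placeholder; any bucket your App Engine app can read will do):

gsutil mb -p aws-bill-analyzer gs://aws-bill-analyzer-metadata
gsutil cp aws_access_key_id.enc aws_secret_access_key.enc gs://aws-bill-analyzer-metadata/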

Google Cloud Storage Transfer Service

Whoa, what is this (docs)? This service from Google allows us to set up a mirroring job for our S3 Billing Bucket (or any S3 bucket). By setting this up, GCS will have a copy of everything in our S3 Billing Bucket, making it very easy to import into BigQuery.

You could handle the intermediary step of copying the bills over to Google Cloud and uploading them to BigQuery yourself, but not if you want to use a scheduled task (cron) on AppEngine: the AppEngine Standard Environment doesn’t have a local disk, so you would have to do something else (maybe streaming or static instances/disks), which starts to create far more infrastructure and general complication than is needed for this tutorial.

There are two ways to configure the mirroring job: the UI and, of course, the API. Since I am generally allergic to clicking buttons, I will show how to do this via the API.

To use this service, we first need to get the Google Service Account used by GCSTS. To do this, copy/paste the Python snippet from here or use this slightly modified one that takes the projectId as an argument:

If you run the script with your projectId, it will return the Google Service Account that you need:
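The response is a small JSON object containing the service account email (accountEmail). If you would rather skip the Python snippet entirely, the same value comes back from the Storage Transfer API directly; a quick curl sketch (substitute your own projectId):

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    https://storagetransfer.googleapis.com/v1/googleServiceAccounts/<YOUR_PROJECT_ID>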

With this email we will grant Writer permissions in Google IAM on a new bucket that we will create just for the purpose of being a Data Sink. We are creating a bucket separate from the one used by our keys (via KMS) and potentially other metadata in the future.

Create the bucket and enable the Google Service Account for Storage Transfer to write to it:
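With gsutil, that is roughly (substitute the service account email returned above):

# Data sink bucket for the replicated bills
gsutil mb -p aws-bill-analyzer gs://aws-bill-analyzer-data-sink

# Grant the Storage Transfer service account write access
gsutil acl ch -u <TRANSFER_SERVICE_ACCOUNT_EMAIL>:W gs://aws-bill-analyzer-data-sink

# Check the ACLs
gsutil acl get gs://aws-bill-analyzer-data-sink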

The last command checks the current ACLs and should list the Google Service Account that you added.

To setup the transfer job, copy the code from here (Python) and run it as such:

This will create a one-time job that will allow us to test. To make it run indefinitely, you would change the scheduleEndDate in the job and update the script as follows to take additional parameters:

Once you have your job submitted, you probably want to check on it. This can be done with the transfer_check.py script in the same aforementioned documentation (bottom of the page).

Give it the “name” from the output of the original request:

You will likely get an empty response on the status for several hours as I did on the first job. I am not sure why this is the case but I am guessing that these are relatively low priority non-peak jobs that are scheduled as capacity is available. Then again, I may have misunderstood some part of the API as it is a little confusing regarding start/end dates (Google Cloud: you could really improve this API and the docs surrounding it, date specification in US Pacific? Weird cases of execution 24hrs after creation depending on whether you set the date in the Past? What?). Then again, you set this up once and you are done forever so if the API is a little icky, whatever.

The second response shown above is after several hours showing that “done”: true.

Google Cloud Function

Google Cloud Function

To be able to load the detailed bills from AWS into BigQuery we will need to unzip them. Technically, you can load gzip-compressed files but the files from AWS are zip format. Since BigQuery will load uncompressed files faster (doc) we will uncompress the files once transferred with a Cloud Function (e.g. AWS Lambda but on Google and more sane).

This would not be a good approach if your bills are huge, for lots of reasons, but for “reasonably” sized bills this is a perfectly fine and cheap (likely free) approach for most folks.

If you don’t have the Beta components installed locally, now would be a good time to make sure you have them and they are up to date since Cloud Functions are in Beta (Google labels many things with “Beta” when they have been out for years).

gcloud components update && gcloud components install beta

Following the tutorial steps here (which I will also include here), let’s get a basic function setup and then we will update the code to do what we need.

You may need to enable the various APIs (and billing) through the Console if you have not used these before. Once done, you can continue.

Create two GCS buckets, one to stage your Function and one to upload a test file:

gsutil mb gs://functions-awsbills-staging
gsutil mb gs://functions-awsbills-test

Setup a new local directory for the project:

mkdir awsbills-function
cd !$

And create an index.js that contains the content from their tutorial:

/**
 * Background Cloud Function to be triggered by Cloud Storage.
 *
 * @param {object} event The Cloud Functions event.
 * @param {function} callback The callback function.
 */
exports.helloGCS = function (event, callback) {
  const file = event.data;

  if (file.resourceState === 'not_exists') {
    console.log(`File ${file.name} deleted.`);
  } else if (file.metageneration === 1) {
    // metageneration attribute is updated on metadata changes.
    // on create value is 1
    console.log(`File ${file.name} uploaded.`);
  } else {
    console.log(`File ${file.name} metadata updated.`);
  }

  callback();
};

Deploy it:

gcloud beta functions deploy helloGCS --stage-bucket functions-awsbills-staging --trigger-bucket functions-awsbills-test

Create a local test file and upload it to test the function:

touch testfile.txt
gsutil cp testfile.txt gs://functions-awsbills-test

And check the logs:

gcloud beta functions logs read --limit 50

If this is what it looks like for you, excellent! You have a working cloud function. Now, to spruce this up a bit for our use case.

You can delete it if you would like:

gcloud beta functions delete helloGCS

One thing you might notice if you have ever used AWS Lambda is how much less painful this was than working with it. More on this in a later section.

Requirements:

  1. Use the aforementioned bucket; in the case of this tutorial, our data sink bucket is aws-bill-analyzer-data-sink.
  2. Decompress the zip file of the specific pattern we care about (no need to unzip any others).
  3. Write the decompressed file back to the bucket (we may want to copy it to another bucket later for other reasons).

Our code will be pretty simple:

Upload the function and test it!

gcloud beta functions deploy decompresserator --stage-bucket functions-awsbills-staging --trigger-bucket functions-awsbills-test

If a file gets passed to it that we don’t want to process, the logs will show the following:

And then you can test with a real file:

gsutil cp 3256XXXXXXXX-aws-billing-detailed-line-items-with-resources-and-tags-2017-05.csv.zip gs://functions-awsbills-test

And it should run as follows:

Fantastic, so we have a working decompresserator Cloud Function. The last step is to configure it for the bucket we care about.

Cleanup and re-deploy:

gcloud beta functions delete decompresserator

gcloud beta functions deploy decompresserator --stage-bucket functions-awsbills-staging --trigger-bucket aws-bill-analyzer-data-sink

Lastly, let’s kick-off a one-time Storage Transfer to make sure that all is working as expected (see previous section). Assuming that is completed, check your logs and make sure it ran properly in your real-world scenario.

The logs should look something like this:

This shows the start of an execution and an exit for each non-matching bill (not the one we care about). For the files that do match the pattern, they are unzipped and placed in the Cloud Storage bucket.

As a future enhancement, you could modify the function to only run on the bills for the current month and 7 days into the next month (just as we will do with the AppEngine function). I have not included that here on purpose because I don’t think this approach will scale well and would like folks to really consider a different approach to this problem instead of copy/pasting it.

Google AppEngine

Google AppEngine

We will use AppEngine to launch a scheduled task (cron) daily that will load the GCS data (copied via GCSTS) into BigQuery.
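The daily schedule itself is just an App Engine cron definition. A minimal sketch of a cron.yaml hitting a /run handler (the handler path matches what we test below; check the repo for the real file) might look like:

cron:
- description: load AWS detailed billing data into BigQuery
  url: /run
  schedule: every 24 hours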

Since this isn’t a tutorial on AppEngine (see posts here and here for more info) I will just jump straight to the code.

We have a main.py:

And a helper file to make it easier to read:

For the full, updated code, please see the GitHub repo here.

There is a bunch happening here but the basics are:

  1. If it is the first 7 days of the month, the previous month’s bill is processed in addition to the current month’s bill.
  2. Each AWS bill is loaded into BigQuery: first into a temporary table, and then copied to a new user-friendly named table.

Running the Code

Here is how you get this code running on AppEngine.

First, create a dataset to hold your tables:

$ bq mk aws_bill_analyzer

A Starter BigQuery Schema File is included in the GitHub link above (and defined in code).

Some notes about the Schema File:

  • Your schema will be different than mine. Adjust the field names to match the first row of your bill (download one and look at it). You may have “BlendedRate” and “BlendedCost” instead of “Rate” and “Cost”, for example. Your user-specified tags will also be different; include each one as “user_<TAG>”, as I did with “user_billing”. A sketch follows this list.
  • You can convert Strings to other data types later. Note the header row and summary row will need to be stripped prior to changing data types otherwise you will get import errors.
  • Doc here on tables in BigQuery.
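As a rough illustration only (your column names and tags will differ, and the file in the repo is the source of truth), the start of a BigQuery schema file for a detailed billing report with resources and tags looks something like:

[
  {"name": "InvoiceID", "type": "STRING"},
  {"name": "PayerAccountId", "type": "STRING"},
  {"name": "LinkedAccountId", "type": "STRING"},
  {"name": "RecordType", "type": "STRING"},
  {"name": "ProductName", "type": "STRING"},
  {"name": "UsageType", "type": "STRING"},
  {"name": "AvailabilityZone", "type": "STRING"},
  {"name": "UsageStartDate", "type": "STRING"},
  {"name": "UsageEndDate", "type": "STRING"},
  {"name": "UsageQuantity", "type": "STRING"},
  {"name": "BlendedRate", "type": "STRING"},
  {"name": "BlendedCost", "type": "STRING"},
  {"name": "ResourceId", "type": "STRING"},
  {"name": "user_billing", "type": "STRING"}
]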

With the additional files from the repository, you should be able to run:

$ pip install -t lib -r requirements.txt
$ gcloud --project aws-bill-analyzer app deploy
$ gcloud --project aws-bill-analyzer app deploy cron.yaml

To test it, visit the URL of your project, adding /run of course. It will return Completed if everything has been set up properly. If it does not, you will need to review the logs on App Engine.

Google BigQuery

We won’t spend much time on this section because it just isn’t as interesting as the next section. BigQuery is quite powerful and has a wonderful Web Console straight from 1998: available at https://bigquery.cloud.google.com/.

If you are up to this step, you should have your dataset and this month’s bill loaded into a table that you can query.

BigQuery Console

If you select the Preview tab you should be able to see the first 50 rows.

Table Preview

You could use this interface to test queries but we are not going to do that. If you are interested in this, I would recommend checking out PowerUpCloud’s GitHub repo here.

Google Data Studio

Well, you made it here. The big finale. The moment you have been waiting for this entire tutorial!

  • Head over to https://datastudio.google.com/.
  • Select the new button (+) in the bottom-right.
  • Select CREATE NEW DATA SOURCE in the bottom-right.
  • Select BigQuery.
  • Choose your Project, the Dataset (aws_bill_analyzer), and this month’s table.
  • Choose CONNECT in the top-right.
  • All of the fields should be detected with the two date fields getting the appropriate data format:

Connection Review

  • Once reviewed, choose ADD TO REPORT in the top-right and again in the prompt.

You have a blank canvas; it is time to paint some cost analytics!

Let’s start by doing three things:

  • Give the report a name: AWS Cost Summary
  • Change the Theme to Simple Dark
  • Create your first element, a Cost by Day Time series

Name Report

Theme Selection

Time Series

Create a box that covers the top of the page by drag/dropping:

Time Series Drawn

Select the box we created, then select Invalid Metric in the right column.

At the bottom, select CREATE NEW METRIC.

In the aggregation column, change Cost to a Sum and in the Type column change it to Number:

Select DONE and select Cost now in the Metric picker.

You may also want to change the Date/Time dimension to UsageEndDate since that is technically more correct.

You should now have a cost by day Time Series for your AWS Bill!

You can now start to add other types of charts and aggregations that you see fit.

For example, you could start to add Bar charts for Cost by Product:

It only takes a couple of clicks to change the Dimension from ProductName to something else such as UsageType or AvailabilityZone:

Adding Area Charts is also pretty useful:

You can start to style the different charts by selecting them and choosing STYLE in the right-menu:

Once you have a first page configured, you can create more in this same report by selecting Add new page in the top-left.

You can give each page a name which is quite useful:

Add a Page

Including / Excluding Values

Let’s say you want to include or exclude some particular type of values. This is easy: just add a Filter.

Add a Filter

Select the graph you want to add a Filter on and click Add a filter on the right-menu.

CREATE A FILTER in the bottom-right.

In this menu you can create boolean expressions to include and/or exclude anything that you can come up with.

For example, let’s say you had a billing tag for each team, and the Phil Collins team uses phil_collins as their billing tag; you would want to include only their costs.

Phil Collin’s Team Filter

This filter is now selected and available for all of your charts!

Filter Picker

If you add this filter to each chart, you can quickly build per-team (or any other dimension) pages and reports.

Phil Collins Page

One more very cool feature that I will show is the Filter Control (not shown but there is a similar Date Picker Control). This allows you to publish an interactive report for users.

Select the Filter Control from the menu bar and add it to the bottom-right of the Executive Summary we created:

Filter Control

Executive Summary w/Filter

Change the Dimension to AvailabilityZone, and in the Style menu change it to Expandable:

Filter Control on AZ

You now have an Availability Zone filter that we can use to control either single charts on the page or all of them. By default, it will control all of them (which is a sensible default). If you would like your filter to only control a single graph, select the filter control and shift/cmd-select each graph, right-click and choose Group. This grouping will marry the filter control and the graph together.

To view this in action, select VIEW in the top-right and once rendered click on your Filter control:

Filter Control

This control is extremely useful for allowing interactive filtering.

You can probably see by this point how extensible and useful Data Studio will be for analyzing your AWS bills (or any BigQuery dataset, honestly) once you figure out what is most important to you.

Pricing

I said in the opening that you cannot do this as cheaply on AWS (and to clarify, I meant with managed services). To go a bit deeper on this, let’s prove it out by reviewing all of the costs.

S3 Cost

The S3 storage cost for the bills you pay either way, so we should not consider it part of the calculation.

The S3 Data Transfer Out will be your one cost for S3. This has two factors to calculate: 1) the size of your bill, and 2) the size of all of your other data transfer. Using the worst-case pricing scenario and a daily bill update from AWS (and thus a daily scheduled job on GCSTS), it would result in the following:

> 1GB and < 10TB / month: $0.09/GB/month

Daily update from AWS at the worst-case price works out to bill size × 30 days × $0.09/GB:

100MB bill: $0.27/m
1GB bill: $2.70/m
10GB bill: $27/m
100GB bill: $270/m (contrived upper-bound)

Google Cloud Transfer Service

The service itself is free but you will pay for storage in GCS.

Assuming the worst-case pricing scenario for six months of retained bills (longer than that is likely not needed, IMO), using $0.026/GB/month for Multi-Regional storage:

100MB bill * 6 months = 600MB ≈ $0.016/m
1GB bill * 6 months = 6GB = $0.156/m
10GB bill * 6 months = 60GB = $1.56/m
100GB bill * 6 months = 600GB = $15.60/m

I am not going to take into account the uncompressed files that we created a Google Cloud Function to process because Storage is usually not a large enough fraction of the cost of this project to matter (it is real dollars but the larger cost will be AppEngine for most users).

I am also not going to take into account the PUT request cost of $0.05/10,000 operations for the same reason.

AppEngine

We will be using the Standard Environment (see post here for more info).

We will assume use of the smallest instance class (configured in job above) along with various egress traffic costs.

B1 Instance: $0.05/hr

Assuming your jobs take somewhere from low minutes to low hours per day, it comes out to roughly $0.05/day * 30 = $1.50/month.

Egress traffic: we are telling BigQuery to load from GCS, so my understanding is that we do not incur a cost here. We cover the BigQuery costs below.

BigQuery

BigQuery Loading data: Free!

BigQuery Storage: $0.02/GB.

100MB - 1GB bill: $0.02/m
10GB bill: $0.20/m
100GB bill: $2.00/m

BigQuery Queries: $5 per TB (with first 1 TB free). If you query your 100MB dataset daily: free, 1GB dataset daily: free, 10GB dataset daily: free, 100GB dataset daily: $10/month. Your dataset won’t be exact in sizing so if you are going to query it many times a day and you have a double-digit GB bill size on AWS I would recommend going much deeper on this (and all of the costing).

Data Studio

Google Data Studio is free!

Summary

Based on the above math:

100MB bill: < $2/month
1GB bill: < $5/m
10GB bill: < $30/m
100GB bill: < $300/m (up at this level, the number is far more uncertain due to obvious scaling issues with this solution)

As the bill size grows into double-digit GBs, I would highly recommend piloting this and monitoring costs since there is far more risk related to cost.

Why didn’t I use S3 Event Notifications + Lambda?

<heavy opinion territory>

There are several answers to this question that I will review.

  • Lambda does not scale well for this purpose. It is a misuse of AWS Lambda (and functions-as-a-service in general), which was designed for lots of short executions; these are long requests with significant I/O. The same argument applies to Google Cloud Functions.
  • There are significant limits in Lambda for this purpose. There is heavy dependence on /tmp in a Lambda function if you were pulling the bill locally and then pushing to Google. The default limit is 512MB and you would need to contact AWS if you wanted to increase it. In addition, the default execution duration per request is 300 seconds. This is significant for most folks. The same should be argued here for Google Cloud Functions.
  • Control. If AWS screws up and writes your bill too many times a day you would cause unnecessary churn, cost, and loads into BigQuery. But mistakes never happen, so don’t worry about this.
  • Configuration. I actually started this tutorial planning to use S3 Event Notifications + Lambda. After writing up how to configure the Role Trust Policy, the CLI commands, and all of the required parameters, it was multiple pages and I think I had re-written the AWS documentation. This seems entirely unnecessary.

As an example, doesn’t this seem unnecessarily complex (I dare you to find this in the AWS docs clearly stated)?:
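As one representative piece, just the trust policy that lets Lambda assume an execution role, plus the call to create that role, looks like this (role and file names are placeholders):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "lambda.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}

aws iam create-role --role-name billing-lambda-role \
    --assume-role-policy-document file://trust_policy.json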

  • Deploy / Test Strategy & Cycle Time. The AWS Lambda packaging, deployment, and testing story is disgusting. As an example, this is the cleanest I could come up with:
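Something along these lines, with placeholder names (and this already leaves out the IAM role setup above):

# Package and create the function (first deploy)
zip -r function.zip index.js node_modules/
aws lambda create-function --function-name unzip-billing-files \
    --runtime nodejs6.10 \
    --role arn:aws:iam::<ACCOUNT_ID>:role/billing-lambda-role \
    --handler index.handler \
    --zip-file fileb://function.zip

# Every subsequent code change
zip -r function.zip index.js node_modules/
aws lambda update-function-code --function-name unzip-billing-files \
    --zip-file fileb://function.zip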

I am sure there are tools somewhere outside of AWS to do this but how do you test this without re-deploying and re-running it? This cycle time is minutes and that is unacceptable.

You then have to create test events and invoke, after that ^^:
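For example (the payload file is a hand-written fake S3 event):

aws lambda invoke --function-name unzip-billing-files \
    --payload file://test_event.json output.txt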

And view the logs:
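Something like:

aws logs filter-log-events --log-group-name /aws/lambda/unzip-billing-files --limit 50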

No, thanks, I am good.

On AppEngine (and Google Cloud Functions) you have a development server and a sane test setup that you can launch in seconds to execute a scheduled job just as if it were long-running. That seems much more reasonable. The credentials are handled with a service account and a syntax that is a bit more sane.

</heavy opinion territory>

Thanks

I hope this was helpful in some way. Feel free to reach out to me on Twitter @mediocrity.

There is some prior art here, but not much: primarily by RK Kuppala here. It was a good walk-through and the GitHub-published queries are fantastic! It doesn’t automate enough of the pieces (it uses CLI tools to publish daily) for my liking and focuses on ReDash vs. Google Data Studio, hence this tutorial. The fewer tools and infrastructure I have to maintain (e.g. ReDash on a separate instance), the better/safer we all are!

Also, a shoutout to the alestic blog, which had a tutorial similar to the AWS one, but whose CLI commands were quite helpful in simplifying some of the commands here.

