Analyze your Amazon Web Services (AWS) Bills w/Google BigQuery & Data Studio

Why go outside when you are stuck inside writing Macros for a 12GB spreadsheet trying to figure out why the EBS volumes in your R&D environment that are not in use are costing you $74.95/month? Before you head down the path of evaluating cost analysis tools (to the tune of a couple percent of your monthly spend) or even crazier try to build it yourself: let’s give Google’s BigQuery and Data Studio a try. AWS By the end of this tutorial you will build an interactive report with Data Studio: Google Data Studio Report To preface, I have already written about analyzing Google Cloud Platform (GCP) bills . This will go deeper, and walk through the setup and configuration necessary to accomplish this multi-cloud task (you can then tell your CTO that you are going multi-cloud too!). here full Let’s get one big question out of the way: The short answer is you (some conditions apply, see cost summary at the bottom) on AWS as you can on GCP. For example, the cheapest Redshift instance is $180/m and you need to go out and buy a separate BI tool (even the simplest, AWS Quicksight is $9-18/user/month + $0.25–0.38/GB/month). This solution will be far cheaper and more extensible for most use cases and we will dive into the details of that statement below. Why are you not doing this w/X on AWS? cannot do it as cheaply Data Flow The data flow will look like this: S3 Billing Bucket: The bucket where your detailed billing reports are written. Transfer Service: Hands-off Storage Transfer service offered by Google to scheduler S3 > GCS bucket replication. Unzip: Google Cloud Function to unzip the replicated zip’d detailed billing reports. Cloud Storage: Google’s S3-equivalent where the detailed billing reports will be copied to by the Transfer Service. AppEngine: Cron entry that will pull from Cloud Storage and load the data into BigQuery daily. BigQuery: The data warehouse for your billing data at Google. Data Studio: The interactive BI tool to analyze the data in BigQuery. Do not worry if you haven’t used these tools before. I promise this will be fairly painless to learn (at least the basics and implementation). initial This is somewhat overdone to be able to purposely use many tools available on GCP. If you were to set this up “for real” you may choose to remove some of the components here. Then again, the value in using services such as the Transfer Service, Cloud Functions, and KMS is that you will likely have use cases for them in the future and becoming familiar with their APIs and capabilities at a smaller scale could be useful. Also, I do not see enough tutorials on using many of these services so this is my attempt to add one more. Assumptions This tutorial makes the following assumptions: You have an AWS account setup. You have Detailed Billing enabled on AWS. You have a Google Cloud account setup. Re: #1, you probably wouldn’t be here if you didn’t have an AWS account setup but nonetheless if you go to and find the button; you can create an account. Once you are spending money, this tutorial could potentially be useful. https://aws.amazon.com/ Create an AWS Account Re: #2, if you do not have Cost and Usage reports enabled (aka. Billing Reports, aka. Detailed Billing), follow the guide from AWS . Make sure under that you are configured as follows (substituting your billing bucket name): here Billing > Preferences > Receive Billing Reports Billing Preference Panel For this tutorial we will be using the, . The other checkboxes are not strictly needed but may be useful to you. Detailed billing report with resources and tags Enabling these checkboxes because new objects will be written to S3 by AWS. How much money depends on how big your bill is which is a function of and most critically, . If you have hundreds, thousands, or millions of tags, it could . Pay attention to how and why you tag things: a topic for another time. costs money service utilization tags potentially be significant Re: #3, if you don’t have a Google Cloud account setup, head over to and find the button; you can sign-in with your Google Mail account which you likely already have. Create a new Project and call it AWS Bill Analyzer (or something similar). https://cloud.google.com/ Try It Free S3 Billing Bucket S3 Billing Bucket Take note of your S3 Billing Bucket name as defined previously. We will need to setup a user and permissions (via IAM) to be utilized by Google’s Cloud Transfer service that will replicate the bucket content to GCS. This is a good time to setup the AWS CLI tools if you have not done this yet. The setup documentation is here: . Once completed, you should be able to list your buckets: https://aws.amazon.com/cli/ And you should also be able to list your files in the bucket: The file with the longest name is the one we will be caring about today. User & Permissions Setup AWS IAM & Google IAM There are two “users” we will need to setup. AWS IAM User to allow pulling of Billing Files on S3 from Google’s Cloud Transfer Service. Google Key Management Service (KMS) Service Account to allow storing/fetching of AWS credentials from #1 for potential use later on. We will use the AWS CLI and Google Cloud CLI to perform both of these tasks. AWS User Setup Let’s create the user, create an access key / secret key for API access, a group for the user (if one doesn’t exist) and add the user to it the group. Finally, we will add a to the in-turn granting the new user the permissions required. policy group We need to create a basic policy that grants and access. I am sure this could be further locked down but this will suffice for now: Get List Save this file locally, to something like and run the following command to create a : billing_ro_access.json Managed Policy Why did we create a policy and not an policy with the put-group-policy command? A managed policy is versioned (and thus roll back capabilities), allows delegation, conforms with AWS standards, are reusable and will be easier to manage in the future. I highly recommend at this point. Read more . managed inline only using managed policies here And lastly, add the policy to the group and list the policies attached to the group to confirm (using the “Arn” from the previous command): To test that everything was done properly you can add the users access key and secret access key as a profile to your local CLI to test that it has the necessary access. You can append the necessary profile and of course remove that from your history: And perform the test: Now, for Google credentials. Google Service Account Setup If you have not installed the Google Cloud SDK, install it from . https://cloud.google.com/sdk/downloads Once completed, run through the following to create a service account, a key file (JSON), a keyring and cryptokey (for KMS), and enable it to encrypt/decrypt credentials. Make sure to change the project, service account, keyring, and cryptokey to your preferred names. User & Permissions Setup Summary We created a user, group, and managed policy on AWS that enables a user with the API keys (that we also created) to get and list our AWS bills on S3. We created a service account and necessary policy on Google KMS to encrypt/decrypt the API keys for our AWS account. In the real world you would likely separate the encrypt and decrypt functionality for separation of privileges but in this case I am one person with one computer to protect. The last step is to do the encryption of our AWS API keys and store the blobs in Google Cloud Storage for Google KMS to be able to decrypt them. Google KMS Google’s Key Management Service allows us to securely store secrets and fetch them as needed. The details of this service are and the cost is $0.06/m for active key versions, and $0.03 per 10,000 operations. This will be a total of $0.09/m: seems like a reasonable cost to store secrets. here I don’t know why but Amazon charges an order of magnitude more at $1.00 for the CMK ( ) and $1.00/m for each yearly rotated CMK which seems exorbitant (+ usage). details To make the encryption easy, clone the Python samples for GCP (which will be useful in other contexts) from . here Hop into the directory and create a file with the following content: kms > api-client doit.sh Before you can run it, you need to set the location of the JSON key file that you generated for the service account that is authorized to encrypt and decrypt key materials: export GOOGLE_APPLICATION_CREDENTIALS= /key.json Filling in the AWS Keys that you saved early and run it now: We will now store the encrypted keys in Google Cloud Storage to make it easy to fetch later from App Engine. Google Cloud Storage Transfer Service Wo, what is this ( )? This service from Google allows us to setup a mirror’ing job for our S3 Billing Bucket (or any S3 Bucket). By setting this up, GCS will have a copy of everything in our S3 Billing Bucket making it very easy to import into BigQuery. docs You could have an intermediary step of copying over to Google Cloud and uploading to BigQuery but not if you want to use a scheduled task (cron) on AppEngine. The AppEngine Standard Environment doesn’t have a local disk so you would have to do something else (maybe streaming or static instances/disks) which starts to create far more infrastructure and general complication than is needed for this tutorial. There are two ways to configure the mirror’ing job: the UI and of course the API. Since I am generally allergic to clicking buttons, I will show how to do this via the API. To use this service, we first need to get Google Service Account used with GCSTS. To do this, copy/paste the Python snippet from or use this slightly modified one that takes the projectId as an argument: here If you run the script with your projectId, it will return the Google Service Account that you need: With this email we will enable permissions in Google IAM to a new bucket that we will create just for the purpose of being a Data Sink. We are creating a separate bucket than is used by our keys (via KMS) and potentially other metadata in the future. Writer Create the bucket and enable the Google Service Account for Storage Transfer to write to it: The last command checks the current ACLs and should list the Google Service Account that you added. To setup the transfer job, copy the code from (Python) and run it as such: here This will create a job that will allow us to test but to make it last forever, you would change the in the job and update the script as follows to take additional parameters: one-time scheduleEndDate Once you have your job submitted, you probably want to check on it. This can be done with the script in the same aforementioned documentation (bottom of the ). transfer_check.py page Give it the “name” from the output of the original request: You will likely get an empty response on the status for several hours as I did on the first job. I am not sure why this is the case but I am guessing that these are relatively low priority non-peak jobs that are scheduled as capacity is available. Then again, I may have misunderstood some part of the API as it is a little confusing regarding start/end dates ( ). Then again, you set this up once and you are done forever so if the API is a little icky, whatever. Google Cloud: you could really improve this API and the docs surrounding it, date specification in US Pacific? Weird cases of execution 24hrs after creation depending on whether you set the date in the Past? What? The second response shown above is after several hours showing that . “done”: true Google Cloud Function Google Cloud Function To be able to load the detailed bills from AWS into BigQuery we will need to unzip them. Technically, you can load gzip-compressed files but the files from AWS are zip format. Since BigQuery will load uncompressed files faster ( ) we will uncompress the files once transferred with a Cloud Function (e.g. AWS Lambda but on Google and more sane). doc This would not be a good approach if your bills are huge for lot’s of reasons but for “reasonably” sized bills this is a perfectly fine and cheap (likely free) approach for most folks. If you don’t have the Beta components installed locally, now would be a good time to make sure you have them and they are up to date since Cloud Functions are in Beta (Google labels many things with “Beta” when they have been out for years). gcloud components update &&gcloud components install beta Following the tutorial steps (which I will also include here), let’s get a basic function setup and then we will update the code to do what we need. here You may need to enable the various APIs (and billing) through the Console if you have not used these before. Once done, you can continue. Create two GCS buckets, one to stage your Function and one to upload a test file: gsutil mb gs://functions-awsbills-staginggsutil mb gs://functions-awsbills-test Setup a new local directory for the project: mkdir awsbills-functioncd !$ And create an that contains the content from their tutorial: index.js /*** Background Cloud Function to be triggered by Cloud Storage.** {object} event The Cloud Functions event.* {function} The callback function.*/exports.helloGCS = function (event, callback) {const file = event.data; @param @param if (file.resourceState === 'not_exists') {console.log(`File ${file.name} deleted.`);} else if (file.metageneration === 1) {// metageneration attribute is updated on metadata changes.// on create value is 1console.log(`File ${file.name} uploaded.`);} else {console.log(`File ${file.name} metadata updated.`);} callback();}; Deploy it: gcloud beta functions deploy helloGCS --stage-bucket functions-awsbills-staging --trigger-bucket functions-awsbills-test Create a local test file and upload it to test the function: touch testfile.txtgsutil cp testfile.txt gs://functions-awsbills-test And check the logs: gcloud beta functions logs read --limit 50 If this is what it looks like for you, excellent! You have a working cloud function. Now, to spruce this up a bit for our use case. You can delete it if you would like: gcloud beta functions delete helloGCS One thing you might notice if you have ever used AWS Lambda is how much less painful this was than working with it. More on this in a later section. Requirements: Use aforementioned bucket, in the case of the tutorial our data sink bucket is: . aws-bill-analyzer-data-sink Decompress the zip file of the specific pattern we care about (no need to unzip any others). Write the decompressed file back to the bucket (we may want to copy it to another bucket later for other reasons). Our code will be pretty simple: Upload the function and test it! gcloud beta functions deploy decompresserator --stage-bucket functions-awsbills-staging --trigger-bucket functions-awsbills-test If a file gets passed to it that we don’t want to process, the logs will show the following: And then you can test with a real file: gsutil cp 3256XXXXXXXX-aws-billing-detailed-line-items-with-resources-and-tags-2017-05.csv.zip gs://functions-awsbills-test And it should run as follows: Fantastic, so we have a working Cloud Function. The last step is to configure it for the bucket we care about. decompresserator Cleanup and re-deploy: gcloud beta functions delete decompresserator gcloud beta functions deploy decompresserator --stage-bucket functions-awsbills-staging --trigger-bucket aws-bill-analyzer-data-sink Lastly, let’s kick-off a one-time Storage Transfer to make sure that all is working as expected (see previous section). Assuming that is completed, check your logs and make sure it ran properly in your real-world scenario. The logs should look something like this: This shows the start of an execution and an exit for each non-matching bill (not the one we care about). For the files that do match the pattern, they are unzipped and placed in the Cloud Storage bucket. As a future enhancement, you could modify the function to run on the bills for the current month and 7 days into the next month (just as we will do with the AppEngine function). I have not included that here because I don’t think this approach will scale well and would like folks to really consider a different approach to this problem instead of copy/pasting it. only on purpose Google AppEngine Google AppEngine We will use AppEngine to launch a scheduled task (cron) daily that will load the GCS data (copied via GCSTS) into BigQuery. Since this isn’t a tutorial on AppEngine (see posts and for more info) I will just jump straight to the code. here here We have a : main.py And a helper file to make it easier to read: For the full, updated code, please see the GitHub repo . here There is a bunch happening here but the basics are: If it is the first 7 days of the month, the previous month’s bill is processed in addition to the current month’s bill. Each AWS bill is loaded into BigQuery: first into a temporary table, and then copied to a new user-friendly named table. Running the Code Here is how you get this code running on AppEngine. First, create a to hold your tables: dataset $ bq mk aws_bill_analyzer A Starter BigQuery Schema File is included in the GitHub link above (and defined in code). Some notes about the Schema File: Your schema be different than mine. Adjust the field names to match the first row of your bill (download one and look at it). You may have “BlendedRate” and “BlendedCost” instead of “Rate” and “Cost” for example. Your user-specified tags will also be different. Include each one with “user_ ” as I did with “user_billing”. will You can convert Strings to other data types later. Note the header row and summary row will need to be stripped prior to changing data types otherwise you will get import errors. Doc on tables in BigQuery. here With the additional files from the repository, you should be able to run: $ pip install -t lib -r requirements.txt$ gcloud --project aws-bill-analyzer app deploy$ gcloud --project aws-bill-analyzer app deploy cron.yaml To test it, visit the URL of your project adding of course. It will return if everything has been setup properly. If so, you will need to review the logs on App Engine. /run Completed Google BigQuery We won’t spend much time on this section because it just isn’t as interesting as the next section. BigQuery is quite powerful and has a wonderful Web Console straight from 1998: available at . https://bigquery.cloud.google.com/ If you are up to this step, you should have your dataset and this month’s bill loaded into a table that you can query. BigQuery Console If you select the Preview tab you should be able to see the first 50 rows. Table Preview You could use this interface to test queries but we are not going to do that. If you are interested in this, I would recommend checking out ’s GitHub repo . PowerUpCloud here Google Data Studio Well, you made it here. The big finally. The moment you have been waiting for this entire tutorial! Head over to . https://datastudio.google.com/ Select the new button (+) in the bottom-right. Select CREATE NEW DATA SOURCE in the bottom-right. Select BigQuery. Choose your Project, the Dataset (aws_bill_analyzer), and this month’s table. Choose CONNECT in the top-right. All of the fields should be detected with the two date fields getting the appropriate data format: Connection Review Once reviewed, choose ADD TO REPORT in the top-right and again in the prompt. You have a blank canvas, it is time to paint some cost analytics! Let’s start by doing three things: Give the report a name: AWS Cost Summary Change the Theme to Simple Dark Create your first element, a Cost by Day Time series Name Report Theme Selection Time Series Create a box that covers the top of the page by drag/dropping: Time Series Drawn Select the box we created, select Invalid Metric on the right-column. At the bottom, select CREATE NEW METRIC. In the aggregation column, change Cost to a Sum and in the Type column change it to Number: Select DONE and select Cost now in the Metric picker. You may also want to change the Date/Time dimension to since that is technically more correct. UsageEndDate You should now have a cost by day Time Series for your AWS Bill! You can now start to add other types of charts and aggregations that you see fit. For example, you could start to add Bar charts for : Cost by Product It only take a couple clicks to change as the to something else such as or : ProductName Dimension UsageType AvailabilityZone Adding Area Charts is also pretty useful: You can start to style the different charts by selecting them and choosing STYLE in the right-menu: Once you have a first page configured, you can create more in this same report by selecting in the top-left. Add new page You can give each page a name which is quite useful: Add a Page Including / Excluding Values Let’s say you want to include or exclude some particular type of values, this is easy, just add a . Filter Add a Filter Select the graph you want to add a Filter on and click on the right-menu. Add a filter in the bottom-right. CREATE A FILTER In this menu you can create boolean expressions to include and/or exclude anything that you can come up with. For example, let’s say you had a billing tag for each team and on the Phil Collin’s Team they use at their billing tag, you would only want to include their costs. phil_collins Phil Collin’s Team Filter This filter is now selected and available for all of your charts! Filter Picker If you add this filter to each chart, you can quickly build per-team (or any other dimension) pages and reports. Phil Collins Page One more very cool feature that I will show is the (not shown but there is a similar Date Picker Control). This allows you to publish an interactive report for users. Filter Control Select the Filter Control from the menu bar and add it to the bottom-right of the Executive Summary we created: Filter Control Executive Summary w/Filter Change the Dimension to , and in the menu change it to : AvailabilityZone Style Expandable Filter Control on AZ You now have an Availability Zone filter that we can use to control either single charts on the page or all of them. By default, it will control all of them (which is a sensible default). If you would like your filter to only control a single graph, select the filter control and shift/cmd-select each graph, right-click and choose Group. This grouping will marry the filter control and the graph together. To view this in action, select VIEW in the top-right and once rendered click on your Filter control: Filter Control This control is extremely useful for allowing interactive filtering. You can probably see by this point how extensible and usable Data Studio will be at analyzing your AWS Bills (or any BigQuery dataset honestly) once you figure out what is most important to you. Pricing I said in the opening that you cannot do this as cheaply on AWS (and to clarify, I meant with services). To go a bit deeper on this, let’s prove it out by reviewing all of the costs. managed S3 Cost The S3 cost for the bills you pay either way, so we should not consider them as part of the calculation. Storage The S3 will be your one cost for S3. This has two factors to calculate: 1) the size of your bill, and 2) the size of all of your other data transfer. Using the pricing scenario and a daily bill update from AWS (and thus a daily scheduled job on GCSTS), it would result in the following: Data Transfer Out worst-case > 1GB and < 10TB / month: $0.09/GB/month Daily update from AWS and the pricing scenario: worst-case 100MB bill: $0.27/m1GB bill: $2.70/m10G bill: $27/m100G bill: $270/m (contrived upper-bound) Google Cloud Transfer Service The service itself is free but you will pay for storage in GCS. Assuming the worst-case pricing scenario for 6mos (longer than this is likely not needed IMO): Using 0.026/GB/m for Multi-Regional. 100MB bill * 6mos = 600MB = $0.026/m1GB bill * 6mos = 6GB = $0.156/m10G bill * 6mos = 60GB = $1.56/m100G bill * 6mos = 600GB = $15.60/m I am not going to take into account the uncompressed files that we created a Google Cloud Function to process because Storage is usually not a large enough fraction of the cost of this project to matter (it is real dollars but the larger cost will be AppEngine for most users). I am also not going to take into account the PUT request cost of $0.05/10,000 operations for the same reason. AppEngine We will be using the Standard Environment (see post for more info). here We will assume use of the smallest instance class (configured in job above) along with various egress traffic costs. B1 Instance: $0.05/hr Assuming your jobs will take on the low-order minutes to low-order hours, it would come out to $0.05/day * 30 = . $1.50 Egress Traffic, we are telling BigQuery to load from GCS so we do not incur a cost here is my understanding. We cover the BigQuery costs below. BigQuery BigQuery : Free! Loading data BigQuery : $0.02/GB. Storage 100MB - 1GB bill: $0.02/m10GB bill: $0.20/m100GB bill: $2.00/m BigQuery : $5 per TB (with first 1 TB free). If you query your 100MB dataset daily: free, 1GB dataset daily: free, 10GB dataset daily: free, 100GB dataset daily: $10/month. Your dataset won’t be exact in sizing so if you are going to query it many times a day and you have a double-digit GB bill size on AWS I would recommend going much deeper on this (and all of the costing). Queries Data Studio The cost of Google Data Studio is free! Summary Based on the above math: 100MB bill: There are several answers to this question that I will review. It is a misuse of AWS Lambda (and Functions) in general which was designed for lots of short executions. These are long requests with significant I/O. The same should be argued here for Google Cloud Functions. Lambda does not scale well for this purpose. There is heavy dependence on in a Lambda function if you were pulling the bill locally and then pushing to Google. The is 512MB and you would need to contact AWS if you wanted to increase it. In addition, the default execution duration per request is 300 seconds. This is significant for most folks. The same should be argued here for Google Cloud Functions. There are significant limits in Lambda for this purpose. /tmp default limit If AWS screws up and writes your bill too many times a day you would cause unnecessary churn, cost, and loads into BigQuery. But mistakes never happen, so don’t worry about this. Control. I actually started this tutorial planning to use S3 Event Notifications + Lambda. After writing up how to configure the Role Trust Policy, CLI Commands, and all of the require parameters it was multiple pages and I think I had re-written the AWS documentation. This seems entirely unnecessary. Configuration. As an example, doesn’t this seem unnecessarily complex (I dare you to find this in the AWS docs clearly stated)?: . The AWS Lambda packaging, deployment, and testing story is . As an example, this is the cleanest I could come up with: Deploy / Test Strategy & Cycle Time disgusting I am sure there are tools somewhere outside of AWS to do this but how do you test this without and it? This cycle time is minutes and that is unacceptable. re-deploying re-running You have to create test events and invoke, after that ^^: then And view the logs: No, thanks, I am good. On AppEngine (and Google Cloud Functions) you have a development server / a sane test setup, that you can launch in seconds to execute a scheduled job just as if it were long-running. That seems much more reasonable and sane. The credentials are done with a service account and syntax that is a bit more sane. Thanks I hope this was helpful in some way. Feel free to reach out to me on Twitter . @mediocrity There is some prior art here but not much. Primarily by . It was a good walk-through and the GitHub-published queries are fantastic! It doesn’t automate enough of the pieces (uses CLI tools to publish daily) to my liking and focuses on ReDash vs. Google Data Studio, hence this tutorial. The less tools and infrastructure that I have to maintain (e.g. ReDash on a separate instance), the better/safer we all are! RK Kuppala here Also a shoutout to the alestic which had a similar to that of AWS but their CLI commands were quite helpful in simplifying some of the commands. blog tutorial