✨ an AWS experiment ✨

Motivation

We started a big project a few months ago: migrating our application to GraphQL.

With 300+ React components and 20 API endpoints with many nested resources, the maintainability and performance of the application were getting worse every month.

Once this project was finished, we wanted to make sure that GraphQL fulfilled its promises, so we decided to track the “perceived performance” of the migrated components. We wanted to watch the performance of our GraphQL Ruby API over time and avoid UX/performance regressions.

Considering that we already use a lot of external SaaS (NewRelic, Cloudinary, AWS, Segment, Auth0, …), instead of building a full ELK data stack on AWS, we decided to build a simple and low-cost performance “dashboard”.

Since our Segment account saves all raw data in an S3 bucket, we decided to use this data by sending a custom Performance.QueryLoadTime event.

Note: Segment is a SaaS analytics API that provides integrations with 200+ services like Hubspot, custom webhooks or AWS.

The Architecture

Of course, Segment can easily be replaced by AWS Kinesis Firehose and AWS API Gateway.

The front React application sends a custom Performance.QueryLoadTime event to Segment each time a specific GraphQL request ends. The event is then stored in S3 as raw JSON data files.

Then, every week, the data is fetched from S3 and consolidated, ready to be used in Google Sheets.

Let’s see how to make it work!

Send performance data: set up Apollo with Segment

Apollo Data is a GraphQL client compatible with React.

For this, we need to write a custom link that intercepts some filtered queries and sends performance data to Segment.

The link relies on a JasonBourne service. Don’t worry, this is our Segment client wrapper.

That’s all. All you have to do is add this link to your ApolloClient instance.

    export const client = new ApolloClient({
      link: ApolloLink.from([perfMonitorLink, networkLink]),
      cache: apolloCache
    });

You’re all set for the front part.
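For reference, here is a minimal sketch of what such a monitoring link could look like. It is not the original code: the `track(eventName, properties)` function stands in for the Segment wrapper (whose API is not shown here), the operation names are made-up examples, and `makePerfMonitorHandler` is a hypothetical helper name. The handler follows the `(operation, forward)` request-handler shape that `apollo-link` accepts, and times the request by mapping over the observable returned by `forward`.

```javascript
// Hypothetical list of GraphQL operations to monitor (example names only).
const MONITORED_OPERATIONS = ["getChatList", "getUserProfile"];

// Builds an apollo-link style request handler.
// - track(eventName, properties): assumed signature of the Segment wrapper
// - monitored: operation names to measure
// - now: clock function, injectable for testing (Date.now in production)
function makePerfMonitorHandler(track, monitored, now) {
  return function perfMonitor(operation, forward) {
    // Let every non-monitored operation pass through untouched.
    if (monitored.indexOf(operation.operationName) === -1) {
      return forward(operation);
    }
    const startedAt = now();
    // forward() returns an observable; map() runs once the result arrives.
    return forward(operation).map((result) => {
      track("Performance.QueryLoadTime", {
        operationName: operation.operationName,
        ellapsedMs: now() - startedAt,
      });
      return result;
    });
  };
}

// With apollo-link installed, the handler would be wrapped like this:
//   const perfMonitorLink = new ApolloLink(
//     makePerfMonitorHandler(yourTrackFunction, MONITORED_OPERATIONS, Date.now)
//   );
```

Passing the clock and the tracking function as parameters keeps the handler easy to exercise without a network or a Segment account.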
Build consolidated data: set up AWS Athena

Athena is an Amazon service that lets you run SQL queries against S3 files.

Configuring Athena is quite easy, you need to:

- create a database
- create a table by specifying a source and the data structure
- create a named query and save it

The data

The raw data that Segment stores in S3 for the Performance.QueryLoadTime event contains many fields that are not relevant for data analysis, so we’re only going to keep a subset.

Create a database and a table

The “Create table” UI available on AWS Athena does not allow creating a table with a complex data structure (nested fields, union type fields, etc.), so we need to write a CREATE TABLE query by hand ✍️

Note: Athena uses the Apache Hive language to describe data. The CREATE TABLE query we want exposes the data format (JSON), the data structure, the source and the destination.

To do so, we can use a wonderful tool called [hive-json-schema](https://github.com/quux00/hive-json-schema). This is an open source project, recommended by the official AWS documentation, that generates Hive queries from raw JSON data.

In short, given a JSON example file, it generates the corresponding CREATE TABLE query. Read the doc, it’s really simple (5 minutes maximum).

Write and save a SELECT query

The table is now ready, we can write a SELECT query. Here’s ours:

    SELECT properties.ellapsedMs, properties.operationName, sentAt
    FROM "segment"."perf"
    WHERE event = 'Performance.QueryLoadTime'
    ORDER BY sentAt DESC;

Since it’s plain SQL, you can select any of the fields available on this table.

Run the query and save it.

NB: each time a query is executed, Athena stores the CSV result in S3. To know where (which bucket), go to Settings.

Problem

Athena does not offer a way to run a saved query periodically. However, we can use the AWS Athena API to run a query remotely. Next step, the lambda.

Keep data fresh: set up a periodic AWS Lambda

AWS Lambda is a service that lets you create serverless functions.

“Serverless architectures refer to applications that significantly depend […] on custom code that’s run in ephemeral containers (Function as a Service or “FaaS”) […]. By using these ideas, […], such architectures remove the need for the traditional ‘always on’ server system sitting behind an application. Depending on the circumstances, such systems can significantly reduce operational cost and complexity at a cost of vendor dependencies and (at the moment) immaturity of supporting services.”
Martin Fowler, Serverless Architectures

We don’t want to go to the AWS Athena UI every week to run the saved query manually, so we need a Lambda that runs our query every week.

Here’s how to do so:

- Create a periodic lambda
- Ensure the lambda has sufficient rights to call Athena and store results in S3

Create a “periodic lambda”

Go to “Create Function” and select “Node 6.10” and “Create a custom role”. You’ll be redirected to AWS IAM, click on “Allow”.

Please write down the name given to the IAM role, it will be used later.

Then click on “Create Function”.

Select a trigger

You’ll then land on the configuration page of the function.

In order to run, a lambda needs to be invoked. We want to invoke our lambda on a weekly basis, like a CRON task. AWS CloudWatch offers this feature.

On the left, select “CloudWatch Events”, then configure it so that the lambda is invoked by a CloudWatch event every 7 days.

The lambda IAM role

This is the nifty part, you’ll need to go to IAM and find the role you just created.

Then ensure that the role has the following permissions:

- Athena: GetNamedQuery, StartQueryExecution
- S3: ListBucket, CreateBucket, PutObject, ListAllMyBuckets

This will allow the lambda to find and run a query and store its results in S3.

NB: I highly encourage you to dig into the AWS IAM documentation in order to understand all the implications.
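With the trigger and the role in place, the function body can stay small. Below is a minimal sketch of such a handler, not the original code. It assumes the callback-style `AWS.Athena` client from the `aws-sdk` (v2) package, whose `getNamedQuery` and `startQueryExecution` calls match the two Athena permissions above. `NAMED_QUERY_ID`, `OUTPUT_LOCATION` and the `makeHandler` helper are placeholders introduced for illustration.

```javascript
// Placeholders: replace with your own saved query id and results bucket.
const NAMED_QUERY_ID = "your-athena-named-query-id";
const OUTPUT_LOCATION = "s3://your-results-bucket/perf/";

// Builds the Lambda handler. `athena` is expected to behave like an
// AWS.Athena client from aws-sdk v2 (callback style); injecting it keeps
// the handler testable without AWS credentials.
function makeHandler(athena, namedQueryId, outputLocation) {
  return function handler(event, context, callback) {
    // 1. Get the named (saved) query by id.
    athena.getNamedQuery({ NamedQueryId: namedQueryId }, (err, data) => {
      if (err) return callback(err);
      const namedQuery = data.NamedQuery;
      // 2. Start the query, writing the CSV result to a custom S3 location.
      athena.startQueryExecution(
        {
          QueryString: namedQuery.QueryString,
          QueryExecutionContext: { Database: namedQuery.Database },
          ResultConfiguration: { OutputLocation: outputLocation },
        },
        (startErr, result) => {
          if (startErr) return callback(startErr);
          // 3. Terminate the lambda execution; Athena keeps running the query.
          callback(null, result.QueryExecutionId);
        }
      );
    });
  };
}

// In the Lambda itself, the wiring would look like:
//   const AWS = require("aws-sdk");
//   exports.handler = makeHandler(new AWS.Athena(), NAMED_QUERY_ID, OUTPUT_LOCATION);
```

The handler does not wait for the query to finish: starting it is enough, since Athena writes the CSV result to the output location on its own.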
The lambda code

The lambda must:

- Get the named query by id
- Start the query
- Terminate the lambda execution

NB: To get your Athena Query ID, open the query from Athena “Saved Queries” and copy the id from the URL.

Displaying data: configure Google Spreadsheet

Now that the consolidated data is fresh and available, we need to:

- get the consolidated data
- format date fields, if any
- build a pivot table
- build a chart

Copy the content of the last CSV file created by AWS Athena

You’ll find the file in the bucket specified in the Lambda source code. Remember, we specified a custom output location in the AWS Athena start query call.

Copy the content of the file, and paste it in a new sheet.

Format data

In case you’re dealing with date fields, here’s a trick. Google Spreadsheet does not understand the ISO date format, so we need to “format” it. Here is the formula, apply it to a whole new column for each needed field.

    = DATEVALUE(MID(C2,1,10)) + TIMEVALUE(MID(C2,12,8))

⤴️ Here you’ll get a “MM/DD/YYYY HH:MM:SS” format

Remember to change the column format to “Date time”.

Build the “pivot table”

The consolidated CSV data is not yet usable. Since the data uses many dimensions (sentAt, operationName) for a single value (ellapsedMs), we need to build a “matrix/pivot table”.

For this, create a new sheet and go to “Data>Pivot table …”

The pivot table configuration is very “data-specific”.

Configure the chart

The configuration of the chart is very personal and depends on what type of data you have. Here’s an example with time-based performance data.
My favorite chart type for time-based performance data is the scatter chart, or the classic smooth line chart with average values instead of detailed values.

This helps us see the trend and maximum values at a glance ✨

Conclusion

Pros

- only pay for Lambda execution and Athena query once a week
- very flexible configuration
- geeky 🤓

Cons

- Solutions already exist: ELK, AWS Quicksight, Tableau
- Copy-pasting data every week
- not configurable by a non-technical person

Future and improvements 📈

CSV file data pasting step

This step is hard to automate because AWS Athena creates a unique .csv file in S3 each time a query ends. A workaround could be to do 2 things:

- update the lambda to save the result of the query in a file named latest.csv and save it publicly on S3
- then use the awesome IMPORTDATA() function in Google Spreadsheet

This way, the data will always be the freshest!

Support for many queries

We may want to create many Athena queries (examples: BI or advanced cross analytics-tools reporting). For this, we can update the Lambda function to run all or many queries.

What about Tableau, AWS Quicksight or Datadog?

Of course, there are a lot of battle-tested and professional solutions. This blog post presents a solution for a specific context with particular financial and technological constraints. We could totally configure Datadog or pay for a Tableau licence, but it would not be that fun!

This is of course an experiment and a temporary solution.

Thanks for reading! 🌞

I hope you learnt some things about AWS or GraphQL. Please feel free to drop a comment if I missed anything!