How to Master Real-Time Analytics With AWS: Timestream and Beyond

by Ravi Laudya, December 4th, 2024

Too Long; Didn't Read

This article explores how to build a scalable and reliable real-time analytics platform. It covers data ingestion, processing, visualization, and querying to transform event data into actionable insights using AWS components such as SQS, Lambda, and Timestream.

In my last article, we discussed the exponential growth of events in today’s data-driven world. With so many apps, smart devices, and machines all around us, the amount of data created is massive. We also explored how an orchestration platform can help deliver these events to the right applications. However, delivering events may not be enough for businesses to make an impact.


By analyzing these events to understand user behavior, businesses can make smarter decisions and serve their customers better. A real-time analytics platform helps convert event data into meaningful intelligence.


This article explores how to build a real-time analytics platform using AWS, evaluates possible solutions, and provides a step-by-step guide to implementing a scalable and reliable platform. Building this platform involves three steps: ingesting, processing, and querying data. Real-time analytics often focuses on trends and patterns over time, whether in user behavior or system performance.


Time-series data naturally organizes events in sequence, making it easy to analyze them from moment to moment. Time-series storage aligns perfectly with this need, allowing applications to compute metrics efficiently. AWS offers tools like SQS, Lambda, Timestream, and QuickSight that work seamlessly together to build this platform.
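As a toy illustration of why time-ordered data makes metric computation easy (the timestamps and event names below are made up for the example), a single pass over sequenced events yields per-minute rollups:

```python
from collections import Counter
from datetime import datetime

# Events already ordered by time, as a time-series store would keep them.
events = [
    ("2024-12-04T10:00:12", "order_success"),
    ("2024-12-04T10:00:45", "order_success"),
    ("2024-12-04T10:01:03", "order_failed"),
]

def per_minute_counts(events):
    """Roll events up into (minute, event_type) -> count metrics."""
    counts = Counter()
    for ts, event_type in events:
        minute = datetime.fromisoformat(ts).strftime("%Y-%m-%dT%H:%M")
        counts[(minute, event_type)] += 1
    return counts

# per_minute_counts(events)[("2024-12-04T10:00", "order_success")] == 2
```

This is essentially the kind of aggregation Timestream performs at scale with SQL functions like BIN and ago.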


There are three major parts involved in building a real-time analytics platform:

  • Data Ingestion: The events published from the applications can flow into AWS via Amazon Kinesis or directly to SQS.


  • Data Processing: AWS Lambda can process the data and write it to Timestream.


  • Visualization & Querying: Custom dashboards or QuickSight can integrate with Timestream for visualization and insights.


    The diagram below shows the architecture of the analytics platform:


Analytics with AWS Timestream


Timestream

Amazon Timestream, AWS’s time-series database, is designed to meet the challenges of processing and analyzing vast amounts of data efficiently. Timestream is serverless, scalable, and ideal for applications requiring real-time data analytics. Its key features include:


  • Automatic scaling: It can process trillions of events per day, scaling automatically to meet demand.
  • Performance: It offers up to 1000 times faster query performance compared to relational databases.
  • Serverless: It is completely managed by AWS, reducing overhead.
  • Smart Storage Tiers: Optimized storage tiers for recent (in-memory) and historical (magnetic storage) data.
  • SQL query support: It supports SQL for performing complex queries, aggregations, and time-series analytics.
  • Integrations: It supports seamless integration with other AWS services.

Implementation

The CloudFormation (CFN) template for all required resources can be found in the GitHub repo.


  • Data Ingestion

    Services can publish relevant events to the SQS queue, which serves as the message queue for the system. To enhance reliability, a dead-letter queue (DLQ) is configured alongside the primary SQS queue. Events that fail processing are moved to the DLQ after the retry threshold is reached. These events can then be used for debugging, failure handling, and investigations.


    Below is a snippet of the CFN template to create the SQS queue and its associated DLQ:


      EventQueueDLQ:
        Description: 'A dead-letter queue that receives failed events'
        Type: AWS::SQS::Queue
        Properties:
          FifoQueue: true
          ContentBasedDeduplication: true
          QueueName: !Ref DLQQueueName
          SqsManagedSseEnabled: true
          VisibilityTimeout: 240
    
      EventQueue:
        Description: 'An SQS queue that receives events'
        Type: 'AWS::SQS::Queue'
        Properties:
          QueueName: !Ref SQSQueueName
          FifoQueue: true
          ContentBasedDeduplication: true
          KmsMasterKeyId: alias/aws/sqs
          VisibilityTimeout: 240
          RedrivePolicy:
            deadLetterTargetArn: !GetAtt EventQueueDLQ.Arn
            maxReceiveCount: 5
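For illustration, a producer could publish to this queue as sketched below. The queue URL is a hypothetical placeholder; since the queue is FIFO, a MessageGroupId is required, and the deduplication ID is derived from the message body because ContentBasedDeduplication is enabled in the template:

```python
import json

def build_sqs_message(event: dict, queue_url: str) -> dict:
    """Build send_message kwargs for the FIFO events queue."""
    return {
        "QueueUrl": queue_url,
        "MessageBody": json.dumps(event),
        # FIFO queues require a group ID; grouping by customer keeps each
        # customer's events in order relative to one another.
        "MessageGroupId": event["customer_id"],
    }

params = build_sqs_message(
    {
        "order_id": "test-order-1",
        "customer_id": "test-customer-1",
        "event_type": "order_success",
        "metric_value": 1,
    },
    # Hypothetical queue URL for the example:
    queue_url="https://sqs.us-east-1.amazonaws.com/123456789012/events-queue.fifo",
)
# With AWS credentials configured, the actual send would be:
# boto3.client("sqs").send_message(**params)
```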
    


  • Data processing

    The AWS Lambda function is configured as the data processor, responsible for handling events published to the SQS queue. After processing, it writes the events as metrics to the Timestream database.


    Below is a snippet of the CFN template for Lambda and its mapping to SQS:


      EventProcessorLambda:
        Type: 'AWS::Lambda::Function'
        Description: 'Lambda function that processes events from the SQS queue and writes to Timestream.'
        Properties:
          FunctionName: !Ref LambdaFunctionName
          Handler: 'index.lambda_handler'
          Role: !GetAtt LambdaExecutionRole.Arn
          Runtime: 'python3.12'
          MemorySize: 1024
          Timeout: 120
          Environment:
            Variables:
              TIMESTREAM_DATABASE_NAME: !Ref EventsDatabaseName
              TIMESTREAM_TABLE_NAME: !Ref EventsTableName
          Code:
            ZipFile: |
              # Lambda function code goes here
    
    
      SQSToLambdaEventSourceMapping:
        Type: 'AWS::Lambda::EventSourceMapping'
        Description: 'Maps the SQS queue as the event source for the Lambda function.'
        Properties:
          BatchSize: 10
          EventSourceArn: !GetAtt EventQueue.Arn
          FunctionName: !GetAtt EventProcessorLambda.Arn
          Enabled: 'True'
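The template leaves the function body as a placeholder. Below is a minimal sketch of what such a handler could look like; the record layout (dimensions, measure name and value) is an assumption based on the sample event in the Testing section, not the author's actual code:

```python
import json
import os
import time

def to_timestream_record(event: dict) -> dict:
    """Map one order event to a Timestream record (assumed layout)."""
    return {
        "Dimensions": [
            {"Name": "order_id", "Value": event["order_id"]},
            {"Name": "customer_id", "Value": event["customer_id"]},
        ],
        "MeasureName": event["event_type"],
        "MeasureValue": str(event["metric_value"]),
        "MeasureValueType": "BIGINT",
        "Time": str(int(time.time() * 1000)),  # milliseconds since epoch
    }

def lambda_handler(event, context):
    # Each SQS record body is one JSON event; up to 10 arrive per batch
    # (BatchSize in the event source mapping above).
    records = [to_timestream_record(json.loads(r["body"])) for r in event["Records"]]
    import boto3  # imported lazily so to_timestream_record is testable offline
    boto3.client("timestream-write").write_records(
        DatabaseName=os.environ["TIMESTREAM_DATABASE_NAME"],
        TableName=os.environ["TIMESTREAM_TABLE_NAME"],
        Records=records,
    )
```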
    
    


  • Data Store

    Amazon Timestream serves as the primary data store for all events generated across various services. The setup includes:

    • Database: Acts as a logical container for one or more tables.
    • Table: Within the database, tables store the actual metrics data.


Below is a snippet of the CFN template to create the Timestream database and table:

  EventsDatabase:
    Description: 'Timestream database to store event data'
    Type: 'AWS::Timestream::Database'
    Properties:
      DatabaseName: !Ref EventsDatabaseName
      KmsKeyId: alias/aws/timestream

  EventsTable:
    Description: 'Timestream table that stores event metrics'
    Type: 'AWS::Timestream::Table'
    DependsOn: EventsDatabase
    Properties:
      DatabaseName: !Ref EventsDatabase
      TableName: !Ref EventsTableName
      RetentionProperties:
        MemoryStoreRetentionPeriodInHours: 72
        MagneticStoreRetentionPeriodInDays: 365


  • Visualization & Querying

    • Query

      Timestream offers a query console that allows users to run queries against the table. For example:

      -- Get the 10 most recent metrics from the past 15 minutes.
      SELECT *
      FROM "events-db"."events-metrics"
      WHERE time BETWEEN ago(15m) AND now()
      ORDER BY time DESC
      LIMIT 10
      


    • Visualization

      AWS provides many out-of-the-box integrations (e.g. QuickSight, Grafana) with Timestream, making it easy to analyze, visualize, and derive insights.
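The same kind of query can also run programmatically through the timestream-query API. The aggregation below (per-minute event counts) is an illustrative variant of the console query above; executing it requires AWS credentials, while the query string itself reuses the database and table names from earlier snippets:

```python
# Per-minute event counts over the last 15 minutes.
QUERY = """
SELECT BIN(time, 1m) AS minute, COUNT(*) AS events
FROM "events-db"."events-metrics"
WHERE time BETWEEN ago(15m) AND now()
GROUP BY BIN(time, 1m)
ORDER BY minute DESC
"""

def run_query(query: str) -> list:
    import boto3  # requires AWS credentials; imported lazily
    client = boto3.client("timestream-query")
    rows = []
    # The Query API is paginated; boto3 exposes a paginator for it.
    for page in client.get_paginator("query").paginate(QueryString=query):
        rows.extend(page["Rows"])
    return rows
```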

Testing

Services can publish an event in the following format to SQS, which triggers the whole processing flow:

{
  "order_id": "test-order-1",
  "customer_id": "test-customer-1",
  "event_type": "order_success",
  "metric_value": 1
}


Conclusion

This architecture offers a simple and efficient way to build a scalable and reliable analytics platform. Depending on specific needs, there are alternatives: Amazon Kinesis Data Streams for event processing, Prometheus as a data store, or S3 with Athena for batch processing and analytics.