In my last article, we discussed the exponential growth of events in today’s data-driven world. With so many apps, smart devices, and machines all around us, the amount of data created is massive. We also explored how an orchestration platform can help deliver these events to the right applications. However, delivering events may not be enough for businesses to make an impact.
By analyzing these events to understand user behavior, businesses can make smarter decisions and serve their customers better. A real-time analytics platform helps convert raw event data into meaningful intelligence.
This article explores how to build a real-time analytics platform on AWS: it evaluates possible solutions and provides a step-by-step guide to implementing a scalable and reliable platform. Building the platform involves three steps: ingesting, processing, and querying data. Real-time analytics often focuses on trends and patterns over time, whether in user behavior or system performance.
Time-series data naturally organizes events in sequence, making it easy to analyze them from moment to moment. Time-series storage aligns perfectly with this need, allowing applications to compute metrics over time. AWS offers tools like SQS, Lambda, Timestream, and QuickSight that work seamlessly together to build this platform.
There are three major parts involved in building a real-time analytics platform:
Data Ingestion: Services publish events to an SQS queue, which serves as the entry point into the system
Data Processing & Storage: A Lambda function processes the queued events and writes them as metrics to Timestream
Visualization & Querying: Custom dashboards or QuickSight can integrate with Timestream for visualization and insights
The diagram below shows the architecture of the analytics platform.
Amazon Timestream, AWS's time-series database, is designed to meet the challenges of processing and analyzing vast amounts of data efficiently. Timestream is serverless, scalable, and ideal for applications requiring real-time data analytics. Its key features include automatic scaling with no servers to manage, tiered storage (a memory store for fast queries on recent data and a magnetic store for cost-effective historical data), SQL support with built-in time-series functions, and encryption at rest through AWS KMS.
The CloudFormation (CFN) template for all the resources needed can be found in the GitHub repo.
Data Ingestion
Services can publish relevant events to the SQS queue, which serves as the message queue for the system. To enhance reliability, a dead-letter queue (DLQ) is configured alongside the primary SQS queue. Events that fail processing are moved to the DLQ once the retry threshold is exceeded, where they can be used for debugging, failure handling, and investigations.
Below is a snippet of the CFN template that creates the SQS queue and its associated DLQ:

EventQueueDLQ:
  # Dead-letter queue that receives events that fail processing
  Type: AWS::SQS::Queue
  Properties:
    QueueName: !Ref DLQQueueName
    FifoQueue: true
    ContentBasedDeduplication: true
    SqsManagedSseEnabled: true
    VisibilityTimeout: 240

EventQueue:
  # Primary SQS queue that receives events
  Type: AWS::SQS::Queue
  Properties:
    QueueName: !Ref SQSQueueName
    FifoQueue: true
    ContentBasedDeduplication: true
    KmsMasterKeyId: alias/aws/sqs
    VisibilityTimeout: 240
    RedrivePolicy:
      deadLetterTargetArn: !GetAtt EventQueueDLQ.Arn
      maxReceiveCount: 5
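For the debugging and investigations mentioned above, failed events can be read back from the DLQ. Here is a minimal sketch, assuming a hypothetical DLQ URL:

import boto3

sqs = boto3.client("sqs")

# Hypothetical DLQ URL; in practice, resolve it from configuration or get_queue_url().
DLQ_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/events-dlq.fifo"

# Peek at events that exhausted their retries, for debugging and investigation.
response = sqs.receive_message(
    QueueUrl=DLQ_URL,
    MaxNumberOfMessages=10,
    WaitTimeSeconds=5,  # long polling to avoid empty responses
)
for message in response.get("Messages", []):
    print(message["MessageId"], message["Body"])
    # After investigating, delete (or redrive) the message so it is not reprocessed.
    sqs.delete_message(QueueUrl=DLQ_URL, ReceiptHandle=message["ReceiptHandle"])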
Data Processing
An AWS Lambda function is configured as the data processor, responsible for handling events published to the SQS queue. After processing, it writes the events as metrics to the Timestream database.
Below is a snippet of the CFN template for the Lambda function and its mapping to SQS:

EventProcessorLambda:
  # Lambda function that processes events from the SQS queue and writes to Timestream
  Type: AWS::Lambda::Function
  Properties:
    FunctionName: !Ref LambdaFunctionName
    Handler: index.lambda_handler
    Role: !GetAtt LambdaExecutionRole.Arn
    Runtime: python3.12
    MemorySize: 1024
    Timeout: 120
    Environment:
      Variables:
        TIMESTREAM_DATABASE_NAME: !Ref EventsDatabaseName
        TIMESTREAM_TABLE_NAME: !Ref EventsTableName
    Code:
      ZipFile: |
        # Lambda function code goes here

SQSToLambdaEventSourceMapping:
  # Maps the SQS queue as the event source for the Lambda function
  Type: AWS::Lambda::EventSourceMapping
  Properties:
    BatchSize: 10
    EventSourceArn: !GetAtt EventQueue.Arn
    FunctionName: !GetAtt EventProcessorLambda.Arn
    Enabled: true
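The ZipFile body above is left as a placeholder. As a rough illustration of what the handler could look like (a minimal sketch, not the exact implementation), the following assumes the JSON event format shown at the end of this article and writes each SQS message as one Timestream record:

import json
import os
import time

import boto3

# Sketch of the handler referenced by 'index.lambda_handler' above.
timestream = boto3.client("timestream-write")

DATABASE = os.environ["TIMESTREAM_DATABASE_NAME"]
TABLE = os.environ["TIMESTREAM_TABLE_NAME"]

def lambda_handler(event, context):
    records = []
    for message in event["Records"]:  # SQS delivers up to BatchSize messages at once
        body = json.loads(message["body"])
        records.append({
            "Dimensions": [
                {"Name": "order_id", "Value": body["order_id"]},
                {"Name": "customer_id", "Value": body["customer_id"]},
            ],
            "MeasureName": body["event_type"],
            "MeasureValue": str(body["metric_value"]),
            "MeasureValueType": "BIGINT",
            "Time": str(int(time.time() * 1000)),  # defaults to milliseconds
        })
    # An exception here returns the batch to the queue; after maxReceiveCount
    # failed attempts, SQS moves the messages to the DLQ.
    timestream.write_records(DatabaseName=DATABASE, TableName=TABLE, Records=records)

Letting a failed write raise is deliberate: it is what ultimately routes poison messages to the DLQ configured earlier.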
Data Store
Amazon Timestream serves as the primary data store for all events generated across the various services. The setup includes a Timestream database and a table whose retention properties keep recent data in the fast memory store (72 hours) and move older data to the cost-effective magnetic store (365 days).
Below is a snippet of the CFN template that creates the Timestream database and table:

EventsDatabase:
  # Timestream database that stores event data
  Type: AWS::Timestream::Database
  Properties:
    DatabaseName: !Ref EventsDatabaseName
    KmsKeyId: alias/aws/timestream

EventsTable:
  # Timestream table that stores event metrics
  Type: AWS::Timestream::Table
  DependsOn: EventsDatabase
  Properties:
    DatabaseName: !Ref EventsDatabase
    TableName: !Ref EventsTableName
    RetentionProperties:
      MemoryStoreRetentionPeriodInHours: 72
      MagneticStoreRetentionPeriodInDays: 365
Visualization & Querying
Query
Timestream offers a query console that lets users run SQL queries against the table. For example:
-- Get the 10 most recent metrics from the past 15 minutes.
SELECT * FROM "events-db"."events-metrics"
WHERE time BETWEEN ago(15m) AND now()
ORDER BY time DESC
LIMIT 10
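Beyond the console, custom dashboards can run the same query programmatically. Below is a minimal sketch using the boto3 timestream-query client; the database and table names carry over from the template, and the surrounding plumbing is assumed:

import boto3

query_client = boto3.client("timestream-query")

QUERY = """
SELECT * FROM "events-db"."events-metrics"
WHERE time BETWEEN ago(15m) AND now()
ORDER BY time DESC
LIMIT 10
"""

result = query_client.query(QueryString=QUERY)
columns = [column["Name"] for column in result["ColumnInfo"]]
for row in result["Rows"]:
    # Scalar values come back as strings; NULLs set a NullValue flag instead.
    values = [datum.get("ScalarValue") for datum in row["Data"]]
    print(dict(zip(columns, values)))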
Visualization
AWS provides many out-of-the-box integrations (e.g. QuickSight, Grafana) with Timestream, making it easy to analyze, visualize, and derive insights.
Services can publish events in the following format to SQS to trigger the whole processing flow:
{
  "order_id": "test-order-1",
  "customer_id": "test-customer-1",
  "event_type": "order_success",
  "metric_value": 1
}
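As an illustration, a service might publish this event with boto3 as sketched below; the queue URL is a hypothetical placeholder. Because the queue is FIFO, a MessageGroupId is required, and ContentBasedDeduplication removes the need for an explicit deduplication ID:

import json

import boto3

sqs = boto3.client("sqs")

# Hypothetical queue URL; in practice, resolve it from configuration or get_queue_url().
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/events-queue.fifo"

event = {
    "order_id": "test-order-1",
    "customer_id": "test-customer-1",
    "event_type": "order_success",
    "metric_value": 1,
}

sqs.send_message(
    QueueUrl=QUEUE_URL,
    MessageBody=json.dumps(event),
    # Required for FIFO queues; groups related events to preserve their order.
    MessageGroupId=event["customer_id"],
)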
This architecture offers a simple and efficient way to build a scalable and reliable analytics platform. Depending on specific needs, there are alternatives, including Amazon Kinesis Data Streams for event ingestion, Prometheus as a data store, and S3 + Athena for batch processing and analytics.