Building A Real Time Event-Driven Access Log System Using Docker, Python, Amazon SNS & SQS

Written by eon01 | Published 2017/01/25
Tech Story Tags: docker | devops | aws | microservices | cloud-computing


This post is part II of a series of practical posts that I am writing to help developers and architects understand and build service-oriented architectures and microservices.

I have written other stories in the same context.

This article is also part of my book that I am “lean-publishing” called Painless Docker: Unlock The Power Of Docker & Its Ecosystem. Painless Docker is a practical guide to master Docker and its ecosystem based on real world examples.

http://painlessdocker.com

In my last post (Benchmarking Amazon SNS & SQS For Inter-Process Communication In A Microservices Architecture), I tested the messaging mechanism using SNS/SQS, and even though the benchmarks were run from my laptop (and not from an EC2 instance), the results were good.

The last article was featured in many newsletters, so I decided to continue my tests and publish this post.

Event-driven architecture (EDA), or message-driven architecture, is a software architecture pattern that promotes the production and consumption of messages, with each consumed message triggering a specific event or reaction.

A classic system architecture promotes reading and reacting to data after saving it to a data store (MySQL, PostgreSQL, MongoDB, etc.), but this is not really the best approach, especially for real-time or near-real-time processing. Unless you want to spend time and money building an instantaneous reactive system on top of a database, please don't use databases: STREAM DATA INSTEAD.
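To make the "react to messages instead of polling a data store" idea concrete, here is a minimal in-process sketch of the publish/subscribe pattern. The `EventBus` class and the `access-log` topic name are hypothetical names chosen for illustration; this is not how SNS/SQS are implemented, only the shape of the pattern.

```python
from collections import defaultdict


class EventBus:
    """Tiny in-process publish/subscribe bus (illustrative only)."""

    def __init__(self):
        # Maps a topic name to the list of handlers subscribed to it.
        self._handlers = defaultdict(list)

    def subscribe(self, topic, handler):
        # Register a callable that will be invoked for every message on the topic.
        self._handlers[topic].append(handler)

    def publish(self, topic, message):
        # Each subscriber reacts as soon as the message is published;
        # nothing is written to a data store first.
        for handler in self._handlers[topic]:
            handler(message)


received = []
bus = EventBus()
bus.subscribe("access-log", received.append)
bus.publish("access-log", "GET / 200")
```

In the setup described in this post, SNS plays the role of `publish` and the SQS queue plays the role of the subscribed handler, with the network in between.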

I created two machines (you can use a single one for both publisher and subscriber, since it changes nothing in the networking).

Used EC2 Machines

This is the simplified architecture, in which I played the role of the ab load tester. Both machines and services are hosted in the eu-west-1 region.

In order to minimize the transfer time, it is recommended to run the publisher and the consumer machines in the same region.

Load Testing ?

Let’s consider the example of a web server writing access logs to an EC2 disk.

On the first machine, I installed Nginx:

apt-get -y install nginx

Tested Page

For simplicity's sake, I kept the default Nginx page; this test is about networking, not about load testing Nginx.

From left to right:

  • On the first machine, I started the publisher container, which reads the access logs and sends them to SNS.
  • On the second machine, I started the consumer container, which reads the data that SNS delivers to SQS (it connects directly to the SQS service).
  • On the third machine, my localhost, I ran a load test; as you can see, I sent 1000 requests with a concurrency level of 5.

I used ApacheBench (ab) to load test my server:

ab -n 1000 -c 5 http://ec2-34-248-177-221.eu-west-1.compute.amazonaws.com/

Once again, my test is primarily about networking and the path the data takes:

publisher -> SNS -> SQS -> Consumer

If I wanted to test Nginx itself, I would probably set a higher concurrency level.

Here is some other useful information about the request:

curl -I http://ec2-34-248-177-221.eu-west-1.compute.amazonaws.com
HTTP/1.1 200 OK
Server: nginx/1.10.0 (Ubuntu)
Date: Wed, 25 Jan 2017 22:17:27 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Wed, 25 Jan 2017 21:53:57 GMT
Connection: keep-alive
ETag: "58891e75-264"
Accept-Ranges: bytes

And of course my test:

Benchmarking ec2-34-248-177-221.eu-west-1.compute.amazonaws.com (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests

Server Software:        nginx/1.10.0
Server Hostname:        ec2-34-248-177-221.eu-west-1.compute.amazonaws.com
Server Port:            80

Document Path:          /
Document Length:        612 bytes

Concurrency Level:      5
Time taken for tests:   14.823 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      854000 bytes
HTML transferred:       612000 bytes
Requests per second:    67.46 [#/sec] (mean)
Time per request:       74.114 [ms] (mean)
Time per request:       14.823 [ms] (mean, across all concurrent requests)
Transfer rate:          56.26 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       31   37  13.3     34     176
Processing:    32   37  10.2     34     141
Waiting:       31   37  10.2     34     141
Total:         63   74  17.7     69     209

Percentage of the requests served within a certain time (ms)
  50%     69
  66%     71
  75%     73
  80%     75
  90%     85
  95%     98
  98%    150
  99%    180
 100%    209 (longest request)
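As a sanity check, the summary figures in the ab report can be recomputed from the request count, the concurrency level, the elapsed time and the bytes transferred. The sketch below only redoes that arithmetic; the variable names are mine, and the results agree with the report to within the rounding of the printed elapsed time.

```python
# Figures copied from the ab report above.
total_requests = 1000
concurrency = 5
elapsed_s = 14.823
total_bytes = 854000

# Mean throughput over the whole run.
requests_per_second = total_requests / elapsed_s

# Mean time per request as seen by one of the 5 concurrent clients.
time_per_request_ms = concurrency * elapsed_s / total_requests * 1000

# Mean time per request across all concurrent requests.
time_per_request_all_ms = elapsed_s / total_requests * 1000

# ab reports the transfer rate in Kbytes (1024 bytes) per second.
transfer_rate_kb_s = total_bytes / elapsed_s / 1024

print(round(requests_per_second, 2))
print(round(time_per_request_ms, 3))
print(round(time_per_request_all_ms, 3))
print(round(transfer_rate_kb_s, 2))
```

With a concurrency level of 5, each client waits about five times longer per request than the "across all concurrent requests" figure suggests, which is why the two "Time per request" lines differ.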

To run the publisher, I started my log-publisher container:

docker run -it --name publisher -v /var/log/nginx/access.log:/logs -e AWS_ACCESS_KEY_ID=xxx -e AWS_SECRET_ACCESS_KEY=xxx -e SNS_TOPIC_ARN=arn:aws:sns:eu-west-1:xxxx:test -e TAG=vm1 -e REGION=eu-west-1 eon01/log-publisher:latest

Same thing for the subscriber:

docker run -it --name subscriber -e AWS_ACCESS_KEY_ID=xxx -e AWS_SECRET_ACCESS_KEY=xxx -e SQS_QUEUE_NAME=test -e REGION=eu-west-1 eon01/log-subscriber:latest

You may redirect the output to a file, since these two containers are made to be verbose.

Using Python/SNS To Create A Publisher

This is the main code that I used to publish any file mapped to /logs (from outside the container) to SNS, line by line, using the tailer library.

Since Docker supports environment variables, I used this feature so that my program reads the same variables that I passed to the docker run command.

import boto.sns, time, json, logging
from datetime import datetime

import os
import tailer

# Credentials and settings come from the variables passed to docker run.
aws_access_key_id = os.environ['AWS_ACCESS_KEY_ID']
aws_secret_access_key = os.environ['AWS_SECRET_ACCESS_KEY']
region = os.environ['REGION']
sns_topic_arn = os.environ["SNS_TOPIC_ARN"]
tag = os.environ["TAG"]

# Path inside the container where the access log is mounted.
file_path = "/logs"

logging.basicConfig(filename="sns-publish.log", level=logging.DEBUG)
c = boto.sns.connect_to_region(region, aws_access_key_id=aws_access_key_id, aws_secret_access_key=aws_secret_access_key)

while 1:
    # tailer.follow yields each new line appended to the file.
    for body in tailer.follow(open(file_path)):
        # The subject carries the send time and the machine tag.
        subject = str(time.time()) + " " + tag
        # Print the time just before sending to SNS.
        print(str(time.time()))
        publication = c.publish(sns_topic_arn, body, subject)
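The publisher reads every setting straight from `os.environ`, so a missing variable surfaces as a bare `KeyError`. One possible refinement, sketched below, is to check all required variables up front and report every missing name at once. `load_config` and `REQUIRED_VARS` are hypothetical names introduced for this example and are not part of the original script; the values are the same placeholders used in the docker run command.

```python
import os

# Variables the publisher expects, as passed with `docker run -e ...`.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY",
                 "REGION", "SNS_TOPIC_ARN", "TAG")


def load_config(environ=None, required=REQUIRED_VARS):
    """Return the required settings as a dict, or raise listing all missing names."""
    environ = os.environ if environ is None else environ
    # Treat unset and empty variables the same way.
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in required}


# A plain dict stands in for os.environ here (placeholder values only).
example_env = {
    "AWS_ACCESS_KEY_ID": "xxx",
    "AWS_SECRET_ACCESS_KEY": "xxx",
    "REGION": "eu-west-1",
    "SNS_TOPIC_ARN": "arn:aws:sns:eu-west-1:xxxx:test",
    "TAG": "vm1",
}
config = load_config(example_env)
```

Passing the environment in as an argument keeps the function easy to exercise without touching the real process environment.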

Using Python/SQS To Create A Subscriber

This piece of code also uses boto to connect to the right SQS queue and prints the time just after receiving the sent message.

As in the Python/SNS publisher, this script reads its settings from environment variables.

import boto.sqs, time, json
import os
from datetime import datetime

# Credentials and settings come from the variables passed to docker run.
aws_access_key_id = os.environ['AWS_ACCESS_KEY_ID']
aws_secret_access_key = os.environ['AWS_SECRET_ACCESS_KEY']
region = os.environ['REGION']
sqs_queue_name = os.environ["SQS_QUEUE_NAME"]

conn = boto.sqs.connect_to_region(region, aws_access_key_id=aws_access_key_id, aws_secret_access_key=aws_secret_access_key)
queue = conn.get_queue(sqs_queue_name)

x = 0

while 1:
    try:
        result_set = queue.get_messages()
        if result_set != []:
            message = result_set[0]
            # Print the time just after receiving the message from SQS.
            print(str(time.time()))
            # The SQS body is the JSON notification produced by SNS.
            message_body = message.get_body()
            m = json.loads(message_body)
            subject = m["Subject"]
            body = m["Message"]
            message_id = m["MessageId"]
            # Delete the message so it is not delivered again.
            conn.delete_message(queue, message)
    except IndexError:
        pass
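The subscriber relies on the fact that, when an SQS queue is subscribed to an SNS topic without raw message delivery, each SQS body is a JSON notification with fields such as `MessageId`, `Subject` and `Message`. The sketch below isolates that parsing step so it can be tried without AWS. `parse_sns_envelope` and `sent_timestamp` are hypothetical helper names, and the sample body is hand-written with placeholder values rather than captured traffic.

```python
import json


def parse_sns_envelope(raw_body):
    """Extract the fields the subscriber uses from an SNS notification body."""
    envelope = json.loads(raw_body)
    return {
        "message_id": envelope["MessageId"],
        "subject": envelope.get("Subject"),  # Subject is optional in SNS
        "message": envelope["Message"],
    }


def sent_timestamp(subject):
    """Recover the publisher's send time from a '<epoch> <tag>' subject."""
    epoch, _tag = subject.split(" ", 1)
    return float(epoch)


# Hand-written sample body; the values are placeholders, not real data.
sample_body = json.dumps({
    "Type": "Notification",
    "MessageId": "00000000-0000-0000-0000-000000000000",
    "TopicArn": "arn:aws:sns:eu-west-1:xxxx:test",
    "Subject": "1485382647.5 vm1",
    "Message": "GET / HTTP/1.1 200 612",
})
parsed = parse_sns_envelope(sample_body)
sent_at = sent_timestamp(parsed["subject"])
```

Because the publisher puts its send time in the subject, the consumer could compute the transport time itself instead of exporting both timestamps to a spreadsheet.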

Benchmarking Results

I used Google Sheets to calculate the difference between the two timestamps:

  • Time just before sending to SNS (H)
  • Time just after receiving the message from SQS (I)

And this is the chart that shows the transport time J, the difference between the two timestamps (J = I - H).

The test lasted 14.823 seconds, during which 1000 requests were sent with a concurrency level of 5. IMHO, these are good results: the highest response time was 0.28 seconds and the lowest was 0.009 seconds.
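The spreadsheet calculation amounts to subtracting each send time (H) from the matching receive time (I) and then taking the minimum, maximum and mean. Here is a minimal sketch of the same computation; the function names are hypothetical and the timestamps are made up for illustration, they are not the measured data.

```python
def transport_times(sent, received):
    """Return J = I - H for each (H, I) pair, in seconds."""
    return [i - h for h, i in zip(sent, received)]


def summarize(values):
    """Return (min, max, mean) of a non-empty list of numbers."""
    return min(values), max(values), sum(values) / len(values)


# Made-up timestamps purely for illustration; not the measured data.
sent = [100.0, 101.0, 102.0]
received = [100.25, 101.5, 102.75]

lowest, highest, average = summarize(transport_times(sent, received))
```

Note that H and I are taken on two different machines, so the result is only as accurate as the clock synchronisation between them.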

The distribution of the response times is shown below:

Response Time Distribution

This is another chart showing the highest, the lowest and the average transport time:

Average / Min / Max of SNS->SQS Networking Time

That’s all folks, part III is coming soon. For more updates, follow me using these links ↓

Connect Deeper

Microservices are changing how we make software, but one of their drawbacks is networking, which can be complex, and messaging is directly impacted by networking problems. Using SNS/SQS and a pub/sub model seems to be a good way to build an inter-service messaging middleware. The publisher/subscriber scripts that I used are not really optimised for load and speed, but they make a good use case.

If this article resonated with you, please join more than 1000 passionate DevOps engineers, developers and IT experts from all over the world and subscribe to DevOpsLinks.

You can find me on Twitter, Clarity or my website, and you can also check my books and training courses: SaltStack For DevOps, Practical AWS & Painless Docker.

If you liked this post, please recommend it and share it with your followers.

Don’t forget to check my training Practical AWS


Written by eon01 | I help developers learn and grow by keeping them up with what matters!
Published by HackerNoon on 2017/01/25