Building A Real Time Event-Driven Access Log System Using Docker, Python, Amazon SNS & SQS

by Aymen (@eon01)

This post is part II of a series of practical posts that I am writing to help developers and architects understand and build service-oriented architectures and microservices.

I have written other stories in the same context.

This article is also part of my book that I am “lean-publishing” called Painless Docker: Unlock The Power Of Docker & Its Ecosystem. Painless Docker is a practical guide to mastering Docker and its ecosystem based on real-world examples.

http://painlessdocker.com

In my last post (Benchmarking Amazon SNS & SQS For Inter-Process Communication In A Microservices Architecture), I tested the messaging mechanism using SNS/SQS, and even though the benchmarks were run from my laptop (and not from an EC2 instance), the results were good.

The last article was featured in many newsletters, so I decided to continue my tests and publish this post.

Event-driven architecture (EDA), or message-driven architecture, is a software architecture pattern that promotes the production and consumption of messages, where each consumed message triggers a specific event or reaction.

A classic system architecture promotes reading and reacting to data after saving it to a data store (MySQL, PostgreSQL, MongoDB, etc.), but this is not really the best approach, especially if you are doing real-time or near-real-time processing. Unless you want to spend time and money building an instantaneous reactive system on top of a database, please don't use databases: STREAM DATA INSTEAD.
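To make the contrast concrete, here is a minimal in-process sketch of the event-driven idea: handlers subscribe to a topic and react as soon as a message is published, instead of polling a data store. The `EventBus` class and the `access_log` topic name are illustrative assumptions; they are not part of SNS or of the code used later in this post.

```python
from collections import defaultdict


class EventBus:
    """Tiny in-memory publish/subscribe hub (illustrative only)."""

    def __init__(self):
        # Maps a topic name to the list of handlers subscribed to it.
        self._handlers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._handlers[topic].append(handler)

    def publish(self, topic, message):
        # Every subscriber reacts immediately; nothing is stored first.
        handlers = self._handlers[topic]
        for handler in handlers:
            handler(message)
        return len(handlers)


bus = EventBus()
seen = []
bus.subscribe("access_log", seen.append)
bus.subscribe("access_log", lambda line: print("got:", line))
delivered = bus.publish("access_log", "GET / 200")
```

SNS plays the role of the bus in the rest of this post, with the difference that delivery goes over the network to an SQS queue rather than to a local function.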

I created two machines (you can use one for both publisher and subscriber, since it doesn’t change anything in the networking).

Used EC2 Machines

This is the simplified architecture and I was the AB Load Tester. Both machines and services are hosted in the eu-west-1 region.

In order to minimize the transfer time, it is recommended to use the publisher and the consumer machines in the same region.

Load Testing

Let’s consider the example of a web server writing access logs to an EC2 disk.

On the first machine, I installed Nginx:

apt-get -y install nginx
Tested Page

For simplicity's sake, I kept the default Nginx page: our test is about networking, it is not an Nginx load test.

From left to right:

  • on the first machine, I started the publisher container that reads the access logs and sends them to SNS.
  • on the second machine, I started the consumer container that reads the data sent from SNS to SQS (it is directly connected to the SQS service).
  • on the third machine, my localhost, I ran a load test and, as you can see, I sent out 1000 requests with a concurrency level of 5.

I used ApacheBench (ab) to load test my server:

ab -n 1000 -c 5 http://ec2-34-248-177-221.eu-west-1.compute.amazonaws.com/

Once again, my test is primarily about networking and the path the data takes:

publisher -> SNS -> SQS -> Consumer

If I wanted to test Nginx, I would probably set a higher concurrency level.

Here is some more useful information about the request:

curl -I http://ec2-34-248-177-221.eu-west-1.compute.amazonaws.com
HTTP/1.1 200 OK
Server: nginx/1.10.0 (Ubuntu)
Date: Wed, 25 Jan 2017 22:17:27 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Wed, 25 Jan 2017 21:53:57 GMT
Connection: keep-alive
ETag: "58891e75-264"
Accept-Ranges: bytes

And of course my test:

Benchmarking ec2-34-248-177-221.eu-west-1.compute.amazonaws.com (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests
Server Software:        nginx/1.10.0
Server Hostname: ec2-34-248-177-221.eu-west-1.compute.amazonaws.com
Server Port: 80
Document Path:          /
Document Length: 612 bytes
Concurrency Level:      5
Time taken for tests: 14.823 seconds
Complete requests: 1000
Failed requests: 0
Total transferred: 854000 bytes
HTML transferred: 612000 bytes
Requests per second: 67.46 [#/sec] (mean)
Time per request: 74.114 [ms] (mean)
Time per request: 14.823 [ms] (mean, across all concurrent requests)
Transfer rate: 56.26 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       31   37   13.3     34    176
Processing:    32   37   10.2     34    141
Waiting:       31   37   10.2     34    141
Total:         63   74   17.7     69    209
Percentage of the requests served within a certain time (ms)
50% 69
66% 71
75% 73
80% 75
90% 85
95% 98
98% 150
99% 180
100% 209 (longest request)
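The headline figures in this report follow from a few inputs: the number of requests, the concurrency level, the total time and the bytes transferred. The short sketch below recomputes them from the values printed above. The variable names are mine, and a difference in the last digit can appear because ab works with the unrounded duration.

```python
# Values copied from the ab report above.
requests = 1000
concurrency = 5
seconds = 14.823
total_bytes = 854000

# Requests per second: completed requests divided by wall-clock time.
requests_per_second = requests / seconds

# Mean time per request as seen by one concurrent client (ms).
time_per_request_ms = concurrency * seconds * 1000 / requests

# Mean time per request across all concurrent clients (ms).
time_per_request_all_ms = seconds * 1000 / requests

# Transfer rate in Kbytes/sec (1 Kbyte = 1024 bytes).
transfer_rate_kb = total_bytes / 1024.0 / seconds

print(round(requests_per_second, 2))
print(round(time_per_request_ms, 3))
print(round(time_per_request_all_ms, 3))
print(round(transfer_rate_kb, 2))
```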

To run the publisher container I started my container log-publisher:

docker run -it --name publisher -v /var/log/nginx/access.log:/logs -e AWS_ACCESS_KEY_ID=xxx -e AWS_SECRET_ACCESS_KEY=xxx -e SNS_TOPIC_ARN=arn:aws:sns:eu-west-1:xxxx:test -e TAG=vm1 -e REGION=eu-west-1  eon01/log-publisher:latest

Same thing for the subscriber:

docker run -it --name subscriber -e AWS_ACCESS_KEY_ID=xxx -e AWS_SECRET_ACCESS_KEY=xxx -e SQS_QUEUE_NAME=test -e REGION=eu-west-1  eon01/log-subscriber:latest

You may redirect the output to a file, since these two containers are made to be verbose.
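Both containers receive their configuration through the -e flags shown above. Here is a small sketch of how a script could validate those variables at startup, so that a missing one fails with a clear message. The `load_settings` helper and the `PUBLISHER_VARS` tuple are hypothetical; only the variable names come from the docker run commands.

```python
import os

# Names taken from the publisher's `docker run` command.
PUBLISHER_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY",
                  "REGION", "SNS_TOPIC_ARN", "TAG")


def load_settings(names, environ=None):
    """Return {name: value} for the required names, or raise ValueError."""
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}


# Example with an explicit mapping instead of the real environment.
example_env = {
    "AWS_ACCESS_KEY_ID": "xxx",
    "AWS_SECRET_ACCESS_KEY": "xxx",
    "REGION": "eu-west-1",
    "SNS_TOPIC_ARN": "arn:aws:sns:eu-west-1:xxxx:test",
    "TAG": "vm1",
}
settings = load_settings(PUBLISHER_VARS, example_env)
print(settings["REGION"])
```

Passing the mapping as a parameter keeps the helper easy to test; inside the container it would simply be called as `load_settings(PUBLISHER_VARS)`.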

Using Python/SNS To Create A Publisher

This is the primary code that I used to publish any file mapped to /logs (from outside the container) to SNS, line by line, using the tailer library.

Since Docker supports environment variables, I used this feature so that my program reads the same variables that I passed in the docker run command.

import logging
import os
import time

import boto.sns
import tailer

# Settings come from the variables passed with `docker run -e ...`.
aws_access_key_id = os.environ['AWS_ACCESS_KEY_ID']
aws_secret_access_key = os.environ['AWS_SECRET_ACCESS_KEY']
region = os.environ['REGION']
sns_topic_arn = os.environ["SNS_TOPIC_ARN"]
tag = os.environ["TAG"]
file_path = "/logs"  # the access log is mounted here from the host

logging.basicConfig(filename="sns-publish.log", level=logging.DEBUG)
c = boto.sns.connect_to_region(region, aws_access_key_id=aws_access_key_id, aws_secret_access_key=aws_secret_access_key)

while 1:
    # tailer.follow yields each new line appended to the file.
    for body in tailer.follow(open(file_path)):
        subject = str(time.time()) + " " + tag
        # Timestamp printed just before sending to SNS.
        print(str(time.time()))
        publication = c.publish(sns_topic_arn, body, subject)
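For readers curious about what following a file involves, here is a standard-library sketch of the idea, paired with the subject format used by the publisher (timestamp plus tag). It is not the tailer implementation: `follow_lines`, `publish_lines`, the `max_idle_polls` stop condition and the `publish` callable (a stand-in for the SNS call) are assumptions made so that the example runs without AWS.

```python
import io
import time


def follow_lines(handle, poll_interval=0.1, max_idle_polls=None):
    """Yield complete lines as they are appended to an open text file.

    Stops after max_idle_polls empty reads in a row; None means never stop.
    """
    idle = 0
    pending = ""
    while max_idle_polls is None or idle < max_idle_polls:
        chunk = handle.readline()
        if not chunk:
            # Nothing new yet: wait a little and try again.
            idle += 1
            time.sleep(poll_interval)
            continue
        idle = 0
        pending += chunk
        if pending.endswith("\n"):
            yield pending.rstrip("\n")
            pending = ""


def publish_lines(lines, publish, tag, clock=time.time):
    """Send each line through publish(message, subject); return the count."""
    count = 0
    for line in lines:
        subject = str(clock()) + " " + tag  # "<timestamp> <tag>"
        publish(line, subject)
        count += 1
    return count


# Demo on an in-memory file; a real run would open the access log instead.
log = io.StringIO(u"GET / 200\nGET /favicon.ico 404\n")
sent = []
publish_lines(follow_lines(log, poll_interval=0, max_idle_polls=1),
              lambda message, subject: sent.append((subject, message)),
              tag="vm1")
print(sent)
```

This sketch reads from the current position of the handle. To publish only new lines, as `tail -f` does, the caller can run `handle.seek(0, 2)` before starting to follow.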

Using Python/SQS To Create A Subscriber

This piece of code also uses boto to connect to the right SQS queue and prints the time just after getting the sent message.

As with the Python/SNS publisher, this script reads its settings from environment variables.

import json
import os
import time

import boto.sqs

# Settings come from the variables passed with `docker run -e ...`.
aws_access_key_id = os.environ['AWS_ACCESS_KEY_ID']
aws_secret_access_key = os.environ['AWS_SECRET_ACCESS_KEY']
region = os.environ['REGION']
sqs_queue_name = os.environ["SQS_QUEUE_NAME"]

conn = boto.sqs.connect_to_region(region, aws_access_key_id=aws_access_key_id, aws_secret_access_key=aws_secret_access_key)
queue = conn.get_queue(sqs_queue_name)

while 1:
    try:
        result_set = queue.get_messages()
        if result_set:
            message = result_set[0]
            # Timestamp printed just after receiving the message from SQS.
            print(str(time.time()))
            # The body is the JSON envelope that SNS wraps around the message.
            message_body = message.get_body()
            m = json.loads(message_body)
            subject = m["Subject"]
            body = m["Message"]
            message_id = m["MessageId"]
            conn.delete_message(queue, message)
    except IndexError:
        pass
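When SNS delivers to an SQS queue without raw message delivery, the SQS body is a JSON envelope, which is why the script reads Subject, Message and MessageId from it. The sketch below parses such a body and derives the transit time from the timestamp that the publisher placed in the subject. The `parse_envelope` helper and the sample values are hypothetical.

```python
import json
import time


def parse_envelope(body, received_at=None):
    """Split an SNS-over-SQS body into its parts and compute the latency."""
    received_at = time.time() if received_at is None else received_at
    envelope = json.loads(body)
    # The publisher built the subject as "<unix timestamp> <tag>".
    sent_at_text, tag = envelope["Subject"].split(" ", 1)
    return {
        "message_id": envelope["MessageId"],
        "tag": tag,
        "line": envelope["Message"],
        "latency": received_at - float(sent_at_text),
    }


# Made-up envelope for illustration; real ones carry more fields.
sample = json.dumps({
    "Type": "Notification",
    "MessageId": "00000000-0000-0000-0000-000000000000",
    "Subject": "1485382647.25 vm1",
    "Message": "GET / HTTP/1.0 200",
})
info = parse_envelope(sample, received_at=1485382647.5)
print(info["tag"], info["latency"])
```

Note that this latency compares clocks on two different machines, so it is only as accurate as their time synchronisation.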

Benchmarking Results

I used Google Sheets to calculate the difference between the two timestamps:

  • Time just before sending to SNS (H)
  • Time just after receiving the message from SQS (I)

And this is the chart that shows the time difference J between H and I (J = I - H).

The test lasted 14.823 seconds, during which 1000 requests were sent with a concurrency level of 5. IMHO, these are good results, as the highest response time was 0.28 seconds and the lowest was 0.009 seconds.

The distribution of the different response times is below:

Response Time Distribution

This is another chart where I put the highest, the lowest and the average transportation time:

Average / Min / Max of SNS->SQS Networking Time
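The same minimum, maximum and average figures, along with a simple distribution, can be computed without a spreadsheet. This is a minimal sketch, assuming the two timestamp columns have been collected as pairs. The `summarize` helper is hypothetical, the sample pairs are placeholders rather than the measurements from this test, and the bucket width is an arbitrary choice.

```python
from collections import Counter


def summarize(pairs, bucket=0.05):
    """pairs: iterable of (sent_at, received_at) timestamps in seconds."""
    latencies = [received - sent for sent, received in pairs]
    # Group latencies into fixed-width buckets to get a distribution.
    histogram = Counter(int(value // bucket) for value in latencies)
    return {
        "min": min(latencies),
        "max": max(latencies),
        "mean": sum(latencies) / len(latencies),
        # Bucket index -> count; index 0 covers [0, bucket).
        "histogram": dict(sorted(histogram.items())),
    }


# Placeholder data for illustration only.
pairs = [(0.0, 0.25), (1.0, 1.125), (2.0, 2.5), (3.0, 3.0625)]
summary = summarize(pairs, bucket=0.25)
print(summary)
```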

That’s all folks, part III is coming soon. For more updates, follow me using the links below.

Connect Deeper

Microservices are changing how we make software, but one of their drawbacks is the networking part, which can be complex, and messaging is directly impacted by networking problems. Using SNS/SQS and a pub/sub model seems to be a good solution for creating an inter-service messaging middleware. The publisher/subscriber scripts that I used are not really optimised for load and speed, but they are a good use case.

If this article resonated with you, please join more than 1000 passionate DevOps engineers, developers and IT experts from all over the world and subscribe to DevOpsLinks.

You can find me on Twitter, Clarity or my website, and you can also check my books and training courses: SaltStack For DevOps, Practical AWS & Painless Docker.

If you liked this post, please recommend and share it to your followers.

Don’t forget to check my training Practical AWS
