In every episode of Sherlock, there’s a scene where our namesake character rapidly breaks down a complex web of tangled, seemingly unrelated observations into a single clear analysis. Just like Sherlock, you’ve likely stared into the abyss of a seemingly untraceable bug, exception, or performance problem.
Often, the investigation looks like this: digging through logs via `tail`, `awk`, and `grep` for clues, a tedious and frequently futile effort. Fortunately, there’s a way to quickly break down a complex problem with Sherlock-like authority: Kibana and the rest of the ELK stack.
Let’s set up an investigation super platform for a Rails app.
We’ll utilize a couple open-source external services to process and visualize our data: Logstash, Elasticsearch, and Kibana. Within our Rails app, we’ll need to log data in a Logstash-compatible format. Finally, we’ll need to ship the data to Logstash.
Let’s break these parts down.
We’ll send our log data to Logstash. Logstash will extract data from our Rails logs (controller actions, response times, hostnames, IP addresses, and more) and store it in Elasticsearch.
We’ll use Kibana to query and visualize the log data stored in Elasticsearch.
Together, Elasticsearch, Logstash, and Kibana are referred to as the ELK stack. These services should be installed on a dedicated host: processing and querying event data is resource-intensive.
At Scout, we’ve used an AWS m4.xlarge to host the ELK stack. This gives us 7 days of logs for $215/mo.
For a small organization, $215 isn’t a trivial expense. However, we’ll shortly see how Kibana can track down problems in ways `tail -f`, `awk`, and `grep` can’t.
Shameless Promotion: Want an investigative platform without the moving parts? I’m helping develop an approachable, hosted Kibana alternative. Sign up for early access.
We’ll install three gems to log events in a Logstash-compatible format.
Gem #1: [**Lograge**](https://github.com/roidrage/lograge). Lograge shrinks the many lines the default Rails logger generates for a single request down to a single line:
```
method=GET path=/jobs/833552.json format=json controller=JobsController action=show status=200 duration=58.33 view=40.43 db=15.26
```
Lograge supports multiple output formats, one of which is Logstash. Here’s how a single request looks in the Logstash format:
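The exact payload varies by lograge version, but the Logstash output is JSON following the Logstash event schema: the same request fields as above, plus `@timestamp` and `@version`. Roughly:

```json
{
  "method": "GET",
  "path": "/jobs/833552.json",
  "format": "json",
  "controller": "JobsController",
  "action": "show",
  "status": 200,
  "duration": 58.33,
  "view": 40.43,
  "db": 15.26,
  "@timestamp": "2016-01-12T19:01:19.000Z",
  "@version": "1"
}
```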
To use the Logstash format with Lograge, install gem #2: **logstash-event**.
Gem #3: [**logstash-logger**](https://github.com/dwbutler/logstash-logger). This lets us log every message in a Logstash-compatible format; by themselves, `lograge` + `logstash-event` only log request events in the Logstash format.
Finally, we need to take the logged data and ship it to Logstash. We’ll use [node-logstash](https://github.com/bpaquet/node-logstash) to do this. `node-logstash` is a NodeJS service that needs to be installed on each host our Rails app runs on.
Here’s an example of a node-logstash config file.
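A minimal sketch, assuming node-logstash’s URL-style plugin configuration and its TCP output plugin; the log path and the Logstash endpoint (`elk.example.com:5000`) are placeholders for your environment:

```
input://file:///var/www/app/log/logstash_production.log
output://tcp://elk.example.com:5000
```

This tails the Logstash-formatted log file our Rails app writes and forwards each line to Logstash over TCP.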
We’ll focus on the Rails configuration rather than setting up the ELK services themselves; there are many tutorials available for that.
Add the following to your `Gemfile`:
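```ruby
# Gemfile
gem 'lograge'          # condense each request to a single log line
gem 'logstash-event'   # adds the Logstash output format to Lograge
gem 'logstash-logger'  # log arbitrary messages in Logstash format
```

Then run `bundle install`.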
Configure Lograge via a `config/initializers/lograge.rb` file:
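A minimal initializer that enables Lograge and selects its Logstash formatter (this formatter ships with Lograge and depends on the `logstash-event` gem):

```ruby
# config/initializers/lograge.rb
Rails.application.configure do
  config.lograge.enabled = true
  # Emit each request as a Logstash-format JSON event.
  config.lograge.formatter = Lograge::Formatters::Logstash.new
end
```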
There’s one last step: in production, log everything in the Logstash format (this isn’t necessary in development). Modify `config/environments/production.rb`:
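A sketch using `LogStashLogger` as the Rails logger; the file path is a placeholder, and it should be the same file you point node-logstash at:

```ruby
# config/environments/production.rb
# Route all application logging through LogStashLogger so every
# message (not just requests) is written as Logstash-format JSON.
config.logger = LogStashLogger.new(type: :file, path: 'log/logstash_production.log', sync: true)
```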
Our plain vanilla log files have been upgraded to a flexible, queryable, structured format. Access Kibana and you’ll see a stream of events and possible filters in the sidebar:
Here are some handy search queries:
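Kibana accepts Lucene query syntax. A few examples against the fields Lograge logs (field names match the default output shown earlier):

- `status:500` matches server errors
- `duration:[1000 TO *]` matches requests slower than one second
- `controller:JobsController AND action:show` isolates a single endpoint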
The default config is useful by itself, but the true power comes via custom fields. We can log the users, companies, database shards, etc. associated with each request: basically, anything that can impact performance but can’t be parsed from the default data Lograge sends.
Here’s how we do this at Scout, in two parts:
First, we accumulate important fields in a thread-local variable via a `before_filter`. Here’s how we store the current user:
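A sketch, assuming a Devise-style `current_user` helper; the filter and thread-local key names are illustrative conventions, not anything the gems require:

```ruby
# app/controllers/application_controller.rb
class ApplicationController < ActionController::Base
  before_filter :store_log_fields # use before_action on Rails 4+

  private

  # Stash request-scoped fields in a thread-local so the Lograge
  # hook below can read them after the request completes.
  def store_log_fields
    Thread.current[:logstash_fields] = {
      user_email: current_user.try(:email)
    }
  end
end
```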
Second, Rails emits a notification event when it completes serving a request, and Lograge builds its log line from that event. We hook in there, adding the extra fields collected while processing the request.
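Lograge exposes this hook as `custom_options`: a lambda that receives the `ActiveSupport::Notifications` event for the request and returns a hash of extra fields to merge into the log line. Extending the initializer from earlier:

```ruby
# config/initializers/lograge.rb
Rails.application.configure do
  config.lograge.custom_options = lambda do |event|
    # Merge in whatever the before_filter stashed for this request.
    Thread.current[:logstash_fields] || {}
  end
end
```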
Running queries and viewing counts is already a step forward. However, viewing a list of matching events isn’t that helpful when you’re trying to understand the composition of your app’s behavior. For example, which users are seeing the most 1 second+ response times?
We can do that via Kibana’s Visualize feature, aggregating by `user_email`:
At this point, you should be excited about the ways Kibana can improve your detective skills: keeping an eye on the experience of key customers, comparing response times by host, ensuring trial users aren’t hitting exceptions, and more.
In short, Kibana gives you the power to break a complex problem into a collection of simple observations. Just like Sherlock.
Shameless Promotion #2: Want an investigative platform without the work? I’m helping develop an approachable Kibana alternative installed via a simple `gem install`. Sign up for early access.