Logs are a critical part of any system. They give you deep insight into what your application is doing and, when something goes wrong, what caused the error. Virtually every system generates logs in some form, and these logs are usually written to files on local disks. When you are building an enterprise-level application, your system runs on multiple hosts, and managing logs across those hosts can be complicated. Debugging an error by searching through hundreds of log files on hundreds of servers is very time consuming. A common approach to this problem is to build a centralized logging application that collects and aggregates different types of logs in one central location.
There are many tools available that each solve part of the problem, but we need to combine them to build a robust application.
There are four main parts in a centralized logging application: collecting logs, transporting them, storing them and analysing them. Alerting, covered at the end, builds on top of these. We are going to look at each of these parts in depth and see how we can build an application.
Applications create logs in different ways: some log through syslog and others write directly to files. On a typical web application running on a Linux server, there will be a dozen or more log files in /var/log, plus a few application-specific logs in home directories and other locations. In short, different applications generate logs in different places.
Now, consider a web application running on a server. If something goes down, your developers or operations team need to access log data quickly in order to troubleshoot live issues, so you need a solution that can monitor changes in the log files in near real time. To address this, you can follow a replication approach.
A replication approach works well for analytics: if you need to analyze log data offline to calculate metrics or do other batch work, replication might be a good fit.
If you have multiple hosts running, log data can accumulate quickly. There should be an efficient and reliable way to transport this data to the centralized application and ensure that none of it is lost.
There are many frameworks available to transport log data. One way is to plug in input sources directly so that the framework starts collecting logs. Another way is to send log data via an API: application code is written to log directly to these sources, which reduces latency and improves reliability.
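To make the API approach a little more concrete, here is a minimal sketch in Python of application code that logs straight to a central collector. The names `CentralLogHandler` and `udp_sender` are made up for this illustration, and UDP is used only to keep the example short; it can drop messages, so a real setup would more likely use one of the transport frameworks or a reliable protocol.

```python
import json
import logging
import socket
from typing import Callable


def udp_sender(host: str, port: int) -> Callable[[bytes], None]:
    """Return a function that sends bytes to host:port as UDP datagrams."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def send(data: bytes) -> None:
        sock.sendto(data, (host, port))

    return send


class CentralLogHandler(logging.Handler):
    """Hypothetical handler: serializes each record as JSON and hands it to `send`."""

    def __init__(self, send: Callable[[bytes], None]) -> None:
        super().__init__()
        self._send = send

    def emit(self, record: logging.LogRecord) -> None:
        payload = {
            "host": socket.gethostname(),
            "logger": record.name,
            "level": record.levelname,
            "message": record.getMessage(),
            "timestamp": record.created,
        }
        try:
            self._send(json.dumps(payload).encode("utf-8"))
        except Exception:
            # Never let a logging failure crash the application.
            self.handleError(record)
```

An application would attach it with something like `logger.addHandler(CentralLogHandler(udp_sender("logs.example.internal", 5140)))`, where the host name and port are placeholders for wherever your collector listens.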
If you want to provide a number of input sources you can use:
These frameworks provide input sources and also natively support tailing files and transporting them reliably. They are a better fit for more general applications.
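To give a rough idea of what tailing a file involves, here is a small Python sketch. It is not how any particular framework does it; `read_new_lines` is a hypothetical helper that remembers a byte offset, returns only complete lines written since the last call, and starts over if the file shrank (which we assume means it was truncated or rotated).

```python
import os
from typing import List, Tuple


def read_new_lines(path: str, offset: int) -> Tuple[List[str], int]:
    """Return complete lines appended after `offset`, plus the new offset."""
    if os.path.getsize(path) < offset:
        # File is smaller than last time: assume truncation or rotation.
        offset = 0
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    # Consume only up to the last newline; a partial line is picked up next time.
    end = chunk.rfind(b"\n")
    if end == -1:
        return [], offset
    complete = chunk[: end + 1]
    lines = [
        raw.decode("utf-8", errors="replace")
        for raw in complete.split(b"\n")[:-1]
    ]
    return lines, offset + len(complete)
```

A collector would call this in a loop with a short sleep, forward the returned lines, and persist the offset so that a restart does not resend or skip data.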
To log data via APIs, which is generally the preferred way to send log data to a central application, the following frameworks can be used:
That covers transport. Now let's look at an efficient way to store such a large amount of log data.
Now that we have transport in place, the logs need a destination: a storage system where all the log data will be saved. The system should be highly scalable, because the data will keep growing and the storage must handle that growth over time. The volume of log data depends on how large your applications are; an application running on multiple servers or in many containers will generate more logs.
There are a couple of things we need to keep in mind when deciding on storage.
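To get a feel for how quickly log data grows, a back-of-the-envelope calculation helps. The sketch below is plain arithmetic; the function name and every number in the example are made-up assumptions, so plug in measurements from your own system.

```python
def estimate_storage_gb(
    hosts: int,
    lines_per_host_per_sec: float,
    avg_line_bytes: int,
    retention_days: int,
    replicas: int = 1,
) -> float:
    """Rough raw storage need in GB (1 GB = 1e9 bytes), ignoring compression and indexes."""
    seconds_per_day = 86_400
    bytes_per_day = hosts * lines_per_host_per_sec * avg_line_bytes * seconds_per_day
    return bytes_per_day * retention_days * replicas / 1e9


# Hypothetical example: 100 hosts, 50 lines per second each, 200 bytes per line,
# kept for 30 days with 2 copies of the data.
print(estimate_storage_gb(100, 50, 200, 30, replicas=2))  # 5184.0
```

Even with these modest assumed numbers the result is roughly 5 TB, which is why scalability matters when choosing the storage layer.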
Logs are meant for analysis and analytics. Once your logs are stored in a centralized location, you need a way to analyze them. There are many tools available for log analysis. If you need a UI, you can load the parsed data into Elasticsearch and use Kibana or Graylog2 to query and inspect it. Grafana and Kibana can be used to show real-time data analytics.
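Before log lines can be queried in any of these tools, they usually have to be parsed into structured fields. Here is a minimal Python sketch of that step. The line format (timestamp, level, service name, message separated by spaces) is an assumption made up for this example, and `parse_line` is a hypothetical helper, not part of any of the tools above.

```python
import re
from typing import Dict, Optional

# Assumed format: "<timestamp> <LEVEL> <service> <free-text message>"
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+(?P<service>\S+)\s+(?P<message>.*)$"
)


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Turn one raw log line into a dict of fields, or None if it does not match."""
    match = LINE_PATTERN.match(line.strip())
    return match.groupdict() if match else None
```

Each resulting dict is a document that could be serialized as JSON and indexed, so that you can filter by `level` or `service` instead of searching raw text.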
Alerting is the last component in the centralized logging application. It is useful to have an alerting system that notifies us of any change in log patterns or calculated metrics.
Logs are very useful for troubleshooting errors. It is far better to have alerting built into the logging system, so that it sends an email or notifies us, than to have someone keep watching the logs for changes. There are many error reporting tools available; you can use Sentry or Honeybadger. These aggregate repetitive exceptions, which gives you an idea of how frequently an error is happening.
Alerting is also useful for monitoring hundreds of servers: logs report the status of different applications, and you can set up an alert system to check whether your system is up or down. Alerting is really useful for error troubleshooting, monitoring and threshold reporting. Riemann is a good tool for monitoring and alerting.
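The idea of aggregating repetitive exceptions can be sketched in a few lines of Python. This is only an illustration of the concept and not how Sentry or Honeybadger actually group errors; the `fingerprint` rule here (exception type plus the message with numbers replaced) is an assumption chosen for simplicity.

```python
import re
from collections import Counter
from typing import Iterable, Tuple


def fingerprint(exc_type: str, message: str) -> Tuple[str, str]:
    """Group key: exception type plus the message with digits masked out."""
    return exc_type, re.sub(r"\d+", "<n>", message)


def aggregate(errors: Iterable[Tuple[str, str]]) -> Counter:
    """Count how often each kind of error occurred."""
    return Counter(fingerprint(exc_type, msg) for exc_type, msg in errors)
```

With this rule, "request 41 timed out" and "request 42 timed out" count as the same error, so you see one entry with a frequency instead of a long list of near-duplicates.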
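Threshold reporting can be as simple as counting errors in a sliding time window. The Python sketch below shows the idea; `ErrorRateAlert` is a hypothetical class written for this post and has nothing to do with Riemann's own API. It assumes timestamps arrive in non-decreasing order.

```python
from collections import deque
from typing import Deque


class ErrorRateAlert:
    """Fires when at least `threshold` errors fall within `window_seconds`."""

    def __init__(self, threshold: int, window_seconds: float) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._events: Deque[float] = deque()

    def record_error(self, timestamp: float) -> bool:
        """Record one error; return True if the threshold is reached."""
        self._events.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop errors that are too old to count towards the window.
        while self._events and self._events[0] <= cutoff:
            self._events.popleft()
        return len(self._events) >= self.threshold
```

When `record_error` returns True, the logging application would send the email or notification. A real system would also want to avoid sending the same alert repeatedly while the condition persists.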
So in Part 1, we talked about the software and components we need to build a centralized logging application. In Part 2, we will start building our application, starting with transport: we will see how to set up the transport component for a simple Node.js application that sends logs to a central system.
If you liked the article, don’t forget to show some love and follow me to receive the updates on Part 2 of this series.
And with that, I will wrap up this article. I am open to suggestions and feedback on the technical details of the blog post. As always, I'm looking to work on amazing projects. If you are working on something interesting, let's talk! You can comment here to share what you think. Stay tuned for Part 2 :)