Logging in Observability - Part 2

by Denis Matveev, September 18th, 2022

How to work with logs

This is the second part of the article on logging in observability. In this part, I describe how to work with logs and analyze them on the command line, and then we look at modern tools for storing and visualizing logs.

The previous part is available here.

CLI tools

We start with log analysis using the standard Linux CLI tools. You probably know them: grep, cat, uniq, sed, and so on. Linux has powerful tools for working with text. Historically, logs have been plain text; only the newer logs produced by journald use a binary format.

Let's take the access log of Nginx, one of the most popular web servers, as an example. By default, its entries look like this (the format is, of course, configurable):

209.141.00.000 - - [11/Sep/2022:06:28:40 +0000] "GET /favicon.ico HTTP/1.1" 404 22 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36"

As you can see, each entry consists of several fields:

<ip address> - <remote user> [<timestamp>] "<HTTP method> <URI> <protocol>" <response code> <bytes sent> "<referer>" "<user agent>"

Usually, we want to select requests by IP address or get the distribution of requests by IP address, that is, the top client addresses.

To do this, here is a sample script:

# cat  /var/log/nginx/access.log | cut -d" " -f1 | sort -n | uniq -c | sort -n -r | head
     44 116.203.000.000
     39 198.50.000.00
      3 222.186.00.000
      2 8.45.00.00
      2 209.141.00.000
      2 209.141.00.000
      2 152.89.00.000
      1 92.118.00.86
      1 66.240.000.34
      1 45.83.00.166

The result is the list of IP addresses that send the server the most requests.

Starting the pipeline with cat (or zcat for compressed files) lets us read many files at once and send their contents into the pipe. For example, for the rotated, gzip-compressed logs:
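The same ranking can also be produced in a single awk pass, which avoids sorting the whole log before counting. This is only a sketch: it assumes the client address is the first space-separated field, as in the default Nginx format, and the function name top_ips and its default limit of 10 are illustrative choices, not part of any tool.

```shell
#!/bin/sh
# top_ips: read access-log lines on stdin and print the N most frequent
# client addresses as "<count> <ip>", most frequent first.
# The function name and the default limit of 10 are illustrative.
top_ips() {
    limit="${1:-10}"
    # Count occurrences of the first field, skipping empty lines.
    awk 'NF { hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' |
        sort -k1,1nr -k2,2 |   # by count (descending), then by address
        head -n "$limit"
}
```

It reads from standard input, so it can be fed from cat or zcat in the same way as the pipeline above, for example: zcat /var/log/nginx/access.log.*.gz | top_ips 10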

# zcat /var/log/nginx/access.log.*.gz | cut -d" " -f1 | sort -n | uniq -c | sort -n -r | head
  14323 192.168.12.41
   1868 116.203.000.00
    119 186.84.000.00
     92 20.211.000.000
     65 63.33.000.000
     65 184.169.000.000
     56 45.33.000.000
     46 135.181.00.000
     39 167.114.000.000
     36 20.198.000.000

Wow, this is a different result. Now we can assess how many requests the server receives and which client is performing an HTTP flood and should be banned (for example, with ipset)!

Of course, we can also count requests by server response code:
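As a rough sketch of the banning step, the counts can be turned into ipset commands for review. The function below only prints commands and executes nothing. The set name blacklist, the threshold, and the function name flood_candidates are assumptions made for illustration, and only IPv4 addresses are considered.

```shell
#!/bin/sh
# flood_candidates: read "<count> <ip>" lines (the output of "uniq -c") on
# stdin and print one ipset command for every IPv4 address whose count is
# above the threshold. The commands are printed, not executed.
# The set name "blacklist" is an assumption for illustration.
flood_candidates() {
    threshold="$1"
    set_name="${2:-blacklist}"
    awk -v min="$threshold" -v name="$set_name" '
        # Keep only lines whose second field looks like an IPv4 address.
        $1 + 0 > min + 0 && $2 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ {
            print "ipset -exist add " name " " $2
        }
    '
}
```

Before the printed commands can be applied, the set has to exist (ipset create blacklist hash:ip) and a firewall rule has to reference it, for instance iptables -I INPUT -m set --match-set blacklist src -j DROP. Check the list by hand first: an address with many requests may be your own proxy or monitoring host.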

# zcat /var/log/nginx/access.log.*.gz | cut -d" " -f9 | sort -n | uniq -c
     63 "-"
      2 SP1
      2 Win64;
      2 \x2200000001\x22,
      2 \x22id\x22:
     19 150
  16743 200
     89 400
   1057 404
     35 405
      2 408

To use this approach you have to know the number of the field you want, which is inconvenient and inflexible. In the example above, column 9 contains a lot of garbage, most likely because some request lines contain extra spaces and shift the fields. Moreover, when there is more than one server, this kind of analysis gets complicated. In such cases the script becomes more complex and needs logic to find the right field in each log line, and you spend a lot of time just processing logs. Finally, reading logs in a command line interface does little to improve the observability of your system.

Too many problems, aren't there?

I suppose it's clear that this led to the evolution of log analysis, and there are now many more convenient tools to store and process logs and to speed up work with massive amounts of information.
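One way to make the status-code count less fragile is to split each line on double quotes instead of spaces. The sketch below assumes the default Nginx log format, where the status code is the first token after the quoted request line. The function name status_counts and the "malformed" bucket are illustrative choices.

```shell
#!/bin/sh
# status_counts: read access-log lines on stdin and print
# "<count> <status>" for every response code.
# Splitting on the double quote keeps the request line in one field even
# when it contains extra spaces; Nginx writes a literal quote inside a
# logged value as \x22, so it does not break the split.
status_counts() {
    awk -F'"' '
        {
            key = "malformed"
            if (NF >= 3) {
                # $3 holds the text after the request: " <status> <bytes> "
                split($3, rest, " ")
                if (rest[1] ~ /^[0-9][0-9][0-9]$/) key = rest[1]
            }
            count[key]++
        }
        END { for (k in count) print count[k], k }
    ' | sort -k2,2
}
```

Used as zcat /var/log/nginx/access.log.*.gz | status_counts, it should report only three-digit codes plus one "malformed" line, if any entries could not be parsed. It still depends on the log format, which is the underlying problem with this whole approach.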

Log management solutions

As mentioned in the previous part, there are log shippers, which transfer logs to remote storage.

What is remote storage and how does it help us? Usually, remote storage is a database (time series or traditional SQL). Before logs are put into the database you have chosen as storage, they should be processed and unified. For this reason, multi-component systems have been created to work with logs. Developers, DevOps engineers, and SREs also want to visualize the information extracted from logs, so there are web interfaces that give us graphs, beautiful visualizations of distributions, and awesome histograms.

Popular open-source log management systems for extracting data from application output, storing it, and visualizing it:
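To illustrate what "processed and unified" can mean in practice, here is a minimal sketch that turns Nginx access-log lines into one JSON object per line, the kind of structured record a shipper typically forwards. It is not how any particular shipper works. The function name nginx_to_json and the JSON field names are made up for this example, and the default Nginx log format is assumed.

```shell
#!/bin/sh
# nginx_to_json: read access-log lines in the default Nginx format on stdin
# and print one JSON object per line. Field names are illustrative.
# Lines without the expected three quoted sections are skipped.
nginx_to_json() {
    awk -F'"' '
        # Escape backslashes and double quotes so the value is valid JSON.
        function esc(s,    i, c, out) {
            out = ""
            for (i = 1; i <= length(s); i++) {
                c = substr(s, i, 1)
                if (c == "\\" || c == "\"") out = out "\\" c
                else out = out c
            }
            return out
        }
        NF >= 7 {
            split($1, head, " ")   # ip, "-", user, "[date:time", "zone]"
            split($3, rest, " ")   # status, bytes
            # Drop the surrounding square brackets from the timestamp.
            ts = substr(head[4], 2) " " substr(head[5], 1, length(head[5]) - 1)
            printf "{\"ip\":\"%s\",\"time\":\"%s\",\"request\":\"%s\",", esc(head[1]), esc(ts), esc($2)
            printf "\"status\":%d,\"bytes\":%d,\"user_agent\":\"%s\"}\n", rest[1], rest[2], esc($6)
        }
    '
}
```

Once every record has named fields, the storage layer can index and query them without knowing column positions, which is exactly what the cut-based pipelines could not do.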

ELK or EFK
Grafana Loki
NXLog

The most popular log shippers are:

Filebeat, Auditbeat, and other Beats from Elastic
Fluent Bit and Fluentd
rsyslog, which can also work as a log shipper
Promtail

A well-known log management tool is the ELK stack, which stands for Elasticsearch, Logstash, and Kibana. EFK is Elasticsearch, Fluentd (or Fluent Bit), and Kibana. Fluent Bit and Fluentd are similar and were made by the same company, but Fluent Bit is an extremely efficient log shipper and processor: it has a size of only about 450 KB (!), although it is somewhat more limited in its capabilities.

Filebeat and the other Beats are written in Go and are lightweight and resource-efficient.

The ELK stack is built from three components:

Elasticsearch - a search engine that includes storage and has its own query language
Logstash - a log processing pipeline that runs on a server
Kibana - a web interface for Elasticsearch

Logstash is written in Java, with all the advantages and disadvantages that brings. As a log shipper it has now been replaced by Filebeat and the other Beats (depending on your purposes). Today Logstash is used as a proxy and a processor before data is ingested into Elasticsearch. It is not a mandatory layer.

There are many articles and tutorials about installation and configuration, and tons of support threads on the official site. Elastic also has a commercial license with a few additional features.

Another popular open-source solution is Grafana Loki.

Grafana Loki was inspired by Prometheus and is developed by Grafana Labs, where several Prometheus maintainers work. Because it follows the same label-based model, Loki integrates with Prometheus very easily. It also works very well with cloud-based applications and Kubernetes. Loki has all the necessary components to get started, and it is powerful and flexible.

Both ELK and Loki are horizontally scalable and able to store a significant amount of data. Both support high availability out of the box.

NXLog is a less popular solution for collecting logs, and it has both pros and cons. On the plus side, it is written in C and has high performance. It can work with different storage back ends, including SQL DBMSs.

On the downside, it has no web interface for querying the database and building dashboards.

NXLog also offers:

HA and load balancing out of the box
Extensibility (it has a modular structure)
Free and open source
Support for many storage back ends
High performance and a small footprint
Flexibility

Grafana Loki and ELK have their own storage for centralized, indexed data. NXLog can work with a number of storage back ends and has none of its own. NXLog integrates with the Raijin database, which supports SQL. Moreover, it is possible to forward data to PostgreSQL with the om_odbc module. PostgreSQL then turns into a time series database just by installing the TimescaleDB extension. Profit!

As for visualization, there are third-party applications. For Raijin, for example, you can connect Apache Superset or Grafana. For TimescaleDB on top of PostgreSQL, you can configure, say, Grafana.
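In practice, the TimescaleDB step involves enabling the extension and then converting the log table into a hypertable. The sketch below only prints the SQL so it can be reviewed and piped to psql. The function name, the table name, and the column layout are assumptions for illustration and are not an NXLog or TimescaleDB default schema.

```shell
#!/bin/sh
# timescale_setup_sql: print SQL that enables TimescaleDB and turns a log
# table into a hypertable partitioned by its "time" column.
# The table layout below is hypothetical; adjust it to the fields that
# your collector actually writes.
timescale_setup_sql() {
    table="$1"
    # Accept only plain identifiers so the name is safe to place in SQL.
    case "$table" in
        ''|[!A-Za-z_]*|*[!A-Za-z0-9_]*)
            echo "invalid table name: $table" >&2
            return 1 ;;
    esac
    cat <<EOF
CREATE EXTENSION IF NOT EXISTS timescaledb;
CREATE TABLE IF NOT EXISTS $table (
    time        TIMESTAMPTZ NOT NULL,
    host        TEXT,
    status      INTEGER,
    message     TEXT
);
SELECT create_hypertable('$table', 'time', if_not_exists => TRUE);
EOF
}
```

For example, timescale_setup_sql nginx_access | psql -d logs would apply it to a database named logs (also a placeholder). The extension must already be installed on the PostgreSQL server for CREATE EXTENSION to succeed.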

If you are familiar with SQL, it may be an advantage to choose NXLog with PostgreSQL or Raijin.