Logging in Observability - Part 1

by Denis Matveev, August 30th, 2022

Too Long; Didn't Read

What is logging, and how does it help enhance the observability of a system? This article explains how logging works in Linux and why it is a good idea to transfer logs to remote storage.


What is logging?


Today I want to consider one important component of observability. Monitoring is a fairly clear topic, so now I want to focus on logging: how to use the information in logs, and how to work with and aggregate events.


In my previous article, we discussed the difference between observability and monitoring. You can find it here: https://hackernoon.com/observability-vs-monitoring-whats-the-difference


Let us briefly review how Linux (and other Unix-like systems) writes messages into files.


Logs are text information generated by a running program. Imagine you run your own program, written in any programming language, and you want to see what it is doing at the moment. For this purpose you can add lines like:


printf("Hello World\n");


in C, or


print("Hello world")


in Python, and so on.


Real programs have hundreds of such print statements and output a lot of information.


That's fine when you run your program interactively: you'll see whatever you pass as an argument to the print() function. But what about daemons? They have no terminal attached to stdout or stderr. All interesting information should instead be written into a file called a log file. Traditionally, Linux has a special system for this: syslog.


To write there, use the special syslog() library call in C, the syslog module in Python, or the logger utility in bash.

There are special facilities, severity levels and so on that are used to differentiate messages. It's a pretty powerful system.


If necessary, you can find a detailed description in man syslog. Most logs are stored under the /var/log directory.

/var/log is not a strict requirement: applications are allowed to write logs anywhere, and many write to their own locations. But syslog is a convenient approach, which allows separating files and locations, including transferring logs to remote storage over a network. (Honestly, this depends on the implementation; all modern systems use rsyslog, which has such support.)


Because logs are text, Linux offers many utilities for working with them: grep, uniq, sed, awk, tail, head, etc. You should be familiar with them.


This is very convenient: with this set of utilities we can analyze logs, search for the necessary information, and build various top-N reports. The drawback is that such analysis is one-off: next time you need the same report, you have to build the pipeline again, which is tedious.
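As a sketch of such a one-off analysis, here is what a typical `awk | sort | uniq -c`-style pipeline looks like in Python. The log lines and IP addresses below are made-up examples:

```python
from collections import Counter

# Hypothetical sample of auth-log lines (hostnames and IPs are invented).
lines = [
    "Aug 23 13:28:17 vds sshd[910]: Failed password for root from 203.0.113.5",
    "Aug 23 13:28:19 vds sshd[911]: Failed password for root from 203.0.113.5",
    "Aug 23 13:28:21 vds sshd[912]: Failed password for admin from 198.51.100.7",
]

# Take the last whitespace-separated field (the source IP) of each line
# and count occurrences - the same job as `awk '{print $NF}' | sort | uniq -c`.
top_ips = Counter(line.split()[-1] for line in lines)

print(top_ips.most_common(1))  # -> [('203.0.113.5', 2)]
```

The point is exactly the one made above: this works, but it has to be rebuilt for every new question you ask of the logs.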


Syslog

As mentioned above, Linux traditionally has a logging system called syslog. Syslog is the Unix subsystem for delivering messages to files; it also serves as the main system journal.

Depending on Linux flavor, syslog is located at

  • /var/log/syslog (for Debian based distros)

or

  • /var/log/messages (for Red Hat-like distros)

For a fuller picture, there are several other predefined files:

  • /var/log/auth.log or /var/log/security.log - authorization-related messages
  • /var/log/dmesg - kernel messages
  • /var/log/cron - for cron jobs

and others.


Let’s take a closer look at syslog, because this is the most well-known place for logging.

The syslog() call lets developers avoid worrying about timestamps or which file the logs are written to:


syslog(LOG_LOCAL0 | LOG_ERR, "%s%s%s\n", strerr, ": ", strerror(err));


By default, messages are written into syslog prefixed with a timestamp, the hostname, and the application name:


Aug 23 13:28:17 vds swd: Parsing config file /etc/swd/swd.cfg
Aug 23 13:28:17 vds swd: Port number = 80
Aug 23 13:28:17 vds swd: Setting rootdir = /var/www
Aug 23 13:28:17 vds swd: Listen to 0.0.0.0
Aug 23 13:28:17 vds swd: Number of workers = 2
Aug 23 13:28:17 vds swd: Started OK, My PID = 26385
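Lines in this format are easy to parse mechanically. Below is a minimal sketch; the regular expression and the parse() helper are my own illustration, assuming the classic `timestamp host app: message` layout shown above (it ignores optional PID suffixes like `app[123]`):

```python
import re

# Classic BSD syslog line layout: "Mon DD HH:MM:SS host app: message".
SYSLOG_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<app>[^:\s]+): "
    r"(?P<message>.*)$"
)

def parse(line: str) -> dict:
    """Split one syslog line into its fields; empty dict if it doesn't match."""
    m = SYSLOG_RE.match(line)
    return m.groupdict() if m else {}

rec = parse("Aug 23 13:28:17 vds swd: Port number = 80")
print(rec["app"], "-", rec["message"])  # -> swd - Port number = 80
```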


Even though syslog prefixes each message with an application name, the shared file can still become hard to read and grow too fast. To mitigate this, there are at least two options:

  • Redirect writing of a specific application log into its own file
  • Use logrotate to rotate logs and compress


The best practice is the following: redirect each application's messages to a separate file, and then rotate it.
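A minimal sketch of this practice, assuming an application that logs to the local0 facility; the file name /var/log/myapp.log and the name myapp are hypothetical:

```conf
# /etc/rsyslog.d/30-myapp.conf - everything from facility local0
# goes to its own file, then processing stops for those messages:
local0.*    /var/log/myapp.log
& stop

# /etc/logrotate.d/myapp - rotate weekly, keep 4 compressed copies:
/var/log/myapp.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
}
```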


You can find good examples in

/etc/rsyslog.d/50-default.conf

like this:

kern.*                          -/var/log/kern.log


which means: write all messages with facility kern, at any level, into /var/log/kern.log (the leading '-' tells rsyslog not to sync the file after each write).

Severity levels, from most to least critical:


  • emerg
  • alert
  • crit
  • err
  • warning
  • notice
  • info
  • debug

The facilities best suited for your own applications are:

  • local0 - local7
  • user
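On the wire, a facility and a severity are combined into a single PRI number: facility × 8 + severity, per RFC 5424. A small sketch to illustrate the encoding (only a few facilities are listed):

```python
# Numeric codes from RFC 5424.
FACILITIES = {"kern": 0, "user": 1, "local0": 16, "local7": 23}
SEVERITIES = {"emerg": 0, "alert": 1, "crit": 2, "err": 3,
              "warning": 4, "notice": 5, "info": 6, "debug": 7}

def pri(facility: str, severity: str) -> int:
    """Return the numeric priority encoded in the <PRI> part of a syslog packet."""
    return FACILITIES[facility] * 8 + SEVERITIES[severity]

print(pri("local0", "err"))  # local0 = 16, err = 3 -> 16*8 + 3 = 131
```

This is why a log message from your application at facility local0, level err shows up as `<131>` in the raw protocol.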


Messages may arrive at syslog asynchronously.


Anyway, you still have the option to create custom logs wherever you prefer (even in a home directory).

Nevertheless, the recommended place for custom logs is /var/log/.


Logrotate is useful to prevent logs from eating all the disk space if you store files locally. Nowadays, most systems transfer their logs to remote storage for many reasons. For now, we only note that rsyslog (a modern implementation of syslog) can also send logs over the network to remote storage.


RSyslog

RSyslog is an abbreviation of 'Rocket-fast System for log processing'.

This system is very advanced in log processing:


  • Multithreaded
  • Supports TCP, UDP, TLS
  • Can store logs in databases such as MySQL, PostgreSQL, Oracle, or Elasticsearch
  • Can filter on any part of a log message
  • Customizable output format


To enhance functionality, Rsyslog has modules:


  • input - collects info from different sources
  • output - redirects messages; the destination may be either a local file or remote storage
  • parser - parses messages
  • modification - modifies messages
  • string generator - generates strings based on a message

Moreover, rsyslog allows creating rules based on filters and actions, for example:


:msg,contains,"[UFW " /var/log/ufw.log


which filters messages (the :msg property in syslog) containing [UFW and writes them to a specific file, /var/log/ufw.log.


There are a huge number of ways to create rulesets (or just rules). You can reshape the logs of your application, which writes messages to syslog (or just to a file), as you need.

For full flexibility, Rsyslog has scripting, which allows creating complex rules for processing messages.
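As a sketch of that scripting language (RainerScript), the following hypothetical rule routes one program's warnings and above to a dedicated file; the name myapp and the file path are made up for illustration:

```conf
# Hypothetical rule: messages from "myapp" at severity warning (4) or
# more critical go to their own file; stop further processing for them.
if $programname == 'myapp' and $syslogseverity <= 4 then {
    action(type="omfile" file="/var/log/myapp-warnings.log")
    stop
}
```

Note that numerically lower severities are more critical, hence the `<= 4` comparison.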

Rsyslog has queues inside its architecture to improve performance in multithreaded mode, and it allows creating queues for actions in config files. On the one hand, such an approach can increase performance significantly; on the other hand, it can also cause performance degradation.


Conclusion of using Rsyslog

As shown above, rsyslog is a high-performance and advanced logging system. It allows developers to write programs while relying on a dependable subsystem for logging. For administrators, modern syslog is a useful tool for directing log flow, which is preferred in distributed systems, including shipping to popular storage such as Elasticsearch for further analysis.


Journald

Most people already use journald without suspecting it. Look at the command:


systemctl status nginx


which shows the status of the nginx web server along with the tail of its log. This is an example of using the journal.


Almost all Linux distros have shipped with systemd instead of System V init for many years, and with journald as the default tool for working with logs. journald is a part of systemd.

Features

  • Binary logs (forgery protection)
  • Does not require special set-up
  • Supports multi-line, multi-field logs
  • Indexed data
  • Centralized storage
  • Supports both local storage types: disk and memory
  • Very rich functionality for compressing logs, freeing space, and forwarding messages

What types of logs does journald take?

  • syslog
  • systemd unit logs
  • auditd logs
  • logs submitted via the Journal API
  • kernel logs (kmsg)

Auditd


There is another important log source: auditd. This system registers kernel events (configured in special files) and writes them into a log. There are many use cases for auditd; for more details, see my article at https://medium.com/p/dda085551798

Log shippers

We've considered two modern logging systems in Linux that allow transferring data to centralized storage: rsyslog and journald, both present by default. Each has pros and cons.


There are many resources that give a detailed comparison of rsyslogd and journald in terms of transferring data remotely, their performance, and so on.


But I would like to focus on a newer approach to storing logs remotely for further analysis: log shippers - lightweight processes that take log files as input, process them if necessary, extract the required info and/or transform it into a specific format, and then ingest it into remote storage. A well-known example is Filebeat by Elastic. Filebeat is not the only one; there are many implementations from different developers. If you ask me about rsyslog vs. journald vs. Filebeat for transferring the messages of a specific application, I'll reply as follows: "My choice is Filebeat."


In my opinion, it is easier to configure and does only one thing. Of course, it is not a universal solution; you should find the one that fits your tasks.
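For illustration, a minimal Filebeat configuration might look like the following sketch; the log path, the id, and the Elasticsearch host are hypothetical, and available options vary between Filebeat versions:

```yaml
# filebeat.yml - hypothetical minimal configuration
filebeat.inputs:
  - type: filestream        # read log files line by line
    id: myapp-logs
    paths:
      - /var/log/myapp/*.log

output.elasticsearch:       # ship events to a remote Elasticsearch
  hosts: ["https://elastic.example.com:9200"]
```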

Pros and cons of remote storing

Pros for storing files remotely:


  • Systems don't spend disk space on log files

  • With logs stored remotely, we can conveniently analyze logs from all servers and build dashboards

  • We prevent logs from being accidentally removed or tampered with


If logs are stored locally:


  • Inconvenient to analyze, especially if the system has hundreds of distributed application instances
  • There is a risk of logs being removed if a server is compromised
  • Logs create additional load on the system
  • If the remote server is inaccessible, there is a risk of losing messages (depending on the implementation)