## What is logging?

Today I want to consider one important component of observability. While monitoring is a fairly well-understood topic, here I want to focus on logging: how to use the information in logs, and how to work with and aggregate events.

In a previous article, we discussed the difference between observability and monitoring. You can find it here: <https://hackernoon.com/observability-vs-monitoring-whats-the-difference>

Let's briefly review how Linux (and other Unix-like systems) writes messages to files.

Logs are text information generated by a running program. Imagine you run a program written in any language, and you want to see what it is doing at the moment. For this purpose, you can add lines like:

```clike
printf("Hello World\n");
```

in C, or:

```python
print("Hello world")
```

in Python, and so on.

Real programs have hundreds of such 'print' statements and output lots of information.

That is fine for interactive programs: when you run one, you see whatever you pass to the `print()` function. But what about daemons? They don't have a usable `stdout` or `stderr`. All interesting information should be written to a file called a log file. Traditionally, Linux has a special system for this: `syslog`.

:::tip
To write to it, use the `syslog()` syscall in C, the `syslog` module in Python, or the `logger` command in bash.
:::

There are facilities, severity levels, and so on that are used to differentiate messages. It's a pretty powerful system. If necessary, you can find a detailed description in `man syslog`.

Most logs are stored under the `/var/log` directory. `/var/log` is not a requirement, though; logs may be written anywhere.
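As a quick illustration of the tip above, here is a minimal sketch of writing to syslog from Python. The ident string `myapp` and the choice of facility are arbitrary examples, not anything your system requires:

```python
import syslog

# Open a connection to the system logger: "myapp" becomes the
# application-name prefix, LOG_PID appends the process id, and
# LOG_LOCAL0 is one of the facilities reserved for local use.
syslog.openlog("myapp", logoption=syslog.LOG_PID, facility=syslog.LOG_LOCAL0)

# Each message is tagged with a severity level.
syslog.syslog(syslog.LOG_INFO, "Service started")
syslog.syslog(syslog.LOG_ERR, "Something went wrong")

syslog.closelog()
```

With default configuration, these lines end up in the system journal (e.g. `/var/log/syslog` on Debian-based distros) prefixed with the timestamp, hostname, and `myapp[pid]`.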
Many applications write logs to their own locations, but syslog is a convenient approach: it allows separating files and locations, and even shipping logs to remote storage over the network (honestly, this depends on the implementation; most modern systems use rsyslog, which has such support).

Because logs are text, the many Linux utilities for working with text apply: `grep`, `uniq`, `sed`, `awk`, `tail`, `head`, etc. You should be familiar with them.

This is very nice: with this set of utilities we can analyze logs, search for the necessary info, and build various "top N" reports. But understand that this is a one-off effort; the next time you need the report, you have to build it all over again. It is annoying.

## Syslog

As was said above, Linux traditionally has a logging system called syslog. Syslog is a Unix subsystem for delivering messages to files. Syslog is also the main system journal. Depending on the Linux flavor, it is located at:

* `/var/log/syslog` (for Debian-based distros), or
* `/var/log/messages` (for Red Hat-like distros)

For a fuller picture, there are many other predefined destinations:

* `/var/log/auth.log` (Debian) or `/var/log/secure` (Red Hat) — authorization-related messages
* `/var/log/dmesg` — kernel messages
* `/var/log/cron` — cron jobs

and others.

Let's take a closer look at syslog, because it is the most well-known place for logging.
The `syslog()` system call lets developers not think about timestamps or which file the logs are written to:

```clike
syslog(LOG_LOCAL0 | LOG_ERR, "%s%s%s\n", strerr, ": ", strerror(err));
```

By default, messages are written to syslog prefixed with a timestamp, hostname, and application name:

```bash
Aug 23 13:28:17 vds swd: Parsing config file /etc/swd/swd.cfg
Aug 23 13:28:17 vds swd: Port number = 80
Aug 23 13:28:17 vds swd: Setting rootdir = /var/www
Aug 23 13:28:17 vds swd: Listen to 0.0.0.0
Aug 23 13:28:17 vds swd: Number of workers = 2
Aug 23 13:28:17 vds swd: Started OK, My PID = 26385
```

Of course, even though syslog records the application name, the file can become hard to read and can grow too fast. To mitigate this, there are at least two options:

* Redirect a specific application's log into its own file
* Use logrotate to rotate and compress logs

The best practice is to combine them: for every application, redirect its messages to a separate file, then rotate that file.

You can find good examples in `/etc/rsyslog.d/50-default.conf`, like this:

```
kern.* -/var/log/kern.log
```

which means: write all messages with facility `kern`, at all levels, into `/var/log/kern.log` (the leading `-` tells syslog not to sync the file after every write).

Severity levels, from most to least critical:

* emerg
* alert
* crit
* err
* warning
* notice
* info
* debug

For your own application, the best-fitting facilities are:

* local0 – local7
* user

:::warning
Messages may arrive at syslog asynchronously.
:::

Anyway, there is still the option to write custom logs wherever you prefer (even in a home directory). That said, the recommended place for custom logs is `/var/log/`.

`logrotate` is useful to keep logs from eating all your disk space if you store files locally. Nowadays, most systems transfer their logs to remote storage for many reasons. For the moment, just note that rsyslog (a modern implementation of syslog) can also send logs over the network to remote storage.
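To illustrate the rotate-and-compress practice above, here is a minimal logrotate sketch. The path `/var/log/myapp.log`, the retention settings, and the `postrotate` command are hypothetical examples, not defaults:

```
# Hypothetical example: rotate /var/log/myapp.log weekly, keep 8 archives
/var/log/myapp.log {
    weekly
    rotate 8
    compress
    delaycompress
    missingok
    notifempty
    postrotate
        systemctl kill -s HUP rsyslog.service
    endscript
}
```

The `postrotate` block asks rsyslog to reopen its file handles after rotation; `delaycompress` keeps the newest rotated file uncompressed so a writer that hasn't reopened the file yet doesn't corrupt the archive.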
## RSyslog

RSyslog is an abbreviation of "Rocket-fast system for log processing." It is very advanced:

* Multithreaded
* Supports TCP, UDP, TLS
* Can store logs in databases such as MySQL, PostgreSQL, Oracle, or Elasticsearch
* Can filter on any part of a log message
* Customizable output format

To extend its functionality, rsyslog has modules:

* input — collect messages from different sources
* output — deliver messages; the destination may be a local file or remote storage
* parser — parse messages
* modification — modify messages
* string generator — generate strings based on a message

Moreover, rsyslog allows creating rules based on filters and actions:

```
:msg,contains,"[UFW " /var/log/ufw.log
```

which filters messages (the `msg` property) containing `[UFW ` and writes them to the dedicated file `/var/log/ufw.log`.

There are a huge number of ways to build rules (and rulesets). You can reshape the logs of any application that writes to syslog (or just to a file) however you need. For full flexibility, rsyslog has a scripting language that allows creating complex rules for processing messages. Internally, rsyslog uses queues to improve performance in multithreaded mode, and it lets you define queues for actions in config files. On the one hand, such an approach can increase performance significantly; on the other hand, misconfigured queues can also cause performance degradation.

### Conclusion on rsyslog

As shown above, rsyslog is a high-performance, advanced logging system. It lets developers focus on writing their programs while relying on a solid logging subsystem. For administrators, a modern syslog is a useful tool for routing log flows, which is especially valuable in distributed systems, including shipping to popular storage like Elasticsearch for further analysis.

## Journald

Most people already use journald without even suspecting it.
Look at this command:

```bash
systemctl status nginx
```

It shows the status of the `nginx` web server together with the tail of its log. This is an example of using the journal.

For many years now, almost all Linux distros have shipped `systemd` instead of SysV init, along with `journald` as the default tool for working with logs. `journald` is a part of `systemd`.

### Features

* Binary logs (forgery protection)
* Requires no special setup
* Supports multi-line, multi-field logs
* Indexed data
* Centralized storage
* Supports both local storage types: disk and memory
* Rich functionality for compressing logs, freeing space, and forwarding messages

### What types of logs does journald collect?

* syslog
* systemd unit logs
* auditd logs
* logs submitted via the Journal API
* kernel logs (kmsg)

## Auditd

There is another important log: `auditd`. This system registers kernel events (configured in special files) and writes them to a log. There are many use cases for `auditd`. For more details, I invite you to read my article at __<https://medium.com/p/dda085551798>__

## Log shippers

We've considered two modern Linux logging systems that can transfer data to centralized storage: rsyslog and journald, both present by default. Each has its pros and cons.

There are many resources with detailed comparisons of rsyslogd and journald in terms of remote data transfer, performance, and so on.

But I would like to focus on another approach to storing logs remotely for further analysis: log shippers. These are lightweight processes that take log files as input, process them if necessary, extract the required info and/or transform it to a specific format, and then ingest it into remote storage. A well-known example is `filebeat` by Elastic. Filebeat is not the only one; there are many implementations from different developers.
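To give a feel for how small a shipper's configuration can be, here is a minimal filebeat sketch; the log path and the Elasticsearch host are hypothetical examples:

```yaml
filebeat.inputs:
  - type: filestream          # tail lines from log files
    id: myapp-logs            # hypothetical input id
    paths:
      - /var/log/myapp/*.log

output.elasticsearch:         # ship directly to Elasticsearch
  hosts: ["http://elastic.example.internal:9200"]
```

That is the whole job: read files, ship them. Parsing and enrichment can be layered on, but none of it is required to get started.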
If you ask me to choose among rsyslog, journald, and filebeat for transferring the messages of a specific application, I'll reply: "My choice is filebeat."

In my opinion, it is easier to configure and does only one thing. Of course, it is not a universal solution; you should find the one that fits your tasks.

## Pros and cons of remote storage

### **Pros of storing logs remotely:**

* Systems don't spend local disk space on log files
* With logs in one place, we can conveniently analyze data from all servers and build dashboards
* We protect logs from being removed accidentally or tampered with

### **If logs are stored locally:**

* They are inconvenient to analyze, especially if the system has hundreds of distributed application instances
* There is a risk of logs being removed if a server is compromised
* Logs create an additional load on the system

On the other hand, remote storage has a downside too: if the remote server is inaccessible, there is a risk of losing messages (depending on the implementation).