“If you can’t measure it, you can’t improve it” - some famous person
Understanding the differences
Monitoring — understand the state of the system, based on gathering predefined sets of metrics or logs.
Observability — infer the state of a system, based on exploring properties and patterns not defined in advance.
Why do we need monitoring?
Monitoring should address two questions: what’s broken, and why? The "what" is the symptom and the "why" is the cause. Keeping the two apart is one of the most important distinctions in building monitoring with maximum signal and minimum noise.
Why do we need observability?
Monitoring relies on capturing and displaying predefined data, which gives a restricted view of the system, whereas observability lets you infer the system's health from the data it generates (logs, metrics, traces), including questions nobody thought to ask in advance.
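To make the contrast concrete, here is a minimal Python sketch. The event fields (`status`, `region`, `version`) and the 5% threshold are invented for illustration and do not come from any particular tool. The first function answers a question fixed in advance; the second slices the same events by whatever field you pick at query time.

```python
from collections import Counter

# Hypothetical request events; the field names are assumptions for this sketch.
EVENTS = [
    {"status": 200, "region": "eu", "version": "1.4"},
    {"status": 500, "region": "us", "version": "1.5"},
    {"status": 200, "region": "us", "version": "1.4"},
    {"status": 500, "region": "eu", "version": "1.5"},
    {"status": 200, "region": "eu", "version": "1.4"},
]


def error_rate_alert(events, threshold=0.05):
    """Monitoring: a predefined check on a predefined metric."""
    errors = sum(1 for e in events if e["status"] >= 500)
    return errors / len(events) > threshold


def errors_by(events, field):
    """Exploration: count errors grouped by any field, chosen at query time."""
    return Counter(e[field] for e in events if e["status"] >= 500)
```

Here the alert only says that something is wrong, while `errors_by(EVENTS, "version")` hints at where to look, without anyone having defined a per-version metric beforehand.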
Many software jobs (especially SRE roles) involve different monitoring tech stacks; one might argue that you can make a living purely from mastering those technologies.
At first glance, the tooling landscape looks overwhelming, especially since each technology comes with its own nomenclature: forwarder, indexer, exporter, data source, controller, and so on. To navigate it, we need to know the basics.
System metrics vs application metrics
System metrics usually capture infrastructure-level data such as CPU and memory consumption, disk I/O, and network I/O, whereas application metrics cover things like error rates, requests per minute, and average response times.
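As a rough illustration of the two kinds of metrics, the sketch below uses only the Python standard library. The function names and the `(status_code, duration_seconds)` record shape are assumptions made for this example; a real setup would normally rely on an agent or a metrics library, and would collect far more system data than shown here.

```python
import os
import shutil


def system_metrics(path="/"):
    """Infrastructure-level numbers read from the host (stand-ins only)."""
    usage = shutil.disk_usage(path)
    return {
        "cpu_count": os.cpu_count(),
        "disk_used_ratio": usage.used / usage.total if usage.total else 0.0,
    }


def application_metrics(requests, window_seconds):
    """Application-level numbers derived from request records.

    Each record is a hypothetical (status_code, duration_seconds) pair
    observed during a window of `window_seconds`.
    """
    total = len(requests)
    if total == 0:
        return {"requests_per_minute": 0.0, "error_rate": 0.0,
                "avg_response_time": 0.0}
    errors = sum(1 for status, _ in requests if status >= 500)
    return {
        "requests_per_minute": total * 60 / window_seconds,
        "error_rate": errors / total,
        "avg_response_time": sum(d for _, d in requests) / total,
    }
```

The system numbers describe the machine regardless of what runs on it; the application numbers only make sense in terms of the requests the service handled.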
Agent vs agentless
Sometimes an agent has to be deployed on your system (e.g. Splunk forwarder, AppDynamics app agents). In other cases no dedicated agent is needed, for example with Prometheus, which uses an HTTP pull model to populate a time-series database, although the target still has to expose a metrics endpoint, either directly or through an exporter.
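What "exposing metrics over HTTP" can look like is sketched below with Python's standard library: a hand-rolled endpoint that renders values in the Prometheus text exposition format. The metric names, values, port, and the `render_metrics` and `serve` helpers are all made up for this example; real services would more likely use an official client library than build the text by hand.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process metrics: name -> (type, help text, current value).
METRICS = {
    "app_requests_total": ("counter", "Total HTTP requests handled.", 1027),
    "app_queue_depth": ("gauge", "Jobs waiting in the queue.", 3),
}


def render_metrics(metrics):
    """Render metrics as HELP/TYPE/sample lines in the text exposition format."""
    lines = []
    for name, (kind, help_text, value) in metrics.items():
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {kind}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /metrics is served; anything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(METRICS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, answering scrapes on the given port (an arbitrary choice)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` and pointing a scraper at `http://localhost:8000/metrics` would be enough for the pull side to collect these values on its own schedule.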
Push vs. Pull monitoring
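In the push model, agents send their data to the monitoring system, whereas in the pull model the monitoring system fetches data from the agents. A second difference is how much the central system knows in advance. With centrally configured tools such as Nagios or Zabbix, checks and items are defined on the monitoring server, so it knows quite a lot about the metrics. With Prometheus’s pull model, the server scrapes whatever each target exposes and knows little or nothing beforehand about the metrics coming in. The labels are not clean, though: Nagios and Zabbix support both server-initiated and agent-initiated checks, and the Datadog agent pushes its data to the Datadog backend rather than being pulled from.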
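The direction of data flow can be sketched with a small in-memory simulation in Python. The class names, the `web-1` target, and the metric are hypothetical; no network is involved, and the only thing the sketch tries to show is which side initiates the transfer.

```python
class PullMonitor:
    """The monitor initiates: it calls each known target to fetch metrics."""

    def __init__(self, targets):
        self.targets = targets  # name -> zero-argument callable returning a dict
        self.store = {}

    def scrape_all(self):
        for name, fetch in self.targets.items():
            self.store[name] = fetch()


class PushMonitor:
    """The sender initiates: the monitor only exposes a receive method."""

    def __init__(self):
        self.store = {}

    def receive(self, source, metrics):
        self.store[source] = metrics


def web_metrics():
    # Stand-in for whatever a real service would measure.
    return {"requests_total": 42}


# Pull: the monitor decides when to collect.
pull = PullMonitor({"web-1": web_metrics})
pull.scrape_all()

# Push: the service decides when to send.
push = PushMonitor()
push.receive("web-1", web_metrics())
```

Both monitors end up storing the same data; what differs is who holds the schedule and who needs to know the other side's address.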
Tooling landscape
Monitoring can mean a lot of things
One piece of advice: monitoring differs from one company to another, and nothing is set in stone.
One way to measure the observability in an organization is to check the following aspects: