Monitoring vs observability: is there even a difference, and is your monitoring system observable?

Observability has gained a lot of popularity in recent years. Modern DevOps paradigms encourage building robust applications by incorporating automation, Infrastructure as Code, and agile development. To assess the health and “robustness” of IT systems, engineering teams typically use logs, metrics, and traces, which are used by various developer tools to facilitate observability. But what is observability exactly, and how does it differ from monitoring?

What is observability?

“Observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs.” (Wikipedia)

An observable system allows us to assess how the system works without interfering or even interacting with it. Simply by looking at the outputs of a system (such as logs, metrics, and traces), we can assess how this system is performing.

Monitoring vs Observability

One of the best explanations of monitoring and observability I’ve seen was provided in an online course, “Building Modern Python Applications on AWS”, by Morgan Willis, a Senior Cloud Technologist at AWS.

“Monitoring is the act of collecting data. What types of data we collect, what we do with the data, and if that data is readily analyzed or available is a different story. This is where observability comes into play. Observability is not a verb, it’s not something you do. Instead, observability is more of a property of a system.” (Morgan Willis)

According to this explanation, tools such as CloudWatch or X-Ray can be viewed as monitoring or tracing tools. They allow us to collect logs and metrics about our system and send alerts about errors and incidents. Therefore, monitoring is the active part: collecting data that will help us assess the health of our system and how its different components work together. Once we establish monitoring that continuously collects logs, system outputs, metrics, and traces, our system becomes observable.

As a data engineer, I like to think of monitoring as the data ingestion part of ETL (extract, transform, load). You gather data from multiple sources (logs, traces, metrics) and put it into a data lake. Once all this data is available, a skilled analyst can gain insights from it and build beautiful dashboards that tell the story this data conveys. That’s the observability part: gaining insights from the collected data. Observability platforms such as Dashbird play the role of a skilled analyst. They provide you with visualizations and insights about the health of your system.

Monitoring will get you information about your system and let you know if there’s a failure, while observability grants an easy way of understanding where and why that failure happened, and what caused it.

Monitoring is a prerequisite for observability. A system that we don’t monitor is not observable.

Monitoring vs Observability examples

Monitoring

The ultimate purpose of monitoring is to control a system’s health by actively collecting error logs and system metrics and then leveraging those to alert about incidents. This means:

- tracking errors and alerting about them as soon as they happen,
- tracking metrics about CPU utilization or network traffic to later observe whether specific compute resources are healthy or not,
- reacting to outages and security incidents through alerting, alarms, and notifications.

Even though monitoring is an active process, AWS takes care of that automatically when we use CloudWatch or X-Ray.

Observability

The purpose of observability is to use the system’s outputs to gather insights and act on them. Examples:

- identify the percentage of errors across all function or container invocations,
- identify bottlenecks in microservices by observing traces that show latency between individual function calls and transitions between components,
- identify patterns of when the errors or bottlenecks occur and use the insights to take action in order to prevent such scenarios in the future,
- measure and assess the performance of an entire application,
- identify cold starts,
- identify how much memory your application consumes,
- identify when and how long your code runs,
- identify how much cost is incurred per specific resource,
- identify outliers, e.g. a specific function invocation that took considerably longer than usual,
- identify how changes to one component affect other parts of the system,
- identify and troubleshoot the flow of traffic through our microservices,
- identify how the system performs over time: how many invocations of each function we see per day, per week, or per month, and how many of them are successful.

Observability of serverless microservices

Although serverless microservices offer a myriad of benefits in terms of decoupling, reducing dependencies between individual components, and overall faster development cycles, the biggest challenge is to ensure that all those small “moving parts” are working well together. It’s highly impractical, if not impossible, to track all microservices by manually looking up the logs, metrics, and traces scattered across different cloud services.

When looking at AWS, you would have to go to AWS CloudWatch to see the logs, find your Lambda function’s log group, then find the logs you are really interested in. Then, to see the corresponding API traces, you would go either to X-Ray or to CloudTrail and again search across potentially hundreds of components to find the one you want to investigate.
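To make the manual lookup described above concrete, here is a minimal Python sketch of just the first step: pulling recent error lines for a single Lambda function from CloudWatch Logs. It is an illustration under assumptions, not a recommended tool. The helper name `find_error_events`, the `"ERROR"` filter pattern, and the 15-minute window are hypothetical choices. The sketch also assumes the function writes to Lambda's default `/aws/lambda/<function-name>` log group and that `logs_client` behaves like `boto3.client("logs")`.

```python
import time


def find_error_events(logs_client, function_name, minutes=15, pattern="ERROR"):
    """Collect recent log events matching `pattern` for one Lambda function.

    `logs_client` is assumed to behave like boto3.client("logs");
    only its filter_log_events method is used.
    """
    # Assumption: the function uses Lambda's default log group name.
    log_group = f"/aws/lambda/{function_name}"
    # CloudWatch Logs expects timestamps in epoch milliseconds.
    start_ms = int((time.time() - minutes * 60) * 1000)

    kwargs = {
        "logGroupName": log_group,
        "filterPattern": pattern,
        "startTime": start_ms,
    }
    events = []
    while True:
        page = logs_client.filter_log_events(**kwargs)
        events.extend(page.get("events", []))
        token = page.get("nextToken")
        if not token:
            return events
        # More results are available: request the next page.
        kwargs["nextToken"] = token
```

With real credentials, a call might look like `find_error_events(boto3.client("logs"), "my-function")`, where `my-function` is a placeholder name. Note that this covers one function and logs only. The same lookup has to be repeated for every component, and the matching traces still have to be found separately in X-Ray or CloudTrail.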
As you can see, finding and accessing the logs and traces of every single component is quite time-consuming. Additionally, debugging single parts doesn’t give you the “big-picture” view of how those components work together.

To put it simply, you get observability in your application by knitting together monitoring with alerting while having a clear debugging solution that provides clarity for your data. Missing just one of these aspects will leave you at a great disadvantage, chasing your tail trying to figure out what went wrong within your app. It’s not enough to be notified every time something breaks down. Neither is having the insight of knowing when something is about to. You have to be able to pinpoint the issue within your platform efficiently.

With a growing architecture of microservices, we need an easier (automated) way to add observability to the serverless ecosystem.

How is Twitter doing it?

Here’s an example of a service we’re all too familiar with: Twitter. As you might imagine, a product like Twitter has a lot of moving parts, and when something breaks down it can be difficult to understand why or what caused the problem. Imagine having 350 million active users that interact with each other through your system, tweeting, liking, dm-ing, retweeting, and so on. That’s a lot of information to follow, and if you’ve ever worked on a platform this size you can imagine the kind of effort it would take to figure out why a tweet isn’t posted or a message takes too long to be delivered.

Before they made the switch from a monolithic application to a distributed system, finding out why something doesn’t work was, at times, as simple as opening an error log file and seeing what went wrong.

When you have hundreds, maybe thousands, of small services communicating asynchronously with each other, saying that debugging a simple thing like a tweet not firing would be hard is a complete understatement.
They published a really cool post about their migration to microservices in 2013.

With distributed systems (read: microservices), especially at scale, having observability into your platform is more than a necessity; it’s a requirement that can’t be circumvented by using only alerting or by only looking at logs. You need an environment that provides visibility down to a microscopic level in order to have the right information to act upon.

Twitter’s observability system is humongous and took years to develop into the well-oiled machine it is today.

“The Observability Engineering team at Twitter provides full-stack libraries and multiple services to our internal engineering teams to monitor service health, alert on issues, support root cause investigation by providing distributed systems call traces, and support diagnosis by creating a searchable index of aggregated application/system logs.” (Anthony Asta in Observability in Twitter part I)

“Our time series metric ingestion service handles more than 2.8 billion write requests per minute, stores 4.5 petabytes of time series data, and handles 25,000 query requests per minute.”

How can a serverless observability platform help?

Understandably, not all businesses have the resources and time to build their own observability systems. With a 2-minute setup, you can sign up to Dashbird and add observability to your serverless AWS architecture immediately. Each serverless component in your AWS account on which you enabled CloudWatch logs and X-Ray or CloudTrail traces is automatically monitored with those tools. But it’s not yet observable until you do something with this collected data.

The true benefit of Dashbird is that it doesn’t require any code changes or any effort on your side. It simply uses the data that already exists, i.e., data for which you already enabled monitoring with AWS-native services designed for that purpose.

As a serverless observability platform, Dashbird allows you to accomplish all of the points addressed when discussing examples of insights gathered from an observable system:

- be notified about incidents, cold starts, and errors as they happen via custom alerting,
- observe the percentage of errors across all invocations and identify potential outliers,
- find out how much memory your application consumes, as well as when and how long your code runs,
- identify how much cost is incurred per specific resource,
- …and so much more.

[Image: Dashbird project view (image by the author)]

Wrapping up

While monitoring tools allow you to collect application logs as well as metrics about resource utilization and network traffic, or traces of HTTP requests made to specific services, observability is a property of a system that analyzes and visualizes collected data, thereby allowing you to improve your application lifecycle by gathering insights about the underlying system.

Furthermore, observability in the serverless space is non-negotiable. You have to have it, and it’s not a quantifiable attribute, meaning you can’t have some observability or too much of it. You either do or don’t.

Previously published at https://dashbird.io/blog/monitoring-vs-observability/