Principal Software Architect, Microservices & Cloud Computing enthusiast, Hands-on Java Developer
If you are reading this, you are probably not new to the microservices field, so I will only scratch the surface before moving on to observable microservices. Once upon a time, applications were monoliths; many have since been transformed into microservices-based applications.
Microservices ain't easy, but they're necessary. Distributed systems are pathologically unpredictable, and some things actually become more difficult. An obvious source of added complexity is communication between services.
A primary challenge with microservices is understanding how the individual pieces of the overall system interact. A single transaction can flow through many independently deployed microservices (or pods), so discovering where performance bottlenecks have occurred provides valuable information. Walking through a typical flow, we can quickly get a sense of this complexity.
When the team tries to understand and fix the root cause of an issue, where do they start? What do they search for? They look into the logs, the de facto tool for debugging in a production environment.
A log is an immutable, discrete, timestamped event: what happened, at what time, what was requested, what was sent, and so on.
If there are hundreds or a few thousand lines, looking at them manually might be feasible. But remember that we are not dealing with a development environment; these are production logs, where millions of events are recorded. Manual scanning is impossible, isn't it?
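To make this concrete, here is a minimal sketch, assuming each service emits structured log events that carry a shared trace ID. The `LogEvent` record, its field names and the `LogQuery` helper are hypothetical, not any particular logging library. Filtering by trace ID reconstructs one transaction's path across services, which manual scanning cannot do at production volume.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// A log entry as an immutable, discrete, timestamped event.
// The field names, including traceId, are hypothetical rather than a specific log format.
record LogEvent(Instant timestamp, String traceId, String service, String level, String message) {}

final class LogQuery {
    private LogQuery() {}

    // Reconstructs one transaction's path: keep only events sharing its trace ID,
    // ordered by timestamp.
    static List<LogEvent> byTrace(List<LogEvent> events, String traceId) {
        return events.stream()
                .filter(e -> e.traceId().equals(traceId))
                .sorted(Comparator.comparing(LogEvent::timestamp))
                .collect(Collectors.toList());
    }

    // Counts ERROR events per service: a first hint at where a failure originated.
    static Map<String, Long> errorsByService(List<LogEvent> events) {
        return events.stream()
                .filter(e -> "ERROR".equals(e.level()))
                .collect(Collectors.groupingBy(LogEvent::service, Collectors.counting()));
    }
}
```

In practice this kind of querying is done by a log platform rather than application code; the sketch only shows why a shared identifier makes millions of events searchable.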
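Observability is not a new term. It has a long history stemming from engineering and control theory, where observability is a measure of how well the internal states of a system can be inferred from knowledge of its external outputs.
Not so clear? Let us look at a few examples of systems that are built for observability.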
A load balancer is a reverse proxy that distributes application traffic across a number of application servers.
It also routinely monitors the health of each application server instance, so that if an instance is down, it stops sending requests to it. Once the failed instance recovers, the load balancer routes traffic to it again. The load balancer uses health check pings to measure the health of the instances. All right, we know this. But what is observability here?
The load balancer knows nothing about the internals of the application. Instead, it infers the state of the system, meaning the health of the instances, from external outputs: the responses to its health check pings.
So, the load balancer is enabled with observable code, agreed?
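The health-check behaviour can be sketched as follows, assuming a hypothetical `HealthAwareBalancer` whose only view of an instance is a pass/fail ping, modelled here as a `Predicate<String>`. Real load balancers add check intervals, timeouts and healthy/unhealthy thresholds; those are omitted.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Minimal sketch: a round-robin balancer that knows instances only through an
// external signal (a health-check "ping"), never through their internals.
final class HealthAwareBalancer {
    private final List<String> instances;
    private final Predicate<String> healthCheck; // hypothetical ping: true if the instance answered
    private final Set<String> healthy = ConcurrentHashMap.newKeySet();
    private int next = 0;

    HealthAwareBalancer(List<String> instances, Predicate<String> healthCheck) {
        this.instances = List.copyOf(instances);
        this.healthCheck = healthCheck;
    }

    // Meant to be called periodically (e.g. by a scheduler): refresh health from the pings.
    void runHealthChecks() {
        for (String instance : instances) {
            if (healthCheck.test(instance)) {
                healthy.add(instance);    // recovered or still healthy: eligible for traffic
            } else {
                healthy.remove(instance); // failed ping: stop routing to it
            }
        }
    }

    // Picks the next healthy instance in round-robin order; empty if none are healthy.
    synchronized Optional<String> route() {
        for (int i = 0; i < instances.size(); i++) {
            String candidate = instances.get(next);
            next = (next + 1) % instances.size();
            if (healthy.contains(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

The routing decision depends only on the external outputs (ping results), which is exactly the observability property described above.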
Auto-scaling helps ensure that the right number of instances is available to handle the application traffic. It can launch or terminate instances based on the traffic.
Consider a scaling policy configured to launch new instances when the load crosses 65%. At 33% utilisation, it does nothing; once utilisation crosses 65%, it scales out, launching the configured new instances.
Again, we all know this. But what is observability here?
It continuously observes utilisation through monitoring and scales out and in according to the configured scaling policy. So, auto-scaling is also enabled with observable code.
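The scaling decision can be sketched as a pure function of observed utilisation. Only the 65% scale-out threshold comes from the example; the scale-in threshold, step size and instance limits in the hypothetical `ScalingPolicy` class are illustrative assumptions.

```java
// Minimal sketch of a threshold-based scaling policy. The 65% scale-out threshold
// comes from the example; every other constant here is a hypothetical value.
final class ScalingPolicy {
    static final double SCALE_OUT_THRESHOLD = 65.0; // percent utilisation
    static final double SCALE_IN_THRESHOLD = 25.0;  // hypothetical
    static final int STEP = 2;                      // hypothetical: instances added/removed per action
    static final int MIN_INSTANCES = 2;             // hypothetical
    static final int MAX_INSTANCES = 10;            // hypothetical

    private ScalingPolicy() {}

    // Returns the desired instance count given the observed average utilisation.
    static int desiredInstances(int current, double utilisationPercent) {
        if (utilisationPercent > SCALE_OUT_THRESHOLD) {
            return Math.min(MAX_INSTANCES, current + STEP); // scale out
        }
        if (utilisationPercent < SCALE_IN_THRESHOLD) {
            return Math.max(MIN_INSTANCES, current - STEP); // scale in
        }
        return current; // e.g. at 33%: do nothing
    }
}
```

Keeping the decision separate from the monitoring loop makes the policy easy to test: the observed metric goes in, the desired capacity comes out.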
Observability might mean different things to different people. So, is observability the new monitoring? As with any IT trend, it is easy to misperceive, because many people draw conclusions without much analysis. For some, it is the old wine of monitoring in a new bottle.
But observability is not APM (Application Performance Monitoring).
So, what is observability? Logs, metrics, and traces are often called the three pillars of observability. There are many powerful tools for them in the open-source and commercial markets, such as ELK, Prometheus, and Zipkin.
Simply having these tools configured does not make our application observable. They generate a myriad of events and logs. What needs to be observed so that our application is resilient?
In short, observability is not a panacea; it is the ability to use the data inferred from these tools.
So, how do we use the inferred data?
Resilience4j is a lightweight fault tolerance library inspired by Netflix Hystrix. I like its modular structure, which lets me pull in specific modules for specific capabilities such as circuit breaking, rate limiting, retry, and bulkhead. I used it to build the observable microservices in our organisation.
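To show how a circuit breaker uses inferred data, here is a hand-rolled, stdlib-only sketch of the pattern. It is not the Resilience4j API; the `SimpleCircuitBreaker` class, its parameters and the count-based window are illustrative assumptions. The breaker observes the outcomes of recent calls and stops calling a dependency whose failure rate crosses a threshold.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hand-rolled circuit breaker sketch; this is NOT the Resilience4j API.
// It observes the outcomes of the last `windowSize` calls and opens when the
// failure rate reaches `failureRateThreshold`. After `openDuration` it lets one
// trial call through (HALF_OPEN): success closes the circuit, failure reopens it.
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int windowSize;
    private final double failureRateThreshold; // e.g. 0.5 = half of recent calls failed
    private final Duration openDuration;
    private final Supplier<Instant> clock;      // injectable time source, e.g. Instant::now
    private final Deque<Boolean> outcomes = new ArrayDeque<>(); // true = failed call
    private State state = State.CLOSED;
    private Instant openedAt;

    SimpleCircuitBreaker(int windowSize, double failureRateThreshold,
                         Duration openDuration, Supplier<Instant> clock) {
        this.windowSize = windowSize;
        this.failureRateThreshold = failureRateThreshold;
        this.openDuration = openDuration;
        this.clock = clock;
    }

    synchronized State state() {
        return state;
    }

    // Synchronized for simplicity, which serializes calls; a production breaker would avoid that.
    synchronized <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (Duration.between(openedAt, clock.get()).compareTo(openDuration) < 0) {
                return fallback.get(); // fail fast: do not hit the failing dependency
            }
            state = State.HALF_OPEN; // wait is over: allow a trial call
        }
        try {
            T result = action.get();
            recordOutcome(false);
            return result;
        } catch (RuntimeException e) {
            recordOutcome(true);
            return fallback.get();
        }
    }

    private void recordOutcome(boolean failed) {
        if (state == State.HALF_OPEN) {
            if (failed) open(); else close();
            return;
        }
        outcomes.addLast(failed);
        if (outcomes.size() > windowSize) {
            outcomes.removeFirst(); // keep only the most recent windowSize outcomes
        }
        if (outcomes.size() == windowSize) {
            long failures = outcomes.stream().filter(f -> f).count();
            if ((double) failures / windowSize >= failureRateThreshold) {
                open();
            }
        }
    }

    private void open() {
        state = State.OPEN;
        openedAt = clock.get();
        outcomes.clear();
    }

    private void close() {
        state = State.CLOSED;
        outcomes.clear();
    }
}
```

In a real service you would normally use Resilience4j's CircuitBreaker module rather than this sketch; the point is that every state change is driven by observed call outcomes, not by knowledge of the dependency's internals.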
The goal of observable microservices is not to collect logs, traces, and metrics. It is to build an engineering culture based on facts and feedback.
Observability is about being data-driven, especially during debugging; in turn, it gives the SRE and developer teams simpler monitoring.
Previously published at https://www.linkedin.com/pulse/microserviceaddobservability-kayalvizhi-kandasamy