Logs, metrics, and traces are the three pillars of observability. Distributed tracing, in particular, has seen a lot of innovation in recent months, with the standardization effort of OpenTelemetry and with the open source Jaeger project graduating from CNCF incubation.
According to the recent DevOps Pulse report, Jaeger is used by over 30% of those practicing distributed tracing. Many companies realize the need for distributed tracing to gain better observability into their systems and troubleshoot performance issues, especially when dealing with elaborate microservices architectures.
Getting started with Jaeger involves two parts: the first is instrumenting your code to send traces to Jaeger; the second is setting up the Jaeger backend to collect, process and visualize your traces. In this post I’ll go over what it takes to deploy and manage the Jaeger backend in production. I’ll cover the Jaeger components, the deployment strategies, the tools that can help automate the deployment, and Zipkin compatibility.
When deploying Jaeger Tracing, you’ll need to address the following components: the Jaeger Client libraries instrumenting your application, the Agent, the Collector, the Query service with its UI, and optionally the Ingester when spans flow through Kafka.
There is obviously much more detail to each of these components, as well as other optional Jaeger components, but I’ll keep it simple for the sake of this discussion. Let’s see how to deploy the agent, collector and query components in various setups and strategies.
Depending on your deployment strategy (see below), Jaeger may make use of other (non-Jaeger) components, primarily a persistent backend storage (Elasticsearch, Cassandra or others) and a streaming ingestion queue (Kafka). These services are typically deployed independently and you’ll just need to point Jaeger to the relevant endpoints, though you can also have Jaeger self-provision them.
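To illustrate pointing Jaeger at an existing backend, here is a minimal sketch of the relevant part of a Collector spec. SPAN_STORAGE_TYPE and ES_SERVER_URLS are Jaeger’s standard configuration variables, but the Elasticsearch URL and the image tag are assumptions about your environment:

```yaml
# Excerpt from a jaeger-collector container spec: selecting the span
# storage backend and pointing it at an independently deployed cluster.
containers:
  - name: jaeger-collector
    image: jaegertracing/jaeger-collector:1.17
    env:
      - name: SPAN_STORAGE_TYPE
        value: elasticsearch              # cassandra, kafka, etc. also supported
      - name: ES_SERVER_URLS
        value: http://elasticsearch:9200  # assumed endpoint of your ES cluster
```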
You may want to deploy Jaeger on many different systems, ranging from your own laptop for development purposes to large-scale, high-load production environments. Here are some useful deployment strategies you can use: the all-in-one setup for development and experimentation, and a full production deployment with separately deployed and scaled components.
The all-in-one setup is easy to start with, and comes with an executable bundle to launch. If you want to start experimenting with it, check out this tutorial on how to run it in conjunction with an Elasticsearch backend, as well as Kibana for extra visualization.
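As an illustration (separate from that tutorial), here is a minimal sketch of running the all-in-one image on Kubernetes; the deployment name and image tag are assumptions, and the default in-memory storage means spans are lost on pod restart, so this is for experimentation only:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jaeger-all-in-one
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jaeger
  template:
    metadata:
      labels:
        app: jaeger
    spec:
      containers:
        - name: jaeger
          image: jaegertracing/all-in-one:1.17
          ports:
            - containerPort: 16686    # query UI
            - containerPort: 6831     # agent endpoint (Thrift over UDP)
              protocol: UDP
            - containerPort: 14268    # collector HTTP endpoint
```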
For the rest of this post, I’ll focus on the deployment for production and the considerations and options involved in this process.
The agent needs to reside alongside every instance of your application. If you run an elaborate microservices architecture that requires multiple agents, you may find yourself wondering whether you can avoid the agent altogether.
The short answer to that would be: don’t go agentless.
The longer answer is that technically you can make your Jaeger client libraries send the span data directly to the Collector, but you would need to handle various aspects yourself such as the lookup of the Collector, traffic control and tagging the spans with additional metadata based on local system information.
While using the Jaeger Agent is the recommended deployment, there are scenarios in which you cannot deploy an agent, for example if your application runs as an AWS Lambda function or on a similar serverless framework where you cannot control pod deployment and Agent co-location. The Agent will also be ineffective if you use Zipkin instrumentation. In such cases, the spans should be submitted directly to the Jaeger Collector.
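For illustration, here is a hedged sketch of agentless mode using the standard JAEGER_ENDPOINT environment variable that the Jaeger client libraries read; the application image and the jaeger-collector Service name are assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: my-app
      image: example/my-app:latest    # hypothetical application image
      env:
        # Jaeger clients pick up JAEGER_ENDPOINT and submit spans
        # over HTTP directly to the Collector, bypassing the agent.
        - name: JAEGER_ENDPOINT
          value: http://jaeger-collector:14268/api/traces
```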
The agent needs to reside together with your application, so the Jaeger client libraries can access it on localhost and send it the data over UDP without the risk of data loss due to network hiccups (unlike TCP, UDP doesn’t include delivery guarantees, but it is therefore faster and more economical, and safe enough over a local link).
In Kubernetes environments, you can achieve this co-location either as a sidecar or as a DaemonSet. Let’s look at the options:
Jaeger Agent as a DaemonSet
Installing the agent as a DaemonSet is the simplest and most economical option, providing one Agent instance per node that serves all the pods on that node.
This strategy may, however, prove too simple for production environments that involve multi-tenancy, security segregation requirements or multiple instances of Jaeger for different applications. If this is your case, consider deploying as a sidecar (below).
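To make this concrete, here is a minimal DaemonSet sketch; the image tag and the jaeger-collector Service name are assumptions. The agent’s UDP port is exposed on each node’s IP via hostPort:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: jaeger-agent
spec:
  selector:
    matchLabels:
      app: jaeger-agent
  template:
    metadata:
      labels:
        app: jaeger-agent
    spec:
      containers:
        - name: jaeger-agent
          image: jaegertracing/jaeger-agent:1.17
          args:
            # "jaeger-collector" is an assumed Service name for your Collector.
            - --reporter.grpc.host-port=jaeger-collector:14250
          ports:
            - containerPort: 6831    # Thrift-compact spans over UDP
              protocol: UDP
              hostPort: 6831         # expose the agent on the node's IP
```

Application pods can then reach the agent by setting JAEGER_AGENT_HOST to the node’s IP through the Downward API (a fieldRef to status.hostIP).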
Jaeger Agent as a Sidecar
The sidecar option means the agent will run as an additional container with each pod. This setup can support a multi-tenant environment, where each tenant has its respective Jaeger Collector, and each agent can be configured to ship to its relevant Collector.
You also get more control over memory allocation, which can prevent resource cannibalization by specific tenants. Security configuration is simpler as well when agent and application run in the same pod. The sidecar approach naturally comes with the overhead of the additional containers. Some installation tools can auto-inject the agent sidecar and simplify management.
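As a sketch, the sidecar is simply an additional agent container in the application pod; the image tags and the per-tenant Collector address are assumptions. (The Jaeger Operator, for example, can auto-inject an equivalent container into deployments annotated with sidecar.jaegertracing.io/inject: "true".)

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: my-app
      image: example/my-app:latest    # hypothetical application image
      # Client libraries use their default of localhost:6831/UDP,
      # which now reaches the sidecar agent below.
    - name: jaeger-agent
      image: jaegertracing/jaeger-agent:1.17
      args:
        # Assumed Service name of this tenant's Collector.
        - --reporter.grpc.host-port=jaeger-collector-tenant-a:14250
```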
Now that we know which components we should deploy, as well as the strategy, let’s see which tools can help us put this plan into action: plain Kubernetes manifest templates, the Jaeger Helm chart, and the Jaeger Operator.
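With the Jaeger Operator, for instance, an entire production deployment can be described declaratively through a Jaeger custom resource. This is a minimal sketch; the resource name and the Elasticsearch URL are assumptions:

```yaml
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger-prod
spec:
  strategy: production   # separately deployed, individually scalable components
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: http://elasticsearch:9200   # assumed ES endpoint
```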
Until now we’ve talked about ingesting spans in Jaeger’s own formats. But there are quite a few systems with Zipkin instrumentation out there, so it’s worth noting that Jaeger can accept spans in Zipkin formats as well, namely Thrift, JSON v1/v2 and Protobuf.
If your Jaeger backend deployment is meant to ingest Zipkin protocols, you’ll need to explicitly enable the Zipkin-compatible HTTP endpoint on the Collector, which is off by default.
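A hedged sketch of enabling that endpoint (recent versions use the --collector.zipkin.host-port flag, older ones --collector.zipkin.http-port; the container layout is assumed):

```yaml
# Excerpt from a jaeger-collector container spec: opening the
# Zipkin-compatible HTTP endpoint on the conventional port 9411.
containers:
  - name: jaeger-collector
    image: jaegertracing/jaeger-collector:1.17
    args:
      - --collector.zipkin.host-port=:9411
    ports:
      - containerPort: 9411    # accepts Zipkin Thrift/JSON/Protobuf spans
```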
Jaeger is a fairly young project, born in the Kubernetes sphere, with a strong community providing Kubernetes deployment best practices and automation. However, as a young project, the best practices for managing it in production are still shaping up, and it takes some careful consideration to run it in production in a way that suits your organization, while keeping up with community updates.
We at Logz.io offer distributed tracing as a service based on Jaeger, as we do with log management based on the open source ELK Stack and with metrics based on open source Grafana, so you can adopt the leading open source observability projects without having to operate them yourself.
Previously published at https://thenewstack.io/best-practices-for-deploying-jaeger-on-kubernetes-in-production/