Running systems in production involves requirements for high availability, resilience and recovery from failure. When running cloud-native applications this becomes even more critical, as the base assumption in such environments is that compute nodes will suffer outages, Kubernetes nodes will go down and microservice instances are likely to fail, yet the service is expected to remain up and running.

In a recent post, I presented the different Jaeger components and best practices for deploying Jaeger in production. In that post, I mentioned that Jaeger uses external services for ingesting and persisting the span data, such as Elasticsearch, Cassandra and Kafka. This is because the Jaeger Collector is a stateless service, so you need to point it to some sort of storage to which it will forward the span data.

In this post, I’d like to discuss how to ingest and persist Jaeger trace data in production to ensure resilience and high availability, and the external services you need to set up for that. I’ll cover:

Standard persistent storage for Jaeger with Elasticsearch and Cassandra
Alternative persistent storage with gRPC plugin
Handling high load tracing data streams with Kafka
Jaeger persistence during development with jaegertracing all-in-one

Deploying Jaeger with Elasticsearch, Kafka or other External Services

Jaeger deployments may involve additional services such as Elasticsearch, Cassandra and Kafka. But do these services come as part of Jaeger’s installation, and how are they deployed?

The Jaeger Operator and Jaeger’s Helm chart (see Jaeger’s deployment tools in the previous post) offer the option of a self-provisioned Elasticsearch/Cassandra/Kafka cluster (in which the Jaeger deployment also deploys these clusters), as well as the option of connecting to an existing cluster.

The self-provisioned option offers a good starting point, but you may prefer to deploy these services independently for better flexibility and control over the way these clusters are deployed, managed, monitored, upgraded and secured, in accordance with your team’s DevOps practices. In particular, if you are already running a Kafka or Elasticsearch cluster, it may make more sense to reuse these infrastructure components rather than maintain a separate cluster.

Elasticsearch vs. Cassandra as Jaeger Backend Storage

For production deployments, Jaeger currently provides built-in support for two storage solutions, both of which are very popular open-source NoSQL databases: Elasticsearch and Cassandra.

The Jaeger collector and query service need to be configured with the storage solution of choice so they can write to it and query it. You can pass the desired storage type and the database endpoint via environment variables. For example, a basic Elasticsearch setup will define the following environment variables:

SPAN_STORAGE_TYPE=elasticsearch
ES_SERVER_URLS=<...>

Caption: Illustration of direct-to-storage architecture. Source: jaegertracing.io

So which storage backend should you use: Elasticsearch or Cassandra? The Jaeger team provides a clear recommendation to use Elasticsearch as the storage backend over Cassandra. And they have very good reasons:

Cassandra is a key-value database, so it is more efficient for retrieving traces by trace ID, but it does not provide the same powerful search capabilities as Elasticsearch. Effectively, the Jaeger backend implements the search functionality on the client-side, on top of k-v storage, which is limited and may produce inconsistent results (see issue-166 for more details). Elasticsearch does not suffer from these issues, resulting in better usability. Elasticsearch can also be queried directly, e.g. from Kibana dashboards, and provide useful analytics and aggregations.

Based on past performance experiments we observed single writes to be much faster in Cassandra than Elasticsearch, which might suggest that it may sustain higher write throughput. However, because the Jaeger backend needs to implement search capability on top of k-v storage, writing spans to Cassandra is actually subject to large write amplification: in addition to writing a record for the span itself, Jaeger performs extra writes for service name and operation name indexing, as well as extra index writes for every tag. In contrast, saving a span to Elasticsearch is a single write, and all indexing takes place inside the ES node. As a result, the overall throughput to Cassandra is comparable with Elasticsearch.

One benefit of the Cassandra backend is simplified maintenance due to its native support for data TTL. In Elasticsearch the data expiration is managed through index rotation, which requires additional setup (see Elasticsearch Rollover).

Alternative Persistent Storage for Jaeger

In addition to Jaeger’s built-in support for Elasticsearch and Cassandra, Jaeger supports a gRPC plugin (SPAN_STORAGE_TYPE=grpc-plugin) which enables developing custom plugins for other storage types. The Jaeger community currently offers integrations with about a dozen persistent storage types, four of which are defined as ‘available’ at present: ScyllaDB, InfluxDB, Couchbase and Logz.io (disclaimer: I work at Logz.io).

Other integrations, which are not yet available, include NoSQL data stores from the big cloud vendors such as Amazon DynamoDB, Azure CosmosDB and Google BigTable, as well as the popular SQL databases MySQL and PostgreSQL. You can check out the list of additional storage backends and their updated status on this Jaeger GitHub issue.
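
To make the write-amplification point from the storage comparison concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a deliberately simplified model taken from the description above: one write for the span record, one each for service name and operation name indexing, and one index write per tag. The function names and the raw throughput figures are hypothetical, chosen only for illustration; they are not measurements, and the real write pattern may differ between Jaeger versions.

```python
def cassandra_writes_per_span(num_tags: int) -> int:
    """Estimate Cassandra writes for one span under the simplified model."""
    span_record = 1              # the span itself
    service_name_index = 1       # extra write for service name indexing
    operation_name_index = 1     # extra write for operation name indexing
    tag_index_writes = num_tags  # one extra index write per tag
    return (span_record + service_name_index
            + operation_name_index + tag_index_writes)


def elasticsearch_writes_per_span(num_tags: int) -> int:
    """A span is a single write; indexing happens inside the ES node."""
    return 1


def spans_per_second(raw_writes_per_second: float, writes_per_span: int) -> float:
    """Convert a raw single-write rate into an effective span rate."""
    return raw_writes_per_second / writes_per_span


if __name__ == "__main__":
    tags = 10
    # Hypothetical raw single-write rates, for illustration only.
    cassandra_raw = 50_000
    elasticsearch_raw = 5_000

    c_writes = cassandra_writes_per_span(tags)
    e_writes = elasticsearch_writes_per_span(tags)
    print(f"Cassandra writes per span: {c_writes}")
    print(f"Elasticsearch writes per span: {e_writes}")
    print(f"Cassandra spans/s: {spans_per_second(cassandra_raw, c_writes):.0f}")
    print(f"Elasticsearch spans/s: {spans_per_second(elasticsearch_raw, e_writes):.0f}")
```

Under these assumed numbers, a span with 10 tags costs 13 Cassandra writes versus a single Elasticsearch write, so a tenfold advantage in raw single-write speed shrinks to roughly the same effective span rate.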

Using Kafka to Ingest High-Load Jaeger Span Data

If you monitor many microservices, if you have a high volume of span data, or if your system occasionally generates bursts of data, then your external backend storage may not be able to handle the load and may become a bottleneck, impacting the overall performance. In such cases you should employ the streaming deployment strategy that I mentioned in the previous post, which places Kafka between the Collector and the storage to buffer the span data coming from the Jaeger Collector.

Caption: Illustration of architecture with Kafka as intermediate buffer. Source: jaegertracing.io

In this case, you configure Kafka as the target for the Jaeger Collector (SPAN_STORAGE_TYPE=kafka), as well as the relevant Kafka brokers, topic and other parameters.

I’d like to stress that Kafka is not an alternative backend storage (although the SPAN_STORAGE_TYPE=kafka setting may be confusing). Your Jaeger backend still needs a backend storage as described in the previous sections, with Kafka serving as a buffer to take off the pressure.

To support the streaming deployment, the Jaeger project also offers the Jaeger Ingester service, which asynchronously reads from the Kafka topic and writes to the storage backend (Elasticsearch or Cassandra). Of course, you can choose to implement your own service to do the same, if you need a particular target storage or ingestion strategy.

Jaeger Persistence During Development with JaegerTracing All-in-One

Until now I’ve discussed production deployment. However, if you are exploring Jaeger or are doing a small PoC or development, then you are probably using Jaeger’s All-in-One installation, and you may be wondering how this is applicable to you.

All-in-one is a single-node installation, in which you don’t trouble yourself with non-functional requirements such as resilience or scalability. In an all-in-one deployment, Jaeger uses in-memory persistence by default.
Alternatively, you can choose to use Badger, which provides ephemeral storage based on a temporary filesystem. You can find more details on using Badger here.

Bear in mind that both in-memory and Badger are meant for all-in-one deployments only, and are not suitable for production deployments.

Endnote

When deploying Jaeger in production, you need to address data persistence, high availability and scalability concerns. In order to address these concerns, you need to deploy additional services.

First of all, you should deploy and configure an external persistent storage for your span data. The recommended persistent storage for Jaeger in production is Elasticsearch.

Secondly, when dealing with a high load of span data, you should deploy Kafka in front of the storage to handle the ingestion and provide backpressure.

Running in production entails many other considerations not covered in this post, such as upgrading the Jaeger components as well as Elasticsearch, Kafka or any additional service in the deployment, monitoring the different services, and securing access to these services.

There’s another option: using Jaeger as a managed service so that you can leverage the best open source for distributed tracing without having to deal with its deployment and maintenance overhead. We at Logz.io did that with Log Analytics, taking the ELK Stack and offering it as a fully managed service, and then with open source Grafana for infrastructure monitoring. Now we offer that with Jaeger, which comes with alerting and logs-traces correlation for full observability. Join our Beta program and try it out.

Previously published at https://logz.io/blog/jaeger-persistence/
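
As a recap of the streaming setup this post recommends (Collector writing to Kafka, Ingester reading from Kafka and writing to Elasticsearch, Query reading from Elasticsearch), here is a minimal Python sketch that assembles the environment variables for each component. Only SPAN_STORAGE_TYPE and ES_SERVER_URLS appear in the post itself. The Kafka variable names are derived on the assumption that Jaeger maps a CLI flag to an environment variable by upper-casing it and replacing dots and dashes with underscores, and that the flags kafka.producer.brokers, kafka.producer.topic, kafka.consumer.brokers and kafka.consumer.topic exist in your Jaeger version, so check them against the CLI documentation for your release. The host names, topic name and function names are hypothetical.

```python
def flag_to_env(flag: str) -> str:
    """Convert a CLI flag name (without leading dashes) to env var form.

    Assumption: upper-case the name and replace '.' and '-' with '_'.
    """
    return flag.replace(".", "_").replace("-", "_").upper()


def streaming_deployment_env(kafka_brokers: str, kafka_topic: str,
                             es_urls: str) -> dict:
    """Build per-component environment variables for a streaming deployment."""
    # Storage settings shared by the components that talk to Elasticsearch.
    storage = {
        "SPAN_STORAGE_TYPE": "elasticsearch",
        flag_to_env("es.server-urls"): es_urls,
    }
    return {
        # The Collector forwards spans to Kafka instead of the storage.
        "collector": {
            "SPAN_STORAGE_TYPE": "kafka",
            flag_to_env("kafka.producer.brokers"): kafka_brokers,
            flag_to_env("kafka.producer.topic"): kafka_topic,
        },
        # The Ingester reads from Kafka and writes to the real storage.
        "ingester": {
            **storage,
            flag_to_env("kafka.consumer.brokers"): kafka_brokers,
            flag_to_env("kafka.consumer.topic"): kafka_topic,
        },
        # The Query service only reads from the storage backend.
        "query": dict(storage),
    }


if __name__ == "__main__":
    env = streaming_deployment_env(
        kafka_brokers="kafka.example.internal:9092",           # hypothetical
        kafka_topic="jaeger-spans",                            # hypothetical
        es_urls="http://elasticsearch.example.internal:9200",  # hypothetical
    )
    for component, variables in env.items():
        print(f"# {component}")
        for name, value in variables.items():
            print(f"{name}={value}")
```

Note that only the Collector uses SPAN_STORAGE_TYPE=kafka. The Ingester and Query service still point at the actual storage backend, which reflects the point that Kafka is a buffer rather than an alternative storage.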


Jaeger Persistent Storage with Elasticsearch, Cassandra and Kafka
