If you use both Consul and Datadog, you’ll want this.
I’ve written a small program that maintains a list of Consul service counts in your datacenter(s) called consul2dogstats. It periodically polls the Consul service catalog and health endpoints to collect an inventory, then publishes the results to Datadog. It’s Apache 2.0-licensed.
One of Datadog’s most powerful features is dimensional tagging. Briefly, dimensional tags are key-value pairs you can attach to time-series metrics for the purpose of querying and aggregation.
Suppose you wanted to track the request rates of the web servers in your organization. Let’s call the name of the metric
http_requests. Some useful tags we could apply would be:
In Datadog, you would assign a dimension key to each of those. In the examples above, those might be
Once you start tagging metrics, aggregating and filtering them is a piece of cake. You can perform powerful queries to answer questions in real time such as:
5xxresponses per minute am I serving across all of production?
2xxresponses am I sending per minute in my production environment for the
This is a tremendous leap in functionality and flexibility over older solutions that have no dimension other than the server host (which itself is a decreasingly useful dimension in cloud environments, since hosts are often ephemeral now and their canonical names meaningless).
Consider a traditional dot-separated form of metric names such as
requests.test.gatekeeper.4xx. These pose some significant challenges for analysts.
First, selecting the metrics to query is an O(n) operation because the metric name is opaque (the system has no semantic awareness of the naming structure you choose). If you’re collecting a lot of metrics (and you should be!), that’s a very expensive operation.
Second, adding new dimensions is a management nightmare, because you have to create a new metric if you want to add a dimension. Suppose we wanted to add a new dimension like
shard to our metrics above. In a system that supports dimension tagging out of the box, that’s no problem: any previous graphs will continue to render as before. But if you’re using flat metrics, you’ll start creating new metrics like
requests.tests.gatekeeper.shard1.4xx. The history of the previous metric data will end once you replace the old metrics with new ones, which creates a discontinuity that will make it more challenging to perform temporal analysis. (You can, of course, continue to submit metrics under the old name in parallel with the new, more granular metrics. But you’ll be double-counting, which could cause analytical errors, and submitting more metrics to the system than necessary, which could cause capacity issues.)
The good news is that even if you prefer managing metrics yourself over using a SaaS provider such as Datadog, there are now open-source metrics systems that support dimensional tagging. There’s no need to suffer with flat metrics anymore.
Typically, operational metrics systems are intended to show the real-time state and health of your systems and applications. In order to maintain good performance, it’s important not to make metric dimensions too fine-grained. A good rule of thumb is to ensure that the possible values for a particular key are fewer than the number of servers you have. So tagging metrics by environment, hostname, or function is good. But tagging them by customer or user ID is a very bad idea; it will create so many tables on the backend that performance will suffer terribly.
Put bluntly: Don’t use your operational metrics system as your analytics system. There are lots of great analytics databases and providers out there, including Vertica, Redshift, Clickhouse, and BigQuery.
Datadog was the first platform, commercial or otherwise, that offered a compelling product that supported dimensional metric tagging. But there are lots more options out there now. Check them out and discover the one that’s best for you: