“Dashboard anything. Observe everything.”
Recently I started working on a project heavily focused on observability and monitoring, in which the Prometheus configuration and all the Grafana dashboards and alerts had been implemented by someone else.
Basically, I was working blindfolded with a black box, both in terms of application know-how and from a monitoring perspective.
Some important concepts before going further:
Prometheus stores data using a time-series data model: streams of timestamped values belonging to the same metric and the same set of labels.
Prometheus scrapes endpoints, also known as instances, and a collection of instances forms a job.
Every time-series is uniquely identified by its metric name and optional key-value pairs called labels.
#<metric name>{<label name>=<label value>, ...}
kube_node_labels{cluster="aws-01", label_kubernetes_io_role="master"}
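To make the identity rule concrete, here is a minimal Python sketch; the helper names are hypothetical and not part of any Prometheus library. It relies on one documented detail: the metric name is itself stored as a special label called __name__, so a series is identified by its full set of label pairs, and the order in which labels are written does not matter.

```python
# Hypothetical helpers illustrating series identity; not a Prometheus API.

def series_id(metric_name: str, labels: dict) -> frozenset:
    """Identity of a series: its label pairs plus the metric name stored as __name__."""
    return frozenset({"__name__": metric_name, **labels}.items())


def format_series(metric_name: str, labels: dict) -> str:
    """Render <metric name>{<label name>="<label value>", ...} with labels sorted by name."""
    body = ", ".join(f'{name}="{value}"' for name, value in sorted(labels.items()))
    return f"{metric_name}{{{body}}}"


master = {"cluster": "aws-01", "label_kubernetes_io_role": "master"}
same_labels_reordered = {"label_kubernetes_io_role": "master", "cluster": "aws-01"}
worker = {"cluster": "aws-01", "label_kubernetes_io_role": "node"}  # hypothetical value

# Same name and same label pairs, written in any order -> same series.
print(series_id("kube_node_labels", master) == series_id("kube_node_labels", same_labels_reordered))  # True
# A single different label value -> a different series.
print(series_id("kube_node_labels", master) == series_id("kube_node_labels", worker))  # False
print(format_series("kube_node_labels", same_labels_reordered))
```

This is also why adding a label with many distinct values multiplies the number of series Prometheus has to store.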
Grafana supports many different storage backends for your time-series data (called data sources). We will focus on Prometheus.
Get all labels
As I said, we know close to nothing about our metrics, so some probing (of metrics and labels) is required. For this we can create a new throwaway dashboard and tinker with a variable of type Query.
So let’s say we want to get all the labels… kind of greedy.
We have the following options:
label_names() Returns a list of label names.
label_values(label) Returns a list of label values for the label in every metric.
label_values(metric, label) Returns a list of label values for the label in the specified metric.
metrics(metric) Returns a list of metrics matching the specified metric regex.
query_result(query) Returns a list of Prometheus query results for the query.
Keep in mind that these label functions don’t support queries and aren’t part of Prometheus’s functions; they are part of Grafana templating (that’s why we created a new dashboard).
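The difference between these functions is easier to see on toy data. The Python sketch below emulates what label_names(), label_values(label) and label_values(metric, label) return over a hypothetical in-memory list of series. It only approximates their results; it is not Grafana’s implementation, which asks the Prometheus data source for this information.

```python
from typing import Dict, List, Optional

# Hypothetical sample data: each series is its label set, with the metric name under __name__.
SERIES: List[Dict[str, str]] = [
    {"__name__": "up", "job": "node", "instance": "10.0.0.1:9100"},
    {"__name__": "up", "job": "kubelet", "instance": "10.0.0.2:10250"},
    {"__name__": "kube_node_labels", "cluster": "aws-01", "label_kubernetes_io_role": "master"},
]


def label_names(series: List[Dict[str, str]]) -> List[str]:
    """Roughly label_names(): every label name present in any series."""
    return sorted({name for labels in series for name in labels})


def label_values(series: List[Dict[str, str]], label: str, metric: Optional[str] = None) -> List[str]:
    """Roughly label_values(label), or label_values(metric, label) when metric is given."""
    return sorted({
        labels[label]
        for labels in series
        if label in labels and (metric is None or labels.get("__name__") == metric)
    })


print(label_names(SERIES))  # ['__name__', 'cluster', 'instance', 'job', 'label_kubernetes_io_role']
print(label_values(SERIES, "job"))  # ['kubelet', 'node']
print(label_values(SERIES, "cluster", metric="kube_node_labels"))  # ['aws-01']
print(label_values(SERIES, "cluster", metric="up"))  # []
```

The metric argument acts purely as a filter: a label that exists elsewhere in the data still yields nothing if the chosen metric doesn’t carry it.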
We can also get all the label names using the Prometheus API endpoint /api/v1/labels.
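Outside Grafana, the same endpoint can be called from a script. Below is a minimal Python sketch using only the standard library. It assumes Prometheus listens on http://127.0.0.1:9090 (Prometheus’s default port; adjust it to your setup), and the helper names are hypothetical. Prometheus API responses wrap their payload in a JSON envelope with a status field and a data field, which the parser unwraps.

```python
import json
import urllib.request

# Assumption: address of the Prometheus server (9090 is Prometheus's default port).
PROMETHEUS_URL = "http://127.0.0.1:9090"


def parse_api_response(body: bytes) -> list:
    """Unwrap the Prometheus API envelope {"status": ..., "data": ...}."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus API error: {payload.get('error')}")
    return payload["data"]


def fetch_label_names(base_url: str = PROMETHEUS_URL) -> list:
    """GET <base_url>/api/v1/labels and return the list of label names."""
    with urllib.request.urlopen(f"{base_url}/api/v1/labels", timeout=10) as resp:
        return parse_api_response(resp.read())


# Offline check of the parser with a response shaped like the real one.
sample = b'{"status": "success", "data": ["__name__", "instance", "job"]}'
print(parse_api_response(sample))  # ['__name__', 'instance', 'job']
```

Against a live server, fetch_label_names() returns the same kind of list for everything Prometheus has stored.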
Extract all the values for a particular label
Let’s assume that we want all the values of the dockerVersion label. We can get them using the Grafana label_values function or the Prometheus API endpoint /api/v1/label/<label_name>/values.
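The label values endpoint takes the label name as a path segment. A small Python sketch that builds the URL for dockerVersion and fetches its values; the helper names are hypothetical and the Prometheus address (http://127.0.0.1:9090) is an assumption.

```python
import json
import urllib.parse
import urllib.request


def label_values_url(base_url: str, label: str) -> str:
    """Build <base_url>/api/v1/label/<label_name>/values for a single label."""
    # Label names are plain identifiers; quoting is only a safeguard for the path segment.
    return f"{base_url.rstrip('/')}/api/v1/label/{urllib.parse.quote(label, safe='')}/values"


def fetch_label_values(base_url: str, label: str) -> list:
    """Return every value Prometheus reports for the given label name."""
    with urllib.request.urlopen(label_values_url(base_url, label), timeout=10) as resp:
        payload = json.load(resp)
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus API error: {payload.get('error')}")
    return payload["data"]


print(label_values_url("http://127.0.0.1:9090", "dockerVersion"))
# http://127.0.0.1:9090/api/v1/label/dockerVersion/values
```

Against a live server, fetch_label_values("http://127.0.0.1:9090", "dockerVersion") would return the distinct Docker versions your exporters report, provided that label exists in your setup.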
Get all the jobs
The Prometheus API exposes the endpoint /api/v1/label/job/values, which returns all the jobs scraped by that particular Prometheus instance.
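Since job is just another label, this is the same label values endpoint. In reasonably recent Prometheus releases it also accepts an optional, repeatable match[] series selector that restricts the answer to matching series. This helps when probing an unknown setup, for example to list only the jobs that actually expose kube_node_labels. A Python sketch of the URL construction; the helper name and the address are assumptions.

```python
import urllib.parse
from typing import List, Optional


def job_values_url(base_url: str, match: Optional[List[str]] = None) -> str:
    """URL for /api/v1/label/job/values, optionally narrowed by match[] selectors."""
    url = f"{base_url.rstrip('/')}/api/v1/label/job/values"
    if match:
        # match[] can be repeated; a list of pairs keeps every selector in the query string.
        url += "?" + urllib.parse.urlencode([("match[]", selector) for selector in match])
    return url


print(job_values_url("http://127.0.0.1:9090"))
# http://127.0.0.1:9090/api/v1/label/job/values
print(job_values_url("http://127.0.0.1:9090", match=["kube_node_labels"]))
# http://127.0.0.1:9090/api/v1/label/job/values?match%5B%5D=kube_node_labels
```

The resulting URL can be passed straight to curl -s.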
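Quick check whether the instances are healthy/reachable: automatically generated labels and time series
When Prometheus scrapes a target, it automatically attaches some labels to the scraped time series that identify the scraped target (job and instance). It also generates an up time series per target, whose value is 1 if the instance is healthy (reachable) and 0 if the scrape failed.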
# up time series is useful for instance availability monitoring
up{job="<job-name>", instance="<instance-id>"}
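Since a failed scrape sets up to 0, listing unreachable instances only means filtering on that value. A Python sketch over the instant-query API (/api/v1/query); the helper names, the address and the sample response are hypothetical, while the response shape (data.result entries with metric and value fields) follows the instant-vector format Prometheus returns. The PromQL expression up == 0 does the same filtering on the server side.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Tuple


def query_url(base_url: str, promql: str) -> str:
    """Instant-query URL: <base_url>/api/v1/query?query=<promql>."""
    return f"{base_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def down_targets(payload: dict) -> List[Tuple[str, str]]:
    """(job, instance) pairs whose up value is 0 in an instant-query response."""
    return sorted(
        (series["metric"].get("job", ""), series["metric"].get("instance", ""))
        for series in payload["data"]["result"]
        # Each sample is [<unix timestamp>, "<value as string>"].
        if float(series["value"][1]) == 0
    )


def fetch_down_targets(base_url: str) -> List[Tuple[str, str]]:
    """Run the instant query `up` and return the targets whose last scrape failed."""
    with urllib.request.urlopen(query_url(base_url, "up"), timeout=10) as resp:
        return down_targets(json.load(resp))


# Hypothetical response, shaped like a real instant-vector result.
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "node", "instance": "10.0.0.1:9100"},
             "value": [1700000000.0, "1"]},
            {"metric": {"__name__": "up", "job": "node", "instance": "10.0.0.3:9100"},
             "value": [1700000000.0, "0"]},
        ],
    },
}
print(down_targets(sample))  # [('node', '10.0.0.3:9100')]
```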
Going further, we can also check all the labels for a particular job using the up metric.
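Because up carries the target’s labels (job, instance and any labels attached to the target during service discovery and relabeling), querying it for one job reveals which labels that job’s targets have. A Python sketch; the helper names and the sample response are hypothetical, and the selector builder assumes the job name contains no double quotes or backslashes.

```python
from typing import Dict, List


def up_selector(job: str) -> str:
    """PromQL selector up{job="<job>"}; assumes the job name needs no escaping."""
    return f'up{{job="{job}"}}'


def label_names_in_result(payload: dict) -> List[str]:
    """Union of label names across all series of an instant-query response."""
    return sorted({name for series in payload["data"]["result"] for name in series["metric"]})


def label_sets_in_result(payload: dict) -> List[Dict[str, str]]:
    """Full label set of each series, e.g. to see which values differ per instance."""
    return [series["metric"] for series in payload["data"]["result"]]


# Hypothetical response to the query up{job="node"}.
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "node", "instance": "10.0.0.1:9100", "zone": "eu-1a"},
             "value": [1700000000.0, "1"]},
            {"metric": {"__name__": "up", "job": "node", "instance": "10.0.0.2:9100", "zone": "eu-1b"},
             "value": [1700000000.0, "1"]},
        ],
    },
}
print(up_selector("node"))  # up{job="node"}
print(label_names_in_result(sample))  # ['__name__', 'instance', 'job', 'zone']
```

The selector can be pasted into Grafana’s query editor or sent to /api/v1/query to obtain a real response.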
Grafana and Prometheus are frequently used together in organizations, and they make such a useful monitoring stack because, even when we don’t know all the implementation details of metric collection, they let us shed some light on them using the tips above.
Grafana | Prometheus API
---|---
label_names() | curl -s http://127.0.0.1:9091/api/v1/labels
label_values(<label_name>) | curl -s http://127.0.0.1:9091/api/v1/label/<label_name>/values
label_values(<metric_name>,<label_name>) |
up{job="<job_name>"} |