Observability Tips and Tricks For Using Grafana and Prometheus by @dejanualex


“Dashboard anything. Observe everything.”

Overview

Recently I started working on a project heavily focused on observability and monitoring, in which the Prometheus configuration and all the Grafana dashboards and alerts had been implemented by someone else.

Basically, I was working blindfolded with a black box, both in terms of application know-how and from a monitoring perspective.

Foundation

Some important concepts before going further:

  • Prometheus stores data as time series: streams of timestamped values that belong to the same metric and the same set of labels.

  • Prometheus scrapes endpoints, also known as instances; a collection of instances with the same purpose forms a job.

  • Every time-series is uniquely identified by its metric name and optional key-value pairs called labels.

    #<metric name>{<label name>=<label value>, ...}
    kube_node_labels{cluster="aws-01", label_kubernetes_io_role="master"}
    
  • Grafana supports many different storage backends for your time-series data (data source). We will focus on Prometheus.

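To make the identity rule concrete, here is a minimal Python sketch (not Prometheus code). It treats a series as a metric name plus a set of labels, so two series with the same name but different label values are distinct, and label order does not matter. The helper names `series_key` and `format_selector` and the `worker` role value are made up for illustration, and the sketch does not escape special characters in label values.

```python
def series_key(metric_name, labels=None):
    """Return a hashable identity: metric name plus label pairs, order-independent."""
    return (metric_name, frozenset((labels or {}).items()))


def format_selector(metric_name, labels=None):
    """Render the identity in the <metric name>{<label name>="<label value>", ...} notation."""
    if not labels:
        return metric_name
    pairs = ", ".join(f'{name}="{value}"' for name, value in sorted(labels.items()))
    return f"{metric_name}{{{pairs}}}"


# "master" comes from the example above; "worker" is a hypothetical second value.
master = {"cluster": "aws-01", "label_kubernetes_io_role": "master"}
worker = {"cluster": "aws-01", "label_kubernetes_io_role": "worker"}

print(format_selector("kube_node_labels", master))
# Same metric name, different label value: two different time series.
print(series_key("kube_node_labels", master) == series_key("kube_node_labels", worker))
```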

Start exploring

  • Get all labels

    As I said, we know next to nothing about our metrics, so some probing (in terms of metrics and labels) is required. For this we can create a new throwaway dashboard and tinker with a variable of type Query.

    So let’s say we want to get all the labels…kind of greedy.


    We have the following options:

    Function                      Description
    label_names()                 Returns a list of label names.
    label_values(label)           Returns a list of label values for the label in every metric.
    label_values(metric, label)   Returns a list of label values for the label in the specified metric.
    metrics(metric)               Returns a list of metrics matching the specified metric regex.
    query_result(query)           Returns a list of Prometheus query result for the query.
    

Keep in mind that the label functions don’t accept queries as arguments and aren't Prometheus functions; they are part of Grafana templating (that’s why we created a new dashboard).

We can also get all the labels using the Prometheus API endpoint /api/v1/labels.
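As a rough illustration of the /api/v1/labels call, the following Python sketch fetches the label names with the standard library only. The server address is taken from the curl examples in this article and is an assumption about your setup; `parse_api_data` and `get_label_names` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: address copied from the curl examples in this article; adjust to your server.
PROMETHEUS_URL = "http://127.0.0.1:9091"


def parse_api_data(body):
    """Return the "data" field of a Prometheus API response, or raise on a non-success status."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "unexpected Prometheus response"))
    return payload["data"]


def get_label_names(base_url=PROMETHEUS_URL, timeout=5):
    """Fetch every label name known to the server via /api/v1/labels."""
    with urllib.request.urlopen(f"{base_url}/api/v1/labels", timeout=timeout) as response:
        return parse_api_data(response.read())


# Usage (needs a reachable Prometheus server):
#     for name in get_label_names():
#         print(name)
```

Keeping the parsing separate from the HTTP call makes it easy to try the parser against a saved response before pointing it at a live server.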

  • Extract all the values for a particular label

    Let’s assume that we want all the values of the dockerVersion label. We can get them with the Grafana label_values function or with the Prometheus API endpoint /api/v1/label/<label_name>/values.

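The label values lookup can be sketched in Python as follows. The optional `metric` argument uses the `match[]` parameter to restrict the lookup to one metric, roughly what label_values(metric, label) does in Grafana; this assumes a Prometheus version that supports `match[]` on this endpoint. The server address, the helper names, and the `cadvisor_version_info` metric in the usage comment are assumptions for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: address copied from the curl examples in this article; adjust to your server.
PROMETHEUS_URL = "http://127.0.0.1:9091"


def label_values_url(label, metric=None, base_url=PROMETHEUS_URL):
    """Build the /api/v1/label/<label_name>/values URL, optionally scoped to one metric."""
    url = f"{base_url}/api/v1/label/{urllib.parse.quote(label, safe='')}/values"
    if metric is not None:
        # match[] restricts the lookup to series selected by the given selector.
        url += "?" + urllib.parse.urlencode({"match[]": metric})
    return url


def get_label_values(label, metric=None, base_url=PROMETHEUS_URL, timeout=5):
    """Return the values of a label, similar in spirit to Grafana's label_values()."""
    url = label_values_url(label, metric, base_url)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "unexpected Prometheus response"))
    return payload["data"]


# Usage (needs a reachable Prometheus server; the metric name is hypothetical):
#     print(get_label_values("dockerVersion"))
#     print(get_label_values("dockerVersion", metric="cadvisor_version_info"))
```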

  • Get all the jobs

    The Prometheus API exposes the endpoint /api/v1/label/job/values, which returns all the jobs scraped by that particular Prometheus server.


  • Quick check if the instances are healthy/reachable - automatically generated labels and time-series

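    When Prometheus scrapes a target, it automatically attaches some labels (job and instance) to the scraped time series, which serve to identify the scraped target. It also generates the up time series for each target, with a value of 1 when the scrape succeeded and 0 when it failed.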

    # up time series is useful for instance availability monitoring
    up{job="<job-name>", instance="<instance-id>"}
    

    Going further, we can also check all the labels for a particular job using the up metric.

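The health check can also be scripted. The sketch below runs an instant query for up through /api/v1/query and reports the targets whose latest sample is 0. The server address comes from the curl examples in this article and is an assumption about your setup; the helper names are hypothetical, and the job name is interpolated without escaping.

```python
import json
import urllib.parse
import urllib.request

# Assumption: address copied from the curl examples in this article; adjust to your server.
PROMETHEUS_URL = "http://127.0.0.1:9091"


def up_query_url(job=None, base_url=PROMETHEUS_URL):
    """Build an instant-query URL for up, optionally filtered by job."""
    expr = "up" if job is None else f'up{{job="{job}"}}'
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def unreachable_targets(payload):
    """Return (job, instance) pairs whose latest up sample is 0 in an instant-query response."""
    down = []
    for sample in payload["data"]["result"]:
        _timestamp, value = sample["value"]  # the value arrives as a string, e.g. "1"
        if float(value) == 0:
            labels = sample["metric"]
            down.append((labels.get("job"), labels.get("instance")))
    return down


def check_targets(job=None, base_url=PROMETHEUS_URL, timeout=5):
    """Query the server and list the targets that failed their last scrape."""
    with urllib.request.urlopen(up_query_url(job, base_url), timeout=timeout) as response:
        return unreachable_targets(json.load(response))


# Usage (needs a reachable Prometheus server):
#     print(check_targets(job="<job-name>"))
```

Each result entry also carries the full label set of the target under "metric", which is what makes up handy for discovering the labels attached to a job.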

Conclusions

Grafana and Prometheus are widely used, and they make a useful monitoring stack: even when we don’t know all the implementation details of how the metrics are collected, the tips above let us shed some light on what is being monitored.

Grafana                                     Prometheus API
label_names()                               curl -s http://127.0.0.1:9091/api/v1/labels
label_values(<label_name>)                  curl -s http://127.0.0.1:9091/api/v1/label/<label_name>/values
label_values(<metric_name>,<label_name>)
up{job="<job_name>"}                        curl -s http://127.0.0.1:9091/api/v1/label/job/values
