Unleashing VM histograms for Ruby: Migrating from Prometheus to VictoriaMetrics with vm-client

Introduction

This article describes the codebase migration from Prometheus to VictoriaMetrics, unlocking the advantages of VM histograms. We’ll create a Rails application with Prometheus, then migrate to VictoriaMetrics using the vm-client gem, a drop-in replacement for prometheus-client that implements VictoriaMetrics histograms.

The complete code for the final application is available on GitHub.

Why migrate from Prometheus to VictoriaMetrics?

For an in-depth look at the comprehensive capabilities of VictoriaMetrics, you can explore their full list of features. But for now, let’s briefly have a look at some of them:

Performance at scale

VictoriaMetrics often demonstrates superior performance, particularly in high-load environments. It is designed to handle large volumes of metrics without significant resource consumption, making it an attractive option for scaling applications.

Efficient storage

VictoriaMetrics utilizes a unique storage format that can lead to better compression of time-series data, resulting in cost savings on storage space. Up to 7x less storage space is required compared to Prometheus.

Advanced querying

VictoriaMetrics supports PromQL (Prometheus Query Language), and extends it with additional functions and features that provide enhanced querying capabilities.

Long-term storage

VictoriaMetrics is designed for effective long-term storage of time-series data, which is beneficial for historical analysis and compliance with data retention policies.

High cardinality handling

VictoriaMetrics uses up to 7x less RAM than Prometheus when dealing with millions of unique time series (aka high cardinality).

Seamless integration with Prometheus clients

VictoriaMetrics is compatible with the Prometheus scraping and pushing formats, and MetricsQL, the VictoriaMetrics query language, is backward-compatible with PromQL, ensuring a smooth transition without sacrificing functionality.

VictoriaMetrics histograms

In the following section, we will dive into the specifics of VictoriaMetrics histograms.

VictoriaMetrics histograms

VictoriaMetrics (VM) histograms are a data structure designed to efficiently summarize the distribution of measured values over a period of time, such as request durations or response sizes. Unlike traditional Prometheus histograms, which need their bucket boundaries defined up front, VM histograms create buckets on demand as values are observed, so their range follows the data, which is critical for performance monitoring.

The benefits of VM histograms include:

Adaptability: They automatically adjust buckets and ranges to best represent the observed data, ensuring that histograms remain relevant even as the patterns of data change.
Efficient Storage: VM histograms are optimized for storage efficiency, allowing for large volumes of data to be stored without consuming excessive disk space.
Query Performance: The structure of VM histograms is designed to enhance query performance, enabling faster retrieval and computation of metrics, which is crucial for real-time monitoring and alerting.
Cost-effective Scaling: Their efficiency and precision make scaling more cost-effective because they can handle more data with fewer resources, which is essential for growing applications.

Last but not least, since buckets are managed automatically, developers no longer need to define and update bucket boundaries by hand. This significantly simplifies the configuration and maintenance of histograms.

In essence, VM histograms offer a way to gain more detailed and accurate insights from monitoring data with greater efficiency and at a potentially lower cost than traditional methods.

VictoriaMetrics histogram internals

Buckets for VM histograms are created on demand and cover values in the following range: [10⁻⁹…10¹⁸].

This includes:

Times from nanoseconds to billions of years.
Sizes from 0 bytes to 2⁶⁰ bytes.

The histogram splits each (10^n…10^(n+1)] range into 18 log-based buckets, each wider than the previous one by a factor of 10^(1/18)≈1.136:

(1.0*10^n…1.136*10^n], (1.136*10^n…1.292*10^n], … (8.799*10^n…1.0*10^(n+1)]

This gives a 13.6% worst-case precision error, which is enough for most practical cases.
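The layout above can be reproduced in a few lines of Ruby. This is only a sketch of the arithmetic, not the vm-client source: the names bucket_bounds and tally are made up for illustration, and the label format is merely in the spirit of the vmrange label that VictoriaMetrics uses.

```ruby
# Sketch of the bucket layout described above (not the vm-client implementation).
BUCKETS_PER_DECADE = 18
MULTIPLIER = 10**(1.0 / BUCKETS_PER_DECADE) # roughly 1.136

# Hypothetical helper: [lower, upper] bounds of the (lower...upper] bucket holding value.
def bucket_bounds(value)
  raise ArgumentError, 'value must be positive' unless value.positive?

  # Buckets include their upper bound, so step one index down from the ceiling.
  index = (Math.log10(value) * BUCKETS_PER_DECADE).ceil - 1
  [10**(index.to_f / BUCKETS_PER_DECADE), 10**((index + 1).to_f / BUCKETS_PER_DECADE)]
end

# Hypothetical helper: buckets exist only once a value lands in them,
# so an initially empty Hash is all the state we need.
def tally(values)
  values.each_with_object(Hash.new(0)) do |value, counts|
    lower, upper = bucket_bounds(value)
    counts[format('%.3e...%.3e', lower, upper)] += 1
  end
end

tally([0.012, 0.0125, 0.3, 4.2]).each { |range, count| puts "#{range} #{count}" }
```

Because neighbouring bounds always differ by the same factor, the relative error of any estimate stays bounded by that factor regardless of whether the values are microseconds or hours.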

Example Rails app

Let’s start with a new Rails app:

rails new rails-with-vm-client --api --database=postgresql
cd rails-with-vm-client
rails g scaffold Post title:string body:text

Now, let’s add docker-compose.yml with our application and Postgres:

version: '3.8'

services:
  web:
    build: .
    command: bash -c "rm -f tmp/pids/server.pid && bundle exec rails s -p 3000 -b '0.0.0.0'"
    volumes:
      - .:/rails
    depends_on:
      - "postgres"
    ports:
      - "3000:3000"
    environment:
      RAILS_ENV: "development"
  postgres:
    image: postgres:13.3-alpine
    volumes:
      - postgres:/var/lib/postgresql/data
      - ./log:/root/log:cached
    ports:
      - '5432:5432'
    environment:
      - POSTGRES_PASSWORD=postgres
    healthcheck:
      test: pg_isready -U postgres -h 127.0.0.1
      interval: 5s

volumes:
  postgres:

Update config/database.yml to use the database from Compose:

default: &default
  encoding: unicode
  pool: <%= ENV.fetch('RAILS_MAX_THREADS') { 10 } %>
  adapter: postgresql
  url: postgres://postgres:postgres@postgres:5432

development:
  <<: *default
  database: metrics_development

test:
  <<: *default
  database: metrics_test

production:
  <<: *default
  database: metrics_production

Start the database and the app with docker compose up, then create and migrate the database with docker compose exec web bin/rails db:create db:migrate.

And now let’s add prometheus-client to Gemfile:

gem "prometheus-client"

Rebuild the image with docker compose build to include the new gem.

Next, let’s create a simple Metrics class for collecting metrics:

# app/lib/metrics.rb

class Metrics
  class << self
    def increment(name, by: 1, labels: {})
      find_or_create_metric(:counter, name, labels: labels).increment(by: by, labels: labels)
    end

    def measure(name, value, labels: {})
      find_or_create_metric(:histogram, name, labels: labels).observe(value, labels: labels)
    end

    private

    def find_or_create_metric(method, name, labels:)
      metric = Prometheus::Client.registry.get(name)
      return metric if metric

      Prometheus::Client.registry.method(method).call(name, docstring: 'metric', labels: labels.keys)
    end
  end
end

By default, prometheus-client uses these histogram buckets (in seconds): [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]
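To get a feel for what fixed buckets mean in practice, here is a small sketch that finds the smallest default bucket covering a given duration. It does not use the gem: the bounds are copied from the list above, and le_bucket_for is a hypothetical helper written only for this illustration.

```ruby
# Default upper bounds (in seconds), copied from the list above.
DEFAULT_BUCKETS = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10].freeze

# Hypothetical helper: the smallest `le` bound that covers the value,
# or "+Inf" when the value exceeds every configured bucket.
def le_bucket_for(seconds)
  DEFAULT_BUCKETS.bsearch { |bound| bound >= seconds } || '+Inf'
end

[0.26, 0.49, 12.0].each do |seconds|
  puts "#{seconds}s -> le=#{le_bucket_for(seconds)}"
end
```

A 260 ms request and a 490 ms request are indistinguishable here, and anything slower than 10 seconds is only visible in the +Inf bucket, which is why fixed boundaries usually have to be tuned per metric.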

Now let’s add the Prometheus exporter middleware in config.ru to expose our metrics on /metrics:

# frozen_string_literal: true

require_relative 'config/environment'
require 'prometheus/middleware/exporter'

use Prometheus::Middleware::Exporter

run Rails.application

We will collect metrics from controllers using ActiveSupport::Notifications:

# config/initializers/subscribers/controller.rb

# frozen_string_literal: true

ActiveSupport::Notifications.subscribe "process_action.action_controller" do |*args|
  event = ActiveSupport::Notifications::Event.new(*args)
  # to_s never returns nil, so use presence to fall back to "all" for a missing format
  format = event.payload[:format].to_s.presence || "all"
  format = "all" if format == "*/*"

  labels = {
    controller: event.payload[:controller],
    action: event.payload[:action],
    format: format,
    status: event.payload[:status],
    method: event.payload[:method].to_s.downcase,
    exception: event.payload[:exception]&.first # Exception class
  }

  duration_in_seconds = event.duration / 1000.0

  Metrics.increment(:rails_request_total, labels: labels)
  Metrics.measure(:rails_request_time, duration_in_seconds, labels: labels)
end

On every API call, we increment the rails_request_total counter and record the call duration in the rails_request_time histogram.

Let’s create some posts from the Rails console (docker compose exec web bin/rails console):

100.times { Post.create!(title: "Anything", body: "some long body") }

Now start our application with docker compose up and call our API a few times from the terminal:

curl http://0.0.0.0:3000/posts
curl http://0.0.0.0:3000/posts
curl http://0.0.0.0:3000/posts

Now, let’s go to http://0.0.0.0:3000/metrics in our browser to check what metrics we have so far:

Great, we now have simple API monitoring using the Prometheus client and its original histograms.

Now it’s time to migrate to VictoriaMetrics and vm-client.

About vm-client

vm-client has a pretty informative README, but in short, it is a fork of prometheus-client that implements VM histograms and adds a few other minor features for VictoriaMetrics.

It can be used as a drop-in replacement for prometheus-client since it only adds new features without breaking the original ones.

Migrate to VictoriaMetrics

Let’s start with migrating from prometheus-client to vm-client. In our Gemfile remove prometheus-client and add vm-client instead:

gem "vm-client"

Don’t forget to run bundle install and rebuild the Docker image with docker compose build to include the new gem.

Now, we only need to start using VM histograms instead of the original ones. Let’s update our Metrics class:

# app/lib/metrics.rb

class Metrics
  class << self
    def increment(name, by: 1, labels: {})
      find_or_create_metric(:counter, name, labels: labels).increment(by: by, labels: labels)
    end

    def measure(name, value, labels: {})
      find_or_create_metric(:vm_histogram, name, labels: labels).observe(value, labels: labels)
    end

    private

    def find_or_create_metric(method, name, labels:)
      metric = Prometheus::Client.registry.get(name)
      return metric if metric

      Prometheus::Client.registry.method(method).call(name, docstring: 'metric', labels: labels.keys)
    end
  end
end

The only change is histogram to vm_histogram, so the metric is now a Prometheus::Client::VmHistogram.

If you use Prometheus::Client::Histogram directly in your project, just replace it with Prometheus::Client::VmHistogram. VmHistogram does not need buckets to be passed, but it accepts and ignores them for compatibility.

Let’s start our app with docker compose up and call our API a few times again:

curl http://0.0.0.0:3000/posts
curl http://0.0.0.0:3000/posts
curl http://0.0.0.0:3000/posts

And check what metrics we have on http://0.0.0.0:3000/metrics:

Great, now it uses VM histograms with dynamic buckets.

Let’s add VictoriaMetrics and Grafana to our docker-compose.yml and see what we can do with VM histograms:

version: '3.8'

services:
  web:
    build: .
    command: bash -c "rm -f tmp/pids/server.pid && bundle exec rails s -p 3000 -b '0.0.0.0'"
    volumes:
      - .:/rails
    depends_on:
      - "postgres"
    ports:
      - "3000:3000"
    environment:
      RAILS_ENV: "development"
    networks:
      - vm_net
  postgres:
    image: postgres:13.3-alpine
    volumes:
      - postgres:/var/lib/postgresql/data
      - ./log:/root/log:cached
    ports:
      - '5432:5432'
    environment:
      - POSTGRES_PASSWORD=postgres
    healthcheck:
      test: pg_isready -U postgres -h 127.0.0.1
      interval: 5s
    networks:
      - vm_net
  vmagent:
    image: victoriametrics/vmagent:v1.94.0
    depends_on:
      - "victoriametrics"
      - "web"
    ports:
      - 8429:8429
    volumes:
      - vmagent_data:/vmagentdata
      - ./scrape_config.yml:/etc/prometheus/prometheus.yml
    command:
      - "--promscrape.config=/etc/prometheus/prometheus.yml"
      - "--remoteWrite.url=http://victoriametrics:8428/api/v1/write"
    networks:
      - vm_net
    restart: always
  victoriametrics:
    image: victoriametrics/victoria-metrics:v1.94.0
    ports:
      - 8428:8428
      - 8089:8089
      - 8089:8089/udp
      - 4242:4242
    volumes:
      - vm_data:/storage
    command:
      - "--storageDataPath=/storage"
      - "--opentsdbListenAddr=:4242"
      - "-logNewSeries"
    networks:
      - vm_net
    restart: always
  grafana:
    image: grafana/grafana:10.1.5
    depends_on:
      - "victoriametrics"
    ports:
      - 3001:3000
    volumes:
      - grafana_data:/var/lib/grafana
      - ./provisioning/:/etc/grafana/provisioning/
    networks:
      - vm_net
    restart: always
volumes:
  postgres:
  vmagent_data: {}
  vm_data: {}
  grafana_data: {}
networks:
  vm_net:

Now let’s add scrape_config.yml at the root:

global:
  scrape_interval: 10s

scrape_configs:
  - job_name: 'vmagent'
    static_configs:
      - targets: ['vmagent:8429']
  - job_name: 'victoriametrics'
    static_configs:
      - targets: ['victoriametrics:8428']
  - job_name: 'rails'
    static_configs:
      - targets: ['web:3000']

It will scrape vmagent and victoriametrics for their internal metrics, which are pretty helpful for monitoring. It will also scrape our app on /metrics.

And here is the provisioning for Grafana (provisioning/datasources/datasource.yml):

apiVersion: 1

datasources:
    - name: VictoriaMetrics
      type: prometheus
      access: proxy
      url: http://victoriametrics:8428
      isDefault: true

Now, let’s add a simple rake task to generate some metrics:

# lib/tasks/load.rake
# frozen_string_literal: true

require 'net/http'

namespace :load do
  desc 'Requests our api in a loop'
  task generator: :environment do
    puts 'starting requesting api'

    10_000.times do
      Net::HTTP.get(URI('http://0.0.0.0:3000/posts'))
      Net::HTTP.get(URI('http://0.0.0.0:3000/posts'))

      # create a new one
      Net::HTTP.post URI('http://0.0.0.0:3000/posts'),
               { post: { title: "Some Title", body: "Some body"}}.to_json,
               "Content-Type" => "application/json"
    end
  end
end

Start our app with docker compose up and run our rake task with docker compose exec web bundle exec rake load:generator. Let it run for a few minutes to get more realistic metrics.

Then let’s go to Grafana at http://0.0.0.0:3001/explore. The default username and password are both admin.

Here, let’s choose the last 5 minutes as the time range and show the 0.99 quantile of request duration, grouped by action:
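For reference, the same question can also be asked outside Grafana through the Prometheus-compatible query API of VictoriaMetrics. The sketch below rests on a few assumptions: VictoriaMetrics is reachable on localhost:8428 (the port published in docker-compose.yml), vm-client exposes the histogram as rails_request_time_bucket with a vmrange label, and the helper names are invented for this example. The string returned by quantile_query is the kind of expression you would type into Grafana Explore.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Assumption: VictoriaMetrics is reachable on the port published in docker-compose.yml.
VM_QUERY_URL = 'http://localhost:8428/api/v1/query'

# Hypothetical helper: MetricsQL for a quantile of request duration per action.
# Assumes the series is named rails_request_time_bucket and carries a vmrange label.
def quantile_query(quantile, window: '5m')
  "histogram_quantile(#{quantile}, " \
    "sum(rate(rails_request_time_bucket[#{window}])) by (vmrange, action))"
end

# Builds the instant-query URI with the expression properly URL-encoded.
def query_uri(query)
  uri = URI(VM_QUERY_URL)
  uri.query = URI.encode_www_form(query: query)
  uri
end

# Returns a Hash of action label => quantile in seconds (requires the stack to be running).
def quantile_by_action(quantile)
  body = JSON.parse(Net::HTTP.get(query_uri(quantile_query(quantile))))
  body.fetch('data').fetch('result').to_h do |series|
    [series.fetch('metric')['action'], series.fetch('value').last.to_f]
  end
end

# Usage, with docker compose up running:
#   quantile_by_action(0.99).each { |action, seconds| puts "#{action}: #{seconds}s" }
```

Grouping by vmrange as well as action matters here: the quantile is computed from the per-bucket rates, so the bucket label has to survive the aggregation.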

Works perfectly; now we can use the full power of MetricsQL and dynamic buckets. In this article from VictoriaMetrics maintainers, you can have a look at how VM histograms can be used in Grafana.

Conclusion

In summary, moving your Rails app from Prometheus to VictoriaMetrics using the vm-client gem is a pretty easy way to get better monitoring and save money on the monitoring system.

You’ll keep everything you like about Prometheus but also benefit from VM histograms, reduced resource usage, and the advanced functionality of MetricsQL, all of which contribute to a more efficient and scalable monitoring system.