InfluxDB Continuous Downsampling - Optimize Your TSDB Today

Author profile picture

@drevvdrew

Backend Engineer at Very Good Security

Introduction

The security and monitoring of sensitive data is critically important to us, since customers are placing their trust in our product. That’s why we try to monitor everything, resulting in an enormous amount of data points in our time-series DB. So, what do you do when your data exceeds your cloud storage plan and you start to get outages? 
Start implementing a retention policy and downsampling scenario for old data aggregation. Brief overview of our metric capture infrastructure:
  • InfluxDB for storing or metrics in the face of `influxcloud`
  • 20+ metrics 
  • Time precision to milliseconds
  • 20+ mil points per measurement
  • 1 year of historical data
  • Real-time customer-facing service based on this data
     

Traditional approach

We use InfluxDB as our storage solution. It’s the highest-ranked time-series DB on the market, and rightfully so - it offers a very good set of tools to work with and visualize your data.
So you want to go the traditional way, and here InfluxDB has you covered. For this to work, you need to be aware of two things: retention policy and continuous query. 
Retention policy is a part of InfluxDB’s structure, like a scheme in RDB that also stores how long InfluxDB should keep data. A database can consist of several retention policies that contain measurements. By default, at database creation, InfluxDB creates an `autogen` retention policy with an infinite retention period. 
Continuous query: a query, written in InfluxQL, that runs automatically and periodically within a database. 
Our goal is to create one more retention policy within the same database where data are pushed from the default retention policy.
Let’s consider these 5 records for downsampling to a one-hour period.
Things look easy. All we should do is:
1. Create a new retention policy with infinite retention duration
    CREATE RETENTION POLICY "iceberg" 
    ON "access-metrics" DURATION 
    INF REPLICATION 1
2. Create a continuous query for every measurement we want to be aggregated
CREATE CONTINUOUS QUERY "upstreams_cq" ON "access-metrics"
BEGIN SELECT sum("values") AS "values",
INTO "iceberg"."upstreams" FROM "upstreams" 
GROUP BY time(1h), * END
Notice: asterix (*) on
GROUP BY
. We have tags-heavy data, and we need to preserve them during the aggregation process
3. Aggregate historic data using the same query
SELECT sum("values") AS "values",
INTO "access-metrics"."iceberg"."upstreams"
FROM "access-metrics"."autogen"."upstreams"
GROUP BY time(1h), *
As a result, we will get something like this:
4. Push new retention period on default retention policy
ALTER RETENTION POLICY "autogen" ON "access-metrics" DURATION 365d
This is a neat out-of-the-box solution, and it works! … to an extent.
Let's take a look at the InfluxDB notation:
In InfluxDB, a timestamp identifies a single point in any given data series. This is like an SQL database table where the primary key is pre-set by the system and is always time. 
Since all three of these records have the same timestamp, they 'silently' override each other during insertion without any private key violation constraints.
This solution is perfect for a simple case without tags in aggregated data. Using it with tags-heavy data would result in several entries with one timestamp that would overwrite over each other, causing each time period to have only one entry.

Our approach

So what can we do to improve InfluxDB downsampling so that it would suit us? Code it! 
We already have a service that encapsulates all knowledge on our metrics - why not extend it with downsampling capabilities?
Let’s revise our plan:
1. Create new retention policy with infinite retention duration
CREATE RETENTION POLICY "iceberg" ON "access-metrics"
DURATION INF REPLICATION 1
2. Schedule aggregation query:
SELECT sum("values") AS "values",
INTO "iceberg"."upstreams" FROM "upstreams" 
GROUP BY time(1h), *
3. Modify timestamp of aggregated entries, so that it would be distinct within the aggregation period
4. Aggregate historical data using the same query with timestamp modification
5. Push new retention period on default retention policy
ALTER RETENTION POLICY "autogen" ON "access-metrics" DURATION 365d
By diversifying the timestamp by adding milliseconds, we are preserving all the results of the aggregation and making an import case work, that would not just work out-of-the-box. It may not be elegant, but it works!
Autogen metrics graph
Downsampled metrics graph

Conclusion

  • When you are dealing with tons of data, you need to think of downsampling 
  • InfluxDB has good built-in mechanisms for data retention
  • InfluxDB continuous query is a good way to organize data aggregation 
  • If you want to preserve tags upon aggregation, you will need to write your own logic on top of InfluxQL
  • Don’t be afraid to write around the limitations of well-established products

Tags

The Noonification banner

Subscribe to get your daily round-up of top tech stories!