InfluxDB Continuous Downsampling - Optimize Your TSDB Today

Written by drevv | Published 2019/11/11

TLDR: InfluxDB is the highest-ranked time-series DB on the market, and it offers a good set of tools to work with and visualize your data. For downsampling to work out of the box, you need to be aware of two things: retention policies and continuous queries. That solution is perfect for a simple case without tags in the aggregated data; using it with tags-heavy data results in several entries sharing one timestamp that overwrite each other, leaving only one entry per time period.

Introduction

The security and monitoring of sensitive data is critically important to us, since customers are placing their trust in our product. That’s why we try to monitor everything, resulting in an enormous amount of data points in our time-series DB. So, what do you do when your data exceeds your cloud storage plan and you start to get outages? 
You start implementing a retention policy and a downsampling scenario that aggregates old data. A brief overview of our metric-capture infrastructure:
  • InfluxDB for storing our metrics, in the form of `influxcloud`
  • 20+ metrics 
  • Time precision to milliseconds
  • 20+ million points per measurement
  • 1 year of historical data
  • Real-time customer-facing service based on this data
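For context, a single point in our setup is just a measurement, a set of tags, a field, and a millisecond timestamp. Here is a minimal sketch (not our production code) of writing one such point with the influxdb-python client; the host settings, tag names and values are made up:

from influxdb import InfluxDBClient

client = InfluxDBClient(host="localhost", port=8086, database="access-metrics")

point = {
    "measurement": "upstreams",
    "tags": {"upstream": "vault", "status": "200"},  # hypothetical tags
    "time": 1573468200123,                           # epoch timestamp in milliseconds
    "fields": {"values": 1},
}

# time_precision="ms" tells InfluxDB to interpret the integer timestamp as milliseconds
client.write_points([point], time_precision="ms")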
     

Traditional approach

We use InfluxDB as our storage solution. It’s the highest-ranked time-series DB on the market, and rightfully so - it offers a very good set of tools to work with and visualize your data.
So you want to go the traditional way, and here InfluxDB has you covered. For this to work, you need to be aware of two concepts: retention policies and continuous queries.
A retention policy is part of InfluxDB’s structure, similar to a schema in a relational database, which also specifies how long InfluxDB should keep the data. A database can consist of several retention policies that contain measurements. By default, at database creation, InfluxDB creates an `autogen` retention policy with an infinite retention period.
A continuous query is a query, written in InfluxQL, that runs automatically and periodically within a database.
Our goal is to create one more retention policy within the same database, into which aggregated data is pushed from the default retention policy.
Let’s consider these 5 records for downsampling to a one-hour period.
Things look easy. All we have to do is:
1. Create a new retention policy with infinite retention duration
    CREATE RETENTION POLICY "iceberg"
    ON "access-metrics"
    DURATION INF REPLICATION 1
2. Create a continuous query for every measurement we want to be aggregated
CREATE CONTINUOUS QUERY "upstreams_cq" ON "access-metrics"
BEGIN
  SELECT sum("values") AS "values"
  INTO "iceberg"."upstreams"
  FROM "upstreams"
  GROUP BY time(1h), *
END
Notice the asterisk (*) in the GROUP BY clause. We have tags-heavy data, and we need to preserve the tags during the aggregation process.
3. Aggregate historic data using the same query
SELECT sum("values") AS "values",
INTO "access-metrics"."iceberg"."upstreams"
FROM "access-metrics"."autogen"."upstreams"
GROUP BY time(1h), *
As a result, we will get something like this:
4. Set a new retention duration on the default retention policy
ALTER RETENTION POLICY "autogen" ON "access-metrics" DURATION 365d
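Before relying on this, it is worth checking what actually landed in the new retention policy. Here is a hedged sketch using the influxdb-python client (the host settings are assumptions) that lists the downsampled series, one per tag set, along with a few of their rows:

from influxdb import InfluxDBClient

client = InfluxDBClient(host="localhost", port=8086, database="access-metrics")

# GROUP BY * keeps every tag set as its own series,
# so the client returns one (measurement, tags) key per series
result = client.query(
    'SELECT "values" FROM "access-metrics"."iceberg"."upstreams" GROUP BY *'
)

for (measurement, tags), points in result.items():
    print(measurement, tags, list(points)[:3])  # first few aggregated rows per series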
This is a neat out-of-the-box solution, and it works! … to an extent.
Let's take a look at the InfluxDB notation:
In InfluxDB, a timestamp identifies a single point in any given data series. This is like an SQL database table where the primary key is pre-set by the system and is always time. 
Since all three of these records have the same timestamp, they ‘silently’ override each other during insertion, without any primary key constraint violation.
This solution is perfect for a simple case without tags in the aggregated data. Using it with tags-heavy data results in several entries sharing one timestamp that overwrite each other, leaving only one entry per time period.
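A minimal sketch of that silent overwrite, assuming a hypothetical tag on our "upstreams" measurement: two writes that share the measurement, tag set and timestamp leave only the last value behind, and no error is raised.

from influxdb import InfluxDBClient

client = InfluxDBClient(host="localhost", port=8086, database="access-metrics")

timestamp = "2019-11-11T10:00:00Z"

client.write_points([{
    "measurement": "upstreams",
    "tags": {"upstream": "vault"},   # hypothetical tag
    "time": timestamp,
    "fields": {"values": 42},
}])

client.write_points([{
    "measurement": "upstreams",
    "tags": {"upstream": "vault"},   # same series key...
    "time": timestamp,               # ...and same timestamp: overwrites 42
    "fields": {"values": 7},
}])

# Only one row survives for this timestamp
print(list(client.query(
    "SELECT \"values\" FROM \"upstreams\" WHERE time = '2019-11-11T10:00:00Z'"
).get_points()))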

Our approach

So what can we do to improve InfluxDB downsampling so that it suits our needs? Code it!
We already have a service that encapsulates all knowledge on our metrics - why not extend it with downsampling capabilities?
Let’s revise our plan:
1. Create new retention policy with infinite retention duration
CREATE RETENTION POLICY "iceberg" ON "access-metrics"
DURATION INF REPLICATION 1
2. Schedule the aggregation query:
SELECT sum("values") AS "values"
INTO "iceberg"."upstreams" FROM "upstreams" 
GROUP BY time(1h), *
3. Modify the timestamps of the aggregated entries so that they are distinct within each aggregation period (sketched below)
4. Aggregate historical data using the same query with timestamp modification
5. Set a new retention duration on the default retention policy
ALTER RETENTION POLICY "autogen" ON "access-metrics" DURATION 365d
By diversifying the timestamps with millisecond offsets, we preserve all the results of the aggregation and make an important case work that does not work out of the box. It may not be elegant, but it works!
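Here is a hedged sketch of what that service logic might look like with the influxdb-python client. It is not our exact code; the host settings are assumptions, and the measurement, tag and field names are the ones from the queries above. It runs the hourly aggregation itself, then spreads entries that fall into the same hour bucket across consecutive milliseconds before writing them into the "iceberg" retention policy.

from collections import defaultdict
from influxdb import InfluxDBClient

client = InfluxDBClient(host="localhost", port=8086, database="access-metrics")

# epoch="ms" makes InfluxDB return bucket timestamps as integer milliseconds;
# in practice you would bound this with a WHERE time clause per backfill chunk
result = client.query(
    'SELECT sum("values") AS "values" '
    'FROM "access-metrics"."autogen"."upstreams" '
    'GROUP BY time(1h), *',
    epoch="ms",
)

offsets = defaultdict(int)  # entries already emitted per hour bucket
downsampled = []

for (measurement, tags), points in result.items():
    for point in points:
        if point["values"] is None:  # empty bucket for this series
            continue
        bucket = point["time"]
        downsampled.append({
            "measurement": measurement,
            "tags": tags,
            # diversify the timestamp: bucket start plus N milliseconds
            "time": bucket + offsets[bucket],
            "fields": {"values": point["values"]},
        })
        offsets[bucket] += 1

client.write_points(downsampled, time_precision="ms", retention_policy="iceberg")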
Autogen metrics graph
Downsampled metrics graph

Conclusion

  • When you are dealing with tons of data, you need to think about downsampling
  • InfluxDB has good built-in mechanisms for data retention
  • InfluxDB continuous query is a good way to organize data aggregation 
  • If you want to preserve tags upon aggregation, you will need to write your own logic on top of InfluxQL
  • Don’t be afraid to write around the limitations of well-established products

Written by drevv | Backend Engineer at Very Good Security
Published by HackerNoon on 2019/11/11