
How Hypertables Enable Automatic PostgreSQL Partitioning for Your Datasets

by Timescale, November 2nd, 2023


If you’re working with growing PostgreSQL tables, you're likely no stranger to the challenges of managing large datasets efficiently:


  • Your query performance is degrading.
  • You’re dealing with maintenance overhead.
  • You’re finding it hard to keep up with high ingestion rates.
  • You’re having trouble managing your data lifecycle.


Postgres partitioning is your most powerful ally in solving these problems. By partitioning your large PostgreSQL tables, you can keep them fast and efficient. But setting up and maintaining partitioned PostgreSQL tables in production can be difficult.


“Yes,” your mind may go, “I might be able to improve my performance if I partition my tables, but this will come at the cost of countless hours spent on manual configuration, maintenance jobs, and testing, not to mention the unforeseen issues that might pop up during scaling. It’s like having a powerful car with an incredibly complicated gearbox.” If you’re using vanilla PostgreSQL in products like Amazon RDS, there’s a lot of truth to this. You will likely spend much of your time managing your partitioned tables. Plus, you’ll have to write custom scripts, keep rigorous maintenance practices, and carefully monitor your performance so you can revisit and tweak your configuration whenever your dataset or ingestion rate changes.


But guess what: there’s a better way of creating a Postgres partition, and it’s called hypertables.

Meet Hypertables: Automatic PostgreSQL Partitioning for Your Large PostgreSQL Tables

Hypertables (which are available via the TimescaleDB extension and, in AWS, via the Timescale platform) are an innovation that makes the experience of creating a Postgres partition completely seamless. They automate the generation and management of data partitions without changing your user experience.


Working with a hypertable feels exactly like working with a regular PostgreSQL table. But, under the covers, hypertables handle all the partitioning magic, speeding up your queries and ingests. This performance boost is sustained as your tables keep growing, which makes hypertables extremely scalable.


Hypertables look like regular PostgreSQL tables, but under the hood, they’re being automatically partitioned to enhance performance


Hypertables are optimized for time-based partitioning, so this is the type of partitioning that we’ll focus on in this article. However, hypertables also work for tables that aren’t time-based but have something similar, for example, a BIGINT primary key.
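As a minimal sketch of the non-time case, the snippet below partitions a table on a BIGINT key. The events table and its columns are hypothetical, and the interval of one million IDs per partition is an arbitrary illustrative value. For integer partitioning columns, chunk_time_interval has to be set explicitly.

```sql
-- Hypothetical table keyed by an ever-increasing BIGINT instead of a timestamp
CREATE TABLE events (
    event_id BIGINT NOT NULL,
    payload  JSONB,
    PRIMARY KEY (event_id)
);

-- Partition by event_id; each partition covers 1,000,000 consecutive IDs
-- (illustrative value, tune it to your own ingestion pattern)
SELECT create_hypertable('events', 'event_id', chunk_time_interval => 1000000);
```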


Let’s explain how hypertables work with an example.


Imagine you have a PostgreSQL table called sensor_data, where data from various IoT devices is stored with a timestamp. The table might look something like this:


CREATE TABLE sensor_data (
    device_id INT NOT NULL,
    event_time TIMESTAMPTZ NOT NULL,
    temperature FLOAT NOT NULL,
    humidity FLOAT NOT NULL
);


Now, as the volume of sensor_data grows, you start facing performance issues and management complexities. This is where hypertables come in. If you were using Timescale, the only thing you’d need to do is convert your sensor_data table into a hypertable:


SELECT create_hypertable('sensor_data', 'event_time');


This is how easy it is. With this simple command, sensor_data is now a hypertable that automatically partitions your data by the event_time column.


Your PostgreSQL partitioning is all set.


Your data will be automatically partitioned as it gets ingested into the hypertable, with no manual work required on your end to create or manage such partitions
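To see the automatic partitioning at work, you can insert a row and then list the partitions (called chunks in Timescale) that were created for it. This is a small sketch that assumes TimescaleDB 2.x, where the timescaledb_information.chunks view is available. The inserted values are made up.

```sql
-- Insert one sample reading; the matching partition is created on demand
INSERT INTO sensor_data (device_id, event_time, temperature, humidity)
VALUES (1, NOW(), 21.5, 40.2);

-- List the chunks that back the hypertable
SELECT show_chunks('sensor_data');

-- Inspect the time range covered by each chunk
SELECT chunk_name, range_start, range_end
FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_data';
```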


Native Partitioning vs. Hypertables: How Much Easier Does It Get?

Let’s look at what’s happening under the hood.


If you were using a traditional native method to create a Postgres partition, you would have to go through all these steps to set up partitioning in sensor_data:


  1. Create a parent table with the common schema and constraints but no data.
  2. Define child tables, each covering a specific time range.
  3. Add indexes to the parent and child tables.
  4. Set up a job for scheduling the creation of partitions.
  5. Set up a job for managing old partitions.
  6. Attach it all together.
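For comparison, the first few steps above might look roughly like this with native declarative partitioning. This is a simplified sketch. The sensor_data_native name and the monthly ranges are assumptions for illustration, and the scheduling of partition creation and cleanup (steps 4 and 5) is left out entirely.

```sql
-- Step 1: parent table that defines the schema but holds no data itself
CREATE TABLE sensor_data_native (
    device_id   INT         NOT NULL,
    event_time  TIMESTAMPTZ NOT NULL,
    temperature FLOAT       NOT NULL,
    humidity    FLOAT       NOT NULL
) PARTITION BY RANGE (event_time);

-- Step 2: one child table per time range, created by hand (or by a job)
CREATE TABLE sensor_data_native_2023_10
    PARTITION OF sensor_data_native
    FOR VALUES FROM ('2023-10-01') TO ('2023-11-01');

CREATE TABLE sensor_data_native_2023_11
    PARTITION OF sensor_data_native
    FOR VALUES FROM ('2023-11-01') TO ('2023-12-01');

-- Step 3: an index on the parent cascades to its partitions (PostgreSQL 11+)
CREATE INDEX ON sensor_data_native (device_id, event_time DESC);

-- Step 5, done manually: retire an old partition
ALTER TABLE sensor_data_native DETACH PARTITION sensor_data_native_2023_10;
DROP TABLE sensor_data_native_2023_10;
```

Without a default partition, an insert whose event_time falls outside every defined range fails, which is why partitions have to be created ahead of time.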


Each of these steps comes with its own chunk of code, and several require additional extensions, such as pg_partman and pg_cron. You’ll also have to monitor each step for potential issues and make manual adjustments along the way. Overall, you’ll create significant maintenance overhead for yourself.


What hypertables do instead is encapsulate and automate all these steps, significantly reducing the complexity, manual effort, and potential for errors on your end:


  • With hypertables, there’s no need to create a parent table manually and to define child tables for each time range. You would simply convert your existing table into a hypertable.


  • Hypertables also simplify indexing. When you create an index on a hypertable, Timescale automatically creates the corresponding indexes on all current and future partitions, ensuring consistent query performance without manual adjustments.


  • Hypertables automatically create new partitions on the fly based on the specified time interval. As new data is ingested, appropriate partitions are ready to store the data without manual intervention or scheduled jobs. Using Timescale eliminates the risk of partitions not existing, completely removing partition management from your to-do list.


  • Timescale maintains its own partition catalogs and implements its own minimized locking strategy to ensure that your application’s read or write operations are never blocked by the underlying partitioning operations (something that can be an issue in native PostgreSQL partitioning).
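The indexing point above can be illustrated with a single statement. The index name and column choice below are only an example for the sensor_data table defined earlier.

```sql
-- One statement on the hypertable; the index is applied to existing
-- partitions and to partitions created later
CREATE INDEX sensor_data_device_time_idx
    ON sensor_data (device_id, event_time DESC);
```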


Once your PostgreSQL table becomes a hypertable, you can keep querying it as usual. You will instantly experience a performance boost. When you execute a query, Timescale’s query planner intelligently routes the query to the appropriate partition(s), ensuring that only relevant data is scanned. This process remains completely transparent; you don't need to think about it or worry about which partition contains which data.
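One way to check this for yourself is to look at the query plan for a query with a time filter. The sketch below uses arbitrary example dates. The plan should list only the partitions whose time range overlaps the filter, though the exact plan depends on your data and your TimescaleDB version.

```sql
-- Show the plan for a query restricted to a single day
EXPLAIN (COSTS OFF)
SELECT device_id, AVG(temperature) AS avg_temp
FROM sensor_data
WHERE event_time >= '2023-11-01'
  AND event_time <  '2023-11-02'
GROUP BY device_id;
```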


Something similarly straightforward happens when you ingest data. Timescale will take care of routing your new data to the appropriate partition under the hood, ensuring that each partition remains optimally sized. (By default, each partition in Timescale covers seven days, but you can easily modify this.)
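As a short sketch, the interval can be changed on an existing hypertable or set at creation time. The one-day value here is purely illustrative; a suitable interval depends on your ingestion rate and available memory.

```sql
-- Change the interval on an existing hypertable.
-- This affects only partitions created from now on.
SELECT set_chunk_time_interval('sensor_data', INTERVAL '1 day');

-- Alternatively, set it when first converting a table:
-- SELECT create_hypertable('sensor_data', 'event_time',
--                          chunk_time_interval => INTERVAL '1 day');
```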

Partitioning Is Only the Beginning: Features Unlocked With Hypertables

Hypertables make partitioning seamless and unlock a wealth of features that will help you improve your PostgreSQL performance even further and save you time when managing your data.


A few examples:


  • Columnar compression for faster queries and cheaper storage. When you enable Timescale compression, your hypertable’s data changes from row-oriented to column-oriented storage. This can reduce storage usage by up to 95% and unlock blazing-fast analytical queries while still allowing the data to be updated.


  • Blazing-fast analytical views. By creating incrementally updated materialized views, known as continuous aggregates, you’ll improve the performance of aggregate queries tremendously.


    Continuous aggregates automatically refresh and store aggregated data, enabling you to build fast visualizations, including real-time insights and historical analytics that go back in time.


  • Easy and configurable data retention. Hypertables allow you to set up automatic data retention policies with one simple command: add_retention_policy. You can just tell Timescale when you want your data dropped, and your hypertables will automatically drop outdated partitions when it’s time.


  • SQL hyperfunctions to run analytics with fewer lines of code. Hypertables come with a full set of hyperfunctions: fast analytical functions, procedures, and data types optimized for querying, aggregating, and analyzing large volumes of data.


  • Faster DISTINCT and now() queries. Queries that reference now() when pruning partitions will perform better in Timescale, and your ordered DISTINCT queries will benefit from SkipScan.


  • Built-in job scheduler. The Timescale job scheduler lets you schedule any SQL or function-based job within PostgreSQL, meaning you don’t need an external scheduler or another extension like pg_cron.
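Two of the features above can be sketched in a few lines. The 90-day retention window, the log_row_count procedure, and the one-hour schedule are all hypothetical choices made for illustration. The (job_id, config) signature is the one the job scheduler expects for user-defined actions.

```sql
-- Data retention: drop partitions once all their data is older than 90 days
SELECT add_retention_policy('sensor_data', INTERVAL '90 days');

-- Job scheduler: a hypothetical user-defined action
CREATE OR REPLACE PROCEDURE log_row_count(job_id INT, config JSONB)
LANGUAGE plpgsql AS $$
BEGIN
    RAISE NOTICE 'sensor_data rows: %', (SELECT count(*) FROM sensor_data);
END
$$;

-- Run the action once per hour
SELECT add_job('log_row_count', '1 hour');
```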

When To Use Hypertables: Example Use Cases

In sum, if you plan to partition your PostgreSQL tables by time, you’ll surely benefit from hypertables. But who doesn’t love some concrete use-case examples?


Let’s paint a few scenarios where hypertables would be most useful. Needless to say, this is not a comprehensive list! If you’re intrigued by hypertables and Timescale but are unsure if your use case is a fit, don’t hesitate to contact us.

Ingesting thousands of energy metrics per second

An engineering team at a leading energy company is tasked with managing the data from a newly installed smart grid, a big investment for the energy company, which now has granular insights into energy consumption, distribution efficiency, and grid health metrics. Elements in the smart grid generate thousands of energy metrics per second that need to be properly collected, analyzed, and managed.


These energy metrics are currently stored in PostgreSQL, but the engineering team has to figure out the best solution to ingest this high-velocity data efficiently without losing granularity or accuracy. They must also ensure they can query this data quickly for real-time monitoring and analysis.


This would be an ideal use case for Timescale:


  • Timescale’s hypertables can handle the high ingestion without imposing manual work on the team.
  • Hypertables also optimize query performance, ensuring that real-time energy data will be readily accessible for queries.
  • As the smart grid expands, Timescale's hypertables will seamlessly scale, accommodating increased data volumes without compromising performance.
  • Given that Timescale is built on PostgreSQL, the engineering team can leverage their existing knowledge and tools, ensuring a smooth transition and minimal learning curve.


-- Creating a table to store energy metrics from the smart grid
CREATE TABLE energy_metrics (
    element_id INT NOT NULL,
    event_time TIMESTAMPTZ NOT NULL,
    voltage DECIMAL NOT NULL,
    current DECIMAL NOT NULL,
    frequency DECIMAL NOT NULL,
    PRIMARY KEY(element_id, event_time)
);


-- Converting the energy_metrics table into a hypertable
SELECT create_hypertable('energy_metrics', 'event_time');


-- Sample query to ingest new metrics data into the hypertable
INSERT INTO energy_metrics (element_id, event_time, voltage, current, frequency) 
VALUES  (1, NOW(), 210.5, 10.7, 50.01);


-- Sample query to retrieve the latest energy metrics for real-time monitoring
SELECT * FROM energy_metrics 
WHERE element_id = 1 
ORDER BY event_time DESC 
LIMIT 10;
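For the analysis side, a query along these lines could summarize the last hour of readings per grid element. It is a sketch built on the energy_metrics table above. The one-minute bucket and one-hour window are arbitrary choices.

```sql
-- Per-minute averages for each grid element over the last hour
SELECT time_bucket('1 minute', event_time) AS minute,
       element_id,
       AVG(voltage)   AS avg_voltage,
       MAX(frequency) AS max_frequency
FROM energy_metrics
WHERE event_time > NOW() - INTERVAL '1 hour'
GROUP BY minute, element_id
ORDER BY minute DESC, element_id;
```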


Building dashboards for monitoring sensor data

An industrial manufacturing company operates a range of heavy machinery and equipment in its facilities. Each piece of machinery is equipped with sensors that continuously monitor and log temperature data in a sensor_data table to ensure optimal performance and safety.


The company needs its PostgreSQL database to achieve two distinct yet critical objectives:


  • Provide engineers and maintenance staff with real-time temperature data to detect anomalies and ensure that machinery is operating within safe temperature ranges.
  • Analyze historical temperature data to identify trends, predict maintenance needs, and improve operational efficiency.


The team decides to turn sensor_data into a hypertable. To facilitate real-time monitoring, they create a continuous aggregate to calculate the average temperature for every piece of machinery, which is updated every minute:


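CREATE MATERIALIZED VIEW real_time_avg_temp
WITH (timescaledb.continuous) AS
SELECT device_id,
       time_bucket('1 minute', event_time) AS one_min,
       AVG(temperature) AS avg_temp
FROM sensor_data
GROUP BY device_id, one_min;

-- Refresh every minute (TimescaleDB 2.x policy API; offsets are illustrative)
SELECT add_continuous_aggregate_policy('real_time_avg_temp',
    start_offset      => INTERVAL '10 minutes',
    end_offset        => INTERVAL '1 minute',
    schedule_interval => INTERVAL '1 minute');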


With real_time_avg_temp, the maintenance team has immediate access to the average temperature of every machinery piece, enabling swift responses to temperature anomalies and preventing potential breakdowns.


For historical analysis, the team creates another continuous aggregate view, this time aggregating daily average temperatures:


CREATE MATERIALIZED VIEW daily_avg_temp
WITH (timescaledb.continuous) AS
SELECT device_id,
       time_bucket('1 day', event_time) AS one_day,
       AVG(temperature) AS avg_temp
FROM sensor_data
GROUP BY device_id, one_day;


Both views (real_time_avg_temp and daily_avg_temp) feed into a monitoring dashboard. The maintenance team is alerted to potential issues as they arise. At the same time, the team can review historical temperature trends, conduct analyses to predict when machinery might need maintenance, and optimize operational protocols to mitigate excessive temperature fluctuations.
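A dashboard panel for the historical view could be backed by something like the following. This is a sketch that assumes TimescaleDB 2.x. The refresh offsets, the hourly schedule, and the 30-day window are illustrative values, not recommendations.

```sql
-- Keep the daily aggregate up to date in the background
SELECT add_continuous_aggregate_policy('daily_avg_temp',
    start_offset      => INTERVAL '3 days',
    end_offset        => INTERVAL '1 day',
    schedule_interval => INTERVAL '1 hour');

-- Query the aggregate like a regular table: 30-day trend per device
SELECT device_id, one_day, avg_temp
FROM daily_avg_temp
WHERE one_day > NOW() - INTERVAL '30 days'
ORDER BY device_id, one_day;
```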

Storing large volumes of weather data effectively

An environmental research institute is collecting and analyzing many TBs of weather data to study climate change. The team already knows PostgreSQL, so they want to stick to it—but the storage cost is becoming a concern.


The team decides to start using Timescale. After optimizing their database to reduce storage use and enabling compression, their storage costs become a fraction of what they were, with the data remaining fully accessible for analysis.


CREATE TABLE weather_data (
    sensor_id INT NOT NULL,
    event_time TIMESTAMPTZ NOT NULL,
    temperature DECIMAL NOT NULL,
    humidity DECIMAL NOT NULL,
    pressure DECIMAL NOT NULL,
    PRIMARY KEY(sensor_id, event_time)
);


-- Conversion to hypertable 
SELECT create_hypertable('weather_data', 'event_time');


-- Enabling compression
ALTER TABLE weather_data 
SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'sensor_id'
);


-- Sample query
SELECT * FROM weather_data 
WHERE sensor_id = 1 
AND event_time > NOW() - INTERVAL '1 month'
ORDER BY event_time DESC 
LIMIT 10;
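Enabling compression on the table does not by itself compress any data. A policy (or a manual call) decides when partitions get compressed. The sketch below assumes TimescaleDB 2.x. The seven-day threshold is an arbitrary example.

```sql
-- Automatically compress partitions once their data is older than 7 days
SELECT add_compression_policy('weather_data', INTERVAL '7 days');

-- Compare storage before and after compression
SELECT pg_size_pretty(before_compression_total_bytes) AS before_size,
       pg_size_pretty(after_compression_total_bytes)  AS after_size
FROM hypertable_compression_stats('weather_data');
```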

Querying high volumes of crypto data in real-time

A new crypto exchange is grappling with the challenge of providing real-time analytics to traders. As the data volume stored in the underlying PostgreSQL database increases, the engineering team struggles to keep the database fast enough. To them, it’s essential to deliver a better user experience than the competition, which has a more established but slower product. Keeping up the speed and responsiveness of their portal is paramount.


The team knows that by partitioning their large pricing table, they’ll most likely improve query performance. Instead of attempting to manage partitioning themselves, since they’re already swamped, the engineers decide to implement Timescale.


Once turned into a hypertable, their pricing table automatically partitions the data as it is ingested. Real-time analytics, previously marred by delays, are now swift and accurate.

P.S. Check out the story of how Messari improved its performance with hypertables.
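The article does not show the schema of the pricing table, so the following is only a sketch under assumed names: a pricing table with symbol, event_time, and price columns. It builds one-minute candlesticks using time_bucket together with TimescaleDB's first() and last() aggregates.

```sql
-- Hypothetical schema for the exchange's pricing data
CREATE TABLE pricing (
    symbol     TEXT        NOT NULL,
    event_time TIMESTAMPTZ NOT NULL,
    price      NUMERIC     NOT NULL
);

SELECT create_hypertable('pricing', 'event_time');

-- One-minute open/high/low/close per symbol for the last hour
SELECT time_bucket('1 minute', event_time) AS minute,
       symbol,
       first(price, event_time) AS open_price,
       MAX(price)               AS high_price,
       MIN(price)               AS low_price,
       last(price, event_time)  AS close_price
FROM pricing
WHERE event_time > NOW() - INTERVAL '1 hour'
GROUP BY minute, symbol
ORDER BY minute DESC, symbol;
```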

Get Started With Hypertables

PostgreSQL partitioning is a powerful tool for managing large tables. On its own, Postgres partitioning can be complex to implement and maintain, but Timescale's hypertables make the whole process seamless and automatic. The best part is that by using hypertables, you’ll unlock a myriad of other awesome features (like columnar compression and automatic materialized views) that will make it even easier to scale your PostgreSQL database.


If you're ready to explore Timescale's hypertables, start by signing up to Timescale, our fully managed PostgreSQL—but faster—cloud solution. It’s free for 30 days, and no credit card is required. If you are self-hosting your own PostgreSQL instance, you can get access to hypertables by adding the TimescaleDB extension.


Written by Carlota Soto.


Also published here.