If you’re working with growing PostgreSQL tables, you're likely no stranger to the challenges of managing large datasets efficiently:
“Yes,” your mind may go, “I might be able to improve my performance if I partition my tables, but this will be at the cost of countless hours spent on manual configuration, maintenance jobs, and testing, not to mention the unforeseen issues that might pop up during scaling. It’s like having a potent car with an incredibly complicated gearbox.”
But guess what: there’s a better way to partition your Postgres tables, and it’s called hypertables.
Hypertables (which are available via the TimescaleDB extension) feel exactly like regular PostgreSQL tables to work with. But under the covers, hypertables handle all the partitioning magic, speeding up your queries and ingests. This performance boost sustains as your table’s volume keeps growing.
Hypertables are optimized for time-based partitioning, so this is the type of partitioning that we’ll focus on in this article. However, hypertables also work for tables that aren’t time-based but have something similar, for example, a BIGINT primary key.
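For instance, here’s a sketch of partitioning by a BIGINT column (the events table and the chunk size are hypothetical; for integer columns, create_hypertable takes an integer chunk interval instead of a time interval):
-- Partition a hypothetical events table by its BIGINT id,
-- creating a new chunk every 1,000,000 ids
SELECT create_hypertable('events', 'id', chunk_time_interval => 1000000);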
Let’s explain how hypertables work with an example.
Imagine you have a PostgreSQL table called sensor_data, where data from various IoT devices is stored with a timestamp. The table might look something like this:
CREATE TABLE sensor_data (
device_id INT NOT NULL,
event_time TIMESTAMPTZ NOT NULL,
temperature FLOAT NOT NULL,
humidity FLOAT NOT NULL
);
Now, as the volume of sensor_data grows, you start facing performance issues and management complexities. Here’s where hypertables come to help. If you were using Timescale, the only thing you’d need to do is convert your sensor_data table into a hypertable:
SELECT create_hypertable('sensor_data', 'event_time');
This is how easy it is. With this simple command, sensor_data is now a hypertable that automatically partitions your data by the event_time column.
Your PostgreSQL partitioning is all set.
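By default, hypertables create one partition (chunk) per week of data. If that doesn’t fit your workload, you can pick your own interval when converting the table (a sketch; the one-day interval is just an example):
-- Convert the table using one chunk per day instead of the default
SELECT create_hypertable('sensor_data', 'event_time', chunk_time_interval => INTERVAL '1 day');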
Let’s look at what’s happening under the hood.
If you were using the traditional native method to create Postgres partitions, you would have to go through all these steps to set up partitioning for sensor_data: create a parent table declared with PARTITION BY, define a child table for each time range, recreate your indexes on every partition, and set up a mechanism to create new partitions as data keeps arriving.
Each one of these steps comes with its own chunk of code. You’ll likely need extra tooling like pg_partman plus a cron scheduler, you’ll have to monitor each step for potential issues, and you’ll have to make manual adjustments along the way. Overall, you’ll create significant maintenance overhead for yourself.
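To make the contrast concrete, here’s a minimal sketch of what the native route looks like for sensor_data (partition names and ranges are illustrative):
-- Parent table declared with range partitioning
CREATE TABLE sensor_data (
device_id INT NOT NULL,
event_time TIMESTAMPTZ NOT NULL,
temperature FLOAT NOT NULL,
humidity FLOAT NOT NULL
) PARTITION BY RANGE (event_time);
-- One child table per time range, indexed by hand...
CREATE TABLE sensor_data_2024_01 PARTITION OF sensor_data
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE INDEX ON sensor_data_2024_01 (device_id, event_time);
-- ...and repeated for every new time range, typically via pg_partman and cron
CREATE TABLE sensor_data_2024_02 PARTITION OF sensor_data
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
CREATE INDEX ON sensor_data_2024_02 (device_id, event_time);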
What hypertables do instead is encapsulate and automate all these steps, significantly reducing the complexity, manual effort, and potential for errors on your end:
With hypertables, there’s no need to manually create a parent table or define child tables for each time range. You simply convert your existing table into a hypertable.
Hypertables also simplify indexing. When you create an index on a hypertable, Timescale automatically creates the corresponding indexes on all current and future partitions, ensuring consistent query performance without manual adjustments.
Hypertables automatically create new partitions on the fly based on the specified time interval. As new data is ingested, appropriate partitions are ready to store the data without manual intervention or scheduled jobs.
Timescale maintains its own partition catalogs and implements its own minimized locking strategy to ensure that your application’s read or write operations are never blocked by the underlying partitioning operations.
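If you’re curious about the partitions (Timescale calls them chunks) being created under the hood, you can list them directly:
-- List the chunks backing the sensor_data hypertable
SELECT show_chunks('sensor_data');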
Once your PostgreSQL table becomes a hypertable, you can keep querying it as usual. You will instantly experience a performance boost. When you execute a query, Timescale’s query planner intelligently routes the query to the appropriate partition(s), ensuring that only relevant data is scanned. This process remains completely transparent; you don't need to think about it or worry about which partition contains which data.
Something similarly straightforward happens when you ingest data. Timescale will take care of routing your new data to the appropriate partition under the hood, ensuring that each partition remains optimally sized.
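A quick way to see this partition pruning in action is EXPLAIN; the resulting plan will only include the chunks that can contain matching rows:
-- Only recent chunks show up in the plan
EXPLAIN
SELECT avg(temperature)
FROM sensor_data
WHERE event_time > NOW() - INTERVAL '1 day';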
Hypertables make partitioning seamless and unlock a wealth of features that will help you improve your PostgreSQL performance even further and save you time when managing your data.
A few examples:
Columnar compression for faster queries and cheaper storage. By enabling compression, older chunks are converted to a columnar format, dramatically shrinking your storage footprint while speeding up many analytical queries (see the sketch after this list).
Blazing-fast analytical views. Continuous aggregates automatically refresh and store aggregated data, enabling you to build fast visualizations, including real-time insights and historical analytics that go back in time.
Easy and configurable data retention. Hypertables allow you to set up automatic data retention policies with one simple command (see the sketch after this list).
SQL hyperfunctions to run analytics with fewer lines of code. Hypertables come with a full set of hyperfunctions, SQL functions (like time_bucket) that simplify common time-series analytics.
Faster DISTINCT and now() queries. SELECT DISTINCT queries run faster on hypertables, and queries that reference now() can skip partitions that cannot contain matching rows when pruning partitions.
Built-in job scheduler. There’s no need to reach for external tools like pg_cron: Timescale includes its own job scheduler that you can use to automate maintenance and custom tasks.
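Here’s a quick sketch of some of these features applied to the sensor_data table from earlier (the intervals are just examples):
-- Enable columnar compression, segmenting compressed data by device
ALTER TABLE sensor_data SET (
timescaledb.compress,
timescaledb.compress_segmentby = 'device_id'
);
-- Automatically compress chunks older than seven days
SELECT add_compression_policy('sensor_data', INTERVAL '7 days');
-- Automatically drop raw data older than six months
SELECT add_retention_policy('sensor_data', INTERVAL '6 months');
-- A hyperfunction in action: hourly average temperature per device
SELECT device_id,
time_bucket('1 hour', event_time) AS hour,
AVG(temperature) AS avg_temp
FROM sensor_data
GROUP BY device_id, hour;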
In sum, if you plan to partition your PostgreSQL tables by time, you’ll surely benefit from hypertables. But who doesn’t love some concrete use-case examples?
Let’s paint a few scenarios where hypertables would be most useful. Needless to say, this is not a comprehensive list!
An engineering team at a leading energy company is tasked with managing the data from a newly installed smart grid. The grid is a big investment for the company, which now has granular insights into energy consumption, distribution efficiency, and grid health metrics. Elements in the smart grid generate thousands of energy metrics per second that need to be properly collected, analyzed, and managed.
These energy metrics are currently stored in PostgreSQL, but the engineering team has to figure out the best solution to ingest this high-velocity data efficiently without losing granularity or accuracy. They must also ensure they can query this data quickly for real-time monitoring and analysis.
This would be an ideal use case for Timescale:
-- Creating a table to store energy metrics from the smart grid
CREATE TABLE energy_metrics (
element_id INT NOT NULL,
event_time TIMESTAMPTZ NOT NULL,
voltage DECIMAL NOT NULL,
current DECIMAL NOT NULL,
frequency DECIMAL NOT NULL,
PRIMARY KEY(element_id, event_time)
);
-- Converting the energy_metrics table into a hypertable
SELECT create_hypertable('energy_metrics', 'event_time');
-- Sample query to ingest new metrics data into the hypertable
INSERT INTO energy_metrics (element_id, event_time, voltage, current, frequency)
VALUES (1, NOW(), 210.5, 10.7, 50.01);
-- Sample query to retrieve the latest energy metrics for real-time monitoring
SELECT * FROM energy_metrics
WHERE element_id = 1
ORDER BY event_time DESC
LIMIT 10;
An industrial manufacturing company operates a range of heavy machinery and equipment in its facilities. Each piece of machinery is equipped with sensors that continuously monitor and log temperature data in a sensor_data table to ensure optimal performance and safety.
The company needs its PostgreSQL database to achieve two distinct yet critical objectives: real-time monitoring to catch temperature anomalies as they happen, and historical analysis to study long-term temperature trends.
The team decides to turn sensor_data into a hypertable. To facilitate real-time monitoring, they create a continuous aggregate that calculates the average temperature for every piece of machinery, refreshed every minute:
CREATE MATERIALIZED VIEW real_time_avg_temp
WITH (timescaledb.continuous) AS
SELECT device_id,
time_bucket('1 minute', event_time) AS one_min,
AVG(temperature) AS avg_temp
FROM sensor_data
GROUP BY device_id, one_min;
-- Refresh the aggregate every minute
SELECT add_continuous_aggregate_policy('real_time_avg_temp',
start_offset => INTERVAL '10 minutes',
end_offset => INTERVAL '1 minute',
schedule_interval => INTERVAL '1 minute');
With real_time_avg_temp, the maintenance team has immediate access to the average temperature of every machinery piece, enabling swift responses to temperature anomalies and preventing potential breakdowns.
For historical analysis, the team creates another continuous aggregate view, this time aggregating daily average temperatures:
CREATE MATERIALIZED VIEW daily_avg_temp
WITH (timescaledb.continuous) AS
SELECT device_id,
time_bucket('1 day', event_time) AS one_day,
AVG(temperature) AS avg_temp
FROM sensor_data
GROUP BY device_id, one_day;
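Like the real-time view, the daily view needs a refresh policy to update automatically (a sketch; the offsets are just examples):
-- Refresh the daily aggregate once an hour
SELECT add_continuous_aggregate_policy('daily_avg_temp',
start_offset => INTERVAL '3 days',
end_offset => INTERVAL '1 hour',
schedule_interval => INTERVAL '1 hour');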
Both views (real_time_avg_temp and daily_avg_temp) feed into a monitoring dashboard. The maintenance team gets alerted to potential issues as they arise. At the same time, the team can review historical temperature trends, conduct analyses to predict when machinery might need maintenance, and optimize operational protocols to mitigate excessive temperature fluctuations.
An environmental research institute is collecting and analyzing many TBs of weather data to study climate change. The team already knows PostgreSQL, so they want to stick to it—but the storage cost is becoming a concern.
The team decides to start using Timescale. After optimizing their database to reduce storage use and enabling compression, their storage costs become a fraction of what they were, with the data remaining fully accessible for analysis.
CREATE TABLE weather_data (
sensor_id INT NOT NULL,
event_time TIMESTAMPTZ NOT NULL,
temperature DECIMAL NOT NULL,
humidity DECIMAL NOT NULL,
pressure DECIMAL NOT NULL,
PRIMARY KEY(sensor_id, event_time)
);
-- Conversion to hypertable
SELECT create_hypertable('weather_data', 'event_time');
-- Enabling compression
ALTER TABLE weather_data
SET (
timescaledb.compress,
timescaledb.compress_segmentby = 'sensor_id'
);
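Note that ALTER TABLE only enables compression; to actually compress chunks as they age, the team would add a policy (a sketch; the 30-day interval is just an example):
-- Automatically compress chunks older than 30 days
SELECT add_compression_policy('weather_data', INTERVAL '30 days');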
-- Sample query
SELECT * FROM weather_data
WHERE sensor_id = 1
AND event_time > NOW() - INTERVAL '1 month'
ORDER BY event_time DESC
LIMIT 10;
A new crypto exchange is grappling with the challenge of providing real-time analytics to traders. As the data volume stored in the underlying PostgreSQL database increases, the engineering team struggles to keep the database fast enough. To them, it’s essential to deliver a better user experience than the competition, which has a more established but slower product. Keeping up the speed and responsiveness of their portal is paramount.
The team knows that by partitioning their large pricing table, they’ll most likely improve query performance. Since they’re already swamped, instead of attempting to manage partitioning themselves, the engineers decide to implement Timescale.
Once turned into a hypertable, their pricing table automatically partitions the data as it comes in. Real-time analytics, previously marred by delays, are now swift and accurate.
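A minimal sketch of what this could look like (the price_ticks table and its columns are hypothetical):
-- Hypothetical pricing table for the exchange
CREATE TABLE price_ticks (
symbol TEXT NOT NULL,
event_time TIMESTAMPTZ NOT NULL,
price NUMERIC NOT NULL,
volume NUMERIC NOT NULL
);
-- Convert it into a hypertable partitioned by time
SELECT create_hypertable('price_ticks', 'event_time');
-- Real-time analytics: the last minute of ticks for one symbol
SELECT * FROM price_ticks
WHERE symbol = 'BTC/USD'
AND event_time > NOW() - INTERVAL '1 minute'
ORDER BY event_time DESC;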
PostgreSQL partitioning is a powerful tool for managing large tables. On its own, Postgres partitioning can be complex to implement and maintain, but Timescale's hypertables make the whole process seamless and automatic. The best part is that by using hypertables, you’ll unlock a myriad of other awesome features (like columnar compression and automatic materialized views) that will make it even easier to scale your PostgreSQL database.
If you're ready to explore Timescale's hypertables, give them a try.
Written by Carlota Soto.