I created and restored pg_dump backups 21 times: 4 different formats, using from 1 to 7 compression levels per format (21 combinations in total). I recorded the results and compared them to understand which methods are most effective for my use case. Details and measurements are below.

Table of contents:

- Why do I need this measurement?
- Backup formats and compression types in pg_dump
- PostgreSQL configuration
- Data preparation
- Measurement results
- Conclusions based on measurements
- Conclusion

Why do I need this measurement?

I had a very specific task: to find the best backup format using the standard pg_dump. "Best" means the optimal ratio of backup creation speed, restore speed and final file size. I used this information in my open source project for PostgreSQL backups called Postgresus.

There were additional requirements:

- the backup must be compressed before being sent to my server, to minimize network usage;
- the backup must be a single file (rather than, for example, a directory) so it can be streamed to a disk, S3 or the cloud;
- creating a backup must not require any database configuration in advance (which ruled out PgBackRest, WAL-G and pg_basebackup), so that it is easy to integrate into an open source project and works with any database (installed locally, in Docker, in a DBaaS, with a read replica, etc.).

Backup formats and compression types in pg_dump

pg_dump supports 4 formats:

Plain (SQL):
❌ No compression.
✅ Produces a single file.
❌ Does not support parallel backup.
❌ Does not support parallel restore.

Custom (-Fc):
✅ Compression is enabled.
✅ Produces a single file.
❌ Does not support parallel backup.
✅ Supports parallel restore.

Directory (-Fd):
✅ Compression is enabled.
❌ Not a single file (outputs a directory of files).
✅ Supports parallel backup.
✅ Supports parallel restore.

Tar (-Ft):
❌ No compression.
✅ Produces a single file (tar archive).
❌ Does not support parallel backup.
❌ Does not support parallel restore.
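For reference, here is roughly how each format is produced. This is a sketch with placeholder database and output names, not the exact commands from my benchmark:

```bash
# Plain (SQL): a single .sql file, no parallelism
pg_dump -Fp -f backup.sql mydb

# Custom: a single compressed file; restore can be parallelized with pg_restore -j
pg_dump -Fc -f backup.dump mydb

# Directory: a directory of files; both dump (-j) and restore can run in parallel
pg_dump -Fd -j 4 -f backup_dir mydb

# Tar: a single tar archive, no compression and no parallelism
pg_dump -Ft -f backup.tar mydb
```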
I was most interested in the custom and directory formats, because they support parallel processing. The custom format cannot create a backup in parallel (only restore in parallel), but it writes everything to a single file. The directory format can both back up and restore in parallel, but writes everything to a directory.

For these formats, the following compression methods are supported:

gzip:
- the standard compression algorithm;
- compression speed: medium; decompression speed: high;
- compression ratio: 2–3×.

lz4:
- an algorithm tuned for much higher speed than gzip;
- compression and decompression speed: very high;
- compression ratio: 1.5–2×.

zstd:
- a relatively new (2016) algorithm from Facebook/Meta that balances speed and compression;
- compression speed: high; decompression speed: high;
- compression ratio: 3–5×.

The compression characteristics above are based on perfectly prepared data. With a database, compression does not take up 100% of the time at 100% CPU utilization: there are many database-specific operations that are likely to slow compression down (especially "on the fly").

Before the test, I assumed that the custom format with gzip compression would be the best option for me (as a happy medium) and had cautious hopes for zstd (as a more modern algorithm). By the way, zstd support in pg_dump is relatively recent: it was added in PostgreSQL 16.

PostgreSQL configuration

I launched two PostgreSQL instances in Docker Compose: one for creating backups (with data) and one for restoring from them. I didn't use the standard ports because they are already taken by my local PostgreSQL installations.

docker-compose.yml:

```
version: "3.8"

services:
  db:
    image: postgres:17
    container_name: db
    environment:
      POSTGRES_DB: testdb
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: testpassword
    ports:
      - "7000:7000"
    command: -p 7000
    volumes:
      - ./pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres -d testdb -p 7000"]
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped

  db-restore:
    image: postgres:17
    container_name: db-restore
    environment:
      POSTGRES_DB: testdb
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: testpassword
    ports:
      - "7001:7001"
    command: -p 7001
    volumes:
      - ./pgdata-restore:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres -d testdb -p 7001"]
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped
    depends_on:
      - db
```
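With both instances up, the benchmark essentially boils down to commands along these lines. This is a simplified sketch rather than my exact benchmark script; the --compress=method:level syntax requires PostgreSQL 16+ client tools:

```bash
export PGPASSWORD=testpassword

# Back up the source instance (port 7000): custom format, zstd level 5
pg_dump -h localhost -p 7000 -U postgres -Fc --compress=zstd:5 \
  -f testdb.dump testdb

# The same idea with the directory format, which also parallelizes the dump itself
pg_dump -h localhost -p 7000 -U postgres -Fd --compress=zstd:5 -j 4 \
  -f testdb_dir testdb

# Restore into the second instance (port 7001) with 4 parallel jobs
pg_restore -h localhost -p 7001 -U postgres -d testdb -j 4 testdb.dump
```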
Then I updated postgresql.conf a little so the database could use more of the machine's resources. I have an AMD Ryzen 9 7950X (16 cores, 32 threads), 64 GB of RAM and a 1 TB NVMe drive. I configured the database to use 4 CPUs and 16 GB of RAM via PgTune.

postgresql.conf:

```
# DB Version: 17
# OS Type: linux
# DB Type: web
# Total Memory (RAM): 16 GB
# CPUs num: 4
# Connections num: 100
# Data Storage: ssd

max_connections = 100
shared_buffers = 4GB
effective_cache_size = 12GB
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 40329kB
huge_pages = off
min_wal_size = 1GB
max_wal_size = 4GB
max_worker_processes = 4
max_parallel_workers_per_gather = 2
max_parallel_workers = 4
max_parallel_maintenance_workers = 2
listen_addresses = '*'
```

Data preparation

To begin with, I created a database with 3 tables and 10 indexes, with a total size of ~11 GB. The data is as diverse as possible. I am more than sure that pg_dump works better with some types of data and worse with others, but my project is aimed at a wide audience, so it is important to measure the "average across the board." The structure is described below; a rough DDL sketch follows the index list.

Tables:

- large_test_table: employees and users with various data types; 18 columns including name, email, address, salary, etc.
- orders: order data and its change history; 13 columns including user_id, order_number, amounts, etc.
- activity_logs: user activity logs with large text fields; 13 columns including user_id, action, details, timestamps, etc.

Indexes:

- large_test_table — idx_large_test_name on name: 🔍 search by name.
- large_test_table — idx_large_test_email on email: 🔍 search by email.
- large_test_table — idx_large_test_created_at on created_at: 🕒 search by time range.
- large_test_table — idx_large_test_department on department: 🗂 filter by department.
- orders — idx_orders_user_id on user_id: 🔍 look up a user's orders.
- orders — idx_orders_order_date on order_date: 🕒 search by order date/time.
- orders — idx_orders_status on status: 🔍 search by status.
- activity_logs — idx_activity_user_id on user_id: 🔍 search by user ID.
- activity_logs — idx_activity_timestamp on timestamp: 🕒 search by date.
- activity_logs — idx_activity_action on action: 🔍 search by action.
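To make the structure more concrete, here is a rough sketch of what one of the tables and its indexes might look like. The article only lists column names, so the types below are my assumptions, and most of the 13 columns are omitted:

```bash
export PGPASSWORD=testpassword

psql -h localhost -p 7000 -U postgres -d testdb <<'SQL'
-- Rough sketch of one of the three tables; column types are assumptions
-- and most columns are omitted for brevity.
CREATE TABLE orders (
    id           bigserial PRIMARY KEY,
    user_id      bigint      NOT NULL,
    order_number text        NOT NULL,
    status       text        NOT NULL,
    amount       numeric(12, 2),
    order_date   timestamptz NOT NULL DEFAULT now()
);

-- The three indexes listed above for this table
CREATE INDEX idx_orders_user_id    ON orders (user_id);
CREATE INDEX idx_orders_order_date ON orders (order_date);
CREATE INDEX idx_orders_status     ON orders (status);
SQL
```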
The data is generated and inserted into the database by a Python script. The algorithm is as follows (a simplified sketch of this loading approach is shown after the list):

- 25,000 rows of data are generated at a time;
- 100,000 rows are inserted into each table in turn using COPY;
- when the database reaches 10 GB in size, the indexes are created.
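I'm not publishing the full generation script here, but the loading loop is conceptually similar to the sketch below. The column list and generated values are placeholders; the real Python script generates far more varied data and fills all three tables:

```bash
#!/usr/bin/env bash
# Conceptual sketch of the COPY-based loading loop (not the original Python script).
export PGPASSWORD=testpassword

target_bytes=$((10 * 1024 * 1024 * 1024))  # stop loading data at ~10 GB

while true; do
  # Generate a batch of synthetic rows as CSV (placeholder columns and values)
  seq 1 25000 | awk -v OFS=',' \
    '{ print $1, "action_" ($1 % 50), "synthetic details for row " $1 }' > batch.csv

  # Bulk-load the batch with COPY, which is much faster than row-by-row INSERTs
  psql -h localhost -p 7000 -U postgres -d testdb \
       -c "\copy activity_logs (user_id, action, details) FROM 'batch.csv' CSV"

  # Check the current database size and stop once it reaches the target
  db_size=$(psql -h localhost -p 7000 -U postgres -d testdb -At \
            -c "SELECT pg_database_size('testdb')")
  [ "$db_size" -ge "$target_bytes" ] && break
done

# Indexes are created only after the data is loaded, as described in the list above
psql -h localhost -p 7000 -U postgres -d testdb \
     -c "CREATE INDEX idx_activity_user_id ON activity_logs (user_id)"
```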
Measurement results

After 21 backup creations and restores, I obtained a table with the following data:

- backup creation time;
- restore time;
- total time;
- backup size relative to the original database size.

The table with the raw CSV data:

```
tool,format,compression_method,compression_level,backup_duration_seconds,restore_duration_seconds,total_duration_seconds,backup_size_bytes,database_size_bytes,restored_db_size_bytes,compression_ratio,backup_success,restore_success,backup_error,restore_error,timestamp
pg_dump,plain,none,0,100.39210295677185,735.2188968658447,835.6109998226166,9792231003,11946069139,11922173075,0.8197031918249641,True,True,,,2025-07-29T09:56:20.611844
pg_dump,custom,none,0,264.56927490234375,406.6467957496643,671.216070652008,6862699613,11946069139,11943709843,0.5744734550878778,True,True,,,2025-07-29T10:07:37.226681
pg_dump,custom,gzip,1,214.07211470603943,383.0168492794037,597.0889639854431,7074031563,11946069139,11943611539,0.5921639562511493,True,True,,,2025-07-29T10:17:39.801883
pg_dump,custom,gzip,5,260.6179132461548,393.76623010635376,654.3841433525085,6866440205,11946069139,11943718035,0.5747865783384196,True,True,,,2025-07-29T10:28:40.167485
pg_dump,custom,gzip,9,272.3802499771118,385.1409020423889,657.5211520195007,6856264586,11946069139,11943619731,0.5739347819121977,True,True,,,2025-07-29T10:39:42.912960
pg_dump,custom,lz4,1,84.0079517364502,379.6986663341522,463.7066180706024,9146843234,11946069139,11943685267,0.765678075990583,True,True,,,2025-07-29T10:47:32.131593
pg_dump,custom,lz4,5,150.24981474876404,393.44346714019775,543.6932818889618,8926348325,11946069139,11943718035,0.7472205477078983,True,True,,,2025-07-29T10:56:41.333595
pg_dump,custom,lz4,12,220.93980932235718,418.26913809776306,639.2089474201202,8923243046,11946069139,11943767187,0.7469606062188722,True,True,,,2025-07-29T11:07:26.574678
pg_dump,custom,zstd,1,87.83108067512512,419.07846903800964,506.90954971313477,6835388679,11946069139,11943767187,0.5721872692570225,True,True,,,2025-07-29T11:15:59.917828
pg_dump,custom,zstd,5,102.42366409301758,413.64263129234314,516.0662953853607,6774137561,11946069139,11944357011,0.567059966100871,True,True,,,2025-07-29T11:24:42.075008
pg_dump,custom,zstd,15,844.7868592739105,388.23959374427795,1233.0264530181885,6726189591,11946069139,11943636115,0.5630462633973209,True,True,,,2025-07-29T11:45:17.885163
pg_dump,custom,zstd,22,5545.566084384918,404.1370210647583,5949.7031054496765,6798947241,11946069139,11943750803,0.5691367731000038,True,True,,,2025-07-29T13:24:30.014902
pg_dump,directory,none,0,114.9900906085968,395.2716040611267,510.2616946697235,6854332396,11946069139,11943693459,0.5737730391684116,True,True,,,2025-07-29T13:33:05.944191
pg_dump,directory,lz4,1,53.48561334609985,384.92091369628906,438.4065270423889,9146095976,11946069139,11943668883,0.7656155233641663,True,True,,,2025-07-29T13:40:30.590719
pg_dump,directory,lz4,5,83.44352841377258,410.42058181762695,493.86411023139954,8925601067,11946069139,11943718035,0.7471579950814815,True,True,,,2025-07-29T13:48:50.201990
pg_dump,directory,lz4,12,114.15110802650452,400.04946303367615,514.2005710601807,8922495788,11946069139,11943758995,0.7468980535924554,True,True,,,2025-07-29T13:57:30.419171
pg_dump,directory,zstd,1,57.22735643386841,414.4600088596344,471.6873652935028,6835014976,11946069139,11943750803,0.5721559867493079,True,True,,,2025-07-29T14:05:28.529630
pg_dump,directory,zstd,5,60.121564865112305,398.27933716773987,458.4009020328522,6773763858,11946069139,11943709843,0.5670286835931563,True,True,,,2025-07-29T14:13:13.472761
pg_dump,directory,zstd,15,372.43965554237366,382.9877893924713,755.427444934845,6725815888,11946069139,11943644307,0.5630149808896062,True,True,,,2025-07-29T14:25:54.580924
pg_dump,directory,zstd,22,2637.47145485878,394.4939453601837,3031.9654002189636,6798573538,11946069139,11943660691,0.5691054905922891,True,True,,,2025-07-29T15:16:29.450828
pg_dump,tar,none,0,126.3212628364563,664.1294028759003,790.4506657123566,9792246784,11946069139,11942759571,0.8197045128452776,True,True,,,2025-07-29T15:29:45.280592
```

I had to exclude the zstd level 15 and zstd level 22 results from the graphs: they distorted the charts significantly because of the very long compression time, while providing no noticeable gain in compression.

So, on to the measurements.

Backup speed in seconds (lower is better):

Restore speed in seconds (lower is better):

Total time for creating and restoring a backup in seconds (lower is better):

Total backup size as a percentage of the original database size (smaller is better):

Conclusions based on measurements

Before talking about conclusions, I would like to make an important disclaimer: the test was performed on synthetic data, and the results on real-world data will differ significantly. The trend will be the same, but the time differences will be larger.

The measurements show that there is no radical difference in speed between the plain, custom and directory formats on synthetic data.
This is despite the fact that the custom format restores in parallel (unlike plain) and the directory format also creates the backup itself in parallel. The difference in total time between the custom format and plain is ~30%; between the custom and directory formats it is only ~20%. I suspect the test data lacked a sufficient number of independent tables and objects; otherwise the gap between the formats would have been several times larger.

So, based on the measurements, I can draw the following conclusions:

1. **The fastest backup format is the directory format.** The custom format is faster than plain and tar in total time under any circumstances, and the directory format is faster than custom under any circumstances (when parallel mode is enabled, of course). Moreover, in terms of backup creation speed, the directory format is more than twice as fast as the custom format. This is very important, considering that we make backups more often than we restore from them.

2. **The best "speed to compression" ratio was zstd with compression level 5.** In backup creation speed it is beaten only by the uncompressed formats. In restore speed it is on average ~4% slower than the other options (within the error margin on synthetic data). At the same time, its compression is comparable to gzip at level 9, while outperforming it in speed by ~18%.

3. **zstd 15 and zstd 22 turned out to be useless in this particular test.** They compressed roughly as well as gzip 9 but took 2–8× longer to produce the result. I think that on a real (non-synthetic) database of at least 1 TB, with cold storage, they would show completely different results and could turn out to be very cost-effective (especially if you need to make hundreds of backups and store them for a long time).

Conclusion

The measurements showed that the optimal format for my task is the custom format with zstd compression at level 5.
I get the maximum total speed with almost maximum compression and a single backup file. After switching the project from gzip level 6 to zstd level 5, the backup size was reduced by almost half with a slightly shorter backup time. And unlike the synthetic data, a real 4.7 GB database was compressed to 276 MB (17 times smaller!).

I hope this test will be useful to those who build backup tools or regularly dump databases with pg_dump scripts. Perhaps in the future I will run the same test with a more diverse data set.

And once again: if you need to create regular backups, I have an open source project for this task, Postgresus. I would be extremely grateful for a star on GitHub ❤️, as the first stars are hard to come by.