As the leading fulfilment platform for digital commerce in India, Delhivery fulfils a million packages a day, 365 days a year. Its 24 automated sort centres, 101 hubs, 3,100+ direct delivery centres, 1000+ partner centres, 11,000+ fleet, and 60,000+ team members run smoothly thanks to a vast network of IoT devices. Thousands of data events and messages are coming in and going out of our pipelines each second. This amounts to a massive daily data volume in terabytes, which makes operational visibility crucial for us and our stakeholders.
Recognising the requirements, we decided to build data marts—centralised, eventually consistent databases that offer users quick access to pre-aggregated business data. This allows our stakeholders to quickly access business insights without searching through an entire data warehouse.
However, with this daunting scale, one of the major challenges was to maintain data integrity and low latency while providing the capacity for analytical workloads.
In this blog, I am going to unpack all of my leanings while migrating our data marts from Amazon Aurora to TiDB, a Hybrid Transactional/Analytical Processing (HTAP), distributed SQL database. Hopefully, this post can provide insights to data engineering leaders, database administrators, or data architects who are considering a similar migration to TiDB or any other HTAP database.
To better understand the real-time data marts case at Delhivery, let’s first get familiar with three concepts that are at the core of our use case: OLTP, OLAP & HTAP:
Real-time data marts differ from traditional data marts in that they ingest data in real time, not at specific intervals. These data marts are critical for ground operational decision-making at Delhivery because we can't afford any delay in synchronising these events.
Our real-time data mart journey began in 2020 when we identified a need for centralised dashboards—specifically the EYE dashboard. This dashboard's purpose was to provide real-time operational visibility to ground operations, enabling decision-making based on up-to-the-minute data. Examples of usages include:
We thought of solving our use cases using data warehouse tools like Redshift, and Snowflake but none of these solutions worked for us considering the design pattern and requirement for real-time data ingestion along with merge.
Thus, we initially chose Aurora (PostgreSQL) to serve our data mart use case.
We architected our real-time data marts using Spark Streaming and Aurora. Our steaming pipeline was very simple--reading data from Kafka, processing data in Spark micro batches, and performing upsert operations in Aurora.
Our database was modelled using a multi-layered architecture, which consists of a raw layer, a partitioned layer, and a data marts layer. Users did not have access to view or modify data in the raw layer. The partitioned layer is kept to maintain all partitioned tables (dimension tables generally). Below is a simple schema design of our database:
The system initially performed well, until it had to handle a throughput beyond 3K messages per second. This marked the onset of several challenges:
Scalability limitation: As we exceeded a throughput of 3K messages per second, Aurora's Input/Output Operations Per Second (IOPS) limitations became a bottleneck. The scalability constraint had started impacting our operations.
Bloating issue: Every record update led to the creation of a new record and a dead tuple (previous version of the record). When the production rate of these dead tuples outpaced the cleanup process, bloating occurred. Since VACUUM FULL was not able to claim the storage, the disk usage continuously increased. For roughly 5 TB of data, Aurora was using 30+ TB of storage.
Maintenance burden: The bloating issue is directly tied to our maintenance challenges. With over 70 pipelines and a total write QPS surpassing 5k messages/second, we found PostgreSQL's auto cleanup process, Auto Vacuum, failed to keep pace with the rate of dead tuple generation. Therefore, manually running VACUUM or VACUUM FULL is required to recover the database. Our attempts with PostgreSQL tools like pg_repack and pgcompacttable also proved unsuccessful. Consequently, maintenance became increasingly complex and time-consuming.
To resolve Aurora’s limitations, we set out to find a better alternative that met the following requirements:
Considering all the above requirements, we initially explored many PostgreSQL alternatives including Spanner and Yugabyte because we wanted to keep our change management minimal.
Spanner is a distributed SQL database management and storage service offered by Google. It is fully managed on the Google Cloud Platform (GCP). However, we found that Spanner might not be a good use case for our architecture due to the following reasons:
YugabyteDB is a high-performance transactional distributed SQL database for cloud-native applications, developed by Yugabyte. This database is very close to our use case because it was fully PostgreSQL compliant, horizontally scalable, and fully distributed. Unfortunately, it didn’t work as well because of its limitation with scalability, Our success criteria demanded 7k+ transactions per second but Yugabyte was only able to scale up to 5k.
We also looked into other possible candidates like BigQuery, but none of them served our requirements well.
After the above PostgreSQL alternatives, we decided to add HTAP to our requirements, which led us to TiDB. It supports out-of-the-box scalability, consistency, availability, multi-site deployment topology, and many more features. As a distributed database, TiDB has multiple components that communicate with each other and form a complete TiDB system.
The following features of TiDB addressed our key challenges and met our operational requirements:
Easy scaling
The TiDB architecture design separates computing from storage, letting you scale out or scale in the computing or storage capacity online as needed. The scaling process is transparent to application operations and maintenance staff.
ACID compliant
TiDB is MySQL-compliant and supports transactions out of the box. It supports both optimistic & pessimistic types of transactions. This makes it unique from other databases.
Highly available
TiKV stores data in multiple replicas and uses the Multi-Raft protocol to obtain the transaction log. A transaction can only be committed when the data has been successfully written into the majority of replicas. This guarantees strong consistency and high availability when a minority of replicas go down.
Real-time HTAP
TiDB combines both row storage (TiKV) and columnar storage (TiFlash) in the same architecture, forming a streamlined tech stack that makes it easier to produce real-time analytics on operational data.
Our TiDB infrastructure is deployed over the VMs of leading cloud service providers. We use TiUP, TiDB’s package manager, to manage the cluster and all the administrative operations. Our cluster is deployed over 3 available zones (AZs).
Our cluster configurations are as follows:
By deploying our TiDB cluster across multiple AZs and carefully selecting node types to meet our processing and memory needs, we've created a robust, highly available infrastructure capable of handling our high data throughput requirements.
To make it work for our use case, we worked closely with the PingCAP team to tune the database. Here are some of the critical adjustments we made:
Set the following parameters before starting the index.
SET @@global.tidb_ddl_reorg_worker_cnt = 16;
SET @@global.tidb_ddl_reorg_batch_size = 4096;
Reset to default values after index creation.
SET @@global.tidb_ddl_reorg_worker_cnt = 4;
SET @@global.tidb_ddl_reorg_batch_size = 256;
This is mainly important for partitioned tables. It analyses the filter conditions in query statements and eliminates (prunes) partitions when they do not contain any required data.
SET @@session.tidb_partition_prune_mode = 'dynamic';
Sometimes the auto analyser in TiDB fails if a high volume of data is ingested. In that case, all the queries might use the wrong execution plan and end up scanning the full table. To avoid such a situation we made the following changes in TiDB configurations:
set global tidb_max_auto_analyze_time = 86400;
set global tidb_enable_pseudo_for_outdated_stats = off;
set global tidb_sysproc_scan_concurrency = 15;
If you are working with partitioned tables, we suggest you run analyze table operations manually for one partition at a time to avoid analyzing failures.
Through adjustments like these, we were able to effectively streamline our use of TiDB, so that we can achieve an optimal performance for our real-time data mart.
Improved queries performance
We have bench-marked 400+ queries and found that all the queries are running within SLA. We have even seen a 15-20% performance gain of P95 queries.
Easy migration
We used the TiDB Lighting tool to migrate all of our table's historical data from Postgres to TiDB. This tool is very easy to use and very fast. We were able to load terabytes of data within roughly 2-3 hours. However, it’s worth noting that there is a lot of tuning required before loading such huge data.
Strong support
We went through a couple of hiccups during the production infrastructure setup but the PingCAP support team played a very crucial role and helped us tune the cluster for the nature of the workload.
In this post, we have explored the challenges of using Aurora with our use case of real-time data marts and the migration journey to TiDB. We also discussed how Delhivery is using TiDB at scale.
Despite our success with TiDB, we acknowledge that no solution is perfect, and effectiveness can vary depending on the use case. In TiDB, we noted a couple of areas for improvement, including the lack of out-of-the-box support for materialized views and native quota management. However, with appropriate workarounds and adjustments, we have managed to address these limitations effectively.
So far, we have deployed TiDB in our production environment. Based on our benchmarks, TiDB enables us to handle over thousands of requests per second with less than 100ms latency. Moving forward, we will continue to explore more use cases that require a robust, consistently distributed database.
https://docs.pingcap.com/tidb/stable/tidb-lightning-overview
https://reorg.github.io/pg_repack/
https://github.com/dataegret/pgcompacttable
https://cloud.google.com/spanner
https://www.yugabyte.com/yugabytedb/
https://cloud.google.com/bigquery/
https://docs.pingcap.com/tidb/dev/transaction-overview
https://proxysql.com/
Hari Kishan (Senior Engineering Manager @ Delhivery)
Akash Deep Verma (Director of Technology @ Delhivery)