Amazon Aurora DSQL is a distributed SQL database launched by Amazon Web Services in December 2024. It is designed for building applications with virtually unlimited scalability, high availability, and zero infrastructure management, and it offers a serverless architecture, PostgreSQL compatibility, fault tolerance, and strong security.
Because Aurora DSQL's authentication mechanism integrates with IAM, connecting to the database requires generating an authentication token from an IAM identity. These tokens are valid for only 15 minutes by default, and since most mainstream data synchronization tools expect a static password, many of them currently cannot migrate data from other databases to Aurora DSQL.
In response, the author developed a dedicated sink Connector for Aurora DSQL based on the Apache SeaTunnel data integration tool, enabling seamless migration of data from other databases to Aurora DSQL.
Introduction to SeaTunnel
SeaTunnel is an easy-to-use, multi-modal, high-performance, distributed data integration platform that focuses on data integration and synchronization and aims to solve common problems in the data integration field.
Key Features of SeaTunnel
- Rich and Extensible Connectors: SeaTunnel currently supports over 190 connectors, and the number continues to grow. Mainstream databases such as MySQL, Oracle, SQL Server, and PostgreSQL are already supported, and the plugin-based design lets users develop and integrate custom connectors easily.
- Batch and Stream Integration: Connectors developed with the SeaTunnel Connector API are fully compatible with offline synchronization, real-time synchronization, full-load, and incremental sync scenarios, greatly simplifying task management.
- Distributed Snapshots: Supports distributed snapshot algorithms to ensure data consistency.
- Multi-Engine Support: SeaTunnel uses the SeaTunnel engine (Zeta) by default for data synchronization. It also supports Flink or Spark as execution engines, adapting to existing enterprise technology stacks, with support for multiple versions.
- JDBC Reuse and Multi-Table Log Parsing: SeaTunnel supports multi-table or full-database synchronization, reducing excessive JDBC connections. It also supports log parsing for multiple tables to handle CDC scenarios efficiently.
- High Throughput, Low Latency: Parallel read/write operations provide stable, high-throughput, and low-latency data synchronization.
- Comprehensive Real-Time Monitoring: Monitors every step of data synchronization, showing data volume, size, and QPS.
SeaTunnel Workflow
In a typical SeaTunnel job, the user configures the job information and selects an execution engine. Source connectors then read the source data in parallel and send it downstream, either to a Transform stage or directly to the Sink, which writes the data to the destination.
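These stages map directly onto the blocks of a SeaTunnel job configuration file; a minimal skeleton (the connector names here are placeholders) looks like:

```
env {
  # execution settings: parallelism, job mode, checkpointing
}

source {
  SomeSource {
    # where and how to read the data
  }
}

transform {
  # optional; may be left empty for plain replication
}

sink {
  SomeSink {
    # where and how to write the data
  }
}
```

A complete, concrete example of this structure is shown in the MySQL-to-DSQL section below.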
Building SeaTunnel from Source
git clone https://github.com/apache/seatunnel.git
cd seatunnel
sh ./mvnw clean install -DskipTests -Dskip.spotless=true
cp seatunnel-dist/target/apache-seatunnel-${version}-bin.tar.gz /The-Path-You-Want-To-Copy
cd /The-Path-You-Want-To-Copy
tar -xzvf "apache-seatunnel-${version}-bin.tar.gz"
After building from source, all connector plugins and necessary dependencies (e.g., the MySQL JDBC driver) are included in the binary package, so connectors can be used directly without separate installation.
Example: Syncing MySQL Data to Aurora DSQL with SeaTunnel
env {
  parallelism = 1
  job.mode = "STREAMING"
  checkpoint.interval = 6000
  checkpoint.timeout = 1200000
}

source {
  MySQL-CDC {
    username = "user name"
    password = "password"
    table-names = ["db.table1"]
    url = "jdbc:mysql://dbhost:3306/db?useSSL=false&allowPublicKeyRetrieval=true&serverTimezone=UTC&connectTimeout=120000&socketTimeout=120000&autoReconnect=true&failOverReadOnly=false&maxReconnects=10"
    table-names-config = [
      {
        table = "db.table1"
        primaryKeys = ["id"]
      }
    ]
  }
}

transform {
}

sink {
  Jdbc {
    url = "jdbc:postgresql://<dsql_endpoint>:5432/postgres"
    dialect = "dsql"
    driver = "org.postgresql.Driver"
    username = "admin"
    access_key_id = "ACCESSKEYIDEXAMPLE"
    secret_access_key = "SECRETACCESSKEYEXAMPLE"
    region = "us-east-1"
    database = "postgres"
    generate_sink_sql = true
    primary_keys = ["id"]
    max_retries = "3"
    batch_size = 1000
  }
}
Running the Data Sync Job
Save the above configuration as mysql-to-dsql.conf (replacing the example values with real parameters) in the config directory of apache-seatunnel-${version}, then run:
cd "apache-seatunnel-${version}"
./bin/seatunnel.sh --config ./config/mysql-to-dsql.conf -m local
After execution, check the logs for task progress; errors such as connection timeouts or missing tables will show up there. If all goes well, the data will be written to Aurora DSQL.
Summary
Aurora DSQL is a secure, scalable, serverless distributed database. Its IAM-based authentication makes data synchronization challenging, especially for real-time scenarios. SeaTunnel is an excellent data integration and sync tool that supports multiple data sources and flexible custom sync needs (full-load or incremental). This article introduced a dedicated Sink Connector for Aurora DSQL to meet these requirements.
References
- SeaTunnel Deployment: https://seatunnel.apache.org/zh-CN/docs/start-v2/locally/deployment
- Developing New SeaTunnel Connectors: https://github.com/apache/seatunnel/blob/dev/seatunnel-connectors-v2/README.zh.md
- Generating Auth Tokens in Aurora DSQL: https://docs.aws.amazon.com/aurora-dsql/latest/userguide/SECTION_authentication-token.html
