This article provides a detailed walkthrough of how to achieve full data synchronization from MySQL to PostgreSQL using Apache SeaTunnel 2.3.9, covering the complete end-to-end process from environment setup to production validation.

Version requirements:

- MySQL: 8.3
- PostgreSQL: 13.2
- Apache SeaTunnel: apache-seatunnel-2.3.9

## Preliminaries

### Verify Version Information

Run the following SQL command to check the version:

```sql
-- Check version information
select version();
```

### Enable Master-Slave Replication

```sql
-- View replication-related variables
show variables where variable_name in ('log_bin', 'binlog_format', 'binlog_row_image', 'gtid_mode', 'enforce_gtid_consistency');
```

For MySQL CDC data synchronization, SeaTunnel needs to read the MySQL `binlog` and act as a slave node in the MySQL cluster.

Note: In MySQL 8.0+, the `binlog` is enabled by default, but GTID-based replication must be enabled manually.
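Before flipping any GTID settings, it can help to validate what `show variables` returned. The standalone Python sketch below is illustrative and not part of SeaTunnel; the expected values are typical for binlog-based CDC (row-format binlog with full row image) and should be confirmed against the connector documentation for your version.

```python
# Illustrative pre-flight check (not SeaTunnel code): verify that the values
# returned by `show variables ...` match what MySQL CDC typically expects.
REQUIRED = {
    "log_bin": "ON",
    "binlog_format": "ROW",
    "binlog_row_image": "FULL",
    "gtid_mode": "ON",
    "enforce_gtid_consistency": "ON",
}

def check_replication_vars(rows):
    """rows: iterable of (variable_name, value) pairs from the query.
    Returns a list of human-readable problems; empty means CDC-ready."""
    actual = {name.lower(): value.upper() for name, value in rows}
    problems = []
    for name, expected in REQUIRED.items():
        got = actual.get(name)
        if got is None:
            problems.append(f"{name}: variable not found")
        elif got != expected:
            problems.append(f"{name}: expected {expected}, got {got}")
    return problems

# Example: a server still using statement-based binlog is flagged.
print(check_replication_vars([
    ("log_bin", "ON"),
    ("binlog_format", "STATEMENT"),
    ("binlog_row_image", "FULL"),
    ("gtid_mode", "ON"),
    ("enforce_gtid_consistency", "ON"),
]))  # → ['binlog_format: expected ROW, got STATEMENT']
```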
```sql
-- Enable master-slave replication (execute in sequence)
-- SET GLOBAL gtid_mode=OFF;
-- SET GLOBAL enforce_gtid_consistency=OFF;
SET GLOBAL gtid_mode=OFF_PERMISSIVE;
SET GLOBAL gtid_mode=ON_PERMISSIVE;
SET GLOBAL enforce_gtid_consistency=ON;
SET GLOBAL gtid_mode=ON;
```

### Grant Necessary User Permissions

The replication user must have the `REPLICATION SLAVE` and `REPLICATION CLIENT` privileges:

```sql
-- Grant privileges to the user
CREATE USER 'test'@'%' IDENTIFIED BY 'password';
GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'test';
FLUSH PRIVILEGES;
```

## SeaTunnel Cluster Setup

### Cluster Logging

By default, SeaTunnel logs output to a single file. For production, it is preferable to have a separate log file per job.
Update the logging configuration in `log4j2.properties`:

```properties
############################ log output to file #############################
# rootLogger.appenderRef.file.ref = fileAppender
# Change log output to use independent log files for each job
rootLogger.appenderRef.file.ref = routingAppender
```

### Client Configuration

For production clusters, it is recommended to install SeaTunnel under the `/opt` directory and point the `SEATUNNEL_HOME` environment variable accordingly.

If multiple versions exist, create a symbolic link to align with the server deployment directory:

```shell
# Create a symlink
ln -s /opt/apache-seatunnel-2.3.9 /opt/seatunnel

# Set environment variable
export SEATUNNEL_HOME=/opt/seatunnel
```

### Environment Variables Configuration

For Linux servers, add the following lines to `/etc/profile.d/seatunnel.sh`:

```shell
echo 'export SEATUNNEL_HOME=/opt/seatunnel' >> /etc/profile.d/seatunnel.sh
echo 'export PATH=$SEATUNNEL_HOME/bin:$PATH' >> /etc/profile.d/seatunnel.sh
source /etc/profile.d/seatunnel.sh
```

## Job Configuration

Note: The configuration below does not cover every option; it illustrates common production settings.
```hocon
env {
  job.mode = "STREAMING"
  job.name = "DEMO"
  parallelism = 3
  checkpoint.interval = 30000        # 30 seconds
  checkpoint.timeout = 30000         # 30 seconds
  job.retry.times = 3
  job.retry.interval.seconds = 3     # 3 seconds
}
```

The first step is setting up the `env` module. Since the job runs continuously, the job mode must be set to `STREAMING`.

### Task Naming and Management

Configuring a task name is crucial for identifying and managing jobs in a production environment. Naming conventions based on database or table names can help with monitoring and administration.

### Parallelism Settings

Here, we set the parallelism to `3`, but this value can be adjusted based on the cluster size and database performance.

### Checkpoint Configuration

- Checkpoint frequency: set to 30 seconds. If finer-grained recovery is required, this can be reduced to 10 seconds or less.
- Checkpoint timeout: if a checkpoint takes longer than this, the job is considered failed. Set to 30 seconds.
- Automatic retry: configured for 3 retries, with a retry interval of 3 seconds (adjustable based on system requirements).
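To make the trade-offs concrete, here is a back-of-the-envelope sketch (plain Python, not SeaTunnel code) of what these numbers imply: the checkpoint interval bounds roughly how much data may be replayed after a failure, and the retry settings bound the delay before the job is marked failed. Actual recovery behavior depends on the engine.

```python
# Back-of-the-envelope figures implied by the env settings above
# (illustrative only; actual recovery behavior depends on the engine).
checkpoint_interval_ms = 30_000
retry_times = 3
retry_interval_s = 3

# After a failure the job resumes from the last completed checkpoint,
# so roughly up to one interval's worth of changes may be re-processed.
max_replay_window_s = checkpoint_interval_ms / 1000
print(f"changes replayed after a crash: up to ~{max_replay_window_s:.0f}s")

# Delay spent across all automatic retries before the job gives up.
total_retry_delay_s = retry_times * retry_interval_s
print(f"retry delay before the job fails: {total_retry_delay_s}s")
```

Shortening the interval narrows the replay window at the cost of more frequent checkpoint overhead.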
Next, configure the MySQL CDC source:

```hocon
source {
  MySQL-CDC {
    base-url = "jdbc:mysql://192.168.8.101:3306/test?serverTimezone=Asia/Shanghai"
    username = "test"
    password = "123456"
    database-names = ["test"]
    # table-names = ["test.test_001","test.test_002"]
    table-pattern = "test\\.test_.*"  # The first dot is a literal separator and must be escaped; the second dot matches any single character.
    table-names-config = [
      {"table":"test.test_002","primaryKeys":["id"]}
    ]
    startup.mode = "initial"          # First sync all historical data, then incremental updates
    snapshot.split.size = "8096"
    snapshot.fetch.size = "1024"
    server-id = "6500-8500"
    connect.timeout.ms = 30000
    connect.max-retries = 3
    connection.pool.size = 20
    exactly_once = false              # In analytical scenarios, disabling exactly-once consistency allows some duplicates/losses for better performance.
    schema-changes.enabled = true     # Enable schema evolution to avoid frequent job modifications; supports add, rename, drop operations.
  }
}
```

## Key MySQL CDC Configurations

Time zone configuration: it is recommended to specify the time zone in the MySQL connection URL to prevent discrepancies when extracting `datetime` or `timestamp` data.

User credentials: the username and password must have replication privileges, allowing access to the binlog, and the account must be able to query all tables under the designated databases.

### Database & Table Selection

Typically, each database is assigned to a separate task. Here, we specify only the `test` database. Two methods can be used to select tables:

- Direct table name selection
- Regular expression-based table matching (recommended for large numbers of tables or whole-database synchronization)

Important: when using regular expressions, both the database name and the table name must be included in the pattern, and the `.` that separates them must be escaped (`\\.`).

For example, to match tables prefixed with `test_`, we use the pattern `test\\.test_.*`, where the first dot (`.`)
represents the literal separator between the database and table names and must be escaped (`\\.`), while the second dot matches any single character in the regex.

Additionally, for tables without primary keys, logical primary keys can be specified manually (as in `table-names-config` above) to facilitate data synchronization.

### Startup Mode

The default startup mode is "initial", which means:

1. Full historical data sync first
2. Incremental updates afterward

### Sharding & Fetching

The default values for the shard size and batch fetch size work well. If the server has higher performance, these values can be increased.

### Server ID

MySQL requires unique server IDs for replication nodes, and Apache SeaTunnel must masquerade as a MySQL replica. If not configured, a default value is used, but manual specification is recommended to avoid conflicts. The server ID range must be larger than the parallelism, or errors may occur.

### Timeouts & Retries

- Connection timeout: for large datasets, increase this value accordingly.
- Auto-retry interval: if handling a high volume of tables, consider extending retry intervals.
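As a quick sanity check of the regex and server-id points above, the following standalone Python sketch shows which table names the pattern selects and why the server-id range must cover the parallelism. It is illustrative only: the HOCON string `"test\\.test_.*"` reaches the connector as the regex `test\.test_.*`, and matching against the full qualified name is assumed here.

```python
import re

# The HOCON value "test\\.test_.*" becomes the regex test\.test_.* after
# string unescaping; r"..." expresses the same regex in Python.
pattern = re.compile(r"test\.test_.*")

tables = ["test.test_001", "test.test_002", "test.other", "testXtest_001"]
matched = [t for t in tables if pattern.fullmatch(t)]
print(matched)  # → ['test.test_001', 'test.test_002']
# "testXtest_001" is rejected: the escaped dot only matches a literal ".".

# server-id = "6500-8500": each parallel reader needs its own ID, so the
# range must contain at least `parallelism` distinct IDs.
low, high = 6500, 8500
parallelism = 3
assert high - low + 1 >= parallelism, "server-id range smaller than parallelism"
```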
### Exactly-Once Consistency

For CDC-based data synchronization, exactly-once consistency is often not required in analytical scenarios, and disabling it can significantly boost performance. However, if strict consistency is required, it can be enabled at the cost of reduced performance.

### Schema Evolution

It is highly recommended to enable schema evolution, which:

- Allows automatic table modifications (e.g., adding/removing fields)
- Reduces the need for manual job updates when the schema changes

However, downstream tasks may fail if they rely on a field that was modified.

Supported schema changes:

- ✔️ ADD COLUMN
- ✔️ DROP COLUMN
- ✔️ RENAME COLUMN
- ✔️ MODIFY COLUMN

Note: Schema evolution does not support `CREATE TABLE` or `DROP TABLE`.

## Configuring the Sink (PostgreSQL)

The sink configuration writes the data into PostgreSQL:
```hocon
sink {
  jdbc {
    url = "jdbc:postgresql://192.168.8.101:5432/test"
    driver = "org.postgresql.Driver"
    user = "postgres"
    password = "123456"
    generate_sink_sql = true
    database = "test"
    table = "${database_name}.${table_name}"
    schema_save_mode = "CREATE_SCHEMA_WHEN_NOT_EXIST"
    data_save_mode = "APPEND_DATA"
    # enable_upsert = false
  }
}
```

Key considerations:

- JDBC connection: specify the PostgreSQL driver, user, and password.
- Auto SQL generation: enabling `generate_sink_sql` lets SeaTunnel automatically create tables and generate `INSERT`, `DELETE`, and `UPDATE` statements.
- Schema handling: PostgreSQL uses Database → Schema → Table, while MySQL has only Database → Table. Ensure the schema is correctly mapped to avoid data mismatches.
- User permissions: the PostgreSQL user must have table creation permissions if using auto-schema generation.
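The `table = "${database_name}.${table_name}"` setting above routes each source table by name. As an illustration of the idea, here is plain Python mimicking the placeholder expansion (not SeaTunnel internals): the MySQL database name becomes the PostgreSQL schema name.

```python
from string import Template

# Mimics the sink's "${database_name}.${table_name}" template: the MySQL
# database name is used as the PostgreSQL schema name.
SINK_TABLE_TEMPLATE = Template("${database_name}.${table_name}")

def target_table(source_table: str) -> str:
    """Map a MySQL 'database.table' name to its PostgreSQL target."""
    database_name, table_name = source_table.split(".", 1)
    return SINK_TABLE_TEMPLATE.substitute(
        database_name=database_name, table_name=table_name)

print(target_table("test.test_001"))  # → test.test_001

# Concatenation also works, e.g. routing every table into an "ods" schema
# with a name prefix (hypothetical template, for illustration only):
print(Template("ods.ods_${table_name}").substitute(table_name="test_001"))
# → ods.ods_test_001
```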
For more details, refer to the official documentation: SeaTunnel MySQL-CDC Connector Docs.

### Using Placeholders in Sink Configuration

Apache SeaTunnel supports placeholders, which dynamically adjust table names based on the source data. For example:

```hocon
table = "${database_name}.${table_name}"
```

This ensures each table syncs to the correct target without manual specification, and it supports concatenation and dynamic formatting.

### Schema Save Mode and Data Append Strategy

The `schema_save_mode` parameter plays a crucial role in database-wide synchronization. It simplifies the process by automatically creating tables in the target database, eliminating manual table creation steps.

Another key configuration is `data_save_mode = "APPEND_DATA"`, which is particularly useful when the target database already contains previously synchronized data. This setting prevents the accidental deletion of existing records, making it a safer choice for most scenarios. However, if your use case requires a different approach, you can modify this setting according to the official documentation guidelines.

### Enable Upsert for Performance Optimization

Another important parameter is `enable_upsert`. If you can guarantee that the source data contains no duplicate records, disabling upsert (`enable_upsert = false`) can significantly improve synchronization performance.
This is because, without upsert, the system does not need to check for existing records before inserting new ones. However, if there is any possibility of duplicate records in the source data, it is strongly recommended to keep upsert enabled (`enable_upsert = true`). This ensures that records are inserted or updated based on their primary key, preventing duplication issues.

For detailed parameter explanations and further customization options, please refer to the official Apache SeaTunnel documentation.

## Task Submission and Monitoring

Once your configuration file is ready, submit the job using the SeaTunnel command-line tool:

```shell
./bin/start-seatunnel.sh --config /path/to/config.yaml --async
```

Key parameters:

- `--config`: specifies the path to your configuration file.
- `--async`: submits the job asynchronously, allowing the command line to exit while the job continues in the background.

After submission, you can monitor the job via SeaTunnel's cluster UI. In version 2.3.9, SeaTunnel provides a visual interface where you can view job logs, execution status, and data throughput details.

## Data Synchronization Demonstration

For this demonstration, we created two tables (`test_001` and `test_002`) and inserted sample data into MySQL. Using SeaTunnel's synchronization tasks, the data was successfully synchronized to PostgreSQL.
The demonstration included insertions, deletions, updates, and even table schema modifications, all of which were reflected in real time in PostgreSQL.

Key points:

- Schema synchronization: SeaTunnel supports automatic table schema synchronization. When the source MySQL table structure changes, the target PostgreSQL table is updated automatically.
- Data consistency: SeaTunnel ensures data consistency by accurately synchronizing all insert, delete, and update operations to the target database.

## About SeaTunnel

Apache SeaTunnel focuses on data integration and synchronization, addressing common challenges such as:

- Diverse data sources: supporting hundreds of data sources, even as new ones emerge.
- Complex sync scenarios: including full, incremental, CDC, real-time, and whole-database synchronization.
- High resource demands: traditional tools often require extensive computing or JDBC resources for real-time sync of many small tables.
- Monitoring and quality: sync processes can suffer from data loss or duplication, and effective monitoring is essential.
- Complex technology stacks: multiple sync programs may be needed for different systems.
- Management challenges: offline and real-time sync are often developed and managed separately, increasing complexity.