## Preface

Migrating tens of thousands of data integration jobs (for example, DataX jobs) to Apache SeaTunnel is a tedious task. X2SeaTunnel was created to solve this problem: it is a generic configuration conversion tool that transforms configuration files from multiple data integration tools (such as DataX and Sqoop) into the SeaTunnel format, helping users migrate smoothly to the SeaTunnel platform.

This tool is also a meaningful practice of AI Coding and Vibe Coding, so in this article I also share insights on using AI to complete product design, architecture, code, and delivery in a short time. X2SeaTunnel is still at its first-version stage, and we hope more friends will join to co-build and share the gains.

## Data integration scenario brief

For customer-facing scenarios, we built a data integration product based on open-source Apache SeaTunnel + FlinkCDC, using the Flink engine under the hood, to support massive data synchronization to lake-house platforms. Our data sources mainly include databases, data warehouses, data lakes, Kafka, HTTP, and so on; typical targets are Hive, Doris, Iceberg, etc.

Thanks to the evolution of the open-source community, we obtained core data integration capability at a relatively high ROI, which let us focus our R&D on integration reliability and usability. In return, we gradually feed the bugs we find and the features we design back to the open-source community. For example, Hive overwrite writes, Hive auto-create-table, Flink 1.20 support, X2SeaTunnel, the SBS data sharding algorithm, and other commonly requested features have already been, or are planned to be, contributed to the Apache SeaTunnel community. On the Flink engine layer we hit many scenario-specific issues; recently we also solved 2PC reliability for SeaTunnel on Flink in Streaming mode, including data loss and resume-from-breakpoint problems.

| Function | Description | Contribution Plan |
| --- | --- | --- |
| Hive overwrite import | A feature required by many customers. | Contributed |
| Hive automatic table creation | Quite convenient. | Being adapted and contributed |
| SeaTunnel on Flink Streaming mode 2PC reliability | SeaTunnel on Flink 1.15+ is unusable in Streaming mode; for example, Hive and Iceberg sinks lose data. A large-scale modification of the Flink translation module has been made on Flink 1.20. | Flink 1.20 support is being adapted and contributed. There are many related points, and the Flink connector is maintained jointly with the community |
| Sampled Balanced Sharding (SBS) | The first two sharding algorithms in SeaTunnel produce uneven shards on large, skewed data sets, and sharding itself is slow, leading to timeouts and performance degradation. The algorithm has been optimized. There's a long story to tell... | Can be contributed later if community partners need it |
| X2SeaTunnel | Historical migration from DataX and other tools to SeaTunnel costs a lot of manpower. | Contributed, welcome to build together |
| Some small functions | Issues such as JDBC and Iceberg time zones, formats, and Doris small-write support. | Contributed |
| ... | ... | ... |
## X2SeaTunnel: design, development, and delivery

Let's get into the main topic. I take this opportunity to summarize the design, development, and delivery of X2SeaTunnel.

### Scenario and requirements analysis for X2SeaTunnel

In the AI era, and especially entering the Agentic era, code becomes cheap, so thinking about whether a feature is worth building becomes more important. During X2SeaTunnel's development, I spent significant energy on the following questions.

**Is this a real scenario demand?**

As the mindmap above shows, X2SeaTunnel's scenario comes from migrations and upgrades of data platforms. When migrating a data platform to a unified lake-house platform, there are many steps and many details. Among them, upgrading data integration jobs is particularly painful: many customers built data integration platforms years ago on open-source components such as DataX and Sqoop, and when they migrate to a lake-house platform, the thousands of existing integration jobs often become a project roadblock. Unlike SQL, which has many open conversion tools, every company's DataX and Sqoop jobs are different, so converting them is labor-intensive. Given this scenario, a tool that upgrades integration jobs is very valuable.

**Who are the target users?**

Our current target users are developers and delivery engineers, so there is no need to design a complex UI; a usable CLI is most appropriate.

**Can this requirement be standardized?**

There is considerable community demand for this; some people have implemented related tools but did not open-source them, because standardization is hard. You can quickly customize for each customer, but it is difficult to be universally applicable. For example, a DataX MySQL source writing to a Hive sink appears in many different scenarios and coding patterns, and the conversion rules for those situations are hard to reuse. Therefore, we should not pursue a perfect one-shot conversion. Just as we should not expect AI to write perfectly correct code on the first try, we design for a "human + tool" hybrid process that supports secondary modification. A template system is therefore important.

**Is this suitable for open-source co-construction?**
Apache SeaTunnel has many sources and sinks and companies' needs vary, so one company cannot cover everything. If we can co-develop under shared conventions, X2SeaTunnel will become more useful through community contributions.

**Is this suitable for AI co-development?**

X2SeaTunnel is relatively decoupled, does not severely affect production, and can be validated quickly, so it is well suited to AI Coding. From architecture design through coding to delivery, AI participated heavily, and most of the code was AI-generated. (Timing matters here: the implementation was done in June 2025. AI evolves monthly; by October 2025, Agent modes could already cover lower-level and more complex requirements.)

After discussing these questions with AI and thinking them through, I decided to seriously implement X2SeaTunnel with AI Coding and open-source it.

### Product design for X2SeaTunnel

Even a small tool needs product thinking. By defining boundaries and simplifying flows, we can balance ROI and standardization. See the proposal: https://github.com/apache/seatunnel/issues/9507

The core concepts of the tool are:

- **Lightweight & Simple**: keep the tool lightweight and efficient, focusing on configuration format conversion.
- **Usability**: provide multiple usage modes — SDK, CLI, single-file and batch conversion — to meet different scenarios.
- **Unified Framework**: build a general framework supporting configuration conversion for multiple integration tools.
- **Extensibility**: plugin-style design — adding new conversions only requires editing configs and templates, with no code recompilation.

### X2SeaTunnel architecture design

#### Overall flow

As shown above, the overall logic consists of the following steps:

1. **Script invocation and tool trigger**: Execute `sh bin/x2seatunnel.sh --config conversion.yaml` to start the X2SeaTunnel jar. The tool reads `conversion.yaml` (optional) or CLI parameters to kick off the conversion.
2. **Jar core initialization**: At runtime, the jar infers which SeaTunnel connector types the source configuration (DataX, Sqoop, etc.) should map to, based on the source config and parameters, laying the groundwork for field matching and file conversion.
3. **Rule matching and field filling**: Traverse the connectors and, using the mapping rules library, extract and fill the corresponding fields from the DataX JSON file. The field and connector matching status is output to show what was adapted during the conversion.
4. **Conversion output**:
   - 4.1 Configuration file conversion: fill the templates and generate SeaTunnel-compatible HOCON/JSON files into a target directory.
   - 4.2 Conversion report output: traverse the source configs and produce a conversion report recording details and matching results, for manual inspection and verification to ensure conversion quality.
5. **Rules iteration**: Based on real conversion scenarios, continuously improve the mapping rules library to cover more conversion needs and improve X2SeaTunnel's adaptability. Once the rules engine matures, adding a new conversion rule only requires modifying the mapping-rule library, so new source types can be supported quickly. With well-summarized prompts, AI models can also generate mapping rules quickly.

The entire flow is rule-driven with human verification, which helps migrate data sync jobs to Apache SeaTunnel and supports iterative feature delivery. A minimal sketch of the `conversion.yaml` mentioned in step 1 follows.
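The exact schema of this file is defined by the tool itself; the sketch below is only illustrative, using the settings that the `-c/--config` option is documented to carry (source, target, report, template):

```yaml
# Hypothetical conversion.yaml; key names are illustrative, check the README for
# the exact schema. It bundles the same settings as the CLI flags.
source: examples/source/datax-mysql2hdfs.json      # DataX job to convert (-s)
target: examples/target/mysql2hdfs-result.conf     # generated SeaTunnel config (-t)
report: examples/report/conversion-report.md       # conversion report (-r, path illustrative)
template: templates/datax/custom/my-template.conf  # optional custom template (-T, path illustrative)
```

With such a file in place, the whole run is just `./bin/x2seatunnel.sh -c conversion.yaml`, exactly as in step 1.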
#### Key design questions and discussions

During the design I discussed many questions with AI and the community; here are some highlights.

- **Use Python or Java for the implementation?**
  - Python is quick and simple, with less code.
  - ✓ Java matches the project, can be used as an SDK, and is easier to distribute.
- **Can AI replace X2SeaTunnel? Or should we simply call AI to do the conversion?**
  - ✓ X2SeaTunnel still cannot be replaced by AI:
    - AI hallucination issues
    - High AI cost
    - In batch scenarios, AI has difficulty guaranteeing consistency
- **Detail: a conversion implementation centered on pull-based mapping**
  - ✓ Take "pull-based mapping" as the core to ensure the integrity of the target configuration.
  - ✓ Use push-based traversal to generate reports, with manual inspection to catch missing fields.
- **How to meet the special conversion needs of different users?**
  - ✓ Template system: compatible with Jinja2-style template syntax
  - ✓ Custom configuration: implement each user's special scenarios
  - ✓ Conversion report: facilitate manual inspection as a fallback
  - ✓ Filters: support complex transformations through configuration

**1. Implement in Python or Java?**

I initially considered Python for faster development and less code. But after talking with the community and considering future use as an SDK, we implemented it in Java, which is easier to distribute as a jar.

**2. Can AI replace X2SeaTunnel? Or simply use AI to do the conversions directly?**

For example, give the source DataX JSON to an LLM and let it do the conversion. I believe that even as AI gets stronger, the tool still has value, because:

- AI still hallucinates — even with sufficient context it may produce plausible but incorrect conversions.
- Calling AI for every conversion is costly.
- For bulk conversions, AI may not guarantee consistency.

That said, AI is very valuable here: X2SeaTunnel was designed and developed with AI, and in the future AI plus prompts can quickly generate templates tailored to specific scenarios.

**3. Pull-based conversion as the core implementation idea**

This is an implementation detail; the candidate approaches were:

- **Object mapping route**: strongly typed, convert via an object model — code-driven.
- **Declarative mapping (push style)**: traverse the source and push mappings to the target — config-driven.
- **Pull-based logic**: traverse the target's requirements and pull the corresponding fields from the source — template-driven.
| Feature | Object Mapping Route | Declarative Mapping (Push Mode) | Pull-based Logic (Pull Mode) |
| --- | --- | --- | --- |
| Basic principle | DataX JSON → DataX object → SeaTunnel object → SeaTunnel JSON | DataX JSON → traverse source keys → map to target keys → SeaTunnel JSON | DataX JSON → traverse required target keys → map from source → SeaTunnel JSON |
| Type safety | ✅ Strong typing, compile-time checks | ❌ Weak typing, runtime checks | ❌ Weak typing, runtime checks |
| Extension difficulty | ❌ High (an object model must be defined for each tool, leading to bloated code) | ✅ Low (only mapping configuration needs to be added) | ✅ Low (only templates need to be added, but the core framework needs good abstraction) |
| Complex conversions | ✅ Java code handles complex logic | ❌ Difficult to handle complex logic | ⚠️ Handled by converters or rules |
| Configuration integrity | ⚠️ Depends on the implementation | ❌ May miss target configuration items | ✅ Naturally ensures target configuration integrity |
| Error detection | ✅ Checked at compile time | ❌ Only checked at runtime | ✅ Mandatory fields can be checked in advance |
| Mapping direction | Source → target (indirect) | Source → target (direct) | Target → source (reverse) |

As an object-oriented programmer, my first instinct was to convert the DataX JSON into an intermediate object and then map it to a SeaTunnel object model (similar to converting SQL via an AST). But that seemed overly complex and unnecessary.

The other two ideas, push and pull mapping, both use a mapping engine but work in opposite directions:

- **Push style**: start from the source — "here is what I have, take what you can" — which may omit fields the target needs.
- **Pull style**: start from the target — "these are the fields I need, fetch them from the source" — which ensures completeness for the target.

I finally chose pull-based mapping as the core, supplemented by a little object mapping for complex logic. This ensures the completeness of the target configuration while keeping extensibility and maintainability. If the source is missing fields, the conversion report shows it. A minimal sketch of the pull idea follows.
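To make the pull direction concrete, here is a small, self-contained Java sketch of the idea. It is illustrative only, not X2SeaTunnel's actual code: the real tool uses Jinja2-style templates with filters rather than this bare placeholder syntax, and the paths and names below are made up. The template enumerates the fields the target needs, each placeholder pulls a value from the source config by path, and anything unresolved is surfaced in a report.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch of pull-based mapping, NOT the real X2SeaTunnel code:
// the target template declares the fields it needs; each placeholder pulls its
// value from the source config by path; unresolved paths go into the report.
public class PullMappingSketch {

    public static void main(String[] args) {
        // Hypothetical flattened view of a DataX job JSON (path -> value).
        Map<String, String> source = new HashMap<>();
        source.put("reader.parameter.username", "root");
        source.put("reader.parameter.connection.jdbcUrl", "jdbc:mysql://localhost:3306/demo");

        // Hypothetical target template: every {{ path }} is a field the target requires.
        String template =
            "Jdbc {\n"
          + "  url = \"{{ reader.parameter.connection.jdbcUrl }}\"\n"
          + "  user = \"{{ reader.parameter.username }}\"\n"
          + "  password = \"{{ reader.parameter.password }}\"\n"
          + "}\n";

        Pattern placeholder = Pattern.compile("\\{\\{\\s*(.+?)\\s*\\}\\}");
        Matcher m = placeholder.matcher(template);
        StringBuffer rendered = new StringBuffer();
        StringBuilder report = new StringBuilder("Unresolved target fields:\n");

        while (m.find()) {
            String path = m.group(1);
            String value = source.get(path);
            if (value == null) {               // missing in the source -> record it for manual review
                report.append("  - ").append(path).append('\n');
                value = "";                    // or a default supplied by the rule library
            }
            m.appendReplacement(rendered, Matcher.quoteReplacement(value));
        }
        m.appendTail(rendered);

        System.out.println(rendered);          // the generated SeaTunnel-style config
        System.out.println(report);            // the "safety net" for manual inspection
    }
}
```

Reading the comparison table back through this sketch: the target template naturally enumerates everything the output needs, so missing source fields cannot silently disappear; they show up in the report instead.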
**4. How to satisfy different users' custom conversion needs?**

A template system plus custom configuration plus the conversion report covers diverse needs; in practice, customers can implement customized conversions quickly.

- **Template system design**: X2SeaTunnel's conversion needs are highly flexible, and hardcoded rules lose that flexibility, so a template system is essential. We initially used a syntax compatible with SeaTunnel's HOCON; later, for greater expressiveness, we switched to Jinja2-style template syntax (see the docs for details).
- **Custom configuration templates**: In practice, most scenarios use custom templates, which enable special-case behaviors.
- **Conversion report**: The report acts as a "safety net" for checking whether each conversion is correct, which is where much of its value lies.
- **Filters**: We provide filters such as join, replace, and regex_extract. Combined with templates, they cover most complex scenarios. A hypothetical template fragment is sketched below.
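As a rough illustration of what such a template fragment might look like (the option keys and the exact filter-call syntax are assumptions for illustration, not copied from the shipped templates):

```
# Hypothetical Jinja2-style sink template fragment (illustrative only)
HdfsFile {
  # Plain pull: take the value straight from the DataX writer parameters
  path = "{{ job.content[0].writer.parameter.path }}"

  # Filter example: join the DataX column list into a comma-separated string.
  # join / replace / regex_extract are the filters mentioned above; the exact
  # syntax used by the real templates may differ.
  sink_columns = "{{ job.content[0].writer.parameter.column | join(',') }}"
}
```

A user who needs special behavior only edits a fragment like this; no Java changes or recompilation are required.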
## Quick usage & demo for X2SeaTunnel

Documentation: https://github.com/apache/seatunnel-tools/blob/main/X2SeaTunnel/README_zh.md

Follow the official doc step by step to get started; the sample cases show the core usage and are easy to pick up.

### Use the release package

```bash
# Download and unzip the release package
unzip x2seatunnel-*.zip
cd x2seatunnel-*/
```

### Basic usage

```bash
# Standard conversion: use the default template system with built-in common sources and sinks
./bin/x2seatunnel.sh -s examples/source/datax-mysql2hdfs.json -t examples/target/mysql2hdfs-result.conf -r examples/report/mysql

# Custom task: implement customized conversion requirements through custom templates
# Scenario: MySQL → Hive (DataX has no HiveWriter)
# DataX configuration: MySQL → HDFS; custom task: convert to MySQL → Hive
./bin/x2seatunnel.sh -s examples/source/datax-mysql2hdfs2hive.json -t examples/target/mysql2hive-result.conf -r examples/report

# YAML configuration method (equivalent to the above command-line parameters)
./bin/x2seatunnel.sh -c examples/yaml/datax-mysql2hdfs2hive.yaml

# Batch conversion mode: process by directory
./bin/x2seatunnel.sh -d examples/source -o examples/target2 -R examples/report2

# Batch mode supports wildcard filtering
./bin/x2seatunnel.sh -d examples/source -o examples/target3 -R examples/report3 --pattern "*-full.json" --verbose

# View help
./bin/x2seatunnel.sh --help
```

Below are the features and the directory structure — fairly straightforward. Note that much of the documentation was AI-authored.
### Functional features

- ✅ **Standard configuration conversion**: DataX → SeaTunnel configuration file conversion
- ✅ **Custom template conversion**: supports user-defined conversion templates
- ✅ **Detailed conversion report**: generates conversion reports in Markdown format
- ✅ **Regular-expression variable extraction**: extracts variables from configurations using regular expressions, supporting custom scenarios
- ✅ **Batch conversion mode**: batch-converts directories and wildcard-matched files, automatically generating per-file reports and a summary report

### Directory structure

```
x2seatunnel/
├── bin/                        # Executable files
│   └── x2seatunnel.sh          # Startup script
├── lib/                        # JAR package files
│   └── x2seatunnel-*.jar       # Core JAR package
├── config/                     # Configuration files
│   └── log4j2.xml              # Log configuration
├── templates/                  # Template files
│   ├── template-mapping.yaml   # Template mapping configuration
│   ├── report-template.md      # Report template
│   └── datax/                  # DataX-related templates
│       ├── custom/             # Custom templates
│       ├── env/                # Environment configuration templates
│       ├── sources/            # Data source templates
│       └── sinks/              # Data target templates
├── examples/                   # Examples and tests
│   ├── source/                 # Example source files
│   ├── target/                 # Generated target files
│   └── report/                 # Generated reports
├── logs/                       # Log files
├── LICENSE                     # License
└── README.md                   # Usage instructions
```

### Usage instructions

Basic syntax:

```
x2seatunnel [OPTIONS]
```

Command-line parameters:

| Option | Long Option | Description | Required |
| --- | --- | --- | --- |
| -s | --source | Path to the source configuration file | Yes |
| -t | --target | Path to the target configuration file | Yes |
| -st | --source-type | Source configuration type (datax; default: datax) | No |
| -T | --template | Path to a custom template file | No |
| -r | --report | Path to the conversion report file | No |
| -c | --config | Path to a YAML configuration file containing settings such as source, target, report, template, etc. | No |
| -d | --directory | Source directory for batch conversion | No |
| -o | --output-dir | Output directory for batch conversion | No |
| -p | --pattern | File wildcard pattern (comma-separated, e.g. json,xml) | No |
| -R | --report-dir | Report output directory in batch mode; per-file reports and a summary.md are written here | No |
| -v | --version | Show version information | No |
| -h | --help | Show help information | No |
|  | --verbose | Enable verbose log output | No |
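For example, combining single-file conversion with an explicit custom template via the documented `-T` option might look like this (the template file name is illustrative; place your own templates under `templates/datax/custom/`):

```bash
# Illustrative: single-file conversion with an explicit custom template (-T).
# The template file name is hypothetical; substitute your own.
./bin/x2seatunnel.sh \
  -s examples/source/datax-mysql2hdfs2hive.json \
  -t examples/target/mysql2hive-result.conf \
  -T templates/datax/custom/mysql2hive.conf \
  -r examples/report
```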
Below I want to emphasize the template system. X2SeaTunnel uses a DSL-based, configuration-driven template system to adapt quickly to different sources and targets.

Core advantages of the DSL-based template system:

- **Config-driven**: all conversion logic is defined in YAML files — no Java code changes needed (a hypothetical sketch follows below).
- **Easy to extend**: adding a new source type only requires adding templates and mapping configs.
- **Unified syntax**: Jinja2-style template syntax for readability and maintainability.
- **Smart mapping**: transformers implement complex parameter mapping logic.
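For intuition, the kind of routing such configuration might express could look like the sketch below (purely illustrative; the actual schema of `templates/template-mapping.yaml` may differ):

```yaml
# Hypothetical mapping-config sketch; the real templates/template-mapping.yaml
# schema may differ. The idea: route DataX reader/writer names to templates.
datax:
  readers:
    mysqlreader:
      template: datax/sources/jdbc-source.conf   # illustrative template path
  writers:
    hdfswriter:
      template: datax/sinks/hdfs-sink.conf       # illustrative template path
```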
### Conversion report

After a conversion, you can view the generated Markdown report, which contains:

- **Basic info**: conversion time, source/target paths, connector types, conversion status.
- **Conversion stats**: direct mappings, smart conversions, defaults used, and the count and percentage of unmapped fields.
- **Detailed field mappings**: for each field, the source value, target value, and the filters used.
- **Default value usage**: the fields that fell back to defaults.
- **Unmapped fields**: fields present in the DataX config but not converted.
- **Possible errors and warnings**: prompts for issues encountered during conversion.

For batch conversion, a summary folder with batch reports is also generated, containing:

- **Conversion overview**: overall statistics, success rate, elapsed time.
- **Successful conversions**: the list of successfully converted files.
- **Failed conversions**: the failed files and their error information (if any).

## AI Coding practice and reflections

### AI4Me: my LLM exploration journey

I have always been passionate about exploring AI and chasing technological waves:

- **Midjourney's explosion**: I bought an account early and immersed myself in AI art.
- **ChatGPT era**: from 3.5 to 4.0 it was the main driver of my work and thinking; after a pause, GPT-5 took over my workflow.
- **Claude series**: I registered early and used it heavily for coding.
- **Domestic models**: I have continuously followed and used ChatGLM, Kimi, MiniCPM, Qianwen, etc.
- **Local experiments**: bought GPU servers to play with small models locally.
- **RAG & Agent exploration**: studied almost all the open Agent frameworks, used LangChain deeply, and conceived ideas like Delta2.
- **DeepSeek V2**: a domestic success I supported financially; V2.5 became my main Claude substitute.
- **AI Coding & debugging practice**: tried auto_coder, TongYi, Cursor, G-Copilot, Aug, CodeX, etc.
- **AI product experiences**: tried many, from Minmax to other domestic products.

At one point, when DeepSeek V3/R1 was released and had not yet gone mainstream, I fell into a kind of fervor — product, architecture, prototyping, even fortune-telling — trying to use AI for everything. When DeepSeek did go mainstream, I felt lost: the information explosion and noisy input made me lose focus and feel overwhelmed. Later, I learned to do subtraction. I stopped chasing AI for its own sake and returned to the core: let AI solve my present problems and live in the moment. That, I think, is the correct distance between humans and AI.

### X2SeaTunnel Vibe Coding insights

- **AI develops so fast that it's hard to keep up.** This article was written on October 21, 2025; given AI's speed of change, these thoughts might be outdated in a couple of months. Over the past months I tried many AI coding tools; for X2SeaTunnel I mainly used Augment and GitHub Copilot. In June, AI Agent mode was just getting started and success rates on complex problems were low; by October, success rates had improved significantly for both new code and legacy problems.
- **Assist the AI and save tokens.** AI doesn't just make us more efficient — it makes us stronger, and my role shifted toward choosing directions and making decisions. Although AI is obedient, humans should also "protect" it and save tokens: don't waste AI on trivial tasks like spotless formatting or simple `mvn package` compile issues, which I handle faster myself. Agents can do them, but that wastes time and tokens, so I let AI focus on complex logic and keep the basic operations for myself.
- **Human-in-the-loop quick validation.** Just as fast CPUs make disk I/O the bottleneck, functional validation becomes the bottleneck of product iteration. So speed up validation: for X2SeaTunnel I scripted and automated builds, packaging, verification, and observation.
- **Context management: keep rules and docs tracked.** I refactored many versions because the AI in July was not yet good at self-documenting. You must guide AI with docs and iterate step by step, verifying after each round of development to avoid chaos. AI also writes docs too fast — manually prune invalid docs to avoid wasting tokens.
- **Question AI appropriately.** Because AI's logic is self-consistent, some issues become hard to detect or lead to over-design; be ready to question it.

### From Vibe Coding to Spec Coding

Anthropic has released a guide on context engineering for Agents, and the industry has abstracted past mistakes into two practices: Context Engineering and Spec Coding. Just as writing a good Spark job requires understanding Spark's principles, using AI and Agents well requires understanding their mechanisms.

Recommended reading:

- Anthropic's guide to context engineering for AI Agents
- Spec Coding practices, e.g. GitHub's spec-kit: https://github.com/github/spec-kit

**When to keep it minimal and when to write Specs?**

- **Short-cycle / ad-hoc tasks (e.g. troubleshooting)**: Don't pile on complex prompts. Keep it simple: give concise context and put the related files in one folder so the Agent can explore with bash and file tools. The goal is to minimize distractions and save "AI attention".
- **Complex / long-term projects**: Vibe Coding easily detours, and community consensus leans toward Spec Coding. The philosophy is "everything is a Spec" — requirements, boundaries, interfaces, and acceptance criteria should all be written as clear, executable specs.

My previous approach was aligned with Spec Coding but rough.
Next, I will systematically adopt Spec Coding for complex projects.

Spec Coding checklist:

- **Align requirements**: discuss requirements thoroughly with AI, and invite AI to act as a critic/reviewer to expose gaps.
- **Produce a design**: ask AI to output a design/spec (goals, constraints, interfaces, data flow, dependencies, tests and acceptance) that is reviewable and executable.
- **Iterate on the implementation**: decompose the spec → implement → get fast feedback, with human + AI collaboration in small increments. Use Git branching and version control to ensure auditability and rollback.

### Key Agent capabilities, seen through AI coding tools

(Quoted from Software 3.0 — Andrej Karpathy.) I recommend Andrej Karpathy's Software 3.0 talk (July 2025); it is insightful on AI Agents, and today's mainstream Agent frameworks and methodologies still fall within his depiction.

Agent projects have landed most successfully in AI Coding, so to understand Agents, start with AI coding tools. Highlights of the GitHub Copilot team's Agent product design (the core of Agent capability):

- **Context management**: the user and the Agent maintain context across multi-turn tasks so the model stays focused.
- **Multi-call & orchestration (Agentic Orchestration)**: the AI plans task chains and calls multiple tools/functions autonomously.
- **Efficient human-AI interface**: a GUI raises collaboration efficiency — the GUI acts as a "GPU for humans".
- **Generate-verify loop (Gen-Verify Loop)**: human + AI form a continuous loop in which AI generates and humans verify and correct; the feedback improves the outcome.

Fully automated "AI autopilot" is still far away. For practitioners that is fine — human + AI collaboration already releases huge productivity gains.

## AI & data platform convergence and opportunity areas

This section may seem audacious, but I'll share some thoughts that I act on. The convergence of AI and data platforms can be grouped into two directions.

**1. Data4AI: providing solid data foundations for AI**

Data4AI is about making data better support AI. Multi-modal data management and lake-house architectures are key: they provide a unified, efficient, governable foundation for data preparation, training, inference, and model operations. Traditional data platforms are therefore evolving toward AI-oriented capabilities — formats like Lance, FileSet management, the Ray compute framework, Python-native interfaces — all of which help AI "eat better and run steadier".
**2. AI4Data: AI enhancing data platforms**

AI4Data asks how AI can improve the efficiency and reliability of the platform itself. There are two sub-areas:

- **DevAgent (platform building)**: AI assists development, operations, and optimization, enabling system-level self-healing and automation. Under the hood this calls capabilities such as observability and small-file merge algorithms.
- **DataAgent (data analytics)**: AI acts as an analytics assistant to explore data, generate insights, and support decisions, leveraging integration, development, and query tools.

Both rely on the Agent methodology: AI acts as an autonomous entity with perception, planning, execution, and reflection, while humans provide direction, constraints, and value judgment. From demo to real-world adoption there is still a "last mile" that humans fill: understanding the business, defining goals, and controlling boundaries.

The figure above shows a DevAgent prototype I sketched in 2023, inspired by a paper — a "special forces squad" of humans + AI that can collaborate, execute automatically, and continuously learn in complex environments. That idea is now gradually becoming reality: humans + AI will be standard collaborators.

## AI changes daily — what can we do?

AI and Agents progress quickly; we should fully embrace and deeply experience them.

**From a team and individual perspective:** as AI amplifies individual efficiency, the cost of organizational communication may exceed the value of execution. This expands individual responsibilities while shrinking organizational boundaries, and a trend follows: more "super individuals" emerge. Setting the risks aside, I find this era interesting. The evolution of AI reminds me of martial arts cultivation: one person, one sword, one path. An individual plus AI is like forging a personal magic weapon — a good AI tool is a "Heaven Reliant Sword", and a good Spec is like a martial arts classic. Through practice and iteration you can accomplish what was once impossible, become stronger, and contribute widely.

**From an industry perspective:** I fight in the data trenches daily, and I keep asking: can AI Agents help overcome the bottlenecks of ToB digital transformation? Can they reduce the landing cost of data platforms? I think yes. To adapt a line from Xiaomi's startup thinking: discover customers' real pain points and abstract and standardize them; use the momentum of open source and AI to resolve the contradiction between customer needs and technical delivery; and with reliable, usable, fairly priced products and services, democratize advanced technology, reduce societal costs, and contribute to national digital transformation. Start with reliability.

## The future is already here — it's just unevenly distributed

Finally, quoting William Gibson: "The future is already here — it's just not evenly distributed." Technological iteration comes with uneven pace and uneven resource allocation, but we can proactively embrace the wave, move with the times, and make technology serve people and social progress.
Above is my sharing — comments and corrections are welcome. I look forward to moving forward together.