As the OSPP (Open Source Promotion Plan) program draws to a close, after more than half a year of dedicated development, contributors to the Apache SeaTunnel project have achieved fruitful results. Today, let's focus on the project that implemented CDC source schema evolution support in the Apache SeaTunnel Flink engine. From the project's initial concept to its gradual development and completion, we'll explore the full journey behind this achievement. Then, through an interview, let's step into the open-source world of this developer from the University of Science and Technology Beijing and see how she balanced her demanding studies while completing this project!

## Personal Introduction

- **Project Mentor:** Lucifer Tyrant
- **Name:** Dong Jiaxin
- **School & Major:** University of Science and Technology Beijing – Big Data Management and Application
- **GitHub ID:** 147227543
- **Research Interests:** Big data platform development; previously worked on data platform development at Kuaishou and Meituan
- **Hobbies:** Reading technical documentation, experimenting with new technology stacks, and reading novels

## Project Title

Schema Evolution Support for CDC Sources on Flink Engine

## Project Background

In real-time data synchronization scenarios, source-table schema changes, such as adding columns or modifying column types, are common. Apache SeaTunnel already supports CDC schema evolution on its native engine, but before this project the feature had not been implemented on the Flink engine.

As a result, when using Flink for CDC synchronization, users had to restart jobs whenever a schema change occurred, which severely impacted data synchronization continuity and stability.

## Implementation Approach

My implementation inspiration mainly came from the Flink CDC project. After studying Flink CDC's schema-evolution design and combining it with SeaTunnel's architecture, I designed a schema-evolution solution tailored for the Flink engine.
The core architecture design includes the following key components (a coordination sketch in Java follows the component list):

### SchemaCoordinator

- **Role:** The central coordinator of the entire solution, responsible for global schema-change state management and synchronization
- **Implementation details:**
  - Maintains a `schemaChangeStates` mapping table to record each table's schema-change status
  - Tracks schema versions for each table through `schemaVersions`
  - Uses a `ReentrantLock` to ensure thread safety for concurrent schema-change requests
  - Maintains a `pendingRequests` queue to manage pending `CompletableFuture`s awaiting schema changes

### SchemaOperator

- **Role:** A specialized operator inserted between the CDC Source and Sink to intercept and process schema-change events
- **Implementation details:**
  - Detects `SchemaChangeEvent`s in `processElement()`
  - Handles the schema-change flow via `processSchemaChangeEvent()`
  - Maintains `currentSchemaChangeFuture` to support schema-change cancellation and rollback
  - Uses `lastProcessedEventTime` to prevent duplicate processing of old events
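To make the coordination flow concrete, here is a minimal Java sketch of the coordinator's state management. The field names (`schemaChangeStates`, `schemaVersions`, `pendingRequests`) and the `ReentrantLock` come from the description above; the method shapes, status strings, and the `notifyFlushComplete` callback are illustrative assumptions, not SeaTunnel's actual API.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Illustrative sketch only; the real SeaTunnel implementation differs in detail. */
public class SchemaCoordinatorSketch {

    /** Per-table schema-change status (e.g. "PENDING", "APPLIED"). */
    private final Map<String, String> schemaChangeStates = new ConcurrentHashMap<>();

    /** Current schema version of each table. */
    private final Map<String, Integer> schemaVersions = new ConcurrentHashMap<>();

    /** Futures completed once downstream writers confirm the flush. */
    private final Map<String, CompletableFuture<Void>> pendingRequests = new ConcurrentHashMap<>();

    /** Guards state transitions when several tables change concurrently. */
    private final ReentrantLock lock = new ReentrantLock();

    /** Called by SchemaOperator BEFORE it sends the FlushEvent downstream. */
    public CompletableFuture<Void> requestSchemaChange(String tableId) {
        lock.lock();
        try {
            schemaChangeStates.put(tableId, "PENDING"); // create state first
            CompletableFuture<Void> future = new CompletableFuture<>();
            pendingRequests.put(tableId, future);
            return future;                              // the operator blocks on this
        } finally {
            lock.unlock();
        }
    }

    /** Called on behalf of the sink writer after it has flushed in-flight data. */
    public void notifyFlushComplete(String tableId) {
        lock.lock();
        try {
            schemaVersions.merge(tableId, 1, Integer::sum); // bump schema version
            schemaChangeStates.put(tableId, "APPLIED");
            CompletableFuture<Void> future = pendingRequests.remove(tableId);
            if (future != null) {
                future.complete(null);                      // unblock SchemaOperator
            }
        } finally {
            lock.unlock();
        }
    }
}
```

Registering the pending future before any downstream work happens is what makes the hand-off safe, which is exactly the ordering issue described next.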
### Key Problem and Solution

During development, I encountered a tricky issue: when processing schema-change events in `processElement()`, the job would hang. No further data was processed, only continuous checkpointing.

After analyzing the logs, I found the root cause:

```
2025-08-17 12:33:36,597 INFO  FlinkSinkWriter   - FlinkSinkWriter handled FlushEvent for table: .schema_test.products
2025-08-17 12:33:36,597 INFO  SchemaOperator    - FlushEvent sent to downstream for table: .schema_test.products
2025-08-17 12:33:36,597 INFO  SchemaCoordinator - Processing schema change for table: .schema_test.products
2025-08-17 12:33:36,598 WARN  SchemaCoordinator - No schema change state found for table: .schema_test.products
```

As the logs show, the `FlushEvent` was sent downstream first. By the time the `FlinkSinkWriter` processed it and tried to notify the `SchemaCoordinator`, the coordinator had not yet initialized the schema-change state (its code had not executed yet), so the notification failed. As a result, the `schemaChangeFuture.get()` call in `SchemaOperator` kept waiting until it timed out after 60 seconds.

After observing this, I adjusted the execution order: instead of "send the FlushEvent first, then request the SchemaCoordinator," I changed it to "request the SchemaCoordinator first, then send the FlushEvent," as shown below:

```java
// Register the schema change with the coordinator FIRST, so its state
// exists before any downstream notification can arrive.
CompletableFuture<SchemaResponse> schemaChangeFuture =
        schemaCoordinator.requestSchemaChange(
                tableId, jobId, schemaChangeEvent.getChangeAfter(), 1);
currentSchemaChangeFuture.set(schemaChangeFuture);

// Only then flush in-flight data downstream.
sendFlushEventToDownstream(schemaChangeEvent); // send after coordinator request
```

This ensured that the `SchemaCoordinator` created its state before the `FlushEvent` was sent. When the downstream finished processing the event, the state already existed, so the notification succeeded and the `CompletableFuture` completed. The `processSchemaChangeEvent()` method then resumed and continued normal execution.
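The underlying race is easy to reproduce outside SeaTunnel. The self-contained demo below registers the waiting future before the asynchronous "flush done" notification fires, mirroring the corrected ordering; none of it is SeaTunnel code, and all names are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

/** Minimal runnable demo of the ordering fix: the waiting future must be
 *  registered before the downstream "flush done" notification can arrive. */
public class OrderingDemo {

    private static final ConcurrentHashMap<String, CompletableFuture<Void>> pending =
            new ConcurrentHashMap<>();

    public static void main(String[] args) throws Exception {
        String table = "schema_test.products";

        // Correct order: register the pending request first...
        CompletableFuture<Void> future = new CompletableFuture<>();
        pending.put(table, future);

        // ...then simulate the sink confirming the flush asynchronously.
        CompletableFuture.runAsync(() -> {
            CompletableFuture<Void> f = pending.remove(table);
            if (f != null) {
                f.complete(null); // succeeds because the state already exists
            }
        });

        // With the reversed (buggy) order, the notification would find no
        // entry and this get() would block until the timeout.
        future.get(60, TimeUnit.SECONDS);
        System.out.println("Schema change acknowledged for " + table);
    }
}
```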
## Project Outcomes

### Problems Solved

- Implemented real-time schema-evolution capability on the Flink engine
- Users no longer need to restart jobs when source schemas change during CDC synchronization
- Provided a full schema-change coordination mechanism ensuring synchronization among operators

### User Benefits

- **Improved business continuity:** schema changes no longer require downtime
- **Reduced O&M costs:** less manual intervention and fewer restarts
- **Guaranteed data consistency:** `FlushEvent` ensures pre- and post-change data consistency
- **Flexible engine choice:** users can freely choose between the Flink and SeaTunnel engines while retaining schema-evolution capability

### Technical Contributions

- Added a global `SchemaCoordinator` component
- Introduced a new `FlushEvent` type and handling mechanism (a sketch follows this section)
- Implemented full schema-evolution adaptation in the Flink translation layer

### Future Improvements

- **Multi-parallelism support:** design a flush-coordination mechanism for parallel scenarios using parallelism-aware counters and finer-grained state management
- **State persistence:** consider converting the `SchemaCoordinator` into a Flink operator or leveraging Flink's `BroadcastState` so its state participates in checkpoints
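The article names `FlushEvent` and `FlinkSinkWriter` but does not show them. As a rough idea of what that handling path could look like, here is a heavily simplified sketch; only those two names come from the article, and every field, interface, and method shape here is a hypothetical illustration.

```java
import java.io.Serializable;

/** Assumption-laden sketch of the FlushEvent handling path, not SeaTunnel code. */
public class FlushEventSketch {

    /** Marker event telling the sink to drain data written under the old schema. */
    public static class FlushEvent implements Serializable {
        final String tableId;
        FlushEvent(String tableId) { this.tableId = tableId; }
    }

    /** Hypothetical callback into the coordinator. */
    interface Coordinator { void notifyFlushComplete(String tableId); }

    /** What the sink writer conceptually does when the marker arrives. */
    static class FlinkSinkWriter {
        private final Coordinator coordinator;
        FlinkSinkWriter(Coordinator coordinator) { this.coordinator = coordinator; }

        void handle(Object element) {
            if (element instanceof FlushEvent) {
                FlushEvent flush = (FlushEvent) element;
                flushBufferedRows(flush.tableId);               // drain old-schema rows
                coordinator.notifyFlushComplete(flush.tableId); // unblock SchemaOperator
            } else {
                bufferRow(element);                             // normal data path
            }
        }

        void flushBufferedRows(String tableId) { /* write buffered rows out */ }
        void bufferRow(Object row) { /* append to an in-memory buffer */ }
    }
}
```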
To better understand students' experiences during the Summer of Open Source, Apache SeaTunnel conducted a brief interview. Here is the transcript:

**Q: Among so many projects, why did you choose Apache SeaTunnel?**

A: Several reasons. First, the project's technical direction matches my background well. During an internship at a startup, we used SeaTunnel for data integration and warehouse construction. I also frequently use Flink to build data pipelines and real-time lineage systems, so I'm passionate about data integration and streaming synchronization. SeaTunnel, as a new-generation data-integration platform with a modern tech stack and clear architecture, is a perfect project to learn deeply and contribute to.

Moreover, the SeaTunnel community is extremely welcoming and active. Responses are quick, and for someone like me participating in open source for the first time, it's very friendly. The CDC schema-evolution feature solves a real-world pain point, and seeing my code truly help users gives me a strong sense of accomplishment.

**Q: How does the Apache SeaTunnel project relate to your academic studies?**

A: Quite closely. Our courses on big-data processing cover frameworks like Flink and StarRocks, both of which are widely used in SeaTunnel. In my sophomore year, I also worked with StreamPark for Spark micro-batch jobs, which made me familiar with data-integration concepts. By joining SeaTunnel, I could apply classroom theory in real projects and deepen my understanding.

**Q: How has participating in this project influenced your studies and future plans?**

A: I've gained a lot. To understand CDC implementation, I studied Flink CDC's source code in depth, strengthening my understanding of Flink's runtime, distributed coordination, and asynchronous programming. Under my mentor's guidance, I also learned open-source collaboration practices, including code style, the PR process, and testing, which laid a solid foundation for future contributions. Most importantly, this experience confirmed my long-term interest in data infrastructure; I plan to continue developing in this area.

**Q: What was your biggest challenge during the project, and how did you overcome it?**

A: The toughest issue was the job hang in `processElement()`, where the job only kept checkpointing without processing data. Architecturally, integrating the new functionality gracefully into the existing system was also more complex than expected. To resolve these, I repeatedly debugged, researched documentation, and sought advice from my mentor and community members. Their feedback was incredibly helpful and guided me through the problem.

**Q: How long have you been involved in open source? Do you like it? What has it changed for you?**

A: This is my first formal open-source experience. Though I wrote internal code during internships, this was my first time submitting PRs publicly. I really enjoy open source: the openness, collaboration, and shared learning environment where everyone contributes to a common goal. It's meaningful and rewarding.

**Q: Have you used Apache SeaTunnel or other data-integration tools before?**

A: Yes. During internships, I used SeaTunnel mainly for syncing between different data sources, for example Kafka to Hive or Kafka to StarRocks. I've also worked with Flink CDC for streaming source integration. Compared with other tools, SeaTunnel supports three execution engines, covering both batch and streaming use cases. If I choose a data-integration tool in the future, SeaTunnel will be my first choice; it's feature-rich and easy to configure.

**Q: What was your first impression of the SeaTunnel community, and what do you hope to gain here?**

A: My first impression was great: a friendly community, responsive mentors, and very careful code reviews.
I hope to keep meeting like-minded contributors, make valuable contributions, and continue improving my technical skills.

**Q: Will you continue contributing to Apache SeaTunnel?**

A: Definitely. I plan to further optimize my current feature to better ensure exactly-once semantics for SeaTunnel on Flink, and I hope to take on more interesting topics in the future.