Introduction to CDC (Change Data Capture)

Date: 2025-04

Change Data Capture (CDC) is a technique for tracking row-level changes made by database operations (inserts, updates, deletes) and notifying other systems of them in the order the events occurred. In disaster recovery scenarios, CDC mainly synchronizes data between a primary and a backup database, enabling real-time data syncing from the primary to the secondary database.

source ----------> CDC ----------> sink

Apache SeaTunnel CDC

SeaTunnel CDC offers two types of data synchronization:

Snapshot Read: Reads historical data from a table.

Incremental Tracking: Reads incremental log changes from a table.

Lock-Free Snapshot Synchronization

The lock-free snapshot synchronization phase is emphasized because many existing CDC platforms, such as Debezium, may lock tables during historical data synchronization. Snapshot reading is the process of synchronizing a database's historical data. The basic flow of this process is as follows:

storage -------------> splitEnumerator ---------- split ----------> reader
                              ^                                       |
                              |                                       |
                              \----------------- report --------------/

Split Partitioning

The splitEnumerator (split distributor) partitions the table data into multiple splits based on specified fields (such as the table ID or unique keys) and a defined step size.
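The partitioning step can be sketched as follows. This is a minimal illustration, not SeaTunnel's implementation: it assumes an integer split key, and the names `SnapshotSplit` and `partition_table` are hypothetical. The first and last splits are left open-ended so that rows outside the sampled key range are still covered.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SnapshotSplit:
    """Hypothetical stand-in for the split metadata described in the text."""
    split_id: str
    table_id: str
    split_start: Optional[int]  # inclusive; None means unbounded below
    split_end: Optional[int]    # exclusive; None means unbounded above


def partition_table(table_id: str, min_key: int, max_key: int, step: int) -> List[SnapshotSplit]:
    """Cut the key range into consecutive half-open ranges of width `step`."""
    if step <= 0:
        raise ValueError("step must be positive")
    splits: List[SnapshotSplit] = []
    start: Optional[int] = None
    boundary = min_key + step
    index = 0
    while boundary <= max_key:
        splits.append(SnapshotSplit(f"{table_id}:{index}", table_id, start, boundary))
        start = boundary
        boundary += step
        index += 1
    # The final split has no upper bound, so keys above max_key are not missed.
    splits.append(SnapshotSplit(f"{table_id}:{index}", table_id, start, None))
    return splits


splits = partition_table("db.users", min_key=1, max_key=10, step=4)
```

With keys 1 to 10 and a step of 4, this yields three splits: everything below 5, the range 5 to 8, and everything from 9 upward.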





Parallel Processing

Each split is assigned to a different reader so that the splits are read in parallel. Each reader occupies a single connection.
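As a rough sketch of this fan-out, the snippet below hands each split to its own worker. It is only an illustration under stated assumptions: an in-memory dict stands in for the table, each worker thread stands in for a reader with its own connection, and `read_split` and `read_all_splits` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def read_split(rows, start, end):
    """Return the rows whose key falls in the half-open range [start, end)."""
    return [
        (key, value)
        for key, value in sorted(rows.items())
        if (start is None or key >= start) and (end is None or key < end)
    ]


def read_all_splits(rows, ranges, parallelism=2):
    """Read every split in parallel; results come back in split order."""
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        futures = [pool.submit(read_split, rows, start, end) for start, end in ranges]
        return [future.result() for future in futures]


table = {1: "a", 5: "b", 9: "c"}
chunks = read_all_splits(table, [(None, 5), (5, 9), (9, None)])
```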





Event Feedback

After completing the read of a split, each reader reports its progress back to the splitEnumerator. The split metadata is provided as follows:

String splitId                  # Routing ID
TableId tableId                 # Table ID
SeatunnelRowType splitKeyType   # The type of field used for partitioning
Object splitStart               # Start point of the partition
Object splitEnd                 # End point of the partition





Once the reader receives the split information, it generates the appropriate SQL statements. Before starting, it records the position in the database log that corresponds to the current split. After completing the current split, the reader reports its progress to the splitEnumerator with the following data:

String splitId         # Split ID
Offset highWatermark   # Log position corresponding to the split, for future validation
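To make the "generates the appropriate SQL statements" step concrete, here is a minimal sketch of turning a split's bounds into a range query. The function name `build_split_query`, the `?` placeholder style, and the half-open range convention are assumptions for illustration; the text does not specify the SQL SeaTunnel actually emits.

```python
def build_split_query(table, key_column, split_start, split_end):
    """Build a parameterized range query for one split.

    A bound of None means the split is open on that side.
    """
    conditions, params = [], []
    if split_start is not None:
        conditions.append(f"{key_column} >= ?")
        params.append(split_start)
    if split_end is not None:
        conditions.append(f"{key_column} < ?")
        params.append(split_end)
    where = f" WHERE {' AND '.join(conditions)}" if conditions else ""
    return f"SELECT * FROM {table}{where}", params


query, params = build_split_query("users", "id", 5, 9)
```

Passing the bounds as parameters rather than formatting them into the string keeps the sketch safe for non-numeric split keys.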

Incremental Synchronization

The incremental synchronization phase begins after the snapshot read phase. In this phase, any change that occurs in the source database is captured and synchronized to the backup database in real time. This phase listens to the database log (for example, the MySQL binlog). Incremental tracking is usually single-threaded, to avoid pulling the binlog more than once and to reduce the load on the database. Therefore only one reader is used, occupying a single connection.

data log -------------> splitEnumerator ---------- split ----------> reader
                               ^                                       |
                               |                                       |
                               \----------------- report --------------/





In the incremental synchronization phase, all splits and tables from the snapshot phase are merged into a single split. The split metadata in this phase is as follows:

String splitId
Offset startingOffset                                            # The lowest log start position among all splits
Offset endingOffset                                              # Log end position, or "continuous" if ongoing, e.g., in the incremental phase
List<TableId> tableIds
Map<TableId, Offset> tableWatermarks                             # Watermark for all splits
List<CompletedSnapshotSplitInfo> completedSnapshotSplitInfos     # Snapshot phase split details





The CompletedSnapshotSplitInfo fields are as follows:

String splitId
TableId tableId
SeatunnelRowType splitKeyType
Object splitStart
Object splitEnd
Offset watermark    # Corresponds to the highWatermark in the report

The split in the incremental phase carries the watermarks of all splits from the snapshot phase. The minimum watermark is selected as the starting point for incremental synchronization.
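Choosing the starting point can be sketched in a few lines. This is an illustration only: offsets are modeled as plain integers, and the Python class below is a hypothetical mirror of the CompletedSnapshotSplitInfo fields listed above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CompletedSnapshotSplitInfo:
    """Hypothetical Python mirror of the fields listed in the text."""
    split_id: str
    table_id: str
    split_start: Optional[int]
    split_end: Optional[int]
    watermark: int  # the highWatermark the reader reported for this split


def starting_offset(infos: List[CompletedSnapshotSplitInfo]) -> int:
    """The incremental phase starts from the smallest reported watermark."""
    if not infos:
        raise ValueError("no completed snapshot splits were reported")
    return min(info.watermark for info in infos)


infos = [
    CompletedSnapshotSplitInfo("db.users:0", "db.users", None, 5, 150),
    CompletedSnapshotSplitInfo("db.users:1", "db.users", 5, 9, 120),
    CompletedSnapshotSplitInfo("db.users:2", "db.users", 9, None, 180),
]
start = starting_offset(infos)
```

Starting from the minimum means no change recorded after the earliest split finished can be skipped; changes that a later split already saw are filtered out afterwards.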

Exactly-Once Semantics

Whether in the snapshot read phase or the incremental read phase, the database may keep changing while it is being synchronized. How do we guarantee exactly-once delivery?

Snapshot Read Phase

In the snapshot read phase, for example, a split is being synchronized while changes are occurring, such as inserting row k3, updating k2, and deleting k1. If no task identification is used during the read, updates may be lost. SeaTunnel handles this by:





First, checking the binlog position (low watermark) before reading the split.

Reading the data in the range split{start, end}.

Recording the high watermark after reading.





If high = low, the split's data did not change during the read. If (high - low) > 0, changes occurred during processing. In that case, SeaTunnel will:





Cache the split data in memory as an in-memory table.

Apply the changes from the low watermark to the high watermark in order, using primary keys to replay the operations on the in-memory table.

Report the high watermark.





            insert k3      update k2      delete k1
                |              |              |
                v              v              v
bin log --|---------------------------------------------------|-- log offset
    low watermark                                       high watermark

CDC reads:   k1   k3   k4
                  |  Replays
                  v
Real data:   k2   k3'  k4
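The replay step can be sketched as below. This is a simplified illustration, not SeaTunnel's code: offsets are integers, log entries are hypothetical `(offset, operation, key, value)` tuples, and the log passed in is assumed to contain only changes for keys in this split's range.

```python
def replay_split(snapshot_rows, log, low, high):
    """Bring a cached split up to date with changes in the range (low, high].

    snapshot_rows: dict of primary key -> row, as read by the snapshot query.
    log: iterable of (offset, op, key, value), op in {"insert", "update", "delete"}.
    """
    table = dict(snapshot_rows)  # the in-memory table
    if high == low:
        return table  # nothing changed while the split was being read
    for offset, op, key, value in sorted(log, key=lambda entry: entry[0]):
        if not (low < offset <= high):
            continue  # outside the window covered by this split
        if op == "delete":
            table.pop(key, None)
        else:
            # Inserts and updates are applied as upserts keyed by primary key,
            # so replaying a change the snapshot already saw is harmless.
            table[key] = value
    return table


rows = {"k1": "v1", "k2": "v2", "k4": "v4"}
changes = [
    (101, "insert", "k3", "v3"),
    (102, "update", "k2", "v2b"),
    (103, "delete", "k1", None),
    (104, "update", "k4", "late"),  # after the high watermark, left for the incremental phase
]
final_rows = replay_split(rows, changes, low=100, high=103)
```

Treating every insert and update as an upsert on the primary key is what makes the replay idempotent: it does not matter whether the snapshot query ran before or after a given change.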

Incremental Phase

Before starting the incremental phase, SeaTunnel first validates all splits from the previous phase. Data may be updated between splits; for example, if new records are inserted between split1 and split2, they could be missed during the snapshot phase. To recover this data between splits, SeaTunnel follows this approach:





From all split reports, find the smallest watermark and use it as the starting watermark for reading the log.

For each log entry read, check completedSnapshotSplitInfos to see whether the data has already been processed in any split. If not, it is considered data between splits and must be applied.

Once all splits have been validated, the process moves to the full incremental phase.





          |------------filter split2-----------------|
          |----filter split1------|
data log -|-----------------------|------------------|----------------------------------|- log offset
    min watermark        split1 watermark   split2 watermark                    max watermark
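The filtering shown in the diagram can be approximated as follows. This sketch assumes integer keys and offsets and represents each completed split as a hypothetical `(start, end, watermark)` tuple; the rule it encodes is that a log entry is skipped only when a snapshot split covering its key has already seen it.

```python
def should_apply(key, offset, completed_splits):
    """Decide whether a log entry read in the incremental phase must be applied.

    completed_splits: list of (start, end, watermark) with a half-open key range;
    None means the range is open on that side.
    """
    for start, end, watermark in completed_splits:
        in_range = (start is None or key >= start) and (end is None or key < end)
        if in_range:
            # The snapshot of this split already reflects changes up to its watermark.
            return offset > watermark
    # No split covers this key: the change fell between splits and must be applied.
    return True


completed = [(None, 5, 120), (5, 9, 150)]
decisions = [
    should_apply(3, 125, completed),   # split1 finished at 120, so this change is new
    should_apply(7, 125, completed),   # split2 already saw everything up to 150
    should_apply(7, 160, completed),   # newer than split2's watermark
    should_apply(12, 121, completed),  # key not covered by any split
]
```

Once the log position passes the largest split watermark, every entry passes the check, which corresponds to the move into the full incremental phase.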

Checkpoint and Resume

What about pausing and resuming CDC? SeaTunnel uses a distributed snapshot algorithm (Chandy-Lamport):

Assume the system has two processes, p1 and p2, where p1 has three variables X1 Y1 Z1 and p2 has three variables X2 Y2 Z2. The initial state is as follows:

p1                p2
X1:0              X2:4
Y1:0              Y2:2
Z1:0              Z2:3





At this point, p1 initiates a global snapshot. p1 first records its own state, then sends a marker to p2.





Before the marker reaches p2, p2 sends a message M to p1.

p1                                   p2
X1:0     -------marker------->       X2:4
Y1:0     <---------M----------       Y2:2
Z1:0                                 Z2:3





Upon receiving the marker, p2 records its state, and p1 receives the message M. Since p1 has already taken its local snapshot, it only needs to record message M. The final snapshot looks like this:

p1        M        p2
X1:0               X2:4
Y1:0               Y2:2
Z1:0               Z2:3
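The two-process walkthrough above can be replayed with a small simulation. This is a teaching sketch of the Chandy-Lamport marker rule for exactly two processes with one channel in each direction, not SeaTunnel's checkpoint code; the `Process` class and its method names are hypothetical.

```python
MARKER = "MARKER"


class Process:
    def __init__(self, name, state):
        self.name = name
        self.state = dict(state)
        self.snapshot = None        # recorded local state
        self.recording = False      # True while the incoming channel is being recorded
        self.channel_messages = []  # in-flight messages captured as part of the snapshot

    def start_snapshot(self):
        """Initiator: record local state, start recording the channel, emit a marker."""
        self.snapshot = dict(self.state)
        self.recording = True
        return MARKER

    def receive(self, message):
        """Handle an incoming message; returns a marker to forward, or None."""
        if message == MARKER:
            first_marker = self.snapshot is None
            if first_marker:
                # First marker seen: record state; the incoming channel is empty.
                self.snapshot = dict(self.state)
            self.recording = False
            return MARKER if first_marker else None
        if self.recording:
            # Sent before the sender saw the marker: belongs to the channel state.
            self.channel_messages.append(message)
        return None


p1 = Process("p1", {"X1": 0, "Y1": 0, "Z1": 0})
p2 = Process("p2", {"X2": 4, "Y2": 2, "Z2": 3})

marker = p1.start_snapshot()  # p1 records its state and sends the marker
p1.receive("M")               # M was sent by p2 before the marker arrived
reply = p2.receive(marker)    # p2 records its state and answers with a marker
p1.receive(reply)             # p1 stops recording the channel from p2
```

After these steps the global snapshot consists of both recorded states plus the in-flight message M held by p1, matching the final diagram.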





In SeaTunnel CDC, markers are sent to all readers, split enumerators, writers, and other nodes, and each of them persists its own in-memory state.