Introduction to CDC (Change Data Capture)

Change Data Capture (CDC) is a technique for tracking row-level changes made by database operations (inserts, updates, deletes) and notifying other systems of those changes in the order they occurred. In disaster recovery scenarios, CDC is mainly used to synchronize data between a primary and a backup database, enabling real-time replication from the primary to the secondary database.

    source ----------> CDC ----------> sink

Apache SeaTunnel CDC

SeaTunnel CDC offers two types of data synchronization:

Snapshot Read: reads the historical data of a table.
Incremental Tracking: reads the incremental log changes of a table.

Lock-Free Snapshot Synchronization

The lock-free nature of the snapshot synchronization phase is emphasized because many existing CDC platforms, such as Debezium, may lock tables while synchronizing historical data. Snapshot reading is the process of synchronizing a database's historical data. The basic flow is as follows:

    storage -------------> splitEnumerator ---------- split ----------> reader
                                  ^                                       |
                                  |                                       |
                                  \---------------- report ---------------/

Split Partitioning

The splitEnumerator (split distributor) partitions the table data into multiple splits based on specified fields (such as the table ID or unique keys) and a defined step size.

Parallel Processing

Each split is assigned to a different reader so that splits are read in parallel. Each reader occupies one connection.

Event Feedback

After finishing the read for a split, each reader reports its progress back to the splitEnumerator.
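The split partitioning step described above can be sketched in a few lines. This is a minimal illustration, not SeaTunnel's actual implementation: the names `SnapshotSplit` and `partition_table` are hypothetical, the split key is assumed to be an integer, and leaving the first and last splits open-ended is an assumption made here so that rows outside the sampled key range are still covered.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SnapshotSplit:
    """Hypothetical stand-in for the split metadata described in the article."""
    split_id: str
    table_id: str
    split_start: Optional[int]  # inclusive; None means unbounded below
    split_end: Optional[int]    # exclusive; None means unbounded above


def partition_table(table_id: str, min_key: int, max_key: int, step: int) -> List[SnapshotSplit]:
    """Cut the key range [min_key, max_key] into half-open chunks of `step` keys."""
    if step <= 0:
        raise ValueError("step must be positive")
    splits: List[SnapshotSplit] = []
    start: Optional[int] = None      # the first split is open at the bottom
    boundary = min_key + step
    while boundary <= max_key:
        splits.append(SnapshotSplit(f"{table_id}:{len(splits)}", table_id, start, boundary))
        start = boundary
        boundary += step
    # The last split is open at the top (assumption, see lead-in).
    splits.append(SnapshotSplit(f"{table_id}:{len(splits)}", table_id, start, None))
    return splits


# Keys 1..10 with a step of 4 give three splits: (None, 5), (5, 9), (9, None).
splits = partition_table("db.users", 1, 10, 4)
```

Each resulting split could then be handed to a separate reader, which is what makes the snapshot phase parallel.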
The metadata of a split is as follows:

    String              splitId         # Routing ID
    TableId             tableId         # Table ID
    SeatunnelRowType    splitKeyType    # The type of field used for partitioning
    Object              splitStart      # Start point of the partition
    Object              splitEnd        # End point of the partition

Once a reader receives the split information, it generates the appropriate SQL statements. Before it starts reading, it records the database log position that corresponds to the current split. After finishing the current split, the reader reports its progress to the splitEnumerator with the following data:

    String      splitId          # Split ID
    Offset      highWatermark    # Log position corresponding to the split, for future validation

Incremental Synchronization

The incremental synchronization phase begins after the snapshot read phase. In this phase, every change that occurs in the source database is captured and synchronized to the backup database in real time. This phase listens to the database log (for example, the MySQL binlog). Incremental tracking is usually single-threaded, to avoid pulling the binlog more than once and to reduce load on the database. Therefore only one reader is used, occupying a single connection.

    data log -------------> splitEnumerator ---------- split ----------> reader
                                  ^                                        |
                                  |                                        |
                                  \---------------- report ----------------/

In the incremental synchronization phase, all splits and tables from the snapshot phase are combined into a single split.
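As a rough sketch of how the completed snapshot splits could be folded into one incremental split: the field names below mirror the article's metadata listings, but the class layout, the function `build_incremental_split`, and the use of plain integers as log offsets are assumptions made for illustration (a real binlog offset is typically a file name plus a position). The starting offset is the minimum of the reported watermarks, which is the rule the article gives for choosing where incremental reading begins.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CompletedSnapshotSplitInfo:
    split_id: str
    table_id: str
    split_start: object
    split_end: object
    watermark: int  # the highWatermark reported by the reader (int is an assumption)


@dataclass(frozen=True)
class IncrementalSplit:
    split_id: str
    starting_offset: int
    table_ids: List[str]
    completed_snapshot_split_infos: List[CompletedSnapshotSplitInfo]


def build_incremental_split(completed: List[CompletedSnapshotSplitInfo]) -> IncrementalSplit:
    """Merge every finished snapshot split into a single incremental split."""
    if not completed:
        raise ValueError("at least one completed snapshot split is required")
    return IncrementalSplit(
        split_id="incremental-0",  # hypothetical ID
        # Start from the lowest watermark so no change between splits is skipped.
        starting_offset=min(info.watermark for info in completed),
        table_ids=sorted({info.table_id for info in completed}),
        completed_snapshot_split_infos=list(completed),
    )


completed = [
    CompletedSnapshotSplitInfo("users:0", "db.users", None, 5, 120),
    CompletedSnapshotSplitInfo("users:1", "db.users", 5, None, 95),
    CompletedSnapshotSplitInfo("orders:0", "db.orders", None, None, 140),
]
incremental = build_incremental_split(completed)
```

Keeping the per-split details inside the merged split is what later allows the single reader to decide, per log entry, whether a snapshot split has already covered it.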
The split metadata during this phase is as follows:

    String                              splitId
    Offset                              startingOffset                 # The lowest log start position among all splits
    Offset                              endingOffset                   # Log end position, or "continuous" if ongoing, e.g., in the incremental phase
    List<TableId>                       tableIds
    Map<TableId, Offset>                tableWatermarks                # Watermark for all splits
    List<CompletedSnapshotSplitInfo>    completedSnapshotSplitInfos    # Snapshot phase split details

The fields of CompletedSnapshotSplitInfo are as follows:

    String              splitId
    TableId             tableId
    SeatunnelRowType    splitKeyType
    Object              splitStart
    Object              splitEnd
    Offset              watermark       # Corresponds to the highWatermark in the report

The split in the incremental phase carries the watermarks of all splits from the snapshot phase. The minimum watermark is selected as the starting point for incremental synchronization.

Exactly-Once Semantics

Whether in the snapshot read phase or the incremental read phase, the database may keep changing while it is being synchronized. How do we guarantee exactly-once delivery?

Snapshot Read Phase

In the snapshot read phase, a split may be synchronized while changes are taking place, for example the insertion of a row k3, an update to k2, and the deletion of k1. If these changes are not identified during the read, the updates can be lost. SeaTunnel handles this as follows:

1. Record the binlog position (the low watermark) before reading the split.
2. Read the data in the range split{start, end}.
3. Record the high watermark after reading.

If high = low, the data of the split did not change during the read. If (high - low) > 0, changes occurred while the split was being processed. In that case, SeaTunnel will:

1. Cache the split data in memory as an in-memory table.
2. Apply the changes from the low watermark to the high watermark in order, using primary keys to replay the operations on the in-memory table.
3. Report the high watermark.
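The replay step above can be sketched as follows. This is an illustrative model only: log events are represented as `(offset, op, key, value)` tuples, offsets are plain integers, and the helper name `replay_changes` is hypothetical. Treating the window as `low < offset <= high` is also an assumption about how the two watermarks bound the changes.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# (log offset, operation, primary key, new value or None for deletes)
LogEvent = Tuple[int, str, str, Optional[str]]


def replay_changes(
    snapshot_rows: Dict[str, str],
    log_events: Iterable[LogEvent],
    low: int,
    high: int,
    key_in_split: Optional[Callable[[str], bool]] = None,
) -> Dict[str, str]:
    """Bring a split's snapshot up to date with changes made while it was read."""
    table = dict(snapshot_rows)  # the in-memory table, keyed by primary key
    if high == low:
        return table  # nothing changed during the read
    for offset, op, key, value in sorted(log_events, key=lambda e: e[0]):
        if not (low < offset <= high):
            continue  # outside the watermark window
        if key_in_split is not None and not key_in_split(key):
            continue  # the change belongs to another split
        if op in ("insert", "update"):
            table[key] = value
        elif op == "delete":
            table.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return table


snapshot = {"k1": "a", "k2": "b", "k4": "d"}
events = [
    (11, "insert", "k3", "c"),
    (12, "update", "k2", "b2"),
    (13, "delete", "k1", None),
    (20, "update", "k4", "late"),  # after the high watermark, ignored here
]
result = replay_changes(snapshot, events, low=10, high=13)
```

Because the replay is keyed by primary key, applying the same window twice yields the same table, which is what makes the reported high watermark a safe resume point.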
            insert k3      update k2      delete k1
                |              |              |
                v              v              v
    bin log --|---------------------------------------------------|-- log offset
        low watermark                                       high watermark

    CDC reads:  k1 k3 k4
                    | Replays
                    v
    Real data:  k2 k3' k4

Incremental Phase

Before starting the incremental phase, SeaTunnel first validates all splits from the previous step. Data may be updated between splits: for example, if new records are inserted between split1 and split2, they can be missed during the snapshot phase. To recover this data between splits, SeaTunnel proceeds as follows:

1. From all split reports, find the smallest watermark and use it as the starting watermark for reading the log.
2. For each log entry read, check completedSnapshotSplitInfos to see whether the data has already been processed in any split. If not, it is treated as data between splits and must be applied.
3. Once all splits have been validated, the process moves on to the full incremental phase.

              |------------filter split2-----------------|
              |----filter split1------|
    data log -|-----------------------|------------------|----------------------------------|- log offset
        min watermark      split1 watermark    split2 watermark                    max watermark

Checkpoint and Resume

What about pausing and resuming CDC? SeaTunnel uses a distributed snapshot algorithm (Chandy-Lamport).

Assume the system has two processes, p1 and p2, where p1 has three variables X1, Y1, Z1 and p2 has three variables X2, Y2, Z2. The initial states are as follows:

    p1                                  p2
    X1:0                                X2:4
    Y1:0                                Y2:2
    Z1:0                                Z2:3

At this point, p1 initiates a global snapshot. p1 first records its own process state, then sends a marker to p2.

Before the marker reaches p2, p2 sends a message M to p1.

    p1                                  p2
    X1:0     -------marker------->      X2:4
    Y1:0     <---------M----------      Y2:2
    Z1:0                                Z2:3

Upon receiving the marker, p2 records its state, and p1 receives the message M.
Since p1 has already taken its local snapshot, it only needs to log the message M. The final snapshot looks like this:

    p1 M                                p2
    X1:0                                X2:4
    Y1:0                                Y2:2
    Z1:0                                Z2:3

In SeaTunnel CDC, markers are sent to all readers, split enumerators, writers, and other nodes, and each of them saves its own in-memory state.
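The two-process walkthrough above can be reproduced with a small simulation. This is a teaching sketch of the Chandy-Lamport idea for exactly two processes with one channel in each direction, not SeaTunnel's checkpoint code: the `Process` class and the `MARKER` sentinel are hypothetical names, and normal message handling by the application is left out.

```python
from collections import deque

MARKER = object()  # sentinel that separates pre-snapshot from post-snapshot messages


class Process:
    def __init__(self, name, state):
        self.name = name
        self.state = dict(state)
        self.inbox = deque()     # FIFO channel from the peer to this process
        self.peer = None
        self.snapshot = None     # recorded local state
        self.channel_log = []    # in-flight messages recorded with the snapshot
        self.recording = False

    def send(self, message):
        self.peer.inbox.append(message)

    def _record_and_forward(self):
        self.snapshot = dict(self.state)
        self.send(MARKER)

    def start_snapshot(self):
        self._record_and_forward()
        self.recording = True    # log peer messages until the peer's marker arrives

    def receive(self):
        message = self.inbox.popleft()
        if message is MARKER:
            if self.snapshot is None:
                self._record_and_forward()  # first marker seen: snapshot now
            self.recording = False          # the channel state is complete
        elif self.recording:
            self.channel_log.append(message)
        return message


p1 = Process("p1", {"X1": 0, "Y1": 0, "Z1": 0})
p2 = Process("p2", {"X2": 4, "Y2": 2, "Z2": 3})
p1.peer, p2.peer = p2, p1

p1.start_snapshot()  # p1 records its state and sends the marker to p2
p2.send("M")         # M is sent before the marker reaches p2
p2.receive()         # p2 gets the marker, records its state, replies with a marker
p1.receive()         # p1 gets M after its snapshot, so it logs M
p1.receive()         # p1 gets p2's marker and stops recording
```

After the run, `p1.snapshot` and `p2.snapshot` hold the two local states and `p1.channel_log` holds the in-flight message `M`, matching the final snapshot shown in the walkthrough.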