Building a Modern Datalake in a Post-Hadoop World

by MinIO, 7 min read, 2024/09/13

Too Long; Didn't Read

This paper covers the rise and fall of Hadoop HDFS and why high-performance object storage is its natural successor in the big data world.

The Modern Datalake is one-half data warehouse and one-half data lake, and it uses object storage for everything. Building a data warehouse on object storage is made possible by Open Table Formats (OTFs) such as Apache Iceberg, Apache Hudi, and Delta Lake. These are specifications that, once implemented, make it seamless to use object storage as the underlying storage layer for a data warehouse. They also provide features that may not exist in a conventional data warehouse: for example, snapshots (also known as time travel), schema evolution, partitions, partition evolution, and zero-copy branching.


As organizations build Modern Datalakes, here are some of the key factors we think they should consider:


  1. Separation of compute and storage
  2. Migration from monolithic frameworks to best-of-breed frameworks
  3. Data center consolidation: replacing departmental solutions with a single enterprise solution
  4. Seamless performance across small and large files/objects
  5. Software-defined, cloud-native solutions that scale horizontally


This paper covers the rise and fall of Hadoop HDFS and why high-performance object storage is its natural successor in the big data world.

Adoption of Hadoop

With the expansion of internet applications, advanced technology companies ran into their first major data storage and aggregation challenges 15 years ago. The traditional RDBMS (Relational Database Management System) could not scale to handle such large amounts of data. Then came Hadoop, a highly scalable model. In the Hadoop model, a large volume of data is divided across many inexpensive machines in a cluster and then processed in parallel. The number of these machines, or nodes, can be increased or decreased as the enterprise requires.


Hadoop was open source and ran on inexpensive commodity hardware, which made it a cost-efficient model. Traditional relational databases, by contrast, require expensive hardware and high-end processors to handle big data. Because scaling in the RDBMS model was so expensive, enterprises started discarding raw data. This led to suboptimal outcomes across a number of vectors.


In this regard, Hadoop offered a significant advantage over the RDBMS approach. It was more scalable from a cost perspective, without sacrificing performance.

The End of Hadoop

The advent of newer technologies such as change data capture (CDC) and streaming data, generated primarily by social media companies like Twitter and Facebook, altered how data is ingested and stored. This created challenges in processing and consuming these even larger volumes of data.


A key challenge was batch processing. Batch processes run in the background and do not interact with the user. Hadoop was efficient at batch processing very large files but struggled with smaller files, in terms of both efficiency and latency. That effectively rendered it obsolete as enterprises sought processing and consumption frameworks that could ingest varied datasets, large and small, in batch, CDC, and real time.


Separating compute and storage simply makes sense today. Storage needs can outpace compute needs by as much as ten to one. That is highly inefficient in the Hadoop world, where you need one compute node for every storage node. Separating them means each can be tuned individually. Compute nodes are stateless and can be optimized with more CPU cores and memory. Storage nodes are stateful and can be I/O optimized with a greater number of denser drives and higher bandwidth.


By disaggregating, enterprises can achieve superior economics, better manageability, improved scalability, and a lower total cost of ownership.
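The ten-to-one argument above can be made concrete with a little arithmetic. The following is a minimal sketch, not a sizing tool: the `Demand` type, the function names, and the example figures (100 storage nodes' worth of disk, 10 compute nodes' worth of CPU) are all illustrative assumptions, and it models a coupled cluster simply as one where every node carries both disk and CPU.

```python
from dataclasses import dataclass


@dataclass
class Demand:
    """Hypothetical workload demand, expressed in node-equivalents."""
    storage_nodes: int  # nodes' worth of disk capacity required
    compute_nodes: int  # nodes' worth of CPU/memory required


def coupled_cluster_size(d: Demand) -> int:
    # In a coupled design every node carries both storage and compute,
    # so the cluster is sized by whichever demand is larger.
    return max(d.storage_nodes, d.compute_nodes)


def idle_compute_nodes(d: Demand) -> int:
    # Compute capacity that is purchased but not needed when coupled.
    return coupled_cluster_size(d) - d.compute_nodes


def disaggregated_sizes(d: Demand) -> tuple[int, int]:
    # Each tier is sized independently: (storage nodes, compute nodes).
    return d.storage_nodes, d.compute_nodes


demand = Demand(storage_nodes=100, compute_nodes=10)  # assumed 10:1 ratio
print("coupled nodes:", coupled_cluster_size(demand))
print("idle compute node-equivalents:", idle_compute_nodes(demand))
print("disaggregated (storage, compute):", disaggregated_sizes(demand))
```

Under these assumed numbers, the coupled cluster pays for far more compute than the workload uses, while the disaggregated layout lets the storage tier be built from dense, I/O-optimized nodes and the compute tier from a much smaller pool.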


HDFS cannot make this transition. Once you give up data locality, Hadoop HDFS's strength becomes its weakness. Hadoop was designed for MapReduce computing, where data and compute had to be co-located. As a result, Hadoop needs its own job scheduler, resource manager, storage, and compute. This is fundamentally incompatible with container-based architectures, where everything is elastic, lightweight, and multi-tenant.


In contrast, MinIO was born cloud native and is designed for containers and orchestration via Kubernetes, making it the ideal technology to move to when retiring legacy HDFS instances.


This has given rise to the Modern Datalake. It keeps the commodity hardware approach inherited from Hadoop but disaggregates storage and compute, thereby changing how data is processed, analyzed, and consumed.

Building a Modern Datalake with MinIO

MinIO is a high-performance object storage system built from scratch to be scalable and cloud native. The team that built MinIO also built one of the most successful file systems, GlusterFS, before evolving their thinking on storage. Their deep understanding of file systems, and of which operations are expensive or inefficient, informed the architecture of MinIO, which delivers both performance and simplicity.


MinIO uses erasure coding and provides a better set of algorithms for managing storage efficiency and resiliency. Typically, that means storing 1.5 times the data, compared with 3 times in Hadoop clusters. This alone improves storage efficiency and reduces cost relative to Hadoop.
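To see where figures like 1.5x and 3x come from, here is a minimal arithmetic sketch. The 8 data + 4 parity shard layout is an assumption chosen because it yields exactly 1.5x; actual erasure-coding layouts depend on how a deployment is configured. The function names and the 1000 TiB example are illustrative only.

```python
def replication_overhead(copies: int) -> float:
    """Raw bytes stored per byte of user data under N-way replication."""
    return float(copies)


def erasure_overhead(data_shards: int, parity_shards: int) -> float:
    """Raw bytes stored per byte of user data under erasure coding."""
    return (data_shards + parity_shards) / data_shards


def raw_capacity_needed(usable_tib: float, overhead: float) -> float:
    """Raw capacity (TiB) required to hold `usable_tib` of user data."""
    return usable_tib * overhead


three_way = replication_overhead(3)                         # HDFS-style 3 copies
ec_8_4 = erasure_overhead(data_shards=8, parity_shards=4)   # assumed layout

usable = 1000.0  # TiB of user data, illustrative
print("3x replication raw TiB:", raw_capacity_needed(usable, three_way))
print("8+4 erasure coding raw TiB:", raw_capacity_needed(usable, ec_8_4))
```

Under this assumed layout, the same usable capacity needs half the raw disk of three-way replication, while still tolerating the loss of up to four shards per object.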


From inception, MinIO was designed for the cloud operating model. As a result, it runs on every cloud: public, private, on-prem, bare metal, and edge. This makes it ideal for multi-cloud and hybrid-cloud deployments. With a hybrid configuration, MinIO enables the migration of data analytics and data science workloads in line with approaches such as the Strangler Fig Pattern popularized by Martin Fowler.


Below are several other reasons why MinIO is the basic building block for a Modern Datalake capable of supporting your AI infrastructure as well as other analytical workloads such as business intelligence, data analytics, and data science.

Modern Data Ready

Hadoop was purpose-built for data where "unstructured data" means large (GiB to TiB sized) log files. When it is used as a general-purpose storage platform where truly unstructured data is in play, the prevalence of small objects (KB to MB) greatly impairs Hadoop HDFS, because the NameNodes were never designed to scale in this fashion. MinIO excels at any file/object size (8 KiB to 5 TiB).
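A rough back-of-the-envelope estimate shows why small objects strain the NameNode. This sketch assumes a commonly quoted rule of thumb of about 150 bytes of NameNode heap per file or block object and a 128 MiB block size; both constants are assumptions for illustration, not measurements, and real heap usage varies by Hadoop version and configuration.

```python
BYTES_PER_NAMESPACE_OBJECT = 150      # assumed rule of thumb, not a measurement
BLOCK_SIZE = 128 * 1024**2            # assumed HDFS block size (128 MiB)


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def namenode_heap_bytes(total_bytes: int, file_size: int) -> int:
    """Estimate NameNode heap needed to track `total_bytes` of data
    stored as equally sized files of `file_size` bytes."""
    files = ceil_div(total_bytes, file_size)
    blocks_per_file = ceil_div(file_size, BLOCK_SIZE)
    # One namespace object per file plus one per block.
    objects = files * (1 + blocks_per_file)
    return objects * BYTES_PER_NAMESPACE_OBJECT


PIB = 1024**5
large_files = namenode_heap_bytes(PIB, 1024**3)      # 1 GiB files
small_files = namenode_heap_bytes(PIB, 100 * 1024)   # 100 KiB files

print(f"1 PiB as 1 GiB files:   ~{large_files / 1024**3:.1f} GiB of heap")
print(f"1 PiB as 100 KiB files: ~{small_files / 1024**4:.1f} TiB of heap")
```

Under these assumptions, the same petabyte costs orders of magnitude more NameNode memory when it arrives as small objects, which is the scaling problem the paragraph above describes.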

Open Source

The enterprises that adopted Hadoop did so out of a preference for open-source technologies. The ability to inspect the code, the freedom from lock-in, and the comfort that comes from tens of thousands of users all have real value. MinIO is also 100% open source, ensuring that organizations can stay true to their goals while upgrading their experience.

Simple

Simplicity is hard. It takes work, discipline, and above all, commitment. MinIO's simplicity is legendary and is the result of a philosophical commitment to making our software easy to deploy, use, upgrade, and scale. Even fans of Hadoop will tell you it is complex. To do more with less, you need to migrate to MinIO.

Performant

Hadoop rose to prominence because of its ability to deliver big data performance. For the better part of a decade, it was the benchmark for enterprise-grade analytics. Not anymore. MinIO has shown in multiple benchmarks that it is materially faster than Hadoop. That means better performance for your Modern Datalake.

Lightweight

MinIO's server binary is under 100 MB in total. Despite its size, it is powerful enough to run the data center, yet still small enough to live comfortably at the edge. There is no such alternative in the Hadoop world. For enterprises, this means that your S3 applications can access data anywhere, anytime, with the same API. By deploying MinIO at an edge location, you can capture and filter data at the edge and use MinIO's replication capabilities to ship it to your Modern Datalake for aggregation and further analytics.

Resilient

MinIO protects data with per-object, inline erasure coding, which is far more efficient than the HDFS alternatives that came after replication and never gained adoption. In addition, MinIO's bitrot detection ensures that it will never read corrupted data, detecting and healing corrupted objects on the fly. MinIO also supports cross-region, active-active replication. Finally, MinIO supports a complete object locking framework offering both Legal Hold and Retention (with Governance and Compliance modes).
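The idea behind verify-on-read bitrot detection can be sketched in a few lines. This is a toy illustration under loud assumptions, not MinIO's implementation: it uses SHA-256 from the Python standard library as a stand-in checksum, and it heals from a second in-memory mirror copy rather than reconstructing from erasure-coded parity. The `VerifyingStore` class and its `flip_bit` helper are hypothetical names invented for this example.

```python
import hashlib


class BitrotError(Exception):
    """Raised when no intact copy of an object can be found."""


class VerifyingStore:
    """Toy in-memory store that checks a digest on every read."""

    def __init__(self) -> None:
        self._copies: dict[str, list[bytearray]] = {}
        self._digests: dict[str, str] = {}

    def put(self, key: str, data: bytes) -> None:
        # Record the digest at write time and keep two independent copies.
        self._digests[key] = hashlib.sha256(data).hexdigest()
        self._copies[key] = [bytearray(data), bytearray(data)]

    def _is_intact(self, key: str, copy: bytearray) -> bool:
        return hashlib.sha256(copy).hexdigest() == self._digests[key]

    def get(self, key: str) -> bytes:
        copies = self._copies[key]
        good = next((c for c in copies if self._is_intact(key, c)), None)
        if good is None:
            raise BitrotError(f"all copies of {key!r} are corrupted")
        # Heal: overwrite any damaged copy with the verified one.
        for i, c in enumerate(copies):
            if not self._is_intact(key, c):
                copies[i] = bytearray(good)
        return bytes(good)

    def flip_bit(self, key: str, copy_index: int) -> None:
        # Test helper that simulates silent corruption of a non-empty object.
        self._copies[key][copy_index][0] ^= 0x01


store = VerifyingStore()
store.put("report.csv", b"id,value\n1,42\n")
store.flip_bit("report.csv", 0)           # corrupt one copy silently
print(store.get("report.csv"))            # served from the intact copy
```

The design point is that the reader never trusts stored bytes on their own: every read recomputes the digest, corrupted data is never returned, and the damaged copy is repaired as a side effect of the read.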

Software Defined

The successor to Hadoop HDFS is not a hardware appliance; it is software running on commodity hardware. That is what MinIO is: software. Like Hadoop HDFS, MinIO is designed to take full advantage of commodity servers. With the ability to leverage NVMe drives and 100 GbE networking, MinIO can shrink the data center, improving operational efficiency and manageability.

Secure

MinIO supports multiple sophisticated server-side encryption schemes to protect data wherever it may be, in flight or at rest. MinIO's approach assures confidentiality, integrity, and authenticity with negligible performance overhead. Server-side and client-side encryption are supported using AES-256-GCM, ChaCha20-Poly1305, and AES-CBC, ensuring application compatibility. Furthermore, MinIO supports industry-leading key management systems (KMS).

Migrating from Hadoop to MinIO

The MinIO team has expertise in migrating from HDFS to MinIO. Customers who purchase an Enterprise license can get assistance from our engineers. To learn more about using MinIO to replace HDFS, see this collection of resources.

Conclusion

Every enterprise is a data enterprise at this point. The storage of that data and its subsequent analysis need to be seamless, scalable, secure, and performant. The analytical tools spawned by the Hadoop ecosystem, such as Spark, are more effective and efficient when paired with object storage-based data lakes. Technologies like Flink improve overall performance because they provide a single runtime for both streaming and batch processing, something that did not work well in the HDFS model. Frameworks like Apache Arrow are redefining how data is stored and processed, and Iceberg and Hudi are redefining how table formats allow data to be queried efficiently.


These technologies all require a modern, object storage-based data lake where compute and storage are disaggregated and workload-optimized. If you have any questions while architecting your own modern data lake, please reach out to us at [email protected] or on our Slack channel.

About Author

MinIO (@minio)
MinIO is a high-performance, cloud-native object store that runs anywhere (public cloud, private cloud, colo, onprem).