One of the vital skills of an accomplished data professional is handling large datasets effectively while ensuring data quality and reliability. Data is the central and fundamental piece of any data system, and however strong your skills are in other aspects of our trade, this is one you cannot afford to overlook.

In this article, I explore robust techniques for performing QA checks on large datasets using the Deequ library and statistical methods. By combining the approaches explained below, you will be able to maintain data integrity, strengthen your data management practices, and prevent potential issues in downstream applications.

QA Checks Using the Deequ Library

What Is Deequ?

Ensuring data quality at scale is a daunting task, especially when dealing with billions of rows stored in distributed file systems or data warehouses. The Deequ library is an open-source data profiling and QA framework built on Spark, designed to solve exactly this problem. What sets it apart from similar tools is its seamless integration with Spark, which lets it use distributed processing to handle large-scale datasets efficiently.

When you try it, you will see how its flexibility lets you define complex validation rules tailored to your specific requirements, ensuring comprehensive coverage. In addition, Deequ offers extensive metrics and anomaly detection capabilities that help you identify and proactively address data quality issues. For data professionals working with large and dynamic datasets, Deequ is a Swiss Army knife. Let's see how we can use it.

Setting Up Deequ

More details on setting up the Deequ library, and on use cases around data profiling, are available in the project's documentation. For simplicity, in this example we just generate a few toy records:

    // Assumed definition: the snippet uses Item without declaring it.
    // Field names follow the columns referenced by the checks later on.
    case class Item(id: Long, productName: String, description: String,
                    priority: String, numViews: Long)

    val rdd = spark.sparkContext.parallelize(Seq(
      Item(1, "Thingy A", "awesome thing.", "high", 0),
      Item(2, "Thingy B", "available at http://thingb.com", null, 0),
      Item(3, null, null, "low", 5),
      Item(4, "Thingy D", "checkout https://thingd.ca", "low", 10),
      Item(5, "Thingy E", null, "high", 12)))

    val data = spark.createDataFrame(rdd)

Defining Data Assumptions

Most data applications come with implicit assumptions about data attributes, such as non-NULL values and uniqueness. With Deequ, these assumptions become explicit through unit tests. Here are some common checks:

- Row Count: Ensure that the dataset contains a specific number of rows.
- Attribute Completeness: Check that attributes such as id and productName are never NULL.
- Attribute Uniqueness: Ensure that certain attributes, such as id, are unique.
- Value Range: Validate that attributes such as priority and numViews fall within expected ranges.
- Pattern Matching: Verify that descriptions contain URLs when expected.
- Statistical Properties: Ensure that the median of numerical attributes meets specific criteria.

Here is how you can implement these checks using Deequ:

    import com.amazon.deequ.VerificationSuite
    import com.amazon.deequ.checks.{Check, CheckLevel, CheckStatus}

    val verificationResult = VerificationSuite()
      .onData(data)
      .addCheck(
        Check(CheckLevel.Error, "unit testing my data")
          .hasSize(_ == 5) // we expect 5 rows
          .isComplete("id") // should never be NULL
          .isUnique("id") // should not contain duplicates
          .isComplete("productName") // should never be NULL
          // should only contain the values "high" and "low"
          .isContainedIn("priority", Array("high", "low"))
          .isNonNegative("numViews") // should not contain negative values
          // at least half of the descriptions should contain a url
          .containsURL("description", _ >= 0.5)
          // the median of numViews should be at most 10
          .hasApproxQuantile("numViews", 0.5, _ <= 10))
      .run()

Interpreting Results

After these checks are run, Deequ translates them into a series of Spark jobs, which it executes to compute metrics on the data. It then invokes your assertion functions (for example, _ == 5 for the size check) on these metrics to see whether the constraints hold on the data. We can inspect the verificationResult object to see whether the test found errors:

    import com.amazon.deequ.constraints.ConstraintStatus

    if (verificationResult.status == CheckStatus.Success) {
      println("The data passed the test, everything is fine!")
    } else {
      println("We found errors in the data:\n")

      // Collect the constraint results from every check
      val resultsForAllConstraints = verificationResult.checkResults
        .flatMap { case (_, checkResult) => checkResult.constraintResults }

      // Print only the constraints that did not succeed
      resultsForAllConstraints
        .filter { _.status != ConstraintStatus.Success }
        .foreach { result => println(s"${result.constraint}: ${result.message.get}") }
    }

If we run the example, we get the following output:

    We found errors in the data:

    CompletenessConstraint(Completeness(productName)): Value: 0.8 does not meet the requirement!
    PatternConstraint(containsURL(description)): Value: 0.4 does not meet the requirement!

The test found that our assumptions were violated. Only 4 out of 5 (80%) of the values of the productName attribute are non-null, and only 2 out of 5 (40%) of the values of the description attribute contain a URL. Fortunately, we ran a test and found the errors; somebody should fix the data right away.

QA Checks With Statistical Methods

While Deequ offers a robust framework for data validation, integrating statistical methods can further strengthen your QA checks, especially if you are dealing with aggregated metrics of a dataset. Let's see how you can employ statistical methods to monitor and ensure data quality.

Record Count Tracking

Consider a business scenario where an ETL (Extract, Transform, Load) process produces N records in a daily scheduled job. Support teams may want to set up QA checks that raise an alert if there is a significant deviation in the record count. For instance, if the process typically generates between 9,500 and 10,500 records per day over two months, any significant increase or decrease could indicate an issue with the underlying data.

We can use a statistical method to define the threshold at which the process should raise an alert to the support team. Below is an illustration of record count tracking over two months:

[Figure: daily record count over two months]

To analyze this, we can transform the record count data to observe the day-to-day changes. These changes generally oscillate around zero, as shown in the following chart:

[Figure: day-to-day change in record count]

When we represent this rate of change with a normal distribution, it forms a bell curve, indicating that the data is normally distributed. The expected change is around 0%, with a standard deviation of 2.63%.
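
The change-based threshold described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the article's production rule: the daily counts in `history` are made up, the helper names `change_band` and `should_alert` are hypothetical, and the band width `k = 2` standard deviations is inferred from the article's figures (2 × 2.63% ≈ 5.26%).

```python
import statistics


def change_band(counts, k=2.0):
    """Return (mean, std, lower, upper) for day-over-day percent changes.

    The band is mean +/- k standard deviations; k = 2 is an assumption.
    """
    changes = [
        (today - yesterday) / yesterday * 100.0
        for yesterday, today in zip(counts, counts[1:])
    ]
    mean = statistics.fmean(changes)
    std = statistics.stdev(changes)  # sample standard deviation
    return mean, std, mean - k * std, mean + k * std


def should_alert(previous_count, new_count, lower, upper):
    """True when the latest percent change falls outside the band."""
    change = (new_count - previous_count) / previous_count * 100.0
    return change < lower or change > upper


# Hypothetical daily record counts, for illustration only
history = [10000, 10120, 9890, 10050, 9760, 10210, 9930, 10080, 9850, 10010]

mean, std, lower, upper = change_band(history)
print(f"expected change {mean:.2f}%, std {std:.2f}%, band [{lower:.2f}%, {upper:.2f}%]")

# A small move stays inside the band; a sharp drop falls outside it
print(should_alert(history[-1], 10100, lower, upper))
print(should_alert(history[-1], 7000, lower, upper))
```

In practice the band would be computed from the full two months of history and refreshed periodically, so that the threshold follows gradual changes in volume.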

This analysis suggests that the day-to-day change in record count typically falls within the -5.26% to +5.25% range with 90% confidence. That range is roughly two standard deviations on either side of the mean; for an exactly normal distribution such a band covers closer to 95% of observations, so the 90% figure is on the cautious side. Based on this, you can establish a rule that raises an alert if the record count deviates beyond this range, ensuring timely intervention.

Attribute Coverage Tracking

Attribute coverage refers to the ratio of non-NULL values to the total record count for a dataset snapshot. For example, if 8 out of 100 records have a NULL value for a particular attribute, the coverage for that attribute is 92%.

Let's review another business case, with an ETL process generating a product table snapshot daily. We want to monitor the coverage of the product description attribute. If the coverage falls below a certain threshold, an alert should be raised for the support team. Below is a visual representation of attribute coverage for product descriptions over two months:

[Figure: attribute coverage for product descriptions over two months]

By analyzing the day-to-day differences in coverage, we observe that the changes oscillate around zero:

[Figure: day-to-day change in attribute coverage]

Representing this data as a normal distribution shows that it is normally distributed, with an expected change of around 0% and a standard deviation of 2.45%.

As we can see, for this dataset the day-to-day change in attribute coverage typically ranges from -4.9% to +4.9% with 90% confidence (again about two standard deviations). Based on this indicator, we can set a rule that raises an alert if the coverage deviates beyond this range.

QA Checks With Time Series Algorithms

If you work with datasets that show significant variation due to factors such as seasonality or trends, traditional statistical methods may trigger false alerts. Time series algorithms offer a more refined approach, improving the accuracy and reliability of your QA checks.

To produce more sensible alerts, you can use either the Autoregressive Integrated Moving Average (ARIMA) model or the Holt-Winters method.
ARIMA is sufficient for datasets with trends, while the Holt-Winters method lets us handle datasets with both trend and seasonality. Holt-Winters uses components for level, trend, and seasonality, which allows it to adapt flexibly to changes over time.

Let's mock-model daily sales that exhibit both trend and seasonal patterns using Holt-Winters:

    import pandas as pd
    from statsmodels.tsa.holtwinters import ExponentialSmoothing

    # Load the dataset and take the first value column as a 1-D daily series,
    # forward-filling any missing days
    data = pd.read_csv('sales_data.csv', index_col='date', parse_dates=True)
    series = data.iloc[:, 0].asfreq('D').ffill()

    # Fit the Holt-Winters model; a yearly season on daily data needs
    # at least two full years of history
    model = ExponentialSmoothing(series, trend='add', seasonal='add', seasonal_periods=365)
    fit = model.fit()

    # Compare the in-sample fitted values with the observations and flag
    # residuals more than three standard deviations from zero
    fitted = fit.fittedvalues
    residuals = series - fitted
    threshold = 3 * residuals.std()
    anomalies = residuals[residuals.abs() > threshold]

    print("Anomalies detected:")
    print(anomalies)

Using this method, you can detect significant deviations that may indicate data quality issues, which gives you a more nuanced approach to QA checks.

I hope this article helps you implement QA checks for your large datasets efficiently. By using the Deequ library and integrating statistical methods and time series algorithms, you can ensure data integrity and reliability, ultimately improving your data management practices. Implementing the techniques described above will help you prevent potential issues in downstream applications and improve the overall quality of your data workflows.