
Outlier Detection: What You Need to Know

by Natalia Ogneva, 4m, 2024/04/23

Too Long; Didn't Read

Analysts often encounter outliers in data during their work. Decisions are usually based on the sample mean, which is very sensitive to outliers. It is crucial to manage outliers in order to make the correct decision. Let's consider several simple and fast approaches for working with unusual values.


Analysts often encounter outliers in data during their work, for example during AB-test analysis, when building predictive models, or when tracking trends. Decisions are usually based on the sample mean, which is very sensitive to outliers and can shift considerably because of them. So it is crucial to manage outliers in order to make the correct decision.


Let's consider several simple and fast approaches for working with unusual values.

Problem Formulation

Imagine that you need to analyze an experiment using average order value as the primary metric. Let's say that our metric usually has a normal distribution. We also know that the metric distribution in the test group differs from the one in the control group: the mean of the distribution is 10 in control and 12 in test. The standard deviation in both groups is 3.


However, both samples contain outliers that skew the sample means and the sample standard deviations.

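    import numpy as np

    N = 1000

    # Parameters of the control (1) and test (2) distributions
    mean_1 = 10
    std_1 = 3
    mean_2 = 12
    std_2 = 3

    # Each sample is N normal draws plus 50 injected outliers:
    # high values in [20, 30) for x1, low values in [1, 5) for x2
    x1 = np.concatenate((np.random.normal(mean_1, std_1, N), 10 * np.random.random_sample(50) + 20))
    x2 = np.concatenate((np.random.normal(mean_2, std_2, N), 4 * np.random.random_sample(50) + 1))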
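To make the skew concrete, here is a minimal sketch that compares the mean and standard deviation of a sample before and after outliers are injected. The fixed seed and the variable names (`clean`, `contaminated`) are assumptions added for illustration; they are not part of the original setup.

```python
import numpy as np

# Fixed seed (an assumption) so the sketch is repeatable
rng = np.random.default_rng(42)

n = 1000
clean = rng.normal(10, 3, n)            # control-like sample without outliers
outliers = 10 * rng.random(50) + 20     # 50 values spread over [20, 30)
contaminated = np.concatenate((clean, outliers))

# Both moments drift away from the true parameters (10 and 3)
print(f"clean:        mean={clean.mean():.2f}, std={clean.std():.2f}")
print(f"contaminated: mean={contaminated.mean():.2f}, std={contaminated.std():.2f}")
```

Even though the outliers make up less than 5% of the sample, both the mean and the standard deviation move noticeably.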

Note that a metric can have outliers on both sides. If your metric can only have outliers on one side, the methods can easily be adapted for that case.

Cutting Off Tails

The easiest method is to cut off all observations below the 5th percentile and above the 95th percentile. The downside is that we lose 10% of the data. However, the distribution looks better formed, and the sample moments are closer to the distribution moments.

    import numpy as np

    # Keep only the values strictly between the 5th and 95th percentiles
    x1_5pct = np.percentile(x1, 5)
    x1_95pct = np.percentile(x1, 95)
    x1_cutted = x1[(x1 > x1_5pct) & (x1 < x1_95pct)]

    x2_5pct = np.percentile(x2, 5)
    x2_95pct = np.percentile(x2, 95)
    x2_cutted = x2[(x2 > x2_5pct) & (x2 < x2_95pct)]
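The trade-off described above (about 10% of the data lost in exchange for moments closer to the true ones) can be checked with a small sketch. The helper name `trim_by_percentile`, its parameters, and the seed are hypothetical choices for illustration, and the sample is regenerated here so the block runs on its own.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, an assumption for repeatability
x = np.concatenate((rng.normal(10, 3, 1000), 10 * rng.random(50) + 20))

def trim_by_percentile(values, lower=5, upper=95):
    """Keep only values strictly between the lower and upper percentiles."""
    low, high = np.percentile(values, [lower, upper])
    return values[(values > low) & (values < high)]

trimmed = trim_by_percentile(x)
share_lost = 1 - trimmed.size / x.size

print(f"share lost: {share_lost:.1%}")
print(f"mean before={x.mean():.2f}, after={trimmed.mean():.2f}")
print(f"std  before={x.std():.2f}, after={trimmed.std():.2f}")
```

Changing `lower` and `upper` makes the cut more or less aggressive; setting `lower=0` would leave the low tail almost untouched when outliers are expected only among high values.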


Another way is to exclude observations that fall outside a specific range. The lower band equals the 25th percentile minus 1.5 times the interquartile range, and the upper band equals the 75th percentile plus 1.5 times the interquartile range. Here, we lose only 0.7% of the data. The distribution looks better formed than the initial one, and the sample moments are even closer to the distribution moments.

    import numpy as np

    # Interquartile range (IQR) = 75th percentile minus 25th percentile.
    # The original snippet used np.std here; the IQR matches the rule in the text.
    q1_1, q3_1 = np.percentile(x1, [25, 75])
    low_band_1 = q1_1 - 1.5 * (q3_1 - q1_1)
    high_band_1 = q3_1 + 1.5 * (q3_1 - q1_1)
    x1_cutted = x1[(x1 > low_band_1) & (x1 < high_band_1)]

    q1_2, q3_2 = np.percentile(x2, [25, 75])
    low_band_2 = q1_2 - 1.5 * (q3_2 - q1_2)
    high_band_2 = q3_2 + 1.5 * (q3_2 - q1_2)
    x2_cutted = x2[(x2 > low_band_2) & (x2 < high_band_2)]
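The earlier note about metrics with outliers on only one side can be sketched with the same band rule. The function `iqr_filter` and its `k` and `side` parameters are hypothetical names introduced for this illustration, not something from the original article, and the sample is regenerated with a fixed seed so the block is self-contained.

```python
import numpy as np

def iqr_filter(values, k=1.5, side="both"):
    """Drop values outside the bands Q1 - k*IQR and Q3 + k*IQR.

    side: "both", "lower" (trim only low outliers) or "upper" (only high ones).
    """
    if side not in ("both", "lower", "upper"):
        raise ValueError("side must be 'both', 'lower' or 'upper'")
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    keep = np.ones(values.shape, dtype=bool)
    if side in ("both", "lower"):
        keep &= values > q1 - k * iqr
    if side in ("both", "upper"):
        keep &= values < q3 + k * iqr
    return values[keep]

rng = np.random.default_rng(1)  # fixed seed, an assumption for repeatability
x = np.concatenate((rng.normal(10, 3, 1000), 10 * rng.random(50) + 20))

# Outliers were injected only among high values, so trim the upper side only
upper_only = iqr_filter(x, side="upper")
print(f"share lost: {1 - upper_only.size / x.size:.1%}")
print(f"max before={x.max():.2f}, after={upper_only.max():.2f}")
```

With `side="upper"` the low tail is left intact, so ordinary small values are not discarded along with the high outliers.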

Bootstrap

The second method we consider here is the bootstrap. In this approach, the mean is constructed as a mean of subsample means. In our example, the mean in the control group equals 10.35, and in the test group it equals 11.78. It is still a better result when compared with additional data processing.

    import numpy as np
    import pandas as pd

    def create_bootstrap_samples(
        sample_list: np.ndarray,
        sample_size: int,
        n_samples: int
    ) -> pd.Series:
        # create a list for sample means
        sample_means = []
        # loop n_samples times
        for i in range(n_samples):
            # create a bootstrap sample of sample_size with replacement
            bootstrap_sample = pd.Series(sample_list).sample(n=sample_size, replace=True)
            # calculate the bootstrap sample mean
            sample_mean = bootstrap_sample.mean()
            # add this sample mean to the sample means list
            sample_means.append(sample_mean)
        return pd.Series(sample_means)

    (create_bootstrap_samples(x1, len(x1), 1000).mean(),
     create_bootstrap_samples(x2, len(x2), 1000).mean())
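For larger samples, the same resampling idea can be written without an explicit loop. The sketch below is one possible NumPy-only variant, not the author's implementation; the function name `bootstrap_means`, the seed, and the percentile interval at the end are assumptions added for illustration.

```python
import numpy as np

def bootstrap_means(values, n_samples=1000, seed=None):
    """Return the means of n_samples resamples drawn with replacement."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # Each row of idx holds the positions for one bootstrap resample
    idx = rng.integers(0, values.size, size=(n_samples, values.size))
    return values[idx].mean(axis=1)

rng = np.random.default_rng(7)  # fixed seed, an assumption for repeatability
x = np.concatenate((rng.normal(10, 3, 1000), 10 * rng.random(50) + 20))

means = bootstrap_means(x, n_samples=1000, seed=7)
low, high = np.percentile(means, [2.5, 97.5])

print(f"bootstrap mean={means.mean():.2f}, sample mean={x.mean():.2f}")
print(f"95% percentile interval: [{low:.2f}, {high:.2f}]")
```

The spread of the resampled means gives a rough picture of how uncertain the estimate is, which can be compared between the control and test groups.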

Conclusion

Outlier detection and processing are essential for making the right decision. Now you have at least three fast and straightforward approaches that can help you check the data before analysis.


However, it is important to remember that detected outliers may be genuinely unusual values or a sign of the novelty effect. But that's another story :)