Outlier Detection: What You Need to Know

by Natalia Ogneva, 2024/04/23 (4 min read)

Too Long; Didn't Read

Analysts often encounter outliers in data during their work. Decisions are usually based on the sample mean, which is very sensitive to outliers. It is crucial to manage outliers to make the correct decision. Let's consider several simple and fast approaches to working with unusual values.


Analysts often encounter outliers in data during their work, for example when analyzing AB tests, building predictive models, or tracking trends. Decisions are usually based on the sample mean, which is very sensitive to outliers and can shift substantially because of them. So, it is crucial to manage outliers to make the correct decision.

Let's consider several simple and fast approaches to working with unusual values.

Problem Formulation

Imagine that you need to analyze an experiment using average order value as the primary metric. Let's say that our metric usually has a normal distribution. We also know that the metric's distribution in the test group differs from the one in control: the mean of the distribution is 10 in control and 12 in test. The standard deviation in both groups is 3.

However, both samples contain outliers that skew the sample means and the sample standard deviations.

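    import numpy as np

    N = 1000
    mean_1 = 10
    std_1 = 3
    mean_2 = 12
    std_2 = 3

    # Control: normal data plus 50 high outliers drawn uniformly from [20, 30)
    x1 = np.concatenate((np.random.normal(mean_1, std_1, N), 10 * np.random.random_sample(50) + 20))
    # Test: normal data plus 50 low outliers drawn uniformly from [1, 5)
    x2 = np.concatenate((np.random.normal(mean_2, std_2, N), 4 * np.random.random_sample(50) + 1))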
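To see how strongly the injected outliers shift the estimates, the sample statistics can be compared with the parameters used to generate the data. The following is a minimal sketch; the seeded generator `rng` is an assumption added here for reproducibility and is not part of the original code.

```python
import numpy as np

# Assumption: a seeded generator, used only to make this sketch reproducible.
rng = np.random.default_rng(42)

N = 1000
mean_1, std_1 = 10, 3
mean_2, std_2 = 12, 3

# Same construction as above: normal data plus 50 uniformly distributed outliers.
x1 = np.concatenate((rng.normal(mean_1, std_1, N), 10 * rng.random(50) + 20))
x2 = np.concatenate((rng.normal(mean_2, std_2, N), 4 * rng.random(50) + 1))

# Outliers in [20, 30) pull the control mean up;
# outliers in [1, 5) pull the test mean down.
print(f"control: mean={x1.mean():.2f} (true {mean_1}), "
      f"std={x1.std(ddof=1):.2f} (true {std_1})")
print(f"test:    mean={x2.mean():.2f} (true {mean_2}), "
      f"std={x2.std(ddof=1):.2f} (true {std_2})")
```

In both groups the outliers move the sample mean toward the other group and inflate the standard deviation, which makes the true difference harder to detect.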

NOTE that the metric considered here can have outliers on both sides. If your metric can have outliers on only one side, the methods can easily be adapted for that case.

Cut Off Tails

The easiest method is to cut off all observations below the 5th percentile and above the 95th percentile. The downside is that we lose 10% of the information. However, the distributions look better formed, and the sample moments are closer to the distribution moments.

    import numpy as np

    # Keep only observations strictly between the 5th and 95th percentiles
    x1_5pct = np.percentile(x1, 5)
    x1_95pct = np.percentile(x1, 95)
    x1_cutted = [i for i in x1 if i > x1_5pct and i < x1_95pct]

    x2_5pct = np.percentile(x2, 5)
    x2_95pct = np.percentile(x2, 95)
    x2_cutted = [i for i in x2 if i > x2_5pct and i < x2_95pct]
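The same trimming can be wrapped in a small function so that the share of discarded data is easy to check and the one-sided case is covered too. This is only a sketch: `trim_by_percentile` is a hypothetical helper name, and the seeded `sample` array is synthetic data assumed here to mimic the control group.

```python
import numpy as np

def trim_by_percentile(values, lower_pct=5.0, upper_pct=95.0):
    """Keep values strictly inside the percentile bounds (hypothetical helper).

    Pass None for a bound to leave that tail untouched.
    """
    values = np.asarray(values, dtype=float)
    mask = np.ones(values.shape, dtype=bool)
    if lower_pct is not None:
        mask &= values > np.percentile(values, lower_pct)
    if upper_pct is not None:
        mask &= values < np.percentile(values, upper_pct)
    return values[mask]

# Assumption: seeded synthetic data shaped like the control group above.
rng = np.random.default_rng(0)
sample = np.concatenate((rng.normal(10, 3, 1000), 10 * rng.random(50) + 20))

two_sided = trim_by_percentile(sample)                  # cut both tails
upper_only = trim_by_percentile(sample, lower_pct=None) # cut only the upper tail

for name, kept in (("two-sided", two_sided), ("upper only", upper_only)):
    lost = 1 - len(kept) / len(sample)
    print(f"{name}: lost {lost:.1%}, mean={kept.mean():.2f}")
```

When the outliers sit on one side only, as in this synthetic control group, trimming just that tail discards roughly half as much data.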


Another way is to exclude observations outside a specific range. The low band equals the 25th percentile minus 1.5 times the interquartile range, and the high band equals the 75th percentile plus 1.5 times the interquartile range. Here, we lose only 0.7% of the information. The distributions look better formed than the initial ones, and the sample moments are even closer to the distribution moments.

    import numpy as np

    # NOTE: this code uses 1.5 standard deviations as the band width,
    # whereas the text above describes the width via the interquartile range.
    low_band_1 = np.percentile(x1, 25) - 1.5 * np.std(x1)
    high_band_1 = np.percentile(x1, 75) + 1.5 * np.std(x1)
    x1_cutted = [i for i in x1 if i > low_band_1 and i < high_band_1]

    low_band_2 = np.percentile(x2, 25) - 1.5 * np.std(x2)
    high_band_2 = np.percentile(x2, 75) + 1.5 * np.std(x2)
    x2_cutted = [i for i in x2 if i > low_band_2 and i < high_band_2]
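The text describes the bands in terms of the interquartile range, while the snippet above uses the standard deviation as the width. For comparison, here is a minimal sketch of the interquartile-range variant. The helper name `iqr_bounds`, the multiplier argument `k`, and the seeded synthetic `sample` are assumptions made for illustration, and the share of data removed will differ from the figure quoted above.

```python
import numpy as np

def iqr_bounds(values, k=1.5):
    """Return (low, high) bands as Q1 - k * IQR and Q3 + k * IQR."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Assumption: seeded synthetic data shaped like the control group above.
rng = np.random.default_rng(0)
sample = np.concatenate((rng.normal(10, 3, 1000), 10 * rng.random(50) + 20))

low, high = iqr_bounds(sample)
kept = sample[(sample > low) & (sample < high)]

print(f"bands: ({low:.2f}, {high:.2f})")
print(f"lost {1 - len(kept) / len(sample):.1%}, mean={kept.mean():.2f}")
```

Quartiles are barely affected by a small share of extreme values, so these bands do not widen as the outliers get larger, unlike bands built on the standard deviation.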

Bootstrap

The second method we consider here is the bootstrap. In this approach, the mean is constructed as the mean of the means of resampled subsamples. In our example, the mean in the control group equals 10.35, and in the test group it equals 11.78. This is still a better result compared with additional data processing.

    import numpy as np
    import pandas as pd

    def create_bootstrap_samples(
        sample_list: np.ndarray,
        sample_size: int,
        n_samples: int
    ):
        # create a list for sample means
        sample_means = []
        # loop n_samples times
        for i in range(n_samples):
            # create a bootstrap sample of sample_size with replacement
            bootstrap_sample = pd.Series(sample_list).sample(n=sample_size, replace=True)
            # calculate the bootstrap sample mean
            sample_mean = bootstrap_sample.mean()
            # add this sample mean to the sample means list
            sample_means.append(sample_mean)
        return pd.Series(sample_means)

    # mean of the bootstrap means for the control and test groups
    (create_bootstrap_samples(x1, len(x1), 1000).mean(),
     create_bootstrap_samples(x2, len(x2), 1000).mean())
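The resampling loop above can also be written without pandas by drawing all resample indices at once. The sketch below does that and additionally reports a percentile interval of the bootstrap means, which is an addition not present in the original. The helper name `bootstrap_means`, the `seed` argument, and the seeded synthetic `sample` are assumptions made for illustration.

```python
import numpy as np

def bootstrap_means(values, n_samples=1000, seed=None):
    """Means of n_samples resamples drawn with replacement (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # One row of random indices per bootstrap sample, each as long as the data.
    idx = rng.integers(0, len(values), size=(n_samples, len(values)))
    return values[idx].mean(axis=1)

# Assumption: seeded synthetic data shaped like the control group above.
rng = np.random.default_rng(0)
sample = np.concatenate((rng.normal(10, 3, 1000), 10 * rng.random(50) + 20))

means = bootstrap_means(sample, n_samples=1000, seed=1)
ci_low, ci_high = np.percentile(means, [2.5, 97.5])

print(f"bootstrap mean={means.mean():.2f}")
print(f"95% percentile interval=({ci_low:.2f}, {ci_high:.2f})")
```

The average of the resampled means will typically land close to the plain sample mean; the added value is the interval, which shows how much the estimate varies from resample to resample.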

Conclusion

Outlier detection and processing are essential for making the right decision. You now have at least three fast and straightforward approaches that can help you check the data before analysis.

However, it is important to remember that detected outliers may be genuinely unusual values or a sign of a novelty effect. But that's another story :)