Outlier Detection: What You Need to Know

by Natalia Ogneva, 4 min read, 2024/04/23

Too Long; Didn't Read

Analysts often encounter outliers in data during their work. Decisions are usually based on the sample mean, which is very sensitive to outliers. Handling outliers properly is crucial for making the right decision. Let's consider several simple and fast approaches for working with unusual values.

Analysts often encounter outliers in data during their work, for example when analyzing A/B tests, building predictive models, or tracking trends. Decisions are usually based on the sample mean, which is very sensitive to outliers: a few extreme observations can change its value dramatically. It is therefore crucial to handle outliers properly in order to make the right decision.
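To make the sensitivity of the sample mean concrete, here is a minimal sketch with made-up order values (the numbers are illustrative assumptions, not data from this article). A single extreme value moves the mean a long way, while the median does not move at all.

```python
import numpy as np

# Hypothetical order values, chosen only for illustration.
clean = np.array([9.0, 10.0, 10.0, 11.0])
# The same values plus one artificial outlier.
with_outlier = np.append(clean, 100.0)

mean_clean = clean.mean()
mean_outlier = with_outlier.mean()
median_clean = np.median(clean)
median_outlier = np.median(with_outlier)

print(mean_clean, mean_outlier)      # 10.0 28.0
print(median_clean, median_outlier)  # 10.0 10.0
```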


Let's consider several simple and fast approaches for working with unusual values.

Problem Formulation

Imagine that you need to analyze an experiment using average order value as the primary metric. Let's say that the metric is usually normally distributed. We also know that the distribution of the metric in the test group differs from that in the control group: the mean of the distribution is 10 in control and 12 in test. The standard deviation in both groups is 3.


However, both samples contain outliers that skew the sample means and the sample standard deviations.

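    import numpy as np

    N = 1000
    mean_1 = 10
    std_1 = 3
    mean_2 = 12
    std_2 = 3

    # Control: N normal observations plus 50 high outliers, uniform on [20, 30)
    x1 = np.concatenate((np.random.normal(mean_1, std_1, N),
                         10 * np.random.random_sample(50) + 20))
    # Test: N normal observations plus 50 low outliers, uniform on [1, 5)
    x2 = np.concatenate((np.random.normal(mean_2, std_2, N),
                         4 * np.random.random_sample(50) + 1))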
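As a rough check of how much this contamination matters, the sketch below compares the moments of a clean control sample with the contaminated one. The fixed seed and the variable names are assumptions made here for reproducibility. In expectation, the contaminated control mean is (1000 · 10 + 50 · 25) / 1050 ≈ 10.71 instead of 10, and the contaminated test mean is (1000 · 12 + 50 · 3) / 1050 ≈ 11.57 instead of 12.

```python
import numpy as np

rng = np.random.default_rng(42)  # assumption: fixed seed so the run is repeatable

N = 1000
control_clean = rng.normal(10, 3, N)
# Same kind of contamination as in the setup: 50 values uniform on [20, 30).
control = np.concatenate((control_clean, 10 * rng.random(50) + 20))

test_clean = rng.normal(12, 3, N)
# 50 low values uniform on [1, 5).
test = np.concatenate((test_clean, 4 * rng.random(50) + 1))

for name, clean, dirty in (("control", control_clean, control),
                           ("test", test_clean, test)):
    print(f"{name}: clean mean={clean.mean():.2f}, std={clean.std(ddof=1):.2f} | "
          f"contaminated mean={dirty.mean():.2f}, std={dirty.std(ddof=1):.2f}")
```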

NOTE: the metric considered here can have outliers on both sides. If your metric can have outliers on one side only, the methods below can easily be adapted for that case.
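For instance, a one-sided variant of percentile trimming might look like the sketch below. The function name `cut_upper_tail`, the 95% default, and the seeded sample are hypothetical choices for illustration, not part of the original article.

```python
import numpy as np

def cut_upper_tail(values: np.ndarray, upper_pct: float = 95.0) -> np.ndarray:
    """Keep only observations strictly below the given upper percentile."""
    threshold = np.percentile(values, upper_pct)
    return values[values < threshold]

# Assumption: a seeded sample with high-side outliers only.
rng = np.random.default_rng(0)
sample = np.concatenate((rng.normal(10, 3, 1000), 10 * rng.random(50) + 20))

trimmed = cut_upper_tail(sample)
print(len(sample), len(trimmed))
print(f"mean before: {sample.mean():.2f}, mean after: {trimmed.mean():.2f}")
```

The lower tail is left untouched, so no observations are discarded on the side where no outliers are expected.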

Cut Off Tails

The easiest method is to cut off all observations below the 5th percentile and above the 95th percentile. The downside is that we lose 10% of the data. However, the distributions look better formed, and the sample moments are closer to the distribution moments.

    import numpy as np

    x1_5pct = np.percentile(x1, 5)
    x1_95pct = np.percentile(x1, 95)
    x1_cutted = [i for i in x1 if x1_5pct < i < x1_95pct]

    x2_5pct = np.percentile(x2, 5)
    x2_95pct = np.percentile(x2, 95)
    x2_cutted = [i for i in x2 if x2_5pct < i < x2_95pct]
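The same trimming can be wrapped in a small helper that also reports the share of data removed. This is only a sketch: the name `cut_tails`, its defaults, and the seeded sample are assumptions introduced here.

```python
import numpy as np

def cut_tails(values: np.ndarray, lower_pct: float = 5.0, upper_pct: float = 95.0):
    """Return the observations strictly between two percentiles and the share removed."""
    low, high = np.percentile(values, [lower_pct, upper_pct])
    mask = (values > low) & (values < high)
    return values[mask], 1.0 - mask.mean()

# Assumption: a seeded sample contaminated the same way as the control group.
rng = np.random.default_rng(1)
x = np.concatenate((rng.normal(10, 3, 1000), 10 * rng.random(50) + 20))

x_cut, share_removed = cut_tails(x)
print(f"removed {share_removed:.1%} of observations")
print(f"mean: {x.mean():.2f} -> {x_cut.mean():.2f}")
print(f"std:  {x.std(ddof=1):.2f} -> {x_cut.std(ddof=1):.2f}")
```

Keep in mind that cutting both tails of a roughly normal sample also shrinks the sample standard deviation below that of the underlying distribution, so the trimmed standard deviation should be read with care.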


Another way is to exclude observations outside a specific range. The lower bound equals the 25th percentile minus 1.5 times the interquartile range, and the upper bound equals the 75th percentile plus 1.5 times the interquartile range. Here, we lose only 0.7% of the data. The distributions look better formed than the original ones, and the sample moments are even closer to the distribution moments.

    import numpy as np

    # Band: [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR], where IQR = Q3 - Q1
    q1_1, q3_1 = np.percentile(x1, [25, 75])
    iqr_1 = q3_1 - q1_1
    low_band_1 = q1_1 - 1.5 * iqr_1
    high_band_1 = q3_1 + 1.5 * iqr_1
    x1_cutted = [i for i in x1 if low_band_1 < i < high_band_1]

    q1_2, q3_2 = np.percentile(x2, [25, 75])
    iqr_2 = q3_2 - q1_2
    low_band_2 = q1_2 - 1.5 * iqr_2
    high_band_2 = q3_2 + 1.5 * iqr_2
    x2_cutted = [i for i in x2 if low_band_2 < i < high_band_2]
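The width of the band depends on which spread measure is placed around the quartiles. As a sketch, the snippet below builds the band with the interquartile range and, for comparison, with the standard deviation, which is itself inflated by the outliers. The helper `quartile_band` and the seeded sample are assumptions made for illustration, and the share removed depends on the random draw, so it will not necessarily match the figure quoted above.

```python
import numpy as np

def quartile_band(values: np.ndarray, spread: float, k: float = 1.5):
    """Return (low, high) = (Q1 - k * spread, Q3 + k * spread)."""
    q1, q3 = np.percentile(values, [25, 75])
    return q1 - k * spread, q3 + k * spread

# Assumption: a seeded sample contaminated the same way as the control group.
rng = np.random.default_rng(2)
x = np.concatenate((rng.normal(10, 3, 1000), 10 * rng.random(50) + 20))

q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1      # interquartile range
std = np.std(x)    # standard deviation of the contaminated sample

for name, spread in (("IQR", iqr), ("std", std)):
    low, high = quartile_band(x, spread)
    kept = x[(x > low) & (x < high)]
    removed = 1 - len(kept) / len(x)
    print(f"{name}: spread={spread:.2f}, band=({low:.2f}, {high:.2f}), removed={removed:.1%}")
```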

Bootstrap

The second method we consider here is the bootstrap. In this approach, the mean is computed as the mean of the means of resampled subsamples. In our example, the mean in the control group equals 10.35, and in the test group it equals 11.78. This is still a better result compared with additional data processing.

    import numpy as np
    import pandas as pd

    def create_bootstrap_samples(
        sample_list: np.ndarray,
        sample_size: int,
        n_samples: int,
    ) -> pd.Series:
        # create a list for sample means
        sample_means = []
        # loop n_samples times
        for i in range(n_samples):
            # create a bootstrap sample of sample_size with replacement
            bootstrap_sample = pd.Series(sample_list).sample(n=sample_size, replace=True)
            # calculate the bootstrap sample mean
            sample_mean = bootstrap_sample.mean()
            # add this sample mean to the sample means list
            sample_means.append(sample_mean)
        return pd.Series(sample_means)

    (create_bootstrap_samples(x1, len(x1), 1000).mean(),
     create_bootstrap_samples(x2, len(x2), 1000).mean())
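A vectorized NumPy variant of the same resampling idea is sketched below, together with a percentile interval for the mean. The interval is an addition that does not appear in the original article, and the function name `bootstrap_means`, the seeds, and the 95% level are assumptions.

```python
import numpy as np

def bootstrap_means(values: np.ndarray, n_samples: int = 1000, seed: int = 0) -> np.ndarray:
    """Return the means of n_samples bootstrap resamples of `values`."""
    rng = np.random.default_rng(seed)
    # Each row of idx holds the indices of one resample drawn with replacement.
    idx = rng.integers(0, len(values), size=(n_samples, len(values)))
    return values[idx].mean(axis=1)

# Assumption: a seeded sample contaminated the same way as the control group.
rng = np.random.default_rng(3)
x = np.concatenate((rng.normal(10, 3, 1000), 10 * rng.random(50) + 20))

means = bootstrap_means(x)
ci_low, ci_high = np.percentile(means, [2.5, 97.5])
print(f"bootstrap mean: {means.mean():.2f}, 95% interval: ({ci_low:.2f}, {ci_high:.2f})")
```

Note that the average of the bootstrap means tracks the plain sample mean of the same data. What the resampling adds is the spread of those means, from which the interval is read.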

Conclusion

Detecting and processing outliers is important for making the right decision. Now you have at least three quick and straightforward approaches that can help you check the data before analysis.


However, it is essential to remember that detected outliers may be not just unusual values but also a sign of the novelty effect. But that's another story :)