Analysts often run into outliers in their data, for example when analyzing A/B tests, building predictive models, or tracking trends. Decisions are usually based on the sample mean, which is very sensitive to outliers and can shift noticeably because of them. It is therefore important to handle outliers in order to make the right decision.
Let's consider a few simple and quick approaches to working with unusual values.
Imagine you need to analyze an experiment that uses average order value as the primary metric. Let's say our metric is normally distributed. We also know that the distribution of the metric in the test group differs from the one in the control group: the mean of the control distribution is 10, and the mean of the test distribution is 12. The standard deviation in both groups is 3.
However, both samples contain outliers that distort the sample means and the sample standard deviations.
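import numpy as np

N = 1000
mean_1 = 10
std_1 = 3
mean_2 = 12
std_2 = 3

# control group: normal data plus 50 high outliers drawn uniformly from [20, 30)
x1 = np.concatenate((np.random.normal(mean_1, std_1, N), 10 * np.random.random_sample(50) + 20))
# test group: normal data plus 50 low outliers drawn uniformly from [1, 5)
x2 = np.concatenate((np.random.normal(mean_2, std_2, N), 4 * np.random.random_sample(50) + 1))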
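To get a feel for how much such outliers move the sample statistics, one can compare a clean sample with a contaminated one. This is only a sketch: the fixed seed, the `summarize` helper, and the variable names are assumptions added for illustration, not part of the original setup.

```python
import numpy as np

# The seed is an assumption, used only to make the illustration reproducible.
rng = np.random.default_rng(42)

N = 1000
clean = rng.normal(10, 3, N)            # control-like sample without outliers
outliers = 10 * rng.random(50) + 20     # 50 large values in [20, 30)
contaminated = np.concatenate((clean, outliers))


def summarize(sample):
    """Return the mean, median, and sample standard deviation of a sample."""
    return {
        "mean": float(np.mean(sample)),
        "median": float(np.median(sample)),
        "std": float(np.std(sample, ddof=1)),
    }


clean_stats = summarize(clean)
contaminated_stats = summarize(contaminated)
print("clean:       ", clean_stats)
print("contaminated:", contaminated_stats)
```

The mean and the standard deviation both grow once the outliers are added, while the median moves much less, which is why decisions based on the mean are the ones at risk.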
NOTE: the metric considered here can have outliers on both sides. If your metric can have outliers on one side only, the methods are easily adapted to that case.
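As a sketch of the one-sided case, the percentile cut can be applied to a single tail. The helper name `trim_one_side` and its parameters are hypothetical, chosen only for this illustration.

```python
import numpy as np


def trim_one_side(sample, pct, side="upper"):
    """Drop one tail of a sample at the given percentile.

    side="upper" keeps values strictly below the percentile,
    side="lower" keeps values strictly above it.
    """
    sample = np.asarray(sample, dtype=float)
    threshold = np.percentile(sample, pct)
    if side == "upper":
        return sample[sample < threshold]
    if side == "lower":
        return sample[sample > threshold]
    raise ValueError("side must be 'upper' or 'lower'")


demo = np.arange(100.0)                              # toy data, an assumption
without_high_tail = trim_one_side(demo, 95, "upper")  # cut only large values
without_low_tail = trim_one_side(demo, 5, "lower")    # cut only small values
print(without_high_tail.size, without_low_tail.size)
```

Cutting a single tail discards about half as much data as the two-sided cut for the same percentile level.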
The simplest method is to cut off all observations below the 5th percentile and above the 95th percentile. The drawback is that we lose 10% of the information. However, the distributions look much better shaped, and the sample moments are closer to the moments of the underlying distributions.
import numpy as np

# control group: keep only observations strictly between the 5th and 95th percentiles
x1_5pct = np.percentile(x1, 5)
x1_95pct = np.percentile(x1, 95)
x1_cutted = x1[(x1 > x1_5pct) & (x1 < x1_95pct)]

# test group: same cut
x2_5pct = np.percentile(x2, 5)
x2_95pct = np.percentile(x2, 95)
x2_cutted = x2[(x2 > x2_5pct) & (x2 < x2_95pct)]
Another method is to exclude observations outside a specific range. The lower band equals the 25th percentile minus one and a half times the interquartile range, and the upper band equals the 75th percentile plus one and a half times the interquartile range. Here, we lose only 0.7% of the information. The distributions look much better shaped than the initial ones, and the sample moments are even closer to the moments of the underlying distributions.
import numpy as np

# control group: keep values within 1.5 * IQR of the quartiles
q25_1, q75_1 = np.percentile(x1, [25, 75])
iqr_1 = q75_1 - q25_1  # interquartile range
low_band_1 = q25_1 - 1.5 * iqr_1
high_band_1 = q75_1 + 1.5 * iqr_1
x1_cutted = x1[(x1 > low_band_1) & (x1 < high_band_1)]

# test group: same rule
q25_2, q75_2 = np.percentile(x2, [25, 75])
iqr_2 = q75_2 - q25_2
low_band_2 = q25_2 - 1.5 * iqr_2
high_band_2 = q75_2 + 1.5 * iqr_2
x2_cutted = x2[(x2 > low_band_2) & (x2 < high_band_2)]
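To check how many observations a given pair of bounds discards, and what the remaining moments look like, a small helper along these lines may be useful. The name `filter_report` and the toy input are assumptions for illustration only.

```python
import numpy as np


def filter_report(sample, low, high):
    """Keep values strictly inside (low, high) and report what was lost."""
    sample = np.asarray(sample, dtype=float)
    kept = sample[(sample > low) & (sample < high)]
    return {
        "kept": kept,
        "share_removed": 1 - kept.size / sample.size,
        "mean": float(kept.mean()),
        "std": float(kept.std(ddof=1)),
    }


# Toy data (an assumption): four ordinary values and one obvious outlier.
toy = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
q25, q75 = np.percentile(toy, [25, 75])
iqr = q75 - q25
report = filter_report(toy, q25 - 1.5 * iqr, q75 + 1.5 * iqr)
print(report["share_removed"], report["mean"], report["std"])
```

Running the same report on the real samples shows the trade-off between the share of data removed and how close the remaining moments are to the expected ones.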
The third method we consider here is the bootstrap. In this approach, the mean is built as the mean of the means of resampled subsamples. In our example, the mean in the control group equals 10.35, and in the test group it equals 11.78. It is still a better result when compared with additional data processing.
import numpy as np
import pandas as pd


def create_bootstrap_samples(
    sample_list: np.ndarray,
    sample_size: int,
    n_samples: int
) -> pd.Series:
    # create a list for sample means
    sample_means = []
    # loop n_samples times
    for _ in range(n_samples):
        # create a bootstrap sample of sample_size with replacement
        bootstrap_sample = pd.Series(sample_list).sample(n=sample_size, replace=True)
        # calculate the bootstrap sample mean and store it
        sample_means.append(bootstrap_sample.mean())
    return pd.Series(sample_means)


(
    create_bootstrap_samples(x1, len(x1), 1000).mean(),
    create_bootstrap_samples(x2, len(x2), 1000).mean(),
)
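The same resampling can also be written with NumPy alone, drawing all resamples in one call. This is a sketch under stated assumptions: the helper name `bootstrap_means`, the seeds, and the demo sample are not from the original.

```python
import numpy as np


def bootstrap_means(sample, n_samples=1000, seed=0):
    """Return the means of n_samples resamples drawn with replacement."""
    rng = np.random.default_rng(seed)  # the seed is an assumption
    sample = np.asarray(sample, dtype=float)
    # Each row is one bootstrap resample of the same size as the input.
    resamples = rng.choice(sample, size=(n_samples, sample.size), replace=True)
    return resamples.mean(axis=1)


# Demo sample (an assumption): normal data plus 50 high outliers.
demo_rng = np.random.default_rng(1)
demo = np.concatenate((demo_rng.normal(10, 3, 1000), 10 * demo_rng.random(50) + 20))

means = bootstrap_means(demo)
print(means.mean(), means.std(ddof=1))
```

Note that the average of the bootstrap means stays close to the plain sample mean, so the spread of `means` is the part that adds information: it shows how uncertain the estimated mean is.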
Detecting and handling outliers is important for making the right decision. Now you have at least three quick and straightforward approaches that can help you check the data before analysis.
However, it is essential to remember that detected outliers can be either unusual values or a sign of a new effect. But that's another story :)