Analysts often run into outliers in their work, for example when analyzing A/B tests, building predictive models, or tracking trends. Decisions are usually based on the sample mean, which is highly sensitive to outliers: a few extreme values can shift it substantially. It is therefore important to handle outliers before making a decision.
Let's look at a few simple and fast ways to deal with unusual values.
Imagine you need to analyze an experiment using the mean of a metric as the key measure. Suppose the metric is normally distributed, and we know that its distribution in the test group differs from that in the control group: the control mean is 10, the test mean is 12, and the standard deviation in both groups is 3.
However, both samples contain outliers that distort the sample means and sample standard deviations.
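```python
import numpy as np

N = 1000  # regular observations per group
mean_1, std_1 = 10, 3  # control group
mean_2, std_2 = 12, 3  # test group

# Control: N(10, 3) plus 50 high outliers, uniform on [20, 30)
x1 = np.concatenate((np.random.normal(mean_1, std_1, N),
                     10 * np.random.random_sample(50) + 20))
# Test: N(12, 3) plus 50 low outliers, uniform on [1, 5)
x2 = np.concatenate((np.random.normal(mean_2, std_2, N),
                     4 * np.random.random_sample(50) + 1))
```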
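To see how much the injected outliers distort the sample moments, a reproducible sketch of the same data can keep the clean part of each sample separately and compare. The seeded `np.random.default_rng` generator, the seed value 42, and the names `clean_1` and `clean_2` are assumptions added for illustration; `rng.uniform(20, 30, 50)` draws from the same range as `10 * np.random.random_sample(50) + 20` above.

```python
import numpy as np

# Seeded generator so the comparison is reproducible (seed 42 is arbitrary)
rng = np.random.default_rng(42)
N = 1000

# Clean samples: control ~ N(10, 3), test ~ N(12, 3)
clean_1 = rng.normal(10, 3, N)
clean_2 = rng.normal(12, 3, N)

# Contaminated samples: 50 outliers uniform on [20, 30) for control, [1, 5) for test
x1 = np.concatenate((clean_1, rng.uniform(20, 30, 50)))
x2 = np.concatenate((clean_2, rng.uniform(1, 5, 50)))

# Compare mean and (population) standard deviation before and after contamination
for name, clean, dirty in (("control", clean_1, x1), ("test", clean_2, x2)):
    print(f"{name}: mean {clean.mean():.2f} -> {dirty.mean():.2f}, "
          f"std {clean.std():.2f} -> {dirty.std():.2f}")
```

The high outliers pull the control mean up, the low outliers pull the test mean down, and both inflate the standard deviation, which shrinks the apparent gap between the groups.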
Note that the metric considered here can have outliers on both sides. If your metric can only have outliers on one side, the methods below are easy to adapt to that case.
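As an illustration of that adaptation, here is a minimal one-sided version of percentile trimming. The function names `trim_upper_tail` and `trim_lower_tail` are hypothetical, and the default cut-offs simply mirror the 5%/95% values used below.

```python
import numpy as np

def trim_upper_tail(x, upper_pct=95.0):
    """Keep only observations strictly below the given upper percentile.

    Useful when outliers can only appear on the high side of the metric.
    """
    x = np.asarray(x, dtype=float)
    cutoff = np.percentile(x, upper_pct)
    return x[x < cutoff]

def trim_lower_tail(x, lower_pct=5.0):
    """Mirror image: keep only observations strictly above the lower percentile."""
    x = np.asarray(x, dtype=float)
    cutoff = np.percentile(x, lower_pct)
    return x[x > cutoff]

values = np.arange(1, 101)
print(trim_upper_tail(values).max(), trim_lower_tail(values).min())  # 95.0 6.0
```

Only one tail is cut, so roughly 5% of the data is lost instead of 10%.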
The simplest method is to drop all observations below the 5th percentile and above the 95th percentile. The downside is that we lose 10% of the data. In return, the distributions look much cleaner, and the sample moments are closer to the moments of the underlying distributions.
```python
import numpy as np

# Keep only observations strictly between the 5th and 95th percentiles
x1_5pct, x1_95pct = np.percentile(x1, [5, 95])
x1_cutted = x1[(x1 > x1_5pct) & (x1 < x1_95pct)]

x2_5pct, x2_95pct = np.percentile(x2, [5, 95])
x2_cutted = x2[(x2 > x2_5pct) & (x2 < x2_95pct)]
```
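The same trimming can be wrapped into a reusable function that also reports how much data was dropped, so the 10% figure can be checked directly. The name `trim_by_percentile` and its return convention (kept values plus dropped share) are assumptions for this sketch.

```python
import numpy as np

def trim_by_percentile(x, lower_pct=5.0, upper_pct=95.0):
    """Drop observations outside the (lower_pct, upper_pct) percentile band.

    Returns the kept values and the share of observations that was dropped.
    """
    x = np.asarray(x, dtype=float)
    low, high = np.percentile(x, [lower_pct, upper_pct])
    kept = x[(x > low) & (x < high)]
    dropped_share = 1 - kept.size / x.size
    return kept, dropped_share

kept, dropped = trim_by_percentile(np.arange(1, 101))
print(kept.size, round(dropped, 2))  # 90 0.1
```

Boolean masking keeps the result as a NumPy array and avoids a Python-level loop over the sample.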
Another method is to drop observations outside a specific range. The lower bound equals the 25th percentile minus 1.5 interquartile ranges, and the upper bound equals the 75th percentile plus 1.5 interquartile ranges (Tukey's fences). Far less data is lost than with percentile trimming: mostly the injected outliers plus a few points from the tails. The distributions look even closer to the true ones than with the first method, and the sample moments are very close to the distribution moments.
```python
import numpy as np

# Tukey's fences: [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]
q1_1, q3_1 = np.percentile(x1, [25, 75])
iqr_1 = q3_1 - q1_1
low_band_1 = q1_1 - 1.5 * iqr_1
high_band_1 = q3_1 + 1.5 * iqr_1
x1_cutted = x1[(x1 > low_band_1) & (x1 < high_band_1)]

q1_2, q3_2 = np.percentile(x2, [25, 75])
iqr_2 = q3_2 - q1_2
low_band_2 = q1_2 - 1.5 * iqr_2
high_band_2 = q3_2 + 1.5 * iqr_2
x2_cutted = x2[(x2 > low_band_2) & (x2 < high_band_2)]
```
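The interquartile-range rule can also be written as a function with the multiplier exposed as a parameter `k` (1.5 is Tukey's conventional choice) and the dropped share returned, so the data loss can be measured on your own samples. The name `iqr_filter` is hypothetical.

```python
import numpy as np

def iqr_filter(x, k=1.5):
    """Keep observations strictly inside (Q1 - k*IQR, Q3 + k*IQR).

    Returns the kept values and the share of observations that was dropped.
    """
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = x[(x > low) & (x < high)]
    return kept, 1 - kept.size / x.size

data = np.append(np.arange(1, 101), 1000)  # 1..100 plus one extreme value
kept, dropped = iqr_filter(data)
print(kept.size, kept.max())  # 100 100.0
```

Because the fences come from quartiles rather than from the mean and standard deviation, the extreme value itself barely moves them.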
The third method we consider is the bootstrap. Here the mean is estimated as the average of the means of many resamples drawn with replacement. In our example, the control group mean comes out at 10.35 and the test group mean at 11.78. That is still a reasonable result, and it is obtained without any extra data cleaning.
```python
import numpy as np
import pandas as pd

def create_bootstrap_samples(sample_list: np.ndarray, sample_size: int, n_samples: int) -> pd.Series:
    series = pd.Series(sample_list)
    # create a list for sample means
    sample_means = []
    # loop n_samples times
    for _ in range(n_samples):
        # create a bootstrap sample of sample_size with replacement
        bootstrap_sample = series.sample(n=sample_size, replace=True)
        # calculate the bootstrap sample mean and add it to the list
        sample_means.append(bootstrap_sample.mean())
    return pd.Series(sample_means)

# Average of 1000 bootstrap means for the control and test groups
print(create_bootstrap_samples(x1, len(x1), 1000).mean(),
      create_bootstrap_samples(x2, len(x2), 1000).mean())
```
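The loop above can be vectorised with NumPy by drawing all resamples in one call. This is a sketch, not the author's implementation; the name `bootstrap_means` and the `seed` parameter are assumptions. With resamples of the full sample size, the average of the bootstrap means stays close to the ordinary sample mean, while the spread of the returned means describes the uncertainty of that estimate.

```python
import numpy as np

def bootstrap_means(sample, n_samples=1000, seed=None):
    """Means of n_samples bootstrap resamples, each the size of the original sample."""
    sample = np.asarray(sample, dtype=float)
    rng = np.random.default_rng(seed)
    # Draw every resample at once: shape (n_samples, len(sample)), with replacement
    resamples = rng.choice(sample, size=(n_samples, sample.size), replace=True)
    # One mean per resample (per row)
    return resamples.mean(axis=1)

means = bootstrap_means(np.arange(1, 101), n_samples=1000, seed=0)
print(means.mean(), means.std())  # close to 50.5 and to about 2.9
```

Memory grows as `n_samples * len(sample)`, which is fine for samples of a few thousand observations.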
Detecting and handling outliers is essential for making the right decision. You now have at least three quick and simple methods to help you check your data before the analysis.
Keep in mind, though, that outliers sometimes turn out to be genuine extreme values that reveal a new effect. But that's another story :)