Analysts often run into outliers in their data, for example when analyzing A/B tests, building predictive models, or tracking trends. Decisions are usually based on the sample mean, which is very sensitive to outliers: a handful of extreme values can change it dramatically. Handling outliers is therefore important for making the right decision.

Let's consider several simple and quick approaches to working with unusual values.

Problem Statement

Imagine that you need to analyze an experiment that uses average order value as its primary metric. Say our metric normally follows a normal distribution. We also know that the metric's distribution in the test group differs from the one in the control group: the mean is 10 in the control group and 12 in the test group. The standard deviation in both groups is 3.

However, both samples contain outliers that distort the sample means and sample standard deviations.

```python
import numpy as np

N = 1000
mean_1 = 10
std_1 = 3
mean_2 = 12
std_2 = 3

# Control group: N normal observations plus 50 high outliers drawn uniformly from [20, 30)
x1 = np.concatenate((np.random.normal(mean_1, std_1, N), 10 * np.random.random_sample(50) + 20))
# Test group: N normal observations plus 50 low outliers drawn uniformly from [1, 5)
x2 = np.concatenate((np.random.normal(mean_2, std_2, N), 4 * np.random.random_sample(50) + 1))
```

NB: the metric considered here can have outliers on both sides. If your metric can have outliers on only one side, the methods below are easy to adapt to that case.

Cutting Tails

The easiest method is to cut all observations below the 5th percentile and above the 95th percentile. The drawback is that we lose 10% of the information. However, the distributions look better shaped, and the sample moments are closer to the moments of the underlying distributions.

```python
import numpy as np

# Keep only observations strictly between the 5th and 95th percentiles
x1_5pct, x1_95pct = np.percentile(x1, [5, 95])
x1_cutted = x1[(x1 > x1_5pct) & (x1 < x1_95pct)]

x2_5pct, x2_95pct = np.percentile(x2, [5, 95])
x2_cutted = x2[(x2 > x2_5pct) & (x2 < x2_95pct)]
```

Another way is to exclude observations that fall outside a specific range built from the interquartile range (IQR).
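The interquartile range (IQR) is the distance between the 25th and 75th percentiles. The minimal sketch below computes it with NumPy on a small hypothetical array, not the article's simulated data. It then applies the 1.5 × IQR multiplier used in this section to flag values outside the resulting bounds.

```python
import numpy as np

# Hypothetical toy sample with one obvious outlier (illustration only)
sample = np.array([1, 2, 3, 4, 5, 6, 7, 8, 100])

# Quartiles via NumPy's default linear interpolation
q25, q75 = np.percentile(sample, [25, 75])
iqr = q75 - q25  # interquartile range

# Bounds at 1.5 IQRs beyond the quartiles
low = q25 - 1.5 * iqr
high = q75 + 1.5 * iqr

# Values outside [low, high] are treated as outliers
outliers = sample[(sample < low) | (sample > high)]
print(q25, q75, iqr, outliers)
```

Here the quartiles are 3 and 7, so the IQR is 4 and the bounds are -3 and 13. Only the value 100 falls outside them.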
The lower bound equals the 25th percentile minus one and a half interquartile ranges, and the upper bound equals the 75th percentile plus one and a half interquartile ranges. With this rule we lose only 0.7% of the information. The distributions look better shaped than the initial ones, and the sample moments are even closer to the moments of the underlying distributions.

```python
import numpy as np

# Bounds at 1.5 interquartile ranges (IQR = 75th minus 25th percentile) beyond the quartiles
q25_1, q75_1 = np.percentile(x1, [25, 75])
iqr_1 = q75_1 - q25_1
low_band_1 = q25_1 - 1.5 * iqr_1
high_band_1 = q75_1 + 1.5 * iqr_1
x1_cutted = x1[(x1 > low_band_1) & (x1 < high_band_1)]

q25_2, q75_2 = np.percentile(x2, [25, 75])
iqr_2 = q75_2 - q25_2
low_band_2 = q25_2 - 1.5 * iqr_2
high_band_2 = q75_2 + 1.5 * iqr_2
x2_cutted = x2[(x2 > low_band_2) & (x2 < high_band_2)]
```

Bootstrap

The second method we consider is the bootstrap. In this approach, the mean is computed as the mean of the means of many resampled sub-samples. In our example, the mean of the control group equals 10.35 and the mean of the test group equals 11.78. This is still a better result compared with additional data processing.

```python
import numpy as np
import pandas as pd

def create_bootstrap_samples(
    sample_list: np.ndarray,
    sample_size: int,
    n_samples: int
) -> pd.Series:
    series = pd.Series(sample_list)
    # create a list for sample means
    sample_means = []
    # loop n_samples times
    for _ in range(n_samples):
        # create a bootstrap sample of sample_size with replacement
        bootstrap_sample = series.sample(n=sample_size, replace=True)
        # calculate the bootstrap sample mean and add it to the list
        sample_means.append(bootstrap_sample.mean())
    return pd.Series(sample_means)

# Mean of 1000 bootstrap sample means for the control (x1) and test (x2) groups
(create_bootstrap_samples(x1, len(x1), 1000).mean(),
 create_bootstrap_samples(x2, len(x2), 1000).mean())
```

Conclusion

Outlier detection and processing are important for making the right decision. You now have at least three quick and simple approaches that can help you check your data before analysis.

However, it is essential to remember that detected outliers can be genuine unusual values and a feature of the novelty effect. But that is another story :)
Outlier Detection: What You Need to Know