Outlier Detection: What You Need To Know by @nataliaogneva
54,579 reads


by Natalia Ogneva, 4 min read, 2024/04/23

Too Long; Didn't Read

Analysts often encounter outliers in their data. Decisions are usually based on the sample mean, which is very sensitive to outliers, so handling outliers properly is essential for making correct decisions. Let's look at several simple and fast approaches for working with unusual values.


Analysts often encounter outliers in their data, for example when analyzing an A/B test, building predictive models, or tracking trends. Decisions are usually based on the sample mean, which is very sensitive to outliers: a few extreme values can shift it dramatically. That is why handling outliers properly is essential for making the right decision.
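To make the sensitivity of the mean concrete, here is a tiny sketch with made-up order values (the numbers are purely illustrative, not data from this article): a single extreme order almost doubles the average.

```python
import numpy as np

# Hypothetical order values: nine typical orders and one extreme order of 100.
orders = np.array([10.0, 11.0, 9.0, 10.0, 12.0, 8.0, 10.0, 11.0, 9.0, 100.0])

mean_with = orders.mean()          # includes the extreme order
mean_without = orders[:-1].mean()  # same data without the last (extreme) value

print(f"mean with outlier = {mean_with:.1f}, without = {mean_without:.1f}")
```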


Let's consider several simple and fast approaches for working with unusual values.

Problem Formulation

Imagine you need to analyze an experiment whose primary metric is an average value, such as average order value. Let's say the metric normally follows a normal distribution. We also know that the metric's distribution in the test group differs from the one in the control group: the mean is 10 in control and 12 in test, and the standard deviation is 3 in both groups.


However, both samples contain outliers, which skew the sample means and sample standard deviations.

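```python
import numpy as np

N = 1000  # regular observations per group

# Control group: N(10, 3) plus 50 high outliers drawn uniformly from [20, 30)
mean_1 = 10
std_1 = 3
# Test group: N(12, 3) plus 50 low outliers drawn uniformly from [1, 5)
mean_2 = 12
std_2 = 3

x1 = np.concatenate((np.random.normal(mean_1, std_1, N),
                     10 * np.random.random_sample(50) + 20))
x2 = np.concatenate((np.random.normal(mean_2, std_2, N),
                     4 * np.random.random_sample(50) + 1))
```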
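To see the skew directly, the sketch below rebuilds the same two samples with a seeded `numpy.random.Generator` (the seed 42 is an arbitrary assumption, added only for reproducibility) and compares the sample moments with the true parameters. The high outliers pull the control mean above 10, the low outliers pull the test mean below 12, and both sample standard deviations exceed 3.

```python
import numpy as np

# Seeded generator so the run is reproducible (the seed is an arbitrary choice).
rng = np.random.default_rng(42)

N = 1000
# Same construction as above: N normal points plus 50 outliers per group.
x1 = np.concatenate((rng.normal(10, 3, N), 10 * rng.random(50) + 20))  # outliers in [20, 30)
x2 = np.concatenate((rng.normal(12, 3, N), 4 * rng.random(50) + 1))    # outliers in [1, 5)

for name, x, true_mean in (("control", x1, 10), ("test", x2, 12)):
    # Compare the sample moments with the true parameters (true std is 3).
    print(f"{name}: sample mean = {x.mean():.2f} (true {true_mean}), "
          f"sample std = {x.std():.2f} (true 3)")
```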

NB: here we assume the metric can have outliers on both sides. If your metric can have outliers on only one side, the methods below are easy to adapt to that case.
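For instance, if a metric such as order value can only produce unusually high values, a percentile cut can be applied to the upper side alone. A minimal sketch, assuming the 95th percentile as the threshold; `cut_upper_tail` is a hypothetical helper name, not part of any library.

```python
import numpy as np

def cut_upper_tail(x, upper_pct=95.0):
    """Drop only the high tail: keep observations strictly below the upper_pct percentile."""
    x = np.asarray(x, dtype=float)
    upper = np.percentile(x, upper_pct)
    return x[x < upper]

values = np.arange(1, 101)     # 1, 2, ..., 100
kept = cut_upper_tail(values)  # the low end (1, 2, ...) is left untouched
print(kept.min(), kept.max(), kept.size)
```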

Cut Off Tails

The easiest method is to cut off all observations below the 5th percentile and above the 95th percentile. The downside is that we lose 10% of the data. However, the distributions look closer to the expected normal shape, and the sample moments are closer to the distribution moments.

```python
import numpy as np

# Keep only observations strictly between the 5th and 95th percentiles.
x1_5pct, x1_95pct = np.percentile(x1, [5, 95])
x1_cutted = x1[(x1 > x1_5pct) & (x1 < x1_95pct)]

x2_5pct, x2_95pct = np.percentile(x2, [5, 95])
x2_cutted = x2[(x2 > x2_5pct) & (x2 < x2_95pct)]
```
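Wrapping the rule in a function that also reports the share of removed observations makes the cost of trimming explicit. A minimal sketch; `cut_tails` and its parameters are hypothetical names chosen for illustration. On evenly spread data, a 5%/95% cut removes exactly 10% of the observations.

```python
import numpy as np

def cut_tails(x, lower_pct=5.0, upper_pct=95.0):
    """Drop observations outside the (lower_pct, upper_pct) percentile range.

    Returns the kept observations and the share of observations removed.
    """
    x = np.asarray(x, dtype=float)
    low, high = np.percentile(x, [lower_pct, upper_pct])
    kept = x[(x > low) & (x < high)]
    return kept, 1 - kept.size / x.size

kept, share_removed = cut_tails(np.arange(1, 101))
print(kept.size, round(share_removed, 3))
```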


Another way is to exclude observations outside a specific range. The lower band equals the 25th percentile minus 1.5 standard deviations, and the upper band equals the 75th percentile plus 1.5 standard deviations. Here we lose only 0.7% of the data. The distributions again look closer to the expected shape than the raw data, and the sample moments are even closer to the distribution moments.

```python
import numpy as np

# Band: [25th percentile - 1.5 * std, 75th percentile + 1.5 * std]
low_band_1 = np.percentile(x1, 25) - 1.5 * np.std(x1)
high_band_1 = np.percentile(x1, 75) + 1.5 * np.std(x1)
x1_cutted = x1[(x1 > low_band_1) & (x1 < high_band_1)]

low_band_2 = np.percentile(x2, 25) - 1.5 * np.std(x2)
high_band_2 = np.percentile(x2, 75) + 1.5 * np.std(x2)
x2_cutted = x2[(x2 > low_band_2) & (x2 < high_band_2)]
```
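The band above is built from the standard deviation, which is itself inflated by the outliers. A widely used variant, Tukey's fences, builds the band from the interquartile range instead: [Q1 − k·IQR, Q3 + k·IQR] with k = 1.5. A minimal sketch of that variant; `iqr_fences`, `cut_outside_fences`, and `k` are hypothetical names for illustration.

```python
import numpy as np

def iqr_fences(x, k=1.5):
    """Tukey-style fences: (Q1 - k * IQR, Q3 + k * IQR)."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def cut_outside_fences(x, k=1.5):
    """Keep only observations strictly inside the IQR fences."""
    x = np.asarray(x, dtype=float)
    low, high = iqr_fences(x, k)
    return x[(x > low) & (x < high)]

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=float)
low, high = iqr_fences(data)      # Q1 = 3.25, Q3 = 7.75, IQR = 4.5
kept = cut_outside_fences(data)   # the value 100 falls outside the upper fence
print(low, high, kept)
```

Because quartiles barely move when a few extreme values are added, these fences are less affected by the outliers they are meant to catch than a band based on the standard deviation.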

Bootstrap

The second approach we consider here is the bootstrap. In this approach, the mean is estimated as the average of the means of many subsamples drawn with replacement. In our example, the mean of the control group comes out at 10.35 and the mean of the test group at 11.78. This is a good result compared with the additional data processing described above.

```python
import numpy as np
import pandas as pd

def create_bootstrap_samples(sample_list: np.ndarray, sample_size: int,
                             n_samples: int) -> pd.Series:
    series = pd.Series(sample_list)
    # collect the mean of every bootstrap sample
    sample_means = []
    for _ in range(n_samples):
        # draw a bootstrap sample of sample_size observations with replacement
        bootstrap_sample = series.sample(n=sample_size, replace=True)
        # store this bootstrap sample's mean
        sample_means.append(bootstrap_sample.mean())
    return pd.Series(sample_means)

# Average of the bootstrap means for the control and test groups
(create_bootstrap_samples(x1, len(x1), 1000).mean(),
 create_bootstrap_samples(x2, len(x2), 1000).mean())
```
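Because the resamples are drawn from the same data, the average of the bootstrap means stays close to the plain sample mean; the extra information the bootstrap provides is in the spread of those means. A minimal NumPy sketch (an alternative to the pandas loop, not the author's code) that also returns a percentile confidence interval; `bootstrap_means`, `bootstrap_ci`, `alpha`, and the default seed are assumptions for illustration.

```python
import numpy as np

def bootstrap_means(x, n_samples=1000, seed=0):
    """Means of n_samples resamples drawn with replacement, each the size of x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # Each row of idx indexes one bootstrap resample of the original data.
    idx = rng.integers(0, x.size, size=(n_samples, x.size))
    return x[idx].mean(axis=1)

def bootstrap_ci(x, alpha=0.05, n_samples=1000, seed=0):
    """Average bootstrap mean and a percentile (1 - alpha) confidence interval."""
    means = bootstrap_means(x, n_samples, seed)
    low, high = np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return means.mean(), low, high

# Usage on a synthetic normal sample (illustrative data, not the article's).
x = np.random.default_rng(1).normal(10, 3, 1000)
boot_mean, low, high = bootstrap_ci(x)
print(f"bootstrap mean = {boot_mean:.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

Comparing the intervals of the control and test groups, rather than only the two point estimates, shows whether the observed difference could plausibly be noise.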

Conclusion

Detecting and handling outliers is essential for making the right decision. The three fast and straightforward approaches above can help you check your data before the analysis.


Keep in mind, though, that detected outliers can be either genuinely unusual values or a manifestation of the novelty effect. But that's another story :)