Any experiment involves a trade-off between fast results and metric sensitivity. If the chosen metric has a high variance, we must wait a long time to make sure the experiment's results are accurate. Let's consider one method that helps analysts speed up their experiments without losing too much time or metric sensitivity.

Problem Formulation

Suppose we run a standard experiment to test a new ranking algorithm, with session length as the primary metric. Additionally, consider that our audience can be roughly split into three groups: 1 million teenagers, 2 million users aged 18-45, and 3 million users aged 45 and above. The response to a new ranking algorithm would vary significantly among these audience groups. This wide variation reduces the sensitivity of the metric.

In other words, the population can be divided into three strata, with weights 1/6, 2/6, and 3/6 given by the group sizes above.

Let's say that the metric within each stratum has a normal distribution. Then the main metric for the whole population is a mixture of these normal distributions.

Stratification Method

In a classical experiment design, we randomly divide all users from the population without considering the differences between them. The resulting sample mean has the population mean as its expected value, and its variance is the population variance divided by the sample size.

Another way is to randomly divide users inside every stratum according to the weight of that stratum in the general population.

In this case, the expected value is the same as in the first selection. However, the variance is no larger, and it is strictly smaller when the strata means differ, because only the within-stratum variance remains. This gives higher metric sensitivity.

Now, let's consider Neyman's method. It suggests dividing users randomly inside every stratum with specific weights: each stratum's share of the sample is proportional to its weight in the population multiplied by its standard deviation.

In this case, the expected value equals the expected value in the first case asymptotically. However, the variance can be much lower, especially when the strata have different standard deviations.

Empirical Testing

We have shown the efficiency of this method theoretically. Let's simulate samples and test the stratification method empirically.
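For reference, the three sampling schemes from the Stratification Method section can be summarized with the standard textbook expressions below. The notation is an assumption made for this summary and is not taken from the original article: K strata with weights w_k, means mu_k, and standard deviations sigma_k, a total sample size n, and stratum sample means \bar{Y}_k. The formulas ignore rounding of stratum sizes and the finite population correction.

```latex
\begin{align*}
\mu &= \sum_{k=1}^{K} w_k \mu_k,
\qquad
\sigma^2 = \sum_{k=1}^{K} w_k \sigma_k^2 + \sum_{k=1}^{K} w_k (\mu_k - \mu)^2 \\[4pt]
\text{Random sampling:}\quad
& \mathbb{E}\big[\bar{Y}_{\text{rand}}\big] = \mu,
\qquad
\operatorname{Var}\big[\bar{Y}_{\text{rand}}\big] = \frac{\sigma^2}{n} \\[4pt]
\text{Proportional } (n_k = n\,w_k):\quad
& \hat{\mu}_{\text{prop}} = \sum_{k=1}^{K} w_k \bar{Y}_k,
\qquad
\operatorname{Var}\big[\hat{\mu}_{\text{prop}}\big] = \frac{1}{n} \sum_{k=1}^{K} w_k \sigma_k^2 \\[4pt]
\text{Neyman } \Big(n_k = n\,\tfrac{w_k \sigma_k}{\sum_{j} w_j \sigma_j}\Big):\quad
& \hat{\mu}_{\text{opt}} = \sum_{k=1}^{K} w_k \bar{Y}_k,
\qquad
\operatorname{Var}\big[\hat{\mu}_{\text{opt}}\big] = \frac{1}{n} \Big( \sum_{k=1}^{K} w_k \sigma_k \Big)^{2}
\end{align*}
```

Since (sum_k w_k sigma_k)^2 <= sum_k w_k sigma_k^2 <= sigma^2, the three variances are ordered as the text describes: proportional stratification removes the between-strata term, and the Neyman allocation helps further only when the sigma_k differ.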
Let's consider three cases: all strata with equal means and variances, all strata with different means and equal variances, and all strata with equal means and different variances. We will apply all three methods in all cases and plot a histogram and a box plot to compare them.

Code Preparation

First, let's create a class in Python that simulates our general population consisting of three strata.

    from typing import List

    import numpy as np
    import scipy.stats as st


    class GeneralPopulation:

        def __init__(self,
                     means: List[float],
                     stds: List[float],
                     sizes: List[int],
                     random_state: int = 15):
            """
            Initializes our General Population and saves the given distributions

            :param means: List of expectations for normal distributions
            :param stds: List of standard deviations for normal distributions
            :param sizes: How many objects will be in each stratum
            :param random_state: Parameter fixing randomness. Needed so that when
                conducting the experiment repeatedly with the same input parameters,
                the results remain the same
            """
            # Store and apply the seed before any sampling happens; scipy's rvs()
            # and np.random.choice both draw from NumPy's global generator
            self.random_state = random_state
            np.random.seed(random_state)
            self.strats = [st.norm(mean, std) for mean, std in zip(means, stds)]
            self._sample(sizes)

        def _sample(self, sizes):
            """Creates a general population sample as a mixture of strata

            :param sizes: List with sample sizes of the corresponding normal distributions
            """
            self.strats_samples = [rv.rvs(size) for rv, size in zip(self.strats, sizes)]
            self.general_samples = np.hstack(self.strats_samples)
            self.N = self.general_samples.shape[0]

            # number of strata
            self.count_strats = len(sizes)

            # ratios for every stratum in GP
            self.ws = [size / self.N for size in sizes]

            # ME and Std for GP
            self.m = np.mean(self.general_samples)
            self.sigma = np.std(self.general_samples)

            # ME and std for all strata
            self.ms = [np.mean(strat_sample) for strat_sample in self.strats_samples]
            self.sigmas = [np.std(strat_sample) for strat_sample in self.strats_samples]

Then, let's add functions for the three sampling methods described in the theoretical part.
        # Methods of the GeneralPopulation class

        def random_subsampling(self, size):
            """Creates a random subset of the entire population

            :param size: subsample size
            """
            rc = np.random.choice(self.general_samples, size=size)
            return rc

        def proportional_subsampling(self, size):
            """Creates a subsample in which every stratum gets a number of elements
            proportional to its share of the population

            :param size: subsample size
            """
            self.strats_size_proport = [int(np.floor(size * w)) for w in self.ws]

            rc = []
            for k in range(len(self.strats_size_proport)):
                rc.append(np.random.choice(self.strats_samples[k],
                                           size=self.strats_size_proport[k]))
            return rc

        def optimal_subsampling(self, size):
            """Creates a subsample with the optimal (Neyman) number of elements per stratum

            :param size: subsample size
            """
            # Normalizing constant: sum of w_k * sigma_k over all strata
            sum_denom = 0
            for k in range(self.count_strats):
                sum_denom += self.ws[k] * self.sigmas[k]

            self.strats_size_optimal = [int(np.floor((size * w * sigma) / sum_denom))
                                        for w, sigma in zip(self.ws, self.sigmas)]
            if 0 in self.strats_size_optimal:
                raise ValueError('Strats size is 0, please change variance of smallest strat!')

            rc = []
            for k in range(len(self.strats_size_optimal)):
                rc.append(np.random.choice(self.strats_samples[k],
                                           size=self.strats_size_optimal[k]))
            return rc

Also, for the empirical part, we still need a function for simulating the experiment process.
        def run_experiments(self, n_sub, subsampling_method, n_experiments=1000):
            """Conducts a series of experiments and returns the results

            :param n_sub: size of sample
            :param subsampling_method: name of the method for creating a subsample
            :param n_experiments: number of experiment runs
            :return: list with the estimated mean from every experiment
            """
            methods = ('random_subsampling', 'proportional_subsampling', 'optimal_subsampling')
            if subsampling_method not in methods:
                raise ValueError(f'Unknown subsampling method: {subsampling_method}')

            # For very small populations, fall back to a small subsample
            if len(self.general_samples) < 100:
                n_sub = 20

            means_s = []
            for _ in range(n_experiments):
                if subsampling_method == 'random_subsampling':
                    rc = self.random_subsampling(n_sub)
                    means_s.append(np.mean(rc))
                    continue

                if subsampling_method == 'proportional_subsampling':
                    rc = self.proportional_subsampling(n_sub)
                else:
                    rc = self.optimal_subsampling(n_sub)

                strats_mean = [np.mean(strat_sample) for strat_sample in rc]

                # Mean for a mixture: weight every stratum mean by its population share
                means_s.append(sum(w_k * mean_k for w_k, mean_k in zip(self.ws, strats_mean)))

            return means_s

Simulation Results

If we look at a general population where all our strata have the same means and variances, the results of all three methods are expected to be more or less equal.

Different means with equal variances give more interesting results. Using stratification dramatically reduces the variance.

In the case with equal means and different variances, we see a variance reduction with Neyman's method.

Conclusion

Now you can apply the stratification method to reduce the metric variance and speed up the experiment, provided you cluster your audience and randomly divide users inside each cluster with specific weights!
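The three simulation cases can also be checked with a short, self-contained sketch. It is an illustration under stated assumptions rather than the article's own experiment: the function name `compare_sampling_variance`, the means and standard deviations, the sample size, and the seed are all hypothetical, and values are drawn directly from normal distributions instead of from a finite simulated population. Only the strata weights (1/6, 2/6, 3/6) follow the audience example.

```python
import numpy as np


def compare_sampling_variance(means, stds, weights, n=600, n_experiments=2000, seed=0):
    """Return the empirical variance of the estimated mean for each sampling scheme."""
    rng = np.random.default_rng(seed)
    means, stds, weights = map(np.asarray, (means, stds, weights))

    # Stratum sample sizes: proportional to w_k, and Neyman-style
    # proportional to w_k * sigma_k (both rounded down, at least 1).
    n_prop = np.maximum(1, np.floor(n * weights).astype(int))
    n_opt = np.maximum(
        1, np.floor(n * weights * stds / np.sum(weights * stds)).astype(int)
    )

    def stratified_mean(sizes):
        # Weighted sum of per-stratum sample means.
        return sum(
            w * rng.normal(m, s, k).mean()
            for w, m, s, k in zip(weights, means, stds, sizes)
        )

    def random_mean():
        # Simple random sample: each unit's stratum is drawn according to the weights.
        strata = rng.choice(len(weights), size=n, p=weights)
        return rng.normal(means[strata], stds[strata]).mean()

    estimates = {
        "random": [random_mean() for _ in range(n_experiments)],
        "proportional": [stratified_mean(n_prop) for _ in range(n_experiments)],
        "optimal": [stratified_mean(n_opt) for _ in range(n_experiments)],
    }
    return {name: float(np.var(values)) for name, values in estimates.items()}


# Hypothetical parameters for the three cases discussed in the text.
weights = [1 / 6, 2 / 6, 3 / 6]
cases = {
    "equal means, equal variances": ([15, 15, 15], [2, 2, 2]),
    "different means, equal variances": ([15, 20, 30], [2, 2, 2]),
    "equal means, different variances": ([15, 15, 15], [1, 2, 8]),
}
results = {
    name: compare_sampling_variance(m, s, weights) for name, (m, s) in cases.items()
}
for name, variances in results.items():
    print(name, variances)
```

With parameters like these, one would expect the three variances to be close in the first case, both stratified variants to be far below random sampling in the second, and the Neyman-style allocation to be the lowest in the third.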


Using the Stratification Method for Experiment Analysis
