
Using the Stratification Method for Experiment Analysis

by Natalia Ogneva (@nataliaogneva), 8 min read, 2024/04/19

Too Long; Didn't Read

Stratified sampling is a powerful technique for boosting experiment efficiency and metric sensitivity in data analysis. By clustering your audience and splitting it with specific weights, you can optimize experiments, reduce variance, and improve the reliability of results.



Any experiment involves a trade-off between fast results and metric sensitivity. If the chosen metric has a high variance, we must wait a long time to be sure the experiment's results are accurate. Let's consider one method that helps analysts speed up their experiments without losing too much time or metric sensitivity.


Problem formulation

Suppose we run a standard experiment to test a new ranking algorithm, with session length as the primary metric. Suppose also that our audience can be roughly split into three groups: 1 million teenagers, 2 million users aged 18-45, and 3 million users aged 45 and above. The response to a new ranking algorithm would vary significantly among these groups, and this wide variation reduces the sensitivity of the metric.


In other words, the population can be divided into three strata whose weights are proportional to their sizes: 1/6, 2/6, and 3/6 of the 6 million users.

Assume that the metric in each stratum is normally distributed. The metric for the whole population is then a mixture of these three normal distributions, weighted by the stratum shares.
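The setup above can be written down explicitly. The notation is an assumption made here for illustration: $w_k$, $\mu_k$, and $\sigma_k^2$ stand for the share, mean, and variance of stratum $k$, and $\varphi$ is the normal density.

```latex
f(y) = \sum_{k=1}^{3} w_k \, \varphi\!\left(y;\, \mu_k, \sigma_k^2\right),
\qquad w = \left(\tfrac{1}{6}, \tfrac{2}{6}, \tfrac{3}{6}\right)

\mu = \sum_{k=1}^{3} w_k \mu_k,
\qquad
\sigma^2 = \sum_{k=1}^{3} w_k \sigma_k^2 + \sum_{k=1}^{3} w_k \left(\mu_k - \mu\right)^2
```

The second term of $\sigma^2$ is the spread between the stratum means. It is the part of the population variance that comes from the differences among the audience groups.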

Stratification method

In a classical experiment design, we randomly split all users from the population without considering the differences between them. The expected value of the sample mean is then the population mean, and its variance is the population variance divided by the sample size.
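For reference, the classical design can be sketched in formulas. The symbols are assumed here, not taken from the article: $n$ is the sample size, $\mu$ and $\sigma^2$ are the mean and variance of the whole population, and the population is treated as large enough to ignore finite-population corrections.

```latex
\bar{Y}_{\text{rand}} = \frac{1}{n} \sum_{i=1}^{n} Y_i,
\qquad
\mathbb{E}\left[\bar{Y}_{\text{rand}}\right] = \mu,
\qquad
\operatorname{Var}\left(\bar{Y}_{\text{rand}}\right) = \frac{\sigma^2}{n}
```

Because $\sigma^2$ covers the whole population, it includes both the variation inside each audience group and the variation between the groups.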


Another approach is to split users randomly inside every stratum, in proportion to the stratum's weight in the general population.

In this case, the estimate is the weighted mean of the stratum means, and its variance depends only on the variances within the strata.

The expected value is the same as with the first approach. However, the variance is smaller whenever the stratum means differ, which gives higher metric sensitivity.
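A sketch of why the variance drops, in notation assumed here: $n$ is the total sample size, $n_k = n w_k$ is the number of users taken from stratum $k$, $\bar{Y}_k$ is the sample mean inside that stratum, and $w_k$, $\mu_k$, $\sigma_k^2$ are its share, mean, and variance. The population is treated as large, so finite-population corrections are ignored.

```latex
\bar{Y}_{\text{strat}} = \sum_{k} w_k \bar{Y}_k,
\qquad
\mathbb{E}\left[\bar{Y}_{\text{strat}}\right] = \sum_{k} w_k \mu_k = \mu

\operatorname{Var}\left(\bar{Y}_{\text{strat}}\right)
= \sum_{k} w_k^2 \, \frac{\sigma_k^2}{n_k}
= \frac{1}{n} \sum_{k} w_k \sigma_k^2

\frac{\sigma^2}{n} - \operatorname{Var}\left(\bar{Y}_{\text{strat}}\right)
= \frac{1}{n} \sum_{k} w_k \left(\mu_k - \mu\right)^2 \;\ge\; 0
```

The last line compares against simple random sampling of the same size: the gain equals the between-strata spread, so it vanishes when all stratum means coincide.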

Now, let's consider Neyman's method. It also splits users randomly inside every stratum, but with specific weights: each stratum's share of the sample is proportional to its population weight multiplied by its standard deviation.

The expected value is asymptotically equal to the expected value in the first case. However, the variance is lower still, most noticeably when the strata have different variances.
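As a minimal sketch of Neyman's allocation, the helper below splits a total sample size across strata in proportion to weight times standard deviation. The function name `neyman_allocation` and the example standard deviations are hypothetical, chosen only for illustration.

```python
import math


def neyman_allocation(n, weights, sigmas):
    """Split a total sample of n users across strata.

    Stratum k receives n * w_k * sigma_k / sum_j(w_j * sigma_j) users,
    rounded down to a whole number.
    """
    if len(weights) != len(sigmas):
        raise ValueError("weights and sigmas must have the same length")
    denom = sum(w * s for w, s in zip(weights, sigmas))
    if denom <= 0:
        raise ValueError("at least one stratum needs a positive weight and sigma")
    # The small epsilon guards against float error when a share is a whole number.
    return [math.floor(n * w * s / denom + 1e-9) for w, s in zip(weights, sigmas)]


# Hypothetical example: the 1M / 2M / 3M audience with assumed standard deviations.
sizes = neyman_allocation(600, [1 / 6, 2 / 6, 3 / 6], [1.0, 2.0, 3.0])
print(sizes)  # [42, 171, 385]
```

When all standard deviations are equal, this allocation reduces to the proportional split, which is why the two stratified methods only differ for strata with unequal variances.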

Empirical testing

We have shown the efficiency of this method theoretically. Now let's simulate samples and test the stratification method empirically.

Let's consider three cases:

  • all strata have equal means and equal variances;
  • the strata have different means and equal variances;
  • the strata have equal means and different variances.

We will apply all three methods to every case and plot a histogram and a boxplot to compare them.
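Before simulating, it can help to see what the standard large-population formulas predict for the three cases. The sketch below is an illustration under stated assumptions: the function `theoretical_variances`, the sample size, and the means and standard deviations in `CASES` are all hypothetical and are not the article's parameters.

```python
def theoretical_variances(weights, mus, sigmas, n):
    """Variance of the estimated mean under the three sampling schemes."""
    mu = sum(w * m for w, m in zip(weights, mus))
    # Average variance inside the strata.
    within = sum(w * s ** 2 for w, s in zip(weights, sigmas))
    # Spread of the stratum means around the population mean.
    between = sum(w * (m - mu) ** 2 for w, m in zip(weights, mus))
    # Weighted average standard deviation, used by Neyman allocation.
    mean_sigma = sum(w * s for w, s in zip(weights, sigmas))
    return {
        "random": (within + between) / n,
        "proportional": within / n,
        "neyman": mean_sigma ** 2 / n,
    }


# Hypothetical parameters for the three cases.
WEIGHTS = [1 / 6, 2 / 6, 3 / 6]
CASES = {
    "equal means, equal variances": ([10, 10, 10], [2, 2, 2]),
    "different means, equal variances": ([5, 10, 15], [2, 2, 2]),
    "equal means, different variances": ([10, 10, 10], [1, 3, 9]),
}

for name, (mus, sigmas) in CASES.items():
    print(name, theoretical_variances(WEIGHTS, mus, sigmas, n=600))
```

Under these formulas the proportional method can only gain from differences in means, and Neyman's method can only gain over it from differences in standard deviations.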

Code preparation

First, let's create a Python class that simulates our general population, consisting of three strata.

    import numpy as np
    import scipy.stats as st


    class GeneralPopulation:

        def __init__(self,
                     means: list[float],
                     stds: list[float],
                     sizes: list[int],
                     random_state: int = 15):
            """
            Initializes our general population and saves the given distributions.

            :param means: List of expectations for normal distributions
            :param stds: List of standard deviations for normal distributions
            :param sizes: How many objects will be in each stratum
            :param random_state: Seed fixing randomness, so that repeated runs
                with the same input parameters give the same results
            """
            self.random_state = random_state
            # Seed NumPy's global generator before any sampling so that the
            # population draw and later np.random calls are reproducible
            np.random.seed(random_state)
            self.strats = [st.norm(mean, std) for mean, std in zip(means, stds)]
            self._sample(sizes)

        def _sample(self, sizes):
            """Creates a general population sample as a mixture of strata.

            :param sizes: List with sample sizes of the corresponding normal distributions
            """
            self.strats_samples = [rv.rvs(size) for rv, size in zip(self.strats, sizes)]
            self.general_samples = np.hstack(self.strats_samples)
            # total size of the general population
            self.N = self.general_samples.shape[0]
            # number of strata
            self.count_strats = len(sizes)
            # share of every stratum in the general population
            self.ws = [size / self.N for size in sizes]
            # mean and std of the general population
            self.m = np.mean(self.general_samples)
            self.sigma = np.std(self.general_samples)
            # mean and std of every stratum
            self.ms = [np.mean(strat_sample) for strat_sample in self.strats_samples]
            self.sigmas = [np.std(strat_sample) for strat_sample in self.strats_samples]


Next, let's add functions for the three sampling methods described in the theoretical part.

        # Methods of the GeneralPopulation class (continued).
        # np.random.choice samples with replacement by default.

        def random_subsampling(self, size):
            """Creates a random subset of the entire population.

            :param size: subsample size
            """
            return np.random.choice(self.general_samples, size=size)

        def proportional_subsampling(self, size):
            """Creates a subsample in which every stratum keeps its population share.

            :param size: subsample size
            """
            self.strats_size_proport = [int(np.floor(size * w)) for w in self.ws]
            return [np.random.choice(strat_sample, size=strat_size)
                    for strat_sample, strat_size
                    in zip(self.strats_samples, self.strats_size_proport)]

        def optimal_subsampling(self, size):
            """Creates a subsample with the optimal (Neyman) number of elements per stratum.

            :param size: subsample size
            """
            # Each stratum gets a share proportional to weight * standard deviation
            sum_denom = sum(w * sigma for w, sigma in zip(self.ws, self.sigmas))
            self.strats_size_optimal = [int(np.floor((size * w * sigma) / sum_denom))
                                        for w, sigma in zip(self.ws, self.sigmas)]
            if 0 in self.strats_size_optimal:
                raise ValueError('Strats size is 0, please change variance of smallest strat!')
            return [np.random.choice(strat_sample, size=strat_size)
                    for strat_sample, strat_size
                    in zip(self.strats_samples, self.strats_size_optimal)]


Finally, for the empirical part, we need a function that simulates the experiment process.

        # Method of the GeneralPopulation class (continued).

        def run_experiments(self, n_sub, subsampling_method, n_experiments=1000):
            """Conducts a series of experiments and returns the results.

            :param n_sub: size of sample
            :param subsampling_method: 'random_subsampling',
                'proportional_subsampling' or 'optimal_subsampling'
            :param n_experiments: number of experiment runs
            :return: list with the estimated population mean of every experiment
            """
            known_methods = ('random_subsampling',
                             'proportional_subsampling',
                             'optimal_subsampling')
            if subsampling_method not in known_methods:
                raise ValueError(f'Unknown subsampling method: {subsampling_method}')

            # Small populations get a fixed, smaller subsample size
            if len(self.general_samples) < 100:
                n_sub = 20

            means_s = []
            for _ in range(n_experiments):
                if subsampling_method == 'random_subsampling':
                    rc = self.random_subsampling(n_sub)
                    means_s.append(np.mean(rc))
                else:
                    if subsampling_method == 'proportional_subsampling':
                        rc = self.proportional_subsampling(n_sub)
                    else:
                        rc = self.optimal_subsampling(n_sub)
                    strats_mean = [np.mean(strat_sample) for strat_sample in rc]
                    # Mean for a mixture: stratum means weighted by population shares
                    means_s.append(sum(w_k * mean_k
                                       for w_k, mean_k in zip(self.ws, strats_mean)))
            return means_s


Simulation results

For a general population in which all strata have the same means and variances, the results of all three methods are expected to be more or less equal.

Different means with equal variances give more interesting results: stratification dramatically reduces the variance.

In the case of equal means and different variances, we see a variance reduction with Neyman's method.
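To reproduce this comparison without plots, here is a compact, standard-library-only sketch. It is a simplification and not the article's code: it draws directly from the normal distributions instead of from a finite simulated population, and the names `stratum_sizes` and `simulate_means`, the seed, and every parameter value are hypothetical.

```python
import random
import statistics

# Hypothetical parameters for the three scenarios (not taken from the article).
WEIGHTS = [1 / 6, 2 / 6, 3 / 6]
CASES = {
    "equal means, equal variances": ([10, 10, 10], [2, 2, 2]),
    "different means, equal variances": ([5, 10, 15], [2, 2, 2]),
    "equal means, different variances": ([10, 10, 10], [1, 3, 9]),
}


def stratum_sizes(n, weights, sigmas, method):
    """Per-stratum sample sizes for proportional or Neyman allocation."""
    if method == "proportional":
        shares = list(weights)
    elif method == "neyman":
        denom = sum(w * s for w, s in zip(weights, sigmas))
        shares = [w * s / denom for w, s in zip(weights, sigmas)]
    else:
        raise ValueError(f"unknown stratified method: {method}")
    # Round to whole users and keep at least one draw per stratum.
    return [max(1, round(n * share)) for share in shares]


def simulate_means(mus, sigmas, method, weights=WEIGHTS,
                   n=300, n_experiments=300, seed=15):
    """Return one estimate of the population mean per simulated experiment."""
    rng = random.Random(seed)
    strata = list(range(len(weights)))
    sizes = None if method == "random" else stratum_sizes(n, weights, sigmas, method)
    estimates = []
    for _ in range(n_experiments):
        if method == "random":
            # Simple random sampling: each user lands in stratum k with probability w_k.
            picks = rng.choices(strata, weights=weights, k=n)
            estimates.append(
                statistics.fmean(rng.gauss(mus[k], sigmas[k]) for k in picks))
        else:
            # Stratified sampling: fixed size per stratum, then a weighted
            # mean of the stratum means.
            stratum_means = [
                statistics.fmean(rng.gauss(mus[k], sigmas[k]) for _ in range(sizes[k]))
                for k in strata
            ]
            estimates.append(sum(w * m for w, m in zip(weights, stratum_means)))
    return estimates


if __name__ == "__main__":
    for name, (mus, sigmas) in CASES.items():
        for method in ("random", "proportional", "neyman"):
            var = statistics.variance(simulate_means(mus, sigmas, method))
            print(f"{name:34s} {method:13s} variance of the mean: {var:.5f}")
```

The spread of the returned estimates plays the role of the histogram width: a narrower spread for the same sample size means a more sensitive metric.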

Conclusion

You can now apply the stratification method to reduce metric variance and speed up an experiment: cluster your audience, then split users randomly inside each cluster with specific weights.