
Using the Stratification Method for Experiment Analysis

by Natalia Ogneva, 8m, 2024/04/19

Too Long; Didn't Read

Stratified sampling is a powerful technique for improving experiment efficiency and metric sensitivity in data analysis. By clustering your audience and splitting it with specific weights, you can speed up experiments, reduce variance, and make results more reliable.



Any experiment involves a trade-off between fast results and metric sensitivity. If the chosen metric has a high variance, we have to wait a long time to be sure that the experiment's results are accurate. Let's consider one method that helps analysts speed up their experiments without losing too much time or metric sensitivity.


Problem Formulation

Suppose we run a standard experiment to test a new ranking algorithm, with session length as the primary metric. Additionally, assume that our audience can be roughly divided into three groups: 1 million teenagers, 2 million users aged 18-45, and 3 million users aged 45 and above. The response to a new ranking algorithm would vary significantly among these audience groups. This wide variation reduces the sensitivity of the metric.


In other words, the population can be divided into three strata, each with its own share of the population and its own distribution of the metric.


Suppose the metric is normally distributed within every stratum. The metric for the whole population is then a mixture of these normal distributions, and the mean of a large sample is approximately normal.
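The setup can be written out explicitly. The notation below is an assumption introduced for illustration (it is not taken from the original figures): $N_k$ is the size of stratum $k$, $w_k$ its share of the population, and $\mu_k$, $\sigma_k^2$ the mean and variance of the metric inside it.

```latex
w_k = \frac{N_k}{N}, \qquad
(w_1, w_2, w_3) = \left(\tfrac{1}{6}, \tfrac{2}{6}, \tfrac{3}{6}\right), \qquad
Y \mid \text{stratum } k \sim \mathcal{N}(\mu_k, \sigma_k^2)

\mu = \sum_{k} w_k \mu_k, \qquad
\sigma^2 = \sum_{k} w_k \sigma_k^2 + \sum_{k} w_k (\mu_k - \mu)^2
```

The second line is the usual mean and variance of a mixture: the total variance splits into a within-strata part and a between-strata part.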

Stratification method

In the classic experiment design, we split all users from the population randomly without considering the differences between them. Thus, we work with a sample whose mean has the expected value and variance of simple random sampling.
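For simple random sampling these quantities take a standard form. This is a reconstruction under assumed notation, not the original figure: $n$ is the sample size, $w_k$ the population share of stratum $k$, and $\mu_k$, $\sigma_k^2$ the mean and variance of the metric inside it, with $\mu = \sum_k w_k \mu_k$.

```latex
\bar{Y}_{\text{rand}} = \frac{1}{n} \sum_{i=1}^{n} Y_i, \qquad
\mathbb{E}\left[\bar{Y}_{\text{rand}}\right] = \mu

\operatorname{Var}\left(\bar{Y}_{\text{rand}}\right)
= \frac{\sigma^2}{n}
= \frac{1}{n} \sum_{k} w_k \sigma_k^2 + \frac{1}{n} \sum_{k} w_k (\mu_k - \mu)^2
```

The second term is the between-strata variance, which is the part stratification can remove.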


Another way is to split users randomly inside every stratum, sampling each stratum in proportion to its weight in the general population.

In this case, the expected value and variance of the estimate change accordingly.
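For proportional allocation the standard result can be sketched as below. The notation is assumed for illustration: $n$ is the total sample size, $n_k$ the number of users drawn from stratum $k$, $w_k$ the population share of that stratum, $\bar{Y}_k$ its sample mean, and $\mu_k$, $\sigma_k^2$ its true mean and variance.

```latex
n_k = n \, w_k, \qquad
\bar{Y}_{\text{strat}} = \sum_{k} w_k \bar{Y}_k, \qquad
\mathbb{E}\left[\bar{Y}_{\text{strat}}\right] = \sum_{k} w_k \mu_k = \mu

\operatorname{Var}\left(\bar{Y}_{\text{strat}}\right)
= \sum_{k} w_k^2 \frac{\sigma_k^2}{n_k}
= \frac{1}{n} \sum_{k} w_k \sigma_k^2
```

Only the within-strata variance remains; the between-strata term $\frac{1}{n}\sum_k w_k (\mu_k - \mu)^2$ of simple random sampling drops out.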


The expected value is the same as in the first selection. However, the variance is smaller whenever the stratum means differ, which gives a higher metric sensitivity.

Now, let's consider Neyman's method. It suggests splitting users randomly inside every stratum with specific weights that depend on both the size and the standard deviation of the stratum.

The expected value and variance of the estimate change once more in this case.
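The Neyman (optimal) allocation and the resulting variance can be sketched as follows. The notation is assumed for illustration: $n$ is the total sample size, $n_k$ the number of users drawn from stratum $k$, $w_k$ the population share of that stratum, and $\sigma_k$ the standard deviation of the metric inside it.

```latex
n_k = n \, \frac{w_k \sigma_k}{\sum_{j} w_j \sigma_j}, \qquad
\bar{Y}_{\text{opt}} = \sum_{k} w_k \bar{Y}_k

\operatorname{Var}\left(\bar{Y}_{\text{opt}}\right)
= \sum_{k} w_k^2 \frac{\sigma_k^2}{n_k}
= \frac{1}{n} \left( \sum_{k} w_k \sigma_k \right)^2
\le \frac{1}{n} \sum_{k} w_k \sigma_k^2
```

The inequality follows from Jensen's inequality and becomes an equality when all $\sigma_k$ are equal, in which case Neyman allocation coincides with proportional allocation.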

Asymptotically, the expected value equals the expected value in the first case. However, the variance is smaller still whenever the strata have different variances.
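To make the comparison concrete, the three theoretical variances can be computed side by side. This is a minimal sketch using only the standard library: the function name `sampling_variances` and the means and standard deviations are hypothetical, only the weights come from the example above, and the formulas assume sampling with replacement (no finite-population correction).

```python
def sampling_variances(weights, means, stds, n):
    """Theoretical variance of the estimated mean under three designs.

    weights: population share of each stratum (should sum to 1)
    means, stds: per-stratum mean and standard deviation of the metric
    n: total sample size
    """
    mu = sum(w * m for w, m in zip(weights, means))
    # Variance inside the strata, averaged with the strata weights
    within = sum(w * s ** 2 for w, s in zip(weights, stds))
    # Variance caused by the strata means differing from the overall mean
    between = sum(w * (m - mu) ** 2 for w, m in zip(weights, means))
    # Squared weighted average of the standard deviations (Neyman allocation)
    neyman = sum(w * s for w, s in zip(weights, stds)) ** 2
    return {
        "random": (within + between) / n,
        "proportional": within / n,
        "neyman": neyman / n,
    }


# Weights from the example: 1M, 2M and 3M users.
weights = [1 / 6, 2 / 6, 3 / 6]
# Hypothetical per-stratum parameters, chosen only for illustration.
means = [10.0, 20.0, 30.0]
stds = [2.0, 4.0, 6.0]

result = sampling_variances(weights, means, stds, n=600)
for design, variance in result.items():
    print(f"{design}: {variance:.4f}")
```

With any inputs, the three values are ordered random >= proportional >= neyman, which is the ordering the theory predicts.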

Empirical Testing

We have shown the efficiency of this method in theory. Let's simulate samples and test the stratification method empirically.

Let's consider three cases:

  • all strata have equal means and equal variances,
  • all strata have different means and equal variances,
  • all strata have equal means and different variances.

We will apply all three sampling methods in every case and plot a histogram and a boxplot to compare them.

Code preparation

First, let's create a class in Python that simulates our general population consisting of three strata.

import numpy as np
import scipy.stats as st


class GeneralPopulation:

    def __init__(self,
                 means: list[float],
                 stds: list[float],
                 sizes: list[int],
                 random_state: int = 15):
        """
        Initializes our General Population and saves the given distributions

        :param means: List of expectations for normal distributions
        :param stds: List of standard deviations for normal distributions
        :param sizes: How many objects will be in each stratum
        :param random_state: Seed fixing randomness, so that repeated
            experiments with the same input parameters give the same results
        """
        self.random_state = random_state
        # Seed NumPy's global generator so that all sampling is reproducible
        np.random.seed(random_state)
        self.strats = [st.norm(mean, std) for mean, std in zip(means, stds)]
        self._sample(sizes)

    def _sample(self, sizes):
        """Creates a general population sample as a mixture of strata

        :param sizes: List with sample sizes of the corresponding normal distributions
        """
        self.strats_samples = [rv.rvs(size=size) for rv, size in zip(self.strats, sizes)]
        self.general_samples = np.hstack(self.strats_samples)
        self.N = self.general_samples.shape[0]

        # number of strata
        self.count_strats = len(sizes)

        # ratios for every stratum in the general population
        self.ws = [size / self.N for size in sizes]

        # mean and std for the general population
        self.m = np.mean(self.general_samples)
        self.sigma = np.std(self.general_samples)

        # mean and std for every stratum
        self.ms = [np.mean(strat_sample) for strat_sample in self.strats_samples]
        self.sigmas = [np.std(strat_sample) for strat_sample in self.strats_samples]


Then, let's add functions for the three sampling methods described in the theoretical part.

    def random_subsampling(self, size):
        """Creates a random subset of the entire population

        :param size: subsample size
        """
        # np.random.choice samples with replacement by default
        rc = np.random.choice(self.general_samples, size=size)
        return rc

    def proportional_subsampling(self, size):
        """Creates a subsample in which every stratum is represented
        in proportion to its share of the population

        :param size: subsample size
        """
        self.strats_size_proport = [int(np.floor(size * w)) for w in self.ws]

        rc = []
        for k in range(len(self.strats_size_proport)):
            rc.append(np.random.choice(self.strats_samples[k],
                                       size=self.strats_size_proport[k]))
        return rc

    def optimal_subsampling(self, size):
        """Creates a subsample with the optimal (Neyman) number of elements per stratum

        :param size: subsample size
        """
        # Denominator of the Neyman allocation: sum of w_k * sigma_k
        sum_denom = 0
        for k in range(self.count_strats):
            sum_denom += self.ws[k] * self.sigmas[k]

        self.strats_size_optimal = [int(np.floor((size * w * sigma) / sum_denom))
                                    for w, sigma in zip(self.ws, self.sigmas)]
        if 0 in self.strats_size_optimal:
            raise ValueError('Strats size is 0, please change variance of smallest strat!')

        rc = []
        for k in range(len(self.strats_size_optimal)):
            rc.append(np.random.choice(self.strats_samples[k],
                                       size=self.strats_size_optimal[k]))
        return rc
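The two allocation rules used above can also be looked at in isolation. The following is a small standalone sketch, not part of the class: the helper names `proportional_allocation` and `neyman_allocation` and the standard deviations are hypothetical, while the stratum sizes come from the example audience. It takes stratum sizes rather than float weights so that the proportional split uses exact integer arithmetic.

```python
import math


def proportional_allocation(size, strata_sizes):
    """Split `size` across strata in proportion to their population sizes."""
    total = sum(strata_sizes)
    # Integer arithmetic avoids rounding surprises such as 600 * (1 / 6) < 100
    return [size * n_k // total for n_k in strata_sizes]


def neyman_allocation(size, strata_sizes, sigmas):
    """Split `size` across strata in proportion to N_k * sigma_k."""
    products = [n_k * sigma for n_k, sigma in zip(strata_sizes, sigmas)]
    denom = sum(products)
    # Flooring may leave a few units of `size` unallocated
    return [math.floor(size * p / denom) for p in products]


# Stratum sizes from the example: 1M, 2M and 3M users.
strata_sizes = [1_000_000, 2_000_000, 3_000_000]
# Hypothetical standard deviations of the metric inside each stratum.
sigmas = [1.0, 2.0, 3.0]

prop = proportional_allocation(600, strata_sizes)
opt = neyman_allocation(600, strata_sizes, sigmas)
print(prop)  # [100, 200, 300]
print(opt)   # [42, 171, 385]
```

Compared with the proportional split, the Neyman split moves sample from the low-variance stratum to the high-variance one, which is where additional observations reduce the variance of the estimate the most.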


Also, for the empirical part, we need a function that simulates the experiment process.

    def run_experiments(self, n_sub, subsampling_method, n_experiments=1000):
        """Conducts a series of experiments and saves the results

        :param n_sub: size of sample
        :param subsampling_method: method for creating a subsample
        :param n_experiments: number of experiment starts
        """
        known_methods = ('random_subsampling',
                         'proportional_subsampling',
                         'optimal_subsampling')
        if subsampling_method not in known_methods:
            raise ValueError(f'Unknown subsampling method: {subsampling_method}')

        means_s = []

        # Fall back to a small subsample for tiny populations
        if len(self.general_samples) < 100:
            n_sub = 20

        for _ in range(n_experiments):
            if subsampling_method == 'random_subsampling':
                rc = self.random_subsampling(n_sub)
                means_s.append(rc.mean())
            else:
                if subsampling_method == 'proportional_subsampling':
                    rc = self.proportional_subsampling(n_sub)
                else:
                    rc = self.optimal_subsampling(n_sub)

                strats_mean = [strat_sample.mean() for strat_sample in rc]

                # Mean for a mixture: weight every stratum mean by its population share
                means_s.append(sum(w_k * mean_k
                                   for w_k, mean_k in zip(self.ws, strats_mean)))

        return means_s


Simulation results

If we look at a general population where all strata have the same means and variances, the results of all three methods are expected to be more or less equal.

Different means and equal variances give more interesting results. Using stratification dramatically reduces the variance.

In cases with equal means and different variances, we see a variance reduction with Neyman's method.
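The three cases can be re-run with a compact, standard-library-only sketch. It is a simplification rather than the class above: it draws directly from the normal distributions instead of from a pre-generated finite population, and the function name `simulate`, the sample sizes, and every mean and standard deviation are hypothetical values chosen for illustration.

```python
import random
import statistics


def simulate(means, stds, weights, n_sub=300, n_experiments=300, seed=15):
    """Empirical variance of the estimated mean for each sampling method."""
    rng = random.Random(seed)
    strata = range(len(weights))

    def stratum_mean(k, n):
        # Mean of n draws from stratum k
        return statistics.fmean(rng.gauss(means[k], stds[k]) for _ in range(n))

    # Proportional allocation: n_k = n * w_k
    prop_sizes = [max(1, round(n_sub * w)) for w in weights]
    # Neyman allocation: n_k proportional to w_k * sigma_k
    denom = sum(w * s for w, s in zip(weights, stds))
    opt_sizes = [max(1, round(n_sub * w * s / denom))
                 for w, s in zip(weights, stds)]

    estimates = {"random": [], "proportional": [], "optimal": []}
    for _ in range(n_experiments):
        # Simple random sampling: pick a stratum by weight, then draw a value
        picked = rng.choices(strata, weights=weights, k=n_sub)
        estimates["random"].append(
            statistics.fmean(rng.gauss(means[k], stds[k]) for k in picked))
        # Stratified estimates: weighted average of the per-stratum means
        for name, sizes in (("proportional", prop_sizes), ("optimal", opt_sizes)):
            estimates[name].append(
                sum(w * stratum_mean(k, n)
                    for k, (w, n) in enumerate(zip(weights, sizes))))
    return {name: statistics.variance(vals) for name, vals in estimates.items()}


weights = [1 / 6, 2 / 6, 3 / 6]
# Hypothetical parameters for the three cases
cases = {
    "equal means, equal variances": ([15, 15, 15], [2, 2, 2]),
    "different means, equal variances": ([15, 20, 30], [2, 2, 2]),
    "equal means, different variances": ([15, 15, 15], [1, 4, 8]),
}
results = {name: simulate(m, s, weights) for name, (m, s) in cases.items()}
for name, variances in results.items():
    print(name, {k: round(v, 4) for k, v in variances.items()})
```

Because the variances are estimated from a finite number of simulated experiments, small differences between methods (as in the first case) are noise, while the gap between random and stratified sampling in the second case should be clearly visible.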

Conclusion

Now you can apply the stratification method to reduce metric variance and speed up the experiment, provided you cluster your audience and then split it randomly inside each cluster with specific weights!