
Using the Stratification Method for Experiment Analysis

by Natalia Ogneva, 2024/04/19

TL;DR

Stratified sampling is a powerful technique for boosting experiment efficiency and metric sensitivity in data analysis. By clustering your audience into strata and splitting it with specific weights, you can optimize experiments, reduce variance, and make results more reliable.



Every experiment involves a trade-off between fast results and metric sensitivity. If the chosen metric has a high variance, we must wait longer to be sure the experiment's results are accurate. Let's consider one method that helps analysts boost their experiments without losing too much time or metric sensitivity.


Problem Formulation

Suppose we run a standard experiment to test a new ranking algorithm, with session length as the primary metric. Additionally, suppose our audience can be roughly divided into three groups: 1 million teenagers, 2 million users aged 18-45, and 3 million users aged 45 and above. The response to the new ranking algorithm would vary significantly across these audience groups, and this wide variation reduces the sensitivity of the metric.
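As a quick sketch of the arithmetic, the weight of each group is simply its share of the 6 million users. The dictionary keys below are illustrative labels, not names from the original article.

```python
# Audience sizes from the example: teenagers, users aged 18-45, users aged 45+
sizes = {"teens": 1_000_000, "18-45": 2_000_000, "45+": 3_000_000}

total = sum(sizes.values())
# Weight w_k of each stratum = its share of the whole population
weights = {name: size / total for name, size in sizes.items()}

for name, w in weights.items():
    print(f"{name}: {w:.3f}")  # teens: 0.167, 18-45: 0.333, 45+: 0.500
```

These weights (1/6, 1/3, 1/2) are what every stratified estimator below uses to recombine the per-stratum means.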


In other words, the population can be divided into three strata, one per age group, each with its own weight in the population and its own distribution of the metric.


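Let's assume that the metric within every stratum is normally distributed. Then the metric for the whole population is a weighted mixture of these normal distributions: its mean is the weighted average of the stratum means, and its variance combines the spread within strata and the spread between them.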
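Written out, a standard formulation of this setup, assuming stratum k holds a share w_k = N_k/N of the N users and its metric follows N(μ_k, σ_k²), is shown below (in the example, K = 3 and w = (1/6, 1/3, 1/2)). The variance line is the law of total variance for a mixture.

```latex
\begin{aligned}
X_k &\sim \mathcal{N}(\mu_k, \sigma_k^2), \qquad w_k = \frac{N_k}{N}, \qquad \sum_{k=1}^{K} w_k = 1,\\
\mu &= \mathbb{E}[X] = \sum_{k=1}^{K} w_k \mu_k,\\
\sigma^2 &= \operatorname{Var}[X] = \sum_{k=1}^{K} w_k \sigma_k^2 + \sum_{k=1}^{K} w_k (\mu_k - \mu)^2.
\end{aligned}
```

The first sum is the within-strata variance and the second is the between-strata variance; the second term is the part that stratification can remove.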

Stratification Method

First, suppose we randomly split all users of the population into experimental groups without taking the differences between our users into account. The sample mean is then an unbiased estimate of the population mean, but its variance reflects the full population variance, including the differences between strata.
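For reference, the standard results for this simple random sample, writing w_k, μ_k and σ_k² for the weight, mean and variance of stratum k and n for the sample size, are:

```latex
\begin{aligned}
\bar{X} &= \frac{1}{n}\sum_{i=1}^{n} X_i,\\
\mathbb{E}[\bar{X}] &= \mu = \sum_{k=1}^{K} w_k \mu_k,\\
\operatorname{Var}[\bar{X}] &= \frac{\sigma^2}{n} = \frac{1}{n}\left(\sum_{k=1}^{K} w_k \sigma_k^2 + \sum_{k=1}^{K} w_k (\mu_k - \mu)^2\right).
\end{aligned}
```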


Another way is to sample randomly within each stratum, taking from each stratum a share of the sample equal to its weight in the general population (proportional allocation).
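One practical reading of "splitting randomly within each stratum" is to shuffle each stratum separately and divide it between control and treatment, so that every group inherits the population's stratum weights. The helper `stratified_split` and the 50/50 split are illustrative assumptions, not part of the original article.

```python
import numpy as np


def stratified_split(strata: np.ndarray, seed: int = 0) -> np.ndarray:
    """Assign each user to control (0) or treatment (1) within their stratum.

    Hypothetical helper: users in each stratum are shuffled independently and
    split in half, so both groups keep the population's stratum proportions.
    """
    rng = np.random.default_rng(seed)
    groups = np.empty(len(strata), dtype=int)
    for label in np.unique(strata):
        idx = np.flatnonzero(strata == label)  # positions of this stratum's users
        rng.shuffle(idx)
        half = len(idx) // 2
        groups[idx[:half]] = 0  # first half of the shuffled stratum -> control
        groups[idx[half:]] = 1  # remainder -> treatment
    return groups


# Toy population with the 1:2:3 structure of the example, scaled down
strata = np.repeat(["teens", "18-45", "45+"], [100, 200, 300])
groups = stratified_split(strata)
for label in ["teens", "18-45", "45+"]:
    in_stratum = strata == label
    print(label, (groups[in_stratum] == 0).sum(), (groups[in_stratum] == 1).sum())
```

Compared with a single shuffle of all users, this removes the chance that one group ends up with, say, noticeably more teenagers than the other.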

In this case, the estimate is the weighted sum of the stratum sample means, and its expected value and variance are computed stratum by stratum.
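In the usual notation, assuming n_k = w_k n users are drawn from stratum k and the stratum means are recombined with the population weights w_k, the proportional-allocation estimator satisfies:

```latex
\begin{aligned}
n_k &= w_k\, n, \qquad \bar{X}_{\text{prop}} = \sum_{k=1}^{K} w_k \bar{X}_k,\\
\mathbb{E}[\bar{X}_{\text{prop}}] &= \sum_{k=1}^{K} w_k \mu_k = \mu,\\
\operatorname{Var}[\bar{X}_{\text{prop}}] &= \sum_{k=1}^{K} w_k^2 \frac{\sigma_k^2}{n_k} = \frac{1}{n}\sum_{k=1}^{K} w_k \sigma_k^2.
\end{aligned}
```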


The expected value is the same as for the first selection. However, the variance is never larger: the between-strata part of the variance drops out, so the more the stratum means differ, the greater the gain in metric sensitivity.

Now, let's consider Neyman's method. It suggests sampling users randomly within each stratum, with the size of each stratum's sample proportional to its weight multiplied by its standard deviation.
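A minimal sketch of the two allocation rules, assuming the stratum standard deviations are known: proportional allocation sets n_k proportional to w_k, while Neyman's rule sets n_k proportional to w_k σ_k (the same rule the article's `optimal_subsampling` method implements). The function names and the example numbers are hypothetical.

```python
def proportional_allocation(n: int, weights: list[float]) -> list[float]:
    """n_k = n * w_k: each stratum gets its population share of the sample."""
    return [n * w for w in weights]


def neyman_allocation(n: int, weights: list[float], sigmas: list[float]) -> list[float]:
    """n_k = n * w_k * sigma_k / sum_j(w_j * sigma_j): noisier strata get more users."""
    denom = sum(w * s for w, s in zip(weights, sigmas))
    return [n * w * s / denom for w, s in zip(weights, sigmas)]


# Hypothetical example: weights 1/6, 1/3, 1/2 and stratum standard deviations 3, 2, 1
weights = [1 / 6, 1 / 3, 1 / 2]
sigmas = [3.0, 2.0, 1.0]
print(proportional_allocation(1200, weights))     # about [200, 400, 600]
print(neyman_allocation(1200, weights, sigmas))   # about [360, 480, 360]
```

The sizes are left as floats here; in practice they are rounded down to integers, as the article's code does with `np.floor`. When all σ_k are equal, the two rules give the same allocation.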

The expected value and variance for this case are obtained the same way, with the Neyman stratum sizes plugged in.
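With w_k, σ_k and n as before, substituting the Neyman sizes into the per-stratum variance gives the standard result:

```latex
\begin{aligned}
n_k &= n\,\frac{w_k \sigma_k}{\sum_{j=1}^{K} w_j \sigma_j}, \qquad \bar{X}_{\text{Neyman}} = \sum_{k=1}^{K} w_k \bar{X}_k,\\
\mathbb{E}[\bar{X}_{\text{Neyman}}] &= \sum_{k=1}^{K} w_k \mu_k = \mu,\\
\operatorname{Var}[\bar{X}_{\text{Neyman}}] &= \sum_{k=1}^{K} w_k^2 \frac{\sigma_k^2}{n_k} = \frac{1}{n}\left(\sum_{k=1}^{K} w_k \sigma_k\right)^2.
\end{aligned}
```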

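Asymptotically, the expected value equals that of the first case. However, the variance is smaller still: by the Cauchy-Schwarz inequality, the squared weighted mean of the stratum standard deviations never exceeds the weighted mean of their variances, with equality only when all strata have the same standard deviation.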
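To compare the three designs side by side, here is a small sketch that evaluates the theoretical variances σ²/n, Σw_kσ_k²/n and (Σw_kσ_k)²/n for three illustrative scenarios. The means and standard deviations are hypothetical values, not the article's simulation parameters.

```python
import numpy as np


def estimator_variances(weights, means, sigmas, n):
    """Theoretical variance of the mean estimate under the three designs.

    Implements sigma^2/n (simple random), sum(w*s^2)/n (proportional)
    and (sum(w*s))^2/n (Neyman) for known stratum parameters.
    """
    w, m, s = map(np.asarray, (weights, means, sigmas))
    mu = np.sum(w * m)                   # population mean
    within = np.sum(w * s**2)            # within-strata variance
    between = np.sum(w * (m - mu) ** 2)  # between-strata variance
    return {
        "random": (within + between) / n,
        "proportional": within / n,
        "neyman": np.sum(w * s) ** 2 / n,
    }


weights = [1 / 6, 1 / 3, 1 / 2]
scenarios = {  # hypothetical (means, stds) per scenario
    "equal means, equal variances": ([10, 10, 10], [2, 2, 2]),
    "different means, equal variances": ([5, 10, 20], [2, 2, 2]),
    "equal means, different variances": ([10, 10, 10], [1, 2, 6]),
}
for name, (means, sigmas) in scenarios.items():
    v = estimator_variances(weights, means, sigmas, n=1000)
    print(name, {k: round(float(x), 5) for k, x in v.items()})
```

Stratification helps in the second scenario because the means differ, and only Neyman's rule helps in the third because only the variances differ.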

Empirical Testing

We have shown the efficiency of this method theoretically. Now let's simulate samples and test the stratification method empirically.

Let's consider three cases:

  • all strata with equal means and variances,
  • all strata with different means and equal variances,
  • all strata with equal means and different variances.

We will apply all three methods to each case and plot a histogram and a boxplot to compare them.
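For readers who want a compact, self-contained version of this comparison, here is a sketch that draws repeated samples under the three designs and collects the resulting mean estimates. It uses NumPy's `default_rng` instead of the article's class, and the population parameters and sample sizes are hypothetical. The returned arrays can be passed to matplotlib's `hist` and `boxplot` to draw plots like the ones described above.

```python
import numpy as np


def simulate(means, stds, sizes, n=600, n_experiments=1000, seed=15):
    """Return arrays of mean estimates for random, proportional and Neyman sampling."""
    rng = np.random.default_rng(seed)
    strata = [rng.normal(m, s, size) for m, s, size in zip(means, stds, sizes)]
    population = np.concatenate(strata)
    w = np.array(sizes) / sum(sizes)            # stratum weights
    sig = np.array([x.std() for x in strata])   # observed stratum stds
    alloc = {
        "proportional": np.floor(n * w).astype(int),
        "neyman": np.floor(n * w * sig / np.sum(w * sig)).astype(int),
    }
    results = {"random": np.empty(n_experiments)}
    results.update({k: np.empty(n_experiments) for k in alloc})
    for i in range(n_experiments):
        # Simple random sample from the whole population (with replacement)
        results["random"][i] = rng.choice(population, size=n).mean()
        for name, n_k in alloc.items():
            # Sample each stratum separately and recombine with population weights
            strat_means = [rng.choice(x, size=k).mean() for x, k in zip(strata, n_k)]
            results[name][i] = np.dot(w, strat_means)
    return results


# Hypothetical scenario: different means, equal variances
res = simulate(means=[5, 10, 20], stds=[2, 2, 2], sizes=[1000, 2000, 3000])
for name, estimates in res.items():
    print(f"{name:>12}: mean={estimates.mean():.3f}, var={estimates.var():.5f}")
```

Sampling with replacement mirrors the article's use of `np.random.choice` and keeps the draws independent within each experiment.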

Code Preparation

First, let's create a Python class that simulates our general population consisting of three strata.

```python
import numpy as np
import scipy.stats as st


class GeneralPopulation:
    def __init__(self,
                 means: list[float],
                 stds: list[float],
                 sizes: list[int],
                 random_state: int = 15):
        """
        Initializes our General Population and saves the given distributions
        :param means: List of expectations for normal distributions
        :param stds: List of standard deviations for normal distributions
        :param sizes: How many objects will be in each stratum
        :param random_state: Parameter fixing randomness. Needed so that when conducting
        the experiment repeatedly with the same input parameters, the results remain the same
        """
        self.random_state = random_state
        # Seed NumPy's global generator *before* sampling: scipy.stats and
        # np.random.choice both draw from it, so runs become reproducible
        np.random.seed(random_state)
        self.strats = [st.norm(mean, std) for mean, std in zip(means, stds)]
        self._sample(sizes)

    def _sample(self, sizes):
        """Creates a general population sample as a mixture of strata
        :param sizes: List with sample sizes of the corresponding normal distributions
        """
        self.strats_samples = [rv.rvs(size) for rv, size in zip(self.strats, sizes)]
        self.general_samples = np.hstack(self.strats_samples)
        self.N = self.general_samples.shape[0]
        # number of strata
        self.count_strats = len(sizes)
        # ratio (weight) of every stratum in the general population
        self.ws = [size / self.N for size in sizes]
        # mean and std for the general population
        self.m = np.mean(self.general_samples)
        self.sigma = np.std(self.general_samples)
        # mean and std for every stratum
        self.ms = [np.mean(strat_sample) for strat_sample in self.strats_samples]
        self.sigmas = [np.std(strat_sample) for strat_sample in self.strats_samples]
```


Then, let's add methods for the three sampling approaches described in the theoretical part.

```python
# These methods belong to the GeneralPopulation class above


def random_subsampling(self, size):
    """Creates a random subset of the entire population
    :param size: subsample size
    """
    # Sampling with replacement from the whole population, ignoring strata
    rc = np.random.choice(self.general_samples, size=size)
    return rc


def proportional_subsampling(self, size):
    """Creates a subsample with the number of elements proportional to the shares of strata
    :param size: subsample size
    """
    self.strats_size_proport = [int(np.floor(size * w)) for w in self.ws]
    # An empty stratum would make its sample mean undefined later on
    if 0 in self.strats_size_proport:
        raise ValueError('Strats size is 0, please increase the subsample size!')
    rc = []
    for k in range(len(self.strats_size_proport)):
        rc.append(np.random.choice(self.strats_samples[k], size=self.strats_size_proport[k]))
    return rc


def optimal_subsampling(self, size):
    """Creates a subsample with the optimal (Neyman) number of elements per stratum
    :param size: subsample size
    """
    # n_k = size * w_k * sigma_k / sum_j(w_j * sigma_j)
    sum_denom = 0
    for k in range(self.count_strats):
        sum_denom += self.ws[k] * self.sigmas[k]
    self.strats_size_optimal = [int(np.floor((size * w * sigma) / sum_denom))
                                for w, sigma in zip(self.ws, self.sigmas)]
    if 0 in self.strats_size_optimal:
        raise ValueError('Strats size is 0, please change variance of smallest strat!')
    rc = []
    for k in range(len(self.strats_size_optimal)):
        rc.append(np.random.choice(self.strats_samples[k], size=self.strats_size_optimal[k]))
    return rc
```


For the empirical part, we also need a function that simulates the experiment process.

```python
def run_experiments(self, n_sub, subsampling_method, n_experiments=1000):
    """Conducts a series of experiments and saves the results
    :param n_sub: size of sample
    :param subsampling_method: method for creating a subsample
    :param n_experiments: number of experiment starts
    """
    means_s = []
    # For very small populations, use a fixed subsample size of 20
    if len(self.general_samples) < 100:
        n_sub = 20
    if subsampling_method == 'random_subsampling':
        for n in range(n_experiments):
            rc = self.random_subsampling(n_sub)
            mean = rc.sum() / len(rc)
            means_s.append(mean)
    elif subsampling_method in ('proportional_subsampling', 'optimal_subsampling'):
        for n in range(n_experiments):
            if subsampling_method == 'proportional_subsampling':
                rc = self.proportional_subsampling(n_sub)
            else:
                rc = self.optimal_subsampling(n_sub)
            strats_mean = [sum(rc[k]) / len(rc[k]) for k in range(len(rc))]
            # Mean for a mixture: weight each stratum mean by its population share
            means_s.append(sum(w_k * mean_k for w_k, mean_k in zip(self.ws, strats_mean)))
    else:
        raise ValueError(f'Unknown subsampling method: {subsampling_method}')
    return means_s
```


Simulation Results

If we look at a general population where all strata have the same means and variances, the results of all three methods are expected to be more or less equal.

Different means with equal variances give more interesting results: stratification dramatically reduces the variance. Since the stratum variances are equal here, Neyman allocation coincides with proportional allocation.

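In the case of equal means and different variances, we see a variance reduction with Neyman's method. This matches the theory: with equal means there is no between-strata variance for proportional allocation to remove, while Neyman's method still gains by sampling the noisier strata more heavily.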

Conclusion

Now you can apply the stratification method to reduce metric variance and boost your experiment: cluster your audience into strata and randomly split users within each stratum with specific weights!