
Using Stratification Method for the Experiment Analysis

by Natalia Ogneva, 8m, 2024/04/19

Too Long; Didn't Read

Stratified sampling is a powerful technique for boosting experiment efficiency and metric sensitivity in data analysis. By clustering your audience and sampling from each cluster with specific weights, you can optimize experiments, reduce variance, and make results more reliable.



Any experiment involves a trade-off between fast results and metric sensitivity. If the chosen metric has a high variance, we must wait a long time before we can trust the experiment's results. Let's consider one method that helps analysts speed up their experiments without losing too much time or metric sensitivity.


Problem Formulation

Suppose we run a standard experiment to test a new ranking algorithm, with session length as the primary metric. Suppose also that our audience can be roughly split into three groups: 1 million teenagers, 2 million users aged 18-45, and 3 million users aged 45 and above. The response to a new ranking algorithm would vary significantly among these groups. This wide variation reduces the sensitivity of the metric.


In other words, the population can be divided into three strata, described as follows:


Let's say that every component has a normal distribution. Then the main metric for the whole population is a mixture of these normal distributions (it is itself normal only when the components coincide).
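In symbols, the setup can be sketched as follows. The notation ($K$, $w_k$, $\mu_k$, $\sigma_k^2$) is assumed here for illustration; the weights follow from the group sizes of 1, 2, and 3 million users, and the last two lines are the standard mean and variance identities for a mixture.

```latex
\begin{align*}
Y \mid \text{stratum } k &\sim \mathcal{N}(\mu_k, \sigma_k^2), \qquad k = 1, \dots, K, \quad K = 3 \\
w_1 &= \tfrac{1}{6}, \quad w_2 = \tfrac{2}{6}, \quad w_3 = \tfrac{3}{6}, \qquad \sum_{k=1}^{K} w_k = 1 \\
\mu &= \sum_{k=1}^{K} w_k \mu_k \\
\sigma^2 &= \sum_{k=1}^{K} w_k \sigma_k^2 + \sum_{k=1}^{K} w_k (\mu_k - \mu)^2
\end{align*}
```

The second term of $\sigma^2$ is the between-strata spread: it is the part of the variance that comes purely from the groups having different means.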

Stratification Method

In the classical experiment design, we randomly split all users from the population without considering the differences between them. We therefore work with a sample that has the following expected value and variance.
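A sketch of these two quantities for a simple random sample of size $n$, in notation assumed here for illustration: $w_k$, $\mu_k$, and $\sigma_k^2$ are the weight, mean, and variance of stratum $k$, $\bar{Y}_{\text{rand}}$ is the sample mean, and draws are treated as independent.

```latex
\begin{align*}
\mathbb{E}\left[\bar{Y}_{\text{rand}}\right] &= \mu = \sum_{k} w_k \mu_k \\
\operatorname{Var}\left(\bar{Y}_{\text{rand}}\right) &= \frac{\sigma^2}{n}
  = \frac{1}{n} \left( \sum_{k} w_k \sigma_k^2 + \sum_{k} w_k (\mu_k - \mu)^2 \right)
\end{align*}
```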


Another way is to sample randomly inside every stratum, with each stratum's sample size proportional to its weight in the general population.

In this case, the expected value and variance are the following.
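A sketch for proportional allocation, in notation assumed here for illustration: $n$ is the total sample size, $n_k$ the number of users drawn from stratum $k$, $w_k$, $\mu_k$, $\sigma_k^2$ the stratum's weight, mean, and variance, and $\bar{Y}_k$ the sample mean inside stratum $k$.

```latex
\begin{align*}
n_k &= n \, w_k, \qquad \bar{Y}_{\text{strat}} = \sum_{k} w_k \bar{Y}_k \\
\mathbb{E}\left[\bar{Y}_{\text{strat}}\right] &= \sum_{k} w_k \mu_k = \mu \\
\operatorname{Var}\left(\bar{Y}_{\text{strat}}\right) &= \sum_{k} w_k^2 \frac{\sigma_k^2}{n_k}
  = \frac{1}{n} \sum_{k} w_k \sigma_k^2
\end{align*}
```

Compared with simple random sampling, the between-strata term $\frac{1}{n}\sum_k w_k(\mu_k - \mu)^2$ no longer appears, so the gain is largest when the stratum means differ a lot.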


The expected value is the same as with the first sampling scheme. The variance, however, is smaller, which means higher metric sensitivity.

Now, let's consider Neyman's method. It suggests sampling users randomly inside every stratum with stratum-specific weights that depend on both the stratum's share of the population and its standard deviation.

In this case, the expected value and variance are the following.
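A sketch for Neyman (optimal) allocation, in notation assumed here for illustration: $n$ is the total sample size, $n_k$ the number of users drawn from stratum $k$, and $w_k$, $\mu_k$, $\sigma_k$ the stratum's weight, mean, and standard deviation. Rounding of $n_k$ to integers is ignored.

```latex
\begin{align*}
n_k &= n \, \frac{w_k \sigma_k}{\sum_{j} w_j \sigma_j}, \qquad
\bar{Y}_{\text{opt}} = \sum_{k} w_k \bar{Y}_k \\
\mathbb{E}\left[\bar{Y}_{\text{opt}}\right] &= \sum_{k} w_k \mu_k = \mu \\
\operatorname{Var}\left(\bar{Y}_{\text{opt}}\right) &= \sum_{k} w_k^2 \frac{\sigma_k^2}{n_k}
  = \frac{1}{n} \left( \sum_{k} w_k \sigma_k \right)^2
  \le \frac{1}{n} \sum_{k} w_k \sigma_k^2
\end{align*}
```

The inequality is Jensen's inequality applied to the weights $w_k$; it becomes an equality when all $\sigma_k$ are equal, in which case Neyman allocation coincides with proportional allocation.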

The expected value again matches that of the first case (asymptotically). The variance, however, is much smaller.

Empirical Testing

We have shown the efficiency of this method theoretically. Now let's simulate samples and test the stratification method empirically.

Let's consider three cases:

  • all strata have equal means and equal variances,
  • all strata have different means and equal variances,
  • all strata have equal means and different variances.

We will apply all three methods in each case and plot a histogram and a boxplot to compare them.

Code Preparation

First, let's create a Python class that simulates our general population, which consists of three strata.

import numpy as np
import scipy.stats as st


class GeneralPopulation:

    def __init__(self,
                 means: list[float],
                 stds: list[float],
                 sizes: list[int],
                 random_state: int = 15):
        """
        Initializes our General Population and saves the given distributions

        :param means: List of expectations for normal distributions
        :param stds: List of standard deviations for normal distributions
        :param sizes: How many objects will be in each stratum
        :param random_state: Parameter fixing randomness. Needed so that when
            conducting the experiment repeatedly with the same input
            parameters, the results remain the same
        """
        # Store and apply the seed before any sampling happens. Seeding
        # NumPy's global generator covers both scipy's rvs() calls below and
        # any later np.random.choice calls.
        self.random_state = random_state
        np.random.seed(random_state)
        self.strats = [st.norm(mean, std) for mean, std in zip(means, stds)]
        self._sample(sizes)

    def _sample(self, sizes):
        """Creates a general population sample as a mixture of strata

        :param sizes: List with sample sizes of the corresponding normal
            distributions
        """
        self.strats_samples = [rv.rvs(size) for rv, size in zip(self.strats, sizes)]
        self.general_samples = np.hstack(self.strats_samples)
        self.N = self.general_samples.shape[0]

        # number of strata
        self.count_strats = len(sizes)

        # ratios for every stratum in GP
        self.ws = [size / self.N for size in sizes]

        # ME and Std for GP
        self.m = np.mean(self.general_samples)
        self.sigma = np.std(self.general_samples)

        # ME and std for all strata
        self.ms = [np.mean(strat_sample) for strat_sample in self.strats_samples]
        self.sigmas = [np.std(strat_sample) for strat_sample in self.strats_samples]


Then, let's add functions for the three sampling methods described in the theoretical part.

    def random_subsampling(self, size):
        """Creates a random subset of the entire population

        :param size: subsample size
        """
        # Sampling with replacement from the whole population
        return np.random.choice(self.general_samples, size=size)

    def proportional_subsampling(self, size):
        """Creates a subsample in which each stratum's share matches its
        share of the general population

        :param size: subsample size
        """
        self.strats_size_proport = [int(np.floor(size * w)) for w in self.ws]
        # An empty stratum sample would make its mean undefined later on
        if 0 in self.strats_size_proport:
            raise ValueError('Strats size is 0, please increase the subsample size!')

        rc = []
        for k in range(len(self.strats_size_proport)):
            rc.append(np.random.choice(self.strats_samples[k],
                                       size=self.strats_size_proport[k]))
        return rc

    def optimal_subsampling(self, size):
        """Creates a subsample with the optimal (Neyman) number of elements
        per stratum

        :param size: subsample size
        """
        # Normalizing constant: sum of w_k * sigma_k over all strata
        sum_denom = 0
        for k in range(self.count_strats):
            sum_denom += self.ws[k] * self.sigmas[k]

        self.strats_size_optimal = [int(np.floor((size * w * sigma) / sum_denom))
                                    for w, sigma in zip(self.ws, self.sigmas)]
        if 0 in self.strats_size_optimal:
            raise ValueError('Strats size is 0, please change variance of smallest strat!')

        rc = []
        for k in range(len(self.strats_size_optimal)):
            rc.append(np.random.choice(self.strats_samples[k],
                                       size=self.strats_size_optimal[k]))
        return rc


Also, for the empirical part, we need a function that simulates the experiment process.

    def run_experiments(self, n_sub, subsampling_method, n_experiments=1000):
        """Conducts a series of experiments and saves the results

        :param n_sub: size of sample
        :param subsampling_method: method for creating a subsample
        :param n_experiments: number of experiment starts
        """
        methods = ('random_subsampling', 'proportional_subsampling', 'optimal_subsampling')
        if subsampling_method not in methods:
            raise ValueError(f'Unknown subsampling method: {subsampling_method}')

        means_s = []

        # For a very small population, fall back to a fixed subsample size
        if len(self.general_samples) < 100:
            n_sub = 20

        if subsampling_method == 'random_subsampling':
            for n in range(n_experiments):
                rc = self.random_subsampling(n_sub)
                means_s.append(rc.mean())
        else:
            for n in range(n_experiments):
                if subsampling_method == 'proportional_subsampling':
                    rc = self.proportional_subsampling(n_sub)
                else:
                    rc = self.optimal_subsampling(n_sub)

                strats_mean = [sum(rc_k) / len(rc_k) for rc_k in rc]

                # Mean for a mixture: stratum means weighted by population shares
                means_s.append(sum(w_k * mean_k for w_k, mean_k in zip(self.ws, strats_mean)))

        return means_s
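To see the three methods side by side without assembling the full class, here is a minimal, self-contained sketch of the same idea. The function name `simulate_sample_means`, the method labels, the seed, and all parameter values are assumptions chosen for illustration; they are not taken from the article.

```python
import numpy as np


def simulate_sample_means(means, stds, sizes, n_sub, method,
                          n_experiments=1000, seed=0):
    """Return one sample-mean estimate per simulated experiment.

    method is one of "random", "proportional", "optimal"
    (hypothetical labels used only in this sketch).
    """
    rng = np.random.default_rng(seed)

    # Build a finite population: one normal sample per stratum.
    strata = [rng.normal(m, s, size) for m, s, size in zip(means, stds, sizes)]
    population = np.hstack(strata)
    weights = np.array(sizes) / sum(sizes)
    sigmas = np.array([s.std() for s in strata])

    # Decide how many elements to draw from each stratum.
    if method == "random":
        alloc = None
    elif method == "proportional":
        alloc = np.floor(n_sub * weights).astype(int)
    elif method == "optimal":
        # Neyman allocation: proportional to w_k * sigma_k
        alloc = np.floor(n_sub * weights * sigmas / (weights * sigmas).sum()).astype(int)
    else:
        raise ValueError(f"Unknown method: {method}")
    if alloc is not None and (alloc == 0).any():
        raise ValueError("A stratum received zero elements; increase n_sub.")

    estimates = []
    for _ in range(n_experiments):
        if alloc is None:
            # Simple random sample (with replacement) from the whole population
            estimates.append(rng.choice(population, size=n_sub).mean())
        else:
            # Sample inside each stratum, then weight stratum means by shares
            strata_means = [rng.choice(s, size=k).mean() for s, k in zip(strata, alloc)]
            estimates.append(float(np.dot(weights, strata_means)))
    return np.array(estimates)


# Illustrative scenario: different means, equal variances.
params = dict(means=[0.0, 5.0, 10.0], stds=[1.0, 1.0, 1.0],
              sizes=[1000, 2000, 3000], n_sub=600)
variances = {m: simulate_sample_means(method=m, **params).var()
             for m in ("random", "proportional", "optimal")}
print(variances)
```

With stratum means this far apart, the spread of the estimates under both stratified schemes should come out well below that of simple random sampling, since the between-strata component of the variance is removed.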


Simulation Results

If we look at a general population in which all strata have the same means and variances, the results of all three methods are expected to be more or less equal.

The case with different means and equal variances gives more interesting results. Using stratification dramatically reduces the variance.

In the case with equal means and different variances, we see a variance reduction with Neyman's method.
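These three outcomes can also be checked against the standard closed-form variances of the sample mean under each scheme. The sketch below is illustrative only: the function name `theoretical_variances`, the scenario means and standard deviations, and the sample size `n=600` are assumptions, while the weights follow from the 1, 2, and 3 million user groups.

```python
import numpy as np


def theoretical_variances(means, stds, weights, n):
    """Variance of the sample-mean estimator under the three sampling schemes."""
    means, stds, weights = map(np.asarray, (means, stds, weights))
    mu = np.sum(weights * means)                    # population mean
    within = np.sum(weights * stds ** 2)            # within-strata variance
    between = np.sum(weights * (means - mu) ** 2)   # between-strata variance
    return {
        "random": (within + between) / n,
        "proportional": within / n,
        "optimal": np.sum(weights * stds) ** 2 / n,  # Neyman allocation
    }


weights = [1 / 6, 2 / 6, 3 / 6]  # shares of the 1, 2 and 3 million user groups

# Hypothetical parameter values for the three cases discussed in the text.
scenarios = {
    "equal means, equal variances": ([10, 10, 10], [2, 2, 2]),
    "different means, equal variances": ([5, 10, 15], [2, 2, 2]),
    "equal means, different variances": ([10, 10, 10], [1, 2, 4]),
}

for name, (means, stds) in scenarios.items():
    print(name, theoretical_variances(means, stds, weights, n=600))
```

Proportional allocation only helps when the stratum means differ, and Neyman allocation only improves on it when the stratum variances differ, which is the pattern described above.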

Conclusion

You can now apply the stratification method to reduce metric noise and speed up an experiment: cluster your audience, then sample randomly inside each cluster with specific weights!