Using the Stratification Method for the Experiment Analysis
@nataliaogneva

by Natalia Ogneva, 8 min read, 2024/04/19

Too Long; Didn't Read

Stratified sampling is a powerful technique for boosting experiment efficiency and metric sensitivity in data analysis. By clustering your audience and splitting it with specific weights, you can run experiments more efficiently, reduce variance, and make the results more reliable.



Any experiment involves a trade-off between fast results and metric sensitivity. If the chosen metric has a wide variance, we must wait a long time to be sure the experiment's results are accurate. Let's consider one method that helps analysts speed up their experiments without losing too much time or metric sensitivity.


Problem Formulation

Suppose we run a standard experiment to test a new ranking algorithm, with session length as the primary metric. Additionally, consider that our audience can be roughly divided into three groups: 1 million teenagers, 2 million users aged 18-45, and 3 million users aged 45 and above. The response to a new ranking algorithm would vary significantly among these audience groups. This wide variation reduces the sensitivity of the metric.


In other words, the population can be divided into three strata, described as follows:


Suppose that every stratum has a normal distribution. The main metric for the whole population is then a mixture of these normal distributions (it is itself normal only when the strata share the same mean and variance).
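One way to write this setup down, in notation that is assumed here rather than taken from the original: stratum k holds N_k of the N users, w_k is its share of the population, and mu_k and sigma_k^2 are the mean and variance of the metric inside it. The weights follow directly from the group sizes above (1, 2 and 3 million out of 6 million); the last line is the law of total variance for the whole population.

```latex
\begin{align*}
w_k &= \frac{N_k}{N}, \qquad
w_1 = \frac{1}{6}, \quad w_2 = \frac{1}{3}, \quad w_3 = \frac{1}{2} \\
Y \mid \text{stratum } k &\sim \mathcal{N}(\mu_k, \sigma_k^2) \\
\mu &= \sum_{k} w_k \mu_k, \qquad
\sigma^2 = \sum_{k} w_k \sigma_k^2 + \sum_{k} w_k (\mu_k - \mu)^2
\end{align*}
```

The second sum in the variance is the part caused purely by differences between the strata means.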

Stratification method

In the classical experiment design, we randomly split all users from the general population without considering the differences between them. Thus, we consider a sample with the following expected value and variance.
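For a simple random sample of n users, the sample mean behaves as sketched below. The symbols are assumptions of this sketch: w_k, mu_k and sigma_k^2 are the population share, mean and variance of stratum k, mu and sigma^2 are the overall mean and variance, and users are treated as independent draws (no finite-population correction).

```latex
\begin{align*}
\mathbb{E}\big[\bar{Y}_{\text{rand}}\big] &= \mu = \sum_{k} w_k \mu_k \\
\operatorname{Var}\big(\bar{Y}_{\text{rand}}\big) &= \frac{\sigma^2}{n}
  = \frac{1}{n}\sum_{k} w_k \sigma_k^2 + \frac{1}{n}\sum_{k} w_k (\mu_k - \mu)^2
\end{align*}
```

The between-strata term is what makes the metric noisy when the groups react very differently.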


Another way is to split users randomly inside every stratum, in proportion to the weight of the stratum in the general population.

In this case, the expected value and variance are the following.
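A sketch of the standard result for proportional allocation, under assumed notation: w_k and sigma_k^2 are the population share and variance of stratum k, mu_k its mean, n_k = n * w_k users are drawn from stratum k, and the estimate is the weighted mean of the per-stratum sample means.

```latex
\begin{align*}
\bar{Y}_{\text{strat}} &= \sum_{k} w_k \bar{Y}_k, \qquad n_k = n\, w_k \\
\mathbb{E}\big[\bar{Y}_{\text{strat}}\big] &= \sum_{k} w_k \mu_k = \mu \\
\operatorname{Var}\big(\bar{Y}_{\text{strat}}\big) &= \sum_{k} w_k^2 \frac{\sigma_k^2}{n_k}
  = \frac{1}{n}\sum_{k} w_k \sigma_k^2
\end{align*}
```

Compared with simple random sampling, the between-strata term (the weighted squared distance of each mu_k from mu) no longer appears.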


The expected value is the same as in the first selection. However, the variance is lower, which guarantees higher metric sensitivity.

Now, let's consider Neyman's method, which suggests splitting users randomly inside every stratum with specific weights.

So, the expected value and variance in this case are equal to the following.
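A sketch of the standard Neyman (optimal) allocation, again under assumed notation: w_k and sigma_k are the population share and standard deviation of stratum k, and the estimate is still the weighted mean of per-stratum sample means. Each stratum receives a sample size proportional to w_k * sigma_k, so noisier strata are sampled more heavily.

```latex
\begin{align*}
n_k &= n\,\frac{w_k \sigma_k}{\sum_{j} w_j \sigma_j} \\
\mathbb{E}\big[\bar{Y}_{\text{opt}}\big] &= \sum_{k} w_k \mu_k = \mu \\
\operatorname{Var}\big(\bar{Y}_{\text{opt}}\big) &= \sum_{k} w_k^2 \frac{\sigma_k^2}{n_k}
  = \frac{1}{n}\Big(\sum_{k} w_k \sigma_k\Big)^2
  \;\le\; \frac{1}{n}\sum_{k} w_k \sigma_k^2
\end{align*}
```

The inequality follows from the Cauchy-Schwarz inequality and becomes an equality only when all sigma_k are equal, in which case Neyman allocation coincides with proportional allocation.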

The expected value is asymptotically equal to the expected value in the first case. However, the variance is much smaller.
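To make the comparison concrete, the short sketch below evaluates the textbook variance of the sample mean under the three designs. The function name `theoretical_variances` and the strata means and standard deviations are hypothetical illustration values, not figures from the article; only the weights 1/6, 1/3 and 1/2 come from the audience sizes in the problem formulation.

```python
import numpy as np


def theoretical_variances(ws, mus, sigmas, n):
    """Variance of the estimated mean under three sampling designs.

    ws, mus, sigmas: per-stratum population shares, means and standard deviations.
    n: total sample size.
    """
    ws = np.asarray(ws, dtype=float)
    mus = np.asarray(mus, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)

    mu = np.sum(ws * mus)                   # overall population mean
    within = np.sum(ws * sigmas ** 2)       # within-strata variance
    between = np.sum(ws * (mus - mu) ** 2)  # between-strata variance

    return {
        "random": (within + between) / n,
        "proportional": within / n,
        "neyman": np.sum(ws * sigmas) ** 2 / n,
    }


# Weights from the 1M / 2M / 3M audience split; means and sigmas are made up.
variances = theoretical_variances(
    ws=[1 / 6, 1 / 3, 1 / 2],
    mus=[15, 20, 30],
    sigmas=[12, 4, 2],
    n=600,
)
for name, value in variances.items():
    print(f"{name:>12}: {value:.4f}")
```

With any inputs, the Neyman value should not exceed the proportional one, which in turn should not exceed the simple random one.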

Empirical Testing

We have shown the efficiency of this method theoretically. Let's simulate samples and test the stratification method empirically.

Let's consider three cases:

  • all strata have equal means and equal variances;
  • all strata have different means and equal variances;
  • all strata have equal means and different variances.

We will apply all three methods in each case and plot a histogram and a boxplot to compare them.

Code preparation

First, let's create a class in Python that simulates our general population consisting of three strata.

```python
import numpy as np
import scipy.stats as st


class GeneralPopulation:

    def __init__(self,
                 means: list[float],
                 stds: list[float],
                 sizes: list[int],
                 random_state: int = 15):
        """
        Initializes our General Population and saves the given distributions

        :param means: List of expectations for normal distributions
        :param stds: List of standard deviations for normal distributions
        :param sizes: How many objects will be in each stratum
        :param random_state: Seed fixing randomness, so that repeated runs
            with the same input parameters produce the same population
        """
        self.strats = [st.norm(mean, std) for mean, std in zip(means, stds)]
        # Store the seed before sampling so that _sample can actually use it
        self.random_state = random_state
        self._sample(sizes)

    def _sample(self, sizes):
        """Creates a general population sample as a mixture of strata

        :param sizes: List with sample sizes of the corresponding normal distributions
        """
        # One seeded generator shared by all strata keeps their draws independent
        rng = np.random.default_rng(self.random_state)
        self.strats_samples = [rv.rvs(size=size, random_state=rng)
                               for rv, size in zip(self.strats, sizes)]
        self.general_samples = np.hstack(self.strats_samples)
        self.N = self.general_samples.shape[0]

        # number of strata
        self.count_strats = len(sizes)

        # ratios for every stratum in GP
        self.ws = [size / self.N for size in sizes]

        # ME and Std for GP
        self.m = np.mean(self.general_samples)
        self.sigma = np.std(self.general_samples)

        # ME and std for all strata
        self.ms = [np.mean(strat_sample) for strat_sample in self.strats_samples]
        self.sigmas = [np.std(strat_sample) for strat_sample in self.strats_samples]
```


Then, let's add functions for the three sampling methods described in the theoretical part.

```python
    # Methods of the GeneralPopulation class

    def random_subsampling(self, size):
        """Creates a random subset of the entire population

        :param size: subsample size
        """
        # np.random.choice samples with replacement by default
        return np.random.choice(self.general_samples, size=size)

    def proportional_subsampling(self, size):
        """Creates a subsample in which each stratum's share matches its population share

        :param size: subsample size
        """
        self.strats_size_proport = [int(np.floor(size * w)) for w in self.ws]

        return [
            np.random.choice(strat_sample, size=strat_size)
            for strat_sample, strat_size in zip(self.strats_samples, self.strats_size_proport)
        ]

    def optimal_subsampling(self, size):
        """Creates a subsample with the optimal (Neyman) number of elements per stratum

        :param size: subsample size
        """
        # Denominator of the Neyman allocation: sum of w_k * sigma_k over all strata
        sum_denom = sum(w * sigma for w, sigma in zip(self.ws, self.sigmas))

        self.strats_size_optimal = [int(np.floor((size * w * sigma) / sum_denom))
                                    for w, sigma in zip(self.ws, self.sigmas)]
        if 0 in self.strats_size_optimal:
            raise ValueError('Strats size is 0, please change variance of smallest strat!')

        return [
            np.random.choice(strat_sample, size=strat_size)
            for strat_sample, strat_size in zip(self.strats_samples, self.strats_size_optimal)
        ]
```
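The difference between the two stratified methods comes down to how many users each stratum receives. The standalone sketch below isolates that step; the helper name `allocate` and the standard deviations are hypothetical, and, as in the methods above, sizes are rounded down, so they may sum to slightly less than the requested total.

```python
import numpy as np


def allocate(n, ws, sigmas=None):
    """Per-stratum sample sizes for a total sample of size n.

    Without sigmas the allocation is proportional to the weights ws.
    With sigmas it is the Neyman allocation, proportional to w_k * sigma_k.
    """
    shares = np.asarray(ws, dtype=float)
    if sigmas is not None:
        shares = shares * np.asarray(sigmas, dtype=float)
    shares = shares / shares.sum()
    # Round down, mirroring np.floor in the class methods
    return np.floor(n * shares).astype(int)


ws = [1 / 6, 1 / 3, 1 / 2]  # 1M, 2M and 3M users out of 6M
sigmas = [12, 4, 2]         # made-up per-stratum standard deviations

proportional_sizes = allocate(600, ws)
neyman_sizes = allocate(600, ws, sigmas)
print("proportional:", proportional_sizes)
print("neyman:      ", neyman_sizes)
```

In this made-up setting the smallest stratum is also the noisiest, so Neyman allocation assigns it a much larger share of the sample than its population weight alone would.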


Also, for the empirical part, we need a function that simulates the experiment process.

```python
    def run_experiments(self, n_sub, subsampling_method, n_experiments=1000):
        """Conducts a series of experiments and returns the estimated means

        :param n_sub: size of sample
        :param subsampling_method: name of the method for creating a subsample
        :param n_experiments: number of experiment runs
        """
        methods = ('random_subsampling', 'proportional_subsampling', 'optimal_subsampling')
        if subsampling_method not in methods:
            raise ValueError(f'Unknown subsampling method: {subsampling_method}')

        means_s = []

        # Fall back to a small subsample when the population itself is small
        if len(self.general_samples) < 100:
            n_sub = 20

        if subsampling_method == 'random_subsampling':
            for _ in range(n_experiments):
                rc = self.random_subsampling(n_sub)
                means_s.append(rc.sum() / len(rc))
        else:
            for _ in range(n_experiments):
                if subsampling_method == 'proportional_subsampling':
                    rc = self.proportional_subsampling(n_sub)
                else:
                    rc = self.optimal_subsampling(n_sub)

                strats_mean = [sum(strat) / len(strat) for strat in rc]

                # Mean for a mixture
                means_s.append(sum(w_k * mean_k for w_k, mean_k in zip(self.ws, strats_mean)))

        return means_s
```


Simulation results

If we look at a general population in which all our strata have the same means and variances, the results of all three methods are expected to be more or less equal.

Different means and equal variances give more exciting results. Using stratification dramatically reduces variance.

In cases with equal means and different variances, we see a variance reduction with Neyman's method.
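The same qualitative picture can be reproduced with a compact standalone simulation. This is a minimal sketch and not the code used for the results above: it draws directly from normal distributions instead of a fixed population, and the strata means, standard deviations, sample size and number of runs are hypothetical values chosen so that both the means and the variances differ between strata.

```python
import numpy as np

rng = np.random.default_rng(15)

ws = np.array([1 / 6, 1 / 3, 1 / 2])  # population shares of the three strata
mus = np.array([15.0, 20.0, 30.0])    # made-up strata means
sigmas = np.array([12.0, 4.0, 2.0])   # made-up strata standard deviations
n, n_experiments = 600, 2000


def random_mean(size):
    """Mean of a simple random sample drawn from the mixture."""
    strata = rng.choice(len(ws), size=size, p=ws)
    return rng.normal(mus[strata], sigmas[strata]).mean()


def stratified_mean(sizes):
    """Weighted mean of per-stratum sample means."""
    strata_means = [rng.normal(m, s, size=k).mean()
                    for m, s, k in zip(mus, sigmas, sizes)]
    return float(np.dot(ws, strata_means))


# Per-stratum sample sizes, rounded down
proportional_sizes = np.floor(n * ws).astype(int)
neyman_sizes = np.floor(n * ws * sigmas / np.sum(ws * sigmas)).astype(int)

# Variance of the estimated mean across repeated experiments
results = {
    "random": np.var([random_mean(n) for _ in range(n_experiments)]),
    "proportional": np.var([stratified_mean(proportional_sizes)
                            for _ in range(n_experiments)]),
    "neyman": np.var([stratified_mean(neyman_sizes)
                      for _ in range(n_experiments)]),
}
for name, value in results.items():
    print(f"{name:>12}: variance of the estimated mean = {value:.4f}")
```

Setting all means equal, or all standard deviations equal, in this sketch corresponds to the other cases discussed above.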

Conclusion

Now you can apply the stratification method to reduce metric variance and speed up your experiment: cluster your audience, then split users randomly inside each cluster with specific weights.