
Using the Stratification Method for Experiment Analysis

by Natalia Ogneva | 8 min read | 2024/04/19

Too Long; Didn't Read

Stratified sampling is a powerful technique for boosting experiment efficiency and metric sensitivity in data analysis. By clustering your audience and sampling each cluster with specific weights, you can speed up experiments, reduce variance, and make results more reliable.



Any experiment involves a trade-off between fast results and metric sensitivity. If the chosen metric has a high variance, we must wait a long time to be sure the experiment's results are accurate. Let's consider one method that helps analysts speed up their experiments without losing too much time or metric sensitivity.


Problem Formulation

Suppose we run a standard experiment to test a new ranking algorithm, with session length as the primary metric. Suppose also that our audience can be roughly split into three groups: 1 million teenagers, 2 million users aged 18-45, and 3 million users aged 45 and above. The response to a new ranking algorithm would vary significantly among these groups. This wide variation reduces the sensitivity of the metric.


In other words, the population can be divided into three strata, one per age group.


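Let's say the metric is normally distributed within each stratum. The metric over the whole population is then a mixture of these normal distributions, which is itself normal only when the strata share the same mean and variance.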
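In symbols, using notation assumed here rather than taken from the article ($w_k$ for the population share of stratum $k$, $\mu_k$ and $\sigma_k^2$ for its mean and variance), the setup and the resulting population mean and variance can be sketched as:

```latex
w = \left(\tfrac{1}{6}, \tfrac{2}{6}, \tfrac{3}{6}\right), \qquad
Y \mid \text{stratum } k \sim \mathcal{N}(\mu_k, \sigma_k^2), \quad k = 1, 2, 3

\mu = \sum_{k} w_k \mu_k, \qquad
\sigma^2 = \sum_{k} w_k \sigma_k^2 + \sum_{k} w_k (\mu_k - \mu)^2
```

The second term of $\sigma^2$ is the spread between strata means. It is the part of the variance that stratification can remove.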

Stratification Method

In a classical experiment design, we split all users from the population at random, without considering the differences between them. The sample mean is then an unbiased estimate of the population mean, and its variance includes both the spread within each stratum and the spread between strata means.
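Written out, under the assumptions of $n$ independent draws and no finite-population correction, the textbook result for simple random sampling is as follows. The notation is assumed here, not taken from the article: $w_k$, $\mu_k$, $\sigma_k^2$ are the share, mean, and variance of stratum $k$, and $\mu$ is the population mean.

```latex
\bar{Y}_{\text{srs}} = \frac{1}{n} \sum_{i=1}^{n} Y_i, \qquad
\mathbb{E}\left[\bar{Y}_{\text{srs}}\right] = \mu = \sum_{k} w_k \mu_k

\operatorname{Var}\left(\bar{Y}_{\text{srs}}\right) = \frac{\sigma^2}{n}
= \frac{1}{n} \sum_{k} w_k \sigma_k^2 + \frac{1}{n} \sum_{k} w_k (\mu_k - \mu)^2
```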


Another way is to sample at random inside every stratum, giving each stratum a share of the sample equal to its weight in the general population.

In this case, the expected value and variance are as follows.
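For proportional allocation, the standard result can be sketched like this. The notation is again assumed rather than taken from the article: $n_k$ is the number of users sampled from stratum $k$, $\bar{Y}_k$ is the sample mean inside that stratum, and $w_k$, $\mu_k$, $\sigma_k^2$ are its population share, mean, and variance.

```latex
n_k = n w_k, \qquad
\bar{Y}_{\text{strat}} = \sum_{k} w_k \bar{Y}_k, \qquad
\mathbb{E}\left[\bar{Y}_{\text{strat}}\right] = \sum_{k} w_k \mu_k = \mu

\operatorname{Var}\left(\bar{Y}_{\text{strat}}\right)
= \sum_{k} w_k^2 \frac{\sigma_k^2}{n_k}
= \frac{1}{n} \sum_{k} w_k \sigma_k^2
```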


The expected value is the same as with simple random selection. However, the variance is smaller whenever the strata means differ, because the between-strata component drops out. This guarantees higher metric sensitivity.

Now, let's consider Neyman's method. It suggests sampling users at random inside every stratum with specific weights: each stratum's share of the sample is proportional to its population weight multiplied by its standard deviation.
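As a rough illustration of that allocation rule, here is a minimal sketch. The function name `neyman_allocation` and the standard deviations are hypothetical and chosen only for the example; the weights match the three age groups above. Sizes are rounded down, as in the article's own code further below.

```python
def neyman_allocation(n, weights, sigmas):
    """Split a total sample size n across strata in proportion to w_k * sigma_k."""
    denom = sum(w * s for w, s in zip(weights, sigmas))
    # int() rounds down, so the sizes can add up to slightly less than n
    return [int(n * w * s / denom) for w, s in zip(weights, sigmas)]


weights = [1 / 6, 2 / 6, 3 / 6]  # shares of the three age groups
sigmas = [1.0, 2.0, 4.0]         # hypothetical per-stratum standard deviations
allocation = neyman_allocation(1200, weights, sigmas)
print(allocation)
```

Strata that are both large and noisy receive most of the sample, while small, homogeneous strata receive very little.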

So, the expected value and variance in this case are as follows.
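For Neyman (optimal) allocation, the standard result can be sketched as below. The notation is assumed rather than taken from the article: $w_k$, $\mu_k$, $\sigma_k$ are the population share, mean, and standard deviation of stratum $k$, $n_k$ is its sample size, and rounding of $n_k$ to whole users is ignored.

```latex
n_k = n \, \frac{w_k \sigma_k}{\sum_{j} w_j \sigma_j}, \qquad
\mathbb{E}\left[\bar{Y}_{\text{opt}}\right] = \sum_{k} w_k \mu_k = \mu

\operatorname{Var}\left(\bar{Y}_{\text{opt}}\right)
= \sum_{k} w_k^2 \frac{\sigma_k^2}{n_k}
= \frac{1}{n} \left( \sum_{k} w_k \sigma_k \right)^2
```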

The expected value is asymptotically equal to the expected value in the first case. However, the variance is smaller still; it matches the proportional case only when all strata have the same standard deviation.
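To make the comparison concrete, the sketch below evaluates the three textbook variance formulas (simple random, proportional, Neyman) for a few made-up parameter sets. The function name, means, standard deviations, and sample size are all hypothetical; only the strata weights come from the example above.

```python
def sample_mean_variances(weights, mus, sigmas, n):
    """Return (simple random, proportional, Neyman) variances of the estimated mean."""
    mu = sum(w * m for w, m in zip(weights, mus))
    within = sum(w * s ** 2 for w, s in zip(weights, sigmas))       # within-strata part
    between = sum(w * (m - mu) ** 2 for w, m in zip(weights, mus))  # between-strata part
    mean_sigma = sum(w * s for w, s in zip(weights, sigmas))
    return (within + between) / n, within / n, mean_sigma ** 2 / n


weights = [1 / 6, 2 / 6, 3 / 6]
n = 600  # hypothetical total sample size

cases = {
    "different means, equal variances": ([10, 20, 30], [5, 5, 5]),
    "equal means, different variances": ([20, 20, 20], [2, 5, 10]),
}
for label, (mus, sigmas) in cases.items():
    srs, prop, neyman = sample_mean_variances(weights, mus, sigmas, n)
    print(f"{label}: random={srs:.4f} proportional={prop:.4f} neyman={neyman:.4f}")
```

With different means, both stratified designs beat simple random sampling by the same margin. With equal means and different variances, only the Neyman allocation improves on it.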

Empirical Testing

We have shown the efficiency of this method theoretically. Let's simulate samples and test the stratification method empirically.

Let's consider three cases:

  • all strata have equal means and equal variances,
  • all strata have different means and equal variances,
  • all strata have equal means and different variances.

We will apply all three methods in every case and plot a histogram and a box plot to compare them.

Code Preparation

First, let's create a Python class that simulates our general population, consisting of three strata.

    import numpy as np
    import scipy.stats as st


    class GeneralPopulation:

        def __init__(self,
                     means: list[float],
                     stds: list[float],
                     sizes: list[int],
                     random_state: int = 15):
            """
            Initializes our General Population and saves the given distributions

            :param means: List of expectations for normal distributions
            :param stds: List of standard deviations for normal distributions
            :param sizes: How many objects will be in each stratum
            :param random_state: Seed fixing randomness, so that repeated
                experiments with the same input parameters give the same results
            """
            self.strats = [st.norm(mean, std) for mean, std in zip(means, stds)]
            # Seed NumPy's global generator before sampling so random_state takes
            # effect for the population draw and for the later np.random.choice calls
            self.random_state = random_state
            np.random.seed(random_state)
            self._sample(sizes)

        def _sample(self, sizes):
            """Creates a general population sample as a mixture of strata

            :param sizes: List with sample sizes of the corresponding normal distributions
            """
            self.strats_samples = [rv.rvs(size) for rv, size in zip(self.strats, sizes)]
            self.general_samples = np.hstack(self.strats_samples)
            self.N = self.general_samples.shape[0]

            # number of strata
            self.count_strats = len(sizes)

            # ratios for every stratum in GP
            self.ws = [size / self.N for size in sizes]

            # ME and Std for GP
            self.m = np.mean(self.general_samples)
            self.sigma = np.std(self.general_samples)

            # ME and std for all strata
            self.ms = [np.mean(strat_sample) for strat_sample in self.strats_samples]
            self.sigmas = [np.std(strat_sample) for strat_sample in self.strats_samples]


Next, let's add functions for the three sampling methods described in the theoretical part.

        # Methods of GeneralPopulation (continued)

        def random_subsampling(self, size):
            """Creates a random subset of the entire population

            :param size: subsample size
            """
            # np.random.choice samples with replacement by default
            return np.random.choice(self.general_samples, size=size)

        def proportional_subsampling(self, size):
            """Creates a subsample in which each stratum's number of elements
            is proportional to its share of the general population

            :param size: subsample size
            """
            self.strats_size_proport = [int(np.floor(size * w)) for w in self.ws]

            return [np.random.choice(strat_sample, size=strat_size)
                    for strat_sample, strat_size
                    in zip(self.strats_samples, self.strats_size_proport)]

        def optimal_subsampling(self, size):
            """Creates a subsample with the optimal (Neyman) number of elements per stratum

            :param size: subsample size
            """
            # Each stratum's share is proportional to weight * standard deviation
            sum_denom = sum(w * sigma for w, sigma in zip(self.ws, self.sigmas))

            self.strats_size_optimal = [int(np.floor((size * w * sigma) / sum_denom))
                                        for w, sigma in zip(self.ws, self.sigmas)]
            if 0 in self.strats_size_optimal:
                raise ValueError('Strats size is 0, please change variance of smallest strat!')

            return [np.random.choice(strat_sample, size=strat_size)
                    for strat_sample, strat_size
                    in zip(self.strats_samples, self.strats_size_optimal)]


For the empirical part, we also need a function that simulates the experiment process.

        # Method of GeneralPopulation (continued)

        def run_experiments(self, n_sub, subsampling_method, n_experiments=1000):
            """Conducts a series of experiments and saves the results

            :param n_sub: size of sample
            :param subsampling_method: name of the method for creating a subsample
            :param n_experiments: number of experiment starts
            """
            means_s = []

            # Use a small fixed sample size for tiny populations
            if len(self.general_samples) < 100:
                n_sub = 20

            if subsampling_method == 'random_subsampling':
                for _ in range(n_experiments):
                    rc = self.random_subsampling(n_sub)
                    means_s.append(rc.sum() / len(rc))
            elif subsampling_method in ('proportional_subsampling', 'optimal_subsampling'):
                sampler = getattr(self, subsampling_method)
                for _ in range(n_experiments):
                    rc = sampler(n_sub)
                    strats_mean = [sum(strat) / len(strat) for strat in rc]
                    # Mean for a mixture: stratum means weighted by population shares
                    means_s.append(sum(w_k * mean_k
                                       for w_k, mean_k in zip(self.ws, strats_mean)))
            else:
                # Fail loudly instead of hitting an undefined variable
                raise ValueError(f'Unknown subsampling method: {subsampling_method}')

            return means_s


Simulation Results

For a general population in which all strata have the same means and variances, the results of all three methods are expected to be more or less equal.

Different means with equal variances give more interesting results. Using stratification dramatically reduces the variance.

In the case of equal means and different variances, we see a variance reduction with Neyman's method.
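The same pattern can be checked with a small standalone simulation. This is a simplified sketch, not the article's experiment: it uses only the Python standard library, draws directly from each stratum's normal distribution instead of from a finite generated population, and every parameter value (means, standard deviations, sample size, seed) and name (`simulate`, `results_by_case`) is hypothetical.

```python
import random
import statistics


def simulate(weights, mus, sigmas, n=300, n_experiments=1000, seed=15):
    """Return the empirical variance of the estimated mean for each sampling design."""
    rng = random.Random(seed)
    strata = range(len(weights))
    denom = sum(w * s for w, s in zip(weights, sigmas))
    allocations = {
        "proportional": [max(1, int(n * w)) for w in weights],
        "neyman": [max(1, int(n * w * s / denom)) for w, s in zip(weights, sigmas)],
    }

    estimates = {"random": [], "proportional": [], "neyman": []}
    for _ in range(n_experiments):
        # Simple random sampling: each draw lands in stratum k with probability w_k
        picks = rng.choices(strata, weights=weights, k=n)
        draws = [rng.gauss(mus[k], sigmas[k]) for k in picks]
        estimates["random"].append(statistics.fmean(draws))

        # Stratified designs: fixed per-stratum sizes, weighted mean of stratum means
        for name, sizes in allocations.items():
            stratum_means = [
                statistics.fmean([rng.gauss(mus[k], sigmas[k]) for _ in range(sizes[k])])
                for k in strata
            ]
            estimates[name].append(sum(w * m for w, m in zip(weights, stratum_means)))

    return {name: statistics.variance(values) for name, values in estimates.items()}


weights = [1 / 6, 2 / 6, 3 / 6]
cases = {
    "equal means, equal variances": ([20, 20, 20], [5, 5, 5]),
    "different means, equal variances": ([10, 20, 30], [5, 5, 5]),
    "equal means, different variances": ([20, 20, 20], [1, 5, 20]),
}
results_by_case = {
    label: simulate(weights, mus, sigmas) for label, (mus, sigmas) in cases.items()
}
for label, result in results_by_case.items():
    print(label, {name: round(value, 4) for name, value in result.items()})
```

Feeding the per-design estimates into a histogram or box plot instead of `statistics.variance` would give the kind of visual comparison described above.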

Conclusion

Now you can apply the stratification method to reduce metric variance and speed up your experiments: cluster your audience, then sample users at random inside each cluster with specific weights!