
Using the Stratification Method for Experiment Analysis

by Natalia Ogneva, 8m, 2024/04/19

Too Long; Didn't Read

Stratified sampling is a powerful technique for improving experiment efficiency and metric sensitivity in data analysis. By clustering your audience and splitting it with specific weights, you can optimize experiments, reduce variance, and make results more reliable.



Any experiment involves a trade-off between fast results and metric sensitivity. If the chosen metric has high variance, we must wait a long time before we can be confident that the experiment's results are accurate. Let's consider one method that helps analysts speed up their experiments without losing too much time or metric sensitivity.


Problem Formulation

Suppose we run a standard experiment to test a new ranking algorithm, with session length as the primary metric. Suppose also that our audience can be roughly divided into three groups: 1 million teenagers, 2 million users aged 18-45, and 3 million users aged 45 and above. The response to a new ranking algorithm will vary significantly between these audience groups. This wide variation reduces the sensitivity of the metric.


In other words, the population can be divided into three strata, described in the following:
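One way to write this down uses standard stratified-sampling notation. The symbols $N_k$, $w_k$, $\mu_k$ and $\sigma_k^2$ are assumptions chosen for illustration and are not necessarily the author's notation: $N_k$ is the size of stratum $k$, $w_k$ its share of the population, and $\mu_k$, $\sigma_k^2$ the mean and variance of the metric inside it.

```latex
N = N_1 + N_2 + N_3, \qquad
w_k = \frac{N_k}{N}, \qquad
\sum_{k=1}^{3} w_k = 1, \qquad
(w_1, w_2, w_3) = \left(\tfrac{1}{6}, \tfrac{2}{6}, \tfrac{3}{6}\right)
```

The weights $1/6$, $2/6$ and $3/6$ follow directly from the 1, 2 and 3 million users in the example.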


Suppose every stratum is normally distributed. The primary metric for the whole population is then a mixture of these normal distributions.

Stratification Method

In a classical experiment design, we randomly split all users from the population without considering the differences between them. Thus, we consider a sample with the following expected value and variance.
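Written out in standard notation, the sample mean under simple random sampling behaves as follows. This is a sketch under stated assumptions: $w_k$, $\mu_k$ and $\sigma_k^2$ denote the weight, mean and variance of stratum $k$, $n$ is the sample size, and users are drawn independently, so no finite-population correction is applied.

```latex
\bar{Y}_{rand} = \frac{1}{n}\sum_{i=1}^{n} Y_i, \qquad
\mathbb{E}\left[\bar{Y}_{rand}\right] = \mu = \sum_{k} w_k \mu_k, \qquad
\operatorname{Var}\left(\bar{Y}_{rand}\right) = \frac{\sigma^2}{n}
  = \frac{1}{n}\sum_{k} w_k \sigma_k^2 + \frac{1}{n}\sum_{k} w_k \left(\mu_k - \mu\right)^2
```

The second term of the variance is the spread of the stratum means around the overall mean; it is the part that stratification can remove.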


Another way is to split users randomly inside every stratum, in proportion to the weight of the stratum in the general population.

In this case, the expected value and variance are the following.
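For proportional allocation, a sketch in the same assumed notation ($w_k$, $\mu_k$, $\sigma_k^2$ for the weight, mean and variance of stratum $k$; $n_k$ for the number of users sampled from it; $\bar{Y}_k$ for its sample mean) looks like this:

```latex
n_k = n\,w_k, \qquad
\bar{Y}_{strat} = \sum_{k} w_k \bar{Y}_k, \qquad
\mathbb{E}\left[\bar{Y}_{strat}\right] = \sum_{k} w_k \mu_k = \mu, \qquad
\operatorname{Var}\left(\bar{Y}_{strat}\right) = \sum_{k} \frac{w_k^2 \sigma_k^2}{n_k}
  = \frac{1}{n}\sum_{k} w_k \sigma_k^2
```

Only the within-stratum variances remain; differences between the stratum means no longer contribute.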


The expected value is the same as in the first option. However, the variance is smaller, which guarantees higher metric sensitivity.

Now, let's consider Neyman's method. It suggests splitting users randomly inside every stratum with specific weights that also take the variability of each stratum into account.

So, in this case the expected value and variance are equal to the following.
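A sketch of Neyman (optimal) allocation in standard notation, where $w_k$ and $\sigma_k$ are the assumed symbols for the weight and standard deviation of stratum $k$, $n_k$ is the number of users sampled from it, and $\bar{Y}_k$ is its sample mean:

```latex
n_k = n\,\frac{w_k \sigma_k}{\sum_{j} w_j \sigma_j}, \qquad
\bar{Y}_{Neyman} = \sum_{k} w_k \bar{Y}_k, \qquad
\operatorname{Var}\left(\bar{Y}_{Neyman}\right) = \sum_{k} \frac{w_k^2 \sigma_k^2}{n_k}
  = \frac{1}{n}\left(\sum_{k} w_k \sigma_k\right)^2
  \le \frac{1}{n}\sum_{k} w_k \sigma_k^2
```

The final inequality (Jensen's inequality applied to the square) shows that this variance is never larger than the one obtained with proportional allocation, with equality when all $\sigma_k$ are the same.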

The expected value is asymptotically equal to the expected value in the first case. However, the variance is much smaller.
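To make the comparison concrete, the three textbook variance formulas can be evaluated side by side. This is a minimal sketch: the function name `theoretical_variances` and the stratum means and standard deviations are hypothetical values chosen for illustration; only the 1:2:3 weights come from the example audience.

```python
import numpy as np


def theoretical_variances(ws, mus, sigmas, n):
    """Variance of the estimated mean under three sampling schemes.

    ws, mus, sigmas: per-stratum weights, means and standard deviations.
    n: total sample size (sampling with replacement is assumed).
    """
    ws = np.asarray(ws, dtype=float)
    mus = np.asarray(mus, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)

    mu = np.sum(ws * mus)                   # overall mean
    within = np.sum(ws * sigmas ** 2)       # within-strata variance
    between = np.sum(ws * (mus - mu) ** 2)  # between-strata variance

    return {
        "random": (within + between) / n,
        "proportional": within / n,
        "neyman": np.sum(ws * sigmas) ** 2 / n,
    }


# Hypothetical means and standard deviations, for illustration only
variances = theoretical_variances(
    ws=[1 / 6, 2 / 6, 3 / 6],
    mus=[10.0, 20.0, 30.0],
    sigmas=[2.0, 4.0, 8.0],
    n=600,
)
for method, value in variances.items():
    print(f"{method:>12}: {value:.4f}")
```

With these inputs the ordering is Neyman, then proportional, then random, from smallest to largest variance, which matches the theoretical argument.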

Empirical Testing

We have shown the efficiency of this method theoretically. Let's simulate samples and test the stratification method empirically.

Let's consider three cases:

  • all strata with equal means and variances,
  • all strata with different means and equal variances,
  • all strata with equal means and different variances.
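The three cases can be captured as a small configuration. The numbers below are hypothetical placeholders, not the values used in the article; the stratum sizes are a scaled-down stand-in that keeps the 1:2:3 proportions of the example audience.

```python
# Scaled-down stratum sizes that preserve the 1:2:3 proportions (assumption)
sizes = [1_000, 2_000, 3_000]
weights = [size / sum(sizes) for size in sizes]

# Hypothetical means and standard deviations for each case
scenarios = {
    "equal means, equal variances": {
        "means": [15.0, 15.0, 15.0],
        "stds": [2.0, 2.0, 2.0],
    },
    "different means, equal variances": {
        "means": [15.0, 20.0, 25.0],
        "stds": [2.0, 2.0, 2.0],
    },
    "equal means, different variances": {
        "means": [15.0, 15.0, 15.0],
        "stds": [2.0, 4.0, 6.0],
    },
}

for name, params in scenarios.items():
    print(name, params, "weights:", [round(w, 3) for w in weights])
```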

We will apply all three methods in every case and plot a histogram and a boxplot to compare them.
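A plotting helper for this comparison could look like the sketch below. The function name `plot_comparison` and the demo data are hypothetical; the helper only assumes that each method yields a list of estimated means, one per simulated experiment.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); remove to view plots interactively
import matplotlib.pyplot as plt
import numpy as np


def plot_comparison(results, title=""):
    """Draw a histogram and a boxplot of the estimated means per method.

    results: dict mapping a method name to a sequence of estimated means.
    """
    names = list(results)
    fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(12, 4))

    # Overlaid histograms: a narrower histogram means a lower-variance estimate
    for name in names:
        ax_hist.hist(results[name], bins=40, alpha=0.5, label=name)
    ax_hist.set_xlabel("estimated mean")
    ax_hist.set_ylabel("number of experiments")
    ax_hist.legend()

    # One box per method, in the same order as the histograms
    ax_box.boxplot([np.asarray(results[name]) for name in names])
    ax_box.set_xticks(list(range(1, len(names) + 1)))
    ax_box.set_xticklabels(names)
    ax_box.set_ylabel("estimated mean")

    fig.suptitle(title)
    return fig


# Synthetic demo data, only to exercise the helper
demo_rng = np.random.default_rng(0)
demo = {
    "random": demo_rng.normal(20.0, 0.40, size=1000),
    "proportional": demo_rng.normal(20.0, 0.25, size=1000),
    "neyman": demo_rng.normal(20.0, 0.20, size=1000),
}
fig = plot_comparison(demo, title="Demo with synthetic data")
fig.savefig("comparison.png")
```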

Code Preparation

First, let's create a Python class that simulates our general population, consisting of three strata.

    import numpy as np
    import scipy.stats as st


    class GeneralPopulation:

        def __init__(self,
                     means: list[float],
                     stds: list[float],
                     sizes: list[int],
                     random_state: int = 15):
            """
            Initializes our general population and saves the given distributions.

            :param means: List of expectations for normal distributions
            :param stds: List of standard deviations for normal distributions
            :param sizes: How many objects will be in each stratum
            :param random_state: Seed fixing randomness, so that repeated runs with
                the same input parameters generate the same population
            """
            # Create the seeded generator before sampling so the seed takes effect
            self.random_state = random_state
            self.rng = np.random.default_rng(random_state)
            self.strats = [st.norm(mean, std) for mean, std in zip(means, stds)]
            self._sample(sizes)

        def _sample(self, sizes):
            """Creates a general population sample as a mixture of strata.

            :param sizes: List with sample sizes of the corresponding normal distributions
            """
            self.strats_samples = [rv.rvs(size=size, random_state=self.rng)
                                   for rv, size in zip(self.strats, sizes)]
            self.general_samples = np.hstack(self.strats_samples)
            self.N = self.general_samples.shape[0]

            # Number of strata
            self.count_strats = len(sizes)

            # Share of every stratum in the general population
            self.ws = [size / self.N for size in sizes]

            # Mean and standard deviation of the general population
            self.m = np.mean(self.general_samples)
            self.sigma = np.std(self.general_samples)

            # Mean and standard deviation of every stratum
            self.ms = [np.mean(strat_sample) for strat_sample in self.strats_samples]
            self.sigmas = [np.std(strat_sample) for strat_sample in self.strats_samples]


Then, let's add functions for the three sampling methods described in the theoretical part.

        # Methods of the GeneralPopulation class

        def random_subsampling(self, size):
            """Creates a random subset of the entire population.

            :param size: subsample size
            """
            rc = np.random.choice(self.general_samples, size=size)
            return rc

        def proportional_subsampling(self, size):
            """Creates a subsample in which each stratum's size is proportional
            to its share of the population.

            :param size: subsample size
            """
            self.strats_size_proport = [int(np.floor(size * w)) for w in self.ws]

            rc = []
            for k in range(len(self.strats_size_proport)):
                rc.append(np.random.choice(self.strats_samples[k],
                                           size=self.strats_size_proport[k]))
            return rc

        def optimal_subsampling(self, size):
            """Creates a subsample with the optimal (Neyman) number of elements per stratum.

            :param size: subsample size
            """
            # Denominator of the Neyman allocation: sum of w_k * sigma_k
            sum_denom = 0
            for k in range(self.count_strats):
                sum_denom += self.ws[k] * self.sigmas[k]

            self.strats_size_optimal = [int(np.floor((size * w * sigma) / sum_denom))
                                        for w, sigma in zip(self.ws, self.sigmas)]
            if 0 in self.strats_size_optimal:
                raise ValueError('Strats size is 0, please change variance of smallest strat!')

            rc = []
            for k in range(len(self.strats_size_optimal)):
                rc.append(np.random.choice(self.strats_samples[k],
                                           size=self.strats_size_optimal[k]))
            return rc
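The allocation step is the only place where the two stratified methods differ, so it helps to look at it in isolation. The standalone sketch below uses a hypothetical helper, `allocation_sizes`, with made-up standard deviations; it mirrors the floor-based rounding used in the class methods.

```python
import numpy as np


def allocation_sizes(ws, sigmas, size):
    """Return per-stratum sample sizes for proportional and Neyman allocation."""
    ws = np.asarray(ws, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)

    # Proportional: n_k = floor(n * w_k)
    proportional = np.floor(size * ws).astype(int)

    # Neyman: n_k = floor(n * w_k * sigma_k / sum_j(w_j * sigma_j))
    neyman = np.floor(size * ws * sigmas / np.sum(ws * sigmas)).astype(int)

    return proportional, neyman


# Weights from the example audience; standard deviations are hypothetical
proportional, neyman = allocation_sizes(
    ws=[1 / 6, 2 / 6, 3 / 6],
    sigmas=[2.0, 4.0, 6.0],
    size=600,
)
print("proportional:", proportional, "total:", proportional.sum())
print("neyman:      ", neyman, "total:", neyman.sum())
```

Neyman allocation shifts observations toward the noisier strata. Because both schemes round down, the stratum sizes can add up to slightly less than the requested sample size.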


Also, for the empirical part, we need a function that simulates the experiment process.

        # Method of the GeneralPopulation class

        def run_experiments(self, n_sub, subsampling_method, n_experiments=1000):
            """Conducts a series of experiments and saves the results.

            :param n_sub: size of sample
            :param subsampling_method: method for creating a subsample
            :param n_experiments: number of experiment starts
            """
            means_s = []

            # For very small populations, fall back to a small sample size
            if len(self.general_samples) < 100:
                n_sub = 20

            if subsampling_method == 'random_subsampling':
                for _ in range(n_experiments):
                    rc = self.random_subsampling(n_sub)
                    means_s.append(np.mean(rc))
            elif subsampling_method in ('proportional_subsampling', 'optimal_subsampling'):
                sampler = getattr(self, subsampling_method)
                for _ in range(n_experiments):
                    rc = sampler(n_sub)
                    strats_mean = [np.mean(strat) for strat in rc]
                    # Mean for a mixture: weight each stratum mean by its population share
                    means_s.append(sum(w_k * mean_k
                                       for w_k, mean_k in zip(self.ws, strats_mean)))
            else:
                # Previously an unknown method name failed with an unrelated error
                raise ValueError(f'Unknown subsampling method: {subsampling_method}')

            return means_s


Simulation Results

If we look at a general population in which all strata have the same means and variances, the results of all three methods are expected to be more or less equal.

Different means with equal variances give more exciting results. Using stratification dramatically reduces variance.

In the case with equal means and different variances, we see a variance reduction with Neyman's method.
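These three observations can be checked with a compact standalone simulation. It is a sketch and not the article's exact procedure: it draws directly from the normal distributions instead of subsampling a finite simulated population, and the function name `simulate_variances`, the means, the standard deviations and the sample size are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the sketch is repeatable


def simulate_variances(means, stds, ws, n=600, n_experiments=2000):
    """Empirical variance of the estimated mean for each sampling method."""
    means, stds, ws = (np.asarray(x, dtype=float) for x in (means, stds, ws))
    ws = ws / ws.sum()
    sizes = {
        "proportional": np.floor(n * ws).astype(int),
        "neyman": np.floor(n * ws * stds / np.sum(ws * stds)).astype(int),
    }
    estimates = {"random": [], "proportional": [], "neyman": []}

    for _ in range(n_experiments):
        # Simple random sampling: each user falls into stratum k with probability w_k
        strata = rng.choice(len(ws), size=n, p=ws)
        estimates["random"].append(rng.normal(means[strata], stds[strata]).mean())

        # Stratified sampling: fixed n_k per stratum, means combined with weights w_k
        for name, n_k in sizes.items():
            strat_means = [rng.normal(m, s, size=k).mean()
                           for m, s, k in zip(means, stds, n_k)]
            estimates[name].append(float(np.dot(ws, strat_means)))

    return {name: float(np.var(vals)) for name, vals in estimates.items()}


ws = [1 / 6, 2 / 6, 3 / 6]
cases = {
    "equal means, equal variances": ([15, 15, 15], [2, 2, 2]),
    "different means, equal variances": ([15, 20, 25], [2, 2, 2]),
    "equal means, different variances": ([15, 15, 15], [2, 4, 6]),
}
results = {name: simulate_variances(m, s, ws) for name, (m, s) in cases.items()}
for name, variances in results.items():
    print(name, {k: round(v, 5) for k, v in variances.items()})
```

The gap between proportional and Neyman allocation in the third case is small relative to simulation noise, so it may take more experiments than the default to show up clearly.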

Conclusion

Now you can apply the stratification method to reduce metric variance and speed up your experiment: cluster your audience, then split users randomly inside each cluster with specific weights!