Table of Links

Introduction

Hypothesis testing
2.1 Introduction
2.2 Bayesian statistics
2.3 Test martingales
2.4 p-values
2.5 Optional Stopping and Peeking
2.6 Combining p-values and Optional Continuation
2.7 A/B testing

Safe Tests
3.1 Introduction
3.2 Classical t-test
3.3 Safe t-test
3.4 χ2-test
3.5 Safe Proportion Test

Safe Testing Simulations
4.1 Introduction
4.2 Python Implementation
4.3 Comparing the t-test with the Safe t-test
4.4 Comparing the χ2-test with the safe proportion test

Mixture sequential probability ratio test
5.1 Sequential Testing
5.2 Mixture SPRT
5.3 mSPRT and the safe t-test

Online Controlled Experiments
6.1 Safe t-test on OCE datasets

Vinted A/B tests
7.1 Safe t-test for Vinted A/B tests
7.2 Safe proportion test for sample ratio mismatch

Conclusion and References

7 Vinted A/B Tests

Vinted is an online marketplace for second-hand clothes and accessories. Since its founding in 2008, Vinted has grown to more than 75 million users and has become Europe's largest second-hand clothes marketplace. A user base of this size brings with it a large number of A/B tests, run to give users the best possible experience. This makes Vinted a strong candidate for evaluating the performance of safe testing. In this section, we apply the safe t-test and the safe proportion test to Vinted's experimental data. The safe t-test is compared with the classical t-test for evaluating A/B test results. Furthermore ...

7.1 Safe t-test for Vinted A/B tests

Metrics from 162 Vinted experiments, run from March 2023 to June 2023, are available for this analysis. We collect snapshots of 143 metrics, comprising the metric mean, the standard deviation, and the sample size of both the control and test groups. Experiments with multiple variants are treated as separate tests against the control group. The safe t-test and the classical t-test are compared on all 42115 experiment/metric combinations in this dataset.
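Because each snapshot stores only the mean, standard deviation, and sample size per group, the classical test can be computed directly from those summaries. The following is a minimal sketch of that idea, not the author's implementation: the function name `welch_t_from_summary` and the example numbers are hypothetical, the unequal-variance (Welch) form is an assumption, and the p-value uses a normal approximation to the t distribution, which is only reasonable for the large sample sizes typical of A/B tests. It illustrates the classical t-test only, not the safe t-test.

```python
import math
from statistics import NormalDist


def welch_t_from_summary(mean_c, sd_c, n_c, mean_t, sd_t, n_t):
    """Classical two-sample t statistic from summary statistics.

    Returns (t statistic, approximate two-sided p-value). The p-value
    uses a standard normal approximation, an assumption that is only
    reasonable when both groups are large.
    """
    # Standard error of the difference in means (unequal variances).
    se = math.sqrt(sd_c ** 2 / n_c + sd_t ** 2 / n_t)
    t_stat = (mean_t - mean_c) / se
    # Two-sided tail probability under the normal approximation.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))
    return t_stat, p_value


# Hypothetical snapshot values, for illustration only.
t_stat, p_value = welch_t_from_summary(
    mean_c=10.0, sd_c=4.0, n_c=50_000,
    mean_t=10.1, sd_t=4.0, n_t=50_000,
)
reject_h0 = p_value < 0.05  # significance level used in Section 7.1
```

For small samples, the normal approximation should be replaced by the t distribution with Welch-Satterthwaite degrees of freedom.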
Table 7 shows the results of the statistical tests at level α = 0.05. These results show that the safe test and the classical test generally agree on the significance of metrics. In the 379 cases where the safe test rejected H0 and the classical test did not, the metric appears to have reached significance during the experiment before returning to a non-significant result by its end. The larger count of 1645 cases, where the classical test rejected H0 and the safe test did not, is more surprising. The safe test is most powerful when it observes the data sequentially, which gives it more opportunities to reject H0. These snapshots are collected daily, ...

The mixture sequential probability ratio test (mSPRT) is applied to the same set of experiments. The results can be found in Table 8. Comparing the results of Table 8 with those of Table 7 shows that the mSPRT ... While this may be a consequence of the group-sequential setting, our simulation results show that the mSPRT is simply a less statistically powerful test than the safe t-test.

Returning to the safe t-test results, it is apparent that the safe t-test performs considerably better on some metrics than on others. Here, we analyze the metrics to understand why this is the case. To assess the performance of the safe t-test on a metric, we use the phi coefficient to compare its results with those of the classical t-test. The phi coefficient, also known as the Matthews correlation coefficient, measures the association between two binary variables. To understand the use case of each metric, there is a text description of the use case in Vinted's A...

In the introduction to A/B testing, it was noted that some metrics take too long to be realised. As a consequence, the data are not independent and identically distributed across the days of the test.
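The phi coefficient described above can be computed from the paired reject/do-not-reject decisions of the two tests. The sketch below assumes each experiment/metric combination contributes one boolean per test. The function name `phi_coefficient` and the toy decisions are hypothetical and are not taken from Tables 7 to 9.

```python
import math


def phi_coefficient(safe_rejects, classical_rejects):
    """Phi (Matthews correlation) coefficient of two binary sequences."""
    n11 = n10 = n01 = n00 = 0
    for safe, classical in zip(safe_rejects, classical_rejects):
        if safe and classical:
            n11 += 1  # both tests reject H0
        elif safe:
            n10 += 1  # only the safe test rejects
        elif classical:
            n01 += 1  # only the classical test rejects
        else:
            n00 += 1  # neither test rejects
    denom = math.sqrt(
        (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    )
    if denom == 0:
        # A margin is empty, so phi is undefined; returning 0.0 is a
        # convention chosen for this sketch.
        return 0.0
    return (n11 * n00 - n10 * n01) / denom


# Toy decisions for six hypothetical experiment/metric combinations.
safe = [True, True, False, False, False, True]
classical = [True, True, True, False, False, False]
phi = phi_coefficient(safe, classical)
```

A value near 1 means the two tests almost always reach the same decision on a metric, a value near 0 means their decisions are unrelated, and a value near -1 means they systematically disagree.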
Looking at Table 9, we see a high correlation between the results of the safe t-test and the classical t-test on metrics related to searches, sessions, and impressions. These metrics have a short delay between exposure to the test and realisation of the metric. In contrast, the safe t-test does not perform well on metrics related to transactions and order cancellations. Presumably, these ...

7.2 Safe proportion test for sample ratio mismatch

To compare the performance of the safe proportion test and the χ2 test in detecting sample ratio mismatch (SRM), the sample distributions of 195 Vinted experiments are used. The safe test is applied to daily snapshots of the distribution, while the χ2 test is applied to the distribution on the final day of the experiment. For SRM, a significance level of α = 0.01 is used to reduce the number of false positives. Beta prior values of α1, β1 = 1000 are used for the safe proportion test. The comparison of results between the safe proportion test and the χ2 test is shown in ...

Author:

(1) Daniel Beasley

This paper is available on arXiv under the ATTRIBUTION-NONCOMMERCIAL-SHAREALIKE 4.0 INTERNATIONAL license.