Hi fellows! In this two-part article, I would like to focus on a common problem in statistics: multiple comparisons. In the first part, we will dive into the main terminology of this problem and the most common solutions. In the second part, we will explore a practical Python implementation and interpret the results. I will use metaphors to aid immersion in the topic and make it more fun. Let's get started! 😎

The Multiple Comparisons Problem: In a Nutshell

Imagine that you come to a party where everyone is wearing a mask, and you are trying to guess whether there is a celebrity behind each one. The more guesses you make, the more likely you are to be wrong at least once (hello, Type I errors!). This is the essence of the multiple comparisons problem in statistics: every additional hypothesis you test increases your chances of making at least one false discovery.

Essential Jargon for the Party

Null Hypothesis (H0): your baseline assumption that this particular guest is just a regular visitor, not a hidden celebrity. At the party there are many guests around, so we have to make many such assumptions. This is how testing multiple hypotheses appears.

Type I Error: you identify some guest as a celebrity, but it turns out not to be true. In the language of statistics, we wrongly reject the null hypothesis, thinking we have detected a real difference when there isn't one.

Family-Wise Error Rate (FWER): the probability of making one or more false discoveries (Type I errors) when performing multiple hypothesis tests. This is the quantity to watch when we are afraid of making even one mistake among all our assumptions (tests). For example, if we test 10 hypotheses at a Type I error level of 0.05, the family-wise error rate can climb as high as 0.05 * 10 = 0.5 (50%, Karl!); under independence the exact value is 1 - (1 - 0.05)^10 ≈ 0.40, which is still uncomfortably high. We don't want to take such a risk, and that is why we need to control the probability of mistakes somehow. The Bonferroni correction comes to help us (but more on that later).

False Discovery Rate (FDR): the expected proportion of "discoveries" (rejected null hypotheses) that are false (incorrect rejections of the null). So, we are still at the party, but we've already had a glass of sparkling wine and become more daring. We are no longer afraid of a single mistake, because we would like to catch as many real celebrities as possible. Of course, we still want to be right about a large proportion of our guesses, and here FDR-controlling procedures like the Benjamini-Hochberg correction come to help us (but more on that later).

FWER: Bonferroni Correction

As I mentioned above, the Bonferroni correction is designed for those who are afraid of making even one mistake. It demands that you be extra sure about each discovery when you are looking at many possibilities at once. How does it do this? It simply makes the criterion for significance stricter, so you are far less likely to point at the "wrong" celebrity.

Let's return to our example with 10 hypotheses. For each finding to be considered true, it must meet a much stricter standard: if you are testing 10 hypotheses and your standard significance level is 0.05, Bonferroni adjusts it to 0.005 for each test.

Formula: adjusted significance level = α / n, where
- α is your initial significance level (usually 0.05)
- n is the number of hypotheses you are testing

Impact: this method greatly reduces the chance of false discoveries (Type I errors) by setting the bar higher for what counts as a significant result. However, its strictness can also prevent you from recognizing true findings, like failing to spot a real celebrity because you are too focused on not making a mistake. In essence, the Bonferroni correction prioritizes avoiding false positives at the risk of missing true discoveries, making it a conservative choice in hypothesis testing. Two quick sketches below show the problem and the fix in action.
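First, the problem itself. Here is a minimal simulation sketch (my own illustration; everything beyond the 10-tests-at-0.05 setup from the example above is invented): we repeatedly run 10 tests whose null hypotheses are all true and count how often at least one of them falsely looks significant.

```python
import numpy as np

rng = np.random.default_rng(42)

n_tests = 10          # hypotheses tested at the same party
alpha = 0.05          # per-test Type I error level
n_simulations = 100_000

# Under a true null hypothesis, p-values are uniformly distributed on [0, 1].
p_values = rng.uniform(0, 1, size=(n_simulations, n_tests))

# The "family" fails whenever at least one of its 10 tests looks significant.
fwer = np.mean((p_values < alpha).any(axis=1))

print(f"Estimated FWER without correction: {fwer:.3f}")
print(f"Theoretical FWER: {1 - (1 - alpha) ** n_tests:.3f}")  # ≈ 0.401
```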
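And now the fix. This sketch applies the α / n adjustment to ten hypothetical p-values; the numbers are made up purely for illustration.

```python
import numpy as np

alpha = 0.05
# Ten hypothetical p-values from our celebrity guesses (invented for illustration)
p_values = np.array([0.001, 0.008, 0.012, 0.041, 0.049,
                     0.060, 0.150, 0.320, 0.550, 0.890])

n = len(p_values)
bonferroni_alpha = alpha / n          # 0.05 / 10 = 0.005
significant = p_values < bonferroni_alpha

print(f"Adjusted threshold: {bonferroni_alpha}")
print(f"Naive discoveries (p < 0.05): {(p_values < alpha).sum()}")  # 5
print(f"Bonferroni discoveries: {significant.sum()}")               # 1
```

If you prefer a ready-made call, multipletests(p_values, alpha=0.05, method='bonferroni') from statsmodels.stats.multitest gives the same answer. Notice how only one of the five "naively significant" guesses survives: that is the conservativeness we just discussed.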
FDR: The Benjamini-Hochberg Correction

As we have already discussed, FDR control is like a more daring guest who lets you confidently identify celebrities without being too strict. The Benjamini-Hochberg correction adjusts the significance threshold based on the rank of each p-value, controlling the FDR. This allows more flexibility compared to the Bonferroni correction.

The Process:
Rank p-values: sort them from the smallest to the largest.
Adjust significance levels: each hypothesis gets its own threshold based on its rank and the total number of tests. The i-th smallest p-value is compared against (i / m) * α, where m is the total number of tests, so the bar is strictest for the smallest p-value and relaxes as the rank grows (more details can be found in the next part of this article).

So, by focusing on controlling the FDR, the Benjamini-Hochberg correction allows you to find more celebrities among all the guests at the party. This approach is particularly useful when you test a variety of hypotheses and accept some level of mistakes in order not to miss important findings. In summary, the Benjamini-Hochberg correction offers a practical balance between discovering true effects and controlling the rate of false positives. A small preview sketch follows below, right before we wrap up.
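As a tiny preview of part two, here is a minimal sketch of the procedure on the same ten made-up p-values; again, everything in it is illustrative rather than a definitive implementation.

```python
import numpy as np

alpha = 0.05
# The same ten made-up p-values as in the Bonferroni sketch
p_values = np.array([0.001, 0.008, 0.012, 0.041, 0.049,
                     0.060, 0.150, 0.320, 0.550, 0.890])

m = len(p_values)
ranked = np.sort(p_values)                    # Step 1: rank from smallest to largest
thresholds = np.arange(1, m + 1) / m * alpha  # Step 2: (i / m) * alpha for rank i
below = ranked <= thresholds

# Reject H0 for every p-value up to the LARGEST rank that clears its
# threshold; this back-to-front rule is the heart of the procedure.
if below.any():
    k = np.where(below)[0].max()
    discoveries = ranked[: k + 1]
else:
    discoveries = np.array([])

print(f"Benjamini-Hochberg discoveries: {discoveries}")  # [0.001 0.008 0.012]
```

Three celebrities are caught here versus just one under Bonferroni, which is exactly the extra daring the sparkling wine promised. In statsmodels, multipletests(p_values, alpha=0.05, method='fdr_bh') yields the same rejections.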
In conclusion, we discussed the main terminology of the multiple comparisons problem and the most common ways to deal with it. In the next part, I will focus on practical implementation and interpretation with Python code. See you!