There are many ways to quantify variability. Here, we will focus on the most common ones: variance, standard deviation, and coefficient of variation. In the field of statistics, we typically use different formulas when working with population data and sample data.

Sample Formulas vs Population Formulas

When we have the whole population, each data point is known, so we are 100% sure of the measures we are calculating. When we take a sample of this population and compute a sample statistic, it is interpreted as an approximation of the population parameter. Moreover, if we extract 10 different samples from the same population, we will most likely get 10 different measures. Statisticians have solved the problem by adjusting the algebraic formulas for many statistics to reflect this issue. Therefore, we will explore both population and sample formulas, as they are both used.

The Mean, Median and Mode

You must be asking yourself why there are unique formulas for the mean, median and mode. Well, actually, the sample mean is the average of the sample data points, while the population mean is the average of the population data points. As you can see in the picture below, there are two different formulas, but technically, they are computed in the same way.

After this short clarification, it’s time to get onto variance.

Variance formula

Variance measures the dispersion of a set of data points around their mean value.

Population variance, denoted by sigma squared, is equal to the sum of squared differences between the observed values and the population mean, divided by the total number of observations.

Sample variance, on the other hand, is denoted by s squared and is equal to the sum of squared differences between the observed sample values and the sample mean, divided by the number of sample observations minus 1.

A Closer Look at the Formula for Population Variance

When you are getting acquainted with statistics, it is hard to grasp everything right away.
Therefore, let’s stop for a second to examine the formula for the population variance and try to clarify its meaning. The main part of the formula is its numerator, so that’s what we want to comprehend.

The numerator is the sum of the squared differences between the observations and the mean. So, the closer a number is to the mean, the lower the result we obtain will be. And the further away from the mean it lies, the larger this difference.

Why do we Elevate to the Second Degree

Squaring the differences has two main purposes. First, by squaring the numbers, we always get non-negative values. Without going too deep into the mathematics of it, it is intuitive that dispersion cannot be negative. Dispersion is about distance, and distance cannot be negative.

If, on the other hand, we calculate the differences and do not elevate them to the second degree, we would obtain both positive and negative values that, when summed, would cancel out, leaving us with no information about the dispersion.

Second, squaring amplifies the effect of large differences. For example, if the mean is 0 and you have an observation of 100, the squared difference is 10,000!

Putting the Population Formula to Use

Alright, enough dry theory. It is time for a practical example. We have a population of five observations: 1, 2, 3, 4 and 5. Let’s find its variance.

We start by calculating the mean: (1 + 2 + 3 + 4 + 5) / 5 = 3.

Then we apply the formula which we just discussed: ((1 - 3)² + (2 - 3)² + (3 - 3)² + (4 - 3)² + (5 - 3)²) / 5.

When we do the math, we get 2. So, the population variance of the data set is 2.

Calculating the Sample Variance

But what about the sample variance? This would only be suitable if we were told that these five observations were a sample drawn from a population. So, let’s imagine that’s the case. The sample mean is once again 3. The numerator is the same, but the denominator is going to be 4, instead of 5.

This gives us a sample variance of 2.5.
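The two calculations above can be checked with a few lines of Python. This is only a minimal sketch: the function names `population_variance` and `sample_variance` are illustrative choices, not part of the tutorial, and the last two lines compare the results with the standard library's `statistics.pvariance` and `statistics.variance`.

```python
import statistics


def population_variance(values):
    """Sum of squared differences from the mean, divided by N."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Same numerator as above, but divided by n - 1."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)


data = [1, 2, 3, 4, 5]

pop_var = population_variance(data)   # numerator 10, divided by 5
samp_var = sample_variance(data)      # numerator 10, divided by 4

print(pop_var)   # 2.0
print(samp_var)  # 2.5

# Cross-check against the standard library implementations.
print(statistics.pvariance(data) == pop_var)
print(statistics.variance(data) == samp_var)
```

The only difference between the two functions is the denominator, which is exactly the adjustment described in the text.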
Why the Results are not the Same

To conclude the topic, we should interpret the result. Why is the sample variance bigger than the population variance? In the first case, we knew the population. That is, we had all the data and we calculated the variance. In the second case, we were told that 1, 2, 3, 4 and 5 was a sample, drawn from a bigger population.

The Population of the Sample

Imagine that the population of the sample were the following 9 numbers: 1, 1, 1, 2, 3, 4, 5, 5 and 5. Clearly, the values are the same, but there is a concentration around the two extremes of the data set, 1 and 5. The variance of this population is about 2.89 (the squared differences from the mean of 3 sum to 26, and 26 / 9 ≈ 2.89).

So, our sample variance has rightfully corrected upwards in order to reflect the higher potential variability. This is the reason why there are different formulas for sample and population data.

Why we Use Standard Deviation

While variance is a common measure of data dispersion, in most cases the figure you will obtain is pretty large. Moreover, it is hard to compare because the unit of measurement is squared. The easy fix is to calculate its square root and obtain a statistic known as standard deviation.

In most analyses, standard deviation is much more meaningful than variance.

The Formulas

Similar to the variance, there is also a population and a sample standard deviation. The formulas are the square root of the population variance and the square root of the sample variance, respectively. I believe there is no need for an example of the calculation. Anyone with a calculator in their hands will be able to do the job.

The Coefficient of Variation (CV)

The last measure which we will introduce is the coefficient of variation. It is equal to the standard deviation, divided by the mean.

Another name for the term is relative standard deviation. This is an easy way to remember its formula: it is simply the standard deviation relative to the mean.
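For readers who prefer code to a calculator, here is a small sketch of the definition above, applied to the earlier data set 1, 2, 3, 4, 5. The helper `coefficient_of_variation` and its `sample` flag are hypothetical names introduced for illustration; the standard deviations come from Python's `statistics.pstdev` and `statistics.stdev`. Note that the ratio is undefined when the mean is zero.

```python
import statistics


def coefficient_of_variation(values, sample=True):
    """Standard deviation divided by the mean.

    Uses the sample standard deviation when sample=True,
    otherwise the population standard deviation.
    """
    sd = statistics.stdev(values) if sample else statistics.pstdev(values)
    return sd / statistics.mean(values)


data = [1, 2, 3, 4, 5]

pop_sd = statistics.pstdev(data)   # square root of the population variance (2)
samp_sd = statistics.stdev(data)   # square root of the sample variance (2.5)

pop_cv = coefficient_of_variation(data, sample=False)
samp_cv = coefficient_of_variation(data, sample=True)

print(round(pop_sd, 4), round(samp_sd, 4))
print(round(pop_cv, 4), round(samp_cv, 4))
```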
As you probably guessed, there is a population and sample formula once again.

Why We Need the Coefficient of Variation

So, standard deviation is the most common measure of variability for a single data set. But why do we need yet another measure such as the coefficient of variation? Well, comparing the standard deviations of two different data sets is meaningless, but comparing coefficients of variation is not.

As the saying goes: “Tell me, I’ll forget. Show me, I’ll remember. Involve me, I’ll understand.”

Comparing Standard Deviations

To make sure you remember, here’s an example of a comparison between standard deviations. Let’s take the prices of pizza at 10 different places in New York. As you can see in the picture below, they range from 1 to 11 dollars.

Now, imagine that you only have Mexican pesos. To you, the prices will look more like 18.81 pesos to 206.91 pesos, given the exchange rate of 18.81 pesos for one dollar.

Let’s combine our knowledge so far and find the standard deviations and coefficients of variation of these two data sets.

Sample or Population Data

First, we have to see if this is a sample or a population. Are there only 10 restaurants in New York? Of course not. This is obviously a sample drawn from all the restaurants in the city. So, we have to use the formulas for sample measures of variability.

Finding the Mean

Second, we have to find the mean. The mean in dollars is equal to 5.5 and the mean in pesos to 103.46.

Calculating the Sample Variance and the Standard Deviation

The third step of the process is finding the sample variance. Following the formula that we went over earlier, we can obtain 10.72 dollars squared and 3793.69 pesos squared. The respective sample standard deviations are 3.27 dollars and 61.59 pesos, as shown in the picture below.

A Few Observations

Let’s make a couple of observations. First, variance gives results in squared units, while standard deviation is in the original units, as shown below.
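The figures quoted for the pizza example can be reproduced with a short sketch. The individual prices appear only in the original picture, so the list `prices_usd` below is an assumption: ten hypothetical prices between 1 and 11 dollars, chosen because they give the same mean (5.5) and sample variance (10.72) as the text. The exchange rate of 18.81 is the one stated above.

```python
import statistics

# Hypothetical prices in dollars (an assumption, not the tutorial's data):
# chosen so that the mean is 5.5 and the sample variance is about 10.72.
prices_usd = [1, 2, 3, 3, 5, 6, 7, 8, 9, 11]

RATE = 18.81  # pesos per dollar, as quoted in the text
prices_mxn = [price * RATE for price in prices_usd]

mean_usd = statistics.mean(prices_usd)
mean_mxn = statistics.mean(prices_mxn)

# Sample formulas (n - 1 in the denominator), since this is a sample.
var_usd = statistics.variance(prices_usd)   # dollars squared
var_mxn = statistics.variance(prices_mxn)   # pesos squared
sd_usd = statistics.stdev(prices_usd)       # dollars
sd_mxn = statistics.stdev(prices_mxn)       # pesos

print("mean:", mean_usd, mean_mxn)
print("variance:", round(var_usd, 2), round(var_mxn, 2))
print("standard deviation:", round(sd_usd, 2), round(sd_mxn, 2))

# Converting the currency multiplies the variance by RATE squared,
# but the standard deviation only by RATE.
print(var_mxn / var_usd, sd_mxn / sd_usd)
```

The last line makes the unit difference visible: the variance scales with the square of the exchange rate, while the standard deviation scales with the rate itself.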
The difference in units is the main reason why professionals prefer to use standard deviation as the main measure of variability. It is directly interpretable. Squared dollars mean nothing, even in the field of statistics.

Second, we got standard deviations of 3.27 and 61.59 for the same pizza at the same 10 restaurants in New York City. This seems wrong, since the prices are identical and only the currency differs. Let’s make it right by using our last tool, the coefficient of variation.

The Advantage of the Coefficient of Variation

We can divide the standard deviations by the respective means. As you can see in the picture below, we get the two coefficients of variation.

The result is the same: 0.60. Notice that it is not dollars, pesos, dollars squared or pesos squared. It is just 0.60.

Important: This shows us the great advantage that the coefficient of variation gives us. Now, we can confidently say that the two data sets have the same variability, which was what we expected beforehand.

In the picture above, you can see the main advantages of the coefficient of variation.

The Pros and Cons of Each of the Measures of Variability

To recap, there are three main measures of variability: variance, standard deviation and coefficient of variation. Each of them has different strengths and applications. Usually, we prefer standard deviation over variance because it is directly interpretable. However, the coefficient of variation has its edge over standard deviation when it comes to comparing data. After reading this tutorial, you should feel confident using all of them.

Now, using measures when working with one variable probably seems like a piece of cake. However, what if there were 2 variables? Will you be able to represent their relationship?

If your answer is no, feel free to jump onto our next tutorial, in order to turn that no into a yes.

Or, if you’re considering a career in data science, check out our articles: The Data Scientist Profile, The 5 Skills You Need to Match Any Data Science Job Description, 10 Pro Tips to Make Your Resume One of a Kind, and 15 Data Science Consulting Companies Hiring Now.