When working in the tech world (or at any job, for that matter), knowing how to harness statistics empowers you to make data-driven decisions. Whether you’re a marketer, designer, or developer, it is absolutely critical that you understand statistical terminology, how to interpret findings, and when to transform those findings into action.
The most important takeaway is that statistics alone will not necessarily make your arguments better. Statistics are fuel for your stories, but they are not stories in themselves. Frame your findings in a way that persuasively moves your audience, enriching your data with meaning and a call to action.
“Once something has occurred and we can put together a story to explain it, it starts to seem like the outcome was predestined. Statistics don’t appeal to our need to understand cause and effect, which is why they are so frequently ignored or misinterpreted. Stories, on the other hand, are a rich means to communicate precisely because they emphasize cause and effect.” ― Michael J. Mauboussin, The Success Equation
A population is any large collection of objects or individuals, such as Americans, students, or trees, about which information is desired.
A parameter is any summary number, like an average or percentage, that describes the entire population.
A sample is a representative group drawn from the population.
A statistic is any summary number, like an average or percentage, that describes the sample.
National Center for Education Statistics
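To make the parameter/statistic distinction concrete, here is a minimal Python sketch using only the standard library. The population of 10,000 order values is simulated and entirely hypothetical. The mean of the whole population is a parameter; the mean of a random sample drawn from it is a statistic that estimates that parameter and will usually differ from it a little.

```python
import random
import statistics

# Hypothetical population: 10,000 simulated order values in dollars.
rng = random.Random(42)  # fixed seed so the example is reproducible
population = [round(rng.uniform(5, 95), 2) for _ in range(10_000)]

# Parameter: a summary number describing the entire population.
population_mean = statistics.mean(population)

# Sample: a random subset of 100 orders drawn from the population.
sample = rng.sample(population, k=100)

# Statistic: the same summary number, computed from the sample only.
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

In practice you rarely know the parameter; you collect a sample precisely because measuring the whole population is too expensive, and you use the statistic as your estimate.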
The mean of a set of numbers, sometimes simply called the average, is the sum of the values divided by the number of values.
The median of a set of numbers is the middle number in the set after the numbers have been arranged from least to greatest. If there is an even number of values, the median is the average of the two middle numbers.
The mode of a set of numbers is the number that occurs most often. A set can have more than one mode.
The range of a set of numbers is the difference between the lowest and highest values in the set.
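These four summaries can be computed with Python's built-in `statistics` module. The data list below is made up for illustration.

```python
import statistics

data = [3, 7, 7, 2, 9, 4, 7, 1]  # hypothetical values

mean = sum(data) / len(data)         # sum of the values / number of values
median = statistics.median(data)     # sorted: 1 2 3 4 7 7 7 9 -> (4 + 7) / 2
mode = statistics.mode(data)         # the value that occurs most often
value_range = max(data) - min(data)  # highest value minus lowest value

print(mean, median, mode, value_range)  # 5.0 5.5 7 8
```

`statistics.mean(data)` gives the same mean. When a data set has more than one mode, `statistics.multimode(data)` (Python 3.8+) returns all of them as a list.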
Hypothesis testing uses sample data to decide between two competing claims about a population: a null hypothesis and an alternative hypothesis.
A null hypothesis proposes that there is no real effect or relationship in a set of observations, so any apparent pattern is due to chance. It is the hypothesis that the researcher is trying to disprove.
An alternative hypothesis is simply the inverse, or opposite, of the null hypothesis. For example, if the null hypothesis states that there is no relationship between two variables, the alternative hypothesis states that there IS a statistically significant relationship between them.
A Type 1 Error is the incorrect rejection of a true null hypothesis (also known as a “false positive” finding).
A Type 2 Error is incorrectly retaining a false null hypothesis (also known as a “false negative” finding).
LibGuides at La Trobe University
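A small worked example can make the two error types tangible. The sketch below assumes a hypothetical experiment: flip a coin 10 times to test the null hypothesis that it is fair, and reject the null if you see 0, 1, 9 or 10 heads (for 10 flips, these are the outcomes a two-sided test at the conventional 0.05 significance level would reject). The Type 1 error rate is the chance of rejecting when the coin really is fair; the Type 2 error rate is the chance of retaining the null when the coin actually lands heads 80% of the time, an assumed alternative. Exact fractions keep the probabilities free of rounding error.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n = 10
reject_null_if = {0, 1, 9, 10}  # heads counts that lead us to reject H0

# Type 1 error: rejecting H0 even though the coin really is fair (p = 1/2).
p_fair = Fraction(1, 2)
type_1_rate = sum(binomial_pmf(k, n, p_fair) for k in reject_null_if)

# Type 2 error: retaining H0 even though the coin is biased (assumed p = 4/5).
p_biased = Fraction(4, 5)
type_2_rate = 1 - sum(binomial_pmf(k, n, p_biased) for k in reject_null_if)

print(f"Type 1 error rate: {float(type_1_rate):.4f}")  # about 0.0215
print(f"Type 2 error rate: {float(type_2_rate):.4f}")  # about 0.6242
```

Note the trade-off: shrinking the rejection region lowers the Type 1 error rate but raises the Type 2 error rate, and with only 10 flips a fairly biased coin still goes undetected most of the time.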
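The p-value is a number between 0 and 1: the probability of getting results at least as extreme as the ones observed, assuming the null hypothesis is true. The smaller the p-value, the stronger the evidence against the null hypothesis; a common convention is to reject the null hypothesis when the p-value is 0.05 or less.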
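As an illustration, here is one way to compute an exact two-sided p-value for a hypothetical coin experiment (null hypothesis: the coin is fair). It adds up the probabilities of every outcome that is no more likely than the observed one, which is one common convention for two-sided exact binomial tests. The function names are our own, not from any library.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_binomial_p_value(successes, trials, p_null=Fraction(1, 2)):
    """Exact two-sided p-value: total probability, under H0, of every outcome
    that is no more likely than the observed one."""
    observed = binomial_pmf(successes, trials, p_null)
    pmfs = [binomial_pmf(k, trials, p_null) for k in range(trials + 1)]
    return sum((q for q in pmfs if q <= observed), Fraction(0))


alpha = 0.05  # conventional significance level
for heads in (9, 7):
    p = two_sided_binomial_p_value(heads, 10)
    decision = "reject H0" if p <= alpha else "fail to reject H0"
    print(f"{heads}/10 heads: p = {float(p):.4f} -> {decision}")
```

Nine heads out of ten gives p ≈ 0.021, so the null hypothesis is rejected at the 0.05 level; seven heads gives p ≈ 0.34, which is not unusual enough to reject it.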
Regression is a technique for determining the statistical relationship between two or more variables, where a change in a dependent variable is associated with, and depends on, a change in one or more independent variables. See also correlation.
Independent Variable: a variable that stands alone and isn’t changed by the other variables you are trying to measure. For example, someone’s age might be an independent variable.
Dependent Variable: the variable being tested and measured in a scientific experiment. It is ‘dependent’ on the independent variable: as the experimenter changes the independent variable, the effect on the dependent variable is observed and recorded.
Regression Analysis: a statistical method that attempts to determine the strength of the relationship between one dependent variable and a series of other changing variables (the independent variables).
Simple Linear Regression: regression that uses only one independent variable and describes the relationship between the independent and dependent variables as a straight line.
Correlation Coefficient (r): measures the strength and direction of a linear relationship between two variables. It ranges from -1.0 to +1.0. The closer r is to +1 or -1, the more closely the two variables are related; a positive r means they rise together, a negative r means one falls as the other rises. An r close to 0 means there is no linear relationship, although the variables may still be related in a non-linear way.
R-Squared: a statistical measure of how close the data are to the fitted regression line. It is the percentage of the variation in the dependent variable that the linear model explains, from 0% (none) to 100% (all). For simple linear regression, R-squared equals r².
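The regression terms above can be sketched in a few lines of Python. This minimal example fits a simple linear regression by ordinary least squares and computes r and R-squared from the same sums of squares. The x and y values are hypothetical (think ad spend versus sign-ups), and `simple_linear_regression` is an illustrative name, not a library function.

```python
import math


def simple_linear_regression(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)

    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x

    # Pearson correlation coefficient: strength and direction, from -1 to +1.
    r = s_xy / math.sqrt(s_xx * s_yy)

    # R-squared: share of the variation in y explained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - ss_res / s_yy
    return slope, intercept, r, r_squared


# Hypothetical data: ad spend (x, in $1,000s) vs. sign-ups (y, in 100s).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept, r, r_squared = simple_linear_regression(x, y)
print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.3f} R^2={r_squared:.3f}")
# slope=0.600 intercept=2.200 r=0.775 R^2=0.600
```

On Python 3.10 or later, `statistics.linear_regression(x, y)` and `statistics.correlation(x, y)` return the slope/intercept and r directly; the hand-written version just shows where the numbers come from. Note that R-squared (0.6) is exactly r squared here, as expected for a single independent variable.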
To calculate a percentage increase, first work out the difference (increase) between the two numbers you are comparing.
Increase = New Number − Original Number
Then: divide the increase by the original number and multiply the answer by 100.
% increase = Increase ÷ Original Number × 100.
If your answer is a negative number, then this is a percentage decrease.
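The two steps for a percentage increase translate directly into a small helper. This is a minimal Python sketch; the function name and the sample numbers are illustrative.

```python
def percentage_increase(original, new):
    """Percentage increase from `original` to `new`; negative means a decrease."""
    if original == 0:
        raise ValueError("original number must be non-zero")
    increase = new - original         # Increase = New Number - Original Number
    return increase / original * 100  # % increase = Increase / Original x 100


print(round(percentage_increase(50, 65), 2))  # 30.0
print(round(percentage_increase(80, 60), 2))  # -25.0
```

The second call shows the sign rule: going from 80 to 60 gives a negative result, i.e. a 25% decrease. The zero check is there because a percentage change from 0 is undefined.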
To calculate a percentage decrease, first work out the difference (decrease) between the two numbers you are comparing.
Decrease = Original Number − New Number
Then: divide the decrease by the original number and multiply the answer by 100.
% Decrease = Decrease ÷ Original Number × 100
If your answer is a negative number, then this is a percentage increase.
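The percentage-decrease steps mirror the increase case with the subtraction reversed. Again, this is a minimal Python sketch with an illustrative function name and sample numbers.

```python
def percentage_decrease(original, new):
    """Percentage decrease from `original` to `new`; negative means an increase."""
    if original == 0:
        raise ValueError("original number must be non-zero")
    decrease = original - new         # Decrease = Original Number - New Number
    return decrease / original * 100  # % decrease = Decrease / Original x 100


print(round(percentage_decrease(80, 60), 2))  # 25.0
print(round(percentage_decrease(50, 65), 2))  # -30.0
```

Here the negative result for 50 to 65 signals a 30% increase rather than a decrease.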
Here are some great online statistics guides to help you.