In data analysis and decision-making, the terms correlation and causation are often confused. They are not synonyms, and here is why:
TL;DR
A correlation does not imply causation, but causation always implies correlation.
The third variable problem and the directionality problem are two of the main reasons why correlation does not imply causation.
Use correlational research designs to identify correlations between variables, and experimental designs to test causation.
Correlation means there is an association between variables: when one variable changes, so does the other. Put more simply, the variables in your dataset appear to move together in some way. More specifically, a correlation reflects the strength and/or direction of the association between two or more variables. A positive correlation means that both variables change in the same direction (e.g. when x is higher, y tends to be higher), whereas a negative correlation means that they change in opposite directions (e.g. when x is higher, y tends to be lower). As expected, a zero correlation means that there is no linear relationship between the variables.
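To make the three cases concrete, here is a minimal sketch that computes Pearson's correlation coefficient, one common measure of the strength and direction of a linear association. The helper `pearson_r` and the sample lists are made up for illustration and are not part of any dataset discussed in this article.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (numerator), and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data, chosen only to show the three cases.
x = [1, 2, 3, 4, 5]
same_direction = [2, 4, 6, 8, 10]      # rises as x rises
opposite_direction = [10, 8, 6, 4, 2]  # falls as x rises
symmetric = [4, 1, 0, 1, 4]            # (x - 3) ** 2: depends on x, but not linearly

r_pos = pearson_r(x, same_direction)
r_neg = pearson_r(x, opposite_direction)
r_zero = pearson_r(x, symmetric)

print(f"positive: {r_pos:+.2f}")
print(f"negative: {r_neg:+.2f}")
print(f"zero:     {r_zero:+.2f}")
```

The last case is worth a second look: the coefficient is zero even though `symmetric` is fully determined by `x`, because Pearson's coefficient only captures linear association.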
In other words, when two variables are correlated they covary, and this covariation is a statistical indicator of the relationship between them. However, the covariation is not necessarily due to a causal link, whether direct or indirect. There are two main reasons why correlation is not causation: the third variable problem and the directionality problem. Let's break them down:
The
And the second main reason comprises the
Note that when you describe a correlation between variables, you can use the words relationship and association interchangeably, but not causation, because:
Causation (also known as causality) means that changes in one variable
All in all, a correlation does not imply causation, but causation always implies correlation. It is essential to distinguish the two terms so that, when two variables correlate, you can judge whether a cause-and-effect relationship exists or whether they are simply correlated. For example, if you optimized part of your app last month and your app downloads increased significantly over the same period, you would want to know whether that optimization brought in more users or whether it was just a coincidence.
But how can you test your data and determine whether causality exists when a correlation occurs? You can use correlational research designs to identify correlations between variables, but you should use experimental designs (e.g. randomized experiments, quasi-experimental studies, etc.) to test causation.
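The difference between observing and experimenting can be illustrated with a small simulation. This is only a sketch under stated assumptions: the hidden variable `z`, the noise levels, the sample size, and the helper `pearson_r` are all hypothetical. In the simulated world, x has no effect on y at all. In the observational data a hidden third variable drives both, so they still correlate. In the experiment, x is assigned by a coin flip, which cuts its link to the hidden variable.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(42)  # fixed seed so the run is repeatable
n = 5000                 # assumed sample size

# Hidden third variable that influences everything it touches.
z = [rng.gauss(0, 1) for _ in range(n)]

# Observational data: z drives both x and y; x itself has NO effect on y.
x_observed = [zi + rng.gauss(0, 1) for zi in z]
y_observed = [zi + rng.gauss(0, 1) for zi in z]

# Experiment: x is assigned at random, independently of z.
# y is generated exactly as before, so x still has no effect on it.
x_assigned = [rng.choice([0, 1]) for _ in range(n)]
y_experiment = [zi + rng.gauss(0, 1) for zi in z]

r_observational = pearson_r(x_observed, y_observed)
r_experimental = pearson_r(x_assigned, y_experiment)

print(f"observational r: {r_observational:+.3f}")
print(f"experimental r:  {r_experimental:+.3f}")
```

Under these assumptions the observational correlation comes out clearly positive (about 0.5 in expectation), while the experimental one stays close to zero, which matches the fact that x never influenced y in the simulation. Random assignment is what lets the experiment separate a causal effect from a correlation produced by a third variable.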