Variance vs Covariance vs Correlation: What is the Difference?
Twins from Different Universes
Variance, covariance, and correlation are common terms in statistics. They often appear in the same context but serve different purposes. In this post, we will explore what they are and how they differ from each other.
What is Variance?
Variance measures variability: how far the values in a dataset spread out from their mean.
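We can use the following formulas to compute the variance.
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i - \mu)^2 \quad \text{(population variance)}$$
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{x})^2 \quad \text{(sample variance)}$$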

- N = The number of observations in the population
- n = The number of observations in the sample
- Xi = ith observation in the data
- μ = The population mean
- x̄ = The sample mean
We use “n-1” instead of “n” (aka, Bessel’s correction) to correct bias in the sample variance.
Variance can never be negative, and it is zero only when every value in the dataset is identical. The higher the variance, the greater the variability of the values in the dataset.
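To make the difference between the two divisors concrete, here is a minimal sketch in plain Python. The function names and the data values are made up for illustration; the standard library's `statistics.pvariance` and `statistics.variance` are used only as a cross-check.

```python
import statistics

def population_variance(values):
    """Average squared deviation from the mean, dividing by N."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

def sample_variance(values):
    """Same sum of squares, but divided by n - 1 (Bessel's correction)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values for illustration

pop_var = population_variance(data)
samp_var = sample_variance(data)

print("population variance:", pop_var)
print("sample variance:", samp_var)

# Cross-check against the standard library implementations.
print("statistics.pvariance:", statistics.pvariance(data))
print("statistics.variance:", statistics.variance(data))
```

For these values the mean is 5 and the squared deviations sum to 32, so the population variance is 32/8 = 4 while the sample variance is 32/7, which is slightly larger.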
What is Covariance?
Covariance measures how two variables vary together: the degree to which the deviation of one variable (X) from its mean is related to the deviation of another variable (Y) from its mean.
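We can use the following formulas to compute the covariance.
$$\mathrm{Cov}(X, Y) = \frac{1}{N}\sum_{i=1}^{N}(X_i - \mu_X)(Y_i - \mu_Y) \quad \text{(population covariance)}$$
$$\mathrm{Cov}(X, Y) = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{x})(Y_i - \bar{y}) \quad \text{(sample covariance)}$$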

Unlike variance, which cannot be negative, covariance can be positive, negative, or zero. Its value can lie anywhere between -∞ and +∞.
What does the sign of a covariance indicate?
Positive covariance indicates that the two variables (X and Y), on average, move in the same direction. Greater values of X tend to correspond to greater values of Y. When X is greater than its mean, Y is likely to be greater than its mean. Similarly, when X is less than its mean, Y is likely to be less than its mean.
Negative covariance indicates that the two variables (X and Y), on average, move in opposite directions. Greater values of X tend to correspond to smaller values of Y. When X is greater than its mean, Y is likely to be less than its mean. Similarly, when X is less than its mean, Y is likely to be greater than its mean.
Zero covariance indicates that there is no linear relationship between the two variables (X and Y). The variables can still be related in a nonlinear way.
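The three cases can be checked with a small sketch. The `covariance` helper below is a hypothetical hand-written function using the population formula (dividing by n), and the data are invented. The third series is a parabola: it depends entirely on x, yet its covariance with x is zero, which shows that zero covariance rules out only a linear relationship.

```python
def covariance(xs, ys):
    """Population covariance: mean product of paired deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

x = [-2, -1, 0, 1, 2]                     # made-up values, symmetric around 0
same_direction = [2 * v + 1 for v in x]   # rises as x rises
opposite_direction = [-3 * v for v in x]  # falls as x rises
parabola = [v ** 2 for v in x]            # depends on x, but not linearly

cov_pos = covariance(x, same_direction)
cov_neg = covariance(x, opposite_direction)
cov_zero = covariance(x, parabola)

print("same direction:", cov_pos)
print("opposite direction:", cov_neg)
print("parabola:", cov_zero)
```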
How to Create a Variance-Covariance Matrix?
Variance and covariance usually appear together in a variance-covariance matrix. The variance-covariance matrix is a symmetric matrix whose diagonal elements are variances and whose off-diagonal elements are covariances.
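Suppose we have a matrix X with a dimension of “n x k”. This matrix includes n observations of k variables (i.e., X1, X2, X3, …, Xk), with one row per observation and one column per variable.
$$X = \begin{bmatrix} X_{11} & X_{12} & \cdots & X_{1k} \\ X_{21} & X_{22} & \cdots & X_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ X_{n1} & X_{n2} & \cdots & X_{nk} \end{bmatrix}$$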

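We can collect the means of these k variables in the following 1 x k matrix, where each entry is the average of one column of X.
$$\bar{x} = \begin{bmatrix} \bar{x}_1 & \bar{x}_2 & \cdots & \bar{x}_k \end{bmatrix}, \qquad \bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} X_{ij}$$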

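Then we subtract from each column of X its own mean to create the de-meaned version of X, Xc.
$$X_c = \begin{bmatrix} X_{11} - \bar{x}_1 & X_{12} - \bar{x}_2 & \cdots & X_{1k} - \bar{x}_k \\ X_{21} - \bar{x}_1 & X_{22} - \bar{x}_2 & \cdots & X_{2k} - \bar{x}_k \\ \vdots & \vdots & \ddots & \vdots \\ X_{n1} - \bar{x}_1 & X_{n2} - \bar{x}_2 & \cdots & X_{nk} - \bar{x}_k \end{bmatrix}$$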

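Lastly, we multiply the transpose of Xc by Xc and divide the result by n (dividing by n-1 instead gives the sample version). Then we have the variance-covariance matrix shown in the following format.
$$V = \frac{1}{n} X_c^{\top} X_c = \begin{bmatrix} \mathrm{Var}(X_1) & \mathrm{Cov}(X_1, X_2) & \cdots & \mathrm{Cov}(X_1, X_k) \\ \mathrm{Cov}(X_2, X_1) & \mathrm{Var}(X_2) & \cdots & \mathrm{Cov}(X_2, X_k) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}(X_k, X_1) & \mathrm{Cov}(X_k, X_2) & \cdots & \mathrm{Var}(X_k) \end{bmatrix}$$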
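The steps above can be sketched in plain Python as follows. The helper names (`transpose`, `covariance_matrix`) and the 4 x 3 data matrix are assumptions made for illustration, and the divisor is n, matching the description above. In practice a numerical library would normally do this work.

```python
def transpose(matrix):
    """Turn a list of rows into a list of columns."""
    return [list(col) for col in zip(*matrix)]

def covariance_matrix(X):
    """Population variance-covariance matrix of an n x k data matrix."""
    n = len(X)
    # Step 1: the mean of each column (variable).
    means = [sum(col) / n for col in transpose(X)]
    # Step 2: subtract each column's mean to get the de-meaned matrix Xc.
    Xc = [[value - m for value, m in zip(row, means)] for row in X]
    # Step 3: (Xc^T Xc) / n. Entry (i, j) is the dot product of
    # de-meaned columns i and j, divided by n.
    cols = transpose(Xc)
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in cols] for ci in cols]

# Made-up data: 4 observations (rows) of 3 variables (columns).
X = [
    [1, 2, 8],
    [3, 6, 6],
    [5, 4, 4],
    [7, 8, 2],
]

V = covariance_matrix(X)
for row in V:
    print(row)
```

The diagonal of `V` holds the three variances, and the matrix is symmetric because the covariance of variables i and j does not depend on their order.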

What is Correlation?
While covariance measures how two variables vary together, correlation (or the correlation coefficient) measures both the direction and the strength of the linear relationship between them.
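We can use the following formulas to compute the correlation.
$$\rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} \quad \text{(population correlation)}$$
$$r_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{s_X s_Y} \quad \text{(sample correlation)}$$
Here σ and s denote the population and sample standard deviations, and the covariance in each formula is the matching population or sample covariance.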

The value of correlation always lies between -1 and +1, inclusive.
Positive correlation indicates that when one variable increases, the other tends to increase as well. The closer the correlation is to 1, the more closely the data points fall along an upward-sloping straight line. A correlation of exactly 1 means a perfect increasing linear relationship; it does not mean that the two variables change by the same amount or percentage.
Negative correlation indicates that when one variable increases, the other tends to decrease. The closer the correlation is to -1, the more closely the data points fall along a downward-sloping straight line.
Zero correlation indicates that there is no linear relationship between the two variables (X and Y). As with covariance, a nonlinear relationship can still exist.
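As a rough illustration, the sketch below computes the correlation as covariance divided by the product of the two standard deviations. The `correlation` function is a hypothetical helper using population formulas, and the data are invented. Note that the first series has a slope of 10, so the two variables do not change by the same amount, yet the correlation is still 1 because the relationship is exactly linear.

```python
import math

def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

x = [1, 2, 3, 4, 5]  # made-up values

r_up = correlation(x, [10 * v + 3 for v in x])      # exact increasing line
r_down = correlation(x, [-0.5 * v for v in x])      # exact decreasing line
r_none = correlation(x, [(v - 3) ** 2 for v in x])  # symmetric parabola

print("increasing line:", r_up)
print("decreasing line:", r_down)
print("parabola:", r_none)
```

Up to floating-point rounding, the three results are 1, -1, and 0.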
How to Create a Correlation Matrix?
The correlation matrix is a symmetric matrix where the diagonal elements are 1 and the off-diagonal elements are pairwise correlations.
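Let’s first construct matrix D, a diagonal matrix whose diagonal entries are the standard deviations of the k variables. Multiplying a matrix on the right by the inverse of D divides each of its columns by the corresponding standard deviation.
$$D = \begin{bmatrix} s_1 & 0 & \cdots & 0 \\ 0 & s_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & s_k \end{bmatrix}$$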

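Then, in matrix X, we subtract from each column its mean and divide the result by that column's standard deviation to create the standardized matrix, Xs. Writing x̄_j and s_j for the mean and standard deviation of column j, each entry of Xs is
$$(X_s)_{ij} = \frac{X_{ij} - \bar{x}_j}{s_j}$$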

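Lastly, we multiply the transpose of Xs by Xs and divide the result by n (the same divisor used when computing the standard deviations). Then we have the correlation matrix shown in the following format, where r_jl is the correlation between variables j and l.
$$R = \frac{1}{n} X_s^{\top} X_s = \begin{bmatrix} 1 & r_{12} & \cdots & r_{1k} \\ r_{21} & 1 & \cdots & r_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ r_{k1} & r_{k2} & \cdots & 1 \end{bmatrix}$$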
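A minimal sketch of this construction in plain Python is shown below. The function name `correlation_matrix` and the data matrix are assumptions for illustration, and the standard deviations use the population form (dividing by n) so that the diagonal comes out as 1.

```python
import math

def correlation_matrix(X):
    """Correlation matrix of an n x k data matrix via standardized columns."""
    n = len(X)
    columns = [list(col) for col in zip(*X)]
    # Standardize each column: subtract its mean, divide by its
    # population standard deviation. This list holds the columns of Xs.
    standardized = []
    for col in columns:
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        standardized.append([(v - mean) / sd for v in col])
    # (Xs^T Xs) / n: entry (i, j) is the dot product of standardized
    # columns i and j, divided by n.
    return [
        [sum(a * b for a, b in zip(ci, cj)) / n for cj in standardized]
        for ci in standardized
    ]

# Made-up data: 4 observations (rows) of 3 variables (columns).
X = [
    [1, 2, 8],
    [3, 6, 6],
    [5, 4, 4],
    [7, 8, 2],
]

R = correlation_matrix(X)
for row in R:
    print([round(value, 4) for value in row])
```

In this example the third column is an exact decreasing linear function of the first, so their correlation is -1, while the first two columns have a correlation of 0.8.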

What is the difference between Covariance and Correlation?
Although both covariance and correlation measure how a change in one variable is reflected in another variable, correlation is often preferred over covariance for the following reasons.
- Measurement units: Correlation is a unit-free measure that takes a value between -1 and 1. Covariance, in contrast, is expressed in the product of the two variables' units. This makes correlation easier to interpret than covariance.
- Change in scale: Covariance is affected by rescaling the variables. For example, if we multiply one variable by a constant and the other variable by a different constant, the covariance is multiplied by the product of the two constants. Correlation, however, does not change as long as both constants are positive.
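The scale point can be checked with a short sketch. The height and weight figures below are invented for illustration, and `covariance` and `correlation` are hypothetical helpers using population formulas. Converting metres to centimetres and kilograms to grams multiplies the covariance by 100 x 1000, while the correlation stays the same.

```python
import math

def covariance(xs, ys):
    """Population covariance of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def correlation(xs, ys):
    """Covariance rescaled by the two population standard deviations."""
    sd_x = math.sqrt(covariance(xs, xs))  # Cov(X, X) is the variance of X
    sd_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sd_x * sd_y)

# Made-up measurements in metres and kilograms.
height_m = [1.50, 1.60, 1.70, 1.80, 1.90]
weight_kg = [50, 58, 66, 77, 89]

# The same measurements expressed in centimetres and grams.
height_cm = [100 * h for h in height_m]
weight_g = [1000 * w for w in weight_kg]

cov_original = covariance(height_m, weight_kg)
cov_scaled = covariance(height_cm, weight_g)
corr_original = correlation(height_m, weight_kg)
corr_scaled = correlation(height_cm, weight_g)

print("covariance:", cov_original, "->", cov_scaled)
print("correlation:", corr_original, "->", corr_scaled)
```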
Summary

Thank you for reading !!!