How to pick uncorrelated stocks for an investment portfolio in Python
A simple Python script to pick the least correlated stocks from the S&P 500
Portfolio investing is a fascinating kind of investment that can potentially lead to satisfactory returns. According to Modern Portfolio Theory, it's a good idea to select stocks or ETFs that show a low correlation with each other, because this helps keep the risk of the portfolio under control.
Let's see why, and how to select stocks by measuring their correlation in Python.
Note from Towards Data Science’s editors: While we allow independent authors to publish articles in accordance with our rules and guidelines, we do not endorse each author’s contribution. You should not rely on an author’s works without seeking professional advice. See our Reader Terms for details.
What is linear correlation?
Correlation between stocks measures how the returns of one stock move together with the returns of another. If two stocks are highly positively correlated, they will likely move in the same direction (i.e. if one stock price rises, the other tends to rise too); if they are highly negatively correlated, they will likely move in opposite directions.
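Mathematically speaking, the linear (Pearson) correlation coefficient between the returns of two stocks, say a and b, is defined as:

$$\rho_{ab} = \frac{\sigma_{ab}}{\sigma_a \sigma_b}$$

where the numerator is the covariance between the returns of the two stocks and the denominator is the product of their standard deviations.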
This number ranges from -1 to 1. If it's -1, the stocks move in perfectly opposite directions (i.e. if one stock rises, the other falls); if it's 1, the stocks move perfectly in the same direction. If it's 0, the stocks are uncorrelated: there is no linear relationship between their returns (strictly speaking, this does not guarantee that they are fully independent). These uncorrelated stocks are the ones we should look for.
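To make the formula concrete, here is a minimal sketch that computes the correlation coefficient directly from its definition (covariance divided by the product of the standard deviations) and checks it against pandas' built-in `Series.corr`. The return series are synthetic random data generated only for illustration, and `linear_correlation` is a hypothetical helper name, not part of any library.

```python
import numpy as np
import pandas as pd


def linear_correlation(a: pd.Series, b: pd.Series) -> float:
    """Covariance of a and b divided by the product of their standard deviations."""
    # Series.cov and Series.std both use ddof=1 by default, so the factors cancel out
    return a.cov(b) / (a.std() * b.std())


rng = np.random.default_rng(42)
n_days = 250

# Synthetic daily returns (illustration only, not real market data)
returns_a = pd.Series(rng.normal(0, 0.01, n_days))
# b shares part of a's movement, so it should be positively correlated with a
returns_b = 0.5 * returns_a + pd.Series(rng.normal(0, 0.01, n_days))
# c is generated independently of a, so its correlation with a should be near 0
returns_c = pd.Series(rng.normal(0, 0.01, n_days))

rho_ab = linear_correlation(returns_a, returns_b)
rho_ac = linear_correlation(returns_a, returns_c)
print(f"rho(a, b) = {rho_ab:.3f}, rho(a, c) = {rho_ac:.3f}")

# The formula should agree with pandas' built-in Pearson correlation
print(np.isclose(rho_ab, returns_a.corr(returns_b)))
```

The pair (a, b) should show a clearly positive coefficient, while (a, c) should stay close to 0, which is the kind of pair we want in a portfolio.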
Why should we select uncorrelated stocks?
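If we build a portfolio of N stocks with weights x_i, where σ_i is the standard deviation of the returns of stock i and ρ_ij is the correlation between stocks i and j, the variance of the portfolio is:

$$\sigma_p^2 = \sum_{i=1}^{N} x_i^2 \sigma_i^2 + \sum_{i=1}^{N} \sum_{j \neq i} x_i x_j \rho_{ij} \sigma_i \sigma_j$$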
So, as long as the correlation between stocks is positive (and the weights are positive, i.e. we only hold long positions), the second term is positive: the variance of our portfolio, and therefore its risk, is larger than the weighted sum of the individual variances. Some may argue that we should prefer negatively correlated stocks, but in that case the gains of one stock would tend to be offset by the losses of the other, leaving little net return.
So, the idea is to keep our stocks uncorrelated, which cancels the second term and avoids a higher variance. This is the purpose of a branch of Modern Portfolio Theory, which provides mathematical tools for choosing the weights x that minimize the portfolio variance. In this article, we are going to focus on selecting stocks whose linear correlation is nearly 0 in absolute value.
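The decomposition of the portfolio variance can be checked numerically. The sketch below assumes NumPy and uses made-up weights, standard deviations and correlations purely for illustration; `portfolio_variance_terms` is a hypothetical helper that computes the two terms of the formula separately, so you can see the second term disappear when the correlation is 0.

```python
import numpy as np


def portfolio_variance_terms(weights, stds, corr):
    """Return (first term, second term, total) of the portfolio variance formula."""
    x = np.asarray(weights, dtype=float)
    sigma = np.asarray(stds, dtype=float)
    rho = np.asarray(corr, dtype=float)

    # Covariance matrix: entry (i, j) is rho_ij * sigma_i * sigma_j
    cov = rho * np.outer(sigma, sigma)

    # First term: sum_i x_i^2 sigma_i^2 (the diagonal of the covariance matrix)
    first = float(np.sum(x**2 * sigma**2))
    # Second term: sum over i != j of x_i x_j rho_ij sigma_i sigma_j (off-diagonal part)
    off_diagonal = cov - np.diag(np.diag(cov))
    second = float(x @ off_diagonal @ x)
    return first, second, first + second


weights = [0.5, 0.5]  # hypothetical portfolio weights
stds = [0.20, 0.30]   # hypothetical standard deviations of the two stocks

for rho_12 in (0.6, 0.0):
    corr = [[1.0, rho_12], [rho_12, 1.0]]
    first, second, total = portfolio_variance_terms(weights, stds, corr)
    print(f"rho = {rho_12}: first = {first:.4f}, second = {second:.4f}, total = {total:.4f}")
```

With a positive correlation the second term adds to the variance; with a correlation of 0 it vanishes and the total reduces to the weighted sum of the individual variances.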
The code
In this part of the article, we are going to work with S&P 500 stocks and find the pairs of stocks that are least correlated with each other. The full code can be found in my GitHub repository: https://github.com/gianlucamalato/machinelearning/blob/master/Stocks_correlation.ipynb
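The notebook contains the author's actual code. As a rough, independent sketch of the same idea, the function below assumes you already have a pandas DataFrame of daily closing prices with one column per ticker; it computes daily returns, builds the correlation matrix and ranks every pair by the absolute value of its correlation. The tickers and prices are synthetic placeholders (not real S&P 500 data), and `least_correlated_pairs` is a hypothetical name.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def least_correlated_pairs(prices: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Rank ticker pairs by the absolute correlation of their daily returns."""
    # Daily returns: today's price over yesterday's price, minus 1
    returns = (prices / prices.shift(1) - 1).dropna()
    corr = returns.corr()  # Pearson correlation matrix, one row/column per ticker

    # Visit each unordered pair once (the matrix is symmetric, the diagonal is 1)
    rows = [(a, b, corr.loc[a, b]) for a, b in combinations(corr.columns, 2)]
    pairs = pd.DataFrame(rows, columns=["stock_1", "stock_2", "correlation"])
    pairs["abs_correlation"] = pairs["correlation"].abs()

    # Pairs whose correlation is closest to 0 come first
    return pairs.sort_values("abs_correlation").head(n).reset_index(drop=True)


# Synthetic prices for four hypothetical tickers (placeholders, not S&P 500 data)
rng = np.random.default_rng(0)
n_days = 500
market = rng.normal(0, 0.01, n_days)
daily_returns = pd.DataFrame({
    "AAA": market + rng.normal(0, 0.005, n_days),  # driven by the common factor
    "BBB": market + rng.normal(0, 0.005, n_days),  # driven by the same factor as AAA
    "CCC": rng.normal(0, 0.01, n_days),            # independent
    "DDD": rng.normal(0, 0.01, n_days),            # independent
})
prices = 100 * (1 + daily_returns).cumprod()

print(least_correlated_pairs(prices, n=3))
```

Because AAA and BBB share a common factor, their pair should rank last, while pairs involving the independent tickers come first. Note that `dropna()` discards every day on which any ticker has a missing price; with real data covering hundreds of stocks you may prefer to handle missing values differently.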
We're going to use the pandas library to perform all the calculations, while the list of S&P 500 constituents will be downloaded from this GitHub repository: https://github.com/datasets/s-and-p-500-companies/tree/master/data
First of all, let’s import some useful libraries:







