Analyzing Correlation between Market Index and Stocks — Python Tutorial
Photo by Ilya Pavlow on Unsplash.com
I have recently posted an article on analyzing the correlation between a market index and different stocks.
In this post I’m going to show you, step by step, how to do this analysis in Python.
This story is solely for general information purposes, and should not be relied upon for trading recommendations or financial advice. Source code and information are provided for educational purposes only, and should not be relied upon to make an investment decision. Please review my full cautionary guidance before continuing.
Implementation
You can download the complete script from my blog ‘B/O Trading Blog’.
Create a text file called ‘requirements.txt’ and paste in the lines below. Then run ‘pip install -r requirements.txt’.
pandas
pandas_ta
yfinance
numpy
plotly
scikit-learn
dash
Add the necessary Python imports:
import pandas as pd
import pandas_ta as ta
import yfinance as yf
import numpy as np
import math
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import os
from sklearn.preprocessing import MinMaxScaler
from dash import Dash, html, dcc
This function downloads the price data from Yahoo Finance and prepares it for processing.
def download_data(symbol, interval, period):
    data = yf.download(tickers=symbol, period=period, interval=interval)
    df = pd.DataFrame(data)
    df.dropna(inplace=True)
    df.reset_index(inplace=True)
    # Drop the timestamp column; only the OHLCV columns are used below
    df = df.drop('Datetime', axis=1)
    # Some yfinance versions do not return 'Adj Close', so ignore it if it is missing
    df = df.drop('Adj Close', axis=1, errors='ignore')
    return df
In this function we scale the data to values between 0 and 1 so that the two price series can be compared. For this we use MinMaxScaler from sklearn.
def scale_data(df, index):
    scaler = MinMaxScaler(feature_range=(0, 1))
    df[[f"Open_{index}", f"High_{index}", f"Low_{index}", f"Close_{index}", f"Volume_{index}"]] = \
        scaler.fit_transform(df[["Open", "High", "Low", "Close", "Volume"]])
    # Drop the original, unscaled columns
    df = df.drop(['Open', 'High', 'Low', 'Close', 'Volume'], axis=1)
    return df
This function calculates the statistics and the correlation coefficients.
def calculate_stats(s1_s2_scaled_df):
    stats = {}
    # Mean and standard deviation of the scaled close prices
    stats['s1_close_avg'] = s1_s2_scaled_df['Close_1'].mean()
    stats['s2_close_avg'] = s1_s2_scaled_df['Close_2'].mean()
    stats['s1_close_std'] = s1_s2_scaled_df['Close_1'].std()
    stats['s2_close_std'] = s1_s2_scaled_df['Close_2'].std()
    # Correlation matrices over all scaled columns
    stats['s1_s2_corr_pearson'] = s1_s2_scaled_df.corr(method="pearson")
    stats['s1_s2_corr_spearman'] = s1_s2_scaled_df.corr(method="spearman")
    stats['s1_s2_corr_kendall'] = s1_s2_scaled_df.corr(method="kendall")
    return stats
This function plots the price chart, the scatter plot and the heatmaps for the different correlation coefficients.
def plot_charts(symbols, s1_s2_scaled_df, stats):
    light_palette = {}
    light_palette["bg_color"] = "#ffffff"
    light_palette["plot_bg_color"] = "#ffffff"
    light_palette["grid_color"] = "#e6e6e6"
    light_palette["text_color"] = "#2e2e2e"
    light_palette["dark_candle"] = "#4d98c4"
    light_palette["light_candle"] = "#b1b7ba"
    light_palette["volume_color"] = "#c74e96"
    light_palette["border_color"] = "#2e2e2e"
    light_palette["color_1"] = "#5c285b"
    light_palette["color_2"] = "#802c62"
    light_palette["color_3"] = "#a33262"
    light_palette["color_4"] = "#c43d5c"
    light_palette["color_5"] = "#de4f51"
    light_palette["color_6"] = "#f26841"
    light_palette["color_7"] = "#fd862b"
    light_palette["color_8"] = "#ffa600"
    light_palette["color_9"] = "#3295a8"
    palette = light_palette
    # Create subplots
    fig = make_subplots(rows=5, cols=1,
                        subplot_titles=[f'{", ".join(symbols)} Close Prices',
                                        f'{", ".join(symbols)} Scatter Plot',
                                        f'{", ".join(symbols)} Pearson Correlation',
                                        f'{", ".join(symbols)} Spearman Correlation',
                                        f'{", ".join(symbols)} Kendall Correlation'],
                        specs=[[{"secondary_y": True}], [{"secondary_y": True}], [{"secondary_y": True}],
                               [{"secondary_y": True}], [{"secondary_y": True}]],
                        vertical_spacing=0.1, shared_xaxes=False)
    # Add an annotation box with the summary statistics
    correlation_info = f"{symbols[0]} avg: {stats['s1_close_avg']:.3f}, {symbols[1]}: {stats['s2_close_avg']:.3f}<br>"
    correlation_info += f"{symbols[0]} std: {stats['s1_close_std']:.3f}, {symbols[1]}: {stats['s2_close_std']:.3f}<br>"
    # Look up the Close_1 / Close_2 cell by label rather than by position
    pearson_close_corr = stats['s1_s2_corr_pearson'].loc['Close_2', 'Close_1']
    spearman_close_corr = stats['s1_s2_corr_spearman'].loc['Close_2', 'Close_1']
    kendall_close_corr = stats['s1_s2_corr_kendall'].loc['Close_2', 'Close_1']
    correlation_info += f"Pearson Close Correlation: {pearson_close_corr:.3f}<br>"
    correlation_info += f"Spearman Close Correlation: {spearman_close_corr:.3f}<br>"
    correlation_info += f"Kendall Close Correlation: {kendall_close_corr:.3f}<br>"
    fig.add_annotation(text=correlation_info,
                       align='left',
                       showarrow=False,
                       xref='paper',
                       yref='paper',
                       x=1.0,
                       y=1.0,
                       bordercolor='black',
                       borderwidth=1,
                       bgcolor='white')
    # Prices
    fig.add_trace(go.Scatter(x=s1_s2_scaled_df.index, y=s1_s2_scaled_df['Close_1'],
                             line=dict(color=light_palette["color_9"], width=1),
                             name=f"{symbols[0]} Close"),
                  row=1, col=1)
    fig.add_trace(go.Scatter(x=s1_s2_scaled_df.index, y=s1_s2_scaled_df['Close_2'],
                             line=dict(color=light_palette["color_2"], width=1),
                             name=f"{symbols[1]} Close"),
                  row=1, col=1)
    # Scatter plot
    fig.add_trace(go.Scatter(x=s1_s2_scaled_df['Close_1'], y=s1_s2_scaled_df['Close_2'],
                             mode='markers',
                             marker=dict(color=light_palette["color_9"], showscale=False)),
                  row=2, col=1)
    # Pearson heatmap
    fig.add_trace(
        go.Heatmap(
            showscale=False,
            showlegend=False,
            xgap=1,
            ygap=1,
            x=stats['s1_s2_corr_pearson'].columns,
            y=stats['s1_s2_corr_pearson'].index,
            z=np.array(stats['s1_s2_corr_pearson'])
        ), row=3, col=1)
    # Spearman heatmap
    fig.add_trace(
        go.Heatmap(
            showscale=False,
            showlegend=False,
            xgap=1,
            ygap=1,
            x=stats['s1_s2_corr_spearman'].columns,
            y=stats['s1_s2_corr_spearman'].index,
            z=np.array(stats['s1_s2_corr_spearman'])
        ), row=4, col=1)
    # Kendall heatmap
    fig.add_trace(
        go.Heatmap(
            showscale=False,
            showlegend=False,
            xgap=1,
            ygap=1,
            x=stats['s1_s2_corr_kendall'].columns,
            y=stats['s1_s2_corr_kendall'].index,
            z=np.array(stats['s1_s2_corr_kendall'])
        ), row=5, col=1)
    fig.update_layout(
        title={'text': '', 'x': 0.5},
        font=dict(family="Verdana", size=12, color=palette["text_color"]),
        autosize=True,
        width=1280, height=1280,
        xaxis={"rangeslider": {"visible": False}},
        plot_bgcolor=palette["plot_bg_color"],
        paper_bgcolor=palette["bg_color"])
    fig.update_yaxes(visible=False, secondary_y=True)
    # Change grid color
    fig.update_xaxes(showline=True, linewidth=1, linecolor=palette["grid_color"], gridcolor=palette["grid_color"])
    fig.update_yaxes(showline=True, linewidth=1, linecolor=palette["grid_color"], gridcolor=palette["grid_color"])
    return fig
Finally, here is the main function, which performs the following steps:
- Download the price data for NASDAQ 100 index (^NDX) and Fastenal (FAST)
- Scale the two data sets
- Concatenate the data sets
- Calculate the statistics and correlation coefficients
- Plot the data
- Start a dash server with the plots to create interactive charts.
if __name__ == '__main__':
    symbols = ['^NDX', 'FAST']
    # Download data
    interval = "1m"
    period = "2d"
    s1_df = download_data(symbols[0], interval, period)
    s2_df = download_data(symbols[1], interval, period)
    # Scale data
    s1_scaled_df = scale_data(s1_df, 1)
    s2_scaled_df = scale_data(s2_df, 2)
    # Concatenate data
    s1_s2_scaled_df = pd.concat([s1_scaled_df, s2_scaled_df], axis=1, ignore_index=False)
    # Calculate correlation stats
    stats = calculate_stats(s1_s2_scaled_df)
    # Plot the charts
    fig = plot_charts(symbols, s1_s2_scaled_df, stats)
    app = Dash()
    app.layout = html.Div(children=[
        html.H1(children='Correlation Charts'),
        dcc.Graph(
            id='correlation-graphs',
            figure=fig)
    ])
    # On recent Dash releases, use app.run(debug=True) instead
    app.run_server(debug=True)
Open a browser and paste this URL into the address field: http://127.0.0.1:8050/
You should now be able to see the plots in your browser window.
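One thing to be aware of: because download_data drops the Datetime column, the concatenation step lines the two data sets up by row number, not by timestamp. If one symbol is missing a bar, the rows after it no longer refer to the same minute. The snippet below is only a sketch of one possible alternative, not part of the script above. It assumes you keep the timestamps as the index, and the prices and timestamps are made up for illustration.

```python
import pandas as pd

# Hypothetical 1-minute closes; the stock is missing the 09:32 bar.
index_close = pd.Series(
    [100.0, 101.0, 102.0, 103.0],
    index=pd.to_datetime(["2024-01-02 09:30", "2024-01-02 09:31",
                          "2024-01-02 09:32", "2024-01-02 09:33"]),
    name="Close_1",
)
stock_close = pd.Series(
    [50.0, 50.5, 51.5],
    index=pd.to_datetime(["2024-01-02 09:30", "2024-01-02 09:31",
                          "2024-01-02 09:33"]),
    name="Close_2",
)

# join="inner" keeps only the timestamps that exist in both series,
# so every row compares prices from the same minute.
aligned = pd.concat([index_close, stock_close], axis=1, join="inner")
print(aligned)
```

With this approach the 09:32 row is dropped instead of shifting the remaining stock prices up by one row.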
Results
Visual Correlation
The easiest way to check the correlation of price data sets is to plot them in a chart. From the plot we can tell immediately whether their price movements are closely related, somewhat related or not related at all.
In the graph below we plotted 1-minute prices of Fastenal (FAST) against the NASDAQ 100 index (^NDX). Fastenal is part of that index.
The price data has been normalized to remove the difference in scale between the prices to be compared.
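To make the normalization step concrete, here is a minimal pure-Python sketch of min-max scaling, which is the formula MinMaxScaler applies for a (0, 1) feature range. The helper name min_max_scale and the prices are made up for illustration.

```python
def min_max_scale(values):
    """Rescale a sequence of numbers linearly to the range [0, 1]."""
    lowest, highest = min(values), max(values)
    span = highest - lowest
    if span == 0:
        # A constant series has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lowest) / span for v in values]


# Hypothetical prices on very different scales.
index_prices = [15000.0, 15150.0, 15300.0]
stock_prices = [50.0, 50.5, 51.0]

scaled_index = min_max_scale(index_prices)
scaled_stock = min_max_scale(stock_prices)
print(scaled_index, scaled_stock)
```

Note that this kind of scaling only changes how the chart looks. The Pearson, Spearman and Kendall coefficients are unaffected by shifting or rescaling a series by a positive factor.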
Here you see that the Close price of Fastenal seems to match the Close price of the NASDAQ 100 index very closely.

Scatterplots
Another way to analyze the linear correlation between two variables is to plot them in a scatter plot.
When plotting the close prices of Fastenal and the NASDAQ 100, we see a clear linear relationship and can imagine that the dots align with a diagonal line drawn through the center of the dot distribution.
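The imagined diagonal line can be made explicit with a least-squares fit. The sketch below is not part of the script above; the helper name fit_line and the sample values are assumptions for illustration only.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical scaled close prices of the index (x) and the stock (y).
x = [0.0, 0.25, 0.5, 0.75, 1.0]
y = [0.1, 0.3, 0.45, 0.8, 0.95]
slope, intercept = fit_line(x, y)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

The closer the dots sit to this line, the closer the Pearson coefficient is to 1 (or to -1 for a falling line).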

Pearson Correlation
The Pearson correlation, developed by Karl Pearson, measures the strength of the linear relationship between two sets of data. It is calculated as the covariance of the two variables divided by the product of their standard deviations.
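As a rough illustration of that definition, here is a pure-Python sketch that computes the coefficient directly from the covariance and the standard deviations. In the script this is handled by the pandas corr method; the function name pearson and the sample data are made up for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of deviations are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scaled close prices that mostly move together.
close_1 = [0.10, 0.35, 0.40, 0.80, 1.00]
close_2 = [0.00, 0.30, 0.55, 0.70, 0.95]
print(f"{pearson(close_1, close_2):.3f}")
```

A value near 1 means the two series rise and fall together along a straight line, a value near -1 means they move in opposite directions, and a value near 0 means there is no linear relationship.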
The Pearson value I calculated for Fastenal is 0.783, a strong positive correlation. This is also apparent in the heatmap below. The map has a light orange color where Close_1 (NASDAQ 100 Close) and Close_2 (Fastenal Close) intersect, which indicates a strong positive relationship.

Spearman Correlation
The Spearman correlation, named after Charles Spearman, is another measure of the relationship between two variables or data sets. It is computed on the ranks of the values and assesses how well the relationship between the two variables can be described by a monotonic function, that is, a function that consistently preserves or consistently reverses the order of the values.
As with the Pearson correlation coefficient, the scores range between -1 and 1. The meaning of the range is the same as for the Pearson Correlation.
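One way to see the difference from Pearson is that the Spearman coefficient is the Pearson coefficient of the ranks. The sketch below assumes average ranks for ties; the helper names and sample data are hypothetical and only meant to illustrate the idea.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# A monotonic but non-linear relationship (y = x squared).
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(f"pearson={pearson(x, y):.3f}, spearman={spearman(x, y):.3f}")
```

Because y always increases when x increases, the ranks agree perfectly and the Spearman coefficient is 1, while the Pearson coefficient stays below 1 because the relationship is not a straight line.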
The Spearman coefficient I calculated for Fastenal is 0.73, so again a high positive correlation.

Kendall Correlation
A third method for assessing the relationship between variables is the Kendall rank correlation coefficient, named after Maurice Kendall. It measures the ordinal association between two measured quantities, that is, how often pairs of observations are ordered the same way in both data sets.
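The idea of ordinal association can be sketched by counting pairs of observations. A pair is concordant if both series move in the same direction between the two observations, and discordant if they move in opposite directions. The function below is a simple, quadratic-time sketch of the tau-b variant, which corrects for ties; the function name and the sample data are made up for illustration.

```python
import math


def kendall_tau_b(x, y):
    """Kendall tau-b: (concordant - discordant) pairs, normalized with a tie correction."""
    concordant = discordant = 0
    untied_x = untied_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx != 0:
                untied_x += 1  # pair is not tied in x
            if dy != 0:
                untied_y += 1  # pair is not tied in y
            product = dx * dy
            if product > 0:
                concordant += 1
            elif product < 0:
                discordant += 1
    return (concordant - discordant) / math.sqrt(untied_x * untied_y)


# Hypothetical data: one pair of neighbours is swapped in the second series.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
print(f"{kendall_tau_b(x, y):.3f}")
```

Of the six possible pairs here, five are concordant and one is discordant, which gives (5 - 1) / 6. For long 1-minute series a library implementation is the better choice, since this loop compares every pair of rows.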
The Kendall coefficient for Fastenal is 0.54, which represents a positive correlation.

Wrapping Up
In this post we looked at different ways to assess the correlation between a market index and a stock's price, and went through the steps needed to perform this analysis in Python.
I hope you found this post worth your time. Thanks for reading.
This post contains affiliate marketing links.
Have a great day!
