Analyzing Correlation between Market Index and Stocks — Python Tutorial

Photo by Ilya Pavlow on Unsplash.com

I have recently posted an article on analyzing the correlation between a market index and different stocks.

In this post I’m going to show you step-by-step how to do this analysis in Python.

This story is solely for general information purposes, and should not be relied upon for trading recommendations or financial advice. Source code and information is provided for educational purposes only, and should not be relied upon to make an investment decision. Please review my full cautionary guidance before continuing.

Implementation

You can download the complete script from my blog ‘B/O Trading Blog’.

Create a text file called ‘requirements.txt’ and paste in the lines below. Then run ‘pip install -r requirements.txt’.

pandas
pandas_ta
yfinance
numpy
plotly
scikit-learn
dash

Add the necessary Python imports:

import pandas as pd
import pandas_ta as ta
import yfinance as yf
import numpy as np
import math
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import os
from sklearn.preprocessing import MinMaxScaler
from dash import Dash, html, dcc

This function downloads the price data from Yahoo Finance and prepares it for processing.

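def download_data(symbol, interval, period):
    # Download OHLCV bars for one symbol from Yahoo Finance
    data = yf.download(tickers=symbol, period=period, interval=interval)
    df = pd.DataFrame(data)
    # Some yfinance versions return two-level columns (field, ticker); keep only the field names
    if isinstance(df.columns, pd.MultiIndex):
        df.columns = df.columns.get_level_values(0)
    df.dropna(inplace=True)
    # Move the timestamp index into a column, then drop the columns that are not scaled
    df.reset_index(inplace=True)
    # errors='ignore' skips columns that are absent, e.g. 'Adj Close' when prices are auto-adjusted
    df = df.drop(columns=['Datetime', 'Adj Close'], errors='ignore')

    return df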

In this function we scale the data to values between 0 and 1 to be able to compare them. For this we are using MinMaxScaler from sklearn.

def scale_data(df, index):
    scaler = MinMaxScaler(feature_range=(0, 1))
    df[[f"Open_{index}", f"High_{index}",f"Low_{index}",f"Close_{index}",f"Volume_{index}"]] = \
        scaler.fit_transform(df[["Open", "High","Low","Close","Volume"]])

    #  Drop columns not scaled
    df = df.drop('Open', axis=1)
    df = df.drop('Close', axis=1)
    df = df.drop('High', axis=1)
    df = df.drop('Low', axis=1)
    df = df.drop('Volume', axis=1)
    return df
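
As a rough illustration of what min-max scaling does to a price series, here is a minimal pure-Python sketch. The helper min_max_scale and the sample prices are made up for this example; the script itself relies on MinMaxScaler. Mapping a constant series to 0.0 is an assumption of this sketch.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series has no range; this sketch maps every value to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical close prices, made up for this illustration
closes = [50.0, 52.0, 51.0, 54.0]
scaled = min_max_scale(closes)
print(scaled)
```

After scaling, both symbols live on the same 0 to 1 range, so their curves can be drawn on one axis even though the raw prices differ by orders of magnitude.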

This function calculates the statistics and the correlation coefficients.

def calculate_stats(s1_s2_scaled_df):
    stats = {}
    stats['s1_close_avg'] = s1_s2_scaled_df['Close_1'].mean()
    stats['s2_close_avg'] = s1_s2_scaled_df['Close_2'].mean()

    stats['s1_close_std'] = s1_s2_scaled_df['Close_1'].std()
    stats['s2_close_std'] = s1_s2_scaled_df['Close_2'].std()

    stats['s1_s2_corr_pearson'] = s1_s2_scaled_df.corr(method="pearson")
    stats['s1_s2_corr_spearman'] = s1_s2_scaled_df.corr(method="spearman")
    stats['s1_s2_corr_kendall'] = s1_s2_scaled_df.corr(method="kendall")

    return stats
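
To make the average and standard deviation entries concrete, the following sketch reproduces them with the standard library on a few made-up values. The variable names and numbers are assumptions for illustration only. Note that pandas computes the sample standard deviation (dividing by n - 1) by default, which is also what statistics.stdev does.

```python
import statistics

# Hypothetical scaled close prices, made up for this illustration
scaled_close = [0.0, 0.5, 0.25, 1.0]

close_avg = statistics.mean(scaled_close)
# statistics.stdev divides by n - 1, matching the pandas default (ddof=1)
close_std = statistics.stdev(scaled_close)

print(f"avg: {close_avg:.3f}, std: {close_std:.3f}")
```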

This function plots the visualization chart, scatterplot diagram and the heatmaps for the different coefficients.

def plot_charts(symbols, s1_s2_scaled_df, stats):
    light_palette = {}
    light_palette["bg_color"] = "#ffffff"
    light_palette["plot_bg_color"] = "#ffffff"
    light_palette["grid_color"] = "#e6e6e6"
    light_palette["text_color"] = "#2e2e2e"
    light_palette["dark_candle"] = "#4d98c4"
    light_palette["light_candle"] = "#b1b7ba"
    light_palette["volume_color"] = "#c74e96"
    light_palette["border_color"] = "#2e2e2e"
    light_palette["color_1"] = "#5c285b"
    light_palette["color_2"] = "#802c62"
    light_palette["color_3"] = "#a33262"
    light_palette["color_4"] = "#c43d5c"
    light_palette["color_5"] = "#de4f51"
    light_palette["color_6"] = "#f26841"
    light_palette["color_7"] = "#fd862b"
    light_palette["color_8"] = "#ffa600"
    light_palette["color_9"] = "#3295a8"
    palette = light_palette

    #  Create sub plots
    fig = make_subplots(rows=5, cols=1, subplot_titles=[f'{", ".join(symbols)} Close Prices', \
                                                        f'{", ".join(symbols)} Scatter Plot', \
                                                        f'{", ".join(symbols)} Pearson Correlation', \
                                                        f'{", ".join(symbols)} Spearman Correlation', \
                                                        f'{", ".join(symbols)} Kendall Correlation'], \
                        specs=[[{"secondary_y": True}],[{"secondary_y": True}],[{"secondary_y": True}],\
                               [{"secondary_y": True}],[{"secondary_y": True}]], \
                        vertical_spacing=0.1, shared_xaxes=False)

    #  Add an annotation box with the summary statistics and correlation coefficients
    correlation_info = f"{symbols[0]} avg: {stats['s1_close_avg']:.3f}, {symbols[1]}: {stats['s2_close_avg']:.3f}<br>"
    correlation_info += f"{symbols[0]} std: {stats['s1_close_std']:.3f}, {symbols[1]}: {stats['s2_close_std']:.3f}<br>"
    #  Look up the Close_1 / Close_2 cell by label rather than by position
    pearson_close_corr = stats['s1_s2_corr_pearson'].loc['Close_2', 'Close_1']
    spearman_close_corr = stats['s1_s2_corr_spearman'].loc['Close_2', 'Close_1']
    kendall_close_corr = stats['s1_s2_corr_kendall'].loc['Close_2', 'Close_1']
    correlation_info += f"Pearson Close Correlation: {pearson_close_corr:.3f}<br>"
    correlation_info += f"Spearman Close Correlation: {spearman_close_corr:.3f}<br>"
    correlation_info += f"Kendall Close Correlation: {kendall_close_corr:.3f}<br>"

    fig.add_annotation(text=correlation_info,
                       align='left',
                       showarrow=False,
                       xref='paper',
                       yref='paper',
                       x=1.0,
                       y=1.0,
                       bordercolor='black',
                       borderwidth=1,
                       bgcolor='white')

    #  Prices
    fig.add_trace(go.Scatter(x=s1_s2_scaled_df.index, y=s1_s2_scaled_df['Close_1'], line=dict(color=light_palette["color_9"], width=1), name=f"{symbols[0]} Close"),
                  row=1, col=1)
    fig.add_trace(go.Scatter(x=s1_s2_scaled_df.index, y=s1_s2_scaled_df['Close_2'], line=dict(color=light_palette["color_2"], width=1), name=f"{symbols[1]} Close"),
                  row=1, col=1)

    #  Scatter plot
    fig.add_trace(go.Scatter(x=s1_s2_scaled_df['Close_1'],y=s1_s2_scaled_df['Close_2'], mode='markers', marker=dict(
        color=light_palette["color_9"],
        showscale=False
    )), row=2, col=1)

    #  Pearson Heatmap
    fig.add_trace(
        go.Heatmap(
            showscale=False,
            showlegend=False,
            xgap=1,
            ygap=1,
            x=stats['s1_s2_corr_pearson'].columns,
            y=stats['s1_s2_corr_pearson'].index,
            z=np.array(stats['s1_s2_corr_pearson'])
        ), row=3, col=1)

    #  Spearman Heatmap
    fig.add_trace(
        go.Heatmap(
            showscale=False,
            showlegend=False,
            xgap=1,
            ygap=1,
            x=stats['s1_s2_corr_spearman'].columns,
            y=stats['s1_s2_corr_spearman'].index,
            z=np.array(stats['s1_s2_corr_spearman'],
                       )
        ), row=4, col=1)

    #  Kendall Heatmap
    fig.add_trace(
        go.Heatmap(
            showscale=False,
            showlegend=False,
            xgap=1,
            ygap=1,
            x=stats['s1_s2_corr_kendall'].columns,
            y=stats['s1_s2_corr_kendall'].index,
            z=np.array(stats['s1_s2_corr_kendall'],
                       )
        ), row=5, col=1)

    fig.update_layout(
        title={'text': '', 'x': 0.5},
        font=dict(family="Verdana", size=12, color=palette["text_color"]),
        autosize=True,
        width=1280, height=1280,
        xaxis={"rangeslider": {"visible": False}},
        plot_bgcolor=palette["plot_bg_color"],
        paper_bgcolor=palette["bg_color"])
    fig.update_yaxes(visible=False, secondary_y=True)
    #  Change grid color
    fig.update_xaxes(showline=True, linewidth=1, linecolor=palette["grid_color"], gridcolor=palette["grid_color"])
    fig.update_yaxes(showline=True, linewidth=1, linecolor=palette["grid_color"], gridcolor=palette["grid_color"])

    return fig

Finally, here is the main function, which performs the following steps:

1. Download the price data for the NASDAQ 100 index (^NDX) and Fastenal (FAST)
2. Scale the two data sets
3. Concatenate the data sets
4. Calculate the statistics and correlation coefficients
5. Plot the data
6. Start a Dash server with the plots to create interactive charts

if __name__ == '__main__':
    symbols = ['^NDX','FAST',]

    #  Download data
    interval = "1m"
    period = "2d"
    s1_df = download_data(symbols[0], interval, period)
    s2_df = download_data(symbols[1], interval, period)

    #  Scale data
    s1_scaled_df = scale_data(s1_df, 1)
    s2_scaled_df = scale_data(s2_df, 2)

    #  Concatenate data
    s1_s2_scaled_df = pd.concat([s1_scaled_df, s2_scaled_df], axis=1, ignore_index=False)

    #  Calculate correlation stats
    stats = calculate_stats(s1_s2_scaled_df)

    #  Plot the charts
    fig = plot_charts(symbols, s1_s2_scaled_df, stats)

    app = Dash()
    app.layout = html.Div(children=[
        html.H1(children='Correlation Charts'),
        dcc.Graph(
            id='correlation-graphs',
            figure=fig)
    ])
    #  app.run() is the current entry point; app.run_server() is deprecated in recent Dash releases
    app.run(debug=True)

Open a browser and paste this URL into the address field: http://127.0.0.1:8050/

You should now be able to see the plots in your browser window.
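
One caveat about the concatenation step: download_data drops the timestamp column, so the two data sets are paired row by row. If one symbol is missing a 1-minute bar, every later row is paired with the wrong minute. A minimal sketch of pairing on shared timestamps instead is shown below. It assumes the timestamps were kept, and the helper align_on_timestamps and the sample data are hypothetical, not part of the script.

```python
def align_on_timestamps(series_1, series_2):
    """Keep only timestamps present in both series, returned in sorted order."""
    shared = sorted(set(series_1) & set(series_2))
    return [(ts, series_1[ts], series_2[ts]) for ts in shared]


# Hypothetical scaled closes keyed by time of day; each series lacks one bar
index_close = {"09:30": 0.10, "09:31": 0.20, "09:32": 0.35}
stock_close = {"09:30": 0.15, "09:32": 0.30, "09:33": 0.55}

rows = align_on_timestamps(index_close, stock_close)
for ts, c1, c2 in rows:
    print(ts, c1, c2)
```

Only the bars that exist for both symbols survive, so each pair of values refers to the same minute.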

Results

Visual Correlation

The easiest way to check the correlation of price data sets is by plotting them in a chart. By plotting them, we can immediately tell whether their price movements are closely related, somewhat related, or not related at all.

In the graph below we plotted 1-minute prices of Fastenal Company (FAST) against the NASDAQ 100 index (^NDX). Fastenal is part of that index.

The price data has been normalized to remove the difference in scale between the prices to be compared.

Here you see that the Close price of Fastenal seems to match the Close price of the NASDAQ 100 index very closely.

Scatterplots

Another way to analyze the linear correlation between two variables is to plot them in a scatter plot.

When plotting the Close prices of Fastenal and the NASDAQ 100, we see a clear linear relationship and can imagine that the dots align with a diagonal line drawn through the center of the dot distribution.
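
The imagined diagonal line can be made explicit with an ordinary least-squares fit. The sketch below is one way to do that in pure Python; the function fit_line and the sample values are made up for illustration and are not part of the script.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical scaled closes for the index (x) and the stock (y)
index_close = [0.10, 0.20, 0.35, 0.50, 0.80]
stock_close = [0.15, 0.22, 0.30, 0.55, 0.75]

slope, intercept = fit_line(index_close, stock_close)
print(f"slope: {slope:.3f}, intercept: {intercept:.3f}")
```

The fitted line could be added to the scatter subplot as one more trace; the closer the dots sit to it, the stronger the linear relationship.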

Pearson Correlation

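The Pearson correlation, developed by Karl Pearson, is used to assess the strength of a linear relationship between two sets of data. It is calculated as the covariance of the two variables divided by the product of their standard deviations. The result ranges from -1 (perfect negative linear relationship) through 0 (no linear relationship) to 1 (perfect positive linear relationship).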
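
The definition translates almost word for word into code. The following is a minimal pure-Python sketch with made-up sample values; the script itself gets the coefficient from the pandas corr method.

```python
import math


def pearson(xs, ys):
    """Pearson r: covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance and sample standard deviations (all divide by n - 1)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (std_x * std_y)


# Hypothetical scaled closes, made up for this illustration
index_close = [0.10, 0.20, 0.35, 0.50, 0.80]
stock_close = [0.15, 0.22, 0.30, 0.55, 0.75]

r = pearson(index_close, stock_close)
print(f"Pearson r: {r:.3f}")
```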

The Pearson value I calculated for Fastenal is 0.783, which indicates a strong positive correlation. This is also apparent in the heatmap below. The map has a light orange color where Close_1 (NASDAQ 100 Close) and Close_2 (Fastenal Close) intersect, which indicates a close positive relation.

Spearman Correlation

The Spearman correlation, developed by Charles Spearman, is another measure of the relationship between two variables or data sets. It assesses how well the relationship between two variables can be described using a monotonic function. A monotonic function is one that either preserves or reverses the order of its inputs: as the input grows, the output never decreases, or it never increases.

As with the Pearson correlation coefficient, the scores range between -1 and 1. The meaning of the range is the same as for the Pearson Correlation.
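
One common way to compute the Spearman coefficient is to replace each value by its rank and then apply the Pearson formula to the ranks. The sketch below follows that approach in pure Python; the helper names and the sample data are assumptions for illustration. The example uses a relationship that is monotonic but not linear, so the two coefficients differ.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of equal values starting at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson r computed from sums of deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den


def spearman(xs, ys):
    """Spearman rho: Pearson r applied to the ranks of the data."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up data: y = x cubed is monotonic but not linear
xs = [1, 2, 3, 4, 5]
ys = [1, 8, 27, 64, 125]

rho = spearman(xs, ys)
r = pearson(xs, ys)
print(f"Spearman: {rho:.3f}, Pearson: {r:.3f}")
```

Because the order of the values is perfectly preserved, the Spearman coefficient reaches 1 even though the Pearson coefficient stays below it.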

The Spearman coefficient I calculated for Fastenal is 0.73, so again a high positive correlation.

Kendall Correlation

A third method for assessing the relationship between variables is the Kendall rank correlation coefficient, named after Maurice Kendall. It is used to measure the ordinal association between two measured quantities, that is, how consistently the two quantities rank their observations in the same order.
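
Ordinal association can be illustrated by counting pairs of observations. A pair is concordant when both quantities move in the same direction between the two observations, and discordant when they move in opposite directions. The sketch below computes the simple tau-a variant in pure Python on made-up data; library implementations typically use the tau-b variant, which also adjusts for ties, so results can differ when the data contains tied values.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall tau-a: (concordant - discordant) pairs divided by the total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x2 - x1) * (y2 - y1)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # Pairs tied in x or y count as neither concordant nor discordant
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Made-up data: one adjacent pair is out of order
xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]

tau = kendall_tau_a(xs, ys)
print(f"Kendall tau-a: {tau:.3f}")
```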

The Kendall coefficient for Fastenal is 0.54, which represents a positive correlation.

Wrapping Up

In this post we looked at different ways to assess the correlation between a market index and a stock's prices and went over the steps to perform this analysis in Python.

I hope you found this post worth your time. Thanks for reading.

This post contains affiliate marketing links.

Have a great day!