Using Probabilistic Machine Learning to improve your Stock Trading

Note from Towards Data Science’s editors: While we allow independent authors to publish articles in accordance with our rules and guidelines, we do not endorse each author’s contribution. You should not rely on an author’s works without seeking professional advice. See our Reader Terms for details.

Probabilistic Machine Learning goes hand in hand with Stock Trading: it uses past instances to estimate the probability of certain events happening in future instances. This can be applied directly to stock trading, to predict the direction of future stock prices.

The Concept:

This program will use Gaussian Naive Bayes to classify each sample as either an increasing or a decreasing stock price.

Because stock prices are volatile, I will not use the raw closing price as the input, but rather the ratio between consecutive closing prices. To understand how the program works, we must first understand the underlying algorithm at play:

What is Gaussian Naive Bayes Classifier?

Gaussian Naive Bayes is an algorithm that classifies data by fitting a Gaussian distribution (also called a Normal distribution) to each feature within each class, and then combining those distributions using Bayes' theorem.
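As a rough illustration of how such a classifier works, the sketch below fits one Gaussian per feature and class on a tiny made-up dataset, then picks the class with the highest score for a new point. The data, the function names `fit` and `predict`, and the use of class priors are my assumptions for this example; the program later in this article is organized differently.

```python
import numpy as np
from scipy import stats

# Toy data (made up for illustration): two features per sample, two classes.
X = np.array([[1.0, 2.1], [1.2, 1.9], [0.8, 2.0],
              [3.0, 4.2], [3.2, 3.8], [2.9, 4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

def fit(X, y):
    """Estimate a prior plus a per-feature mean and std for each class."""
    params = {}
    for label in np.unique(y):
        rows = X[y == label]
        params[int(label)] = {
            "prior": len(rows) / len(X),
            "mean": rows.mean(axis=0),
            "std": rows.std(axis=0),
        }
    return params

def predict(params, x):
    """Return the class with the largest prior * product of Gaussian densities."""
    scores = {}
    for label, p in params.items():
        likelihood = np.prod(stats.norm(p["mean"], p["std"]).pdf(x))
        scores[label] = p["prior"] * likelihood
    return max(scores, key=scores.get)

params = fit(X, y)
prediction = predict(params, np.array([1.1, 2.0]))
print(prediction)
```

The "naive" part is the product inside `predict`: each feature's density is multiplied in as if the features were independent of one another.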

Advantages:

Works on small datasets

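Unlike a fully connected neural network, in which every neuron in a layer is connected to every neuron in the next, Naive Bayes assumes that the features are independent of one another. Each feature needs only a mean and a standard deviation per class, so there are few parameters to estimate.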

Not computationally intensive

The parameters of the Naive Bayes Classifier are computed once from the data and do not change every iteration, unlike the weights that power a Neural Network. This makes the algorithm much less computationally intensive.

Disadvantages:

Fails at learning Big Data

When there is enough data to optimize all of its parameters, the complex mapping of a Neural Network outperforms the simple architecture of the Naive Bayes algorithm.

The Code:

With a better understanding of how the Gaussian Naive Bayes algorithm works, let’s get to the program:

Step 1| Prerequisites:

import numpy as np
import yfinance
from scipy import stats

aapl = yfinance.download('AAPL','2016-1-1','2020-1-1')

These are the three libraries that I will use for the project: yfinance is for downloading stock data, numpy is for calculating means and standard deviations, and scipy is for creating Gaussian distributions.

I downloaded Apple stock data, from 2016 to 2020, for reproducible results.

Step 2| Converting to Gaussian Distributions:

def calculate_prereq(values):
    std = np.std(values)
    mean = np.mean(values)
    return std,mean

def calculate_distribution(mean,std):
    norm = stats.norm(mean, std)
    return norm

def extrapolate(norm,x):
    return norm.pdf(x)

def values_to_norm(dicts):
    for dictionary in dicts:
        for term in dictionary:
            std,mean = calculate_prereq(dictionary[term])
            norm = calculate_distribution(mean,std)
            dictionary[term] = norm
    return dicts

The “calculate_prereq” function calculates the standard deviation and the mean: the two values needed to define a Gaussian distribution.

I could write the function that creates a Gaussian distribution from scratch, but scipy’s functions are highly optimized and would therefore work better on datasets with more features.

Gaussian distributions are a common approximation for real-world data that clusters around an average. Take the example of IQ test scores. The average IQ score is 100, so the peak of the Gaussian distribution is at 100. Toward both ends of the spectrum, the number of people decreases as the scores become more extreme. With a Gaussian distribution, one can estimate how likely a person is to get a certain score and therefore gain insight into it.
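To make the IQ example concrete, here is a small sketch using scipy's `stats.norm`. The mean of 100 comes from the example; the standard deviation of 15 is my assumption for illustration.

```python
from scipy import stats

# Mean 100 from the IQ example; a standard deviation of 15 is an assumption.
iq = stats.norm(100, 15)

# The density peaks at the mean and falls off toward both extremes.
density_at_100 = iq.pdf(100)
density_at_145 = iq.pdf(145)

# Probability of a score between 85 and 115 (one standard deviation each way).
within_one_std = iq.cdf(115) - iq.cdf(85)

print(density_at_100, density_at_145, within_one_std)
```

Note that `pdf` returns a density rather than a probability. Probabilities come from areas under the curve, which is what the difference of the two `cdf` values computes.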

Step 3| Compare Possibilities:

def compare_possibilities(dicts,x):
    probabilities = []
    for dictionary in dicts:
        dict_probs = []
        for i in range(len(x)):
            value = x[i]
            dict_probs.append(extrapolate(dictionary[i],value))
        probabilities.append(np.prod(dict_probs))
    return probabilities.index(max(probabilities))

This function runs through the dictionaries (one per class) and, for each, multiplies together the Gaussian densities of the ratios between the closing prices of the last ten days. This gives a score for how likely those ratios are if the price is about to increase or about to drop. It then returns the index, within the list of dictionaries, of the class with the highest score.
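One practical caveat: multiplying many small densities can underflow to zero when there are many features. A common workaround, sketched here, is to add log-densities instead of multiplying densities. The sketch assumes the same layout as above (a list of dictionaries mapping a feature index to a frozen scipy distribution); the name `compare_log_possibilities` and the toy distributions are mine, for illustration only.

```python
import numpy as np
from scipy import stats

def compare_log_possibilities(dicts, x):
    """Return the index of the class with the highest summed log-density."""
    log_scores = []
    for dictionary in dicts:
        # logpdf avoids the underflow that a long product of pdf values can hit
        log_scores.append(sum(dictionary[i].logpdf(value)
                              for i, value in enumerate(x)))
    return int(np.argmax(log_scores))

# Toy stand-ins for two fitted classes: index 0 = increase, index 1 = drop.
toy_increase = {0: stats.norm(1.01, 0.01), 1: stats.norm(1.01, 0.01)}
toy_drop = {0: stats.norm(0.99, 0.01), 1: stats.norm(0.99, 0.01)}

result = compare_log_possibilities([toy_increase, toy_drop], [1.012, 1.008])
print(result)
```

Because the logarithm is monotonic, the class with the largest sum of log-densities is the same class that would win with the product, as long as the product does not underflow.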

Step 4| Run the Program:

close = aapl['Close'].to_numpy().ravel()  # plain 1-D array of closing prices

drop = {}
increase = {}
for day in range(10,len(close)-1):
    previous_close = close[day-10:day]
    ratios = []
    for i in range(1,len(previous_close)):
        ratios.append(previous_close[i]/previous_close[i-1])
    if close[day+1] > close[day]:
        for i in range(len(ratios)):
            # keep the first ratio too, instead of starting from an empty tuple
            increase[i] = increase.get(i, ()) + (ratios[i],)
    elif close[day+1] < close[day]:
        for i in range(len(ratios)):
            drop[i] = drop.get(i, ()) + (ratios[i],)

# ratios for the most recent window: this is the sample to classify,
# so it is not added to either class
new_close = close[-11:-1]
ratios = []
for i in range(1,len(new_close)):
    ratios.append(new_close[i]/new_close[i-1])

X = ratios
print(X)
dicts = [increase,drop]
dicts = values_to_norm(dicts)
print(compare_possibilities(dicts,X))

This last part runs all the functions together. It collects the 9 ratios between consecutive closing prices over a 10-day window, fits the distributions, and returns whether the price is predicted to increase or drop. The value it returns is the index of the dictionary in the list dicts: 0 means the price is predicted to increase, and 1 means it is predicted to drop.
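The day-over-day ratios that serve as features can also be computed without an explicit loop. The sketch below uses a made-up price list in place of downloaded data, and the helper name `close_ratios` is hypothetical.

```python
import numpy as np

def close_ratios(closes):
    """Ratio of each closing price to the one before it."""
    closes = np.asarray(closes, dtype=float)
    return closes[1:] / closes[:-1]

# Ten made-up closing prices give nine ratios, matching the window size used above.
window = [100.0, 101.0, 99.0, 99.0, 102.0, 103.0, 103.0, 101.0, 104.0, 105.0]
ratios = close_ratios(window)
print(len(ratios), ratios[0])
```

A ratio above 1 means the price rose from one day to the next, and a ratio below 1 means it fell.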

Conclusion:

This program is just the basic framework of a Gaussian Naive Bayes algorithm. Here are a few ways that you can improve my program:

Increase the number of features

You can include features such as volume and opening price, to increase the scope of the data. However, an overload of data could cause Gaussian Naive Bayes to be less effective, as it does not perform well with big data.
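As one possible starting point for adding volume, the sketch below builds a single feature vector from close ratios and volume ratios. The helper `window_features` and the sample numbers are hypothetical. Because the dictionaries in this program are keyed by feature index, a longer vector like this could in principle be passed through the same functions.

```python
import numpy as np

def window_features(closes, volumes):
    """Concatenate day-over-day close ratios and volume ratios into one vector."""
    closes = np.asarray(closes, dtype=float)
    volumes = np.asarray(volumes, dtype=float)
    price_part = closes[1:] / closes[:-1]
    volume_part = volumes[1:] / volumes[:-1]
    return np.concatenate([price_part, volume_part])

# Made-up four-day window: three close ratios followed by three volume ratios.
features = window_features([100.0, 102.0, 101.0, 103.0],
                           [1000.0, 1500.0, 1200.0, 1200.0])
print(features)
```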

Link to Alpaca API

The Alpaca API is a great platform for testing trading strategies. Try linking this program to it to make buy or sell trades based on the predictions of the model!

My links:

If you want to see more of my content, click this link.
