Serafeim Loukas, PhD

Time-Series Forecasting: Predicting Stock Prices Using An ARIMA Model

In this post I show you how to predict the TESLA stock price using a forecasting ARIMA model

ARIMA model performance on the test set

1. Introduction

1.1. Time-series & forecasting models

Time-series forecasting models predict future values from previously observed values. They are widely used for non-stationary data, that is, data whose statistical properties (e.g. the mean and standard deviation) are not constant but vary over time.

The input to these models is called a time-series. Examples include temperature values over time, a stock price over time, or the price of a house over time. In other words, the input is a signal defined by observations taken sequentially in time.
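To make the idea of non-stationarity concrete, here is a small sketch on synthetic data (not the TESLA prices). It compares white noise with a random walk built from the same noise; the seed, series length and the helper name half_stats are arbitrary choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

# White noise: mean and spread stay roughly constant over time (stationary)
white_noise = rng.normal(loc=0.0, scale=1.0, size=1000)

# Random walk: running sum of the noise, so its level drifts (non-stationary)
random_walk = np.cumsum(white_noise)

def half_stats(series):
    """Return (mean, std) of the first half and of the second half of a series."""
    first, second = np.array_split(series, 2)
    return (first.mean(), first.std()), (second.mean(), second.std())

print("white noise:", half_stats(white_noise))
print("random walk:", half_stats(random_walk))
```

For the white noise the two halves should look alike, while for the random walk the mean and standard deviation typically differ noticeably between the halves.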

A time series is a sequence of observations taken sequentially in time.

An example of a time-series. Plot created by the author in Python.

Observation: Time-series data is recorded on a discrete time scale.

Disclaimer: There have been attempts to predict stock prices using time-series analysis algorithms, but they still cannot be relied on to place bets in the real market. This is just a tutorial article that does not intend in any way to “direct” people into buying stocks.

2. The AutoRegressive Integrated Moving Average (ARIMA) model

A famous and widely used forecasting method for time-series prediction is the AutoRegressive Integrated Moving Average (ARIMA) model. ARIMA models are capable of capturing a suite of different standard temporal structures in time-series data.

Terminology

Let’s break down these terms:

  • AR: < Auto Regressive > means that the model uses the dependent relationship between an observation and some predefined number of lagged observations (also known as “time lag” or “lag”).
  • I: < Integrated > means that the model employs differencing of raw observations (e.g. it subtracts the observation at the previous time step from the current observation) in order to make the time-series stationary.
  • MA: < Moving Average > means that the model exploits the relationship between an observation and the residual errors of a moving average model applied to lagged observations.
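The "I" and "AR" parts can be illustrated by hand on a few numbers. This is only a sketch: the prices and the two coefficients in phi are made up, and the MA part is left out for brevity.

```python
import numpy as np

prices = np.array([10.0, 12.0, 15.0, 14.0, 18.0])  # made-up values

# "I": first-order differencing, each value minus the previous one
diffed = np.diff(prices)  # [2, 3, -1, 4]

# "AR": predict the next difference as a weighted sum of the last p differences
phi = np.array([0.5, 0.2])  # hypothetical coefficients for lag 1 and lag 2
next_diff = phi[0] * diffed[-1] + phi[1] * diffed[-2]  # 0.5*4 + 0.2*(-1) = 1.8

# Undo the differencing to get back to the price scale
next_price = prices[-1] + next_diff  # 18 + 1.8 = 19.8
print(diffed, next_diff, next_price)
```

In a real model the coefficients are estimated from the data rather than chosen by hand.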

Model parameters

A standard ARIMA model takes three parameters: p, d and q.

  • p is the number of lag observations.
  • d is the degree of differencing.
  • q is the size/width of the moving average window.
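For reference, the three parameters map onto the usual textbook form of the model as sketched below. The symbols c, \phi_i, \theta_j and \varepsilon_t are notation introduced here, not taken from the article. With d = 1 the model is written for the differenced series y'_t:

```latex
\begin{align}
y'_t &= y_t - y_{t-1} \\
y'_t &= c + \sum_{i=1}^{p} \phi_i \, y'_{t-i} + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j} + \varepsilon_t
\end{align}
```

Here \phi_i are the AR coefficients, \theta_j the MA coefficients and \varepsilon_t the residual error at time t. When q = 0 the second sum disappears and only the autoregressive terms remain.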


3. Getting the stock price history data

Thanks to Yahoo finance we can get the data for free. Use the following link to get the stock price history of TESLA: https://finance.yahoo.com/quote/TSLA/history?period1=1436486400&period2=1594339200&interval=1d&filter=history&frequency=1d

On that page, click Download and save the .csv file locally on your computer.

The data cover 2015 up to the time of writing (2020).
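The date range is encoded in the link itself. Assuming that period1 and period2 are Unix timestamps in seconds (an assumption about the URL format, not something stated on the page), they can be decoded with the standard library. The helper name to_utc_date is made up for this sketch.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse, parse_qs

url = ("https://finance.yahoo.com/quote/TSLA/history"
       "?period1=1436486400&period2=1594339200"
       "&interval=1d&filter=history&frequency=1d")

# Split the query string into a dict of parameter name -> list of values
query = parse_qs(urlparse(url).query)

def to_utc_date(seconds_text):
    """Interpret a string of Unix seconds as a UTC calendar date."""
    return datetime.fromtimestamp(int(seconds_text), tz=timezone.utc).date()

start_date = to_utc_date(query["period1"][0])
end_date = to_utc_date(query["period2"][0])
print(start_date, end_date)  # 2015-07-10 2020-07-10
```

Changing the two timestamps in the link should select a different period, under the same assumption.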

4. Python working example

Modules needed: NumPy, pandas, Matplotlib, statsmodels, scikit-learn

4.1. Load & inspect the data

Our imports:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import lag_plot
# Recent statsmodels releases expose ARIMA here; the old statsmodels.tsa.arima_model version was removed in 0.13
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error

Now let’s load the TESLA stock history data:

df = pd.read_csv("TSLA.csv")
df.head(5)

  • Our target variable will be the Close value.

Before building the ARIMA model, let’s see if there is some autocorrelation in our data.

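plt.figure()
# Lag plot of the Open price: each value against the value 3 rows earlier
lag_plot(df['Open'], lag=3)
plt.title('TESLA Stock - Autocorrelation plot with lag = 3')
plt.show()

TESLA Stock - Autocorrelation plot with lag = 3

The lag plot shows a clear relationship between each value and the value three time steps earlier, i.e. there is autocorrelation in the data. This suggests that an autoregressive model such as ARIMA is a reasonable choice for this type of data.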
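The visual impression from a lag plot can be backed by a number. The sketch below computes the lag-k autocorrelation as a plain Pearson correlation, using synthetic series instead of the TESLA data; the function name lag_autocorrelation and the seed are choices made for this example.

```python
import numpy as np

def lag_autocorrelation(values, lag):
    """Pearson correlation between a series and itself shifted by `lag` steps."""
    values = np.asarray(values, dtype=float)
    return float(np.corrcoef(values[:-lag], values[lag:])[0, 1])

rng = np.random.default_rng(0)
noise = rng.normal(size=1000)   # no memory: autocorrelation near 0
walk = np.cumsum(noise)         # price-like series: autocorrelation near 1

noise_corr = lag_autocorrelation(noise, lag=3)
walk_corr = lag_autocorrelation(walk, lag=3)
print("white noise:", noise_corr)
print("random walk:", walk_corr)
```

On the stock data the same helper could be called as lag_autocorrelation(df['Open'], lag=3); pandas also offers Series.autocorr(lag=3) for the same purpose.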

Finally, let’s plot the stock price evolution over time.

# Dates are read as strings, so label only every 200th row to keep the x-axis readable
plt.plot(df["Date"], df["Close"])
plt.xticks(np.arange(0, len(df), 200), df['Date'][0:len(df):200])
plt.title("TESLA stock price over time")
plt.xlabel("time")
plt.ylabel("price")
plt.show()

4.2. Build the predictive ARIMA model

Next, let’s divide the data into a training (70%) and a test (30%) set. For this tutorial we select the following ARIMA parameters: p=4, d=1 and q=0.

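# Chronological split: first 70% of the rows for training, last 30% for testing
split_index = int(len(df)*0.7)
train_data, test_data = df[:split_index], df[split_index:]
training_data = train_data['Close'].values
test_data = test_data['Close'].values

# Current ARIMA class, imported here so that this block runs on its own
# (the legacy statsmodels.tsa.arima_model.ARIMA and its fit(disp=0) argument are gone in recent releases)
from statsmodels.tsa.arima.model import ARIMA

history = list(training_data)
model_predictions = []
N_test_observations = len(test_data)
for time_point in range(N_test_observations):
    # Refit on every observation seen so far, then forecast one step ahead
    model = ARIMA(history, order=(4,1,0))
    model_fit = model.fit()
    output = model_fit.forecast()
    yhat = output[0]
    model_predictions.append(yhat)
    # Reveal the true value so that the next forecast can use it
    true_test_value = test_data[time_point]
    history.append(true_test_value)

MSE_error = mean_squared_error(test_data, model_predictions)
print('Testing Mean Squared Error is {}'.format(MSE_error))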

Summary of the code

  • We split the dataset into train and test sets, use the train set to fit the model, and generate a prediction for each element of the test set.
  • A rolling forecasting procedure is required given the dependence on observations in prior time steps for differencing and the AR model. To this end, we re-create the ARIMA model after each new observation is received.
  • Finally, we manually keep track of all observations in a list called history that is seeded with the training data and to which new observations are appended at each iteration.
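The rolling procedure summarized above can be sketched with NumPy alone. To stay self-contained, this sketch replaces the statsmodels ARIMA fit with a plain least-squares fit of an AR(p) model on first differences, which only approximates an ARIMA(p,1,0) model. All function names and the synthetic price series are made up for the illustration.

```python
import numpy as np

def fit_ar_on_differences(history, p):
    """Least-squares fit of an AR(p) model with intercept to the first differences."""
    diffs = np.diff(np.asarray(history, dtype=float))
    # Each row: a constant 1 followed by the p previous differences, most recent first
    rows = [np.r_[1.0, diffs[t - p:t][::-1]] for t in range(p, len(diffs))]
    coeffs, *_ = np.linalg.lstsq(np.array(rows), diffs[p:], rcond=None)
    return coeffs

def forecast_next(history, coeffs, p):
    """One-step-ahead forecast: last value plus the predicted next difference."""
    diffs = np.diff(np.asarray(history, dtype=float))
    features = np.r_[1.0, diffs[-p:][::-1]]
    return history[-1] + features @ coeffs

def walk_forward(series, train_fraction=0.7, p=4):
    """Refit after every new observation and forecast one step ahead each time."""
    split = int(len(series) * train_fraction)
    history = list(series[:split])       # seeded with the training data
    predictions = []
    for actual in series[split:]:
        coeffs = fit_ar_on_differences(history, p)
        predictions.append(forecast_next(history, coeffs, p))
        history.append(actual)           # the true value becomes part of the history
    return np.array(predictions), np.asarray(series[split:], dtype=float)

rng = np.random.default_rng(42)
synthetic_prices = 100 + np.cumsum(rng.normal(0.1, 1.0, size=200))
preds, actuals = walk_forward(synthetic_prices)
print("MSE on synthetic data:", np.mean((preds - actuals) ** 2))
```

The structure mirrors the loop in the article: a history list that grows by one true observation per iteration, and a fresh fit before each forecast.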

Testing Mean Squared Error is 741.0594879572484

The MSE on the test set is quite large, which shows that precise prediction is a hard problem. Keep in mind that the MSE is an average of squared errors over all test set predictions, so it is expressed in squared price units; its square root (RMSE) is about 27.2, in the same units as the price. Let’s visualize the predictions to better understand the performance of the model.
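One way to judge whether an error of this size is large is to compare it with a naive baseline that simply repeats the previous actual value (a persistence forecast). The sketch below is not part of the original analysis; the helper names and the toy numbers are made up.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error, in the same units as the data."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def persistence_forecast(last_train_value, actual):
    """Predict each test value as the actual value one step earlier."""
    actual = np.asarray(actual, dtype=float)
    return np.r_[last_train_value, actual[:-1]]

toy_actual = [10.0, 12.0, 11.0, 13.0]               # made-up test values
toy_naive = persistence_forecast(9.0, toy_actual)   # [9, 10, 12, 11]
print(toy_naive, rmse(toy_actual, toy_naive))
```

With the variables from the article, the baseline would be persistence_forecast(training_data[-1], test_data), and its RMSE could be set against rmse(test_data, model_predictions).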

# Index positions of the test set (the last 30% of the rows)
split_index = int(len(df)*0.7)
test_set_range = df[split_index:].index
plt.plot(test_set_range, model_predictions, color='blue', marker='o', linestyle='dashed', label='Predicted Price')
plt.plot(test_set_range, test_data, color='red', label='Actual Price')
plt.title('TESLA Prices Prediction')
plt.xlabel('Date')
plt.ylabel('Prices')
# Label every 50th test row with its date
plt.xticks(np.arange(split_index, len(df), 50), df.Date[split_index:len(df):50])
plt.legend()
plt.show()

ARIMA model performance on the test set

Not so bad, right?

Our ARIMA model gives appreciable results. It offers good prediction accuracy and is relatively fast compared to other alternatives, in terms of training/fitting time and complexity. Note that these are one-step-ahead forecasts: each prediction is made after the model has seen all the actual prices up to the previous time step.

That’s all folks! Hope you liked this article!

Stay tuned & support this effort

If you liked and found this article useful, follow me to be able to see all my new posts.

Questions? Post them as a comment and I will reply as soon as possible.

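Get in touch with me

  • LinkedIn: https://www.linkedin.com/in/serafeim-loukas/
  • ResearchGate: https://www.researchgate.net/profile/Serafeim_Loukas
  • EPFL profile: https://people.epfl.ch/serafeim.loukas
  • Stack Overflow: https://stackoverflow.com/users/5025009/seralouk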
