A Practical Guide to ARIMA Models using PyCaret — Part 3

Understanding the Difference Term

📚 Introduction

In the previous article in this series, we saw the impact of the trend term on the output of an ARIMA model. This article will look at the “difference” term “d” and see how this is modeled and what it represents.

📖 Suggested Previous Reads

The previous articles in this series can be found below. I would recommend that readers go through them first before continuing with this article. This article builds upon the concepts described in the previous ones and reuses some of the work done in them.

A Practical Guide to ARIMA Models using PyCaret — Part 1

A Practical Guide to ARIMA Models using PyCaret — Part 2

1️⃣ “Differencing — d” Overview in ARIMA Models

At a very high level, differencing means that the model works with the change between consecutive values of a time series rather than with the values themselves. Equivalently, the value at any given point in time depends on the value(s) at previous point(s) in time. A difference “d = 1” means that the value at any point in time is the value at the previous point in time plus a noise term (given by equation 1). The epsilon term represents this noise, which cannot be modeled.

ARIMA Equations for d=1 (Image by Author using https://latex2png.com/)
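
The equation image is not reproduced in this text version. As a sketch, assuming the standard ARIMA(0,1,0) form without a trend term (the symbols y_t and ε_t are assumed notation and may differ from the image), equation 1 and its differenced form can be written as:

```latex
\begin{align}
y_t &= y_{t-1} + \varepsilon_t \tag{1} \\
y_t - y_{t-1} &= \varepsilon_t \nonumber
\end{align}
```

The second line is the same statement after differencing once: the first difference of the series is pure noise.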

Important Side Note: The process that generates a time series using equation 1 is also called a “Random Walk”. Stock prices are often described as following this pattern. If you look closely, you will realize that when this is modeled correctly, the best prediction of a future point is the same as the last known point. Hence, traditional approaches like ARIMA do not produce “useful” models of stock prices. Do we really need a model to tell us that tomorrow’s stock price will be the same as today’s stock price?
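
To make the side note concrete, here is a minimal sketch that compares two one-step-ahead forecasts on a simulated random walk: the last known value, and the mean of all past values. The simulated series, the function names, and the choice of the history mean as the alternative forecast are assumptions for illustration only.

```python
import random


def simulate_random_walk(n, seed=42):
    """Cumulative sum of unit-variance Gaussian noise, starting from 0."""
    rng = random.Random(seed)
    y, level = [], 0.0
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        y.append(level)
    return y


def one_step_mse(y, forecaster):
    """Mean squared error of one-step-ahead forecasts built from the history y[:t]."""
    errors = [(y[t] - forecaster(y[:t])) ** 2 for t in range(1, len(y))]
    return sum(errors) / len(errors)


y = simulate_random_walk(500)

# Forecast 1: the last known value (what a random walk model predicts).
mse_last_value = one_step_mse(y, lambda history: history[-1])

# Forecast 2: the mean of everything seen so far (a hypothetical alternative).
mse_history_mean = one_step_mse(y, lambda history: sum(history) / len(history))

print(f"last-value forecast MSE:   {mse_last_value:.3f}")
print(f"history-mean forecast MSE: {mse_history_mean:.3f}")
```

On a series like this, the last-value forecast should have the much smaller error, which is why a correctly specified random walk model adds little beyond "tomorrow looks like today".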

2️⃣️ Understanding the Difference Term using PyCaret

👉 Step 1: Setup PyCaret Time Series Experiment

To understand this concept better, we will use a random walk dataset from the pycaret data playground. Details can be found in the Jupyter notebook for this article (available at the end of the article).

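#### Imports (assumed: paths from the pycaret time series alpha release) ----
from pycaret.datasets import get_data
from pycaret.internal.pycaret_experiment import TimeSeriesExperiment

#### Get data from data playground ----
y = get_data("1", folder="time_series/random_walk")

#### Set up the experiment: no seasonality, 30-step forecast horizon ----
exp = TimeSeriesExperiment()
exp.setup(data=y, seasonal_period=1, fh=30, session_id=42)

exp.plot_model()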

👉 Step 2: Perform EDA

A classical way to diagnose whether a time series was generated by a random walk process is to look at the ACF and PACF plots. The ACF plot will show extended auto-correlations that decay slowly across many lags [1]. The PACF plot should show a peak at lag = 1 with a magnitude very close to 1, and all other lags should be insignificant. You can think about the PACF magnitude as the coefficient of the lagged value y(t-1) in equation 1. I will write more about this in another article.

exp.plot_model(plot="acf")
exp.plot_model(plot="pacf")

Random Walk Dataset ACF Plot [Image by Author]

Random Walk Dataset PACF Plot [Image by Author]
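
The same diagnostic can be sketched without any plotting library. The example below simulates a random walk (an assumption; it is not the playground dataset) and computes the sample autocorrelation at a few lags, both for the series itself and for its first difference. At lag 1 the partial autocorrelation equals the autocorrelation, so the lag 1 value is also what the PACF peak would show.

```python
import random


def simulate_random_walk(n, seed=42):
    """Cumulative sum of unit-variance Gaussian noise, starting from 0."""
    rng = random.Random(seed)
    y, level = [], 0.0
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        y.append(level)
    return y


def sample_acf(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom


y = simulate_random_walk(2000)
diffs = [y[t] - y[t - 1] for t in range(1, len(y))]  # first difference (d = 1)

print("lag  acf(series)  acf(first difference)")
for lag in (1, 5, 10):
    print(f"{lag:>3}  {sample_acf(y, lag):>11.3f}  {sample_acf(diffs, lag):>21.3f}")
```

The series should show autocorrelations that stay high across lags, while the differenced series should show values close to zero, which is the signature of a random walk.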

👉 Step 3: Theoretical Calculations

For a random walk model, we can use equation 1 to guide us in calculating theoretical values. Essentially, the value at the next time point is predicted to be the last “known” value. For in-sample predictions (i.e. predictions on the training dataset), this will change at every point in time since the last known point at t = 1 is not the same as the last “known” point at t = 10 (assuming both t=1 and t=10 are in-sample).

For the out-of-sample predictions (i.e. predictions in the unknown cross-validation/test dataset), the best future prediction will be the last known data point. This remains the same no matter how far into the future we predict. This is an important distinction between in-sample and out-of-sample predictions for a random walk.

Theoretical In-sample (Train) and Out-of-Sample (Test) Predictions [Image by Author]
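
The theoretical calculation described above can be sketched in a few lines. The function names and the sample values are hypothetical, and leaving the first in-sample point without a prediction (None) is a choice made here for clarity, not necessarily what a fitted model reports.

```python
def in_sample_predictions(train):
    """One-step-ahead predictions: each point is predicted by the previous observed value."""
    # The first point has no predecessor, so it gets no prediction here.
    return [None] + list(train[:-1])


def out_of_sample_predictions(train, horizon):
    """Flat forecast: every future step is predicted as the last observed value."""
    return [train[-1]] * horizon


train = [10.0, 10.8, 10.1, 11.3, 12.0]  # made-up values for illustration

print(in_sample_predictions(train))                 # changes at every time point
print(out_of_sample_predictions(train, horizon=3))  # constant across the horizon
```

The in-sample list shifts the training data by one step, while the out-of-sample list repeats a single value, which is the distinction the text draws.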

👉 Step 4: Build the Model with PyCaret

#### Random Walk Model (without trend) ----
model3a = exp.create_model(
    "arima",
    order=(0, 1, 0),
    seasonal_order=(0, 0, 0, 0),
    trend="n"
)

👉 Step 5: Analyze the Results

We will reuse the same helper functions that we created in the previous articles to analyze the results.

summarize_model(model3a)

Random Walk Model Statistical Summary [Image by Author]

The statistical summary shows that the created model is a SARIMAX(0,1,0) model, which matches our intent to build a model with d=1. The residual sigma2 (unexplained variance) is 0.9720; it is the estimated variance of the epsilon term in equation 1.

get_residual_properties(model3a)

Looking at the model residuals, we can see that the residuals indeed have a variance of 0.9720, which matches the statistical summary. Next, let’s plot the predictions and compare them to our theoretical framework.
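
For a random walk model without trend, the residual at each step is the observed value minus the previous value, so the residual variance can be approximated directly from the first differences. The sketch below uses simulated data with unit noise variance (an assumption, not the playground dataset) and the mean of squared differences as a simple estimate; the exact estimator used by the fitted model may differ.

```python
import random


def simulate_random_walk(n, seed=42):
    """Cumulative sum of unit-variance Gaussian noise, starting from 0."""
    rng = random.Random(seed)
    y, level = [], 0.0
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        y.append(level)
    return y


def random_walk_residuals(y):
    """Residuals of the last-value prediction, i.e. the first differences of y."""
    return [y[t] - y[t - 1] for t in range(1, len(y))]


def mean_square(values):
    """Mean of squared values; a simple variance estimate for zero-mean residuals."""
    return sum(v * v for v in values) / len(values)


y = simulate_random_walk(5000)
residuals = random_walk_residuals(y)
sigma2 = mean_square(residuals)

print(f"estimated sigma2: {sigma2:.4f}")
```

Because the simulated noise has variance 1, the estimate should land close to 1, in the same way that the sigma2 in the summary reflects the noise variance of the dataset.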

plot_predictions(model3a)

Out-of-Sample Predictions [Image by Author]

Zoomed Out-of-Sample Predictions [Image by Author]

The out-of-sample predictions match our theoretical calculations, i.e. the predictions are the same as the last known data point (in this case the value at point 309). The ability to zoom into the interactive plots in pycaret makes it easy to analyze the results and gain better intuition into the working of the model.

Zoomed In-Sample Predictions [Image by Author]

Similarly, we can inspect the in-sample predictions. Zooming into the plot shows that the prediction at any given point in time is the same as the last known point. And since the last known data point changes from one time point to the next, the in-sample prediction also changes from one point to the next. This also matches our theoretical calculations.

👉 Step 6: Checking the Model Fit

This is also a good time to introduce the concept of “model fit”. Checking the model fit essentially means checking whether the model has captured all the “information” in the time series. This is the case when the model residuals do not have any trend, seasonality, or auto-correlations, i.e. the residuals are “white noise”. pycaret provides a very handy feature to check model fit: we can check the white noise characteristics of a model’s residuals by passing the model to the check_stats method as follows.

exp.check_stats(model3a, test="white_noise")

Model Residual White Noise Test [Image by Author]

Lines 4 and 5 of the test output confirm that the residuals are indeed white noise, and hence the model has captured the information from the time series well.
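
To build intuition for what a white noise check measures, here is a minimal sketch of the Ljung-Box Q statistic, one common test for remaining auto-correlation. Whether check_stats uses exactly this statistic is an assumption here; the simulated data, the choice of 10 lags, and the function names are illustrative only.

```python
import random


def sample_acf(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom


def ljung_box_q(x, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(x)
    total = sum(sample_acf(x, k) ** 2 / (n - k) for k in range(1, max_lag + 1))
    return n * (n + 2) * total


rng = random.Random(42)
noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # stands in for model residuals

# Accumulate the noise into a random walk to get a series that is NOT white noise.
walk, level = [], 0.0
for e in noise:
    level += e
    walk.append(level)

CHI2_95_DF10 = 18.307  # 95% quantile of the chi-square distribution, 10 degrees of freedom

q_noise = ljung_box_q(noise, 10)
q_walk = ljung_box_q(walk, 10)
print(f"Q for noise: {q_noise:.1f} (white noise is not rejected if below {CHI2_95_DF10})")
print(f"Q for walk:  {q_walk:.1f}")
```

A small Q means the auto-correlations are jointly indistinguishable from zero, which is the sense in which residuals are "white noise"; the undifferenced walk produces a Q far above the threshold.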

🚀 Conclusion

Hopefully, this simple model has laid a good foundation for understanding the inner workings of the difference term “d” in an ARIMA model. In the next article, we will see how to combine the difference term “d” with the “trend” component that we learned about in the previous article in this series. Until then, if you would like to connect with me on my social channels (I post about Time Series Analysis frequently), you can find me below. That’s it for now. Happy forecasting!

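🔗 LinkedIn: https://www.linkedin.com/in/guptanick/

🐦 Twitter: https://twitter.com/guptanick13

📘 GitHub: https://github.com/ngupta23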

Loved the article? Become a Medium member to continue learning without limits. I’ll receive a portion of your membership fee if you use the following link, with no extra cost to you.

Join Medium with my referral link (Nikhil Gupta): https://ngupta13.medium.com/membership

📗 Resources

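Jupyter Notebook containing the code for this article: https://nbviewer.ipython.org/github/ngupta23/medium_articles/blob/main/time_series/pycaret/pycaret_ts_arima_010_0000.ipynb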

📚 References

[1] Time Series Exploratory Analysis | Autocorrelation Function (ACF): https://github.com/pycaret/pycaret/discussions/1765