Tracyrenee



A Comparison of Holt Winters vs ARIMA When Predicting on Female Births in California

Two very common time series models are the Holt Winters and the ARIMA methods.

The Holt Winters method is a simple time series forecasting method to use because triple exponential smoothing is built into it, so level, trend and seasonality are modelled together. It is therefore best suited to a time series that has trend and seasonality in it.

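ARIMA, on the other hand, expects a time series that is stationary. Trend can be removed by the model's own differencing step, but plain ARIMA has no seasonal component, so any seasonality must be removed before the model will work in the manner that it should. The ARIMA model has three components in it, being:-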

  1. Autoregression,
  2. Moving average, and
  3. Integration (or differencing).
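The integration (differencing) component is the easiest of the three to illustrate. The snippet below is a minimal sketch on a made-up series, not part of the original experiment: first-order differencing turns a steady upward trend into a constant.

```python
import numpy as np

# Made-up series with a steady upward trend (illustrative values only).
series = np.array([10.0, 12.0, 14.0, 16.0, 18.0])

# First-order differencing (d = 1): each value minus the one before it.
differenced = np.diff(series, n=1)

print(differenced)  # every difference is 2.0, so the trend has been removed
```

In an ARIMA(p, d, q) model, d is the number of times this differencing is applied before the autoregression and moving average terms are fitted.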

I decided to conduct an experiment to find out which model works better on the California female births dataset, which records the number of daily female births in California during 1959.

I have written the program for this experiment in Google Colab, which is a free online Jupyter Notebook that is hosted by Google. Google Colab is a great website, with the only drawback being the fact it does not have a proper undo function. The only type of undo function that this program has is to undo a deleted cell. Therefore, care needs to be taken not to delete or overwrite valuable code.

After I created the program that I would need to perform this experiment, I installed the most current libraries and imported them. Another niggle that I have found with Google Colab is the fact that the program does not stay abreast of the most recent updates on the various Python libraries.

Once the most recent versions of the Python libraries I would need were installed, I imported them into the program, being:-

  1. Pandas is used to process data by creating and maintaining dataframes,
  2. Numpy is used to make numerical computations and to create numpy arrays,
  3. Math is a built-in library, used to perform mathematical computations beyond the basic arithmetic that is coded into Python,
  4. Statsmodels performs statistical and time series operations,
  5. Matplotlib is used to plot the data points onto a graph, and
  6. Seaborn is a higher level graphics package that plots graphs in a statistical fashion.

I then used pandas to read the time series CSV file into the program. I parsed the first column of the dataset as timestamps and set it as the index:-
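A minimal sketch of this loading step is shown here. The column names `Date` and `Births` are assumptions, and the three rows are made-up stand-ins for the real file, which would normally be passed to `read_csv` as a path.

```python
import io

import pandas as pd

# Stand-in for the real CSV file: three made-up rows in an assumed layout.
csv_text = "Date,Births\n1959-01-01,30\n1959-01-02,40\n1959-01-03,35\n"

# parse_dates converts the first column to timestamps;
# index_col makes that column the index of the dataframe.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

print(type(df.index).__name__)
print(df.head())
```

With a timestamp index in place, the time series functions in statsmodels can pick up the dates directly from the dataframe.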

I then checked the time series for stationarity. Under a unit root test such as the augmented Dickey-Fuller test, a time series is treated as stationary if the p-value is less than 0.05, and on this occasion it was:-

I then split the time series into training and validation sets:-
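A time series split has to keep the rows in date order rather than shuffling them. This is a rough sketch on synthetic data; the 30-day validation window is an assumed size, since the split used in the experiment is not stated here.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the daily births series covering 1959.
idx = pd.date_range("1959-01-01", "1959-12-31", freq="D")
rng = np.random.default_rng(0)
births = pd.Series(rng.poisson(40, size=len(idx)), index=idx, name="Births")

# Hold out the final n_valid days for validation (30 is an assumed size).
n_valid = 30
train = births.iloc[:-n_valid]
valid = births.iloc[-n_valid:]

print(len(train), len(valid))  # 335 30
```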

The first model I experimented with was the Holt Winters model because it provides triple exponential smoothing, which means the dataset does not need to be de-seasoned or de-trended before the model is trained and fitted on it.

After I trained and fitted the model, I checked the Akaike information criterion, or aic, because this value incorporates goodness of fit and simplicity/parsimony into a single statistic. In general, the lower this value is, the better.

I made a forecast of the model:-

I checked the root mean squared error, or rmse, of the predictions and achieved an error of 6.86:-
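The rmse is the square root of the average squared difference between the actual and predicted values. A minimal sketch on made-up numbers, not the values from the experiment:

```python
import math

import numpy as np

# Made-up actual and predicted values for illustration.
actual = np.array([40.0, 42.0, 38.0, 45.0])
predicted = np.array([41.0, 40.0, 39.0, 43.0])

# Square the errors, average them, then take the square root.
rmse = math.sqrt(np.mean((actual - predicted) ** 2))

print(f"rmse: {rmse:.2f}")  # rmse: 1.58
```

Because the errors are squared before averaging, a few large misses raise the rmse more than many small ones, and the result is in the same units as the series.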

I plotted the predictions on a graph against the time series, and it can be seen in the screenshot below:-
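A plot of this kind can be sketched with Matplotlib as follows. The data is synthetic and the predictions are a placeholder (the training mean repeated), standing in for the output of a fitted model.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the series and its train/validation split.
idx = pd.date_range("1959-01-01", "1959-12-31", freq="D")
rng = np.random.default_rng(0)
births = pd.Series(40 + rng.normal(0, 7, size=len(idx)), index=idx)
train, valid = births.iloc[:-30], births.iloc[-30:]

# Placeholder predictions: the training mean repeated over the validation dates.
predictions = pd.Series(train.mean(), index=valid.index)

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(train.index, train, label="train")
ax.plot(valid.index, valid, label="validation")
ax.plot(predictions.index, predictions, label="predictions")
ax.set_xlabel("Date")
ax.set_ylabel("Births")
ax.legend()
```

In a notebook such as Google Colab, the figure is displayed after the cell runs, so the backend line can be dropped.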

I then experimented with the time series using the ARIMA model. I noted that I had an aic measurement of 2249, being higher than Holt Winters’ reading:-

I made predictions on the trained and fitted model and plotted the results on a graph, as seen below:-

I then checked the rmse and found it was 3.16, which is a little under half of Holt Winters’ equivalent reading of 6.86:-

In conclusion, we have a bit of a conundrum here because the aic measurement favours the Holt Winters model and the rmse favours ARIMA. Considering that the time series is stationary and did not need to be de-trended or de-seasoned, I would opt for the ARIMA model. The reason for this is that the Holt Winters model is designed specifically for time series that have trend and seasonality that need exponential smoothing.

I have prepared a code review to accompany this blog post, which can be viewed here:- https://www.youtube.com/watch?v=Me_z7zHL2yo

