Simple Linear Regression with Time Series
Step-by-step follow-along | Data Series | Episode 17.2

Consider reviewing episodes on linear regression before continuing:
- Understanding Simple Linear Regression
- Understanding Multiple Linear Regression
- Understanding Polynomial Regression
Overview
There are two features we can engineer when applying simple linear regression to time series data.
1. Time-step (Date Time) feature
We index each date with an integer time step (0, 1, 2, and so on), for example:

This enables us to produce the model:

y = β·t + b

Where:
- y: our target (sales)
- β: our weight
- t: our time-step feature
- b: our bias
Time-step features enable us to model time dependence. A time series is said to be time dependent if its observations can be predicted from the time at which they occurred.
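As a small illustration of the time-step feature described above, the sketch below builds one for a toy series. The frame name `toy`, the `sales` values, and the dates are all made up for the example and are not taken from the episode's data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy series: five daily observations (values are made up).
toy = pd.DataFrame(
    {"sales": [10.0, 12.0, 11.0, 13.0, 15.0]},
    index=pd.date_range("2013-01-01", periods=5, freq="D"),
)

# Time-step feature: 0 for the first date, 1 for the second, and so on.
toy["Time"] = np.arange(len(toy.index))

print(toy)
```

Each date now carries an integer position, which is the value the regression uses as t.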
2. Lag feature
Another feature we can make for time series analysis is the lag feature.
To create it, we shift our observations so that they appear later in time; each row then holds the value observed at a previous time step.
For example:

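This enables us to produce a model similar to the one before, this time using the lag as our feature instead of time:

y = β·lag + b

Here lag is the target value from the previous time step, and β and b are again a weight and a bias.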
Lag features enable us to model serial dependence. A time series is said to have serial dependence when an observation can be predicted from past observations.
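The shifting step can be sketched on a toy series as follows. As before, the frame name `toy` and its `sales` values are invented for illustration; only the use of the pandas `shift` function mirrors the episode.

```python
import pandas as pd

# Hypothetical toy series: five daily observations (values are made up).
toy = pd.DataFrame(
    {"sales": [10.0, 12.0, 11.0, 13.0, 15.0]},
    index=pd.date_range("2013-01-01", periods=5, freq="D"),
)

# Lag feature: each row receives the previous row's observation.
# The first row has no predecessor, so its lag is NaN.
toy["Lag_1"] = toy["sales"].shift(1)

print(toy)
```

The missing value in the first row is why rows with NaN have to be dropped before fitting a model on a lag feature.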
— — — — —
In this episode we focus on applying simple linear regression to time series data to model the mean temperature in Delhi, India from 1st January 2013 to 1st January 2017.
Libraries
import pandas as pd
import warnings
import numpy as np
import matplotlib.pyplot as plt
warnings.filterwarnings("ignore")
Data Exploration
We read our data into Python using the read_csv function from pandas.
# read the data (raw string so the backslashes in the Windows path are kept as-is)
df = pd.read_csv(r"D:\ProjectData\weather_ts.csv")
# check data frame shape
df.shape
We can make use of the head function to view the first few rows of our dataframe:
df.head()

We have four variables recorded against time:
1) Mean Temperature
2) Humidity
3) Wind Speed
4) Mean Pressure
In this episode we focus on mean temperature.
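By default, read_csv leaves date columns as plain text. If real datetime values are wanted, one option is to parse them while reading. The sketch below uses a two-row in-memory stand-in for the file; the column name `date`, the names `csv_text` and `df_dates`, and the values are assumptions for illustration, so check the actual header of your CSV first.

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV file; the "date" column name and
# the values are assumptions made for this example.
csv_text = "date,meantemp\n2013-01-01,10.0\n2013-01-02,7.4\n"

# parse_dates converts the column to datetimes; index_col makes it the index.
df_dates = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")

print(df_dates.index.dtype)
```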
Adding Time-step Feature
df['Time'] = np.arange(len(df.index))
df.head()
From the above output we observe that each date has been indexed with a time step.
from sklearn.linear_model import LinearRegression
# Training data
X_ts = df[["Time"]] # feature
y_ts = df.meantemp # target
# Train the model
model_ts = LinearRegression()
model_ts.fit(X_ts, y_ts)
# Generate a series of predicted values
y_pred_ts = pd.Series(model_ts.predict(X_ts))
To obtain our model intercept and coefficient we can use the following code:
model_ts.intercept_, model_ts.coef_

This gives the model, to 3 decimal places:

We can produce a plot of our time series data using the time step feature and add our regression line:
plt.figure(figsize=(11, 4))
# Plot the data points
plt.plot(X_ts, y_ts, marker='o', markersize=2, linestyle='-', label='Actual data')
# Plot the regression line
plt.plot(X_ts, y_pred_ts, color='red', label='Regression line')
# Add labels and a legend
plt.xlabel('Time')
plt.ylabel('Mean Temperature')
plt.title('Simple Linear Regression (Time-step)')
plt.legend()
The regression line above does not capture the time dependence visible in our plot: a straight line can describe only an overall upward or downward trend, so a more complex time series model might be needed.
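One way to put a number on how little a straight line explains in a seasonal series is the coefficient of determination, R². The sketch below is an illustration on synthetic data, not the Delhi dataset: the arrays `t` and `y` and the sine-plus-trend shape are assumptions, and `np.polyfit` stands in for the LinearRegression fit used above.

```python
import numpy as np

# Synthetic stand-in for a seasonal series: a yearly sine wave plus a
# mild upward trend (not the Delhi data).
t = np.arange(4 * 365)
y = 25 + 0.002 * t + 8 * np.sin(2 * np.pi * t / 365)

# Fit a straight line y ~ slope * t + intercept by ordinary least squares.
slope, intercept = np.polyfit(t, y, deg=1)
y_hat = slope * t + intercept

# R^2 = 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"R^2 of the straight-line fit: {r2:.3f}")
```

On the real data, `model_ts.score(X_ts, y_ts)` reports the same quantity for the fitted scikit-learn model.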
Lag feature
We can shift our mean temperature values by making use of the shift function from pandas:
df['Lag_1'] = df['meantemp'].shift(1)
df.head()
From the above code we have produced a new column with the mean temperature shifted down by one row, so the first row of Lag_1 has no value (NaN).
We can proceed as before, this time removing the row with the missing value and using Lag_1 as our feature.
# Remove missing values and generate new df
df_lag = df.copy().dropna()
# Training data
X_lag = df_lag[["Lag_1"]] # feature
y_lag = df_lag.meantemp # target
# Train the model
model_lag = LinearRegression()
model_lag.fit(X_lag, y_lag)
# Generate a series of predicted values from our lag data (index aligned with X_lag)
y_lag_pred = pd.Series(model_lag.predict(X_lag), index=X_lag.index)
We can obtain the intercept and coefficient of our model:
# Obtain model intercept and coefficient
model_lag.intercept_, model_lag.coef_
This leaves us with the model, to 3 decimal places:

We can produce a scatter plot of our mean temperature against our lag feature. This can tell us if there exists a correlation between current and previous observations.
plt.figure(figsize=(4, 4))
# Plot the data points
plt.scatter(X_lag, y_lag, label='Actual data')
# Plot the regression line
plt.plot(X_lag, y_lag_pred, color='red', label='Regression line')
# Add labels and a legend
plt.xlabel('Lag_1')
plt.ylabel('Mean Temperature')
plt.title('Simple Linear Regression (Lag feature)')
plt.legend()
The above plot shows that a higher mean temperature on one day is associated with a higher mean temperature on the next day. Correlations such as these indicate that it is useful to include a lag feature in the time series model.
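The strength of this relationship can also be summarised as a single number, the lag-1 autocorrelation. The sketch below runs on a synthetic, slowly varying series rather than the Delhi data; the names `series`, `lag1_corr`, and `manual_corr` are assumptions for the example.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a slowly varying daily series (not the Delhi data).
t = np.arange(365)
series = pd.Series(20 + 8 * np.sin(2 * np.pi * t / 365))

# Correlation between the series and itself shifted by one step.
lag1_corr = series.autocorr(lag=1)

# The same quantity computed by hand from an explicit lag column.
manual_corr = series.corr(series.shift(1))

print(f"lag-1 autocorrelation: {lag1_corr:.3f}")
```

On the episode's data, the equivalent call would be `df['meantemp'].autocorr(lag=1)`; a value close to 1 would agree with the tight upward band in the scatter plot.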
Well-Built Time Series Models
Well-built time series models tend to include a combination of time-step features and lag features. In this episode we used simple linear regression, but the same features can be used in other models.
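As a rough sketch of what combining the two features could look like, the example below fits one linear model on both a time-step column and a lag column. It uses synthetic data, and NumPy's least-squares solver stands in for LinearRegression to keep the example small; the names `toy`, `A`, `w_time`, `w_lag`, and `bias` are assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in series: linear trend plus a yearly cycle (not the Delhi data).
t = np.arange(730)
toy = pd.DataFrame({"meantemp": 20 + 0.01 * t + 8 * np.sin(2 * np.pi * t / 365)})

# Build both features, then drop the first row, whose lag is missing.
toy["Time"] = np.arange(len(toy.index))
toy["Lag_1"] = toy["meantemp"].shift(1)
toy = toy.dropna()

# Design matrix with columns [Time, Lag_1, 1]; the column of ones carries the bias.
A = np.column_stack([toy["Time"], toy["Lag_1"], np.ones(len(toy))])
coef, *_ = np.linalg.lstsq(A, toy["meantemp"].to_numpy(), rcond=None)
w_time, w_lag, bias = coef

print(f"time weight={w_time:.4f}, lag weight={w_lag:.4f}, bias={bias:.4f}")
```

With scikit-learn, the same idea amounts to passing both columns as the feature matrix, for example `df_lag[["Time", "Lag_1"]]`.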





