Edwina Gu

Summary

This post introduces the Kalman Filter, a method for analyzing timeseries datasets, and provides a Python implementation using numpy.

Abstract

The article focuses on the application of the Kalman Filter, a method for analyzing timeseries datasets, with a focus on stock market datasets. The author provides a brief history and details of the Kalman Filter, along with a Python implementation using numpy. The implementation is demonstrated with an example of a true model in the form of y = 3 with random white noise, and the results are compared to a simple linear regression model. The Kalman Filter is also applied to the SPY dataset between 2015 and 2020, demonstrating its ability to quickly follow trends even with a bad initial guess. The author notes that in-sample performance is not indicative of out-of-sample predictive power, and adjustments for trend and seasonality will be addressed in future posts.

Bullet points

  • The post focuses on the application of the Kalman Filter for analyzing timeseries datasets, specifically stock market datasets.
  • The author provides a brief history and details of the Kalman Filter.
  • A Python implementation of the Kalman Filter using numpy is provided.
  • The implementation is demonstrated with an example of a true model in the form of y = 3 with random white noise.
  • The results of the Kalman Filter are compared to a simple linear regression model.
  • The Kalman Filter is applied to the SPY dataset between 2015 and 2020, demonstrating its ability to quickly follow trends even with a bad initial guess.
  • The author notes that in-sample performance is not indicative of out-of-sample predictive power.
  • Adjustments for trend and seasonality will be addressed in future posts.

Timeseries Methods: Kalman Filter from scratch in Python — Part 1

This series of posts focuses on applying various methods to analyze timeseries datasets (here, stock market data).

This first post is about the Kalman Filter. Its history and details can be found in the wiki. There will be several posts on the Kalman Filter, starting with the basic linear Kalman Filter on a 1D dataset and moving on to multidimensional datasets in nonlinear space.

Using the equations from the wiki, we can easily write the filter as a Python function using numpy.

import numpy as np

def kalman_filter(x_init, F, Q, R, H, data, B=None, u=None, sd=0, num_state=1):
    # data has shape (num_obs_dims, num_steps): one column per time step
    X_post = x_init
    P_post = np.eye(num_state) * sd  # initial state covariance
    num_steps = data.shape[1]
    mean = []
    covar = []
    if B is None or u is None:
        # no control input: B @ u contributes nothing to the prediction
        B = np.zeros((num_state, 1))
        u = np.zeros((1, 1))

    for i in range(num_steps):
        z_k = data[:, i].reshape(-1, 1)  # observation as a column vector
        # predict
        X_prior = np.dot(F, X_post) + np.dot(B, u)
        P_prior = np.dot(np.dot(F, P_post), F.T) + Q
        # update
        resid = z_k - np.dot(H, X_prior)
        S_k = np.dot(np.dot(H, P_prior), H.T) + R
        K_k = np.dot(np.dot(P_prior, H.T), np.linalg.inv(S_k))
        X_post = X_prior + np.dot(K_k, resid)
        P_post = np.dot(np.eye(num_state) - np.dot(K_k, H), P_prior)
        mean.append(X_post)
        covar.append(P_post)  # posterior covariance, to match the posterior mean
    return mean, covar
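
For reference, the function above follows the standard predict and update steps of the linear Kalman Filter. Written out in the usual textbook notation (the hat and subscript notation is added here and is not from the original post; the control term is written with a constant u, as in the code), the steps are:

```latex
\begin{aligned}
\text{Predict:}\quad
\hat{x}_{k\mid k-1} &= F\,\hat{x}_{k-1\mid k-1} + B\,u \\
P_{k\mid k-1} &= F\,P_{k-1\mid k-1}\,F^{\top} + Q \\[4pt]
\text{Update:}\quad
y_k &= z_k - H\,\hat{x}_{k\mid k-1} \\
S_k &= H\,P_{k\mid k-1}\,H^{\top} + R \\
K_k &= P_{k\mid k-1}\,H^{\top}\,S_k^{-1} \\
\hat{x}_{k\mid k} &= \hat{x}_{k\mid k-1} + K_k\,y_k \\
P_{k\mid k} &= (I - K_k H)\,P_{k\mid k-1}
\end{aligned}
```

In the code, X_prior and P_prior correspond to the k|k-1 quantities, X_post and P_post to the k|k quantities, and resid to y_k.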

To validate the result, let’s assume the true model is y = 3 plus random white noise. The initial conditions are as follows:

x_init = np.array([[0]])  # initial state estimate
F = np.array([[1]])  # state transition matrix
B = np.array([[0]])  # control input matrix
u = np.array([[0]])  # control vector
Q = 1  # model noise
H = np.array([[1]])  # observation model
R = 1  # observation noise

truth = 3
observation = truth + np.random.normal(0, 1, size=20)
# kalman_filter expects one column per time step, so reshape to (1, 20)
data = observation.reshape(1, -1)

mean, covar = kalman_filter(x_init, F, Q, R, H, data, B, u, 0, 1)
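
As a quick sanity check on these settings, the scalar case can be iterated by hand. With F = H = 1 and Q = R = 1, the covariance recursion does not depend on the data, so the Kalman gain settles to a fixed value. The sketch below is an illustration under those assumptions only; the variable names are chosen for this example and are not part of the function above.

```python
import numpy as np

Q, R = 1.0, 1.0   # same noise settings as in the example above
P_post = 0.0      # matches sd=0, i.e. zero initial covariance

# Scalar covariance recursion with F = H = 1 (no data needed)
for _ in range(50):
    P_prior = P_post + Q            # predict
    K = P_prior / (P_prior + R)     # Kalman gain
    P_post = (1.0 - K) * P_prior    # update

# Fixed point of the recursion: P_prior solves P^2 - P - 1 = 0,
# which gives K = (sqrt(5) - 1) / 2
golden = (np.sqrt(5) - 1) / 2
print(K, golden)
```

A gain of roughly 0.62 means each new estimate moves about 62% of the way toward the latest observation, so with Q = R the filter weights recent data heavily. A smaller Q relative to R would give a smoother estimate.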

We can see from the plot how the Kalman Filter compares to a simple linear regression model.
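
The original plot is not reproduced here. A minimal sketch of one way to set up such a comparison follows, assuming the regression is an ordinary least-squares line fitted against the time index with np.polyfit (the post does not say how the regression was fitted). The seed, the scalar filter loop, and the RMSE summary are choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for repeatability only
truth = 3
t = np.arange(20)
obs = truth + rng.normal(0, 1, size=20)

# Scalar Kalman Filter with F = H = 1, Q = R = 1, initial guess 0
x, P = 0.0, 0.0
kf = []
for z in obs:
    P = P + 1.0                  # predict: P_prior = P_post + Q
    K = P / (P + 1.0)            # gain: P_prior / (P_prior + R)
    x = x + K * (z - x)          # update the state estimate
    P = (1.0 - K) * P            # update the covariance
    kf.append(x)
kf = np.array(kf)

# Least-squares line through the same observations (assumed regression model)
slope, intercept = np.polyfit(t, obs, 1)
ols = slope * t + intercept

# In-sample error of each estimate against the known truth
rmse_kf = np.sqrt(np.mean((kf - truth) ** 2))
rmse_ols = np.sqrt(np.mean((ols - truth) ** 2))
print(f"Kalman RMSE: {rmse_kf:.3f}, regression RMSE: {rmse_ols:.3f}")
```

The comparison is not symmetric: the regression line uses all 20 points at once, while each Kalman estimate uses only the observations seen so far.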

Next, we apply the filter to the SPY dataset from 2015 to 2020. We can see how quickly the Kalman Filter follows the trend, even with a bad initial guess of 0.

It is important to note that in-sample performance is not indicative of out-of-sample predictive power. For this Kalman Filter specifically, the prediction for every future period is a constant, even though we know the series clearly has trend and seasonality. We will adjust for these factors in future posts.
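
To see why the forecast is flat, consider running only the predict step once the observations run out. With F = 1 and no control input, the mean never changes while the variance grows by Q each step. The starting values below are hypothetical placeholders, not results from the SPY run.

```python
import numpy as np

F = np.array([[1.0]])    # state transition used in this post
Q = np.array([[1.0]])    # model noise used in this post
x = np.array([[3.0]])    # hypothetical last filtered estimate
P = np.array([[0.5]])    # hypothetical last posterior variance

forecast_mean, forecast_var = [], []
for _ in range(5):
    # Predict only: there is no new observation to update with
    x = F @ x
    P = F @ P @ F.T + Q
    forecast_mean.append(x[0, 0])
    forecast_var.append(P[0, 0])

print(forecast_mean)  # the mean stays at the last estimate
print(forecast_var)   # the variance grows by Q at every step
```

Capturing a trend would need a richer state, for example one that also carries a slope, so that F moves the level forward at each step.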

Kalman Filter
Stock Market
Timeseries
Machine Learning
Regression Analysis