Time Series with Zillow’s Luminaire — Part I Data Exploration

I sit at my local beach at midnight, enjoying the big moon casting sparkling lights on the ocean. The wavy moonlight path looks to me like a time series path, not always smooth but traceable. It shows me tranquility and serendipity.
I have written a few articles on time series forecasting and anomaly detection, and Luminaire, from the Zillow Tech Hub, is the next tool I want to introduce in depth. Are you doing time series forecasting or outlier detection now, or will you be soon? After reading this article, you will be able to run your time series models comfortably with Luminaire. You may also enjoy these related articles: Anomaly Detection for Time Series, Business Forecasting with Facebook’s “Prophet”, “A Technical Guide on RNN/LSTM/GRU for Stock Price Prediction”, and “Kalman Filter Explained!”.
Change Points Are Challenging in a Univariate Time Series
You typically deal with daily, monthly, or quarterly time series (or high-frequency data such as IoT readings or stock ticks). If there are not enough data points to train a model, someone may suggest using a longer time range. However, a longer time series brings a new challenge: it contains more change points. A change point is a point at which the data trend shifts abruptly. Change points occur in almost every kind of time series, such as finance, electricity consumption, manufacturing quality control, heart rate diagnostics, and human activity data. Take the daily Dow Jones stock market data from 1970 to 2020 as an example. These years contain so many macroeconomic events that the patterns of the series shift from period to period. Human eyes may spot the change points easily, but an algorithm may not. Since every time series has change points, it would be great to have a general algorithm that detects them automatically.

Time Series Data Preparation

The flowchart above shows a typical data science process, and each step involves sophisticated methods. The Luminaire module follows this procedure. The data preparation step includes exploratory data analysis, data cleaning, missing data imputation, and change point detection. The model specification step searches for the best model specification. The modeling step fits an appropriate model to the data. These models include structural models such as ARIMA or decomposition, as well as moving average and Kalman filter models. The forecasting step considers how users consume the forecasts, for example as real-time predictions or as anomaly detection on streaming data.
Because each step involves considerable knowledge, I am going to walk you through the process in three detailed articles. The code for this article is available for download via this GitHub link.
The Luminaire by the Zillow Tech Hub
You may have seen the Zestimate on Zillow.com. Its house price predictions are often precise enough to have become essential references for home buyers. The Zestimate comes from the Zillow Tech Hub.

Luminaire, an open-source package, is another Zillow product you will love to use. You can visit the Luminaire homepage, the Luminaire GitHub repository, the article “Automatic and Self-aware Anomaly Detection at Zillow using Luminaire”, or the paper [1].
Luminaire is a Python package that provides anomaly detection and forecasting functions that incorporate correlational and seasonal patterns as well as uncontrollable variations in the data over time.
How to Start
Luminaire currently requires Python 3.6, as described on its PyPI page. I suggest creating a virtual environment for Python 3.6. Once you are in the Python 3.6 virtual environment, all you need to do is run pip install Luminaire.
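The installation itself is just the pip command above. As an optional sanity check afterwards, here is a small Python sketch that reports whether the active interpreter is Python 3.6 and whether a module named luminaire can be found. The helper name check_environment is my own, not part of Luminaire, and the function only inspects the environment; it does not import or call Luminaire.

```python
import importlib.util
import sys
from typing import Tuple


def check_environment(required: Tuple[int, int] = (3, 6)) -> Tuple[bool, bool]:
    """Hypothetical helper: return (python_version_matches, luminaire_found)."""
    # Compare only major.minor, e.g. (3, 6), against the running interpreter.
    version_ok = sys.version_info[:2] == required
    # find_spec returns None when a top-level module cannot be located.
    luminaire_found = importlib.util.find_spec("luminaire") is not None
    return version_ok, luminaire_found


if __name__ == "__main__":
    version_ok, luminaire_found = check_environment()
    print(f"Running Python {sys.version_info[0]}.{sys.version_info[1]}; 3.6 match: {version_ok}")
    print(f"luminaire module found: {luminaire_found}")
```

If the second value is False, the virtual environment you are running is probably not the one where the package was installed.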
The Pruned Exact Linear Time (PELT) to Detect Change Points
An intuitive method for detecting change points is Pruned Exact Linear Time (PELT) [2], [3]. I drew two graphs, (A) and (B), to explain the method. The blue line is the time series. The horizontal orange line in the left graph (A) is the regression line. The distance from each point (shown as a white circle) to the regression line is represented by a vertical orange line. The regression line is determined by minimizing the sum of the squared distances to all the data points. The blue time series actually has two segments, but the single line in (A) does not account for that.

In contrast, the regression line in (B) is cut into two regression lines at the change point, and the sum of the distances in (B) is much smaller than that in (A). By sliding the cut point from the left to the right of the time series, we can find the change point that minimizes the sum of the distances, or errors. The equation below extends this search to both the number of change points and their locations, where C(.) is the distance, or cost, function. We also need to avoid creating too many segments, which would overfit the time series. So the term β (beta) is a penalty added for each additional change point, which prevents the search from yielding too many segments. PELT keeps this search efficient by pruning candidate change points that can no longer be part of the optimal solution.
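For reference, the penalized objective that PELT minimizes is commonly written as below, following Killick, Fearnhead and Eckley (2012). Here the series y has n points, the change points are τ_1 < τ_2 < ... < τ_m, and the penalty is assumed to be linear in the number of change points m.

```latex
\min_{m,\;\tau_{1:m}} \; \sum_{i=1}^{m+1} C\!\left(y_{(\tau_{i-1}+1):\tau_i}\right) \;+\; \beta\, m,
\qquad \tau_0 = 0,\quad \tau_{m+1} = n
```

Read this way, an extra change point is only worth adding if it lowers the total cost by more than β.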

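To make the search concrete, here is a minimal pure-Python sketch of PELT. It assumes the cost C(.) is the sum of squared distances from each segment to its own mean (the horizontal-line fit in graphs (A) and (B)) and a penalty of β per change point. It illustrates the published algorithm under those assumptions; it is not Luminaire's internal implementation, and the function name pelt and its arguments are my own.

```python
from typing import List, Sequence


def pelt(y: Sequence[float], beta: float) -> List[int]:
    """Return sorted change-point indices; each index is where a new segment starts."""
    n = len(y)
    # Prefix sums of y and y**2 let us evaluate any segment's cost in O(1).
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(y):
        s[i + 1] = s[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(a: int, b: int) -> float:
        # C(y[a:b]): sum of squared distances from y[a:b] to its own mean.
        total = s[b] - s[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    # F[t] is the minimal penalized cost of y[:t]; starting at -beta and adding
    # beta once per segment charges beta per change point overall.
    F = [0.0] * (n + 1)
    F[0] = -beta
    last = [0] * (n + 1)  # last[t]: start of the final segment in the best split of y[:t]
    candidates = [0]      # surviving candidates for the most recent change point
    for t in range(1, n + 1):
        scores = [(F[tau] + cost(tau, t) + beta, tau) for tau in candidates]
        F[t], last[t] = min(scores)
        # Pruning: drop candidates whose unpenalized cost already exceeds F[t].
        # This is safe (pruning constant K = 0) because splitting never raises the SSE.
        candidates = [tau for score, tau in scores if score - beta <= F[t]]
        candidates.append(t)

    # Backtrack through `last` to recover the change points.
    change_points = []
    t = n
    while t > 0:
        t = last[t]
        if t > 0:
            change_points.append(t)
    return sorted(change_points)


if __name__ == "__main__":
    series = [0.0] * 30 + [5.0] * 30 + [0.0] * 30
    print(pelt(series, beta=10.0))  # → [30, 60]
```

A larger beta yields fewer change points; with a very large beta the function returns an empty list, which means the whole series is treated as one segment.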
In the article “Finding the Change Points in a Time Series”, I gave an in-depth review of change point detection methods, including ones that can be used in real-time applications, if you would like to dig deeper.
(A) Let’s Start with the Minimum Data Exploration
I create a time series with two clusters. There is only one change point. Can Luminaire detect it?
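As a stand-in for this step, here is a plain-Python sketch, independent of Luminaire, that builds a hypothetical two-cluster series (100 points around 10, then 80 points around 20, with a bounded sine wiggle instead of random noise so the result is repeatable) and applies the sliding-cut idea from graphs (A) and (B). Knowing exactly where the true change point sits gives a ground truth to compare against Luminaire's output. The function names are my own; to pass data like this to Luminaire you would wrap it in a pandas DataFrame with a datetime index, and you should check the Luminaire documentation for the exact input format it expects.

```python
import math
from typing import List, Tuple


def make_two_cluster_series(n1: int = 100, n2: int = 80) -> List[float]:
    """Hypothetical toy data: level 10 for n1 points, then level 20 for n2 points.

    A sine term with amplitude 0.5 stands in for noise so the example is deterministic.
    """
    first = [10 + 0.5 * math.sin(i) for i in range(n1)]
    second = [20 + 0.5 * math.sin(i) for i in range(n1, n1 + n2)]
    return first + second


def sse(segment: List[float]) -> float:
    """Sum of squared distances to the best horizontal line (the segment mean)."""
    mean = sum(segment) / len(segment)
    return sum((v - mean) ** 2 for v in segment)


def best_single_cut(y: List[float]) -> Tuple[int, float]:
    """Slide one cut point from left to right and keep the lowest total cost."""
    best_k, best_cost = 1, math.inf
    for k in range(1, len(y)):
        total = sse(y[:k]) + sse(y[k:])  # cost of two lines, as in graph (B)
        if total < best_cost:
            best_k, best_cost = k, total
    return best_k, best_cost


if __name__ == "__main__":
    y = make_two_cluster_series()
    cut, two_line_cost = best_single_cut(y)
    print(cut)  # → 100, the index where the second cluster starts
    print(sse(y), two_line_cost)  # one-line cost (A) vs two-line cost (B)
```

This exhaustive search only handles a single change point; for an unknown number of change points, the penalized PELT search is the appropriate tool.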











