The provided content discusses the importance and methods of detecting change points in time series data, emphasizing both offline (batch) and online (real-time) applications.
Abstract
Change point detection (CPD) is a critical aspect of time series analysis, with significant implications in various fields such as medical monitoring, speech recognition, and weather detection. The article introduces several CPD algorithms, explaining their importance for real-time and batch processing. It highlights the role of CPD in a typical data science workflow, which includes data preparation, model specification, modeling, and forecasting. The Python modules "ruptures" and "changefinder" are presented as tools for offline and online change point detection, respectively. The article demonstrates the application of these modules on two simulated time series datasets, one with constant variance and the other with varying variance, to illustrate their effectiveness in identifying change points. The author concludes by recommending a sequence of articles for a comprehensive understanding of time series analysis and provides references to further reading.
Opinions
The author, Chris Kuo/Dr. Dataman, suggests that the development of different CPD algorithms is crucial for their application in real-time and batch scenarios.
The Ruptures module is recommended for offline applications due to its ability to build segmented regression lines, although it may require a long runtime for large datasets.
The changefinder module is favored for online applications because of its efficiency and ability to handle streaming data without the need for extensive computational resources.
The author expresses that while the changefinder module may miss some change points in certain conditions, overall, it performs reasonably well, especially considering its computational efficiency compared to offline methods.
The article promotes the use of the author's referral link to join Medium and encourages readers to purchase books by Chris Kuo to support his work.
An AI service, ZAI.chat, is recommended as a cost-effective alternative to ChatGPT Plus (GPT-4), highlighting its performance and affordability.
Detecting the Change Points in a Time Series
Suppose you wear an iWatch to monitor your heart rate. You run for a quarter mile, walk for ten minutes, then run for another quarter mile. The heart rate data will look like the time series in Figure 1. It shows a cluster of high heart rates, then a cluster of low heart rates, then back to high rates. The abrupt changes in the time series inform us the source object has major activity changes. Similarly, imagine a patient in an Intensive Care Unit (ICU), and the nurses monitor his heart rate. It is critical to detect any changes in the heart rate and alert the doctor to respond quickly. We need to detect them accurately and timely and send out alarms. Change point detection (CPD) is important. In medical condition monitoring, we need to monitor the health condition of a patient; in speech recognition, we detect the changes in the voice frequencies; in weather detection, we monitor the temperature changes to signal the citizens.
Figure 1
(A) Where Does CPD Fit in a Typical Data Science Process?
A typical data science process can be described by the flowchart in Figure (A): Data Preparation → Model Specification → Modeling → Forecasting. The data preparation step includes exploratory data analysis, data cleaning, and missing data imputation. Change point detection is an important task in the data preparation step.
Figure (A): Data Science Process
With prepared data, the model specification step searches for the best model specifications to do modeling. This step collects the basic information about the time series. For example, What is the frequency of the time series? Where are the change points? Do we need a logarithmic transformation? Should some of the data points be truncated? Do you need flags for the holidays? Should you apply Fourier transformation, and with what frequency? These tasks are frequently performed by an experienced data scientist. The good news is that they are incorporated into many modules for automation.
The modeling step models the data for an appropriate model. These models include a structural model such as ARIMA or Decomposition, as well as a moving average or a Kalman Filter model. The forecasting considers how users need the forecasts such as real-time predictions, or streaming data anomaly detection.
(B) Why Read this article?
In this article, I will introduce several CPD algorithms. After reading this article you will understand why it is important to develop different algorithms for real-time (online) or batch (offline) applications. You will also learn the reasons why CPD has been employed by many forecasting Python tools such as the Luminaire by Zillow, or the Prophet/Kats by Facebook.
With a time series long enough, you can spot the change points easily. Some algorithms use all the past data to detect changes. However, there are real-time use cases where you cannot afford to collect data over a long period to determine the change points. For example, suppose you fly a drone that streams its positions every second to the home device. You cannot just wait and collect data over ten minutes to detect any changes retrospectively. With the emergence of real-time streaming data such as IoT or online streaming, online methods can be implemented on a hardware device (see [2]). Because of the real-time and non-real-time use cases, both the real-time (or called online) and the batch (or called offline) algorithms have been developed to provide efficiency.
The offline algorithm uses the entire time series (or at least the time series of a longer period) to detect the changes. In contrast, online algorithms can detect the change points “on the fly”. I will show you the Ruptures Python module for offline applications, and the changefinder module for online applications.
(D) Generate Two Time Series
I generate two-time series to test the algorithms that I am going to introduce. The change points in the first time series are easier to detect, and those of the second one is harder to detect. The notebook is available via this Github link.
The first time series “ts1”: Has ten data segments with constant variance.
The second time series “ts2”: has ten data segments but with varying variances.
In time series modeling we try to find the underlying pattern such as a regression line so we can forecast the future. But the regression line will not be straight if there are change points. So an intuitive algorithm is to build segmented regression lines where the kinked points are the change points. This method is called Pruned Exact Linear Time (PELT) [3], [4].
This regression-based algorithm can be explained by the two graphs (A) and (B) below. The time series (shown in blue) has one change point and two segments. The horizontal orange line in the left graph (A) is the regression line. The distance from each point (shown as a white circle) to the regression line is represented by the vertical orange line. The regression line is determined by minimizing the sum of the distances to all the data points.
The regression line in (B) is cut into two regression lines at the change point. The sum of the distances in (B) is much smaller than that in (A). By sliding the cut point from left to right of the time series, the algorithm can find the appropriate change point for the time series that minimizes the sum of the distances or errors. The equation below is the algorithm to search for the number of change points and the locations of change points. C(.) is the distance or the cost function. We also need to control for not creating too many line segments thus overfitting the time series. So the term b (beta) is the number of segments as a penalty term to prevent the search from yielding too many segments.
This algorithm is coded in the Python module “ruptures”. Let’s see how it works. Install it with pip install ruptures.
Example E.1 — constant variance
This is nice. It has detected all the ten change points that we have generated. Let’s see if it is still effective when the variance varies over time.
Example E.2 —varying variance
The PELT algorithm spots the changing points at [2000, 3000, 3990, 5005, 5995, 6995, 8000, 10000] as shown below. We know two change points [1000, 9000] are supposed to be there. But because the data before and after those points are too similar, PELT does not spot the difference.
The algorithm needs a long runtime to find the change points in Example 1.1 and especially Example 1.2. This may not meet the requirement for real-time streaming data. Therefore the following Python module “changefinder” is designed for real-time applications.
A time series can be characterized by an autoregressive (AR) moving average process. An AR model says the next data point is a weighted moving average of the past data points with a random noise like the following equation, in which θi are the weights of the past p data points.
If there is a change point, it is expected that the AR processes before and after the change point will be different. With this intuition, Yamanishi & Takeuchi [5] proposed the Sequentially Discounting AutoRegressive (SDAR) learning algorithm. The SDAR method has two learning phases. In the first learning phase, it produces an intermediate score called the anomaly score. In the second learning phase, it produces the change-point score that can detect a change point. I outline the algorithm as the following:
[1st AR model] Read in a chunk of data t = 1, …, N, then build an AR (auto-regressive) model. It then produces an “anomaly score” which is the difference between the AR predicted value for Xt and the actual data Xt. Notice in this step only N data points are taken. Since it does not use the entire history, it is set for an online stream of data.
[1st Smoothing] The above anomaly score will be very volatile and produce false signals. The algorithm generates the moving average Yt for the anomaly score to smooth out the outliers.
[2nd AR model] Build an AR model for Yt and produce another “anomaly score” based on the newly built AR model and Yt.
[2nd Smoothing] Similarly, the new anomaly score tends to be volatile. The algorithm generates the moving average to smooth out. The final product is a score called the change-point score, as shown in the illustration below.
This algorithm does not need the entire time series to detect change points so it substantially reduces the computation time. In the following Example 2.1 and Example 2.2 I will test the algorithm on the two simulated time series data generated in Section (0). The first one (ts1) is a time series with constant variance, and the second one (ts2), has a varying variance.
Example F.1 —constant variance
We are interested in the change-point score in the changefinder module that indicates if a time series departs abruptly from its norm. The code below applies the model to the first time series (ts1). The code has three parameters:
r: the discount rate (0 to 1). If you set a high discount rate, the past time series decreases quickly, meaning you do not want to learn from the past. Setting it at 0.01 is just fine. It is not very sensitive so it does not matter if it is set at 0.01 or 0.05. Experiment with it.
order: the AR model order
smooth: the widow size of recent N data to compute the moving average for smoothing.
The red time series is the change-point score. Those large change-point scores are the locations where changes happen. Here I print out the locations of the top 20 (you can take more).
I plot these detected change points on the time series to see how well the change points were detected. In general, the algorithm does a decent job of finding those change points. The method missed the change points at 2000, 4000, and 6000. This is because the data frequency before and after those points are too similar to qualify for a difference. While running the code, you may find the computational time is much less than the PELT method of the ruptures module.
Example F.2 — varying variance
Change points in a time series with varying variance are hard to find. As shown below, Time series ts2 seems to have three to four clusters, not ten clusters as we designed. Let’s test the performance of the algorithm. The red time series is the change-point score. It detects several change points.
I plot the change points on the time series ts2. Considering the complexity of the time series, the result is quite reasonable.
(G) Conclusion
I hope this article gives you a better understanding of this topic. If you like to have a comprehensive review, the following sequence will help:
[1] C. Truong, L. Oudre, N. Vayatis. A selective review of offline change point detection methods. Signal Processing, 167:107299, 2020. [journal][pdf]
[2] Takuma Iwata , Kohei Nakamura, Yuta Tokusashi, and Hiroki Matsutani. Accelerating Online Change-Point Detection Algorithm using 10GbE FPGA NIC. [pdf]
[3] Killick Rebecca, Fearnhead P, Eckley Idris A(2012). Optimal detection of change points with a linear computational cost. JASA, 107, 1590–1598 (arxiv_link)
[4] Gachomo Dorcas Wambui, Gichuhi Anthony Waititu, Jomo Kenyatta (2015). The Power of the Pruned Exact Linear Time(PELT) Test in Multiple Changepoint Detection American Journal of Theoretical and Applied Statistics
Volume 4, Issue 6, November 2015, Pages: 581–586 (arxiv_link)
[5] Yamanishi, K., & Takeuchi, J. I. (2002). A unifying framework for detecting outliers and change points from non-stationary time series data. 676–681. Paper presented at KDD — 2002 Proceedings of the Eight ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Alta, Canada. https://doi.org/10.1145/775047.775148
[6] Militino, A.F.; Moradi, M.; Ugarte, M.D. On the Performances of Trend and Change-Point Detection Methods for Remote Sensing Data. Remote Sens. 2020, 12, 1008. https://doi.org/10.3390/rs12061008
Time Series Articles
If you are interested in a comprehensive survey on time series forecasting and anomaly detection, below is a list that you may find helpful: