The provided content discusses time series forecasting of the China stock market using Weka, focusing on regression tests for the stock index 000001, and compares different regression algorithms to determine the best forecasting model.
Abstract
The article delves into the application of data mining techniques, specifically regression analysis, to forecast the China stock market index 000001. It details the methodology, including the use of Weka for configuring and testing various regression algorithms. The authors, Hao Zheng, JiPeng Liu, and Nannan Lu, present a comprehensive analysis involving multiple rounds of testing with different parameters to identify the most effective model. The results, based on metrics such as Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), indicate that the SMOreg algorithm, under specific parameter settings, provides the best forecasting performance for the target index. The article also visually demonstrates the predictive accuracy through train prediction graphs, showing the alignment between actual and predicted prices.
Opinions
The authors assert that the SMOreg algorithm yields superior forecasting results for the stock index 000001 compared to other regression methods tested.
It is suggested that the optimal parameters for one regression algorithm may not be suitable for others, emphasizing the importance of individualized parameter tuning.
The article implies that cross-validation of regression algorithms, particularly with adjusted lag lengths, can significantly improve forecasting accuracy.
The visualization of output data is deemed important for understanding the movement of actual versus predicted stock prices.
Time Series Forecasting of China Stock Market Using Weka-Part 6. Regression test for 399005
The basic and advanced configuration steps for Weka have already been introduced above. This section therefore only compares and analyzes different regression algorithms on target 000001.
3.8.1 Linear regression testing and validating
For the index 000001, we again use linear regression to run six rounds of training tests with different parameters, and then identify the best result in each round. The parameters for each round are shown in Table 21 below.
The statistics for the 10-day forecasting data (000001) are shown in Table 22.
The MAE statistics for the six rounds of testing are shown in Table 23.
The corresponding line chart is displayed in Figure 35.
Figure 35 Line chart of performance analysis based on MAE for six rounds of testing (000001)
The RMSE statistics for the six rounds of testing are shown in Table 24.
The corresponding line chart is displayed in Figure 36.
Figure 36 Line chart of RMSE for six rounds of testing (000001)
Both metrics indicate that Round 3 has the best average prediction performance. The conclusion for index 000001 is similar to the previous ones, so no further explanation is given in this section.
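As a reminder of what the two metrics measure, the following is a minimal sketch of MAE and RMSE in plain Python. The function names and the sample values are illustrative assumptions; they are not taken from Weka or from Tables 22 to 24.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: the average of |actual - predicted|."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error: the square root of the mean squared difference."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

# Made-up prices for illustration only, not the article's data.
actual = [100.0, 102.0, 101.0, 105.0]
predicted = [101.0, 100.0, 101.0, 108.0]

print("MAE :", mae(actual, predicted))
print("RMSE:", rmse(actual, predicted))
```

Because RMSE squares each error before averaging, a single large miss raises it more than it raises MAE, which is one reason to check both metrics before picking a round.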
3.8.2 Cross regression testing and validating
Using the same parameters as round 3 of the linear regression test, we compare the forecasting performance of four different regression algorithms. The round 3 parameter settings are listed here:
• Attribute names: Key_Date, CP, CHN_GDP, US_GDP GR, WGDP, CBoP, ER, CTR, CUpCDI, SF
• Lag length: minimum 1, maximum 5
• Overlay data: select all
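To make the lag-length setting concrete, the sketch below shows one common way to turn a price series into lagged input rows for a regression learner. It assumes the minimum and maximum lag bound the range of past values used as inputs; the helper `make_lagged_rows` and the sample prices are hypothetical, and this is not Weka's own implementation.

```python
def make_lagged_rows(series, min_lag=1, max_lag=5):
    """Build (lag_values, target) rows from a series.

    lag_values[i] is the value observed (min_lag + i) steps before the target.
    """
    if not 1 <= min_lag <= max_lag:
        raise ValueError("require 1 <= min_lag <= max_lag")
    rows = []
    # The first max_lag points have no complete lag window, so they are skipped.
    for t in range(max_lag, len(series)):
        lags = [series[t - k] for k in range(min_lag, max_lag + 1)]
        rows.append((lags, series[t]))
    return rows

# Made-up closing prices for illustration only.
prices = [10, 11, 12, 13, 14, 15, 16]

rows = make_lagged_rows(prices, min_lag=1, max_lag=5)
for lags, target in rows:
    print(lags, "->", target)
```

In this sketch, a larger maximum lag gives each row more history but leaves fewer usable rows, since the first `max_lag` observations cannot form a full window.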
However, when we forecast with the SMOreg algorithm under these settings, the forecasting accuracy is relatively low. We therefore add another test in which the maximum lag length is changed to 10. The result is better than that of the former test.
The statistics for the 10-day forecasting data are collected and summarized in Table 25.
The MAE statistics for the forecast are shown in Table 26.
The corresponding line chart is displayed in Figure 37.
Figure 37 Line chart of performance analysis based on MAE for cross regression testing (000001)
The RMSE statistics for the forecast are shown in Table 27.
The corresponding line chart is displayed in Figure 38.
Figure 38 Line chart of performance analysis based on RMSE for cross regression testing (000001)
Both metrics indicate that the SMOreg algorithm has the best average prediction performance.
To conclude:
• For target 000001, the SMOreg algorithm gives the best forecasting result
• The best parameters for one regression algorithm may not suit other types of regression
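The comparison above amounts to averaging each algorithm's error over the forecast steps and checking that MAE and RMSE point to the same winner. A minimal sketch of that bookkeeping follows. The model names `model_a` to `model_c` and all error values are placeholders invented for illustration; they are not the algorithms' actual results from Tables 26 and 27.

```python
from statistics import mean

def rank_by_average(errors_by_model):
    """Return model names sorted by mean error, lowest (best) first."""
    return sorted(errors_by_model, key=lambda name: mean(errors_by_model[name]))

# Placeholder per-step errors, not the article's measurements.
mae_by_model = {
    "model_a": [12.0, 15.0, 20.0],
    "model_b": [10.0, 11.0, 30.0],
    "model_c": [9.0, 10.0, 11.0],
}
rmse_by_model = {
    "model_a": [15.0, 18.0, 25.0],
    "model_b": [14.0, 13.0, 40.0],
    "model_c": [11.0, 12.0, 14.0],
}

best_mae = rank_by_average(mae_by_model)[0]
best_rmse = rank_by_average(rmse_by_model)[0]
agree = best_mae == best_rmse

print("best by MAE :", best_mae)
print("best by RMSE:", best_rmse)
print("metrics agree:", agree)
```

Since the best lag length differed between algorithms in the tests above, a fair comparison would repeat this ranking for each parameter setting of each algorithm rather than reuse one setting for all.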
3.8.3 Output display
The steps for transforming the training data and identifying attributes have already been introduced above. This section therefore only shows the visualization of the output.
Figure 39 shows the train prediction for targets, that is, the movement of the actual price against the predicted price.
Figure 39 Train prediction for targets (000001)
Figure 40 shows the train prediction at steps, which presents the movement of the actual price alongside the 1-step, 5-step and 10-step-ahead predicted prices separately.
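One common way to obtain predictions several steps ahead from a one-step model is to feed each prediction back in as the newest lagged input. The sketch below illustrates that idea only. It is an assumption about the general technique, not a description of Weka's internals, and `mean_of_lags` is a deliberately trivial stand-in for a trained regression model.

```python
def recursive_forecast(history, one_step_model, steps, n_lags):
    """Forecast `steps` values ahead by feeding predictions back as lag inputs."""
    if len(history) < n_lags:
        raise ValueError("history must contain at least n_lags values")
    window = list(history[-n_lags:])  # most recent n_lags values, oldest first
    forecasts = []
    for _ in range(steps):
        nxt = one_step_model(window)
        forecasts.append(nxt)
        window = window[1:] + [nxt]   # drop the oldest value, append the prediction
    return forecasts

def mean_of_lags(window):
    """Stand-in one-step model (hypothetical): predicts the mean of the window."""
    return sum(window) / len(window)

# Made-up prices for illustration only.
history = [10.0, 12.0, 14.0]
fc = recursive_forecast(history, mean_of_lags, steps=10, n_lags=3)

# The 1-, 5- and 10-step-ahead predictions made from the end of `history`.
one_step, five_step, ten_step = fc[0], fc[4], fc[9]
print(one_step, five_step, ten_step)
```

Errors tend to accumulate in this scheme because later steps depend on earlier predictions, so the 10-step-ahead curve would generally be expected to follow the actual price less closely than the 1-step-ahead curve.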