Time Series Forecasting of China Stock Market Using Weka-Part 4. Regression test for 399006

Hao, Zheng

JiPeng, Liu

Nannan, Lu

1. Introduction

1.1 Research background

1.2 Data Mining Applications

1.2.1 Predictive data mining

1.2.2 Literature review on data mining applications

1.3 Regression technique for time series forecasting

1.3.1 Introduction of regression

1.3.2 Regression techniques

2. Methodology

2.1 Flowchart of the system

2.2 Flowchart model introduction

3. Experiments and Results Discussion

3.1 Overview of data set

3.2 Data pre-processing

3.3 Algorithm selection

3.4 Basic configuration of forecasting package

3.5 Advanced configuration of forecasting package

3.6 Regression test for 399006

3.6.1 Linear regression testing and validating

3.6.2 Cross regression testing and validating

3.6.3 Output display

3.7 Regression test for 399005

3.7.1 Linear regression testing and validating

3.7.2 Cross regression testing and validating

3.7.3 Output display

3.8 Regression test for 000001

3.8.1 Linear regression testing and validating

3.8.2 Cross regression testing and validating

3.8.3 Output display

4. Conclusion and future work

4.1 Conclusion

4.2 Future work

Reference

3.6 Regression test for 399006

We designed two types of forecasting test. The first fine-tunes the configuration of a single regression method by varying its input parameters. The second compares different algorithms under the same configuration, using the parameters of the best-performing model from the first experiment as its input settings. The optimal model is then selected by analysing the outcome metrics.

3.6.1 Linear regression testing and validating

Linear regression is one of the most commonly used forecasting algorithms. We use it to run several rounds of training tests with different parameters, then compare the outcome metrics of each round to find the best result.

We designed an 8-round forecasting test to train the forecasting model with different input parameters. The detailed parameter settings for each round are shown in Table 7.

The statistics for the 10-day forecast are shown in Table 8.

The mean absolute error (MAE) statistics for the 8 rounds of testing are shown in Table 9.

The bold values refer to the best performing round.

The line chart is also displayed in Figure 19.

**Figure 19** Line chart of performance analysis based on MAE for the 8-round test (399006)

The root mean squared error (RMSE) statistics for the 8 rounds of testing are shown in Table 10.

The bold values refer to the best performing round.

The line chart is also displayed in Figure 20.

**Figure 20** Line chart of RMSE for the 8-round test (399006)
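The two error metrics reported above can be sketched in a few lines. This is only an illustration of the standard MAE and RMSE definitions, not the code Weka runs internally; the function names and the sample values are assumptions made for this example and are not taken from Tables 9 or 10.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all forecast points."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical values for illustration only.
actual = [100.0, 102.0, 101.0, 105.0]
predicted = [101.0, 100.0, 101.0, 109.0]

mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(mae, rmse)
```

Because RMSE squares each error before averaging, a single large miss raises it more than it raises MAE, which is why the two metrics are usually read together.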

Both metrics indicate that round 4 has the best average prediction performance. There is also a significant error reduction from round 3 to round 4, which follows from the removal of three attributes: IR_C, RRR_L and RRR_SM. The output log shows that all three become classified attributes after the transformation of the training data. Figure 21 shows the relevant log in the output area.

**Figure 21** Classified attributes in output log

Figure 22 shows the new run chart of 399006 after attribute reduction in round 4.

**Figure 22** Run chart of 399006 after attribute reduction

From the 8-round forecasting test and output evaluation, we draw the following conclusions:

1. Round 4 shows the best forecasting result.

2. Attribute reduction is very important for improving forecasting results, especially the exclusion of classified attributes.

3. Parameter adjustment can also affect the forecast accuracy.
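The attribute reduction in conclusion 2 amounts to excluding named attributes before training. In the experiment this is done through the Weka interface; the sketch below only illustrates the idea in plain Python. The `drop_attributes` helper and the sample rows are hypothetical, and the values are made up for illustration.

```python
# Attributes removed between round 3 and round 4, as named in the text.
EXCLUDED = {"IR_C", "RRR_L", "RRR_SM"}


def drop_attributes(rows, excluded):
    """Return copies of the rows without the excluded attribute names."""
    return [
        {name: value for name, value in row.items() if name not in excluded}
        for row in rows
    ]


# Hypothetical two-row sample; none of these values come from the data set.
rows = [
    {"key_date": "d1", "Closing_price": 1.0, "IR_C": "a", "RRR_L": "x", "RRR_SM": "y", "M2": 10.0},
    {"key_date": "d2", "Closing_price": 2.0, "IR_C": "b", "RRR_L": "x", "RRR_SM": "z", "M2": 11.0},
]

reduced = drop_attributes(rows, EXCLUDED)
print(sorted(reduced[0]))
```

The original rows are left unchanged, so the same input can be reused to compare a round with and without the excluded attributes.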

3.6.2 Cross regression testing and validating

Using the same parameters as round 4 of the linear regression test, we compare the forecasting performance of 4 different regression methods. The round 4 parameter settings are listed here:

· Attribute names: key_date, Closing_price, CPI_N_C_month, CPI_N_S_month, CPI_C_C_month, CPI_C_S_month, CPI_R_C_month, CPI_R_S_month, PPI_Month, PPI_Total, PMI_M, index, PMI_NM, index, M2, M1, M0, NFC, GDP_GV, GDP_Industry2, GDP_Industry3

· Lag length: minimum lag 1, maximum lag 5

· Overlay data: use and select all.
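To make the lag setting concrete, the following sketch shows what lagged variables with a minimum lag of 1 and a maximum lag of 5 look like for a single series. Weka's forecasting package creates its lagged attributes internally; the `make_lagged_rows` helper and the sample series here are assumptions used purely for illustration.

```python
def make_lagged_rows(series, min_lag=1, max_lag=5):
    """Build (features, target) pairs from one series.

    features holds series[t - lag] for lag = min_lag..max_lag,
    and target is series[t].
    """
    if not 1 <= min_lag <= max_lag:
        raise ValueError("require 1 <= min_lag <= max_lag")
    rows = []
    # The first usable target is at index max_lag, where all lags exist.
    for t in range(max_lag, len(series)):
        features = [series[t - lag] for lag in range(min_lag, max_lag + 1)]
        rows.append((features, series[t]))
    return rows


# Hypothetical series, not real closing prices.
prices = [10, 11, 12, 13, 14, 15, 16]
lagged = make_lagged_rows(prices, min_lag=1, max_lag=5)
for features, target in lagged:
    print(features, "->", target)
```

A larger maximum lag gives each training row more history but leaves fewer usable rows, since the first `max_lag` observations cannot form a complete row.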

However, when we forecast using Multilayer Perceptron, the forecasting accuracy is relatively low. We therefore add another test with the maximum lag length changed to 20, which performs better than the former test.

The statistics for the 10-day forecast are collected and summarized in Table 11.

The mean absolute error (MAE) statistics for the forecast are shown in Table 12.

The bold values refer to the best performing model.

The line chart is also displayed in Figure 23.

**Figure 23** Line chart of performance analysis based on MAE for cross regression testing (399006)

The root mean squared error (RMSE) statistics for the forecast are shown in Table 13.

The bold values refer to the best performing model.

The line chart is also displayed in Figure 24.

**Figure 24** Line chart of performance analysis based on RMSE for cross regression testing (399006)

Both metrics indicate that linear regression has the best average prediction performance.

To conclude:

· The results show that all the models predict the stock index 399006 accurately, but linear regression in Weka outperforms the other four models.

· The best parameters for one type of regression may not suit other types of regression.
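The comparison behind these conclusions comes down to averaging each model's error over the forecast steps and picking the lowest average. A minimal sketch follows, assuming the per-step errors are already available as lists; the `best_model` helper, the model labels and the numbers are all hypothetical and do not reproduce Tables 12 or 13.

```python
def best_model(errors_by_model):
    """Return (name, mean_error) for the model with the lowest mean error."""
    if not errors_by_model:
        raise ValueError("no models to compare")
    means = {
        name: sum(errors) / len(errors)
        for name, errors in errors_by_model.items()
    }
    name = min(means, key=means.get)
    return name, means[name]


# Hypothetical per-step MAE values for three placeholder models.
errors = {
    "model_a": [1.0, 2.0, 3.0],
    "model_b": [4.0, 5.0, 6.0],
    "model_c": [2.0, 2.0, 5.0],
}

winner, winner_mean = best_model(errors)
print(winner, winner_mean)
```

Running the same selection once with MAE and once with RMSE checks whether both metrics agree on the winner, as they do in this experiment.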

3.6.3 Output display

 · Transformed training data

In the output area, we can see that once the minimum and maximum lag lengths are set, the target “Closing_price” and “Key_Date” are expanded into a corresponding number of transformed attributes during the transformation of the training data (Figure 25).

 · Identify positive and negative attributes

For linear regression, the coefficients in the generated formula let us identify the attributes that contribute most positively or negatively to the forecasting result. In Figure 26, CPI_C_S is the most negative independent attribute, while CPI_C_S is the most positive independent attribute.
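Reading the most positive and most negative attributes off a fitted formula can be sketched as below, assuming the coefficients have been copied from the output log into a dictionary. The `extreme_coefficients` helper, the attribute labels and the coefficient values are hypothetical and are not the values shown in Figure 26.

```python
def extreme_coefficients(coefficients):
    """Return ((name, value) most negative, (name, value) most positive)."""
    if not coefficients:
        raise ValueError("no coefficients given")
    most_negative = min(coefficients.items(), key=lambda item: item[1])
    most_positive = max(coefficients.items(), key=lambda item: item[1])
    return most_negative, most_positive


# Hypothetical coefficients for three placeholder attributes.
coefficients = {"attr_a": -3.2, "attr_b": 0.4, "attr_c": 5.1}

negative, positive = extreme_coefficients(coefficients)
print("most negative:", negative)
print("most positive:", positive)
```

Raw coefficient size depends on the scale of each attribute, so this ranking is only meaningful when the attributes are on comparable scales or have been standardized first.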

 · Data Visualization

Figure 27 shows the train prediction for targets, which plots the movement of the actual price (red line) and the predicted price (blue line).

**Figure 27** Train prediction for targets (399006)

Figure 28 shows the train prediction at steps, which plots the movement of the actual price together with the 1-, 5- and 10-step-ahead predicted prices.

**Figure 28** Train prediction at steps (399006)
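The 1-, 5- and 10-step-ahead predictions can be pictured as a recursive forecast, in which each predicted value is fed back in as a lagged input for the next step. The sketch below assumes a purely autoregressive linear model with no overlay data; the `forecast_steps` helper, the coefficients and the history are hypothetical, and this is not a description of how Weka produces Figure 28.

```python
def forecast_steps(history, coefficients, intercept, steps):
    """Recursively forecast `steps` values with a linear lag model.

    coefficients[i] multiplies the value at lag i + 1, so the model is
    y[t] = intercept + sum(coefficients[i] * y[t - (i + 1)]).
    """
    if len(history) < len(coefficients):
        raise ValueError("history is shorter than the number of lags")
    window = list(history)
    forecasts = []
    for _ in range(steps):
        # Most recent value first, so index 0 corresponds to lag 1.
        recent = window[-len(coefficients):][::-1]
        value = intercept + sum(c * x for c, x in zip(coefficients, recent))
        forecasts.append(value)
        window.append(value)  # feed the prediction back as a lagged input
    return forecasts


# Hypothetical model y[t] = 2 * y[t-1] - y[t-2], which extends a straight line.
history = [1.0, 2.0, 3.0]
forecasts = forecast_steps(history, coefficients=[2.0, -1.0], intercept=0.0, steps=10)

for step in (1, 5, 10):
    print(step, "step(s) ahead:", forecasts[step - 1])
```

Errors made at early steps are carried into later ones under this scheme, which is one reason a 10-step-ahead curve usually tracks the actual price less closely than a 1-step-ahead curve.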
