Harry zheng


Time Series Forecasting of China Stock Market Using Weka-Part 5. Regression test for 399005

Hao, Zheng

JiPeng, Liu

Nannan, Lu

Table of Contents

1. Introduction

1.1 Research background

1.2 Data Mining Applications

1.2.1 Predictive data mining

1.2.2 Literature review on data mining applications

1.3 Regression technique for time series forecasting

1.3.1 Introduction of regression

1.3.2 Regression techniques

2. Methodology

2.1 Flowchart of the system

2.2 Flowchart model introduction

3. Experiments and Results Discussion

3.1 Overview of data set

3.2 Data pre-processing

3.3 Algorithm selection

3.4 Basic configuration of forecasting package

3.5 Advanced configuration of forecasting package

3.5.1 Base learner

3.5.2 Lag creation

3.5.3 Overlay data

3.5.4 Evaluation

3.5.5 Output

3.6 Regression test for 399006

3.6.1 Linear regression testing and validating

3.6.2 Cross regression testing and validating

3.6.3 Output display

3.7 Regression test for 399005

3.7.1 Linear regression testing and validating

3.7.2 Cross regression testing and validating

3.7.3 Output display

3.8 Regression test for 000001

3.8.1 Linear regression testing and validating

3.8.2 Cross regression testing and validating

3.8.3 Output display

4. Conclusion and future work

4.1 Conclusion

4.2 Future work

Reference

3.7 Regression test for 399005

The basic and advanced configuration steps for Weka have already been introduced above. This section therefore only compares and analyzes the different regression algorithms on target 399005.

3.7.1 Linear regression testing and validating

In this section, we ran seven rounds of tests using linear regression to find the relevant attributes and the best lag range for further analysis. As Table 14 shows, round 1 is the initial test with all attributes. In rounds 2 and 3, attributes such as Volume and import_cu were removed because they are irrelevant to this index. In rounds 4, 5 and 6, settings with overlay data unchecked, lags 1–20, and lags 1–10 were compared to find the best performance. In round 7, we tested the algorithm with the listed attributes and lags 1–5.
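To make the lag setting concrete, here is a minimal sketch of how a lag range such as 1–5 turns a price series into regression inputs. It only illustrates the idea and is not Weka's implementation. The function name `make_lagged_rows` and the toy prices are assumptions made for this example.

```python
from typing import List, Tuple


def make_lagged_rows(
    series: List[float], min_lag: int, max_lag: int
) -> Tuple[List[List[float]], List[float]]:
    """Build (inputs, targets): each input row holds the values at lags min_lag..max_lag."""
    if not 1 <= min_lag <= max_lag:
        raise ValueError("need 1 <= min_lag <= max_lag")
    inputs: List[List[float]] = []
    targets: List[float] = []
    # The first max_lag points have no complete lag window, so they are skipped.
    for t in range(max_lag, len(series)):
        inputs.append([series[t - k] for k in range(min_lag, max_lag + 1)])
        targets.append(series[t])
    return inputs, targets


# Toy closing prices, invented for illustration only (not data from the article).
prices = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0]
X, y = make_lagged_rows(prices, min_lag=1, max_lag=5)
for row, target in zip(X, y):
    print(row, "->", target)
```

A wider lag range such as 1–20 gives the learner more history per row but leaves fewer usable rows and more inputs to fit. This is one plausible reason a shorter range can perform better.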

The statistics for the 10-day forecast are shown in Table 15.

The mean absolute error (MAE) for the seven rounds of testing is shown in Table 16.

The bold values refer to the best performing round.

The line chart is also displayed in Figure 29.

Figure 29 The Line chart of MAE for 7 round testing (399005)

The root mean squared error (RMSE) for the seven rounds of testing is shown in Table 17.

The bold values refer to the best performing round.

The line chart is also displayed in Figure 30.

Figure 30 The Line chart of RMSE for 7 round testing (399005)

Across the seven rounds of tests run in Weka, round 7 has the lowest mean absolute error and root mean squared error in Tables 16 and 17. We therefore chose the attributes and configuration of round 7 for further analysis.
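For reference, the two error measures used in this comparison have the following standard definitions. The symbols are assumptions introduced here: $y_i$ is the actual value, $\hat{y}_i$ the forecast value, and $n$ the number of forecast points.

```latex
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
```

Because RMSE squares each error before averaging, it penalizes a few large misses more heavily than MAE does. For the same set of forecasts, RMSE is always at least as large as MAE.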

3.7.2 Cross regression testing and validating

The parameter settings for round 7 are listed here:

· Attribute names: key_date, Closing_price, ppi_cu, pmi_product, currency_m0, currency_m1, currency_m2, shibor_1d, shibor_1w, shibor_2w, shibor_1m, shibor_3m, shibor_6m, shibor_9m, shibor_1y

· Lag length: minimum lag 1, maximum lag 5

· Overlay data: use and select all.

For time series prediction, the forecasting package offers the following algorithms: linear regression, Gaussian processes, multilayer perceptron and SMOreg. Because of time constraints, we used the attributes and configuration selected with linear regression as the standard setup to compare the performance of these four algorithms.

The statistics for the 10-day forecast are collected and summarized in Table 18.

The mean absolute error (MAE) for the forecast is shown in Table 19.

The bold values refer to the best performing round.

The line chart is also displayed in Figure 31.

Figure 31 The Line chart of MAE for cross regression testing (399005)

The root mean squared error (RMSE) for the forecast is shown in Table 20.

The bold values refer to the best performing round.

The line chart is also displayed in Figure 32.

Figure 32 The Line chart of RMSE for cross regression testing (399005)

From Tables 19 and 20, for the 399005 index, Gaussian processes give the best prediction, with the lowest MAE and RMSE.
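The comparison above amounts to scoring each algorithm's forecast against the same actual values and picking the one with the lowest errors. A minimal sketch of that procedure follows. The model names and all numbers are invented placeholders, not values from Tables 18–20, and `rank_by_error` is a hypothetical helper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error over paired actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error over paired actual and predicted values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def rank_by_error(
    actual: Sequence[float], forecasts: Dict[str, List[float]]
) -> List[Tuple[str, Tuple[float, float]]]:
    """Return (name, (MAE, RMSE)) pairs sorted by MAE, with RMSE as the tie-breaker."""
    scores = {name: (mae(actual, pred), rmse(actual, pred)) for name, pred in forecasts.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Placeholder values for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
forecasts = {
    "model_a": [1.0, 2.0, 3.0, 6.0],  # one large miss
    "model_b": [2.0, 3.0, 4.0, 5.0],  # off by 1 everywhere
    "model_c": [1.5, 2.5, 3.5, 4.5],  # off by 0.5 everywhere
}
ranking = rank_by_error(actual, forecasts)
for name, (m, r) in ranking:
    print(f"{name}: MAE={m:.3f} RMSE={r:.3f}")
```

In this toy case, model_a and model_c share the same MAE but differ in RMSE, which is why it helps to report both measures side by side.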

3.7.3 Output display

Figure 33 shows the training predictions for the target, with the actual price in red and the predicted price in blue.

Figure 33 Train prediction for targets (399005)

Figure 34 shows the training predictions at steps, with the actual price and the 1-, 5- and 10-step-ahead predicted prices plotted separately.
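One common way to get 1-, 5- and 10-step-ahead predictions from a lag-based model is to forecast recursively, feeding each prediction back in as the newest lagged input. The sketch below illustrates that idea under stated assumptions and does not describe Weka's internals. `forecast_recursive` is a hypothetical helper, `mean_of_last_two` stands in for a trained regressor, and the history values are invented.

```python
from typing import Callable, List, Sequence


def forecast_recursive(
    history: Sequence[float],
    predict_next: Callable[[List[float]], float],
    max_lag: int,
    steps: int,
) -> List[float]:
    """Forecast `steps` values ahead, reusing each prediction as the newest lagged input."""
    if len(history) < max_lag:
        raise ValueError("history must contain at least max_lag values")
    window = list(history[-max_lag:])  # oldest value first, newest last
    predictions: List[float] = []
    for _ in range(steps):
        nxt = predict_next(window)
        predictions.append(nxt)
        window = window[1:] + [nxt]  # slide the window forward by one step
    return predictions


def mean_of_last_two(window: List[float]) -> float:
    """Hypothetical stand-in for a trained model: average of the two newest values."""
    return (window[-1] + window[-2]) / 2.0


# Invented history, for illustration only.
history = [1.0, 2.0, 3.0, 4.0, 6.0]
preds = forecast_recursive(history, mean_of_last_two, max_lag=5, steps=10)
for horizon in (1, 5, 10):
    print(f"{horizon}-step-ahead prediction: {preds[horizon - 1]:.4f}")
```

Because later steps depend on earlier predictions instead of observed prices, errors can build up as the horizon grows. This is one reason to inspect several horizons separately.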

Figure 34 Train prediction at steps (399005)