Stock Price Prediction — Features that are handy in Stock Price Prediction with Machine Learning
This is my third post on stock price prediction; you can find the first and the second one below.
In this post, I summarize what previous research has found about the features used to predict stock prices.
‘It is a capital mistake to theorize before one has data.’ Arthur Conan Doyle
An issue worth mentioning is the adequacy of the indicators used for stock price prediction. Weng et al. question why the previous price is used as a predictor and note that very few papers use macroeconomic indicators for stock price prediction. The papers that do claim macroeconomic indicators are good predictors of stock prices usually examine only one index, and they use its historical prices as well. Weng et al. point out that this is not enough to prove that macroeconomic indicators are good predictors of the stock markets.
The independent variables used in stock market prediction problems are numerous. Some previous papers found that the macroeconomic indicators with the most potential to impact stock prices are oil prices, housing prices, interest rates, foreign markets, and inflation. Other papers concluded that industrial production, risk premium change, yield curve twist, and inflation all significantly affect the variability of stock returns. Enke and Thawornwong (2005) argue that financial data depend heavily on economic, political, international, and natural events. Indeed, financial markets worldwide experienced a huge shock during the COVID-19 pandemic in 2020.

Weng et al. applied a feature selection method to indices and found that different macroeconomic indicators may be relevant for different indices. They also found that macroeconomic indicators alone are better predictors than historical prices. One of their hypotheses was that the errors from the models are not completely random and can be explained by the macroeconomic indicators. They found that the bias (the errors in the model) can be predicted from the macroeconomic indicators, and that this approach improves the accuracy metrics by 25–50%.
D. Enke and S. Thawornwong, for their part, state that long-term treasury rates and commercial paper are usually used in these types of studies. DeBondt and Thaler used variables such as cumulative excess return, the market value of equity, the market value of equity divided by the book value of equity, and company assets to rank the stocks, and used these rankings as predictive variables. They conclude from their work that most of the portfolios of losers in a previous period outperform in the next period.
In one of the studies using LSTM for stock price prediction, Fischer and Krauss examined the feature importance for their model. They found that the winners' portfolio is characterized by below-mean momentum, strong short-term reversal characteristics, high volatility, and high beta. In another study, Bao et al. (2017) used a Haar wavelet transform to denoise the financial time series, then stacked autoencoders to extract deep invariant daily features from it, and finally an LSTM. They applied this pipeline to two indices each from developing, relatively developed, and developed markets. Their model outperforms the models without stacked autoencoders in both predictive accuracy and profitability, regardless of the index chosen for examination. An autoencoder learns a more compact representation of the input data by forcing it through a bottleneck and training the network to reconstruct its own input.
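To make the denoising step more concrete, here is a minimal sketch of Haar wavelet denoising in plain Python. It is not the configuration used by Bao et al.: it applies only a single decomposition level with soft thresholding, and the function names and the threshold value are my own assumptions for illustration.

```python
import math


def haar_forward(x):
    """One level of the Haar transform. len(x) must be even."""
    s = math.sqrt(2.0)
    # Approximation coefficients: scaled pairwise sums (the smooth part).
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    # Detail coefficients: scaled pairwise differences (the rough part).
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Rebuild the series from one level of Haar coefficients."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; small ones become zero."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def haar_denoise(x, threshold):
    """Denoise by shrinking the detail coefficients (threshold is a free choice)."""
    approx, detail = haar_forward(x)
    return haar_inverse(approx, soft_threshold(detail, threshold))


# Hypothetical closing prices, only to show the call.
prices = [100.0, 101.5, 101.0, 103.2, 102.8, 102.9, 104.0, 103.1]
smoothed = haar_denoise(prices, threshold=0.5)
```

With a threshold of zero the series comes back unchanged, and with a very large threshold each pair of neighboring values collapses to its average. In practice you would probably use several decomposition levels and a data-driven threshold, for example through a wavelet library.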
Another problem in stock market prediction is predictability itself. Eom et al. used a neural network model to assess market predictability. They relied on two measures: the Hurst exponent, which measures long-term memory in time series data, and approximate entropy (ApEn), which is a statistical measure of regularity and predictability. With these measures, they estimated market efficiency and showed that the relationship between the averages of the two measures is negative. Their findings reveal that the Hurst exponent has a higher correlation with the predictability of the market.
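As an illustration of the second measure, here is a minimal sketch of approximate entropy using its standard definition (embedding dimension m, tolerance r, Chebyshev distance, self-matches counted). This is not the code from Eom et al., and the parameter defaults are my assumptions. A common convention is to set r to about 0.2 times the standard deviation of the series; here r is an absolute tolerance that you pass in yourself.

```python
import math


def approximate_entropy(series, m=2, r=0.2):
    """Approximate entropy ApEn(m, r) of a sequence of numbers.

    Lower values mean a more regular (more predictable) series.
    """
    n = len(series)

    def phi(dim):
        # All overlapping windows ("templates") of length dim.
        count = n - dim + 1
        templates = [series[i:i + dim] for i in range(count)]
        total = 0.0
        for a in templates:
            # Count templates within tolerance r (Chebyshev distance),
            # including the template itself, so the count is never zero.
            matches = sum(
                1 for b in templates
                if max(abs(p - q) for p, q in zip(a, b)) <= r
            )
            total += math.log(matches / count)
        return total / count

    return phi(m) - phi(m + 1)


# A strictly alternating series is very regular, so its ApEn is close to zero.
periodic = [0.0, 1.0] * 30
regular_score = approximate_entropy(periodic)
```

This version is quadratic in the length of the series, which is fine for a few thousand points but slow for long intraday histories.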
As for reinforcement learning for stock market prediction, this is a relatively new topic, and the assumptions made are generally unrealistic: no transaction costs, no liquidity issues, and no bid-ask spread. Some deep learning and deep reinforcement learning methods, including deep Q-learning, have improved prediction by extracting better information and finding the optimal strategy, mostly in complex and dynamic market conditions. Chakole et al. built a computer program that makes automated trading decisions and proposed two models that differ in how they represent the state: one uses clusters and the other uses candlesticks. Both models outperform the buy-and-hold strategy and a decision tree, and the cluster-based model performs better of the two. They included transaction costs in their Q-learning setup and concluded that the initial Q-values play an important role in the final results.
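To show where transaction costs and initial Q-values enter, here is a minimal sketch of tabular Q-learning for a toy trading problem. It is not the model of Chakole et al.: instead of clusters or candlesticks, the state here is a hypothetical pair (direction of the last return, current position), and every name and default value below is an assumption made for illustration.

```python
import random


def train_q_table(returns, cost=0.001, alpha=0.1, gamma=0.9,
                  epsilon=0.1, episodes=50, init_q=0.0, seed=0):
    """Tabular Q-learning on a list of per-period returns.

    Actions: 0 = stay out of the market, 1 = hold the stock.
    init_q is the initial value assigned to every unseen (state, action) pair.
    """
    rng = random.Random(seed)
    actions = (0, 1)
    q = {}  # maps (state, action) -> estimated value

    def value(state, action):
        return q.get((state, action), init_q)

    for _ in range(episodes):
        position = 0  # start each episode out of the market
        for t in range(1, len(returns)):
            # State: did the last period go up, and are we currently invested?
            state = (int(returns[t - 1] > 0), position)
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda a: value(state, a))
            # Reward: the period's return if invested, minus a cost
            # whenever the position changes.
            reward = action * returns[t] - cost * abs(action - position)
            next_state = (int(returns[t] > 0), action)
            best_next = max(value(next_state, a) for a in actions)
            old = value(state, action)
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            position = action
    return q


# Hypothetical series that rises 1% every period, only to show the call.
q_table = train_q_table([0.01] * 100)
```

On this artificial always-rising series, the learned value of buying from a flat position ends up higher than the value of staying out, even after paying the cost. Changing init_q or cost is a simple way to see how sensitive the learned policy is to those choices.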

Carta et al. use an ensemble of deep learning and deep reinforcement learning on intraday data. They obtain better trading results with less overfitting. They claim that in chaotic environments such as the stock market, relying solely on market prices and a single ensemble technique may produce acceptable but inferior trading results.
That was all for stock price prediction for now. Please read my other posts below.
I also recommend reading this book.
Thank you!
This post may contain affiliate links.
