Sabir Jana, CFA



ML Classification Algorithms to Predict Market Movements and Backtesting

In this article, we use trading strategies based on several machine learning classification algorithms to predict market movement. To analyze performance, we first run a simple vectorized backtest and then test the best-performing strategy with Backtrader to get a more realistic picture. You can find the Jupyter notebook used in this article on my GitHub page. The overall approach is as follows:

  1. Gathering Historical Pricing Data.
  2. Feature Engineering.
  3. Build and Apply Classification Machine Learning Algorithms.
  4. Backtesting of Selected Strategy using Backtrader.
  5. Performance Analysis of Backtesting.

Gathering Historical Pricing Data

We are going to use the Nifty-50 index for this analysis. We will download the daily closing prices with the help of the yfinance Python library, calculate daily log returns, and derive the market direction from them. We will visualize the closing prices and daily returns to quickly check our data. Let’s go through the code:

# make the necessary imports 
import numpy as np
from matplotlib import pyplot as plt
import pandas as pd
import seaborn as sns
import yfinance as yf
import warnings
from sklearn import linear_model
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
import datetime
import pyfolio as pf
import backtrader as bt
from backtrader.feeds import PandasData
# set the style and ignore warnings
# (on matplotlib >= 3.6 this style is named 'seaborn-v0_8-colorblind')
plt.style.use('seaborn-colorblind')
warnings.simplefilter(action='ignore', category=FutureWarning)
warnings.filterwarnings('ignore')
# this is to display images in notebook
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
# ticker and the start and end dates for testing
ticker =  '^NSEI' # Nifty 50 benchmark
start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2020, 7, 31)
# download ticker 'Adj Close' price from yahoo finance
stock =  yf.download(ticker, progress=True, actions=True,start=start, end=end)['Adj Close']
stock = pd.DataFrame(stock)
stock.rename(columns = {'Adj Close':ticker}, inplace=True)
stock.head(2)
# calculate daily log returns and market direction
stock['returns'] = np.log(stock[ticker] / stock[ticker].shift(1))
stock.dropna(inplace=True)
stock['direction'] = np.sign(stock['returns']).astype(int)
stock.head(3)
# visualize the closing price and daily returns
fig, ax = plt.subplots(2, 1, sharex=True, figsize = (12,6))
ax[0].plot(stock[ticker], label = f'{ticker} Adj Close')
ax[0].set(title = f'{ticker} Closing Price', ylabel = 'Price')
ax[0].grid(True)
ax[0].legend()
ax[1].plot(stock['returns'], label = 'Daily Returns')
ax[1].set(title = f'{ticker} Daily Returns', ylabel = 'Returns')
ax[1].grid(True)
ax[1].legend()
plt.tight_layout();
plt.savefig('images/chart1', dpi=300)
Daily Closing Prices and Log Returns

Code commentary:

  1. Make the necessary imports.
  2. Set the ticker to the Nifty-50 index, with start and end dates of 2010-01-01 and 2020-07-31.
  3. Download daily Adj Close data with the help of yfinance from Yahoo Finance.
  4. Calculate daily log returns and market direction using np.sign().astype(int).
  5. Visualize daily closing prices and log returns.
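
To make step 4 concrete, here is a minimal sketch on a made-up price series (the prices are illustrative, not Nifty-50 data, and the variable names are assumptions of this example). It shows that log returns add up over time and that np.sign maps each return to +1 or -1, with 0 for an unchanged close.

```python
import numpy as np
import pandas as pd

# hypothetical closing prices, for illustration only
prices = pd.Series([100.0, 102.0, 101.0, 101.0, 105.0], name='close')

# daily log return: ln(P_t / P_{t-1}); the first row has no previous price
returns = np.log(prices / prices.shift(1)).dropna()

# market direction: +1 for an up day, -1 for a down day, 0 for an unchanged close
direction = np.sign(returns).astype(int)

# log returns are additive, so exponentiating their sum recovers the total price ratio
total_growth = np.exp(returns.sum())

print(direction.tolist())
print(round(total_growth, 4))
```

Note that a flat day yields 0, so the direction column is not strictly binary whenever two consecutive closes are equal.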

Feature Engineering

In this section, we will create feature variables to predict the market direction. As a first step, we take five lags of the log-return series and digitize them into binary values (0, 1). These binary features are then used to predict an upward or downward market movement, encoded as (+1, -1). The Python code is as follows:

# define the number of lags
lags = [1, 2, 3, 4, 5]
# compute lagged log returns
cols = []
for lag in lags:
    col = f'rtn_lag{lag}'
    stock[col] = stock['returns'].shift(lag)
    cols.append(col)
stock.dropna(inplace=True)
stock.head(2)
# function to transform the lag returns to binary values (0,+1)
def create_bins(data, bins=[0]):
    global cols_bin
    cols_bin = []
    for col in cols:
        col_bin = col + '_bin'
        data[col_bin] = np.digitize(data[col], bins=bins)  
        cols_bin.append(col_bin)
create_bins(stock)
stock[cols+cols_bin].head(2)
Lag Returns and Corresponding Binary Values (0,+1)

Code commentary:

  1. Compute five lagged return columns by shifting the returns series by one to five days, so that each row holds the previous five days' returns alongside the current day's return.
  2. Define the function to transform the lag returns to binary values (0,1) using the function np.digitize().
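
The two steps above can be sketched on a toy series. The return values below are made up, and only two lags are used to keep the output small. With bins=[0], np.digitize returns 0 for negative values and 1 for values greater than or equal to zero.

```python
import numpy as np
import pandas as pd

# hypothetical daily log returns, for illustration only
returns = pd.Series([0.01, -0.02, 0.005, 0.0, -0.01, 0.03], name='returns')
frame = returns.to_frame()

# lag k holds the return observed k days earlier
for lag in (1, 2):
    frame[f'rtn_lag{lag}'] = frame['returns'].shift(lag)

# the first rows have no complete history, so drop them
frame = frame.dropna()

# with bins=[0], np.digitize gives 0 for x < 0 and 1 for x >= 0
frame['rtn_lag1_bin'] = np.digitize(frame['rtn_lag1'], bins=[0])
frame['rtn_lag2_bin'] = np.digitize(frame['rtn_lag2'], bins=[0])

print(frame)
```

A lagged return of exactly zero falls into bin 1, so it is treated the same as an up day in the feature set.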

Build and Apply Classification Machine Learning Algorithms

Now we are going to use Logistic Regression, Gaussian Naive Bayes, Support Vector Machine (SVM), Random Forest, and MLP classifiers to predict the market direction as (+1, -1). Please refer to the sklearn documentation for details on these and other algorithms. We will then evaluate the performance of each of these models using vectorized backtesting and visualize the cumulative returns. Let’s go through the Python code:

# create a dictionary of selected algorithms
models = {
    'log_reg': linear_model.LogisticRegression(),
    'gauss_nb': GaussianNB(),
    'svm': SVC(),
    'random_forest': RandomForestClassifier(max_depth=10, n_estimators=100),
    'MLP': MLPClassifier(max_iter=500),
}
# function that fits all models.
def fit_models(data):
    # scikit-learn estimators are fitted in place, so the models dict holds the trained classifiers
    for model in models.values():
        model.fit(data[cols_bin], data['direction'])
# function that predicts (derives all position values) from the fitted models
def derive_positions(data):  
    for model in models.keys():
        data['pos_' + model] = models[model].predict(data[cols_bin])
# function to evaluate all trading strategies
def evaluate_strats(data):  
    global strategy_rtn
    strategy_rtn = []
    for model in models.keys():
        col = 'strategy_' + model 
        data[col] = data['pos_' + model] * data['returns']
        strategy_rtn.append(col)
    strategy_rtn.insert(0, 'returns')
# fit the models
fit_models(stock)
# derives all position values
derive_positions(stock)
# evaluate all trading strategies by multiplying predicted directions to actual daily returns
evaluate_strats(stock)
# calculate total return and std. deviation of each strategy
print('\nTotal Returns: \n')
print(stock[strategy_rtn].sum().apply(np.exp))
print('\nAnnual Volatility:')
print(stock[strategy_rtn].std() * 252 ** 0.5)
# number of trades over time for highest and second highest return strategy
# (the first row's diff is NaN, which also counts as a change)
print('Number of trades SVM = ', (stock['pos_svm'].diff()!=0).sum())
print('Number of trades Random Forest = ',(stock['pos_random_forest'].diff()!=0).sum())
# vectorized backtesting of the resulting trading strategies and visualize the performance over time
ax = stock[strategy_rtn].cumsum().apply(np.exp).plot(figsize=(12, 6), 
                                                     title = 'Machine Learning Classifiers Return Comparison')
ax.set_ylabel("Cumulative Returns")
ax.grid(True);
plt.tight_layout();
plt.savefig('images/chart2', dpi=300)

Code commentary:

  1. Create a dictionary of selected algorithms.
  2. Define a function that fits all models with direction column as the dependent variable and _bin columns as feature variables.
  3. Define a function that predicts all position values from the fitted models.
  4. Define a function to evaluate all trading strategies.
  5. Next, we fit the models, predict positions, and evaluate all trading strategies by multiplying the predicted directions by the actual daily returns.
  6. Calculate the total return and standard deviation of each strategy.
  7. Calculate the number of trades over time for the highest and second-highest return strategies.
  8. Run vectorized backtesting of the resulting trading strategies and visualize the performance over time.
Machine Learning Classifiers Return Comparison
Total Returns and Annual Volatility
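
The arithmetic behind the total return and annual volatility figures can be sketched in plain Python. This is only an illustration under stated assumptions: the return and position values are made up, and the helper names total_return and annual_volatility are hypothetical, not part of the article's code. It assumes the returns column holds daily log returns, which is why the sum is exponentiated.

```python
import math
import statistics

# Hypothetical daily log returns and predicted directions (+1 long, -1 short)
log_returns = [0.010, -0.020, 0.015, -0.005, 0.008]
positions = [1, -1, 1, 1, -1]

# Strategy log return per day: predicted direction times realised log return
strategy = [p * r for p, r in zip(positions, log_returns)]


def total_return(log_rets):
    """Gross return multiple: exponentiate the sum of the log returns."""
    return math.exp(sum(log_rets))


def annual_volatility(log_rets, periods=252):
    """Sample standard deviation of daily returns scaled by sqrt(periods)."""
    return statistics.stdev(log_rets) * math.sqrt(periods)


print(f"Buy and hold total return: {total_return(log_returns):.4f}")
print(f"Strategy total return:     {total_return(strategy):.4f}")
print(f"Strategy annual volatility: {annual_volatility(strategy):.4f}")
```

A value of 1.0 means the capital is unchanged, so a total return of 8.86 corresponds to ending with 8.86 times the starting capital.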

We can see that the support vector machine model has delivered the highest total return over time, with annual volatility comparable to the other models. However, it would be premature to deploy any such strategy based on vectorized backtesting results alone. Some of the reasons are listed below:

  1. The number of trades is quite high, and vectorized backtesting doesn’t account for costs such as trading commissions and market slippage.
  2. The strategy takes both long and short positions; however, short selling may not be feasible for multiple reasons.

Hence, our backtesting needs to be more realistic and event-driven to address the above gaps.
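
To get a rough feel for how much these two gaps matter before moving to an event-driven backtest, a vectorized result can be adjusted by hand. The sketch below is an assumption-laden illustration, not the article's method: the data is made up, the function net_total_return is hypothetical, and costs are modelled as a fixed proportional charge on every unit of position change.

```python
import math

# Hypothetical inputs: daily log returns and predicted directions (+1 / -1)
log_returns = [0.010, -0.020, 0.015, -0.005, 0.008]
positions = [1, -1, 1, 1, -1]
COST = 0.001  # assumed proportional cost per unit of turnover (0.1%)


def net_total_return(positions, log_returns, cost, allow_short=True):
    """Total return multiple after charging a cost on every position change."""
    total = 0.0
    prev = 0  # start flat
    for pos, ret in zip(positions, log_returns):
        if not allow_short:
            pos = max(pos, 0)  # map -1 (short) to 0 (stay in cash)
        turnover = abs(pos - prev)  # 1 to enter or exit, 2 to flip long/short
        total += pos * ret + turnover * math.log(1 - cost)
        prev = pos
    return math.exp(total)


gross = net_total_return(positions, log_returns, cost=0.0)
net = net_total_return(positions, log_returns, cost=COST)
long_only = net_total_return(positions, log_returns, cost=COST, allow_short=False)
print(f"Gross: {gross:.4f}  Net: {net:.4f}  Long-only net: {long_only:.4f}")
```

The more often the predicted direction flips, the larger the gap between the gross and net figures becomes.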

Backtesting of Selected Strategy using Backtrader

In this section, we take our best-performing model, the support vector machine (SVM), and backtest it using the Python library Backtrader. The backtesting strategy is as follows:

  1. We start with an initial capital of 100,000 and a trading commission of 0.1%.
  2. We buy when the predicted value is +1 and sell (only if we hold the stock) when the predicted value is -1.
  3. All-in strategy: when creating a buy order, buy as many shares as possible.
  4. Short selling is not allowed.
  4. Short selling is not allowed.
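
The all-in sizing rule and the 0.1% commission from the list above can be worked through with a small example. This is a sketch with a made-up open price; the function names all_in_order and all_in_order_with_fee are hypothetical. The first function divides cash by the open price and ignores the commission; the second is a variation, not taken from the article, that leaves room for the fee.

```python
def all_in_order(cash, open_price, commission=0.001):
    """Whole shares affordable at open_price, plus the order value and fee."""
    size = int(cash / open_price)  # round down to whole shares
    value = size * open_price      # cash spent on the shares
    fee = value * commission       # proportional commission on the order
    return size, value, fee


def all_in_order_with_fee(cash, open_price, commission=0.001):
    """Variation (an assumption of this sketch): size so that value + fee fits in cash."""
    size = int(cash / (open_price * (1 + commission)))
    value = size * open_price
    fee = value * commission
    return size, value, fee


cash = 100_000
for sizing in (all_in_order, all_in_order_with_fee):
    size, value, fee = sizing(cash, open_price=1234.5)
    print(f"{sizing.__name__}: size={size}, value={value:.2f}, "
          f"fee={fee:.2f}, fits_in_cash={value + fee <= cash}")
```

With these numbers the naive size spends almost all cash on shares, so the commission pushes the total above the available cash. A broker simulation may reject such an order, which is worth keeping in mind when reading the backtest logs.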

Let’s go through the python code:

# fetch the daily pricing data from yahoo finance
prices = yf.download(ticker, progress=True, actions=True, start=start, end=end)
prices.head(2)
# rename the columns as needed for Backtrader
prices.drop(['Close','Dividends','Stock Splits'], inplace=True, axis=1)
prices.rename(columns = {'Open':'open','High':'high','Low':'low','Adj Close':'close','Volume':'volume',
                         }, inplace=True)
prices.head(3)
# add the predicted column to prices dataframe. This will be used as signal for buy or sell
# use the predicted direction (pos_svm, +1/-1) rather than strategy_svm, which holds strategy returns
predictions = stock[['pos_svm']].rename(columns={'pos_svm': 'predicted'})
prices = predictions.join(prices, how='right').dropna()
prices.head(2)
OHLCV = ['open', 'high', 'low', 'close', 'volume']
# class to define the columns we will provide
class SignalData(PandasData):
    """
    Define pandas DataFrame structure
    """
    cols = OHLCV + ['predicted']
    # create lines
    lines = tuple(cols)
    # define parameters
    params = {c: -1 for c in cols}
    params.update({'datetime': None})
    params = tuple(params.items())
Dataframe with Predicted Column

Code commentary:

  1. Fetch the daily pricing data from Yahoo Finance and rename the columns to the OHLCV format needed by Backtrader.
  2. Take the SVM signal column from the stock dataframe and join it to the prices dataframe as predicted. This column’s value is the signal to buy or sell when placing an order.
  3. Define a custom SignalData class for dataframe columns to be fed to Backtrader.

Now, we define the MLStrategy class for the backtesting strategy. It must inherit from bt.Strategy. Because we predicted the market direction from the day’s closing price, we use cheat_on_open=True when creating the bt.Cerebro object. This means the number of shares we want to buy is based on day t+1’s open price. As a result, we define the next_open method instead of next within the Strategy class.

# define backtesting strategy class
class MLStrategy(bt.Strategy):
    params = dict(
    )
    
    def __init__(self):
        # keep track of open, close prices and predicted value in the series
        self.data_predicted = self.datas[0].predicted
        self.data_open = self.datas[0].open
        self.data_close = self.datas[0].close
        
        # keep track of pending orders/buy price/buy commission
        self.order = None
        self.price = None
        self.comm = None
    # logging function
    def log(self, txt):
        '''Logging function'''
        dt = self.datas[0].datetime.date(0).isoformat()
        print(f'{dt}, {txt}')
    def notify_order(self, order):
        if order.status in [order.Submitted, order.Accepted]:
            # order already submitted/accepted - no action required
            return
        # report executed order
        if order.status in [order.Completed]:
            if order.isbuy():
                self.log(f'BUY EXECUTED --- Price: {order.executed.price:.2f}, Cost: {order.executed.value:.2f},Commission: {order.executed.comm:.2f}'
                )
                self.price = order.executed.price
                self.comm = order.executed.comm
            else:
                self.log(f'SELL EXECUTED --- Price: {order.executed.price:.2f}, Cost: {order.executed.value:.2f},Commission: {order.executed.comm:.2f}'
                )
        # report failed order
        elif order.status in [order.Canceled, order.Margin, 
                              order.Rejected]:
            self.log('Order Failed')
        # set no pending order
        self.order = None
    def notify_trade(self, trade):
        if not trade.isclosed:
            return
        self.log(f'OPERATION RESULT --- Gross: {trade.pnl:.2f}, Net: {trade.pnlcomm:.2f}')
    # We have set cheat_on_open=True. This means that we calculated the signals on day t's close price,
    # but calculated the number of shares we wanted to buy based on day t+1's open price.
    def next_open(self):
        if not self.position:
            if self.data_predicted > 0:
                # calculate the max number of shares ('all-in')
                size = int(self.broker.getcash() / self.datas[0].open)
                # buy order
                self.log(f'BUY CREATED --- Size: {size}, Cash: {self.broker.getcash():.2f}, Open: {self.data_open[0]}, Close: {self.data_close[0]}')
                self.buy(size=size)
        else:
            if self.data_predicted < 0:
                # sell order
                self.log(f'SELL CREATED --- Size: {self.position.size}')
                self.sell(size=self.position.size)

Code commentary:

  1. The function __init__ tracks open, close, predicted, and pending orders.
  2. The function notify_order tracks the order status.
  3. The function notify_trade is triggered if the order is complete and logs profit and loss for the trade.
  4. The function next_open checks the available cash and calculates the maximum number of shares that can be bought. It places a buy order if we don’t hold any position and the predicted value is greater than zero. Otherwise, if we hold a position and the predicted value is less than zero, it places a sell order for the whole position.
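
The decision logic of next_open can also be sketched without Backtrader as a simple loop over bars. This is a simplified illustration, not how Backtrader works internally: the function run_long_only and the bar data are hypothetical, orders are assumed to fill instantly at the open, and the sizing reserves cash for the commission (an assumption of this sketch).

```python
def run_long_only(bars, cash=100_000.0, commission=0.001):
    """Simulate an all-in, long-only strategy.

    bars: list of (open_price, close_price, signal) tuples, oldest first.
    The signal on each bar is acted on at that bar's open price.
    Returns the final portfolio value, marking any open position
    at the last close.
    """
    shares = 0
    for open_price, _close, signal in bars:
        if shares == 0 and signal > 0:
            # buy as many whole shares as cash allows, fee included
            shares = int(cash / (open_price * (1 + commission)))
            cash -= shares * open_price * (1 + commission)
        elif shares > 0 and signal < 0:
            # sell the whole position and pay commission on the proceeds
            cash += shares * open_price * (1 - commission)
            shares = 0
    last_close = bars[-1][1]
    return cash + shares * last_close


# Hypothetical bars: (open, close, predicted direction)
bars = [(100, 102, 1), (103, 101, 1), (100, 98, -1), (97, 99, 1), (99, 104, 1)]
print(f"Final portfolio value: {run_long_only(bars):.2f}")
```

Note that a -1 signal while flat does nothing here, which is the "no short selling" rule in action.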

Next, we instantiate SignalData and Cerebro objects and add prices dataframe, MLStrategy, initial capital, commission, and pyfolio analyzer. Finally, we run the backtest and capture the results.

# instantiate SignalData class
data = SignalData(dataname=prices)
# instantiate Cerebro, add strategy, data, initial cash, commission and pyfolio for performance analysis
cerebro = bt.Cerebro(stdstats = False, cheat_on_open=True)
cerebro.addstrategy(MLStrategy)
cerebro.adddata(data, name=ticker)
cerebro.broker.setcash(100000.0)
cerebro.broker.setcommission(commission=0.001)
cerebro.addanalyzer(bt.analyzers.PyFolio, _name='pyfolio')
# run the backtest
print('Starting Portfolio Value: %.2f' % cerebro.broker.getvalue())
backtest_result = cerebro.run()
print('Final Portfolio Value: %.2f' % cerebro.broker.getvalue())
Backtesting Logs

Performance Analysis of Backtesting

We analyze the performance statistics using pyfolio, a Python library for performance and risk analysis of financial portfolios developed by Quantopian Inc.

# Extract inputs for pyfolio
strat = backtest_result[0]
pyfoliozer = strat.analyzers.getbyname('pyfolio')
returns, positions, transactions, gross_lev = pyfoliozer.get_pf_items()
returns.name = 'Strategy'
returns.head(2)
# get benchmark returns
benchmark_rets= stock['returns']
benchmark_rets.index = benchmark_rets.index.tz_localize('UTC') 
benchmark_rets = benchmark_rets.filter(returns.index)
benchmark_rets.name = 'Nifty-50'
benchmark_rets.head(2)
# get performance statistics for strategy
pf.show_perf_stats(returns)
# plot performance for strategy vs benchmark
fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(16, 9),constrained_layout=True)
axes = ax.flatten()
pf.plot_drawdown_periods(returns=returns, ax=axes[0])
axes[0].grid(True)
pf.plot_rolling_returns(returns=returns,
                        factor_returns=benchmark_rets,
                        ax=axes[1], title='Strategy vs Nifty-50')
axes[1].grid(True)
pf.plot_drawdown_underwater(returns=returns, ax=axes[2])
axes[2].grid(True)
pf.plot_rolling_sharpe(returns=returns, ax=axes[3])
axes[3].grid(True)
# fig.suptitle('Strategy vs Nifty-50 (Buy and Hold)', fontsize=16, y=0.990)
plt.grid(True)
plt.legend()
plt.tight_layout()
plt.savefig('images/chart3', dpi=300)

Code commentary:

  1. We extract inputs needed for pyfolio from the backtesting result.
  2. Get the benchmark daily returns to compare and contrast with the strategy.
  3. Get performance statistics for the strategy using pyfolio show_perf_stats.
  4. Visualize drawdowns, cumulative returns, underwater plot, and rolling Sharpe ratio.
Strategy Performance
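
To make the headline statistics less of a black box, a few of them can be approximated by hand from a series of daily returns. The sketch below uses made-up simple returns and a hypothetical perf_stats helper; it assumes 252 trading days per year and a zero risk-free rate, and pyfolio's exact conventions may differ.

```python
import math
import statistics


def perf_stats(daily_returns, periods=252):
    """Cumulative return, annualised return, Sharpe ratio and max drawdown."""
    equity, peak, max_dd = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1 + r                          # compound the simple return
        peak = max(peak, equity)                 # running high-water mark
        max_dd = min(max_dd, equity / peak - 1)  # deepest fall from the peak
    n = len(daily_returns)
    sharpe = (statistics.mean(daily_returns)
              / statistics.stdev(daily_returns) * math.sqrt(periods))
    return {
        'cumulative_return': equity - 1,
        'annual_return': equity ** (periods / n) - 1,
        'sharpe_ratio': sharpe,
        'max_drawdown': max_dd,
    }


# Hypothetical daily simple returns
stats = perf_stats([0.01, -0.02, 0.015, -0.005, 0.008])
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

The annual return is the compound growth rate implied by the cumulative return over the length of the backtest, which is why a cumulative return built up over many years can translate into a modest annual figure.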

Let’s analyze the performance of our strategy. The annual return is just 3.9% and the cumulative return is 48%, compared with the 8.86 times total return we observed during vectorized backtesting. If we visualize a few other performance parameters against the benchmark, we can see that our strategy is not able to beat a simple buy-and-hold strategy.

So the obvious question is why? First, we paid a large amount in commission because of the high number of trades. Second, we allowed no short selling when backtesting with Backtrader.

Strategy vs Benchmark

In conclusion, vectorized backtesting results may often look great on paper; however, we need to consider all aspects of implementation shortfall and feasibility before we decide to implement such a strategy. Also, keep in mind that the capital market is not just about machine learning; otherwise, all data scientists would have become super-rich by now.

Happy investing and do leave your comments on the article!

Please Note: This analysis is only for educational purposes and the author is not liable for any of your investment decisions.
