Avi Chawla


Your Random Forest Model is Never the Best Random Forest Model You Can Build

The coolest trick to improve random forest models.

Photo by Adarsh Kummur on Unsplash

Random forest is a powerful and robust model that combines many different decision trees.

Decision tree vs Random forest (Image by author)

But the biggest problem is that whenever we use a random forest, we typically create many more base decision trees than required.

Of course, this can be tuned as a hyperparameter, but it requires training many different random forest models, which takes time.

Today, I will share one of the most incredible tricks I formulated recently to:

  • Increase the accuracy of a random forest model.
  • Decrease its size.
  • Drastically reduce its prediction run-time.

And all this without having to ever retrain the model.

Are you ready?

Let’s begin!

Logic

We know that a random forest model is an ensemble of many different decision trees:

Prediction from a random forest model (Image by author)

The final prediction is generated by aggregating the predictions from each individual and independent decision tree.

Since each decision tree in a random forest is trained independently, each one will have its own validation accuracy, right?

Accuracy of trees in Random Forest (Image by author)

This also means that there will be some underperforming and some well-performing decision trees. Agreed?

So what if we do the following:

  • We find the validation accuracy of every decision tree.
  • We sort the accuracies in decreasing order.
  • We keep only the “k” top-performing decision trees and remove the rest.

Once done, we’ll be left with only the best-performing decision trees in our random forest, as evaluated on the validation set.

Cool, isn’t it?

Now, how to decide “k”?

It’s simple.

We can create a cumulative accuracy plot.

It will be a line plot depicting the accuracy of the random forest:

  • Considering only the top two decision trees.
  • Considering only the top three decision trees.
  • Considering only the top four decision trees.
  • And so on.

It is expected that the accuracy will first increase with the number of decision trees and then decrease.

Looking at this plot, we can pick the optimal “k”.

Implementation

Let’s look at its implementation.

First, we create a classification dataset:
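The notebook cell embedded at this point is not reproduced in this text. As a minimal sketch, assuming a synthetic dataset from sklearn's make_classification and a 60/20/20 train/validation/test split (the sizes, seed, and variable names are illustrative assumptions, not the author's exact values), the dataset could be created like this:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (sizes and seed are assumptions).
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=8, random_state=42
)

# Hold out 40% of the data, then split it evenly into validation and test sets.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=42
)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=42
)

print(X_train.shape, X_val.shape, X_test.shape)
```

Keeping a separate validation set matters here because the trees are ranked on it, while the test set stays untouched for a final check.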

Next, we train our random forest as we usually would:
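A minimal training sketch, assuming sklearn's RandomForestClassifier with 100 trees (the article later refers to a tree at index 99, which is consistent with that size, but the exact hyperparameters here are assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumed setup: synthetic data with a held-out validation set.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=8, random_state=42
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train the forest exactly as usual; nothing special happens at this stage.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

full_val_acc = model.score(X_val, y_val)
print(len(model.estimators_), full_val_acc)
```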

Next, we must compute the accuracy of each decision tree model.

In sklearn, individual trees can be accessed through the model.estimators_ attribute.

Thus, we iterate over all trees and compute their validation accuracy:
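The per-tree scoring can be sketched as follows. This is an illustration under assumptions (synthetic data, a 100-tree forest); only the names model and model_accs come from the article:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumed setup: synthetic data and a default-sized forest.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=8, random_state=42
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# One row per tree: [tree id, validation accuracy].
# Note: the sub-trees predict encoded class indices, which coincide with
# the original labels here because the labels are already 0 and 1.
model_accs = np.array(
    [[idx, tree.score(X_val, y_val)] for idx, tree in enumerate(model.estimators_)]
)

print(model_accs[:5])
```

If the labels were strings or did not start at 0, the tree predictions would need to be mapped back through model.classes_ before scoring.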

Here, model_accs is a NumPy array that stores each tree’s id and its validation accuracy:

Now, we must order the decision trees in model.estimators_ by decreasing validation accuracy. To do so, we first sort model_accs and collect the tree ids in that order as model_ids:

This list tells us that the model at index 99 is the highest performing, followed by the model at index 59, and so on.

Now, let’s rearrange the tree models in the model.estimators_ list in the order of model_ids:
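The sorting and rearranging steps can be sketched together. The data, seed, and the helper variable order are assumptions, so the resulting ids will not match the 99 and 59 quoted above:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumed setup: synthetic data and a 100-tree forest.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=8, random_state=42
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# [tree id, validation accuracy] for every tree.
accs = np.array([tree.score(X_val, y_val) for tree in model.estimators_])
model_accs = np.column_stack([np.arange(len(accs)), accs])

# Tree ids sorted by decreasing validation accuracy (stable for ties).
order = np.argsort(-model_accs[:, 1], kind="stable")
model_ids = model_accs[order, 0].astype(int)

# Reorder the fitted trees so the best-performing ones come first.
model.estimators_ = [model.estimators_[i] for i in model_ids]

print(model_ids[:10])
```

Reordering alone should not change the forest's predictions, since the aggregation does not depend on tree order; it only prepares the list so that slicing the first “k” entries yields the top “k” trees.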

Done!

Finally, we create the plot discussed earlier.

It will be a line plot depicting the accuracy of the random forest:

  • By considering only the top two decision trees.
  • By considering only the top three decision trees.
  • By considering only the top four decision trees.
  • and so on.

The code to compute cumulative accuracies is demonstrated below:

In the above code:

  • We create a copy of the base model called small_model.
  • In each iteration, we set small_model’s trees to the first “k” trees of the base model.
  • Finally, we score the small_model with just “k” trees.
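The three steps above can be sketched as follows. Only small_model and the overall procedure come from the article; the data, the names cumulative_accs and best_k, and the choice to start at k = 1 are assumptions:

```python
import copy

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumed setup: synthetic data and a 100-tree forest.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=8, random_state=42
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Sort the trees by decreasing validation accuracy.
accs = np.array([tree.score(X_val, y_val) for tree in model.estimators_])
order = np.argsort(-accs, kind="stable")
model.estimators_ = [model.estimators_[i] for i in order]

# Copy the base model, then score it with only the first k trees.
small_model = copy.deepcopy(model)
cumulative_accs = []
for k in range(1, len(model.estimators_) + 1):
    small_model.estimators_ = model.estimators_[:k]
    cumulative_accs.append(small_model.score(X_val, y_val))

# Smallest k that reaches the highest validation accuracy.
best_k = int(np.argmax(cumulative_accs)) + 1
print(best_k, max(cumulative_accs), cumulative_accs[-1])
```

Because the forest averages over whatever is in estimators_ at prediction time, swapping in a shorter list is enough; no refitting is involved.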

Plotting the cumulative accuracy result, we get the following plot:

Cumulative accuracy score

It is clear that the maximum validation accuracy is reached with only 6 trees, which is roughly a sixteen-fold reduction in the number of trees.

Comparing their accuracy and run-time, we get:

Small model vs usual RF model accuracy and run-time (Image by author)

  • We get a 6.5% accuracy boost.
  • We get a 13 times faster prediction run-time.
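A comparison like this can be sketched with a simple timer. The figures above come from the author's notebook; this sketch uses assumed synthetic data, takes k = 6 from the article, and uses a hypothetical helper time_predict, so it will not reproduce those exact numbers:

```python
import copy
import time

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumed setup: synthetic data and a 100-tree forest.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=8, random_state=42
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Sort trees by decreasing validation accuracy and keep the top k.
accs = np.array([tree.score(X_val, y_val) for tree in model.estimators_])
order = np.argsort(-accs, kind="stable")
model.estimators_ = [model.estimators_[i] for i in order]

k = 6  # value reported in the article; an assumption for this dataset
small_model = copy.deepcopy(model)
small_model.estimators_ = model.estimators_[:k]


def time_predict(clf, X, repeats=5):
    """Average wall-clock time of clf.predict(X) over a few repeats."""
    start = time.perf_counter()
    for _ in range(repeats):
        clf.predict(X)
    return (time.perf_counter() - start) / repeats


full_time = time_predict(model, X_val)
small_time = time_predict(small_model, X_val)
full_acc = model.score(X_val, y_val)
small_acc = small_model.score(X_val, y_val)

print(f"full:  acc={full_acc:.3f}, time={full_time:.4f}s")
print(f"small: acc={small_acc:.3f}, time={small_time:.4f}s")
```

Wall-clock timings vary between machines and runs, so the speed-up is best read as an order of magnitude and not as an exact ratio.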

Now, tell me something:

  • Did we do any retraining or hyperparameter tuning? No.
  • As we reduced the number of decision trees, didn’t we improve the run time? Of course, we did.

Isn’t that cool?

Of course, we may not want to overly reduce the ensemble size because we want to ensure our RF still maintains many different types of decision trees.

The approach to select the best “k” can be quite subjective and it does not necessarily have to rely solely on the validation accuracy.

In fact, we can consider the test set to see how the reduced model is performing, as shown below:

In the above plot:

  • The red line depicts the validation accuracy obtained after selecting the first “k” decision trees.
  • The blue line depicts the corresponding train accuracy.
  • The green line depicts the test accuracy.

As I mentioned earlier, the selection of decision trees does not have to rely only on the validation set, as that may lead to overfitting the validation set.

In fact, we can see that the optimal set of decision trees we obtained from the validation set does not entirely translate to what we see on the test set.

Yet, one message is clear: we do not need all the decision trees in an RF.

Picking the best ones based on what we see in the metrics can be much better, both in terms of speed and accuracy.

What are your thoughts?

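👉 Find the code for this post here: Code notebook (https://deepnote.com/workspace/pandas-one-liners-bb4c3b40-37ee-48c9-bb3d-024ed8b5b43c/project/Random-Forest-Model-09206a70-d82a-4458-a9d3-e723bab57c84/notebook/Notebook%201-632a17683b9f41a6bf67f312ad2964b3).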

👉 Over to you: What are some other cool ways to improve model run-time?

Thanks for reading!

