Salma El Shahawy


TESTING RESULTS: 
===============================
CONFUSION MATRIX:
[[135  15]
 [ 40  41]]
ACCURACY SCORE:
0.7619
CLASSIFICATION REPORT:
                    0          1  accuracy   macro avg  weighted avg
precision    0.771429   0.732143  0.761905    0.751786      0.757653
recall       0.900000   0.506173  0.761905    0.703086      0.761905
f1-score     0.830769   0.598540  0.761905    0.714655      0.749338
support    150.000000  81.000000  0.761905  231.000000    231.000000

4. Stacking

Stacking — image by the author

Stacking has the same working principle as the voting ensemble. However, instead of combining the submodels' predictions with a fixed rule, stacking passes them as inputs to a meta-model, which learns how to combine them to boost performance. In other words, stacking first generates predictions from each base model; the meta-model then uses these predictions as input features to create the final output.

The strength of stacking is that it can combine different powerful learners and make more precise and robust predictions than any standalone model.

The sklearn library provides StackingClassifier() in its ensemble module; you can find the link here. However, I will implement the stacking ensemble using the ML-Ensemble library.

To make a fair comparison between stacking and the previous ensembles, I recalculated the previous accuracies using 10-fold cross-validation.

from mlens.ensemble import SuperLearner
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              ExtraTreesClassifier, RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# create a list of base models
def get_models():
    models = list()
    models.append(LogisticRegression(solver='liblinear'))
    models.append(DecisionTreeClassifier())
    models.append(SVC(gamma='scale', probability=True))
    models.append(GaussianNB())
    models.append(KNeighborsClassifier())
    models.append(AdaBoostClassifier())
    models.append(BaggingClassifier(n_estimators=10))
    models.append(RandomForestClassifier(n_estimators=10))
    models.append(ExtraTreesClassifier(n_estimators=10))
    return models

def get_super_learner(X):
    ensemble = SuperLearner(scorer=accuracy_score, folds=10, random_state=41)
    model = get_models()
    ensemble.add(model)
    # add some layers to the ensemble structure
    ensemble.add([LogisticRegression(), RandomForestClassifier()])
    ensemble.add([LogisticRegression(), SVC()])
    # add meta model
    ensemble.add_meta(SVC())
    return ensemble

# create the super learner
ensemble = get_super_learner(X_train)
# fit the super learner
ensemble.fit(X_train, y_train)
# summarize base learners
print(ensemble.data)
# make predictions on the hold-out set
yhat = ensemble.predict(X_test)
print('Super Learner: %.3f' % (accuracy_score(y_test, yhat) * 100))

Stacking using the superlearner class — image by the author

ACCURACY SCORE ON TRAIN: 83.24022346368714
ACCURACY SCORE ON TEST: 76.62337662337663

Compare the performance

import plotly.graph_objects as go

# `test` is the notebook's DataFrame of accuracies, with columns 'Algo', 'Train' and 'Test'
fig = go.Figure()
fig.add_trace(go.Bar(
    x=test['Algo'], y=test['Train'], text=test['Train'],
    textposition='auto', name='Accuracy on Train set', marker_color='indianred'))
fig.add_trace(go.Bar(
    x=test['Algo'], y=test['Test'], text=test['Test'],
    textposition='auto', name='Accuracy on Test set', marker_color='lightsalmon'))
fig.update_traces(texttemplate='%{text:.2f}')
fig.update_layout(title_text='Comprehensive comparison between ensembles on Train and Test set')
fig.show()

Ensembles performance comparison bar plot — Train vs Test set on accuracy metric — image by the author

As shown, the stacked ensemble performed best on the test set, with the highest classification accuracy of 76.623%. Great!

5. Conclusions and Takeaways

We have explored several types of ensembles and learned how to implement them to extend a model's predictive power. Here are some essential points to consider:

  1. Stacking improved accuracy and robustness and generalized better.
  2. Voting is useful when we want to combine equally well-performing models so that they balance out each other's weaknesses.
  3. Boosting is a great ensemble; it combines multiple weak learners into a powerful one.
  4. Consider bagging when you want a model with less variance, obtained by combining different good models; this reduces overfitting.
  5. Choosing the right ensemble depends on the business problem and the outcome you want.

Finally, I hope this gave you a comprehensive guide to implementing ensembles and getting the most out of them. If you encounter any issues, please list them in the comment section; I would be happy to help. The best way to encourage me is by following me here on Medium, LinkedIn, or Github. Happy learning!

How to Get the Most of the ML Ensembles

Lessons from Kaggle: Compare ensemble algorithms in terms of model accuracy, robustness, and generalization. Implementation included!

Ensemble methods — image by the author

Introduction

We previously discussed some of the common ways to leverage the prediction power of Machine Learning (ML) models. These methods are mainly utilized to improve model generalizability by splitting the data into particular schemes.

Typical steps to build an ML model without ensembles — image by the author

However, there are more advanced methods for enhancing model performance, such as ensemble algorithms. In this post, we will discuss and compare the performance of multiple ensemble algorithms. So, let's get started!

Ensemble methods combine the predictions of multiple base estimators, instead of relying on a single estimator, to improve the generalization and robustness of the model.

Update 02/13/2021: added a link to the StackingClassifier() class in the sklearn.ensemble module.

Pre-requisites

  1. I will use the toy dataset from the UCI ML public repository, which is hosted on Kaggle; it has nine columns, including the target variable. The notebook is hosted on GitHub if you would like to follow along.
  2. I used the Kaggle API to fetch the dataset while working on the notebook. If you don't have a Kaggle account, just download the dataset to your local machine and skip this part of the notebook. You can follow this post on StackOverflow for step-by-step instructions.

     I included the script to fetch and download the data into Google Colab; just make sure you generate your own token before running it.

  3. I did some basic preprocessing on the dataset before building the models, such as imputing missing data, to avoid errors.

  4. I created two separate notebooks. The first compares the first three ensembles; the second implements the stacked ensemble, both from scratch and with the MLens library.

Methods of ensembles

Ensemble methods — image by the author

Ensembles are procedures that build several models and then blend them to produce improved predictions. Ensembles often make more precise predictions than a single model, and they have typically given winning teams the edge in ML competitions. For example, you can find the interview with the CrowdFlower winners, Team Quartet, who used an ensemble to win the competition.
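Why should blending models help? Here is a back-of-the-envelope sketch. It rests on a strong assumption that holds only approximately in practice: the base classifiers make independent errors. If each of n classifiers is right with probability p > 0.5, a majority vote is right with the binomial tail probability computed below. The function name is illustrative, not from any library.

```python
from math import comb


def majority_vote_accuracy(n_models: int, p: float) -> float:
    """Probability that a strict majority of n_models classifiers is correct,
    ASSUMING each is correct independently with probability p.
    n_models must be odd so that a vote can never tie."""
    if n_models % 2 == 0:
        raise ValueError("use an odd number of models to avoid ties")
    k_min = n_models // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(k_min, n_models + 1)
    )


for n in (1, 3, 5, 11):
    print(n, round(majority_vote_accuracy(n, 0.7), 4))
```

With p = 0.7, three independent voters are right 78.4% of the time and five are right about 83.7% of the time. Real models trained on the same data make correlated errors, so the gain is smaller. This is why bagging and random forests try to decorrelate their members.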

1. Bagging — Bootstrap Aggregating:

Bootstrap aggregating builds multiple models (using the same type of algorithm), each trained on a different sub-sample drawn with replacement from the training dataset.

Bagging combines several good models to reduce the model's variance.

Bagging intuition: parallel processing — image by the author
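The bootstrap-and-vote idea takes only a few lines to sketch from scratch. This minimal illustration runs on a synthetic dataset from make_classification, not the article's dataset, and its helper names are hypothetical. BaggingClassifier does the same work with more options. The sketch also checks a well-known fact: a bootstrap sample contains only about 63.2% of the distinct rows.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_bagged_trees(X, y, n_estimators=25, seed=0):
    """Hypothetical helper: fit n_estimators trees, each on a bootstrap sample."""
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)  # n row indices drawn WITH replacement
        tree = DecisionTreeClassifier(random_state=0)
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees


def predict_bagged(trees, X):
    """Hypothetical helper: majority vote over the trees' 0/1 predictions."""
    votes = np.stack([t.predict(X) for t in trees])  # shape (n_trees, n_samples)
    return (votes.mean(axis=0) > 0.5).astype(int)  # odd n_trees, so no ties


X, y = make_classification(n_samples=600, n_features=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

trees = fit_bagged_trees(X_tr, y_tr)
single = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print("single tree :", (single.predict(X_te) == y_te).mean())
print("bagged trees:", (predict_bagged(trees, X_te) == y_te).mean())

# Fraction of distinct rows in one bootstrap sample (about 0.632 on average).
idx = np.random.default_rng(1).integers(0, len(X_tr), size=len(X_tr))
print("unique fraction:", len(np.unique(idx)) / len(X_tr))
```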

Bagging has three types of ensembles as follows:

1.1 Bagging decision trees

Bagging performs best with algorithms that produce high-variance predictions, such as deep decision trees. In the following example, we combine BaggingClassifier() with a DecisionTreeClassifier() from the sklearn library.

Please note that your results may differ because of the stochastic nature of the learning algorithms!

Since each model is trained on its own bootstrap sample, bagging can draw the samples and fit the models in parallel.

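from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier()
# `estimator` was named `base_estimator` before scikit-learn 1.2
bagging_clf = BaggingClassifier(estimator=tree, n_estimators=1500, random_state=42)
bagging_clf.fit(X_train, y_train)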

evaluate(bagging_clf, X_train, X_test, y_train, y_test)
TRAINIG RESULTS: 
===============================
CONFUSION MATRIX:
[[350   0]
 [  0 187]]
ACCURACY SCORE:
1.0000
CLASSIFICATION REPORT:
               0      1  accuracy  macro avg  weighted avg
precision    1.0    1.0       1.0        1.0           1.0
recall       1.0    1.0       1.0        1.0           1.0
f1-score     1.0    1.0       1.0        1.0           1.0
support    350.0  187.0       1.0      537.0         537.0
TESTING RESULTS: 
===============================
CONFUSION MATRIX:
[[126  24]
 [ 38  43]]
ACCURACY SCORE:
0.7316
CLASSIFICATION REPORT:
                    0          1  accuracy   macro avg  weighted avg
precision    0.768293   0.641791  0.731602    0.705042      0.723935
recall       0.840000   0.530864  0.731602    0.685432      0.731602
f1-score     0.802548   0.581081  0.731602    0.691814      0.724891
support    150.000000  81.000000  0.731602  231.000000    231.000000
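The evaluate() helper used throughout comes from the author's notebook, and its code is not shown in this post. Below is a hypothetical reconstruction. It assumes the helper prints the confusion matrix, the accuracy, and a classification report (as a pandas DataFrame) for the training and test sets, which matches the shape of the output above. The real helper may differ in its details, and the usage lines run it on synthetic data.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split


def evaluate(model, X_train, X_test, y_train, y_test):
    """Hypothetical reconstruction of the notebook's evaluate() helper:
    prints train and test metrics for an already-fitted classifier."""
    for name, X, y in (("TRAINING", X_train, y_train), ("TESTING", X_test, y_test)):
        y_pred = model.predict(X)
        # output_dict=True gives per-class dicts plus a scalar 'accuracy',
        # which pandas broadcasts into a full column
        report = pd.DataFrame(classification_report(y, y_pred, output_dict=True))
        print(f"{name} RESULTS: \n===============================")
        print(f"CONFUSION MATRIX:\n{confusion_matrix(y, y_pred)}")
        print(f"ACCURACY SCORE:\n{accuracy_score(y, y_pred):.4f}")
        print(f"CLASSIFICATION REPORT:\n{report}")


# Usage on synthetic data
X, y = make_classification(n_samples=300, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)
evaluate(clf, X_train, X_test, y_train, y_test)
```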

1.2 Random Forest (RF)

Random Forest (RF) is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and averages their predictions to improve accuracy and control over-fitting.

By default, each sub-sample is the same size as the training set, and the samples are drawn with replacement when bootstrap=True (the default).

Now, let's try the Random Forest (RF) model. RF works like the bagged decision trees; however, it reduces the correlation between the individual trees. At each split, RF searches for the best split point within a random subset of the features (controlled by max_features) rather than among all features.

from sklearn.ensemble import RandomForestClassifier

rf_clf = RandomForestClassifier(random_state=42, n_estimators=1000)
rf_clf.fit(X_train, y_train)
evaluate(rf_clf, X_train, X_test, y_train, y_test)
TRAINIG RESULTS: 
===============================
CONFUSION MATRIX:
[[350   0]
 [  0 187]]
ACCURACY SCORE:
1.0000
CLASSIFICATION REPORT:
               0      1  accuracy  macro avg  weighted avg
precision    1.0    1.0       1.0        1.0           1.0
recall       1.0    1.0       1.0        1.0           1.0
f1-score     1.0    1.0       1.0        1.0           1.0
support    350.0  187.0       1.0      537.0         537.0
TESTING RESULTS: 
===============================
CONFUSION MATRIX:
[[127  23]
 [ 38  43]]
ACCURACY SCORE:
0.7359
CLASSIFICATION REPORT:
                    0          1  accuracy   macro avg  weighted avg
precision    0.769697   0.651515  0.735931    0.710606      0.728257
recall       0.846667   0.530864  0.735931    0.688765      0.735931
f1-score     0.806349   0.585034  0.735931    0.695692      0.728745
support    150.000000  81.000000  0.735931  231.000000    231.000000
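To isolate the effect of per-split feature sampling, the sketch below fits RandomForestClassifier twice on synthetic data, using a hypothetical helper. With max_features=None, every split may use all features, which makes the forest essentially a set of bagged trees. With max_features='sqrt', the forest uses the usual RF setting. The sketch reports test accuracy and how often pairs of trees agree on the test set; lower agreement means less correlated trees. The exact numbers depend on the data and the seeds.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=800, n_features=20, n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)


def mean_pairwise_agreement(forest, X):
    """Hypothetical helper: average fraction of samples on which two trees agree."""
    preds = np.array([tree.predict(X) for tree in forest.estimators_])
    n = len(preds)
    agree = [(preds[i] == preds[j]).mean() for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(agree))


results = {}
for max_features in (None, "sqrt"):
    rf = RandomForestClassifier(n_estimators=50, max_features=max_features, random_state=42)
    rf.fit(X_tr, y_tr)
    acc, agr = rf.score(X_te, y_te), mean_pairwise_agreement(rf, X_te)
    results[str(max_features)] = (acc, agr)
    print(f"max_features={str(max_features):>5}: test accuracy={acc:.3f}, tree agreement={agr:.3f}")
```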

1.3 Extra trees — ET

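Extra Trees (ET) is a modification of bagging. The ExtraTreesClassifier() class from the sklearn library is a meta estimator that fits several randomized decision trees (a.k.a. extra-trees) and then averages their predictions, which improves the model's accuracy and controls over-fitting. ET adds randomness at the split level: for each candidate feature, it draws the split threshold at random instead of searching for the best one. Note that ExtraTreesClassifier defaults to bootstrap=False, so each tree is trained on the whole training set unless you enable bootstrap sub-samples.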

from sklearn.ensemble import ExtraTreesClassifier

ex_tree_clf = ExtraTreesClassifier(n_estimators=1000, max_features=7, random_state=42)
ex_tree_clf.fit(X_train, y_train)
evaluate(ex_tree_clf, X_train, X_test, y_train, y_test)
TRAINIG RESULTS: 
===============================
CONFUSION MATRIX:
[[350   0]
 [  0 187]]
ACCURACY SCORE:
1.0000
CLASSIFICATION REPORT:
               0      1  accuracy  macro avg  weighted avg
precision    1.0    1.0       1.0        1.0           1.0
recall       1.0    1.0       1.0        1.0           1.0
f1-score     1.0    1.0       1.0        1.0           1.0
support    350.0  187.0       1.0      537.0         537.0
TESTING RESULTS: 
===============================
CONFUSION MATRIX:
[[124  26]
 [ 32  49]]
ACCURACY SCORE:
0.7489
CLASSIFICATION REPORT:
                    0          1  accuracy   macro avg  weighted avg
precision    0.794872   0.653333  0.748918    0.724103      0.745241
recall       0.826667   0.604938  0.748918    0.715802      0.748918
f1-score     0.810458   0.628205  0.748918    0.719331      0.746551
support    150.000000  81.000000  0.748918  231.000000    231.000000
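The following minimal sketch shows how an extremely randomized tree might choose a single split. It assumes Gini impurity and one uniformly drawn threshold per candidate feature. The function names are hypothetical, and this is not scikit-learn's internal code. ET draws each cut-point at random and only picks among a few random candidates, instead of scanning every possible threshold.

```python
import numpy as np


def gini(y):
    """Gini impurity of an array of class labels."""
    if len(y) == 0:
        return 0.0
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p**2)


def extra_trees_split(X, y, n_candidate_features, rng):
    """Hypothetical sketch of an Extra-Trees split: draw ONE random threshold
    per candidate feature, keep the candidate with the lowest weighted child
    impurity. Returns (feature index, threshold) or None."""
    features = rng.choice(X.shape[1], size=n_candidate_features, replace=False)
    best = None
    for f in features:
        lo, hi = X[:, f].min(), X[:, f].max()
        if lo == hi:
            continue  # constant feature in this node: cannot split on it
        threshold = rng.uniform(lo, hi)  # random cut-point, no threshold search
        left = X[:, f] <= threshold
        score = (left.sum() * gini(y[left]) + (~left).sum() * gini(y[~left])) / len(y)
        if best is None or score < best[0]:
            best = (score, f, threshold)
    return None if best is None else (int(best[1]), float(best[2]))


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 2] > 0).astype(int)  # only feature 2 is informative
print(extra_trees_split(X, y, n_candidate_features=5, rng=rng))
```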

2. Boosting

Boosting is another technique that builds multiple models (also of the same type); however, the models are built in sequence, and each model learns to fix the prediction errors of the models before it. Boosting is primarily used to reduce the bias of supervised machine learning models while keeping their variance in check; it converts weak learners into a strong one.

Boosting builds the base estimators sequentially from weak learners, which reduces the bias of the combined estimator.

Sequential boosting ensemble — image by the author
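For squared error, "each model fixes the previous errors" can be taken literally: each new weak learner is fit to the residuals of the ensemble so far. The sketch below is a minimal gradient-boosting regressor built from depth-1 trees (stumps) on synthetic data, for illustration only. GradientBoostingClassifier follows the same scheme but fits each tree to the negative gradient of the log-loss, and AdaBoost reweights the samples instead.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())  # stage 0: a constant model
stumps, train_mse = [], []
for _ in range(100):
    residuals = y - prediction  # the errors the ensemble still makes
    stump = DecisionTreeRegressor(max_depth=1).fit(X, residuals)  # weak learner
    prediction += learning_rate * stump.predict(X)  # move predictions toward y
    stumps.append(stump)
    train_mse.append(float(np.mean((y - prediction) ** 2)))

print(f"MSE after 1 stage: {train_mse[0]:.4f}, after 100 stages: {train_mse[-1]:.4f}")
```

Because each stump is a least-squares fit to the current residuals and the step size is small, the training error in this sketch never goes up from one stage to the next.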

2.1 AdaBoost (AD)

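AdaBoost (AD) assigns a weight to each instance in the dataset. After each model is trained, the weights of the misclassified instances are increased, so the subsequent model concentrates on the instances its predecessors got wrong. The final prediction is a weighted vote in which more accurate models get a larger say.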

from sklearn.ensemble import AdaBoostClassifier

ada_boost_clf = AdaBoostClassifier(n_estimators=30)
ada_boost_clf.fit(X_train, y_train)
evaluate(ada_boost_clf, X_train, X_test, y_train, y_test)
TRAINIG RESULTS: 
===============================
CONFUSION MATRIX:
[[314  36]
 [ 49 138]]
ACCURACY SCORE:
0.8417
CLASSIFICATION REPORT:
                    0           1  accuracy   macro avg  weighted avg
precision    0.865014    0.793103  0.841713    0.829059  0.839972
recall       0.897143    0.737968  0.841713    0.817555  0.841713
f1-score     0.880785    0.764543  0.841713    0.822664  0.840306
support    350.000000  187.000000  0.841713  537.000000  537.000000
TESTING RESULTS: 
===============================
CONFUSION MATRIX:
[[129  21]
 [ 36  45]]
ACCURACY SCORE:
0.7532
CLASSIFICATION REPORT:
                    0          1  accuracy   macro avg  weighted avg
precision    0.781818   0.681818  0.753247    0.731818      0.746753
recall       0.860000   0.555556  0.753247    0.707778      0.753247
f1-score     0.819048   0.612245  0.753247    0.715646      0.746532
support    150.000000  81.000000  0.753247  231.000000    231.000000
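The reweighting described in 2.1 can be sketched from scratch. This is a minimal discrete (two-class) AdaBoost with decision stumps on synthetic data, using -1/+1 labels, and the helper names are hypothetical. sklearn's AdaBoostClassifier implements the multi-class SAMME family of algorithms, so its details differ.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def adaboost_fit(X, y, n_rounds=30):
    """Hypothetical sketch of discrete AdaBoost for labels in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)  # start with uniform instance weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w[pred != y]) / np.sum(w)  # weighted error rate
        err = np.clip(err, 1e-10, 1 - 1e-10)  # guard against log(0) / division by 0
        alpha = 0.5 * np.log((1 - err) / err)  # this stump's say in the final vote
        w = w * np.exp(-alpha * y * pred)  # misclassified instances get heavier
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas


def adaboost_predict(stumps, alphas, X):
    """Sign of the alpha-weighted sum of the stumps' predictions."""
    score = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.where(score >= 0, 1, -1)


X, y01 = make_classification(n_samples=400, n_features=6, random_state=1)
y = np.where(y01 == 1, 1, -1)  # AdaBoost's update rule is cleanest with -1/+1 labels
stumps, alphas = adaboost_fit(X, y)
print("training accuracy:", (adaboost_predict(stumps, alphas, X) == y).mean())
```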

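2.2 Stochastic Gradient Boosting (SGB)

Stochastic Gradient Boosting (SGB) is one of the more advanced ensemble algorithms. At each iteration, SGB randomly draws a sub-sample of the training set (without replacement). It then fits the next base model (learner) on that sub-sample to the errors of the current ensemble, more precisely to the negative gradient of the loss. Learners keep being added until the chosen number of iterations is reached or the error stops improving.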

from sklearn.ensemble import GradientBoostingClassifier

grad_boost_clf = GradientBoostingClassifier(n_estimators=100, random_state=42)  # subsample defaults to 1.0, i.e. no row sub-sampling
grad_boost_clf.fit(X_train, y_train)
evaluate(grad_boost_clf, X_train, X_test, y_train, y_test)
TRAINIG RESULTS: 
===============================
CONFUSION MATRIX:
[[339  11]
 [ 26 161]]
ACCURACY SCORE:
0.9311
CLASSIFICATION REPORT:
                    0           1  accuracy   macro avg  weighted avg
precision    0.928767    0.936047  0.931099    0.932407  0.931302
recall       0.968571    0.860963  0.931099    0.914767  0.931099
f1-score     0.948252    0.896936  0.931099    0.922594  0.930382
support    350.000000  187.000000  0.931099  537.000000  537.000000
TESTING RESULTS: 
===============================
CONFUSION MATRIX:
[[126  24]
 [ 37  44]]
ACCURACY SCORE:
0.7359
CLASSIFICATION REPORT:
                    0          1  accuracy   macro avg  weighted avg
precision    0.773006   0.647059  0.735931    0.710032      0.728843
recall       0.840000   0.543210  0.735931    0.691605      0.735931
f1-score     0.805112   0.590604  0.735931    0.697858      0.729895
support    150.000000  81.000000  0.735931  231.000000    231.000000
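To make the sub-sampling step concrete, here is a minimal from-scratch sketch of stochastic gradient boosting for binary classification with log-loss. It is a simplified illustration, not how scikit-learn implements it: each tree is fitted to the residuals y - p on a random sub-sample drawn without replacement, and the per-leaf Newton update that scikit-learn applies is omitted. The function names sgb_fit and sgb_predict_proba, the synthetic make_classification data, and the hyperparameter values are hypothetical choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeRegressor


def sgb_fit(X, y, n_estimators=100, learning_rate=0.1, subsample=0.5, max_depth=3, seed=0):
    """Simplified stochastic gradient boosting for binary log-loss (y in {0, 1})."""
    rng = np.random.default_rng(seed)
    p0 = np.clip(y.mean(), 1e-6, 1 - 1e-6)
    f0 = np.log(p0 / (1 - p0))                 # start from the prior log-odds
    F = np.full(len(y), f0)                    # current raw scores for every row
    n_sub = max(1, int(subsample * len(y)))
    trees = []
    for _ in range(n_estimators):
        p = 1.0 / (1.0 + np.exp(-F))
        residuals = y - p                      # negative gradient of log-loss w.r.t. F
        # The "stochastic" part: each tree sees a random sub-sample drawn WITHOUT replacement.
        idx = rng.choice(len(y), size=n_sub, replace=False)
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=seed)
        tree.fit(X[idx], residuals[idx])
        F += learning_rate * tree.predict(X)   # update the scores of all rows
        trees.append(tree)
    return f0, trees, learning_rate


def sgb_predict_proba(model, X):
    """Probability of class 1 from the initial log-odds plus all tree corrections."""
    f0, trees, learning_rate = model
    F = np.full(X.shape[0], f0)
    for tree in trees:
        F += learning_rate * tree.predict(X)
    return 1.0 / (1.0 + np.exp(-F))


X, y = make_classification(n_samples=300, n_features=8, random_state=0)
model = sgb_fit(X, y)
proba = sgb_predict_proba(model, X)
print("train accuracy:", ((proba >= 0.5) == y).mean())
```

Setting subsample=1.0 in this sketch turns it back into ordinary gradient boosting, which mirrors the role of the subsample parameter in GradientBoostingClassifier.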

3. Voting

Voting combines a set of equally well-performing models so that they balance out each other's weaknesses. There are three common approaches to the voting procedure: hard, soft, and weighted.

  1. Hard voting: the predicted class is the label that the majority of submodels predict.
  2. Soft voting: the argmax of the sum of predicted probabilities.
  3. Weighted voting: the argmax of the weighted sum of predicted probabilities.
Voting ensemble — image by the author

Voting is simple and easy to implement. First, you create two or more standalone models from the dataset (three in the code below; the number depends on the use case). A voting classifier then wraps the models and combines the submodels' predictions on new data: by majority vote with hard voting (scikit-learn's default) or by averaging predicted probabilities with soft voting.
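The three voting rules can be compared on a toy example. The following sketch is a minimal NumPy illustration, not scikit-learn's VotingClassifier internals. It assumes a hypothetical array probas of shape (n_models, n_samples, n_classes) holding each model's predicted probabilities; the probabilities and the weights [1, 3, 3] are made up so that the rules disagree.

```python
import numpy as np

# probas[m, i, c] = probability that model m assigns to class c for sample i


def hard_vote(probas):
    labels = probas.argmax(axis=2)                      # (n_models, n_samples) predicted labels
    n_classes = probas.shape[2]
    # Count the votes per class for each sample, then take the majority label.
    counts = np.apply_along_axis(np.bincount, 0, labels, minlength=n_classes)
    return counts.argmax(axis=0)


def soft_vote(probas):
    return probas.sum(axis=0).argmax(axis=1)            # argmax of summed probabilities


def weighted_vote(probas, weights):
    return np.average(probas, axis=0, weights=weights).argmax(axis=1)


# One sample, two classes, three hypothetical models.
probas = np.array([
    [[0.90, 0.10]],   # model A: confident in class 0
    [[0.40, 0.60]],   # model B: leans to class 1
    [[0.45, 0.55]],   # model C: leans to class 1
])
print(hard_vote(probas))                  # → [1]  (two of three labels are 1)
print(soft_vote(probas))                  # → [0]  (sums are [1.75, 1.25])
print(weighted_vote(probas, [1, 3, 3]))   # → [1]  (B and C count three times)
```

With these numbers, hard voting follows the two models that lean towards class 1, soft voting follows model A's strong confidence in class 0, and the weights tip the balance back to class 1.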

from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Collect (name, model) pairs for the voting ensemble
estimators = []
log_reg = LogisticRegression(solver='liblinear')
estimators.append(('Logistic', log_reg))

tree = DecisionTreeClassifier()
estimators.append(('Tree', tree))

svm_clf = SVC(gamma='scale')
estimators.append(('SVM', svm_clf))

# voting='hard' is the default: each sample gets the majority-vote label
voting = VotingClassifier(estimators=estimators)
voting.fit(X_train, y_train)

evaluate(voting, X_train, X_test, y_train, y_test)
TRAINIG RESULTS: 
===============================
CONFUSION MATRIX:
[[328  22]
 [ 75 112]]
ACCURACY SCORE:
0.8194
CLASSIFICATION REPORT:
                    0           1  accuracy   macro avg  weighted avg
precision    0.813896    0.835821  0.819367    0.824858  0.821531
recall       0.937143    0.598930  0.819367    0.768037  0.819367
f1-score     0.871182    0.697819  0.819367    0.784501  0.810812
support    350.000000  187.000000  0.819367  537.000000  537.000000
TESTING RESULTS: 
===============================
CONFUSION MATRIX:
[[135  15]
 [ 40  41]]
ACCURACY SCORE:
0.7619
CLASSIFICATION REPORT:
                    0          1  accuracy   macro avg  weighted avg
precision    0.771429   0.732143  0.761905    0.751786      0.757653
recall       0.900000   0.506173  0.761905    0.703086      0.761905
f1-score     0.830769   0.598540  0.761905    0.714655      0.749338
support    150.000000  81.000000  0.761905  231.000000    231.000000
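The voting ensemble above relies on VotingClassifier's default voting='hard'. A sketch of the soft and weighted variants follows. It assumes a synthetic make_classification dataset instead of the article's data, and the weights [1, 1, 2] are an arbitrary illustration. Note that SVC needs probability=True for soft voting, because soft voting averages the predict_proba outputs.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the article's dataset (an assumption for this sketch)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

estimators = [
    ("Logistic", LogisticRegression(solver="liblinear")),
    ("Tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
    # Soft voting needs predict_proba, so SVC must be built with probability=True.
    ("SVM", SVC(gamma="scale", probability=True, random_state=0)),
]

# Soft voting: argmax of the averaged class probabilities
soft = VotingClassifier(estimators=estimators, voting="soft").fit(X_train, y_train)
# Weighted soft voting: the SVM's probabilities count twice as much (illustrative weights)
weighted = VotingClassifier(estimators=estimators, voting="soft",
                            weights=[1, 1, 2]).fit(X_train, y_train)

print("soft voting accuracy:", soft.score(X_test, y_test))
print("weighted voting accuracy:", weighted.score(X_test, y_test))
```

VotingClassifier clones the estimators it receives, so the same estimators list can safely be reused for both ensembles.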

4. Stacking

Stacking — image by the author

Stacking follows a similar working principle to the voting ensemble: it combines the predictions of several submodels. However, instead of applying a fixed voting rule, stacking feeds the submodels' predictions as inputs to a meta-model that learns how to combine them to boost performance. In other words, each base model first generates predictions; the meta-model then uses these predictions as input features, learning how much weight to give each model, to create the final outputs.

The main advantage of stacking is that it can combine different powerful learners and often make more precise and robust predictions than any standalone model.
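The core mechanism can be sketched by hand with scikit-learn: base models produce out-of-fold predictions, which become the input features of a meta-model. This is a minimal two-level illustration under stated assumptions (synthetic data, two base models, logistic regression as the meta-model), not the ML-Ensemble implementation used in this article.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the article's dataset (an assumption for this sketch)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

base_models = [DecisionTreeClassifier(max_depth=5, random_state=0), GaussianNB()]

# Level 0: out-of-fold probabilities, so the meta-model never sees predictions
# that a base model made on rows it was trained on.
train_meta = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])
# Refit each base model on the full training set to build test-time meta-features.
test_meta = np.column_stack([
    m.fit(X_train, y_train).predict_proba(X_test)[:, 1] for m in base_models
])

# Level 1: the meta-model learns how to weigh the base models' predictions.
meta_model = LogisticRegression()
meta_model.fit(train_meta, y_train)
stacked_pred = meta_model.predict(test_meta)

print("stacked accuracy:", (stacked_pred == y_test).mean())
print("meta-model weights:", meta_model.coef_)
```

The learned coefficients in meta_model.coef_ show how much the meta-model trusts each base model, which is the "learned combination" that distinguishes stacking from fixed voting.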

The sklearn library provides StackingClassifier() in its ensemble module. However, I will implement the stacking ensemble using the ML-Ensemble (mlens) library.
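For comparison, here is a minimal sketch with scikit-learn's own StackingClassifier. The base models, the logistic-regression final_estimator, and the synthetic data are illustrative assumptions. StackingClassifier trains its final estimator on cross-validated predictions of the base estimators (cv=10 here).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the article's dataset (an assumption for this sketch)
X, y = make_classification(n_samples=500, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=1)),
        ("knn", KNeighborsClassifier()),
        ("svm", SVC()),
    ],
    final_estimator=LogisticRegression(),  # the meta-model
    cv=10,                                 # folds used to build the meta-features
)
stack.fit(X_train, y_train)
print("test accuracy:", stack.score(X_test, y_test))
```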

To make a fair comparison between stacking and the previous ensembles, I recalculated the previous accuracies using 10-fold cross-validation.
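A 10-fold recalculation like this can be sketched with cross_val_score. The snippet below is an assumption about how it might be done, not the author's exact code: it uses a synthetic make_classification dataset and a shuffled KFold with an arbitrary random_state, and it re-scores only the two boosting models for brevity.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the article's dataset (an assumption for this sketch)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
kfold = KFold(n_splits=10, shuffle=True, random_state=42)

models = [
    ("AdaBoost", AdaBoostClassifier(n_estimators=30)),
    ("SGB", GradientBoostingClassifier(n_estimators=100, random_state=42)),
]

results = {}
for name, model in models:
    # One accuracy score per fold; the mean is comparable across ensembles
    scores = cross_val_score(model, X, y, cv=kfold, scoring="accuracy")
    results[name] = scores
    print(f"{name}: {scores.mean() * 100:.3f}% (+/- {scores.std() * 100:.3f})")
```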

from mlens.ensemble import SuperLearner
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              ExtraTreesClassifier, RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# create a list of base models (first layer)
def get_models():
    models = list()
    models.append(LogisticRegression(solver='liblinear'))
    models.append(DecisionTreeClassifier())
    models.append(SVC(gamma='scale', probability=True))
    models.append(GaussianNB())
    models.append(KNeighborsClassifier())
    models.append(AdaBoostClassifier())
    models.append(BaggingClassifier(n_estimators=10))
    models.append(RandomForestClassifier(n_estimators=10))
    models.append(ExtraTreesClassifier(n_estimators=10))
    return models

def get_super_learner(X):
    # 10-fold super learner scored with accuracy (X is not used inside)
    ensemble = SuperLearner(scorer=accuracy_score,
                            folds=10,
                            random_state=41)
    ensemble.add(get_models())
    # add some intermediate layers to the ensemble structure
    ensemble.add([LogisticRegression(), RandomForestClassifier()])
    ensemble.add([LogisticRegression(), SVC()])
    # add the meta-model that produces the final prediction
    ensemble.add_meta(SVC())
    return ensemble

# create the super learner
ensemble = get_super_learner(X_train)
# fit the super learner
ensemble.fit(X_train, y_train)
# summarize the base learners' cross-validated scores
print(ensemble.data)
# make predictions on the hold-out set
yhat = ensemble.predict(X_test)
print('Super Learner: %.3f' % (accuracy_score(y_test, yhat) * 100))
Stacking using the superlearner class — image by the author
ACCURACY SCORE ON TRAIN: 83.24022346368714   
ACCURACY SCORE ON TEST: 76.62337662337663

Compare the performance

import plotly.graph_objects as go

# `test` is a DataFrame of accuracies with columns 'Algo', 'Train', and 'Test'
fig = go.Figure()
fig.add_trace(go.Bar(
              x = test['Algo'],
              y = test['Train'],
              text = test['Train'],
              textposition='auto',
              name = 'Accuracy on Train set',
              marker_color = 'indianred'))
fig.add_trace(go.Bar(
              x = test['Algo'],
              y = test['Test'],
              text = test['Test'],
              textposition='auto',
              name = 'Accuracy on Test set',
              marker_color = 'lightsalmon'))
fig.update_traces(texttemplate='%{text:.2f}')
fig.update_layout(title_text='Comprehensive comparison between ensembles on Train and Test set')
fig.show()
Ensembles performance comparison bar plot — Train vs Test set on accuracy metric — image by the author

As shown, the stacked ensemble performed best on the test set, with the highest classification accuracy of 76.623%. Great!

5. Conclusions and Takeaways

We have explored several types of ensembles and learned how to implement them to extend a model's predictive power. We've also drawn some essential points to consider:

  1. Stacking improved accuracy and robustness and generalized better.
  2. We may use voting when we want a set of equally well-performing models to balance out each other's weaknesses.
  3. Boosting is a great ensemble: it combines multiple weak learners sequentially into a powerful one.
  4. You may consider bagging when you want a model with lower variance, and therefore less overfitting, by combining several good models.
  5. Choosing the right ensemble depends on the business problem and what outcome you desire.

Finally, I hope this has given you a comprehensive guide to implementing ensembles and getting the most out of them. If you encounter any issues, please list them in the comment section; I would be happy to help. The best way to encourage me is by following me here on Medium, LinkedIn, or GitHub. Happy learning!
