Praveen Pareek


Interview Guide to Boosting Algorithms: Part-2

Photo by CoWomen on Unsplash

Table of contents to guide you through this article:

  • XGBoost: A Scalable Tree Boosting System
  • More about XGBoost
  • Miscellaneous

XGBoost: A Scalable Tree Boosting System

Machine learning and data-driven approaches are becoming very important in many areas.

  • Smart spam classifiers protect our email by learning from massive amounts of spam data and user feedback
  • Advertising systems learn to match the right ads with the right context
  • Fraud detection systems protect banks from malicious attackers
  • Anomaly event detection systems help experimental physicists to find events that lead to new physics.
Photo by Noble Mitchell on Unsplash

There are two important factors that drive these successful applications:

  • Effective (statistical) models that capture complex data dependencies
  • Scalable learning systems that learn the model of interest from large datasets

Among the machine learning methods used in practice, gradient tree boosting is one technique that shines in many applications.

Note: Gradient tree boosting is also known as gradient boosting machine (GBM) or gradient boosted regression tree (GBRT).
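Here is a minimal sketch of gradient tree boosting. It assumes squared-error loss, depth-1 regression trees (stumps) and a fixed learning rate. With this loss the negative gradient is simply the residual y − F(x), so each round fits a stump to the current residuals and adds a shrunken copy of that stump to the model. This is a toy illustration, not XGBoost's implementation, which also uses second-order gradient statistics and regularized trees. The function names and the data are hypothetical.

```python
from typing import List, Tuple

# A regression stump: predict left_value if x < threshold, else right_value.
Stump = Tuple[float, float, float]


def fit_stump(xs: List[float], targets: List[float]) -> Stump:
    """Return the stump that minimizes squared error on (xs, targets)."""
    best, best_sse = None, float("inf")
    for threshold in sorted(set(xs))[1:]:  # both sides of every split are non-empty
        left = [t for x, t in zip(xs, targets) if x < threshold]
        right = [t for x, t in zip(xs, targets) if x >= threshold]
        left_value, right_value = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((t - left_value) ** 2 for t in left)
               + sum((t - right_value) ** 2 for t in right))
        if sse < best_sse:
            best, best_sse = (threshold, left_value, right_value), sse
    return best


def stump_value(stump: Stump, x: float) -> float:
    threshold, left_value, right_value = stump
    return left_value if x < threshold else right_value


def gradient_boost(xs, ys, rounds, learning_rate=0.5):
    """Fit F(x) = mean(y) + learning_rate * (sum of stumps), one stump per round."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(rounds):
        # The negative gradient of 0.5 * (y - F(x))^2 with respect to F(x) is the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_value(stump, x) for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def predict(model, x: float) -> float:
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_value(s, x) for s in stumps)


def mse(model, xs, ys) -> float:
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up 1-D data with three plateaus.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.8, 5.2, 5.0]
for rounds in (0, 1, 5, 20):
    print(rounds, round(mse(gradient_boost(xs, ys, rounds), xs, ys), 4))
```

Training error should fall as rounds are added. The learning rate of 0.5 is an arbitrary choice that trades more rounds for smaller, more cautious steps.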

More about XGBoost:

The most important factor behind the success of XGBoost is its scalability in all scenarios.

The system runs more than ten times faster than existing popular solutions on a single machine and scales to billions of examples in distributed or memory-limited settings.

The scalability of XGBoost is due to several important systems and algorithmic optimizations.

Photo by Andy Kelly on Unsplash

These innovations include:

  • A novel tree learning algorithm for handling sparse data.
  • A theoretically justified weighted quantile sketch procedure that enables handling instance weights in approximate tree learning.
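The sparsity-aware idea can be sketched for a single feature. Missing entries are never enumerated. Instead, the split search runs twice: once with all missing rows sent right and once with them sent left. The direction with the higher gain becomes the learned default for that split. This is a minimal illustration that assumes per-row gradient statistics g and h and the regularized split-gain formula from the XGBoost paper, with λ (`lam`) and γ (`gamma`) as regularization terms. The function names are hypothetical and this is not the library's code.

```python
import math
from typing import List, Optional, Tuple


def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Loss reduction of a split, from summed gradients (g) and hessians (h)."""
    def score(g, h):
        return g * g / (h + lam)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right)
                  - score(g_left + g_right, h_left + h_right)) - gamma


def best_split_with_default(values: List[Optional[float]], grads: List[float],
                            hess: List[float], lam: float = 1.0
                            ) -> Tuple[float, Optional[float], Optional[bool]]:
    """Return (gain, threshold, missing_goes_left) for one feature; None means missing."""
    g_total, h_total = sum(grads), sum(hess)
    # Only the non-missing entries are sorted and enumerated.
    present = sorted((v, g, h) for v, g, h in zip(values, grads, hess) if v is not None)
    best = (-math.inf, None, None)

    # Pass 1: missing rows go right, so accumulate the left side from the smallest value.
    g_left = h_left = 0.0
    for i in range(len(present) - 1):
        v, g, h = present[i]
        g_left += g
        h_left += h
        if v == present[i + 1][0]:
            continue  # cannot split between equal feature values
        gain = split_gain(g_left, h_left, g_total - g_left, h_total - h_left, lam)
        if gain > best[0]:
            best = (gain, (v + present[i + 1][0]) / 2, False)

    # Pass 2: missing rows go left, so accumulate the right side from the largest value.
    g_right = h_right = 0.0
    for i in range(len(present) - 1, 0, -1):
        v, g, h = present[i]
        g_right += g
        h_right += h
        if v == present[i - 1][0]:
            continue
        gain = split_gain(g_total - g_right, h_total - h_right, g_right, h_right, lam)
        if gain > best[0]:
            best = (gain, (present[i - 1][0] + v) / 2, True)
    return best


# Squared loss at prediction 0 gives g = -y and h = 1 (toy labels, made up).
values = [1.0, 2.0, None, 8.0, 9.0, None]
labels = [-1, -1, 1, 1, 1, 1]  # missing rows behave like the high-value rows
grads = [0.0 - y for y in labels]
hess = [1.0] * len(labels)
print(best_split_with_default(values, grads, hess))
# -> gain of about 1.981, threshold 5.0, missing_goes_left=False
```

If the missing rows instead behaved like the low-value rows, the same threshold would be chosen with `missing_goes_left=True`.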

Parallel and distributed computing make learning faster, which enables quicker model exploration. More importantly, XGBoost exploits out-of-core computation, which lets data scientists process hundreds of millions of examples on a desktop.

Sparse BLAS CSC Matrix Storage Format:

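XGBoost stores its training data in in-memory blocks using the compressed column (CSC) format, with each column sorted by feature value. The sorting therefore only has to be done once and can be reused when searching for splits. Reference: Intel® MKL Developer Reference, “Sparse BLAS CSC Matrix Storage Format” (https://software.intel.com/content/www/us/en/develop/documentation/mkl-developer-reference-c/top/appendix-a-linear-solvers-basics/sparse-matrix-storage-formats/sparse-blas-csc-matrix-storage-format.html).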
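The CSC layout is simple enough to sketch by hand. Stored (non-zero) entries go column by column into a values array, together with their row indices, and a column-pointer array marks where each column starts and ends. The following is a minimal pure-Python illustration that assumes zero-based indices and treats zeros as absent. It is not XGBoost's or MKL's internal code, and `dense_to_csc`, `column_entries` and `column_sorted_by_value` are hypothetical helpers. The last helper imitates the per-column ordering by feature value that a split search scans.

```python
from typing import List, Tuple


def dense_to_csc(dense: List[List[float]]) -> Tuple[List[float], List[int], List[int]]:
    """Return (values, row_indices, col_ptr); zeros are treated as absent and skipped."""
    n_rows = len(dense)
    n_cols = len(dense[0]) if n_rows else 0
    values: List[float] = []
    row_indices: List[int] = []
    col_ptr = [0]  # column j occupies positions col_ptr[j] .. col_ptr[j + 1] - 1
    for j in range(n_cols):
        for i in range(n_rows):
            if dense[i][j] != 0:
                values.append(dense[i][j])
                row_indices.append(i)
        col_ptr.append(len(values))
    return values, row_indices, col_ptr


def column_entries(csc, j: int) -> List[Tuple[int, float]]:
    """(row, value) pairs stored for column j, in row order."""
    values, row_indices, col_ptr = csc
    start, end = col_ptr[j], col_ptr[j + 1]
    return list(zip(row_indices[start:end], values[start:end]))


def column_sorted_by_value(csc, j: int) -> List[Tuple[int, float]]:
    """The same entries re-ordered by feature value."""
    return sorted(column_entries(csc, j), key=lambda pair: pair[1])


X = [[1.0, 0.0, 2.0],
     [0.0, 0.0, 3.0],
     [4.0, 5.0, 0.0]]
csc = dense_to_csc(X)
print(csc)                     # ([1.0, 4.0, 5.0, 2.0, 3.0], [0, 2, 2, 0, 1], [0, 2, 3, 5])
print(column_entries(csc, 2))  # [(0, 2.0), (1, 3.0)]
```

Reading a whole column is a contiguous slice of two arrays. This access pattern is why a column-oriented layout suits split finding, which scans one feature at a time.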

Miscellaneous:

Where do boosting algorithms fit in the world of AI/ML?

Neural networks, logistic regression and SVMs all answer the question of how we learn to solve a particular problem (take, as a specific example, the Iris dataset classification problem).

Photo by Brian Metzler on Unsplash

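But there is a question that should be asked before this one: is this problem solvable at all?

To answer it, we use the concept of PAC learning.

PAC learning gives a quantitative definition of whether a problem is solvable (learnable).

PAC stands for Probably Approximately Correct. A learner is PAC if, for a chosen error bound ε and failure probability δ, it outputs, with probability at least 1 − δ, a hypothesis whose error is at most ε.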

Iris dataset: use logistic regression → reasonably low error.

This means that, for this particular problem, logistic regression is a strong learner {if it meets our chosen threshold, e.g. 99% accuracy: error ≤ ε = 0.01 with probability ≥ 1 − δ}.

For more complex problems, a strong learner would itself need to be more complex {we would also need many more learnable parameters, many more training samples, and possibly very demanding hardware}.
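One standard way to quantify the “more samples” point is the PAC sample-complexity bound for a finite hypothesis class H in the realizable case. Any learner that outputs a hypothesis consistent with m ≥ (ln|H| + ln(1/δ)) / ε training examples has error at most ε with probability at least 1 − δ. The sketch below simply evaluates this textbook bound. The hypothesis-class sizes are made-up numbers, and models with infinitely many hypotheses (such as logistic regression) need other tools, for example the VC dimension.

```python
import math


def pac_sample_bound(epsilon: float, delta: float, hypothesis_count: int) -> int:
    """Training examples sufficient for a consistent learner over a finite class H
    (realizable case): m >= (ln|H| + ln(1/delta)) / epsilon."""
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie in (0, 1)")
    if hypothesis_count < 1:
        raise ValueError("hypothesis_count must be at least 1")
    return math.ceil((math.log(hypothesis_count) + math.log(1 / delta)) / epsilon)


# epsilon = 0.01 (99% accuracy), delta = 0.05 (95% confidence); |H| sizes are hypothetical.
for size in (10**3, 10**6):
    print(size, pac_sample_bound(0.01, 0.05, size))
# 1000 991
# 1000000 1682
```

The bound grows linearly in 1/ε but only logarithmically in |H|, so demanding higher accuracy is what drives the sample count up fastest.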

Photo by Kelly Sikkema on Unsplash

If we don’t have all of the above, then:

  • Weak learners come to the rescue.
  • Weak learners are algorithms that perform only slightly better than random guessing.
  • If a problem can be solved by a strong learner, then weak learners should be able to solve it too.
  • They do this through a technique called boosting.
  • Boosting constructs multiple models, lets each of them make a prediction, and combines those predictions by (weighted) majority vote.
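As a concrete instance of the list above, here is a minimal sketch of AdaBoost, one classic boosting algorithm. XGBoost itself uses gradient boosting instead. The sketch assumes labels in {−1, +1} and 1-D decision stumps as the weak learners. Each round fits the stump with the lowest weighted error and gives it a vote weight α = ½ ln((1 − err)/err). It then increases the weight of the examples that stump got wrong. The final prediction is the sign of the α-weighted vote. The toy data and function names are hypothetical.

```python
import math
from typing import List, Tuple

# A decision stump: predict `polarity` if x >= threshold, else -polarity.
Stump = Tuple[float, int]


def stump_predict(stump: Stump, x: float) -> int:
    threshold, polarity = stump
    return polarity if x >= threshold else -polarity


def best_stump(xs, ys, weights) -> Tuple[Stump, float]:
    """Weak learner: the stump with the lowest weighted training error."""
    best, best_err = None, float("inf")
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict((threshold, polarity), x) != y)
            if err < best_err:
                best, best_err = (threshold, polarity), err
    return best, best_err


def adaboost(xs: List[float], ys: List[int], rounds: int) -> List[Tuple[float, Stump]]:
    weights = [1.0 / len(xs)] * len(xs)
    ensemble = []
    for _ in range(rounds):
        stump, err = best_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this weak learner
        ensemble.append((alpha, stump))
        # Misclassified points (y * h(x) = -1) gain weight; correct ones lose weight.
        weights = [w * math.exp(-alpha * y * stump_predict(stump, x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x: float) -> int:
    """Weighted majority vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy data: no single threshold separates the classes (+, +, -, -, +, +).
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
for rounds in (1, 3):
    model = adaboost(xs, ys, rounds)
    correct = sum(predict(model, x) == y for x, y in zip(xs, ys))
    print(rounds, correct)
# 1 4
# 3 6
```

A single stump gets at most 4 of the 6 points right. Three weighted stumps together classify all of them, which is the weak-to-strong effect the list describes.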

What’s next for you?

If you enjoyed this article, it would really help if you hit recommend below! Follow me on Twitter, LinkedIn, and Medium.

Read my previous post: Interview Guide to Boosting Algorithms: Part-1 (https://medium.com/@praveen.pareek/interview-guide-to-boosting-algorithms-part-1-133153714073)

