T Z J Y



Finding Important Features Using XGBoost

XGBoost is short for Extreme Gradient Boosting. It gained popularity in data science after the well-known Otto Group Product Classification Challenge on Kaggle. But how does it work exactly?

What is XGBoost?

XGBoost is a decision-tree-based ensemble machine learning algorithm that uses a gradient boosting framework. In prediction problems involving unstructured data (images, text, etc.), artificial neural networks tend to outperform other algorithms and frameworks. However, on small-to-medium structured (tabular) data, decision-tree-based algorithms are currently widely considered best-in-class.

Why does XGBoost work well?

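XGBoost and Gradient Boosting Machines (GBMs) are both ensemble tree methods that apply the principle of boosting weak learners (generally CARTs) using gradient descent: each new tree is fitted to correct the current ensemble's errors by following the negative gradient of the loss. XGBoost improves upon the base GBM framework through systems optimizations (such as parallelized tree construction) and algorithmic enhancements (such as a regularized objective and a second-order approximation of the loss).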
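The boosting idea can be sketched in a few lines of pure Python. This is a minimal sketch of plain gradient boosting, not XGBoost's implementation: it assumes squared-error loss, a single numeric feature, and depth-1 regression trees ("stumps") as the weak learners, and it leaves out XGBoost's regularization and second-order statistics. The helper names fit_stump and gradient_boost, and the toy data, are hypothetical.

```python
# Minimal pure-Python sketch of plain gradient boosting (NOT XGBoost itself).
# Assumptions: squared-error loss, one numeric feature, depth-1 regression trees.
# fit_stump and gradient_boost are hypothetical helper names for illustration.


def fit_stump(xs, targets):
    """Return the depth-1 regression tree that best fits targets (least squares)."""
    best = None
    values = sorted(set(xs))
    # Candidate thresholds sit halfway between consecutive distinct feature values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((t - left_mean) ** 2 for t in left) + sum((t - right_mean) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Fit an additive model F(x) = base + learning_rate * (sum of stumps)."""
    base = sum(ys) / len(ys)  # initial constant prediction: the mean target
    stumps = []

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    for _ in range(n_rounds):
        # For L = (y - F)^2 / 2, the negative gradient -dL/dF is the residual y - F,
        # so each new weak learner is fitted to the current residuals.
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


# Usage on a tiny made-up dataset with a step between x = 4 and x = 5.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
model = gradient_boost(xs, ys)
print([round(model(x), 2) for x in xs])
```

Each round adds only a small, shrunken correction (learning_rate), which is why many weak trees are combined instead of one deep tree.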

About XGBoost Built-in Feature Importance

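  • There are several types of importance in XGBoost, i.e. it can be computed in several different ways. If you construct the model with the scikit-learn-like API (see the XGBClassifier docs), the default type is gain. If you access the underlying Booster object and get the importance with its get_score method, the default is weight. You can check the type used by the scikit-learn wrapper through the model's importance_type attribute; in recent releases it may be None, which means the default (gain for tree boosters) applies.
  • The gain type shows the average gain (the reduction in training loss) across all splits in which the feature is used.
  • The weight type shows the number of times the feature is used to split the data across all trees. This type of importance can favour numerical and high-cardinality features, because they offer more candidate split points.
  • There are also cover, total_gain and total_cover types of importance: cover is the average coverage across all splits in which the feature is used, while total_gain and total_cover are the sums of gain and coverage over those splits rather than averages.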
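To make the difference between these types concrete, here is a minimal pure-Python sketch that aggregates per-split statistics the way the five definitions describe. It does not call XGBoost: the split records, the feature names (age, income, zip_code) and the importance helper are all hypothetical, made up for illustration, whereas XGBoost derives the real statistics from the trained trees internally.

```python
from collections import defaultdict

# Hypothetical per-split records (feature, gain, cover), as if collected from every
# split in every tree of a trained ensemble. Names and numbers are made up.
SPLITS = [
    ("age", 12.0, 300.0),
    ("age", 4.0, 120.0),
    ("income", 20.0, 280.0),
    ("zip_code", 1.0, 40.0),
    ("zip_code", 0.5, 25.0),
    ("zip_code", 0.5, 15.0),
]


def importance(splits, importance_type="weight"):
    """Aggregate split statistics per feature following the definitions of
    weight, gain, cover, total_gain and total_cover (hypothetical helper)."""
    count = defaultdict(int)
    total_gain = defaultdict(float)
    total_cover = defaultdict(float)
    for feature, gain, cover in splits:
        count[feature] += 1            # weight: number of splits using the feature
        total_gain[feature] += gain    # total_gain: summed loss reduction
        total_cover[feature] += cover  # total_cover: summed coverage
    if importance_type == "weight":
        return dict(count)
    if importance_type == "total_gain":
        return dict(total_gain)
    if importance_type == "total_cover":
        return dict(total_cover)
    if importance_type == "gain":   # average gain per split
        return {f: total_gain[f] / count[f] for f in count}
    if importance_type == "cover":  # average coverage per split
        return {f: total_cover[f] / count[f] for f in count}
    raise ValueError(f"unknown importance_type: {importance_type}")


for kind in ("weight", "gain", "cover", "total_gain", "total_cover"):
    scores = importance(SPLITS, kind)
    ranking = sorted(scores, key=scores.get, reverse=True)
    print(kind, ranking)
```

In this toy data, weight ranks zip_code first simply because it is split on most often, while gain ranks income first, which is the kind of disagreement worth checking before trusting a single importance type. On a real model, XGBoost computes these scores for you, e.g. model.get_booster().get_score(importance_type="gain"); note that get_score omits features that are never used in a split, and the scikit-learn wrapper's feature_importances_ normalizes the scores so they sum to 1.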

For more details, see Get Started with XGBoost in the official documentation: https://xgboost.readthedocs.io/en/latest/get_started.html

Thanks for Reading!

If you enjoyed it, please follow me on Medium for more. It’s great cardio for your 👏 AND will help other people see the story.

If you want to continue getting this type of article, you can support me by becoming a Medium subscriber. It costs $5/month. A part of your subscription fee goes to me.

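References

  • Get Started with XGBoost: https://xgboost.readthedocs.io/en/latest/get_started.html
  • How to Develop Your First XGBoost Model in Python: https://machinelearningmastery.com/develop-first-xgboost-model-python-scikit-learn/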
