Kajal Yadav



11 ML Algorithms You Should Know

Important Machine Learning algorithms in 2021

Photo by Markus Spiske on Unsplash

Data science has advanced a great deal over the years, and research has produced many new, efficient algorithms. Still, a few basic algorithms remain the foundation on which the more advanced ones are built.

Below are 11 such ML algorithms. They are widely used in data science and frequently come up in data science interviews.

1. Linear Regression:- Linear regression is one of the most well-known and well-understood algorithms in statistics and machine learning. It is a linear model, i.e., a model that assumes a linear relationship between the input variables (x) and the single output variable (y), so y can be computed as a linear combination of the input variables.

  • When there is a single input variable (x), the method is referred to as simple linear regression.
  • When there is more than one input variable, it is referred to as multiple linear regression.

The equation: y = B0 + B1*x, where x is the input variable, y is the output variable, and B0 and B1 are the coefficients.

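The line of best fit is found by minimizing the squared vertical distances between the data points and the line. Each of these distances is a residual, so the procedure is known as minimizing the sum of squared residuals (ordinary least squares).

A residual is the actual value minus the predicted value, y - ŷ. Because residuals are squared, the sign convention does not change the fitted line.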

Screenshot from Internet
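
As a minimal sketch of the least-squares fit described above, the snippet below computes B0 and B1 for simple linear regression with the closed-form formulas (B1 is the covariance of x and y divided by the variance of x). It is plain Python with no libraries; the function names and the toy data are hypothetical, chosen only for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (B0, B1) minimizing the sum of squared residuals for y = B0 + B1*x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # B1 = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    # The least-squares line always passes through (x_mean, y_mean)
    b0 = y_mean - b1 * x_mean
    return b0, b1


def sum_squared_residuals(xs, ys, b0, b1):
    # residual = actual value - predicted value
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # generated from y = 1 + 2x, so the fit should recover it
b0, b1 = fit_simple_linear_regression(xs, ys)
print(b0, b1)  # 1.0 2.0
```

Any other choice of B0 and B1 gives a larger sum of squared residuals on the same data, which is exactly what "line of best fit" means here.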

2. Logistic Regression:- Logistic regression is a classification algorithm named after the function at the core of the method: the logistic function, also called the sigmoid function. This S-shaped curve maps any real number to a value between 0 and 1, which lets the model predict a binary outcome (1/0, Yes/No, True/False) from a set of independent variables.

  • It can be thought of as linear regression adapted to a categorical outcome: instead of modelling y directly, it models the log of the odds, log(p / (1 - p)), as a linear function of the inputs.
  • It predicts the probability that an event occurs by fitting the data to the logistic function (the inverse of the logit function).

p(X) = e^(b0 + b1*X) / (1 + e^(b0 + b1*X))

Screenshot from internet
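
The sketch below evaluates p(X) from the equation above and fits b0 and b1 by plain gradient descent on the log-loss (the negative log-likelihood). This is one common way to fit logistic regression, not necessarily what any particular library does; the learning rate, the number of epochs, and the toy data are illustrative assumptions.

```python
import math


def predict_proba(x, b0, b1):
    # p(X) = e^(b0 + b1*X) / (1 + e^(b0 + b1*X)), written as 1 / (1 + e^-(b0 + b1*X))
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def log_odds(p):
    # The logit, log(p / (1 - p)); for this model it equals b0 + b1*X
    return math.log(p / (1.0 - p))


def fit_logistic_regression(xs, ys, learning_rate=0.1, epochs=5000):
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average log-loss with respect to b0 and b1
        grad0 = grad1 = 0.0
        for x, y in zip(xs, ys):
            error = predict_proba(x, b0, b1) - y
            grad0 += error
            grad1 += error * x
        b0 -= learning_rate * grad0 / n
        b1 -= learning_rate * grad1 / n
    return b0, b1


xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 1, 0, 1, 1]  # overlapping classes, so the coefficients stay finite
b0, b1 = fit_logistic_regression(xs, ys)
for x in xs:
    p = predict_proba(x, b0, b1)
    label = 1 if p >= 0.5 else 0  # classify with a 0.5 probability threshold
    print(x, round(p, 3), label)
```

The 0.5 threshold is a common default; it can be moved when false positives and false negatives have different costs.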

3. K-Nearest Neighbours:- The K-nearest neighbours (KNN) algorithm is a supervised machine learning algorithm that can be used to solve both classification and regression problems.

  • It works by computing the distance between the new data point and every point in the training data, then keeping the K closest points (the nearest neighbours). For classification, the new point is assigned to whichever class receives the most votes among those K neighbours; for regression, the prediction is the average of their values.

EuclideanDistance(x, xi) = sqrt( sum_j (xj - xij)² )

Screenshot from internet
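
A minimal KNN classifier built directly on the Euclidean distance above might look like the sketch below. It compares the query against every training point (a brute-force search), which is fine for small data; the function names, the value k=3, and the toy points are illustrative assumptions.

```python
import math
from collections import Counter


def euclidean_distance(x, xi):
    # sqrt( sum_j (xj - xij)^2 )
    return math.sqrt(sum((xj - xij) ** 2 for xj, xij in zip(x, xi)))


def knn_classify(train_points, train_labels, query, k=3):
    # Sort the training points by distance to the query and keep the k nearest
    neighbours = sorted(
        zip(train_points, train_labels),
        key=lambda pair: euclidean_distance(query, pair[0]),
    )[:k]
    # Majority vote among the k neighbours
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 5.0), (7.0, 7.0), (6.5, 6.0)]
labels = ["red", "red", "red", "blue", "blue", "blue"]
print(knn_classify(points, labels, (2.0, 2.0)))  # red
print(knn_classify(points, labels, (6.0, 6.0)))  # blue
```

An odd k avoids tied votes in two-class problems.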

4. Support Vector Machines:- It is a supervised machine learning algorithm that can also be used for both classification and regression; however, it is mostly used in classification problems.

  • An SVM finds the hyperplane (decision boundary) between the two classes of data that maximizes the margin, i.e., the distance between the boundary and the nearest points of each class (the support vectors). Many other hyperplanes may separate the two classes, but the SVM hyperplane is the one with the largest margin.

With two input variables, the hyperplane is a line: B0 + (B1 * X1) + (B2 * X2) = 0, where B1 and B2 determine the slope of the line and B0 is the intercept, all found by the learning algorithm; X1 and X2 are the two input variables.

Screenshot from Internet
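
Training an SVM means solving an optimization problem that does not fit in a short snippet, so the sketch below only shows how a learned line B0 + B1*X1 + B2*X2 = 0 is used. A point is classified by the sign of the left-hand side, and its distance to the boundary is |B0 + B1*X1 + B2*X2| / sqrt(B1² + B2²). It assumes the coefficients are scaled the canonical way, so that the support vectors satisfy |B0 + B1*X1 + B2*X2| = 1 and the margin width is 2 / sqrt(B1² + B2²). The coefficient values are hypothetical.

```python
import math


def decision_value(x1, x2, b0, b1, b2):
    # Left-hand side of the hyperplane equation B0 + B1*X1 + B2*X2 = 0
    return b0 + b1 * x1 + b2 * x2


def classify(x1, x2, b0, b1, b2):
    # Points on one side of the hyperplane are class +1, the other side class -1
    return 1 if decision_value(x1, x2, b0, b1, b2) >= 0 else -1


def distance_to_hyperplane(x1, x2, b0, b1, b2):
    return abs(decision_value(x1, x2, b0, b1, b2)) / math.hypot(b1, b2)


def margin_width(b1, b2):
    # Assumes canonical scaling: support vectors satisfy |B0 + B1*X1 + B2*X2| = 1
    return 2.0 / math.hypot(b1, b2)


# Hypothetical learned coefficients: the boundary is the line X1 + X2 = 3
b0, b1, b2 = -3.0, 1.0, 1.0
print(classify(3.0, 3.0, b0, b1, b2))  # 1
print(classify(0.0, 0.0, b0, b1, b2))  # -1
print(round(margin_width(b1, b2), 4))  # 1.4142
```

Because the margin width is 2 / sqrt(B1² + B2²), maximizing the margin is equivalent to minimizing the size of (B1, B2) while keeping every training point on the correct side.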

5. Decision Trees:- Decision trees for classification and regression are commonly built with the CART (Classification and Regression Trees) algorithm. A decision tree is a flowchart-like tree structure, where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (terminal node) holds a class label (or, for regression, a predicted value).

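  • The Gini impurity score measures how good a split is by how mixed the response classes are in the groups created by the split: Gini = 1 - sum_k (p_k)², where p_k is the proportion of class k in a group. A score of 0 means a group contains a single class, and the tree prefers the split with the lowest weighted Gini score.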
Screenshot from Internet
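
The Gini formula above can be sketched in a few lines of Python. The weighted split score averages the two groups' impurities by group size, which is how a CART-style tree compares candidate splits; the function names and labels are illustrative.

```python
from collections import Counter


def gini_impurity(labels):
    # Gini = 1 - sum_k (p_k)^2, where p_k is the share of class k in the group
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(left, right):
    # Size-weighted average of the two groups' impurities (lower means a better split)
    n = len(left) + len(right)
    return len(left) / n * gini_impurity(left) + len(right) / n * gini_impurity(right)


print(split_gini(["yes", "yes", "yes"], ["no", "no", "no"]))  # 0.0 (pure groups)
print(split_gini(["yes", "no", "yes"], ["no", "yes", "no"]))  # mixed groups score higher
```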

6. Random Forest:- Random Forests are an ensemble learning technique that builds on decision trees.

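  • Random forests involve creating multiple decision trees, each trained on a bootstrapped dataset (a sample of the original data drawn with replacement), and randomly selecting a subset of variables to consider at each split of each tree.
  • For classification, the model then takes the mode (majority vote) of the individual trees' predictions; for regression, it averages them. Training on bootstrapped samples and aggregating the results is called bagging (bootstrap aggregating).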
Screenshot from Internet
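
The sketch below puts the random-forest steps together in plain Python: each tree is trained on a bootstrapped sample, considers only a random subset of the features, and the forest predicts by majority vote. To keep it short, each "tree" is a depth-1 stump chosen by the Gini score, so the per-split feature sampling happens once per tree; real random forests grow much deeper trees. All names, the toy data, and the parameter values are illustrative assumptions.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, feature_ids):
    # Try every threshold on the allowed features; keep the split with the lowest weighted Gini
    best = None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, threshold, majority(left), majority(right))
    if best is None:  # no valid split (all values equal): always predict the majority class
        label = majority(labels)
        return lambda row: label
    _, f, threshold, left_label, right_label = best
    return lambda row: left_label if row[f] <= threshold else right_label


def fit_random_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    trees = []
    for _ in range(n_trees):
        # Bootstrap: draw n rows with replacement
        idx = [rng.randrange(n) for _ in range(n)]
        # Random subset of features for this tree's split
        features = rng.sample(range(len(rows[0])), n_features)
        trees.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], features))
    return trees


def predict_random_forest(trees, row):
    # Classification: the mode (majority vote) of the individual trees' predictions
    return majority([tree(row) for tree in trees])


rows = [(0, 1), (1, 0), (1, 1), (0, 0), (9, 10), (10, 9), (10, 10), (9, 9)]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
forest = fit_random_forest(rows, labels)
print(predict_random_forest(forest, (0, 0)))    # a
print(predict_random_forest(forest, (10, 10)))  # b
```

A few trees may see an unlucky bootstrap sample and vote wrongly; the majority vote is what makes the ensemble robust to them.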

7. AdaBoost:- AdaBoost (Adaptive Boosting) is also an ensemble algorithm, but it relies on boosting rather than bagging to build an enhanced predictor.

  • AdaBoost creates a forest of stumps rather than full trees. A stump is a tree made of only one node (a single split) and two leaves.
  • AdaBoost builds its stumps sequentially: each new stump gives more weight to the samples that the previous stump(s) misclassified, and each stump's vote in the final prediction is weighted by how accurate it was.
Screenshot from Internet
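
The sketch below shows one round of the AdaBoost re-weighting step described above, using the standard discrete AdaBoost formulas. A stump with weighted error e gets an "amount of say" alpha = 0.5 * ln((1 - e) / e). Misclassified samples have their weights multiplied by e^alpha and correctly classified ones by e^-alpha, after which the weights are renormalized to sum to 1. The final prediction is the sign of the alpha-weighted sum of stump votes. The function names are illustrative, and the code assumes the incoming weights already sum to 1 and that stumps output +1 or -1.

```python
import math


def adaboost_reweight(weights, misclassified):
    """One AdaBoost round: return (amount_of_say, new_weights)."""
    # Weighted error of the current stump (weights are assumed to sum to 1)
    error = sum(w for w, miss in zip(weights, misclassified) if miss)
    error = min(max(error, 1e-10), 1 - 1e-10)  # guard against log(0) and division by zero
    amount_of_say = 0.5 * math.log((1 - error) / error)
    # Increase the weight of mistakes, decrease the weight of correct predictions
    updated = [
        w * math.exp(amount_of_say if miss else -amount_of_say)
        for w, miss in zip(weights, misclassified)
    ]
    total = sum(updated)
    return amount_of_say, [w / total for w in updated]


def adaboost_predict(weighted_stumps, x):
    # Each stump returns +1 or -1; its vote counts in proportion to its amount of say
    score = sum(say * stump(x) for say, stump in weighted_stumps)
    return 1 if score >= 0 else -1


say, new_weights = adaboost_reweight([0.25, 0.25, 0.25, 0.25], [True, False, False, False])
print(round(say, 4), [round(w, 4) for w in new_weights])  # 0.5493 [0.5, 0.1667, 0.1667, 0.1667]
```

After this round the single misclassified sample carries half of the total weight, so the next stump is pushed to get it right.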

8. Gradient Boost:- Gradient Boost is also an ensemble algorithm that uses boosting methods to develop an enhanced predictor.

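  • Unlike AdaBoost, which builds stumps, Gradient Boost builds larger trees, typically with 8 to 32 leaves.
  • Gradient Boost treats boosting as an optimization problem: it chooses a differentiable loss function and adds trees one at a time to minimize it. Each new tree is fit to the negative gradient of the loss with respect to the current predictions (for squared-error loss, simply the residuals). This is why it is called Gradient Boost: the procedure is gradient descent carried out with trees.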
Screenshot from Internet
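
Below is a minimal sketch of gradient boosting for regression with squared-error loss, where the negative gradient is just the residual. To stay short, it uses single-split regression trees (stumps) on one input variable, whereas Gradient Boost as described above usually grows trees with 8 to 32 leaves. The learning rate, the number of rounds, and the data are illustrative assumptions, and the input needs at least two distinct x values.

```python
def fit_regression_stump(xs, targets):
    # Try each split point; each leaf predicts the mean target on its side
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean


def fit_gradient_boost(xs, ys, n_rounds=100, learning_rate=0.1):
    base = sum(ys) / len(ys)  # start from a constant prediction: the mean of y
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # For the loss 0.5 * (y - F)^2, the negative gradient with respect to F is y - F
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_regression_stump(xs, residuals)
        stumps.append(stump)
        # Take a small step in the direction the new tree suggests
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mse(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def baseline(x):
    # Constant model that always predicts the mean of the toy ys below
    return 3.5


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.0, 2.0, 2.0, 5.0, 5.0, 6.0, 6.0]
model = fit_gradient_boost(xs, ys)
print(mse(baseline, xs, ys), mse(model, xs, ys))  # the boosted model's error is far lower
```

The small learning rate means each tree corrects only part of the remaining error, which trades more rounds for a smoother, less overfit fit.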

9. XGBoost:- XGBoost is one of the most popular and widely used algorithms today, largely because of its strong predictive performance and speed.

  • It is similar to Gradient Boost but has a few extra features that make it stronger.
  • Newton Boosting: uses second-order information (the Hessian) in addition to the gradient, which gives a more direct route to the minimum than plain gradient descent and usually needs fewer boosting rounds.
  • An extra randomization parameter: reduces the correlation between trees, ultimately improving the strength of the ensemble.
Screenshot from Internet
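
To make the Newton-boosting idea concrete, the sketch below computes the quantities that XGBoost's tree learner uses, following the formulas in the XGBoost paper. With G and H the sums of the gradients and Hessians of the loss over the instances in a leaf, the optimal leaf weight is -G / (H + lambda). A split's gain is 0.5 * [G_L²/(H_L + lambda) + G_R²/(H_R + lambda) - (G_L + G_R)²/(H_L + H_R + lambda)] - gamma. Here lambda (an L2 penalty on leaf weights) and gamma (a penalty per leaf) are regularization terms from XGBoost's objective. The function names are hypothetical, and this is not the xgboost library's API.

```python
def squared_loss_grad_hess(y_true, y_pred):
    # For the loss 0.5 * (y_pred - y_true)^2: gradient = y_pred - y_true, Hessian = 1
    return [p - y for y, p in zip(y_true, y_pred)], [1.0] * len(y_true)


def leaf_weight(grads, hessians, lam=1.0):
    # Newton step for one leaf: w* = -G / (H + lambda)
    return -sum(grads) / (sum(hessians) + lam)


def split_gain(left, right, lam=1.0, gamma=0.0):
    # left and right are (grads, hessians) pairs for the two child nodes
    def score(g, h):
        return sum(g) ** 2 / (sum(h) + lam)

    (gl, hl), (gr, hr) = left, right
    parent = score(gl + gr, hl + hr)
    return 0.5 * (score(gl, hl) + score(gr, hr) - parent) - gamma


y_true = [1.0, 2.0, 3.0]
g, h = squared_loss_grad_hess(y_true, [0.0, 0.0, 0.0])
print(leaf_weight(g, h, lam=0.0))  # 2.0, i.e. the mean residual
print(leaf_weight(g, h, lam=1.0))  # 1.5, shrunk toward 0 by the L2 penalty
print(split_gain((g[:1], h[:1]), (g[1:], h[1:]), lam=0.0))  # 0.75
```

For squared loss the Hessian is constant, so the Newton step reduces to the mean residual; the second-order information matters more for losses such as log-loss, where the Hessian p(1 - p) varies from instance to instance.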

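10. LightGBM:- LightGBM is another boosting algorithm that has been shown to be faster, and sometimes more accurate, than XGBoost.

  • It uses a technique called Gradient-based One-Side Sampling (GOSS) to reduce the number of data instances used to find split values: it keeps all instances with large gradients (the poorly fitted ones) and randomly samples from those with small gradients.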
Screenshot from Internet
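
A minimal sketch of GOSS, following the description in the LightGBM paper: sort the instances by the absolute value of their gradients, keep the top a fraction, randomly sample a b fraction of the rest, and multiply the sampled small-gradient instances' weights by (1 - a) / b so that the split-gain estimates stay roughly unbiased. The argument names top_rate and other_rate are chosen to mirror LightGBM's GOSS settings for a and b; the function itself is illustrative, not LightGBM's code.

```python
import random


def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return {instance_index: weight} for the instances GOSS keeps."""
    n = len(gradients)
    # Rank instances by |gradient|: large gradients are the poorly fitted ones
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top_n = int(top_rate * n)
    rand_n = int(other_rate * n)
    top = order[:top_n]
    # Randomly sample among the remaining small-gradient instances
    sampled = random.Random(seed).sample(order[top_n:], rand_n)
    # Up-weight sampled instances to compensate for the ones that were dropped
    amplify = (1.0 - top_rate) / other_rate
    weights = {i: 1.0 for i in top}
    weights.update({i: amplify for i in sampled})
    return weights


gradients = [i / 10 for i in range(-50, 50)]  # 100 toy gradient values
kept = goss_sample(gradients)
print(len(kept), "of", len(gradients), "instances used to evaluate splits")  # 30 of 100 instances used to evaluate splits
```

With these settings, split finding scans 30% of the data, while the weight of 8 on each sampled small-gradient instance stands in for the 70 instances that were skipped.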

11. Naive Bayes:- It is a classification algorithm used for binary (two-class) and multiclass classification problems. It is used when the output variable is discrete.

  • As the name suggests, this algorithm is based on Bayes' theorem, which says we can calculate the probability that a piece of data belongs to a given class from prior knowledge. It is called "naive" because it assumes that the input features are conditionally independent of each other given the class.
  • P(class|data) = (P(data|class) * P(class)) / P(data)
Screenshot from Internet
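
The sketch below applies the formula above to categorical features. The "naive" independence assumption lets P(data|class) be computed as the product of the per-feature probabilities P(feature_i|class). Each of those is estimated from counts with Laplace (add-one) smoothing so that a value never seen with a class does not zero out that class. Dividing by the sum over all classes plays the role of P(data). The toy weather data and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def fit_naive_bayes(rows, labels):
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> Counter of values
    feature_values = defaultdict(set)    # feature index -> set of values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def naive_bayes_posterior(model, row, alpha=1.0):
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total  # prior P(class)
        for i, value in enumerate(row):
            # Smoothed likelihood P(feature_i = value | class)
            count = value_counts[(i, c)][value]
            score *= (count + alpha) / (n_c + alpha * len(feature_values[i]))
        scores[c] = score  # proportional to P(data|class) * P(class)
    evidence = sum(scores.values())  # P(data), summed over all classes
    return {c: s / evidence for c, s in scores.items()}


rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool"), ("sunny", "cool")]
labels = ["no", "no", "yes", "yes", "yes"]
model = fit_naive_bayes(rows, labels)
print(naive_bayes_posterior(model, ("rainy", "mild")))  # about {'no': 0.25, 'yes': 0.75}
```

The predicted class is simply the one with the highest posterior probability, here "yes".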

Please let me know in the comment section if I have missed any other important ML algorithms. Thanks for reading.

So, if any of my blog posts have been of help to you, and you feel generous at the moment, don’t hesitate to buy me a coffee. ☕😍


And yes, buying me a coffee (and lots of it if you are feeling extra generous) goes a long way in ensuring that I keep producing content every day in the years to come.

You can reach me via the following:

  1. Subscribe to my YouTube channel for video content coming soon
  2. Follow me on Medium
  3. Connect and reach me on LinkedIn
  4. Become a member:- https://techykajal.medium.com/membership

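Check out my other blogs as well:

  • 15 free & open-source data resources for your next data science project
  • 8 ML/AI Projects To Make Your Portfolio Stand Out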
