Ruslan Brilenkov


7 Types of ML Classification Algorithms

An overview of Machine Learning classification algorithms, the "best" algorithm, and the "No Free Lunch" theorem.

Photo by Ian Taylor on Unsplash

Intro

This article is part of my big project. The main idea is to teach everybody Python, Data Science, and Machine Learning (ML), regardless of their educational background. I try to explain everything in simple terms without compromising quality.

If ML algorithms were a tasty soup, you could still eat it without knowing what is inside. However, I prefer to shed light on the ingredients and the way it was cooked.

We already covered the reasons for learning Machine Learning, went through general terms/definitions, explored which common Python libraries are used for Data Science and ML, and loaded our first dataset!

Here, we are going to look into one specific ML approach to predictive modeling, called Classification. We will go through the definition of classification modeling and a range of real-life applications.

My idea is to make this overview concise but broad at the same time. Thus, I will both give a general overview of the topic and provide links to articles on specific algorithms for practical exercises.

So, by the end of this article, you will have a general idea of the different classification algorithms and will be able to dive deeper into a few of them right away*.

* I am adding new articles as I write them (so please check back for updates).

It is said “Give one a fish and it will feed one for a day. Teach one how to catch a fish and it will feed one for the rest of one's life.”

My purpose in writing this article is to teach you how to catch that fish by combining theory and hands-on examples to kick-start your ML journey.

So, let us begin.

What is Classification?

Generally speaking, Classification is the act or result of classifying (according to this dictionary).

In ML, Classification is a supervised learning approach: a model learns from labeled examples and then assigns new, unseen items to one of a discrete set of classes.

In terms of feature and target variables, classification attempts to learn the relationship between the features (the inputs) and the target (the class label).

Simply speaking, it is all about predicting a label (a categorical variable with discrete values).
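To make "learning the relationship between features and a target" concrete, here is a minimal pure-Python sketch of one of the simplest classifiers: a one-feature threshold rule, often called a "decision stump". The feature (number of links in an email), the toy data, and the helper names `fit_stump`/`predict_stump` are hypothetical and serve only as an illustration. Real classifiers usually learn from many features at once.

```python
def fit_stump(xs, ys):
    """Learn a threshold t so that predicting 'spam' whenever x > t
    misclassifies as few training examples as possible."""
    best_t, best_errors = None, len(xs) + 1
    # Candidate thresholds: every observed feature value.
    for t in sorted(set(xs)):
        errors = sum((x > t) != (y == "spam") for x, y in zip(xs, ys))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


def predict_stump(t, x):
    """Map a numeric feature value to a discrete label."""
    return "spam" if x > t else "no spam"


# Hypothetical toy data: feature = number of links in an email.
links = [0, 1, 2, 8, 9, 12, 1, 7]
labels = ["no spam", "no spam", "no spam", "spam", "spam", "spam", "no spam", "spam"]

threshold = fit_stump(links, labels)
print(threshold)                     # → 2
print(predict_stump(threshold, 10))  # → spam
print(predict_stump(threshold, 0))   # → no spam
```

The "learning" step is the search for the threshold that best separates the labeled examples. Prediction then turns any new feature value into one of two discrete labels.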

Why do we care about classification algorithms in Data Science?

As mentioned above, the point of classification is to assign the items of a given dataset to a set of labels/classes. What does this mean in real life?

It means that, using classification techniques, we can assign labels to any dataset whose items can be divided into classes.

Classification is used in a wide range of industries. Depending on your area of interest, some of the relevant real-world examples are:

  • customer churn (leave/stay, to predict whether a customer will leave for another company/provider);
  • email filtering (spam/no spam, to filter out annoying spam emails);
  • document classification (to classify the documents into various types);
  • biometric identification (to assign a label representing a particular person);
  • handwriting recognition (such as recognizing the digits and letters as separate classes);
  • breast cancer identification (benign/malignant, to predict whether a tumor is benign or malignant and help choose a treatment);
  • mushroom types (edible/poisonous, classification of mushrooms);

… to name a few.

As you can see, the range of possible applications of the classification approach is extremely wide. That should be enough by itself to motivate anyone to learn classification techniques.

Now, you might be thinking: “That sounds great, Ruslan. I see your point. Just show me how to use classification, and I can apply it to any of the above-mentioned problems!”

Hold on, my friend! I am not describing classification by analogy with The Lord of the Rings:

One approach to rule them all.

Rather, one approach = various algorithms.

I am talking about the whole range of problems that can be addressed by the classification approach, an approach that encompasses many different algorithms.

Of course, there are many types of classification algorithms. Different kinds of data can, and do, require different approaches, so an algorithm that works well on one dataset might be of little use on another.

Note: Usually, there is no way to know in advance which algorithm will perform best on a given dataset, so try several of them, tune their parameters, and see what works best!
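The advice in the note above can be made concrete with a tiny validation loop. This minimal pure-Python sketch scores a majority-class baseline and K-NN with k = 1, 3, and 5 on a held-out validation set, then keeps the best model. The data points and the helpers `knn_predict` and `accuracy` are hypothetical and written for illustration only. In practice, you would more likely use a library such as scikit-learn and cross-validation instead of a single small split.

```python
import math
from collections import Counter


def knn_predict(train, x, k):
    """Predict the majority label among the k training points closest to x."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(model, train, val):
    """Fraction of validation points whose label the model predicts correctly."""
    correct = sum(model(train, x) == y for x, y in val)
    return correct / len(val)


# Hypothetical toy data: (features, label) pairs.
train = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
         ((5.0, 5.0), "B"), ((6.0, 5.5), "B"), ((5.5, 6.5), "B"),
         ((3.5, 3.0), "B")]
val = [((1.2, 1.4), "A"), ((3.0, 3.2), "A"), ((5.8, 5.9), "B")]

# Each candidate is a function (training data, point) -> label.
candidates = {
    "majority class": lambda tr, x: Counter(y for _, y in tr).most_common(1)[0][0],
    "1-NN": lambda tr, x: knn_predict(tr, x, 1),
    "3-NN": lambda tr, x: knn_predict(tr, x, 3),
    "5-NN": lambda tr, x: knn_predict(tr, x, 5),
}

scores = {name: accuracy(model, train, val) for name, model in candidates.items()}
best = max(scores, key=scores.get)  # first candidate with the highest score
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
print("best on this validation set:", best)
```

On this made-up split, 1-NN misclassifies the borderline point (3.0, 3.2) because its single nearest neighbor is a "B". Both 3-NN and 5-NN get all three validation points right. With so few points, a different split could easily change the winner.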

Let us talk about the 7 most popular classification algorithms.

The 7 Types of Classification Algorithms

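Here is a list of the most common classification algorithms:

  • K-nearest neighbors (K-NN)
  • Logistic Regression
  • Support Vector Machines (SVM)
  • Decision Trees
  • Naive Bayes
  • Neural Networks (NN)
  • Linear Discriminant Analysis (LDA)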

Each classifier has its pros and cons, so it can pay off to train several classifiers and make the final decision by combining their results.
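One common way to make a final decision from several classifiers is hard majority voting: each model votes for a label, and the most frequent label wins. This is a minimal sketch that assumes the classifiers are already trained and their predictions are available as lists. The three prediction lists are made up for illustration. scikit-learn packages the same idea as `VotingClassifier` in `sklearn.ensemble`.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine the labels predicted for ONE sample into a final label.

    Ties go to the label seen first in `predictions`, because
    Counter.most_common orders equal counts by first occurrence.
    """
    return Counter(predictions).most_common(1)[0][0]


def ensemble_predict(per_model_predictions):
    """per_model_predictions[m][i] is model m's label for sample i."""
    # zip(*...) regroups the predictions sample by sample.
    return [majority_vote(votes) for votes in zip(*per_model_predictions)]


# Hypothetical outputs of three already-trained classifiers on four emails.
knn_preds = ["spam", "no spam", "spam", "no spam"]
logreg_preds = ["spam", "spam", "no spam", "no spam"]
tree_preds = ["spam", "no spam", "no spam", "spam"]

print(ensemble_predict([knn_preds, logreg_preds, tree_preds]))
# → ['spam', 'no spam', 'no spam', 'no spam']
```

Using an odd number of voters on a binary problem avoids ties. A "soft" variant averages predicted class probabilities instead, which requires every model to output probabilities.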

For example, K-NN classification works well in low-dimensional feature spaces but tends to degrade in high-dimensional ones, where distances between points become less informative. This is known as the curse of dimensionality. In such cases, an SVM often works better. It is also more memory-efficient, since it keeps only its support vectors rather than the whole training set. However, an SVM can overfit if its kernel and regularization parameters are poorly chosen. It also does not natively provide probability estimates, which are usually desirable in classification (for example, we often want to know how confident the algorithm is when it assigns one class/label or another).
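The curse of dimensionality can be observed directly with a few lines of pure Python. This sketch rests on one assumption: the data points are drawn uniformly at random from the unit hypercube, and real data with structure can behave quite differently. It measures the ratio between the nearest and the farthest distance from a random query point. As the dimension grows, that ratio creeps toward 1, so the "nearest" neighbors are barely nearer than anything else, and K-NN loses its footing. The function name `distance_contrast` is hypothetical.

```python
import math
import random


def distance_contrast(dim, n_points=200, seed=0):
    """Ratio of nearest to farthest distance from a random query point
    to n_points uniform random points in the unit hypercube [0, 1]^dim.

    A ratio close to 1 means near and far neighbors are almost equally far.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, [rng.random() for _ in range(dim)])
             for _ in range(n_points)]
    return min(dists) / max(dists)


for dim in (2, 10, 100, 1000):
    print(dim, round(distance_contrast(dim), 3))
```

Exact values depend on the seed, but the trend from small ratios in 2 dimensions to ratios near 1 in 1000 dimensions is the point.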

The Best Classification Algorithm?

In their paper with the beautiful name "No Free Lunch Theorems for Optimization", David H. Wolpert and William G. Macready (1997) stated:

Roughly speaking, we show that for both static and time-dependent optimization problems, the average performance of any pair of algorithms across all possible problems is identical.
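The punchline of the theorem can be checked by brute force on a toy problem. Note that this sketch is only an illustration under stated assumptions. Wolpert and Macready's result concerns optimization, whereas the code below uses the analogous argument for supervised learning. The setup has four possible inputs: two are seen during training, and two are held out. The code averages a learner's off-training-set accuracy over all 2^4 = 16 possible binary labelings. Both hypothetical learners, `learner_majority` and its deliberately contrary twin `learner_anti`, come out at exactly 1/2.

```python
from fractions import Fraction
from itertools import product

DOMAIN = [0, 1, 2, 3]  # four possible inputs
TRAIN_X = [0, 1]       # inputs seen during training
TEST_X = [2, 3]        # off-training-set inputs


def learner_majority(train):
    """Predict the most frequent training label everywhere (ties -> 0)."""
    ones = sum(y for _, y in train)
    guess = 1 if ones > len(train) - ones else 0
    return lambda x: guess


def learner_anti(train):
    """A deliberately contrary learner: the opposite of learner_majority."""
    base = learner_majority(train)
    return lambda x: 1 - base(x)


def average_off_training_accuracy(learner):
    """Average test accuracy over ALL labelings f: DOMAIN -> {0, 1}."""
    labelings = list(product([0, 1], repeat=len(DOMAIN)))
    total = Fraction(0)
    for f in labelings:  # f[x] is the true label of input x
        model = learner([(x, f[x]) for x in TRAIN_X])
        correct = sum(model(x) == f[x] for x in TEST_X)
        total += Fraction(correct, len(TEST_X))
    return total / len(labelings)


print(average_off_training_accuracy(learner_majority))  # → 1/2
print(average_off_training_accuracy(learner_anti))      # → 1/2
```

The catch is the uniform average over every possible labeling. Real-world data is far from uniformly random, which is why some algorithms do work better than others on particular kinds of problems.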

But we need to choose the best solution for our problem, right?

There might not be a universally ideal algorithm (one that beats all the others on every problem), but there is usually a better algorithm for a given type of data and a given task, provided its parameters are properly tuned.

In general, to assess how well a classification algorithm performs, one can use a performance metric such as accuracy, F1-score, or misclassification rate.
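The metrics named above can be computed by hand from true and predicted labels. This is a minimal pure-Python sketch for a binary problem, and `classification_metrics` is a hypothetical helper. The labels are made up. In practice, `sklearn.metrics.accuracy_score` and `sklearn.metrics.f1_score` are the usual library route.

```python
def classification_metrics(y_true, y_pred, positive):
    """Accuracy, misclassification rate, precision, recall, and F1-score
    for a binary problem, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": accuracy,
        "misclassification rate": 1 - accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels for eight emails.
y_true = ["spam", "spam", "spam", "no spam", "no spam", "no spam", "no spam", "no spam"]
y_pred = ["spam", "spam", "no spam", "spam", "no spam", "no spam", "no spam", "no spam"]

for name, value in classification_metrics(y_true, y_pred, positive="spam").items():
    print(f"{name}: {value:.3f}")
```

On these made-up labels, 2 of the 3 spam emails are caught and 1 legitimate email is flagged. So accuracy is 0.75 (misclassification rate 0.25), while precision, recall, and F1 are all 2/3. The gap shows why accuracy alone can hide how a classifier treats the rarer class.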

To be continued

To keep this overview as short and as useful as possible, we conclude it here. The next natural question I would like to address is the variety of performance metrics, that is, how we can judge and choose the best classification algorithm.

Here is an article describing the 9 Types of Performance Evaluation for Classification Machine Learning Modeling.

Want to learn more?

If you are interested in brushing up on your Python knowledge, here is a brief tutorial: from Hello World to Functions. Here are also a few of my projects, explained from beginning to end. They are a great opportunity to learn something new and practice your Python skills:

  • tracking your weight from scratch in Python;
  • analyzing COVID-19 scientific papers;
  • creating a productivity app (Pomodoro) from scratch.

Thank you for reading until the end. I hope you have learned something new. If you would like to connect or have any questions, please do not hesitate to contact me.

Are you curious about the emerging field of Prompt Engineering? Grab my new e-book! You will learn and master everything from fundamental concepts to practical tips and real-world applications. Additionally, you will receive a bonus of 300 prompts and free resources to kick-start your AI-driven journey. With all this value packed into one e-book, what is the price? The cost of a cup of coffee! Do not miss out on this opportunity to take your skills to the next level!

Contact

LinkedIn

I recently started a YouTube channel where I talk about different topics, including data science and AI news, research, and life in general. It is a steep learning curve for me, but I invite you to check it out here.

Never miss a story, join my mailing list!

GitHub
