Support Vector Machines (SVM) clearly explained

A Python tutorial for classification problems with 3D plots

In this article, I explain the core concepts behind SVMs and why and how to use them. Additionally, I show how to plot the support vectors and the decision boundaries in 2D and 3D.

**Handmade** sketch made by the **author**. An SVM illustration.

Introduction

Support Vector Machines (SVMs) are among the most famous and widely used machine learning models. The original SVM algorithm was invented by Vladimir N. Vapnik and Alexey Ya. Chervonenkis in 1963.

SVMs are supervised machine learning models that are usually employed for classification (SVC: Support Vector Classification) or regression (SVR: Support Vector Regression) problems. Which of the two we face depends on the target variable (the variable we wish to predict): a discrete target (e.g. class labels) makes it a classification task, while a continuous target (e.g. house prices) makes it a regression task.

SVMs are more commonly used for classification problems and for this reason, in this article, I will only focus on the SVC models.

NEW: After a great deal of hard work and staying behind the scenes for quite a while, we’re excited to now offer our expertise through a platform, the “Data Science Hub” on Patreon (https://www.patreon.com/TheDataScienceHub). This hub is our way of providing you with bespoke consulting services and comprehensive responses to all your inquiries, ranging from Machine Learning to strategic data analytics planning.
Another resource. Learn Data Science and ML with the help of an 🤖 AI-powered tutor. Start here https://aigents.co/learn choose a topic and he will show up where you need him. No paywall, no signups, no ads.

Core of the method

In this article, I am not going to go through every step of the algorithm (numerous online resources already do so). Instead, I am going to explain the most important concepts and terms around SVMs.

1. The decision boundary (separating hyperplane)

An SVC aims to find the best hyperplane (also called the decision boundary), i.e. the one that best separates (splits) a dataset into two classes/groups (a binary classification problem).

Depending on the number of input features/variables, the decision boundary is a line (if we have only 2 features) or a hyperplane (if we have more than 2 features; with exactly 3 features it is an ordinary plane).

To get the main idea, think of it as follows: each observation (sample/data-point) is plotted in an N-dimensional space, with N being the number of features/variables in our dataset. In that space, the separating hyperplane is an (N-1)-dimensional subspace.

A hyperplane is an (N-1)-dimensional subspace for an N-dimensional space.

So, as stated before, in a 2-dimensional space the decision boundary is just a line, as shown below.

**Handmade** sketch made by the **author**. An illustration of the decision boundary of an SVM classification model (SVC) using a dataset with only 2 features (i.e. x1 and x2). The decision boundary is a line.

Mathematically, we can define the decision boundary as follows:

Rendered latex code written by the author.
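As a sketch, assuming the standard linear formulation (consistent with the coef_ and intercept_ attributes used by the scikit-learn code later in this article), the decision boundary is the set of points x that satisfy the first expression below, where w is the weight vector (normal to the hyperplane) and b is the bias. A new sample is assigned to a class according to the sign of the same expression:

```latex
\mathbf{w}^\top \mathbf{x} + b = 0,
\qquad
\hat{y} = \operatorname{sign}\left(\mathbf{w}^\top \mathbf{x} + b\right)
```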

2. Support vectors

Support vectors are the samples (data-points) located nearest to the separating hyperplane. Removing them would alter the position of the separating hyperplane. They are therefore the most important samples: they define the location and orientation of the best decision boundary.

**Handmade** sketch made by the **author**. Points circled in purple represent the support vectors in this toy 2-dimensional SVM problem.
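The claim that only the support vectors define the boundary can be checked numerically. The sketch below (assuming two Gaussian clusters built the same way as in the support-vector example later in the article, plus a tightened solver tolerance tol chosen here only to make the comparison precise) fits a linear SVC, then refits it on the support vectors alone. The weight vector should come out the same, because the discarded points played no role in defining the hyperplane.

```python
import numpy as np
from sklearn import svm

# Two Gaussian clusters (same construction as the support-vector example later on)
rng = np.random.RandomState(2)
X = np.r_[rng.randn(20, 2) - [2, 2], rng.randn(20, 2) + [2, 2]]
y = np.array([0] * 20 + [1] * 20)

# Fit on all points; tol is tightened (illustrative choice) so the fits compare closely
full = svm.SVC(kernel="linear", C=1, tol=1e-7).fit(X, y)

# Refit using ONLY the support vectors found by the first model
sv_idx = full.support_
reduced = svm.SVC(kernel="linear", C=1, tol=1e-7).fit(X[sv_idx], y[sv_idx])

print("support vectors:", len(sv_idx), "of", len(X), "points")
print("w fitted on all points:      ", full.coef_[0])
print("w fitted on support vectors: ", reduced.coef_[0])
```

The opposite experiment (dropping one of the support vectors and refitting) will generally move the hyperplane.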

3. The hard margin: How does SVM find the best hyperplane?

Several different lines (or, more generally, decision boundaries) could separate our classes. But which one is the best?

**Handmade** sketch made by the **author**. This illustration shows 3 candidate decision boundaries that separate the 2 classes.

The distance between the hyperplane and the nearest data points (samples) is known as the SVM margin. The goal is to choose the hyperplane with the greatest possible margin between the hyperplane and any support vector; in other words, the SVM algorithm finds the decision boundary such that the margin is maximized. Here, the best line is the yellow one, as shown below.

**Handmade** sketch made by the **author**. The best separating line is the yellow one that maximizes the margin (green distance).

In summary, SVMs pick the decision boundary that maximizes the distance to the support vectors. If the decision boundary were too close to the support vectors, it would be sensitive to noise and would not generalize well.
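To make the margin concrete: for a linear SVM the margin lines are w·x + b = +1 and w·x + b = -1, so the full width of the margin is 2/||w||. The sketch below uses a hypothetical 6-point toy dataset whose best separating line is x1 = 0, and a very large C to approximate the hard margin. The points (-1, 0) and (1, 0) should be the only support vectors, giving a margin width of about 2, with each support vector at distance about 1 from the boundary.

```python
import numpy as np
from sklearn import svm

# Hypothetical toy data: class 0 lies left of x1 = 0, class 1 lies right of it
X = np.array([[-1.0, 0.0], [-2.0, 1.0], [-2.0, -1.0],
              [1.0, 0.0], [2.0, 1.0], [2.0, -1.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates the hard-margin SVM on separable data
clf = svm.SVC(kernel="linear", C=1e6).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

# Margin lines are w.x + b = +1 and -1; the distance between them is 2 / ||w||
width = 2 / np.linalg.norm(w)
# Distance of each support vector to the decision boundary: |w.x + b| / ||w||
sv_dist = np.abs(clf.support_vectors_ @ w + b) / np.linalg.norm(w)

print("w =", w, " b =", b)
print("support vectors:\n", clf.support_vectors_)
print("margin width:", width)
print("distance of each support vector to the boundary:", sv_dist)
```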

4. A note about the Soft margin and C parameter

Sometimes, we might want to (purposely) allow some margin of error (misclassification). This is the main idea behind the “soft margin”. A soft-margin implementation allows some samples to fall inside the margin or even on the wrong side of the decision boundary, which can yield a model that generalizes better.

A soft margin SVM solves the following optimization problem:

Increase the distance of the decision boundary to the support vectors (i.e. the margin), and
Maximize the number of points that are correctly classified in the training set.
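In the standard textbook formulation (a sketch, assuming labels y_i in {-1, +1} and one slack variable ξ_i per training sample that measures how far that sample violates the margin), the two goals are combined into a single objective. Minimizing ½||w||² widens the margin, whose full width is 2/||w||, while the sum of slacks penalizes samples that sit inside the margin or on the wrong side of the boundary:

```latex
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \;
\frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i\left(\mathbf{w}^\top \mathbf{x}_i + b\right) \ge 1 - \xi_i,
\quad \xi_i \ge 0,
\quad i = 1, \dots, n
```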

It is clear that there is a trade-off between these two optimization goals. This trade-off is controlled by the famous C parameter. Briefly, if C is small, the penalty for misclassified data-points is low so a decision boundary with a large margin is chosen at the expense of a greater number of misclassifications. If C is large, SVM tries to minimize the number of misclassified samples and results in a decision boundary with a smaller margin.
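The effect of C can be observed directly. This sketch fits linear SVCs with several C values on hypothetical, deliberately overlapping data and reports the margin width (2/||w||), the number of support vectors, and the training accuracy. As C grows, the margin width should shrink; the exact numbers depend on the randomly generated data.

```python
import numpy as np
from sklearn import svm

# Hypothetical data: two Gaussian clusters that deliberately overlap
rng = np.random.RandomState(0)
X = np.r_[rng.randn(50, 2) - [1, 1], rng.randn(50, 2) + [1, 1]]
y = np.array([0] * 50 + [1] * 50)

Cs = [0.01, 0.1, 1, 10, 100]
widths = []
for C in Cs:
    clf = svm.SVC(kernel="linear", C=C).fit(X, y)
    width = 2 / np.linalg.norm(clf.coef_[0])  # distance between the two margin lines
    widths.append(width)
    print(f"C={C:<6} margin width={width:.3f}  "
          f"support vectors={int(clf.n_support_.sum()):3d}  "
          f"training accuracy={clf.score(X, y):.2f}")
```

A small C tolerates many margin violations (wide margin, many support vectors); a large C penalizes them heavily (narrow margin).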

5. What happens when there is no clear separating hyperplane (kernel SVM)?

If we have a dataset that is linearly separable, the SVM's job is usually easy. However, in real life, most datasets at hand are not linearly separable, and this is where the kernel trick provides some magic.

The kernel trick projects the original data points into a higher-dimensional space in order to make them linearly separable (in that higher-dimensional space).

Thus, by using the kernel trick, we can make data that are not linearly separable in the original space linearly separable in a higher-dimensional space.
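The projection idea can be tried by hand before any kernel is involved. In this sketch (using scikit-learn's make_circles toy data; the extra feature z = x1² + x2² is an illustrative choice, not part of the original article), a linear SVC fails on two concentric circles in 2-D, but after appending the squared distance from the origin as a third feature, a flat plane separates the classes.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_circles

# Concentric circles: no straight line can separate the two classes in 2-D
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Linear SVC in the original 2-D space
acc_2d = svm.SVC(kernel="linear").fit(X, y).score(X, y)

# Explicit projection to 3-D (illustrative choice): append z = x1^2 + x2^2,
# the squared distance from the origin, as a third feature
z = (X ** 2).sum(axis=1, keepdims=True)
X_3d = np.hstack([X, z])

# Same linear SVC, now in 3-D, where a plane z = const separates the circles
acc_3d = svm.SVC(kernel="linear").fit(X_3d, y).score(X_3d, y)

print(f"training accuracy, linear SVC in 2-D: {acc_2d:.2f}")
print(f"training accuracy, linear SVC in 3-D: {acc_3d:.2f}")
```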

The kernel trick relies on kernel functions, which measure the similarity between pairs of samples. The trick does not explicitly transform the data points into a new, high-dimensional feature space: the kernel SVM computes the decision boundary in terms of similarity measures (inner products) in that high-dimensional feature space without ever performing the projection. Some famous kernel functions include the linear, polynomial, radial basis function (RBF), and sigmoid kernels.
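Why no explicit projection is needed can be shown with the degree-2 polynomial kernel k(a, b) = (a·b)². For 2-D inputs it equals an ordinary dot product after the explicit map φ(x) = (x1², √2·x1·x2, x2²). The sketch below (phi is a helper introduced here for illustration) checks that identity with scikit-learn's polynomial_kernel, then lets SVC(kernel="poly") separate the concentric-circles toy data without ever building φ(X).

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_circles
from sklearn.metrics.pairwise import polynomial_kernel

def phi(X):
    """Hypothetical helper: explicit degree-2 feature map for 2-D inputs."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

rng = np.random.RandomState(0)
A = rng.randn(5, 2)
B = rng.randn(4, 2)

# Kernel computed in the original 2-D space: (gamma * a.b + coef0)^degree = (a.b)^2
K_kernel = polynomial_kernel(A, B, degree=2, gamma=1.0, coef0=0)
# Same similarities via the explicit 3-D projection followed by dot products
K_explicit = phi(A) @ phi(B).T
print("max abs difference:", np.abs(K_kernel - K_explicit).max())

# An SVC using this kernel separates the circles without computing phi(X)
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
clf = svm.SVC(kernel="poly", degree=2, gamma=1.0, coef0=0).fit(X, y)
print(f"training accuracy with the degree-2 polynomial kernel: {clf.score(X, y):.2f}")
```

The RBF kernel works the same way, except that its implicit feature space is infinite-dimensional, so the explicit projection could not be written out at all.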

**Handmade** sketch made by the **author**. The kernel trick. In the original space the data are not linearly separable but after projecting to a higher dimensional space, they are.

Python working example using the Iris dataset and a linear SVC model in scikit-learn

Reminder: The Iris dataset consists of 150 samples of flowers each having 4 features/variables (i.e. sepal width/length and petal width/length).

2D

Let’s plot the decision boundary in 2D (we will only use 2 features of the dataset):

from sklearn.svm import SVC
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets

iris = datasets.load_iris()
# Select 2 features / variables
X = iris.data[:, :2] # we only take the first two features.
y = iris.target
feature_names = iris.feature_names[:2]
classes = iris.target_names

def make_meshgrid(x, y, h=.02):
    # Grid of points covering the data range (padded by 1 unit) with step h
    x_min, x_max = x.min() - 1, x.max() + 1
    y_min, y_max = y.min() - 1, y.max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    return xx, yy

def plot_contours(ax, clf, xx, yy, **params):
    # Predict the class of every grid point and draw the filled decision regions
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    out = ax.contourf(xx, yy, Z, **params)
    return out

# The classification SVC model
model = svm.SVC(kernel="linear")
clf = model.fit(X, y)

fig, ax = plt.subplots()

# title for the plots
title = 'Decision surface of linear SVC'
# Set-up grid for plotting.
X0, X1 = X[:, 0], X[:, 1]
xx, yy = make_meshgrid(X0, X1)

plot_contours(ax, clf, xx, yy, cmap=plt.cm.coolwarm, alpha=0.8)
ax.scatter(X0, X1, c=y, cmap=plt.cm.coolwarm, s=20, edgecolors="k")
ax.set_xlabel(feature_names[0])  # X0 (first feature) is on the horizontal axis
ax.set_ylabel(feature_names[1])  # X1 (second feature) is on the vertical axis
ax.set_xticks(())
ax.set_yticks(())
ax.set_title(title)
plt.show()

Output of the above python code. Figure generated by the author.

In the iris dataset, we have 3 classes of flowers and 4 features. Here we only used 2 features (so we have a 2-dimensional feature space) and we plotted the decision boundary of the linear SVC model. The colors of the points correspond to the classes/groups.
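An SVC is a binary classifier at heart, yet this example has 3 classes. According to the scikit-learn documentation, SVC handles multi-class problems with a one-vs-one scheme: it fits one binary classifier per pair of classes, so with 3 classes there are 3 × 2 / 2 = 3 hyperplanes. A minimal check on the same two features:

```python
from sklearn import svm, datasets

iris = datasets.load_iris()
X = iris.data[:, :2]  # the same two features as above
y = iris.target

clf = svm.SVC(kernel="linear").fit(X, y)

# One-vs-one: one hyperplane (row of coef_) per pair of classes
n_classes = len(clf.classes_)
print("classes:", clf.classes_)
print("coef_ shape (one row per pair of classes):", clf.coef_.shape)
print("intercept_ shape:", clf.intercept_.shape)
print(f"training accuracy: {clf.score(X, y):.2f}")
```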

3D

Let’s plot the decision boundary in 3D (we will only use 3 features of the dataset):

from sklearn.svm import SVC
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets
from mpl_toolkits.mplot3d import Axes3D

iris = datasets.load_iris()
X = iris.data[:, :3] # we only take the first three features.
Y = iris.target

# Keep only classes 0 and 1 to make it a binary classification problem
X = X[np.logical_or(Y==0,Y==1)]
Y = Y[np.logical_or(Y==0,Y==1)]

model = svm.SVC(kernel='linear')
clf = model.fit(X, Y)

# The separating plane is the set of all points x such that np.dot(clf.coef_[0], x) + clf.intercept_[0] = 0.

# Solve the plane equation for the third coordinate (z), given the first two (x, y)
z = lambda x,y: (-clf.intercept_[0]-clf.coef_[0][0]*x -clf.coef_[0][1]*y) / clf.coef_[0][2]
tmp = np.linspace(-5,5,30)
x,y = np.meshgrid(tmp,tmp)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot3D(X[Y==0,0], X[Y==0,1], X[Y==0,2],'ob')
ax.plot3D(X[Y==1,0], X[Y==1,1], X[Y==1,2],'sr')
ax.plot_surface(x, y, z(x,y))
ax.view_init(30, 60)
plt.show()

Output of the above python code. Figure generated by the author.

In the iris dataset, we have 3 classes of flowers and 4 features. Here we only used 3 features (so we have a 3-dimensional feature space) and only 2 classes (binary classification problem). We then plotted the decision boundary of the linear SVC model. The colors (and marker shapes) of the points correspond to the 2 classes/groups.
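The plotted surface can be sanity-checked against the model. For a linear kernel, scikit-learn's decision_function equals w·x + b with w = coef_[0] and b = intercept_[0], so every point generated by solving the plane equation for z should give a decision value of (numerically) zero. A minimal sketch reusing the same binary subset:

```python
import numpy as np
from sklearn import svm, datasets

iris = datasets.load_iris()
X = iris.data[:, :3]
Y = iris.target
mask = np.logical_or(Y == 0, Y == 1)  # binary problem: classes 0 and 1
X, Y = X[mask], Y[mask]

clf = svm.SVC(kernel="linear").fit(X, Y)
w, b = clf.coef_[0], clf.intercept_[0]

# Points on the plotted surface: choose (x, y), solve w0*x + w1*y + w2*z + b = 0 for z
x, y = np.meshgrid(np.linspace(-5, 5, 30), np.linspace(-5, 5, 30))
z = (-b - w[0] * x - w[1] * y) / w[2]
surface = np.column_stack([x.ravel(), y.ravel(), z.ravel()])

# decision_function is w.x + b, so it should be ~0 everywhere on the plane
print("max |decision_function| on the surface:", np.abs(clf.decision_function(surface)).max())
print("training accuracy:", clf.score(X, Y))
```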

Plotting the support vectors

import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
np.random.seed(2)

# we create 40 linearly separable points
X = np.r_[np.random.randn(20, 2) - [2, 2], np.random.randn(20, 2) + [2, 2]]
Y = [0] * 20 + [1] * 20

# fit the model
clf = svm.SVC(kernel='linear', C=1)
clf.fit(X, Y)

# get the separating hyperplane: w[0]*x + w[1]*y + b = 0  =>  y = a*x - b/w[1]
w = clf.coef_[0]
a = -w[0] / w[1]
xx = np.linspace(-5, 5)
yy = a * xx - (clf.intercept_[0]) / w[1]

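# perpendicular distance from the hyperplane to each margin line: 1 / ||w||
margin = 1 / np.sqrt(np.sum(clf.coef_ ** 2))
# shifting the line vertically by margin * sqrt(1 + a^2) gives the two margin lines
yy_down = yy - np.sqrt(1 + a ** 2) * margin
yy_up = yy + np.sqrt(1 + a ** 2) * margin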

plt.figure(1, figsize=(4, 3))
plt.clf()
plt.plot(xx, yy, "k-")
plt.plot(xx, yy_down, "k-")
plt.plot(xx, yy_up, "k-")

plt.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1], s=80,
 facecolors="none", zorder=10, edgecolors="k")
plt.scatter(X[:, 0], X[:, 1], c=Y, zorder=10, cmap=plt.cm.Paired,
 edgecolors="k")
plt.xlabel("x1")
plt.ylabel("x2")
plt.show()

The double-circled points represent the support vectors.
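The circled points can also be inspected numerically. With labels mapped to -1/+1, each sample has a functional margin y·(w·x + b). Support vectors lie on or inside the margin (value at most 1), while every other point lies on or outside it (value at least 1). This sketch reuses the same data and model, with a tightened solver tolerance tol (an illustrative choice) so the inequalities hold closely.

```python
import numpy as np
from sklearn import svm

np.random.seed(2)
X = np.r_[np.random.randn(20, 2) - [2, 2], np.random.randn(20, 2) + [2, 2]]
Y = np.array([0] * 20 + [1] * 20)

clf = svm.SVC(kernel="linear", C=1, tol=1e-6).fit(X, Y)

# Functional margin y_i * (w.x_i + b), with labels mapped from {0, 1} to {-1, +1}
y_signed = 2 * Y - 1
functional_margin = y_signed * clf.decision_function(X)

is_sv = np.zeros(len(X), dtype=bool)
is_sv[clf.support_] = True

print("support vectors per class:", clf.n_support_)
print("indices of the support vectors:", clf.support_)
# Support vectors: on the margin (= 1) or inside it (< 1)
print("y*f(x) for the support vectors:", np.round(functional_margin[is_sv], 3))
# All remaining points: on or outside the margin (>= 1)
print("smallest y*f(x) among the other points:", functional_margin[~is_sv].min())
```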

Latest posts

Time-Series Forecasting: Predicting Stock Prices Using Facebook’s Prophet Model

Predict stock prices using a forecasting model publicly available from Facebook: The Prophet

towardsdatascience.com

ROC Curve Explained using a COVID-19 hypothetical example: Binary & Multi-Class Classification…

In this post I clearly explain what a ROC curve is and how to read it. I use a COVID-19 example to make my point and I…

towardsdatascience.com

PCA clearly explained — How, when, why to use it and feature importance: A guide in Python

In this post I explain what PCA is, when and why to use it and how to implement it in Python using scikit-learn. Also…

towardsdatascience.com

Everything you need to know about Min-Max normalization in Python

In this post I explain what Min-Max scaling is, when to use it and how to implement it in Python using scikit-learn but…

towardsdatascience.com

How Scikit-Learn’s StandardScaler works

In this post I am explaining why and how to apply Standardization using scikit-learn

towardsdatascience.com

Stay tuned & support this effort

If you liked and found this article useful, follow me! Questions? Post them as a comment and I will reply as soon as possible.

Get in touch with me

LinkedIn: https://www.linkedin.com/in/serafeim-loukas/
ResearchGate: https://www.researchgate.net/profile/Serafeim_Loukas
EPFL profile: https://people.epfl.ch/serafeim.loukas
Stack Overflow: https://stackoverflow.com/users/5025009/seralouk