Understanding Affinity Propagation Clustering: Hands-On with SciKit-Learn

Unsupervised Learning — Clustering

Affinity Propagation is a relatively recent clustering algorithm, first published in 2007 by Brendan Frey and Delbert Dueck. It is fairly demanding in terms of resources, because it exchanges messages between every pair of data points, so time and memory grow roughly with the square of the number of points. Still, I will try to explain it in plain English.

The Affinity Propagation algorithm starts by considering data points that can be exemplars to form clusters. All data points are candidates to be ‘exemplar datapoints’.
All ‘exemplar datapoints’ are compared to other data points, called ‘target datapoints’, to find how similar they are.
The ‘target datapoints’ reply to the ‘exemplar datapoints’, indicating whether they are still available to associate. Otherwise, the ‘target datapoints’ may already have been associated with other ‘exemplar datapoints’ with which they have higher affinity.
The ‘exemplar datapoints’ respond to ‘target datapoints’ with an updated similarity.
This dance continues until all data points are integrated into a cluster.
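The exchange described above can be sketched in a few lines of NumPy. This is a minimal illustration of the "responsibility" and "availability" messages from the original formulation, not scikit-learn's implementation. The function name, the toy points, the damping of 0.7 and the fixed iteration count are all assumptions made for the example.

```python
import numpy as np


def affinity_propagation_sketch(S, damping=0.7, n_iter=200):
    """Toy message passing on a similarity matrix S (preferences on the diagonal)."""
    n = S.shape[0]
    rows = np.arange(n)
    R = np.zeros((n, n))  # responsibilities: sent from point i to candidate exemplar k
    A = np.zeros((n, n))  # availabilities: sent from candidate exemplar k to point i

    for _ in range(n_iter):
        # Responsibility: how much better k suits i than i's best alternative.
        AS = A + S
        best = AS.argmax(axis=1)
        best_val = AS[rows, best]
        AS[rows, best] = -np.inf
        second_val = AS.max(axis=1)
        R_new = S - best_val[:, None]
        R_new[rows, best] = S[rows, best] - second_val
        R = damping * R + (1 - damping) * R_new

        # Availability: how much support k has collected from the other points.
        Rp = np.maximum(R, 0)
        Rp[rows, rows] = R[rows, rows]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new[rows, rows].copy()
        A_new = np.minimum(A_new, 0)
        A_new[rows, rows] = diag
        A = damping * A + (1 - damping) * A_new

    # A point is an exemplar when its self-availability plus self-responsibility is positive.
    exemplars = np.flatnonzero(np.diag(A + R) > 0)
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    # Every other point joins the exemplar it is most similar to.
    labels = S[:, exemplars].argmax(axis=1)
    labels[exemplars] = np.arange(exemplars.size)
    return exemplars, labels


# Hypothetical toy data: two well-separated groups of three points each.
points = np.array([[0.0, 0.0], [0.2, 0.1], [0.5, 0.1],
                   [5.0, 5.0], [5.2, 5.1], [5.4, 5.4]])
diff = points[:, None, :] - points[None, :, :]
S = -(diff ** 2).sum(axis=2)   # similarity = negative squared Euclidean distance
np.fill_diagonal(S, -1.0)      # the diagonal holds the preference (assumed value)

exemplars, labels = affinity_propagation_sketch(S)
print("Exemplar indices:", exemplars)
print("Labels:", labels)
```

The damping factor blends each new message with the previous one, which helps the messages settle instead of oscillating.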

The Affinity Propagation algorithm does not require the researcher to specify the number of clusters, which makes it well suited to problems where we don’t know the ideal number of clusters. However, there is a way to influence the number of clusters.

The PREFERENCE value:

The preference value can be used to control the number of clusters, but it does not correspond exactly to the number of clusters. A lower preference value (usually a negative number) will return fewer clusters, while a higher preference value will return more clusters.
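To get a feel for the scale of sensible preference values, it can help to look at the similarities themselves. With scikit-learn's default `affinity="euclidean"`, the similarity between two points is the negative squared Euclidean distance, and when `preference` is not set the median of those similarities is used. The sketch below assumes synthetic data from `make_blobs`, and the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics.pairwise import euclidean_distances

# Synthetic stand-in data (assumption: not the wine dataset).
X_demo, _ = make_blobs(n_samples=200, centers=3, random_state=0)

# Similarity matrix as used by the default "euclidean" affinity.
S = -euclidean_distances(X_demo, squared=True)

median_pref = np.median(S)  # the value scikit-learn falls back to by default
min_pref = S.min()          # a much lower value, which tends to give fewer clusters

print("Median similarity:", median_pref)
print("Minimum similarity:", min_pref)
```

Since every similarity is zero or negative, preference values in the same range are negative too. Values well below the median push the model towards fewer clusters.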

How do we control the number of clusters in the model?

We need to run the model several times with different preference values until we find a model with the desired number of clusters.
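This trial-and-error search can be written as a simple loop. The sketch below is only an illustration: the data is synthetic (`make_blobs`), and the candidate preference values are arbitrary assumptions that would need adjusting for a real dataset.

```python
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: not the wine dataset).
X_demo, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.6, random_state=0)
X_demo = StandardScaler().fit_transform(X_demo)

# Fit one model per candidate preference and record how many clusters it finds.
cluster_counts = {}
for preference in [-50, -20, -5, -1]:
    model = AffinityPropagation(preference=preference, random_state=0).fit(X_demo)
    cluster_counts[preference] = len(model.cluster_centers_indices_)
    print("preference =", preference, "-> clusters:", cluster_counts[preference])
```

If a run does not converge, scikit-learn issues a warning and reports no cluster centers, so a count of zero signals a preference (or damping) worth revisiting rather than a real result.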

Let’s see a practical example:

We will use the wine dataset that can be downloaded from Kaggle. All code presented here was run on Google Colab.

#Import libraries:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

Now we will load and check the dataset:

#Load and read data frame:
df = pd.read_csv('wine.csv')
df

We have 1599 observations and 12 features. We will check if all our features are numeric:

#Check all features are numeric:
df.dtypes

Now we will define our feature matrix and standardize it using scikit-learn's StandardScaler:

#Define X as numpy array:
X = np.array(df)

from sklearn.preprocessing import StandardScaler

#Transform X:
scaler = StandardScaler()
X = scaler.fit_transform(X)
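Scaling matters here because Affinity Propagation works on distances, so a feature measured on a large scale would otherwise dominate the similarities. As a small check of what StandardScaler does, the sketch below compares it with the manual formula (subtract the column mean, divide by the column standard deviation) on a made-up array.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: the second column is on a much larger scale than the first.
raw = np.array([[1.0, 200.0],
                [2.0, 400.0],
                [3.0, 600.0]])

scaled = StandardScaler().fit_transform(raw)

# Same transformation written by hand (population standard deviation, ddof=0).
manual = (raw - raw.mean(axis=0)) / raw.std(axis=0)

print("Column means after scaling:", scaled.mean(axis=0))
print("Column standard deviations after scaling:", scaled.std(axis=0))
print("Matches manual formula:", np.allclose(scaled, manual))
```

After scaling, both columns contribute on the same footing to the distances between points.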

It is time to build our model, so let’s import the necessary packages:

from sklearn.cluster import AffinityPropagation
from sklearn import metrics

And now we can build the model and fit our data. Pay close attention to the preference value:

#Fit the model:
af = AffinityPropagation(preference=-1800, random_state=0).fit(X)
cluster_centers_indices = af.cluster_centers_indices_
labels = af.labels_
n_clusters_ = len(cluster_centers_indices)

#Print results:
print(labels)

#Print number of clusters:
print(n_clusters_)

You can run this last part of the code several times, trying different preference values. The last step is to build a graphical representation. The image you can see at the beginning of this article was obtained using this exact code while experimenting with different preference values.

import matplotlib.pyplot as plt
from itertools import cycle

plt.close("all")
plt.figure(1)
plt.clf()

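#Cycle through matplotlib's single-letter color codes, one per cluster:
colors = cycle("bgrcmyk")
for k, col in zip(range(n_clusters_), colors):
    #Boolean mask selecting the points assigned to cluster k:
    class_members = labels == k
    #The exemplar (cluster center) is an actual row of X:
    cluster_center = X[cluster_centers_indices[k]]
    #Only the first two scaled features are plotted:
    plt.plot(X[class_members, 0], X[class_members, 1], col + ".")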
    plt.plot(
        cluster_center[0],
        cluster_center[1],
        "o",
        markerfacecolor=col,
        markeredgecolor="k",
        markersize=14,
    )
    for x in X[class_members]:
        plt.plot([cluster_center[0], x[0]], [cluster_center[1], x[1]], col)

plt.title("Estimated number of clusters: %d" % n_clusters_)
plt.show()
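Besides the plot, a numeric score can help compare runs with different preference values. The `metrics` module imported earlier offers `silhouette_score` for this. It ranges from -1 to 1, with higher values indicating better separated clusters. The sketch below is an optional addition, not part of the original walkthrough, and it uses synthetic data and illustrative variable names.

```python
from sklearn import metrics
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

# Synthetic stand-in data (assumption: not the wine dataset).
X_demo, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.6, random_state=0)

model = AffinityPropagation(random_state=0).fit(X_demo)
labels_demo = model.labels_
n_found = len(set(labels_demo))

# The silhouette score is only defined for 2 to n_samples - 1 distinct labels.
if 1 < n_found < len(X_demo):
    score = metrics.silhouette_score(X_demo, labels_demo)
else:
    score = None

print("Clusters found:", n_found)
print("Silhouette score:", score)
```

The same call works on the wine example by passing `X` and `labels` from the fitted model.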

Thank you for reading! Don’t forget to subscribe to receive notifications about my future publications.

If: you liked this article, don’t forget to follow me and thus receive all updates about new publications.

Else If: you want to read more on the topic, you can buy my book “Data-Driven Decisions: A Practical Introduction to Machine Learning” which will give you all the information you need to start with Machine Learning. It will cost you only a coffee, and give me a small tip!

Else: Thank you!
