
<div id="f6c3" class="link-block">

      <a href="https://code.visualstudio.com/download">

        <div>
          <div>
            <h2>Download Visual Studio Code - Mac, Linux, Windows</h2>
            <div><h3>Visual Studio Code is free and available on your favorite platform - Linux, macOS, and Windows. Download Visual Studio…</h3></div>
            <div><p>code.visualstudio.com</p></div>
          </div>
          <div>
            <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/0*mqoSwSyVTD0GaKF5)"></div>
          </div>
        </div>
      </a>
    </div><p id="8811"><b>Step 2: Flash your Pi Pico or Pico W with the latest MicroPython firmware</b></p><p id="4f4e">Head to the official download page of MicroPython’s UF2 file:</p><div id="4994" class="link-block">
      <a href="https://micropython.org/download/RPI_PICO/">
        <div>
          <div>
            <h2>MicroPython - Python for microcontrollers</h2>
            <div><h3>MicroPython is a lean and efficient implementation of the Python 3 programming language that includes a small subset of…</h3></div>
            <div><p>micropython.org</p></div>
          </div>
          <div>
            <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/)"></div>
          </div>
        </div>
      </a>
    </div><p id="a896">Then download the latest version you see on the page.</p><p id="b851">The specific instructions on how to use this file to flash MicroPython on your Pi Pico or Pico W are found on the official Raspberry Pi page:</p><div id="7797" class="link-block">
      <a href="https://www.raspberrypi.com/documentation/microcontrollers/micropython.html">
        <div>
          <div>
            <h2>Raspberry Pi Documentation - MicroPython</h2>
            <div><h3>The official documentation for Raspberry Pi computers and microcontrollers</h3></div>
            <div><p>www.raspberrypi.com</p></div>
          </div>
          <div>
            <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/0*4ekTw6kncOzJUnkj)"></div>
          </div>
        </div>
      </a>
    </div><p id="7add"><b>Step 3: Install the MicroPico extension in VS Code</b></p><p id="6adf">Now open VS Code and switch to the ‘Extensions’ tab like so:</p><figure id="74be"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*Vd06nbtJTaW_rHWMuXAHFQ.png"><figcaption></figcaption></figure><p id="374d">Now type “Micropico” in the search field. You should then see the following extension:</p><figure id="eb95"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*RQvJfKROFaqn-T0yOqDrpA.png"><figcaption></figcaption></figure><p id="a65a">Go ahead and install this extension.</p><p id="d5bd"><b>Step 4: Start a project by configuring your directory</b></p><p id="1cd7">Set up your desired folder/workspace for your project, and open it in VS Code. Once it is open, bring up the Command Palette in VS Code like so:</p>
<p>For Windows: Ctrl + Shift + P</p>
<p>For Mac: Cmd + Shift + P</p>
<p>You should then see a “Configure project” option in the drop-down list, like so:</p><figure id="6853"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*hEsmbSGjO4W6qXAaV_iBkg.png"><figcaption></figcaption></figure>
<p>Click on “Configure project”.</p>
<p><b>Step 6: Test by uploading example project</b></p>
<p>Let’s make sure we have set everything up correctly by running a test project. We will blink the onboard LED on the Pi Pico / Pico W.</p>
<p>Make a new file and name it “main.py”. We will write code in this file and upload it.</p>
<p>If you have the Pi Pico, copy and paste the following code:</p><div id="dbb5"><pre>from machine import Pin
import utime

# On the Pi Pico, the onboard LED is wired to GPIO 25
led_onboard = Pin(25, Pin.OUT)

while True:
    led_onboard.toggle()  # flip the LED state
    utime.sleep(1)        # wait one second</pre></div>
<p>If you instead have the Pico W, copy and paste the following code:</p><div id="e3cc"><pre>from machine import Pin
import utime

# On the Pico W, the onboard LED is addressed by the name "LED"
led_onboard = Pin("LED", Pin.OUT)

while True:
    led_onboard.toggle()  # flip the LED state
    utime.sleep(1)        # wait one second</pre></div>
<p>Now open up the Command Palette again (Ctrl/Cmd + Shift + P) and select “Upload project to Pico”. Then click the “Run” button at the bottom of VS Code like so:</p><figure id="197a"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*PWqElxT-TlDM4tGSQhQ4Iw.png"><figcaption></figcaption></figure>
<p>And that’s it! You will now see the onboard LED blink with a 1 second interval on your Raspberry Pi Pico or Pico W.</p>
<p>Once you hit “Run”, a terminal in VS Code will pop up and act as a space to print outputs or take inputs, just like the one in the Thonny IDE.</p>
<p>Additionally, you can view the pinout of the Pi Pico with the “Show Pico Pin Map” command. After clicking on it and selecting either “Pico (H)” or “Pico W (H)”, you will see the following (shown here for Pico (H)):</p><figure id="513c"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*WYfHtQSKpO8IaKpArBtPuA.png"><figcaption></figcaption></figure>
<p>If you follow all these steps correctly, there should not be any errors or problems. If you do run into issues, please leave a comment below and I will help you out as soon as I can.</p></article></body>

Data Science with Python — Cluster Analysis

This article is part of the “Data Science with Python” series. You can find the other stories of this series below:

Data Science with Python

Aka the best programming language for data scientists

medium.com

Cluster analysis is an important part of data science. It consists of grouping similar data points together based on their characteristics.

Today, we’ll explore how to perform cluster analysis in Python.

What is Cluster Analysis?

Cluster analysis is an unsupervised learning method used in data science to group similar data points together based on their characteristics. The main purpose of cluster analysis is to partition a dataset into subsets, or clusters, such that data points within each cluster share common traits and are dissimilar from those in other clusters.
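
To make "similar within a cluster, dissimilar across clusters" concrete, here is a minimal sketch that compares the average distance between points of the same group with the average distance between points of different groups. The points and the grouping are made up for illustration, and the two helper functions are hypothetical names, not part of any library.

```python
from itertools import combinations, product
from math import dist

# Hypothetical 2-D points, already split into two groups by hand.
cluster_a = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0)]
cluster_b = [(8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]


def mean_within(points):
    """Average Euclidean distance over all pairs inside one cluster."""
    pairs = list(combinations(points, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


def mean_between(points_1, points_2):
    """Average Euclidean distance over all cross-cluster pairs."""
    pairs = list(product(points_1, points_2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


within = (mean_within(cluster_a) + mean_within(cluster_b)) / 2
between = mean_between(cluster_a, cluster_b)

# A sensible grouping has small within-cluster and large between-cluster distances.
print(f"within: {within:.2f}, between: {between:.2f}")
```

Here the within-cluster average is far smaller than the between-cluster average, which is the property a clustering algorithm tries to achieve automatically.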

Cluster analysis has many applications in various fields such as market segmentation, customer profiling, image processing, and biological data analysis. In marketing, cluster analysis is used to segment customers into groups with similar demographics, behavior, and preferences. In biology, cluster analysis is used to group genes or proteins based on their function or expression patterns.

There are several types of clustering techniques, including hierarchical clustering, partition-based clustering, density-based clustering, and model-based clustering. Hierarchical clustering builds a tree-like structure of clusters, while partition-based clustering assigns each data point to a specific cluster. Density-based clustering identifies dense regions of data points, while model-based clustering assumes that the data points are generated from a mixture of underlying probability distributions.

Clustering Algorithms

There are various clustering algorithms that can be used to perform cluster analysis. The choice of the algorithm will depend mostly on the dataset.

K-Means Clustering: K-means is a popular partition-based clustering algorithm that aims to partition a dataset into K clusters. The algorithm works by randomly selecting K data points as centroids, then assigning each data point to the closest centroid based on a distance metric such as Euclidean distance. The centroids are then updated by computing the mean of all data points assigned to each cluster. The process of assigning data points and updating centroids is repeated until the centroids no longer move or a maximum number of iterations is reached.
Hierarchical Clustering: Hierarchical clustering builds a tree-like hierarchy of nested clusters. There are two main types of hierarchical clustering: agglomerative and divisive. Agglomerative clustering starts by treating each data point as a separate cluster and then iteratively merges the closest pairs of clusters until all data points belong to a single cluster. Divisive clustering starts with all data points in a single cluster and then recursively divides the clusters into smaller subsets until each cluster contains only one data point.
DBSCAN Clustering: DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering algorithm that identifies dense regions of data points separated by areas of lower density. The algorithm defines a neighborhood around each data point using a distance metric and a user-defined radius parameter (epsilon). A data point whose neighborhood contains at least a minimum number of data points is a core point, and connected core points form a cluster. A data point that is not a core point but lies within the neighborhood of one is a border point and joins that cluster. Data points that are not within the neighborhood of any core point are considered noise.
Gaussian Mixture Models (GMM): GMM is a model-based clustering algorithm that assumes that the data points are generated from a mixture of Gaussian distributions. The algorithm estimates the parameters of the Gaussian distributions by maximizing the likelihood of the data points, typically with the expectation-maximization (EM) algorithm. Each data point is then assigned to the Gaussian component under which it has the highest posterior probability.
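
The K-means loop described above (assign each point to the closest centroid, recompute the centroids, repeat until nothing moves) can be sketched in a few lines of plain Python. This is an illustrative sketch, not scikit-learn's implementation: the `kmeans` function and the sample points are hypothetical, and the starting centroids are fixed rather than random so that the run is reproducible.

```python
from math import dist


def kmeans(points, centroids, max_iter=100):
    """Plain K-means: alternate assignment and centroid update until stable."""
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its closest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if not members:
                new_centroids.append(old)  # keep the centroid of an empty cluster
                continue
            new_centroids.append(
                tuple(sum(coord) / len(members) for coord in zip(*members))
            )
        if new_centroids == centroids:  # centroids stopped moving
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

# Two data points serve as starting centroids (fixed here instead of random).
labels, centroids = kmeans(points, centroids=[points[0], points[3]])

print(labels)     # [0, 0, 0, 1, 1, 1]
print(centroids)  # roughly (1.5, 1.33) and (8.5, 8.83)
```

With random starting centroids the result can depend on the draw, which is why library implementations usually run several initializations and keep the best one.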

Building Clustering Models in Python

Python provides several libraries that can be used to build clustering models. We’ll use scikit-learn, as it’s one of the most widely used and one of the easiest to use.

To perform clustering using scikit-learn, we first need to import the necessary modules:

from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

We can then load the data into a Pandas DataFrame and perform data preparation, including data cleaning, scaling, and feature selection. Once the data is prepared, we can create an instance of the clustering algorithm and fit it to the data. For example, using KMeans:

kmeans = KMeans(n_clusters=3)
kmeans.fit(data)

We can then use the trained model to predict the cluster labels for new data points:

labels = kmeans.predict(new_data)
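
For K-means, predicting a label for a new point amounts to finding the nearest fitted centroid. The sketch below checks this by hand on made-up data; the arrays are hypothetical, and `n_init` and `random_state` are set explicitly only to make the run reproducible.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical training data: three well-separated groups in 2-D.
data = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
    [5.0, 5.0], [5.1, 4.9], [4.8, 5.2],
    [9.0, 1.0], [9.2, 0.9], [8.9, 1.2],
])
new_data = np.array([[1.1, 1.0], [5.0, 5.1], [9.1, 1.0]])

# random_state fixes the initialization; n_init is given explicitly because
# its default value differs between scikit-learn versions.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(data)

labels = kmeans.predict(new_data)

# Recompute the same answer by hand: distance from every new point to every
# centroid, then the index of the smallest distance.
distances = np.linalg.norm(
    new_data[:, None, :] - kmeans.cluster_centers_[None, :, :], axis=2
)
nearest = distances.argmin(axis=1)

print(labels, nearest)  # the two arrays match
```

The fitted centroids are available as `kmeans.cluster_centers_`, and the labels of the training points as `kmeans.labels_`.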

Example

Let’s walk through a simple example to illustrate the process of clustering using scikit-learn. I will write a more detailed article about a real use case later. For now, here is a small example so that you can try out what you have learned.

Suppose we have a dataset with two features, height and weight, and we want to cluster the data into three groups based on these features. We can start by loading the data into a Pandas DataFrame:

import pandas as pd

data = pd.DataFrame({
    'height': [170, 168, 180, 175, 174, 172, 169, 177, 181, 178],
    'weight': [70, 65, 80, 73, 72, 68, 66, 76, 82, 79]
})

We can then prepare the data by standardizing it with StandardScaler, which rescales each feature to zero mean and unit variance so that height and weight contribute equally to the distance computation:

scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)
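
As a rough illustration of what StandardScaler computes, the sketch below applies the same formula, z = (x - mean) / std with the population standard deviation, to the height column using only the standard library. It is a hand calculation for intuition, not a replacement for the scaler.

```python
from statistics import fmean, pstdev

heights = [170, 168, 180, 175, 174, 172, 169, 177, 181, 178]

mean = fmean(heights)
std = pstdev(heights)  # population standard deviation (divides by n)

# z-score: how many standard deviations each value lies from the mean.
scaled_heights = [(h - mean) / std for h in heights]

print(round(mean, 1))          # 174.4
print(fmean(scaled_heights))   # very close to 0
print(pstdev(scaled_heights))  # very close to 1
```

After this transformation, a difference of one unit means "one standard deviation" for every feature, whatever its original unit was.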

We can then create an instance of the KMeans clustering algorithm and fit it to the data:

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)  # fixed seed for reproducible clusters
kmeans.fit(scaled_data)

Finally, we can use the trained model to predict the cluster labels for new data points:

new_data = pd.DataFrame({
    'height': [173, 179, 171],
    'weight': [71, 81, 67]
})

scaled_new_data = scaler.transform(new_data)

labels = kmeans.predict(scaled_new_data)

print(labels)  # e.g. [0 2 0]; the label numbers themselves are arbitrary

Fine-Tuning

Once we have built a clustering model using scikit-learn, we may want to fine-tune the model to improve its performance. Here are some techniques we can use for fine-tuning our clustering models:

Choosing the optimal number of clusters: The number of clusters is a key hyperparameter in clustering algorithms. In K-means clustering, we can use the elbow method or the silhouette score to determine the optimal number of clusters. In hierarchical clustering, we can use the dendrogram to determine the optimal number of clusters.
Feature selection: In some cases, not all features are relevant for clustering. We can use feature selection techniques to identify the most important features and remove irrelevant or redundant features.
Dimensionality reduction: High-dimensional data can be difficult to cluster, so we can use dimensionality reduction techniques such as PCA or t-SNE to reduce the number of features.
Algorithm selection: Different clustering algorithms have different strengths and weaknesses, so we may want to try different algorithms to find the one that works best for our data.
Hyperparameter tuning: Clustering algorithms have several hyperparameters that can be tuned to improve performance, such as the distance metric, linkage method, and DBSCAN epsilon.
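
For the first item in the list above, a minimal sketch of choosing the number of clusters could look like this. The data is synthetic (three made-up groups generated with NumPy), and the range of candidate values for k is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical data: three tight, well-separated groups of 20 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 10.0]])
data = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

scores = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(data)
    # inertia_ (within-cluster sum of squares) is what the elbow method plots;
    # the silhouette score lies in [-1, 1], and higher is better.
    scores[k] = silhouette_score(data, model.labels_)
    print(k, round(model.inertia_, 1), round(scores[k], 3))

best_k = max(scores, key=scores.get)
print("best k by silhouette:", best_k)
```

On data this clean the silhouette score peaks at the true number of groups. On real data the peak is often less pronounced, so it helps to look at the inertia curve as well.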

I have already covered most of these techniques in other stories of this series, so be sure to check them out if you want to know more.
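
As one illustration of hyperparameter tuning, the sketch below shows how the DBSCAN radius changes which points count as core, border, or noise. It is a plain-Python approximation of the classification rule only (it does not build the clusters). The `classify_points` function and the sample points are hypothetical, and the parameter names `eps` and `min_samples` mirror those of scikit-learn's DBSCAN.

```python
from math import dist


def classify_points(points, eps, min_samples):
    """Label each point 'core', 'border', or 'noise' following the DBSCAN rule."""
    # Neighborhood of a point: every point within eps of it, itself included.
    neighbors = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, near in enumerate(neighbors) if len(near) >= min_samples}
    kinds = []
    for i, near in enumerate(neighbors):
        if i in core:
            kinds.append("core")
        elif any(j in core for j in near):
            kinds.append("border")  # not dense itself, but next to a core point
        else:
            kinds.append("noise")
    return kinds


# Hypothetical data on a line: a dense group, a straggler, and an outlier.
points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.8, 0.0), (6.0, 0.0)]

for eps in (0.3, 0.6, 1.0):
    print(eps, classify_points(points, eps=eps, min_samples=3))
# 0.3 ['noise', 'noise', 'noise', 'noise', 'noise']
# 0.6 ['border', 'core', 'border', 'noise', 'noise']
# 1.0 ['core', 'core', 'core', 'border', 'noise']
```

A radius that is too small marks everything as noise, while a larger one gradually absorbs stragglers into the cluster, so epsilon is usually tuned together with the minimum number of points.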

Final Note

Now you know how to solve clustering problems in Python.

In the next article, we will look at a concrete use case. Don’t hesitate to follow me if you don’t want to miss it!



If you’re not subscribed to medium yet and wish to support me or get access to all my stories, you can use my link:

Join Medium with my referral link — Esteban Thilliez
