Understanding and Implementing Self-Organizing Maps (SOM) with Python
Introduction
Self-Organizing Maps (SOMs) are a useful tool for clustering and visualizing high-dimensional data. This blog post covers the mathematics behind SOMs and walks through a Python implementation using the MiniSom library.
What is a Self-Organizing Map?
A Self-Organizing Map (SOM) is a type of unsupervised neural network introduced by Teuvo Kohonen in the 1980s. Unlike traditional neural networks that are typically used for classification or regression, SOMs are used for clustering and visualizing high-dimensional data by mapping it onto a lower-dimensional (usually 2D) grid. The primary goal of SOMs is to preserve the topological properties of the input space, meaning that similar data points in the high-dimensional space remain close to each other in the lower-dimensional map.
Key Characteristics of SOMs
- Unsupervised Learning: SOMs learn patterns in the data without needing labeled examples.
- Topology Preservation: The spatial arrangement of neurons in the SOM corresponds to the similarities in the input data.
- Dimensionality Reduction: SOMs reduce the dimensionality of the data while preserving its structure, making it easier to visualize and interpret.
The Mathematics Behind SOMs
The SOM algorithm can be summarized in the following steps:
Initialization:
- The SOM consists of a grid of neurons, each associated with a weight vector of the same dimension as the input data.
- Initialize the weight vectors randomly.
Sampling:
- Randomly select an input vector from the training data.
Best Matching Unit (BMU):
- Find the neuron whose weight vector is closest to the input vector. This neuron is called the Best Matching Unit (BMU).
- The distance metric commonly used is the Euclidean distance:
$$\mathrm{BMU} = \arg\min_i \lVert x - w_i \rVert$$
- where $x$ is the input vector and $w_i$ is the weight vector of the i-th neuron.
Update:
- Adjust the weight vectors of the BMU and its neighbouring neurons to make them more similar to the input vector. The update rule is:
$$w_i(t+1) = w_i(t) + \theta(t, i, \mathrm{BMU})\,\alpha(t)\,\bigl(x(t) - w_i(t)\bigr)$$
- where $\alpha(t)$ is the learning rate, and $\theta(t, i, \mathrm{BMU})$ is the neighbourhood function that decreases with time and distance from the BMU.
Iteration:
- Repeat the sampling, BMU, and update steps (steps 2–4) for a large number of iterations or until the weights converge.
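The steps above can be sketched in plain NumPy. This is a minimal illustration rather than how MiniSom works internally: the function name `train_som`, the Gaussian form of the neighbourhood function, and the decay schedule for the learning rate and neighbourhood radius are all assumptions chosen for this example.

```python
import numpy as np

def train_som(data, grid_shape=(7, 7), num_iterations=1000,
              alpha0=0.5, sigma0=1.0, seed=0):
    """Train a small SOM; returns weights of shape (rows, cols, n_features)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    n_features = data.shape[1]

    # Initialization: one random weight vector per neuron.
    weights = rng.random((rows, cols, n_features))
    # Grid coordinates of every neuron, used to measure distance to the BMU.
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )

    for t in range(num_iterations):
        # Assumed schedule: alpha and sigma shrink smoothly as t grows.
        decay = 1.0 / (1.0 + 2.0 * t / num_iterations)
        alpha = alpha0 * decay
        sigma = sigma0 * decay

        # Sampling: pick one training vector at random.
        x = data[rng.integers(len(data))]

        # BMU: the neuron with the smallest Euclidean distance to x.
        dists = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)

        # Neighbourhood: Gaussian in grid distance from the BMU (assumed form).
        grid_dist_sq = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
        theta = np.exp(-grid_dist_sq / (2.0 * sigma ** 2))

        # Update: move every neuron toward x, scaled by theta and alpha.
        weights += (theta * alpha)[..., np.newaxis] * (x - weights)

    return weights

# Usage on random toy data with 3 features.
toy = np.random.default_rng(1).random((200, 3))
trained = train_som(toy, grid_shape=(5, 5), num_iterations=500)
print(trained.shape)  # (5, 5, 3)
```

Because each update is a weighted average of the old weight and the input, the weights stay inside the range spanned by the initial weights and the data. This is one reason scaling the inputs to a common range is convenient.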
Implementing SOM in Python
To demonstrate SOMs in action, we’ll use the Iris dataset, a classic dataset in machine learning, and the MiniSom library, a lightweight implementation of SOM in Python.
Step 1: Install MiniSom
First, install the MiniSom library if you haven't already:
pip install minisom
Step 2: Load and Preprocess the Data
We’ll use the Iris dataset and scale each feature to the [0, 1] range, so that no single feature dominates the Euclidean distance used to find the BMU.
import numpy as np
from minisom import MiniSom
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler
# Load the Iris dataset
data = load_iris()
X = data.data
y = data.target
# Normalize the data
scaler = MinMaxScaler()
X = scaler.fit_transform(X)
Step 3: Initialize and Train the SOM
We’ll initialize a SOM with a 7x7 grid and train it on the Iris dataset.
# Initialize and train the SOM
som_size = (7, 7) # 7x7 grid
som = MiniSom(x=som_size[0], y=som_size[1], input_len=X.shape[1], sigma=1.0, learning_rate=0.5)
som.random_weights_init(X)
som.train_random(X, num_iteration=1000)
Step 4: Visualize the Results
We’ll plot the results to visualize how the SOM has clustered the Iris data.
Clustering Visualization
# Plotting the results
plt.figure(figsize=(10, 10))
for i, x in enumerate(X):
    w = som.winner(x) # grid coordinates of the winning neuron (BMU)
    plt.text(w[0] + 0.5, w[1] + 0.5, str(y[i]), color=plt.cm.tab10(y[i] / 10.), fontdict={'weight': 'bold', 'size': 11})
plt.axis([0, som_size[0], 0, som_size[1]])
plt.title('Self-Organizing Map of the Iris dataset')
plt.show()
U-Matrix Visualization
The U-Matrix (Unified Distance Matrix) helps visualize the distances between the neurons, indicating cluster boundaries.
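To make the idea concrete, a U-Matrix can be computed by hand from a grid of weight vectors. The sketch below is illustrative only: the helper name `u_matrix` is hypothetical, and it assumes 4-neighbour connectivity with a plain mean and no rescaling. MiniSom's own `distance_map` may use a different neighbourhood and normalisation, so the values need not match.

```python
import numpy as np

def u_matrix(weights):
    """Mean Euclidean distance from each neuron's weights to its grid neighbours.

    weights has shape (rows, cols, n_features); the grid needs at least
    two neurons so that every neuron has a neighbour.
    """
    rows, cols, _ = weights.shape
    um = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            dists = []
            # Assumed connectivity: up, down, left, right.
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    dists.append(np.linalg.norm(weights[i, j] - weights[ni, nj]))
            um[i, j] = np.mean(dists)
    return um

# Toy grid: left half of the map holds zeros, right half holds ones.
toy_weights = np.zeros((4, 4, 2))
toy_weights[:, 2:] = 1.0
print(np.round(u_matrix(toy_weights), 3))
```

In this toy grid the two outer columns come out as zero and the two middle columns are positive, marking the boundary between the two groups of neurons. High values in a U-Matrix plot can be read the same way.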
# Plot the distance map (U-matrix)
plt.figure(figsize=(10, 10))
plt.pcolor(som.distance_map().T, cmap='bone_r') # plotting the distance map
plt.colorbar()
plt.title('U-Matrix')
plt.show()
Conclusion
Self-Organizing Maps are a powerful tool for clustering and visualizing high-dimensional data. They provide an intuitive way to understand the structure of complex datasets by mapping them to a lower-dimensional space while preserving the topological relationships.
In this post, we’ve covered the theoretical foundations of SOMs, their mathematical formulation, and a practical implementation using the MiniSom library in Python. By applying these techniques, you can uncover hidden patterns in your data and gain deeper insights into its underlying structure.
Feel free to experiment with different datasets and SOM configurations to explore the full potential of this fascinating neural network architecture!







