Essential Math for Machine Learning: Sigmoid
The Binary Classifier

This article is part of the series Essential Math for Machine Learning.
Introduction
The world of machine learning is filled with intriguing mathematical tools, each playing a distinct role in the learning process. Among these tools, the sigmoid function stands out as a classic, holding a special place in the history of neural networks. But what exactly is the sigmoid, and why does it matter? Buckle up, data enthusiasts, as we embark on a journey to demystify this S-shaped hero!
What is the Sigmoid?
Imagine a function that takes any number as input and squishes it between 0 and 1. That’s essentially the sigmoid, also known as the logistic function. Think of it as a gatekeeper, deciding how likely it is for a given input to belong to a specific category. Mathematically, it’s expressed as:
f(x) = 1 / (1 + e^(-x))
Here, x is the input and e is the base of the natural logarithm (approximately 2.718).
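To get a feel for the "squishing", here is a minimal sketch that evaluates the formula at a few inputs. It uses only the standard library, and the sample inputs are arbitrary choices for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


# Arbitrary sample inputs, chosen only to show the S-shape.
for x in (-6.0, -2.0, 0.0, 2.0, 6.0):
    print(f"x = {x:+.1f}  ->  sigmoid(x) = {sigmoid(x):.4f}")

# The curve is symmetric around (0, 0.5): sigmoid(-x) = 1 - sigmoid(x).
print(abs(sigmoid(-2.0) - (1.0 - sigmoid(2.0))) < 1e-12)
```

Large negative inputs land close to 0, large positive inputs land close to 1, and an input of 0 maps to exactly 0.5.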
Properties of the Sigmoid
- Bounded Output: The sigmoid’s output always lies strictly between 0 and 1, approaching but never reaching either end. This makes it well-suited for representing probabilities (values near 0 for unlikely, values near 1 for highly likely).
- Smooth Gradient: The function is differentiable everywhere, and its derivative has the simple form f'(x) = f(x) * (1 - f(x)). That makes gradients cheap to compute when training neural networks with backpropagation.
- Interpretability: Since the output resembles probabilities, it’s easier to understand the network’s predictions.
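The smoothness property can be checked numerically. The sketch below compares the closed-form derivative f(x) * (1 - f(x)) with a finite-difference estimate. The helper names `sigmoid_grad` and `numeric_grad` and the step size `h` are assumptions made for this illustration.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    """Closed-form derivative: f(x) * (1 - f(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def numeric_grad(f, x: float, h: float = 1e-5) -> float:
    """Central finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
    analytic = sigmoid_grad(x)
    approx = numeric_grad(sigmoid, x)
    print(f"x = {x:+.1f}  analytic = {analytic:.6f}  numeric = {approx:.6f}")
```

The two columns should agree to several decimal places, and the derivative peaks at 0.25 when x is 0.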
When to Use the Sigmoid?
- Binary Classification: When your problem involves predicting only two possible outcomes (e.g., spam vs. not spam), the sigmoid’s probabilistic output shines.
- Output Layer of Neural Networks: In the early days of neural networks, the sigmoid was the go-to choice for the output layer, especially for classification tasks.
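As a rough illustration of the binary classification case, the sketch below turns a weighted sum of features into a probability and then into a label. The weights, bias, feature values, and function names are all hypothetical, picked by hand rather than learned from data.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical parameters for illustration only (not learned from real data).
WEIGHTS = [1.2, -0.7]
BIAS = -0.5


def predict_proba(features: list[float]) -> float:
    """Probability of the positive class (e.g. "spam")."""
    z = sum(w * f for w, f in zip(WEIGHTS, features)) + BIAS
    return sigmoid(z)


def predict_label(features: list[float], threshold: float = 0.5) -> int:
    """1 if the probability reaches the threshold, else 0."""
    return 1 if predict_proba(features) >= threshold else 0


# Two made-up feature vectors.
for features in ([3.0, 1.0], [0.0, 2.0]):
    p = predict_proba(features)
    print(f"features = {features}  p = {p:.2f}  label = {predict_label(features)}")
```

The 0.5 threshold is a common default, not a rule. It can be moved up or down depending on how costly false positives and false negatives are.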
Caveats and Alternatives
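- Vanishing Gradients: The sigmoid’s derivative is small everywhere and close to zero for inputs far from 0. In deep networks with many sigmoid layers, these small factors are multiplied together during backpropagation, so the gradient reaching the early layers can become tiny, making learning slow or ineffective.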
- Limited Expressive Power: For problems with more than two mutually exclusive categories, a single sigmoid output is not enough, and softmax is usually preferred for the output layer. For hidden layers, alternatives like ReLU are often preferred.
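To see how quickly gradients can vanish, here is a simplified sketch that multiplies sigmoid derivatives along a chain of layers. It deliberately ignores the weights, which also scale the gradient in a real network, and the function name `gradient_scale` is an assumption made for this example.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    s = sigmoid(x)
    return s * (1.0 - s)


def gradient_scale(pre_activations: list[float]) -> float:
    """Product of sigmoid derivatives along a chain of layers (weights ignored)."""
    scale = 1.0
    for z in pre_activations:
        scale *= sigmoid_grad(z)
    return scale


# Best case: every layer sits at z = 0, where the derivative peaks at 0.25.
for depth in (1, 5, 10, 20):
    print(f"depth = {depth:2d}  best-case scale = {gradient_scale([0.0] * depth):.3e}")

# Saturated layers (inputs far from 0) shrink the gradient much faster.
print(f"saturated, depth 5: {gradient_scale([4.0] * 5):.3e}")
```

Even in the best case the factor is 0.25 per layer, so after ten layers the gradient is scaled by roughly one millionth.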
Python Implementation
Sigmoid:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Example usage
output = sigmoid(3)  # Output will be around 0.95

Derivative of Sigmoid:

import sympy as sp

def sigmoid_derivative(x):
    return sp.diff(1 / (1 + sp.exp(-x)), x)

# Example usage
result = sigmoid_derivative(sp.Symbol('x'))
print(result)  # Output: exp(-x) / (1 + exp(-x))**2

Remember: While the sigmoid has been surpassed by other activation functions in certain areas, it remains a valuable tool in the machine learning toolbox, especially for understanding the fundamentals of neural networks and binary classification. So, the next time you encounter an S-shaped curve in your machine learning journey, remember the power of the sigmoid!
Bonus: Explore the tanh function, another S-shaped activation function with a slightly different output range (-1 to 1). Feel free to experiment and compare their behaviors!
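As a starting point for that comparison, note that tanh is a rescaled and shifted sigmoid: tanh(x) = 2 * sigmoid(2x) - 1. The sketch below checks this identity against the standard library's `math.tanh`. The helper name `tanh_via_sigmoid` is made up for this example.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def tanh_via_sigmoid(x: float) -> float:
    """tanh written in terms of the sigmoid: 2 * sigmoid(2x) - 1."""
    return 2.0 * sigmoid(2.0 * x) - 1.0


for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(
        f"x = {x:+.1f}  sigmoid = {sigmoid(x):.4f}  "
        f"tanh = {math.tanh(x):.4f}  via sigmoid = {tanh_via_sigmoid(x):.4f}"
    )
```

The last two columns should match, while the sigmoid column stays in (0, 1) and the tanh columns are centered on 0 in (-1, 1).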






