DL Tutorial 3 — Activation Functions and Their Importance
Learn how activation functions are used and why they are important.

Table of Contents
1. Introduction
2. What are Activation Functions?
3. Types of Activation Functions
4. How to Choose an Activation Function
5. Applications of Activation Functions
6. Conclusion
Subscribe for FREE to get your 42 pages e-book: Data Science | The Comprehensive Handbook
Get step-by-step e-books on Python, ML, DL, and LLMs.
1. Introduction
In this tutorial, you will learn about activation functions and their importance in artificial neural networks. Activation functions are mathematical functions that determine the output of a neuron based on its input. They are essential for neural networks to learn complex patterns and perform nonlinear computations.
You will learn:
- What activation functions are and why they are needed
- The different types of activation functions and their properties
- How to choose an activation function for your neural network
- Some applications of activation functions in various domains
By the end of this tutorial, you will have a better understanding of activation functions and how they affect the performance and behavior of neural networks.
2. What are Activation Functions?
An activation function takes the input of a neuron (typically the weighted sum of the neuron's inputs plus a bias) and produces the neuron's output, which is then used as input for the next layer of the neural network. Activation functions are important because they introduce nonlinearity into the neural network, allowing it to learn complex patterns and perform nonlinear computations.
Without activation functions, a neural network of any depth would be equivalent to a single-layer linear model, because a composition of linear transformations is itself a linear transformation. Such a model can only learn linear relationships between the input and the output. Activation functions enable neural networks to learn nonlinear mappings between the input and the output, which are more suitable for real-world problems that involve complex and dynamic data.
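The collapse of stacked linear layers into a single linear layer can be checked numerically. The following is a minimal sketch using NumPy; the weights, biases, and input are made-up values chosen only for illustration.

```python
import numpy as np

# Hypothetical two-layer network: 2 inputs -> 2 hidden units -> 1 output.
W1 = np.array([[1.0, -1.0], [-1.0, 1.0]])
b1 = np.array([0.5, -0.5])
W2 = np.array([[2.0, 3.0]])
b2 = np.array([1.0])
x = np.array([1.0, 2.0])

# Two layers with no activation function in between.
two_layer = W2 @ (W1 @ x + b1) + b2

# The same computation folded into a single linear layer.
W = W2 @ W1
b = W2 @ b1 + b2
one_layer = W @ x + b

# With a ReLU between the layers, the folded layer no longer matches.
with_relu = W2 @ np.maximum(0.0, W1 @ x + b1) + b2

print("no activation :", two_layer, "single layer:", one_layer)
print("with ReLU     :", with_relu)
```

The first two results are identical, while the ReLU version differs, because the ReLU zeroes the negative hidden unit before the second layer is applied.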
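Bounded activation functions, such as sigmoid and tanh, also keep the output of the neurons within a fixed range, which prevents activations from growing without limit. The choice of activation function also influences how gradients flow during backpropagation: functions that saturate can contribute to the vanishing gradient problem, while unbounded functions do not limit the size of the activations. As a result, activation functions can affect the speed and accuracy of the learning process, as well as the generalization ability of the neural network.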
In summary, activation functions are essential for neural networks to:
- Introduce nonlinearity and enable complex learning
- Regulate the output of the neurons and influence numerical stability
- Affect the learning and generalization performance
3. Types of Activation Functions
There are many types of activation functions that can be used in neural networks, each with its own advantages and disadvantages. In this section, you will learn about some of the most common and popular activation functions and their properties.
The following are some of the types of activation functions:
- Linear: A linear activation function simply returns the input as the output, without any transformation. For example, f(x) = x. A linear activation function is easy to implement and compute, and its gradient is constant (equal to 1), so it never saturates. Its major drawback is that it cannot introduce nonlinearity into the neural network: a network that uses only linear activations collapses to a single linear model regardless of its depth, which limits its learning ability. For this reason, it is mostly used in the output layer of regression models.
- Sigmoid: A sigmoid activation function is a nonlinear function that maps the input to a value between 0 and 1, using a smooth S-shaped curve. For example, f(x) = 1 / (1 + exp(-x)). A sigmoid activation function is useful for binary classification problems, as its output can be read as a probability. Historically, it has also been interpreted as the firing rate of a neuron. However, it has some drawbacks. It is prone to the vanishing gradient problem: its gradient is at most 0.25 (at x = 0) and becomes very small for large positive or negative inputs, making the learning process slow or ineffective. It is also not zero-centered: the output is always positive, so the gradients with respect to the weights of a neuron in the next layer all share the same sign, which can make the updates zig-zag and slow down convergence.
- Tanh: A tanh activation function is a nonlinear function that maps the input to a value between -1 and 1, using a smooth S-shaped curve. For example, f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x)). A tanh activation function is similar to a sigmoid activation function, but it has some advantages. It is zero-centered, which means that the output can be positive or negative, which can make the gradient updates more consistent. It also has a steeper slope than the sigmoid function (a gradient of 1 at x = 0, compared to 0.25 for sigmoid), which can make the learning process faster. However, it still suffers from the vanishing gradient problem: the gradient becomes very small for large positive or negative inputs, making the learning process slow or ineffective.
- ReLU: A ReLU (rectified linear unit) activation function is a nonlinear function that returns the input if it is positive, and zero otherwise. For example, f(x) = max(0, x). ReLU is one of the most widely used activation functions in deep learning, as it has many advantages. It is cheap to compute, as it involves only a comparison with zero. Its gradient is exactly 1 for positive inputs, so it does not saturate on the positive side, which reduces the vanishing gradient problem in deep networks and often makes the learning process faster. It also introduces sparsity into the neural network, as neurons with negative inputs output exactly zero, which is sometimes credited with reducing overfitting and improving generalization. However, it also has some drawbacks. The gradient is zero for negative inputs, so a neuron whose input stays negative stops updating, a failure mode known as the dying ReLU problem. It is also not zero-centered: the output is never negative, which can make the gradient updates less efficient.
- Leaky ReLU: A leaky ReLU activation function is a modified version of the ReLU activation function that returns a small negative value for negative inputs, instead of zero. For example, f(x) = max(0.01x, x), which equals x for positive inputs and 0.01x for negative inputs. A leaky ReLU activation function is designed to overcome the dying ReLU problem by allowing a small gradient (0.01 in this example) to flow for negative inputs, so that neurons can keep learning. It retains the cheap computation of ReLU, and because its output can be negative, it is closer to zero-centered. However, it still has some drawbacks. Negative inputs no longer produce exact zeros, so the sparsity of ReLU is lost. It also introduces a hyperparameter, which is the slope of the negative part, which can affect the performance and require tuning.
- Softmax: A softmax activation function is a nonlinear function that maps a vector of inputs to a probability distribution over a set of possible outcomes, using the exponential function. For example, f(x_i) = exp(x_i) / sum_j exp(x_j), where the sum is over all the possible outcomes. A softmax activation function is useful for multi-class classification problems, as it outputs a probability for each class, and these probabilities sum to 1. Each output can be interpreted as the predicted probability that the input belongs to the corresponding class. However, it also has some drawbacks. It is more expensive than element-wise functions, as it involves exponentials and a normalization over all outputs, and a naive implementation can overflow for large inputs (subtracting the maximum input before exponentiating is the usual remedy). It can also saturate: when one input is much larger than the others, the output approaches a one-hot vector and the gradients of the softmax become very small.
These are some of the most common and popular activation functions, but there are many more that can be used in neural networks, such as ELU, SELU, Swish, Mish, etc. The choice of activation function depends on the type of problem, the architecture of the neural network, and the performance and behavior of the learning process.
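The functions listed above can be sketched in a few lines of NumPy. This is an illustrative sketch rather than a reference implementation: the function names, the default slope of 0.01, and the sample inputs are assumptions made for this example. The gradient helpers show where each function saturates.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):
    # slope is the hyperparameter for the negative part (0.01 is an assumed default)
    return np.where(x > 0, x, slope * x)

def softmax(x):
    # Subtracting the maximum avoids overflow in exp without changing the result.
    exps = np.exp(x - np.max(x))
    return exps / np.sum(exps)

# Derivatives of the element-wise functions with respect to their input.
def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2

def relu_grad(x):
    return (np.asarray(x) > 0).astype(float)

def leaky_relu_grad(x, slope=0.01):
    return np.where(x > 0, 1.0, slope)

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print("sigmoid grad   :", sigmoid_grad(x))     # tiny at both ends (saturation)
print("tanh grad      :", tanh_grad(x))        # tiny at both ends, 1 at zero
print("relu grad      :", relu_grad(x))        # 0 for non-positive inputs, 1 otherwise
print("leaky relu grad:", leaky_relu_grad(x))  # small but nonzero for negatives
print("softmax        :", softmax(np.array([1.0, 2.0, 3.0])))
```

Printing the gradients side by side makes the trade-offs in the list concrete: sigmoid and tanh gradients shrink toward zero at both ends, ReLU has zero gradient for negative inputs, and leaky ReLU keeps a small gradient there.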
4. How to Choose an Activation Function
Choosing an activation function for your neural network is not a trivial task, as it can affect the performance and behavior of your model. There is no definitive answer or rule for selecting an activation function, as different activation functions may work better or worse for different problems, architectures, and datasets. However, there are some general guidelines and considerations that can help you make an informed decision.
Some of the factors that you should consider when choosing an activation function are:
- The type of problem: Depending on the type of problem that you are trying to solve, you may need a different activation function for the output layer of your neural network. For example, if you are doing a binary classification problem, you may want to use a sigmoid activation function, as it can output a probability value between 0 and 1. If you are doing a multi-class classification problem, you may want to use a softmax activation function, as it can output a probability distribution over a set of possible classes. If you are doing a regression problem, you may want to use a linear activation function, as it can output any real value.
- The architecture of the neural network: Depending on the architecture of your neural network, such as the number of layers, the number of neurons, the type of connections, etc., you may need a different activation function for the hidden layers of your neural network. For example, if you are using a deep neural network, you may want to use a ReLU activation function, as it reduces the vanishing gradient problem and often makes the learning process faster and more effective. If you are using a recurrent neural network, you may want to use a tanh activation function, as its bounded, zero-centered output keeps the hidden state within [-1, 1] while it is updated over many time steps. Note that tanh alone does not eliminate vanishing or exploding gradients over long sequences.
- The properties of the activation function: Depending on the properties of the activation function, such as the range, the slope, the smoothness, the sparsity, etc., you may need a different activation function for the hidden layers of your neural network. For example, if you want to introduce nonlinearity into your neural network, you need a nonlinear activation function, such as sigmoid, tanh, ReLU, etc. If you want to keep the output of the neurons within a fixed range, you may want to use a bounded activation function, such as sigmoid or tanh. If you want to make the learning process faster, you may want to use an activation function with a large gradient over the inputs you expect, such as ReLU for positive inputs or tanh near zero. If you want sparse activations, which may help reduce overfitting and improve generalization, you may want to use ReLU, which outputs exact zeros for negative inputs (leaky ReLU does not).
These are some of the factors that you should consider when choosing an activation function, but they are not exhaustive or definitive. You may also need to experiment with different activation functions and compare their results, as different activation functions may have different effects on different models and datasets. You may also need to tune the hyperparameters of the activation function, such as the slope of the leaky ReLU, to optimize the performance and behavior of your model.
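The output-layer guideline above (sigmoid for binary classification, softmax for multi-class classification, linear for regression) can be sketched as a small lookup. The names `OUTPUT_ACTIVATIONS` and `predict` are hypothetical helpers introduced only for this example, and the raw scores are made-up values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    exps = np.exp(z - np.max(z))
    return exps / np.sum(exps)

def identity(z):
    return z

# Hypothetical mapping from problem type to output-layer activation.
OUTPUT_ACTIVATIONS = {
    "binary": sigmoid,       # one probability in (0, 1)
    "multiclass": softmax,   # one probability per class, summing to 1
    "regression": identity,  # any real value
}

def predict(raw_scores, task):
    """Apply the output activation for the given task to the raw layer output."""
    return OUTPUT_ACTIVATIONS[task](np.asarray(raw_scores, dtype=float))

p_binary = predict([0.0], "binary")
p_multi = predict([2.0, 1.0, 0.1], "multiclass")
y_reg = predict([-3.2], "regression")

print(p_binary, p_multi, y_reg)
```

In practice, deep learning frameworks often fold the sigmoid or softmax into the loss function for numerical stability, so the explicit activation may only appear at prediction time.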
5. Applications of Activation Functions
Activation functions are widely used in various domains and applications, as they enable neural networks to learn complex patterns and perform nonlinear computations. In this section, you will see some examples of how activation functions are used in different fields and tasks.
Some of the applications of activation functions are:
- Computer vision: Computer vision is the field of study that deals with the processing and understanding of visual information, such as images and videos. Activation functions are used in computer vision to enable neural networks to learn features and patterns from the visual data, such as edges, shapes, colors, textures, etc. For example, convolutional neural networks (CNNs) apply learned filters to the input images, pass the results through an activation function, and often follow this with pooling operations, resulting in feature maps that capture the salient information from the images. Activation functions such as ReLU, leaky ReLU, ELU, etc. are commonly used in CNNs, as they reduce the vanishing gradient problem and make the learning process faster and more effective. Softmax is also used in the output layer of CNNs, as it can output a probability distribution over a set of possible classes, in tasks such as object recognition, face detection, scene segmentation, etc.
- Natural language processing: Natural language processing (NLP) is the field of study that deals with the processing and understanding of natural language, such as text and speech. Activation functions are used in NLP to enable neural networks to learn features and patterns from the language data, such as words, sentences, syntax, semantics, etc. For example, recurrent neural networks (RNNs) are a type of neural network that process sequential data, such as text and speech, by maintaining a hidden state that captures the information from the previous inputs. Bounded activation functions such as tanh and sigmoid are commonly used in RNNs, as they keep the hidden state within a fixed range while it is updated at every time step, although they do not by themselves prevent vanishing or exploding gradients over long sequences. Softmax is also used in the output layer of RNNs, as it can output a probability distribution over a set of possible outcomes (such as the next word), in tasks such as language modeling, machine translation, text summarization, speech recognition, etc.
- Reinforcement learning: Reinforcement learning (RL) is the field of study that deals with the learning and optimization of an agent’s behavior based on the feedback from the environment. Activation functions are used in RL to enable neural networks to learn policies and value functions that guide the agent’s actions and decisions. For example, deep Q-networks (DQNs) are a type of neural network that approximate the Q-function, which is the expected cumulative future reward for taking an action in a given state. Activation functions such as ReLU, leaky ReLU, etc. are commonly used in the hidden layers of DQNs, as they reduce the vanishing gradient problem and make the learning process faster and more effective. The output layer of a DQN is typically linear, as it outputs one unbounded value estimate per action, while softmax is used in the output layer of policy networks, which output a probability distribution over a set of possible actions. Such networks are applied to tasks such as playing Atari games, controlling robots, navigating mazes, etc.
These are some of the examples of how activation functions are used in various domains and applications, but there are many more that can be explored and discovered. Activation functions are an essential component of neural networks, as they enable them to learn complex patterns and perform nonlinear computations.
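As one concrete illustration of the NLP example above, the following sketch runs a vanilla RNN with a tanh hidden state over a short sequence and applies softmax to the final state. The layer sizes, the random weights, and the names `rnn_step`, `W_xh`, `W_hh`, and `W_hy` are assumptions made for this example; a real model would learn the weights from data.

```python
import numpy as np

def softmax(z):
    exps = np.exp(z - np.max(z))
    return exps / np.sum(exps)

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One step of a vanilla RNN: the new hidden state is a tanh of a linear map."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

rng = np.random.default_rng(0)
input_size, hidden_size, num_classes, seq_len = 3, 4, 5, 6

# Randomly initialised (untrained) parameters, for illustration only.
W_xh = rng.normal(scale=0.5, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)
W_hy = rng.normal(scale=0.5, size=(num_classes, hidden_size))
b_y = np.zeros(num_classes)

xs = rng.normal(size=(seq_len, input_size))  # a made-up input sequence
h = np.zeros(hidden_size)                    # initial hidden state
for x_t in xs:
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)

# Softmax turns the final scores into a probability distribution over classes.
probs = softmax(W_hy @ h + b_y)
print("hidden state :", h)
print("class probs  :", probs)
```

Because of the tanh, every entry of the hidden state stays within [-1, 1] no matter how long the sequence is, and the softmax output sums to 1.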
6. Conclusion
In this tutorial, you have learned about activation functions and their importance in artificial neural networks. You have learned:
- What activation functions are and why they are needed
- The different types of activation functions and their properties
- How to choose an activation function for your neural network
- Some applications of activation functions in various domains
Activation functions are an essential component of neural networks, as they enable them to learn complex patterns and perform nonlinear computations. Activation functions can affect the performance and behavior of your model, so you should choose them carefully and experiment with different options. You should also be aware of the advantages and disadvantages of each activation function, and how they relate to the type of problem and the architecture of the neural network.
We hope that this tutorial has been helpful and informative for you. If you have any questions or feedback, please feel free to contact us. Thank you for reading and happy learning!