Richmond Alake



Understanding And Implementing Dropout In TensorFlow And Keras

Dropout is a common regularization technique that is leveraged within state-of-the-art solutions to computer vision tasks such as pose estimation, object detection or semantic segmentation.

Photo by John Matychuk on Unsplash

Introduction

This article covers the concept of the dropout technique, which is used in deep neural networks such as recurrent neural networks and convolutional neural networks.

The dropout technique involves omitting neurons, which act as feature detectors, from the neural network during each training step. Which neurons are excluded is determined randomly.

G. E. Hinton and colleagues proposed this simple technique in 2012 in the paper “Improving neural networks by preventing co-adaptation of feature detectors” (https://arxiv.org/pdf/1207.0580.pdf).

In this article, we will explore the concept of dropout in depth and look at how this technique can be implemented within neural networks using TensorFlow and Keras.

Understanding Dropout Technique

Neural networks have hidden layers between their input and output layers. These hidden layers contain neurons, and it is the weights of these neurons, together with the interconnections between them, that enable a neural network to carry out a process resembling learning.

Simple Neural Network built using https://playground.tensorflow.org/

The general idea is that the more neurons and layers a neural network architecture has, the greater its representational power. This increased representational power means that the network can fit more complex functions, including fitting its training data more closely.

Simply put, there are more possible configurations for the interconnections between the neurons in the network's layers.
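One rough way to see this growth in capacity is to count the weights and biases a fully connected network has to learn. The sketch below is plain Python, not a Keras API; `count_dense_params` is a hypothetical helper, and a parameter count is only a coarse proxy for representational power.

```python
def count_dense_params(layer_sizes):
    """Count the trainable parameters of a fully connected network.

    A Dense layer mapping n_in inputs to n_out neurons has
    n_in * n_out weights plus n_out biases.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total


# A small network: 784 inputs (a flattened 28x28 image), one hidden layer of 32, 10 outputs
small = count_dense_params([784, 32, 10])
# A deeper network with hidden layers of 200, 100 and 60 neurons (the sizes used later on)
deeper = count_dense_params([784, 200, 100, 60, 10])

print(small)   # 25450
print(deeper)  # 183770
```

The deeper network has roughly seven times as many parameters to adjust, which is what gives it both more expressive power and more room to overfit.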

Complex Neural Network built using https://playground.tensorflow.org/

The disadvantage of deeper neural networks is that they are more prone to overfitting.

Overfitting is a common problem in which a trained machine learning model performs well on the data it was trained on but fails to generalize well to unseen data.

The primary purpose of dropout is to minimize the effect of overfitting within a trained network.

The dropout technique works by randomly reducing the number of active, interconnected neurons within a neural network. At every training step, each neuron has a chance of being left out, or rather, dropped out of the combined contribution from connected neurons.
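A minimal plain-Python sketch of one training step: `dropout_step` is a hypothetical helper (not a Keras function) that zeroes each activation independently with probability `rate` and returns the mask it used. It follows the unscaled formulation described here; Keras additionally rescales the surviving activations, as discussed later.

```python
import random


def dropout_step(activations, rate, rng):
    """Apply one training step of dropout to a layer's activations.

    Each activation is independently zeroed with probability `rate`;
    the returned mask records which neurons were kept (1) or dropped (0).
    """
    mask = [0 if rng.random() < rate else 1 for _ in activations]
    dropped = [a * m for a, m in zip(activations, mask)]
    return dropped, mask


rng = random.Random(7)
activations = [0.8, 1.5, 0.3, 2.1, 0.9, 1.2]  # illustrative activations of one layer

# Each training step draws a fresh mask, so different neurons are dropped each time
for step in range(2):
    out, mask = dropout_step(activations, rate=0.5, rng=rng)
    print(f"step {step}: mask={mask} output={out}")
```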

This technique reduces overfitting because each neuron has to become more independently useful: the neurons within a layer learn weight values that do not rely on cooperation with their neighbouring neurons.

Hence, the trained network depends less on a large number of interconnected neurons to achieve decent representational power.

Suppose you trained 7,000 different neural network architectures. Rather than selecting only the best one, you could combine them by averaging the predictions of all 7,000 trained networks.

The dropout technique approximately mimics this scenario.

If the probability of a neuron being dropped out in a training step is set to 0.5, we are effectively training a different network at each training step, because it is highly unlikely that exactly the same neurons are excluded in any two training steps. A neural network trained with dropout can therefore be viewed as an approximate average of all the different subnetworks (combinations of neuron connections) that occurred during training.
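A quick simulation makes the "many different networks" argument concrete. Under assumptions chosen purely for illustration (a single layer of 20 neurons, a dropout rate of 0.5, 1,000 training steps), there are 2^20 possible keep/drop patterns, so nearly every step trains a different subnetwork. `sample_masks` is a hypothetical plain-Python helper, not part of Keras.

```python
import random


def sample_masks(num_neurons, rate, num_steps, seed=0):
    """Sample one dropout mask per training step for a single layer.

    Each mask is a tuple of 0/1 flags: 1 = neuron kept, 0 = neuron dropped.
    """
    rng = random.Random(seed)
    return [
        tuple(0 if rng.random() < rate else 1 for _ in range(num_neurons))
        for _ in range(num_steps)
    ]


num_neurons, rate, num_steps = 20, 0.5, 1_000
masks = sample_masks(num_neurons, rate, num_steps)

possible = 2 ** num_neurons   # every keep/drop combination for 20 neurons
distinct = len(set(masks))    # how many different subnetworks were actually sampled
print(f"{possible} possible subnetworks, {distinct} distinct ones in {num_steps} steps")
```

Real networks have many layers with far more neurons, so the number of possible subnetworks is vastly larger still.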

Practical scenarios

In practice, and when testing the performance of a trained network that used dropout on unseen data, two points are worth considering.

First, dropout is not necessarily applied to every layer of a neural network; it is commonly applied to the neurons in the last few layers.

In the experiments reported in the paper, a convolutional neural network tested on the CIFAR-10 dataset achieved an error rate of 15.6% when dropout was applied in the last hidden layer. This was an improvement on the 16.6% error rate of the same network with no dropout in any of its layers.
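As a sketch of the "last hidden layer only" placement, the Keras Sequential model below puts a single Dropout layer between the last hidden layer and the output. It assumes TensorFlow 2.x (with its bundled Keras) and NumPy are installed; the layer sizes borrow the ones used later in this article and the 0.5 rate is an assumption for illustration. This is not the convolutional network from the paper.

```python
import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(28, 28)),
    keras.layers.Flatten(),
    keras.layers.Dense(200, activation="relu"),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(60, activation="relu"),
    keras.layers.Dropout(rate=0.5),  # applied to the last hidden layer's outputs only
    keras.layers.Dense(10, activation="softmax"),
])

# Confirm there is exactly one Dropout layer, sitting just before the output layer
dropout_layers = [layer for layer in model.layers if isinstance(layer, keras.layers.Dropout)]
print(len(dropout_layers))               # 1
print(type(model.layers[-2]).__name__)   # Dropout

# A forward pass on two blank 28x28 "images" yields one 10-class distribution per image
probs = model(np.zeros((2, 28, 28), dtype="float32"))
print(probs.shape)
```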

Comparison of error rates between models with dropout and models without dropout on the TIMIT benchmark

Second, dropout is not applied when evaluating a trained neural network. During evaluation or testing, all neurons in the network are active, so each neuron receives more active input connections than it did, on average, during training.

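Therefore, to keep each neuron's expected input the same as during training, the weights are multiplied by one minus the dropout rate (the keep probability) at test time. So if the dropout rate was 0.5 during training, each neuron's outgoing weights are halved at test time. Keras's Dropout layer uses the equivalent "inverted dropout" formulation instead: during training it scales the activations it keeps by 1 / (1 - rate), so no rescaling is needed at test time.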
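The arithmetic behind this rescaling can be checked numerically. The plain-Python sketch below uses made-up inputs and weights for a single neuron and compares its average output over many random dropout masks with its test-time output, under two conventions: classic dropout (no scaling in training, weights multiplied by 1 - rate at test time) and inverted dropout (survivors scaled by 1 / (1 - rate) in training, weights unchanged at test time). The helper names are hypothetical.

```python
import random


def neuron_output(inputs, weights):
    """Pre-activation of a single neuron: the weighted sum of its inputs."""
    return sum(w * x for w, x in zip(weights, inputs))


def apply_dropout(inputs, rate, rng, inverted=False):
    """Zero each input independently with probability `rate`.

    With inverted=True, surviving inputs are scaled by 1 / (1 - rate).
    """
    scale = 1.0 / (1.0 - rate) if inverted else 1.0
    return [x * scale if rng.random() >= rate else 0.0 for x in inputs]


def mean_training_output(inputs, weights, rate, inverted=False, trials=100_000, seed=0):
    """Monte Carlo estimate of the neuron's average output under dropout."""
    rng = random.Random(seed)
    total = sum(
        neuron_output(apply_dropout(inputs, rate, rng, inverted), weights)
        for _ in range(trials)
    )
    return total / trials


inputs = [1.0, 2.0, 3.0, 4.0]      # illustrative activations feeding the neuron
weights = [0.5, -0.25, 0.75, 0.1]  # illustrative learned weights
rate = 0.5

# Classic dropout: train without rescaling, multiply the weights by (1 - rate) at test time
classic_train = mean_training_output(inputs, weights, rate)
classic_test = neuron_output(inputs, [w * (1.0 - rate) for w in weights])

# Inverted dropout: rescale during training, keep the weights unchanged at test time
inverted_train = mean_training_output(inputs, weights, rate, inverted=True)
inverted_test = neuron_output(inputs, weights)

print(f"classic:  training average {classic_train:.3f}, test output {classic_test:.3f}")
print(f"inverted: training average {inverted_train:.3f}, test output {inverted_test:.3f}")
```

In both cases the test-time output matches the average training-time output; the two conventions differ only in where the factor of 1 - rate is applied.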

Implementing Dropout Technique

Using TensorFlow and Keras, we are equipped with the tools to implement a neural network that utilizes the dropout technique by including dropout layers within the neural network architecture.

We only need to add one line to include a dropout layer within a larger neural network architecture. The Dropout class takes a few arguments, but for now, we are only concerned with the ‘rate’ argument. The dropout rate is a hyperparameter that represents the probability of a neuron's activation being set to zero during a training step; it takes values between 0 and 1.
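To see what the ‘rate’ argument does, the sketch below calls a Dropout layer directly in both modes. It assumes TensorFlow 2.x (with its bundled Keras) and NumPy are installed; the 10,000-element input of ones is chosen purely for illustration. In training mode, about `rate` of the activations are zeroed and the survivors are scaled by 1 / (1 - rate); in inference mode, the input passes through unchanged.

```python
import numpy as np
from tensorflow import keras

keras.utils.set_random_seed(42)  # make the random dropout mask reproducible
layer = keras.layers.Dropout(rate=0.2)
x = np.ones((1, 10_000), dtype="float32")

# training=True: roughly 20% of activations are zeroed, the rest are scaled by 1 / (1 - 0.2)
train_out = np.asarray(layer(x, training=True))
# training=False: the layer returns its input unchanged
infer_out = np.asarray(layer(x, training=False))

zero_fraction = float(np.mean(train_out == 0.0))
kept_values = np.unique(train_out[train_out != 0.0])

print(f"fraction zeroed: {zero_fraction:.3f}")
print(f"value of kept activations: {kept_values}")
print(f"inference output unchanged: {np.array_equal(infer_out, x)}")
```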

keras.layers.Dropout(rate=0.2)

The following steps implement, train and evaluate a neural network that uses dropout.

1. Load the tools and libraries used: Keras and TensorFlow.
import tensorflow as tf
from tensorflow import keras

2. Load the Fashion-MNIST dataset, scale the pixel values to the range [0, 1], and split the data into training, validation and test sets.

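(train_images, train_labels), (test_images, test_labels) = keras.datasets.fashion_mnist.load_data()
# Scale pixel values from [0, 255] to [0, 1]
train_images = train_images / 255.0
test_images = test_images / 255.0
# Hold out the first 5,000 training images for validation and train on the remaining 55,000
validation_images, train_images = train_images[:5000], train_images[5000:]
validation_labels, train_labels = train_labels[:5000], train_labels[5000:]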

3. Create a custom model that includes dropout layers, using the Keras Model subclassing API.

class CustomModel(keras.Model):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Flatten each 28x28 image into a 784-value vector (input shape is inferred on the first call)
        self.input_layer = keras.layers.Flatten()
        self.hidden1 = keras.layers.Dense(200, activation='relu')
        self.hidden2 = keras.layers.Dense(100, activation='relu')
        self.hidden3 = keras.layers.Dense(60, activation='relu')
        self.output_layer = keras.layers.Dense(10, activation='softmax')
        # Dropout has no trainable weights, so one instance can be reused after each layer
        self.dropout_layer = keras.layers.Dropout(rate=0.2)

    def call(self, inputs, training=None):
        # Keras passes training=True inside fit() and training=False inside evaluate() and predict(),
        # so dropout is only active while the model is being trained
        input_layer = self.input_layer(inputs)
        input_layer = self.dropout_layer(input_layer, training=training)
        hidden1 = self.hidden1(input_layer)
        hidden1 = self.dropout_layer(hidden1, training=training)
        hidden2 = self.hidden2(hidden1)
        hidden2 = self.dropout_layer(hidden2, training=training)
        hidden3 = self.hidden3(hidden2)
        hidden3 = self.dropout_layer(hidden3, training=training)
        output_layer = self.output_layer(hidden3)
        return output_layer

4. Instantiate the model, then configure the optimizer, loss function and metrics.

model = CustomModel()
# `learning_rate` replaces the deprecated `lr` argument
sgd = keras.optimizers.SGD(learning_rate=0.01)
model.compile(loss="sparse_categorical_crossentropy", optimizer=sgd, metrics=["accuracy"])

5. Train the model for a total of 60 epochs

model.fit(train_images, train_labels, epochs=60, validation_data=(validation_images, validation_labels))

6. Evaluate the model on the test dataset

model.evaluate(test_images, test_labels)

The result of the evaluation will look similar to the example evaluation result below:

10000/10000 [==============================] - 0s 34us/sample - loss: 0.3230 - accuracy: 0.8812
[0.32301584649085996, 0.8812]

The second value in the returned list is the test accuracy, which is about 88% in this example run.

With some hyperparameter tuning and training for more epochs, the accuracy could likely be increased by a few percentage points.
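`evaluate` returns the loss followed by each metric passed to `compile`. With integer labels, accuracy is simply the fraction of images whose highest-probability class matches the label. The NumPy sketch below reproduces that arithmetic on made-up probabilities (not outputs of the trained model, and with 3 classes instead of 10 for readability); with the real model, `model.predict(test_images)` would supply the probabilities.

```python
import numpy as np

# Hypothetical softmax outputs for 4 test images over 3 classes
probabilities = np.array([
    [0.70, 0.20, 0.10],
    [0.10, 0.80, 0.10],
    [0.30, 0.30, 0.40],
    [0.60, 0.30, 0.10],
])
labels = np.array([0, 1, 2, 1])

# The predicted class is the index of the largest probability in each row
predicted = np.argmax(probabilities, axis=1)
accuracy = float(np.mean(predicted == labels))
print(predicted, accuracy)  # [0 1 2 0] 0.75
```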

The code presented in this article is available in this GitHub repository: https://github.com/RichmondAlake/tensorflow_2_tutorials/blob/master/10_dropout.ipynb

Dropout is a common regularization technique that is leveraged in state-of-the-art solutions to computer vision tasks such as pose estimation, object detection and semantic segmentation. The concept is simple to understand, and dropout is easy to implement thanks to its inclusion in many standard machine learning and deep learning libraries such as PyTorch, TensorFlow and Keras.

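If you are interested in other regularization techniques and how they are implemented, have a read of the articles below:

How To Implement Custom Regularization in TensorFlow (Keras): https://towardsdatascience.com/how-to-implement-custom-regularization-in-tensorflow-keras-4e77be082918
Batch Normalization In Neural Networks (Code), Implemented With TensorFlow (Keras): https://towardsdatascience.com/batch-normalization-in-neural-networks-code-d7c9b88da9f5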

Thanks for reading.
