Building and training a Convolutional Neural Network (CNN) from scratch

In this post, I am going to explain how to create a convolutional neural network from scratch and train it on one-hot encoded labels. First, you need to install TensorFlow, Keras and OpenCV 3, and then we can begin.

We will build a three-layer convolutional neural network, then train and test it. We proceed step by step, in the following order.

Prepare the training and testing data.
Build the CNN layers using the Tensorflow library.
Select the Optimizer.
Train the network and save the checkpoints.
Finally, we test the model.

Prepare the training and testing data

First, we need to prepare the training data so that we can provide the network with clean and unambiguous images. We have to convert the class labels of our training images into categorical data using one-hot encoding, which creates one binary column per class. For instance, suppose you have 3 classes, say car, pedestrian and dog, and you want to train your network on them. First and foremost, you need to define a label representing each class; one-hot encoding creates a binary label for every class, i.e. car: [1,0,0], pedestrian: [0,1,0] and dog: [0,0,1].

First, map the categorical values to integer values. For example:

car is mapped to 1
pedestrian is mapped to 2
dog is mapped to 3

Then represent each integer value as a binary vector that is all zeros except at the index of the integer, which is set to 1. For example:

1 is mapped to [1,0,0]
2 is mapped to [0,1,0]
3 is mapped to [0,0,1]
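
The two steps above can be sketched in a few lines of NumPy. This is only an illustration of the encoding, not the code used later in the post; the helper name one_hot and the dictionary class_to_int are made up for this example.

```python
import numpy as np

# Class names from the example above; integer ids start at 1 as in the text.
class_to_int = {"car": 1, "pedestrian": 2, "dog": 3}

def one_hot(label, class_to_int):
    """Return a binary vector with a single 1 at the position of the class."""
    vector = np.zeros(len(class_to_int), dtype=int)
    vector[class_to_int[label] - 1] = 1  # ids are 1-based, array indices are 0-based
    return vector

for name in class_to_int:
    print(name, one_hot(name, class_to_int))
```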

Below is the code that generates one-hot encodings for the images (training, validation and testing) of the various classes.

import os
os.environ['KERAS_BACKEND'] = 'tensorflow'
from keras.preprocessing.image import ImageDataGenerator

gen = ImageDataGenerator()

# GENERATE TRAINING DATA
def train_images(path):
        directory = str(path)

        train_generator = gen.flow_from_directory(
        directory,  # this is the target directory
        target_size=(224, 224),  # all images will be resized
        color_mode='rgb',
        classes=None,
        batch_size=32,
        shuffle=True,
        seed=None,
        save_to_dir=None,
        save_prefix='',
        class_mode='categorical')
        return train_generator

# GENERATE VALIDATION DATA
def val_images(path):
        directory = str(path)

        val_generator = gen.flow_from_directory(
        directory,  # this is the target directory
        target_size=(224, 224),  # all images will be resized
        color_mode='rgb',
        classes=None,
        batch_size=32,
        shuffle=True,
        seed=None,
        save_to_dir=None,
        save_prefix='',
        class_mode='categorical')
        return val_generator

# GENERATE TEST DATA
def test_images(path):
        directory = str(path)
        test_generator = gen.flow_from_directory(
        directory,  # this is the target directory
        target_size=(224, 224),  # all images will be resized
        color_mode='rgb',
        classes=None,
        batch_size=32,
        shuffle=True,
        seed=None,
        save_to_dir=None,
        save_prefix='',
        class_mode='categorical')
        return test_generator

If you want to see how batches of images are generated with the labels, then here is the code snippet (copy and run it in your system)

import os
os.environ['KERAS_BACKEND'] = 'tensorflow'
from keras.preprocessing.image import ImageDataGenerator

gen = ImageDataGenerator()

train_generator = gen.flow_from_directory(
        'path to the directory',  # this is the target directory
        target_size=(100, 100),  # all images will be resized
        color_mode='rgb',
        classes=None,
        batch_size=64,
        shuffle=True,
        seed=None,
        save_to_dir=None,
        save_prefix='',
        class_mode='categorical')
for image_batch, labels in train_generator:
        print(image_batch, labels)
        break  # the generator loops forever, so stop after the first batch

I have used the flow_from_directory() method to generate the encodings; it returns batches of images along with their one-hot encoded labels. For more details, see the documentation of the ImageDataGenerator class. I have tested this with 10 object categories; you can create one-hot encodings for your own data by providing the path of a directory that contains one subfolder per class.
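
To make the "one subfolder per class" idea concrete, here is a small sketch that uses only the standard library. It mimics how the label indices are derived from subfolder names, assuming the classes are ordered alphanumerically (which is my understanding of the Keras behaviour when classes=None). The function infer_class_indices and the folder names are hypothetical; with a real generator, the actual mapping can be read from train_generator.class_indices.

```python
import os
import tempfile

def infer_class_indices(directory):
    """Map each subfolder name to an integer index, in sorted order."""
    class_names = sorted(
        name for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))
    )
    return {name: index for index, name in enumerate(class_names)}

# Build a throwaway dataset layout: one subfolder per class (hypothetical names).
root = tempfile.mkdtemp()
for name in ["dog", "car", "pedestrian"]:
    os.makedirs(os.path.join(root, name))

class_indices = infer_class_indices(root)
print(class_indices)
```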

Note: I used Keras with the TensorFlow backend.

Build the CNN layers using the Tensorflow Library

To create the CNN, we first need to define the basic components of the network: a convolutional layer, a pooling layer, an activation layer, a dropout layer and a fully connected layer. We will define a class whose methods build each of these components.

import tensorflow as tf  # TensorFlow 1.x API


class CNN_layers:
    """Layers to define the architecture."""

    def add_weights(self, shape):
        """ A method to create weight connections for all the layers"""
        return tf.Variable(tf.truncated_normal(shape=shape, stddev=0.05))

    def bias(self, shape):
        """ A method to create biases for all the connections """
        return tf.Variable(tf.constant(0.05, shape=shape))

    def conv_layer(self, prev_layer, kernel, input_shape, output_shape, stride):
        """ create the convolution layers with the weights"""
        weights = self.add_weights([kernel, kernel, input_shape, output_shape])
        bias = self.bias([output_shape])
        stride = [1, stride, stride, 1]
        c_layer = tf.nn.conv2d(prev_layer, weights, stride, padding='SAME') + bias
        return c_layer

    def pooling_layer(self, c_layer, size, stride_s):
        """ create the pooling layer """
        kernel = [1, size, size, 1]
        stride = [1, stride_s, stride_s, 1]
        p_layer = tf.nn.max_pool(c_layer, kernel, stride, padding='SAME')
        return p_layer

    def flat_layer(self, prev_layer):
        """ a method to flatten the 2D features into single dimension """
        input_size = prev_layer.get_shape().as_list()
        output_size = input_size[-1] * input_size[-2] * input_size[-3]
        return tf.reshape(prev_layer, [-1, output_size]), output_size

    def fc_layer(self, prev_layer, input_shape, output_shape):
        """ Create the Fully connected layer """
        weights = self.add_weights([input_shape, output_shape])
        bias = self.bias([output_shape])
        fc = tf.add(tf.matmul(prev_layer, weights), bias)
        return fc

    def activation(self, layer):
        """ We define the activation layer which uses the Relu activation function """
        return tf.nn.relu(layer)

    def dropout(self, layer):
        """ apply dropout, keeping each unit with probability 0.5 """
        return tf.nn.dropout(layer, 0.5)

We next design the architecture by calling the layer class.

model = CNN_layers()  # instance whose methods build each layer


def create_CNN(image, number_of_class):
    # create the first convolutional layer
    c1 = model.conv_layer(image, 5, 3, 16, 1)
    p1 = model.pooling_layer(c1, 5,2)
    l1 = model.activation(p1)

    # create the second convolutional layer
    c2 = model.conv_layer(l1, 4, 16, 32, 1)
    p2 = model.pooling_layer(c2, 5,2)
    l2 = model.activation(p2)

    # create the third convolutional layer
    c3 = model.conv_layer(l2, 3, 32, 64, 1)
    p3 = model.pooling_layer(c3, 5,2)
    l3 = model.activation(p3)

    # convert the 2D activations into single vector
    emb1, length = model.flat_layer(l3)

    # feed the network to the fully connected layer
    emb2 = model.fc_layer(emb1, length, 1024)

    # compute the activation
    emb3 = model.activation(emb2)

    # apply dropout
    emb4 = model.dropout(emb3)

    # OUTPUT_LAYER
    # feed the dropped-out activations to the output layer
    net = model.fc_layer(emb4, 1024, number_of_class)
    print(net)
    return net
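
The length returned by flat_layer depends on the input size, so it helps to work it out by hand. The sketch below follows the spatial size through the three conv/pool pairs, assuming 224x224 inputs (the target_size used in the generators; the training script reads height and width from config) and that 'SAME' padding gives an output size of ceil(input / stride). The helper names are made up for this example.

```python
import math

def same_output_size(size, stride):
    """Spatial size after a 'SAME'-padded convolution or pooling step."""
    return math.ceil(size / stride)

def flattened_length(height, width, strides, channels):
    """Follow the spatial size through each layer and return H * W * C."""
    for stride in strides:
        height = same_output_size(height, stride)
        width = same_output_size(width, stride)
    return height * width * channels

# conv (stride 1) followed by pool (stride 2), three times, as in create_CNN
strides = [1, 2, 1, 2, 1, 2]
length = flattened_length(224, 224, strides, channels=64)
print(length)  # 28 * 28 * 64
```

With these assumptions the first fully connected layer has 50176 inputs, so its weight matrix alone holds about 51 million values. That is worth keeping in mind when choosing the input size.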

Regularization: Dropout

Dropout consists of randomly setting a fraction (the rate) of the input units to 0 at each update during training, which helps prevent overfitting. In our case the rate is 0.5.
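
As a rough illustration of what happens to the activations, here is dropout written in plain NumPy. It is a sketch of the "inverted dropout" variant, in which the surviving units are scaled by 1 / keep_prob so that the expected activation stays the same; tf.nn.dropout applies the same scaling. The function name and the seed are choices made for this example.

```python
import numpy as np

def dropout(activations, keep_prob, rng):
    """Inverted dropout: zero units at random and rescale the survivors."""
    mask = rng.random(activations.shape) < keep_prob  # True for units that are kept
    return activations * mask / keep_prob

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
activations = np.ones((4, 1000))
dropped = dropout(activations, keep_prob=0.5, rng=rng)

print("fraction zeroed:", np.mean(dropped == 0))
print("mean activation:", dropped.mean())
```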

Why do we need regularization? The answer is to prevent over-fitting: it lowers the variance of the model at the cost of a little bias. A common form of regularization penalizes the weight matrix if its values are large, which makes sure that the weights in our neural network do not grow too large during training. During training, our neural network will converge on a local minimum of the cost function. There will be many of these local minima, and many of them will have roughly the same cost. Some of these local minima will have large weights connecting the nodes and layers, others will have smaller weights. We want to force our neural network to pick weights which are smaller rather than larger.

This makes our network less complex, but why is that? Recall that an over-fitted model shows large changes in its predictions for small changes in its input. In other words, if we have a little bit of noise in our data, an over-fitted model will react strongly to that noise. The analogous situation in neural networks is a network with large weights, which is more likely to react strongly to noise. This is because large weights amplify small variations in the input which could be solely due to noise. Therefore, we want to adjust the cost function so that training drives the magnitude of the weights down, while still producing good predictions.
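
One common way to adjust the cost function like this is an L2 penalty, which adds the sum of squared weights to the cost. Note that the network in this post uses dropout rather than an L2 penalty, so the sketch below is only an illustration of the idea; the function name and the value of lam are arbitrary.

```python
import numpy as np

def l2_penalized_cost(data_cost, weights, lam):
    """Add lam * (sum of squared weights) to the data cost."""
    penalty = sum(np.sum(w ** 2) for w in weights)
    return data_cost + lam * penalty

small = [np.full((2, 2), 0.1)]  # a layer with small weights
large = [np.full((2, 2), 3.0)]  # a layer with large weights

# Same data cost, but the large weights are charged a much bigger penalty.
print(l2_penalized_cost(1.0, small, lam=0.01))
print(l2_penalized_cost(1.0, large, lam=0.01))
```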

Selecting Optimizer

Now we come to the training part, where we will use the Adam optimizer to train the convolutional neural network. But first I want to clear up a simple doubt: why not use vanilla gradient descent (GD) or stochastic gradient descent (SGD) for training?

# Vanilla Gradient Descent
import numpy as np

def gradientDescent(x, y_true, w, alpha, num_iters):
    m = x.shape[0]  # number of training examples
    for i in range(num_iters):
        y = np.dot(x, w)  # predictions for all m examples
        w = w - alpha * (1.0 / m) * np.dot(x.T, y - y_true)
    return w
(just for example, not implemented in the code)

To perform GD, we need to calculate the gradient of the cost function, and to do that we need to sum the cost gradient over every sample. If we have millions of samples, we have to loop a million times or compute a dot product over the whole dataset. This is very time consuming, and we need to move away from it if we are dealing with big data. In short, just to take a single step towards the minimum we need to process millions of samples. So we next move to SGD. In SGD, we use the cost gradient of one example at each iteration, instead of the sum of the cost gradients of all examples. So, in the code above, SGD would replace the dot product over all examples with the gradient of a single example.
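
For comparison with the GD snippet, here is a minimal SGD sketch for the same kind of linear least-squares problem. Like the GD snippet it is just for illustration and is not used in the training code; the data, learning rate and number of epochs are made-up values.

```python
import numpy as np

def sgd(x, y_true, w, alpha, num_epochs, rng):
    """Stochastic gradient descent for linear least squares, one example per update."""
    m = x.shape[0]
    for _ in range(num_epochs):
        for i in rng.permutation(m):  # visit the examples in random order
            error = np.dot(x[i], w) - y_true[i]  # scalar error for a single example
            w = w - alpha * error * x[i]
    return w

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
true_w = np.array([2.0, -1.0])
y_true = x @ true_w  # noise-free targets, so the exact weights are recoverable

w = sgd(x, y_true, w=np.zeros(2), alpha=0.05, num_epochs=20, rng=rng)
print(w)
```

Each update touches a single row of x, so the cost of one step no longer grows with the size of the dataset.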

Adam optimizer (RMSprop + momentum): it is different from classical stochastic gradient descent. Stochastic gradient descent maintains a single learning rate (termed alpha) for all weight updates, and the learning rate does not change during training, whereas Adam includes momentum terms and adapts the learning rate for each weight as learning unfolds.
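
To show where the momentum and RMSprop parts come in, here is a sketch of a single Adam update in NumPy, following the standard update rule with its usual default hyperparameters. In the training script the update is done by tf.train.AdamOptimizer; the function adam_step and the toy objective below exist only for this example.

```python
import numpy as np

def adam_step(w, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; returns the new weights and moment estimates."""
    m = beta1 * m + (1 - beta1) * grad        # momentum: running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # RMSprop-like: running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for the zero initialization
    v_hat = v / (1 - beta2 ** t)
    w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, m, v = np.array([0.0]), np.zeros(1), np.zeros(1)
for t in range(1, 2001):
    grad = 2 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t, alpha=0.05)
print(w)
```

Because the step is divided by the root of the running squared gradient, each weight effectively gets its own step size.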

Finally, let us train the algorithm.

import tensorflow as tf
import os
from create_architecture import create_CNN
from config import *
from test_generate_labels import train_images



session = tf.Session()
# create place holders
images_holder = tf.placeholder(tf.float32, shape=[None, height, width, 3])

images_label = tf.placeholder(tf.float32, shape=[None, number_of_classes])


def trainer(number_of_images):
    """ method to train"""

    z = create_CNN(images_holder, number_of_classes)

    y_  = tf.nn.softmax(z)

    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=z, labels=images_label))

    tf.summary.scalar('cost', cost)

    optimizer = tf.train.AdamOptimizer().minimize(cost)


    correct_prediction = tf.equal(tf.argmax(images_label, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))


    # initialize all variables (weights, biases and the optimizer's internal state)

    session.run(tf.global_variables_initializer())

    writer = tf.summary.FileWriter(model_save_name, graph=tf.get_default_graph())
    merged = tf.summary.merge_all()
    saver = tf.train.Saver(max_to_keep=4)

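    total_batch = int(number_of_images / batch_size)

    # create the batch generator once; it yields (images, one-hot labels) forever.
    # The path is a placeholder, and target_size / batch_size inside train_images
    # must match height, width and batch_size from config.
    train_generator = train_images("./path/to/the/dataset")

    for epoch in range(epochs):

        tools = utils()  # helper class not shown in this post
        avg_cost = 0

        for batch in range(total_batch):

            images, labels = next(train_generator)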

            # run one optimization step and fetch the loss and summary in the same pass
            _, loss, summary = session.run([optimizer, cost, merged],
                                           feed_dict={images_holder: images, images_label: labels})

            print('loss', loss)

            avg_cost += loss / total_batch


        images_test, labels_test = tools.batch_dispatch()

        test_acc = session.run(accuracy,
                               feed_dict={images_holder: images_test, images_label: labels_test})
        print('Epoch number ', epoch, "cost =", "{:.3f}".format(avg_cost), "test accuracy: {:.3f}".format(test_acc))

        # writer.add_summary(summary,counter)
        saver.save(session, os.path.join(model_save_name))



if __name__ == "__main__":

    number_of_images = sum([len(files) for r, d, files in os.walk("./path/to/the/dataset")])
    print('number of images',  number_of_images)
    trainer(number_of_images)

After training, the checkpoint files are saved in the operating directory.
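
When you inspect the printed loss and test accuracy, it helps to know exactly what the cost and accuracy nodes compute. The sketch below reproduces both in NumPy on a tiny hand-made batch; the logits and labels are invented values, and the function names are not part of the training script.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(logits, one_hot_labels):
    """Mean cross-entropy between softmax(logits) and one-hot labels."""
    probs = softmax(logits)
    return -np.mean(np.sum(one_hot_labels * np.log(probs), axis=1))

def accuracy(logits, one_hot_labels):
    """Fraction of rows whose highest-scoring class matches the label."""
    return np.mean(np.argmax(logits, axis=1) == np.argmax(one_hot_labels, axis=1))

logits = np.array([[2.0, 0.5, 0.1], [0.2, 0.1, 3.0], [1.0, 2.0, 0.0]])
labels = np.array([[1, 0, 0], [0, 0, 1], [1, 0, 0]])
print(cross_entropy(logits, labels), accuracy(logits, labels))
```

A useful sanity check: an untrained network that outputs equal logits for every class has a cross-entropy of log(number of classes), so a loss near that value means the network has not learned anything yet.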
