Rana Singh

Summary

This webpage provides a step-by-step guide on implementing a softmax classifier using TensorFlow to classify the MNIST dataset of handwritten digits, with included code snippets and explanations.

Abstract

The article is a comprehensive tutorial for creating a softmax classifier, a type of neural network suitable for multi-class classification problems. It begins with instructions on installing TensorFlow and loading the MNIST dataset, then proceeds to define placeholders and variables, construct the softmax function, and specify the loss function using cross-entropy. The optimizer used for training the model is the GradientDescentOptimizer. The tutorial also includes code to initialize the model, run training iterations with mini-batches, and calculate accuracy. Additionally, it demonstrates how to plot the training and test loss over epochs and concludes with the final accuracy achieved on the MNIST dataset, which is around 90%. The detailed code for the implementation is available on GitHub.

Opinions

  • The author emphasizes the importance of using softmax for multi-class classification, as it provides a probability distribution over classes.
  • The use of mini-batches in stochastic gradient descent is recommended for efficient training, balancing computational cost and model performance.
  • InteractiveSession is suggested for running TensorFlow operations, which allows for interactive workflows.
  • The tutorial acknowledges the trade-off between using all data points versus mini-batches for each training step, favoring mini-batches for their computational efficiency.
  • The author provides additional resources and references, such as TensorFlow's documentation and a GitHub repository with the complete code, indicating a commitment to transparency and further learning.
  • The inclusion of error plotting is presented as a valuable tool for visualizing the learning process and assessing the model's performance over time.

Softmax Classifier using TensorFlow on MNIST dataset with sample code

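Install TensorFlow

The code in this tutorial uses the TensorFlow 1.x API (tf.placeholder, tf.Session and the tensorflow.examples.tutorials.mnist helper), which is not available in TensorFlow 2.x. Install a 1.x release instead; these require Python 3.7 or earlier.

!pip install "tensorflow<2"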

Loading the MNIST dataset

Every MNIST data point has two parts: an image of a handwritten digit and a corresponding label. We'll call the images "x" and the labels "y". Both the training set and the test set contain images and their corresponding labels; for example, the training images are mnist.train.images and the training labels are mnist.train.labels.

from tensorflow.examples.tutorials.mnist import input_data
# Download MNIST into the "MNIST" directory (if needed) and one-hot encode the labels
mnist = input_data.read_data_sets("MNIST", one_hot=True)

Check the dimensions of the MNIST training and test sets

print("Number of train data points:", mnist.train.images.shape[0], "Number of pixels in each image:", mnist.train.images.shape[1])

Number of train data points: 55000 Number of pixels in each image: 784

mnist.train.images is a tensor (an n-dimensional array) with a shape of [55000, 784]. The first dimension is an index into the list of images and the second dimension is the index for each pixel in each image. Each entry in the tensor is a pixel intensity between 0 and 1, for a particular pixel in a particular image.
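To make the [55000, 784] layout concrete, here is a small NumPy sketch (not part of the original notebook) that uses a random 28x28 array as a hypothetical stand-in for one digit. It shows how an image is flattened row by row into a 784-long vector, how such vectors stack into a 2-D batch, and how a row can be reshaped back to 28x28 for display.

```python
import numpy as np

# Hypothetical stand-in for one MNIST image: a 28x28 grid of intensities in [0, 1).
# (Random values; the real data comes from mnist.train.images.)
rng = np.random.default_rng(0)
image_2d = rng.random((28, 28)).astype(np.float32)

# Flatten row by row into the 784-long vector the model sees.
flat = image_2d.reshape(784)

# A batch is a 2-D array with one flattened image per row.
batch = np.stack([flat, flat, flat])

print(flat.shape)   # (784,)
print(batch.shape)  # (3, 784)

# Pixel (row r, column c) ends up at index 28 * r + c.
print(flat[28 * 5 + 7] == image_2d[5, 7])  # True

# Reshaping a row back to 28x28 recovers the image, e.g. for matplotlib's imshow.
restored = batch[0].reshape(28, 28)
print(np.array_equal(restored, image_2d))  # True
```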

print("Number of test data points:", mnist.test.labels.shape[0], "Length of the one-hot encoded label vector:", mnist.test.labels.shape[1])

Number of test data points: 10000 Length of the one-hot encoded label vector: 10
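A one-hot label is a length-10 vector with a 1 at the position of the digit and 0 everywhere else. The sketch below, assuming NumPy and a hypothetical one_hot helper (read_data_sets(..., one_hot=True) already performs this encoding), shows the encoding and how argmax recovers the digit.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Hypothetical helper: turn integer digit labels into one-hot rows."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.shape[0], num_classes), dtype=np.float32)
    encoded[np.arange(labels.shape[0]), labels] = 1.0  # set one entry per row
    return encoded

y = one_hot([3, 0, 9])
print(y.shape)  # (3, 10)
print(y[0])     # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]

# argmax along each row inverts the encoding and recovers the digits.
print(y.argmax(axis=1))  # [3 0 9]
```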

Import libraries

If you want to assign probabilities to an object being one of several different classes, softmax (multiclass logistic regression) is the natural choice, because it turns a vector of scores into values between 0 and 1 that add up to 1. Even more sophisticated models, such as deep networks, typically end with a softmax layer for the same reason.
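Concretely, softmax exponentiates each score and divides by the sum of the exponentials: softmax(z)_i = exp(z_i) / sum_j exp(z_j). Here is a minimal NumPy sketch with made-up scores for three classes. Subtracting the maximum score first is a common stability trick that leaves the result unchanged while keeping exp from overflowing.

```python
import numpy as np

def softmax(z):
    """Softmax over the last axis, shifted by the row maximum for stability."""
    z = np.asarray(z, dtype=np.float64)
    shifted = z - z.max(axis=-1, keepdims=True)  # does not change the result
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

scores = np.array([2.0, 1.0, 0.1])  # hypothetical scores for 3 classes
p = softmax(scores)
print(p.round(3))                # [0.659 0.242 0.099]
print(np.isclose(p.sum(), 1.0))  # True

# Without the shift, exp(1000) would overflow to inf.
print(softmax([1000.0, 1000.0]))  # [0.5 0.5]
```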

import tensorflow as tf
import numpy as np

Defining Placeholders, Variables, predicted y and loss function

x = tf.placeholder(tf.float32, [None, 784])

x isn't a specific value. It's a placeholder. A placeholder can be imagined as a memory unit that we use to load various mini-batches of input data while training. We want to be able to input any number of MNIST images, each flattened into a 784-dimensional vector. We represent this as a 2-D tensor of floating-point numbers with shape [None, 784], where None means that the first dimension (the number of images in a batch) can be of any length.

W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

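We also need weights and biases for the model. We could treat these as additional inputs, but TensorFlow has a better way to handle them: tf.Variable. A Variable is a modifiable tensor that lives in the graph and keeps its value between calls to sess.run, so training can update it. Here W has shape [784, 10] (784 pixels in, 10 classes out), b has shape [10], and both start at zero.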

y = tf.nn.softmax(tf.matmul(x, W) + b)  # predicted class probabilities
y_ = tf.placeholder(tf.float32, [None, 10])  # actual (one-hot) labels

First, we multiply x by W with the expression tf.matmul(x, W). This is flipped from the usual way of writing the model as Wx: because x is a 2-D tensor with one image per row, computing xW handles the whole batch with a single matrix multiplication. We then add b, which is broadcast to every row, and finally apply tf.nn.softmax to each row.
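The batching trick can be checked with plain NumPy. This sketch uses random data and a hypothetical batch of 5 images; the names x, W, b and y only mirror the TensorFlow graph above. It confirms that x @ W + b gives one row of 10 scores per image, that the 10-element bias is broadcast to every row, and that the result matches applying W to each image separately.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, n_pixels, n_classes = 5, 784, 10

x = rng.random((batch_size, n_pixels))             # one flattened image per row
W = rng.normal(0.0, 0.01, (n_pixels, n_classes))  # weights, like the Variable W
b = np.zeros(n_classes)                           # biases, like the Variable b

# (5, 784) @ (784, 10) -> (5, 10); b of shape (10,) is added to every row.
logits = x @ W + b

# Row-wise softmax turns each row of scores into a probability distribution.
e = np.exp(logits - logits.max(axis=1, keepdims=True))
y = e / e.sum(axis=1, keepdims=True)

print(logits.shape)                     # (5, 10)
print(np.allclose(y.sum(axis=1), 1.0))  # True

# Same scores as the per-image column-vector form W^T x + b.
per_image = np.stack([W.T @ x[i] + b for i in range(batch_size)])
print(np.allclose(per_image, logits))   # True
```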

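# Cross-entropy between the true labels y_ and the predicted probabilities y.
# axis=1 is the current name of the deprecated reduction_indices argument.
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), axis=1))
#Tutorial for tf.reduce_sum: https://www.dotnetperls.com/reduce-sum-tensorflow

Defining the loss function: multi-class log-loss (cross-entropy). First, tf.log computes the logarithm of each element of y. Next, we multiply each element of y_ by the corresponding element of tf.log(y). Then tf.reduce_sum adds up the elements of that product along the second dimension (the 10 classes) because of the axis=1 argument, and the sum is negated. Finally, tf.reduce_mean computes the mean over all the examples in the batch. A reduction is an operation that removes one or more dimensions from a tensor by combining the values along those dimensions, here by summing or averaging. Taking tf.log of softmax outputs can give log(0) = -inf when a probability underflows, which is one reason the final version of the code below uses tf.nn.softmax_cross_entropy_with_logits.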
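The same computation in NumPy, on a hypothetical batch of two examples and three classes, shows what each reduction does. The second half illustrates why tf.nn.softmax_cross_entropy_with_logits takes raw scores (logits): working from logits, -log softmax(z)_k can be computed as logsumexp(z) - z_k without ever taking the log of a probability that might round to 0. All values here are made up for illustration.

```python
import numpy as np

# Hypothetical batch: 2 examples, 3 classes.
y_true = np.array([[0.0, 1.0, 0.0],
                   [1.0, 0.0, 0.0]])  # one-hot labels, like y_
y_pred = np.array([[0.2, 0.7, 0.1],
                   [0.6, 0.3, 0.1]])  # softmax outputs, like y

# Same computation as tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), 1)):
per_example = -np.sum(y_true * np.log(y_pred), axis=1)  # sum over classes -> shape (2,)
loss = np.mean(per_example)                             # mean over the batch -> scalar
print(per_example.round(4))  # [0.3567 0.5108]
print(round(loss, 4))        # 0.4338

# Working from raw scores instead: -log softmax(z)_k = logsumexp(z) - z_k.
logits = np.array([[1.0, 2.0, 0.5],
                   [3.0, 1.0, 0.0]])  # made-up scores
m = logits.max(axis=1, keepdims=True)
log_sum_exp = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))
stable = np.mean(log_sum_exp - np.sum(y_true * logits, axis=1))

# Naive route for comparison: softmax first, then log of the probabilities.
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
naive = np.mean(-np.sum(y_true * np.log(probs), axis=1))
print(np.isclose(stable, naive))  # True
```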

Defining the optimizer

train_step=tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)
#https://www.tensorflow.org/versions/r1.2/api_guides/python/train#Optimizers

In this case, we ask TensorFlow to minimize cross_entropy using the gradient descent algorithm with a learning rate of 0.05. What TensorFlow actually does here, behind the scenes, is add new operations to the computation graph that implement backpropagation and gradient descent. It then gives you back a single operation which, when run, performs one step of gradient descent training, slightly tweaking the variables to reduce the loss.
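To see roughly what the generated backpropagation and update operations compute, here is a hand-written NumPy sketch of one gradient descent step for this model on random data (not MNIST). It relies on the standard result that, for the mean softmax cross-entropy over N examples, the gradient with respect to the logits is (p - y_) / N; the gradients for W and b follow from the chain rule. TensorFlow derives these automatically, so this only illustrates the idea, and the helper names are hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def loss(x, y_true, W, b):
    """Mean softmax cross-entropy over the batch."""
    p = softmax(x @ W + b)
    return -np.mean(np.sum(y_true * np.log(p), axis=1))

def gradients(x, y_true, W, b):
    """Hypothetical helper: gradients of the mean cross-entropy w.r.t. W and b."""
    n = x.shape[0]
    d_logits = (softmax(x @ W + b) - y_true) / n  # d(loss)/d(logits)
    return x.T @ d_logits, d_logits.sum(axis=0)   # chain rule through x @ W + b

rng = np.random.default_rng(0)
x = rng.random((100, 784))                          # random "images", not MNIST
y_true = np.eye(10)[rng.integers(0, 10, size=100)]  # random one-hot labels
W, b = np.zeros((784, 10)), np.zeros(10)            # zero init, as in the article
grad_W, grad_b = gradients(x, y_true, W, b)

# Central finite difference on one bias entry as a sanity check of the formula.
eps = 1e-6
step = np.zeros(10)
step[3] = eps
numeric = (loss(x, y_true, W, b + step) - loss(x, y_true, W, b - step)) / (2 * eps)
print(np.isclose(grad_b[3], numeric, atol=1e-6))  # True

# One gradient descent update with learning rate 0.05, as in the article.
before = loss(x, y_true, W, b)
W, b = W - 0.05 * grad_W, b - 0.05 * grad_b
after = loss(x, y_true, W, b)
print(after < before)  # True
```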

Launch the model

#Now launch the model in an InteractiveSession
sess = tf.InteractiveSession()

We first have to create an operation to initialize the variables we created:

tf.global_variables_initializer().run()

# Run train_step 1000 times, feeding each mini-batch in place of the placeholders x and y_
for _ in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

At each step of the loop, we get a mini-batch of one hundred random data points from the training set. Using small batches of random data is called stochastic training; in this case, stochastic gradient descent. Ideally, we'd like to use all the data for every training step, because that would give a more accurate estimate of the gradient, but that's expensive. So instead we use a different subset every time. This is cheap and gives much of the same benefit.
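The shuffle-then-slice idea behind stochastic mini-batches can be sketched as follows. iterate_minibatches is a hypothetical helper, not the implementation of mnist.train.next_batch (which keeps internal state across calls and handles the end of an epoch differently), and the data is a small made-up array.

```python
import numpy as np

def iterate_minibatches(x, y, batch_size, rng):
    """Hypothetical helper: yield (batch_x, batch_y) covering the data once in random order."""
    order = rng.permutation(len(x))        # new random order for this pass
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        yield x[idx], y[idx]               # same indices keep inputs and labels aligned

rng = np.random.default_rng(0)
x = np.arange(1000).reshape(250, 4)  # 250 made-up examples, 4 features each
y = np.arange(250)                   # made-up labels (here just the row index)

sizes = [len(bx) for bx, by in iterate_minibatches(x, y, 100, rng)]
print(sizes)  # [100, 100, 50]
```

Each pass touches every example exactly once, but each gradient step only sees one small batch, which is what keeps the per-step cost low.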

# https://stackoverflow.com/a/41863099

correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

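tf.argmax(input, axis=None, name=None, dimension=None) returns the index of the largest value along the given axis of a tensor. Here tf.argmax(y, 1) is the predicted digit for each image and tf.argmax(y_, 1) is the true digit; tf.equal gives a vector of booleans, which is cast to floats and averaged to get the fraction of correct predictions.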
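The accuracy computation can be reproduced in NumPy on four made-up predictions, which makes each intermediate value visible: argmax picks the predicted and the true class per row, equal compares them, and the mean of the resulting 0/1 values is the accuracy.

```python
import numpy as np

y_pred = np.array([[0.1, 0.7, 0.2],   # predicted class 1
                   [0.8, 0.1, 0.1],   # predicted class 0
                   [0.3, 0.3, 0.4],   # predicted class 2
                   [0.5, 0.2, 0.3]])  # predicted class 0
y_true = np.array([[0, 1, 0],
                   [0, 0, 1],
                   [0, 0, 1],
                   [1, 0, 0]])        # one-hot labels

correct_prediction = np.equal(np.argmax(y_pred, 1), np.argmax(y_true, 1))
accuracy = np.mean(correct_prediction.astype(np.float32))
print(correct_prediction)  # [ True False  True  True]
print(accuracy)            # 0.75
```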

Plotting the error

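# https://gist.github.com/greydanus/f6eee59eaf1d90fcb3b534a25362cea4
# https://stackoverflow.com/a/14434334
%matplotlib notebook
import matplotlib.pyplot as plt
import numpy as np

def plt_dynamic(x, y, y_1, ax):
    # Redraw the train (blue) and test (red) loss curves on the given axes
    ax.plot(x, y, 'b', label="Train Loss")
    ax.plot(x, y_1, 'r', label="Test Loss")
    # Add the legend only on the first call so its entries are not duplicated
    if len(x) == 1:
        ax.legend()
    # Refresh the figure that owns ax so the notebook shows the new points
    ax.figure.canvas.draw()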

Now put everything together in a single cell

training_epochs = 15
batch_size = 1000

# softmax_cross_entropy_with_logits applies softmax itself, so it needs the raw
# scores (logits), not y, which has already been passed through tf.nn.softmax
logits = tf.matmul(x, W) + b
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y_))
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)

fig, ax = plt.subplots(1, 1)
ax.set_xlabel('epoch'); ax.set_ylabel('Softmax cross-entropy loss')
xs, ytrs, ytes = [], [], []
for epoch in range(training_epochs):
    train_avg_cost = 0.
    total_batch = int(mnist.train.num_examples / batch_size)
    # Loop over all batches, accumulating the average training loss for this epoch
    for i in range(total_batch):
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        _, c = sess.run([train_step, cross_entropy], feed_dict={x: batch_xs, y_: batch_ys})
        train_avg_cost += c / total_batch
    # Loss on the full test set at the end of the epoch
    test_cost = sess.run(cross_entropy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})
    xs.append(epoch)
    ytrs.append(train_avg_cost)
    ytes.append(test_cost)
    plt_dynamic(xs, ytrs, ytes, ax)

# Fraction of test images whose predicted digit matches the true digit
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print("Accuracy:", accuracy.eval({x: mnist.test.images, y_: mnist.test.labels}))

The resulting plot shows the softmax cross-entropy (log-loss) against the number of epochs: the blue line is the training loss and the red line is the test loss.

The accuracy obtained with TensorFlow on the MNIST test set is about 90%.

The detailed code can be found at the GitHub link below:

https://github.com/ranasingh-gkp/Applied_AI_O/blob/master/Module%208_NN%2C%20Computer%20vision%2C%20Deep%20learning/Tensorflow_Softmax_Classifier_mnistdataset.ipynb

