How to Perform MNIST Digit Recognition with a Multi-layer Neural Network

The human visual system is a marvel. People recognise handwritten digits effortlessly, but the task is not as simple as it looks. The human brain has tens of billions of neurons and trillions of connections between them, which is what makes this exceptionally complex image-processing task feel easy.

For computers, however, recognizing digits is a challenging task. Simple intuitions about how to recognize a digit are difficult to express algorithmically. Moreover, handwriting varies significantly from person to person, which makes the problem even harder.

A handwritten digit recognition system is a machine trained to recognize digits from different sources such as emails, bank cheques, papers, and images.

Google Colab

Google Colab has been used to implement the network. It is a free cloud service for developing deep learning applications with popular libraries such as Keras, TensorFlow, PyTorch, and OpenCV. The feature that most distinguishes Colab from other free cloud services is that it provides a GPU at no cost. Thus, if your PC does not meet the hardware requirements or has no GPU, Colab is a good option, because a stable internet connection is the only requirement.

MNIST Dataset

MNIST stands for “Modified National Institute of Standards and Technology”. It is a dataset of 70,000 images of handwritten digits. Each image is 28x28 pixels, i.e. exactly 784 features. Each feature represents one pixel’s intensity, from 0 (white) to 255 (black). The dataset is divided into 60,000 training and 10,000 testing images.

Phases of Implementation

Import the libraries

First, we imported all the libraries that we are going to use.

We imported TensorFlow, a free, open-source library for machine learning applications such as neural networks. We also imported the pyplot module from matplotlib, a visualisation library, for plotting. Finally, we imported NumPy (Numerical Python), which is used to perform various mathematical operations on arrays.

Load the dataset

The Keras library already contains some datasets such as CIFAR10, CIFAR100, Boston Housing price regression dataset, IMDB movie review sentiment classification dataset etc.

The MNIST dataset is also part of it. So, we imported it from keras.datasets and loaded it into the variable “objects”. The objects.load_data() method returns the training data (train_img) and its labels (train_lab), as well as the testing data (test_img) and its labels (test_lab). Of the 70,000 images in the dataset, 60,000 are used for training and 10,000 for testing.
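The loading step described above might look like the sketch below. The helper name load_mnist is hypothetical (it is not from the original notebook); the variable names follow the ones used in this article. The data is downloaded the first time the helper is called, so an internet connection is assumed.

```python
import tensorflow as tf


def load_mnist():
    """Return (train_img, train_lab, test_img, test_lab) from keras.datasets.

    Hypothetical helper: it only wraps the load_data() call described in the text.
    """
    objects = tf.keras.datasets.mnist
    # load_data() returns two (images, labels) tuples: training and testing.
    (train_img, train_lab), (test_img, test_lab) = objects.load_data()
    return train_img, train_lab, test_img, test_lab
```

Calling train_img, train_lab, test_img, test_lab = load_mnist() in a notebook cell and then printing train_img.shape and test_img.shape should show the 60,000 / 10,000 split.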

Before preprocessing the data, we displayed the first 20 images of the training set using a for loop.

subplot() is used to add a subplot or grid-like structure to the current figure. The first argument is for “no. of rows”, second for “no. of columns” and third for position index in the grid.

Suppose we want to plot 10 images in a 4x5 grid, starting from the second position in the grid. The position index then runs from 2 to 11, i.e. subplot(4, 5, i + 2) for i from 0 to 9.
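A minimal sketch of that layout is shown here. The random arrays are placeholders standing in for the MNIST images (an assumption made so the snippet runs on its own).

```python
import matplotlib.pyplot as plt
import numpy as np

# Placeholder images standing in for train_img (assumption: random pixels).
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(10, 28, 28), dtype=np.uint8)

fig = plt.figure()
for i in range(10):
    # Subplot indices start at 1, so i + 2 puts the first image in cell 2.
    ax = plt.subplot(4, 5, i + 2)
    ax.imshow(images[i], cmap="gray")
# In a notebook the figure is drawn when the cell finishes;
# in a plain script, call plt.show() here.
```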

imshow() displays data as an image, here the training image train_img[i]. cmap stands for colour map and is optional: if the image is an array of shape (M, N), cmap controls the colour map used to display the values. cmap='gray' displays the image in grayscale, while cmap='gray_r' displays it in inverted grayscale.

title() sets the title for each image. We have set “Digit: train_lab[i]” as the title for each image in the subplot.

subplots_adjust() tunes the subplot layout. We used hspace to change the vertical space between rows; to change the horizontal space between columns, use wspace.

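By default, the subplot layout parameters (left, right, bottom, top, wspace and hspace) are taken from matplotlib's rcParams, under the figure.subplot.* entries.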
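Rather than quoting the defaults from memory, one way to inspect them is to read them from rcParams, as in this small sketch. The values printed depend on the installed matplotlib and any local style settings.

```python
import matplotlib.pyplot as plt

# The layout defaults live in rcParams under the "figure.subplot." prefix.
names = ["left", "right", "bottom", "top", "wspace", "hspace"]
defaults = {name: plt.rcParams["figure.subplot." + name] for name in names}

for name, value in defaults.items():
    print(f"{name:>6} = {value}")
```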

To hide the axes of each image, plt.axis('off') has been used.

After that, we displayed the shapes of the training and testing sets.
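Putting the calls described above together, the display loop could look roughly like this. The random arrays are placeholders for train_img and train_lab, and the figure size and hspace value are illustrative assumptions rather than the values used in the original notebook.

```python
import matplotlib.pyplot as plt
import numpy as np

# Placeholders for train_img and train_lab (assumption: random data).
rng = np.random.default_rng(0)
train_img = rng.integers(0, 256, size=(20, 28, 28), dtype=np.uint8)
train_lab = rng.integers(0, 10, size=20)

fig = plt.figure(figsize=(10, 8))
for i in range(20):
    plt.subplot(4, 5, i + 1)                  # 4 rows, 5 columns, cell i + 1
    plt.imshow(train_img[i], cmap="gray_r")   # inverted grayscale
    plt.title(f"Digit: {train_lab[i]}")
    plt.axis("off")                           # hide ticks and frame

# Extra vertical space so the titles do not overlap the row above.
plt.subplots_adjust(hspace=0.5)
```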

(60000,28,28) means there are 60,000 images in the training set and each image is of size 28x28 pixels. Similarly, there are 10,000 images of the same size in the testing set.

So each image is of size 28x28 i.e. 784 features, and each feature represents the intensity of each pixel from 0 to 255.

You can use print(train_img[0]) to print the first training set image in the matrix form of 28x28.

We plotted a histogram of the pixel intensities of the first training image, first before normalisation.

hist() plots the histogram for the first training image, train_img[0], after it has been reshaped into a 1-D array of size 784. facecolor is an optional parameter that specifies the colour of the histogram. The title, Y-axis and X-axis of the histogram have been named “Pixel vs its intensity”, “PIXEL” and “Intensity” respectively.
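A sketch of the histogram call with the labels named above. The random image is a placeholder for train_img[0], and the facecolor value is an arbitrary choice.

```python
import matplotlib.pyplot as plt
import numpy as np

# Placeholder for train_img[0] (assumption: random pixels).
rng = np.random.default_rng(0)
first_img = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)

fig = plt.figure()
# Flatten the 28x28 image into 784 values before building the histogram.
counts, bins, _ = plt.hist(first_img.reshape(784), facecolor="orange")
plt.title("Pixel vs its intensity")
plt.ylabel("PIXEL")
plt.xlabel("Intensity")
```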

Pre-process the data

Before feeding the data to the network, we normalize it. Normalizing the input data helps speed up training. It can also reduce the chance of getting stuck in poor local optima, since the weights are found with a stochastic gradient-based optimizer.

The pixel values lie between 0 and 255. Scaling input values is good practice with neural network models, and because the range here is well known and well behaved, we can quickly normalize the pixel values to the range 0 to 1 by dividing each value by the maximum intensity of 255.
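The scaling step is a single division. The sketch below uses a small random array as a placeholder for train_img; the same line would be applied to test_img.

```python
import numpy as np

# Placeholder for train_img (assumption: random uint8 pixels).
rng = np.random.default_rng(0)
train_img = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Dividing by 255.0 converts the integers to floats in the range [0, 1].
train_img = train_img / 255.0

print(train_img.dtype, train_img.min(), train_img.max())
```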

After normalisation, the same histogram spans the range 0 to 1 instead of 0 to 255.

Creating the model

There are 3 ways to create a model in Keras:

The Sequential model is very straightforward and simple. It lets you build a model layer by layer.

The Functional API is an easy-to-use, fully-featured API that supports arbitrary model architectures. This is the Keras “industry-strength” model.

Model subclassing, where you implement everything from scratch on your own.

Here, we have used the Sequential model. This model has one input layer, one output layer and two hidden layers.

Sequential() creates a model as a linear stack of layers.

.add() is used here to add a layer to the model.

The first layer (the input layer) receives the image as input. Since each image is a 28x28 array, we used Flatten() to unroll it into a one-dimensional vector of 784 values.

We used Dense() for the other layers. In a dense layer, every neuron is connected to every neuron in the previous layer.

The model is a simple neural network with two hidden layers with 512 neurons. A rectified linear unit (ReLU) activation function is used for the neurons in the hidden layers. The nicest thing about it is that its gradient is exactly 1 for positive inputs (and 0 for negative inputs), so the error passes through active neurons undiminished during back-propagation.
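To make the gradient behaviour concrete, here is a small NumPy sketch of ReLU and its derivative. The function names are illustrative; Keras computes this internally.

```python
import numpy as np


def relu(x):
    """ReLU: keep positive values, clamp everything else to 0."""
    return np.maximum(0.0, x)


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise.

    The value at exactly 0 is a convention; 0 is used here.
    """
    return (x > 0).astype(float)


x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))       # negatives become 0, positives pass through
print(relu_grad(x))  # 0 for the first three entries, 1 for the last two
```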

The output layer has 10 neurons, one for each class from 0 to 9. A softmax activation function is used on the output layer to turn the outputs into probability-like values.

Note: You can add more neurons in the hidden layers, or even increase the number of hidden layers in the model, which may improve accuracy. However, training will take more time.
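The architecture described above could be built as in the following sketch. It assumes 512 neurons in each of the two hidden layers, and it declares the input shape with an explicit tf.keras.Input before Flatten(), which may differ from how the original notebook specified it.

```python
import tensorflow as tf

model = tf.keras.models.Sequential()
# Each sample is a 28x28 image; Flatten unrolls it into 784 values.
model.add(tf.keras.Input(shape=(28, 28)))
model.add(tf.keras.layers.Flatten())
# Two fully connected hidden layers with ReLU activation.
model.add(tf.keras.layers.Dense(512, activation="relu"))
model.add(tf.keras.layers.Dense(512, activation="relu"))
# One output neuron per digit class; softmax yields probability-like values.
model.add(tf.keras.layers.Dense(10, activation="softmax"))

model.summary()
```

With these sizes the network has 669,706 trainable parameters (784x512 + 512, 512x512 + 512, and 512x10 + 10).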

Compiling the network

Next, we need to compile our model. Compiling the model takes three parameters: optimizer, loss and metrics. The optimizer controls how the weights are updated, including the learning rate. We are using 'adam' as our optimizer. It is generally a good optimizer for many cases, and it adapts the step size for each weight throughout training.

We will use 'sparse_categorical_crossentropy' as our loss function because it saves memory as well as computation: it uses a single integer for each class label rather than a whole one-hot vector. A lower loss indicates that the model is performing better.

To measure performance, we use the 'accuracy' metric, which reports the accuracy score while the model trains (on the training data, and also on a validation set if one is passed to fit()).
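The difference between the integer ("sparse") labels and one-hot vectors can be illustrated in plain NumPy. The probabilities below are made-up numbers; the point is that both forms give the same loss, while the sparse form only needs an index lookup.

```python
import numpy as np

# Predicted class probabilities for 2 samples over 3 classes (made-up numbers).
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])
labels = np.array([0, 2])        # integer labels, the "sparse" form
one_hot = np.eye(3)[labels]      # the equivalent one-hot vectors

# Sparse form: pick the probability of the true class by index.
sparse_loss = -np.log(probs[np.arange(len(labels)), labels]).mean()
# Dense form: multiply by the one-hot vectors and sum over the classes.
dense_loss = -(one_hot * np.log(probs)).sum(axis=1).mean()

print(sparse_loss, dense_loss)
```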
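A sketch of the compile step with the three parameters named above. The model is rebuilt here only so that the snippet runs on its own; the layer sizes are the same assumption as elsewhere in this article.

```python
import tensorflow as tf

# Same architecture as described earlier, written in list form.
model = tf.keras.models.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(
    optimizer="adam",                        # adaptive gradient-based optimizer
    loss="sparse_categorical_crossentropy",  # labels are plain integers 0..9
    metrics=["accuracy"],                    # reported alongside the loss
)
```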

Train the model

We train the model with the fit() function. Its parameters are the training data (train_img), the training labels (train_lab) and the number of epochs. The number of epochs is the number of times the model will cycle through the data. The more epochs we run, the more the model will improve, up to a certain point. After that point, the model will stop improving during each epoch.

We then save the model as project.h5.
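The training and saving steps might look like this sketch. To keep it self-contained, it trains on a tiny random stand-in for the normalised MNIST data, and the epoch count of 3 is an arbitrary assumption (the article does not state the number used).

```python
import os

import numpy as np
import tensorflow as tf

# Tiny random stand-in for the normalised training data (assumption).
rng = np.random.default_rng(0)
train_img = rng.random((64, 28, 28)).astype("float32")
train_lab = rng.integers(0, 10, size=64)

model = tf.keras.models.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Each epoch is one full pass over the training data.
history = model.fit(train_img, train_lab, epochs=3)

# Save architecture and weights together in a single HDF5 file.
model.save("project.h5")
print("saved", os.path.getsize("project.h5"), "bytes")
```

The returned history object records the loss and accuracy per epoch, which is a convenient way to see where the model stops improving.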

Evaluate the model

The model.evaluate() method computes the loss and any metrics defined when compiling the model. In our case, accuracy is computed on the 10,000 test examples using the weights of the trained model.

verbose can be 0, 1, or 2. By default verbose is 1 (newer Keras releases use "auto", which in most cases behaves like 1).

verbose = 0: silent.

verbose = 1: a progress bar (with fit(), also an "Epoch n/N" line for each epoch).

verbose = 2: one line per epoch, i.e. epoch no./total no. of epochs; with evaluate(), a single line of output.
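As a sketch of the evaluation call, the snippet below evaluates a freshly built (untrained) model on random placeholder data, so the numbers it prints are meaningless. It only illustrates the return values and the verbose argument.

```python
import numpy as np
import tensorflow as tf

# Random stand-in for the normalised test data (assumption).
rng = np.random.default_rng(0)
test_img = rng.random((32, 28, 28)).astype("float32")
test_lab = rng.integers(0, 10, size=32)

model = tf.keras.models.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# evaluate() returns the loss followed by the compiled metrics.
loss, acc = model.evaluate(test_img, test_lab, verbose=2)
print(f"loss={loss:.4f} accuracy={acc:.4f}")
```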

After evaluating the model, we now look at its predictions on the test set.

model.predict() is used to make predictions on the testing set.

np.argmax() returns the indices of the maximum values along an axis, so for each image the index of the largest output probability is the predicted digit.
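The sketch below shows how np.argmax() turns a batch of softmax outputs into digit labels. The array is a hand-written stand-in for the result of model.predict(test_img), with one row of 10 probabilities per image.

```python
import numpy as np

# Stand-in for model.predict(test_img): one row of class probabilities per image.
prediction = np.array([
    [0.01, 0.02, 0.05, 0.02, 0.03, 0.80, 0.02, 0.02, 0.02, 0.01],
    [0.91, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01],
])

# axis=1 takes the argmax within each row, i.e. per image.
predicted_digits = np.argmax(prediction, axis=1)
print(predicted_digits)  # [5 0]
```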

Now, to make a prediction for a new image that is not part of the MNIST dataset, we first create a function named “load_image”.

The above function converts the image into an array of pixels, which is fed to the model as input.
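The original function body is not reproduced here, so the following is only one possible sketch of such a function, written with Pillow and NumPy. The alias PILImage is used to avoid clashing with the IPython Image class imported later in this article.

```python
import numpy as np
from PIL import Image as PILImage


def load_image(filename):
    """Load an image file as a (1, 28, 28) float array scaled to [0, 1]."""
    img = PILImage.open(filename).convert("L")  # convert to grayscale
    img = img.resize((28, 28))                  # match the MNIST image size
    arr = np.asarray(img, dtype="float32") / 255.0
    # Add a leading batch dimension so the array can be passed to predict().
    return arr.reshape(1, 28, 28)
```

Note that the MNIST arrays store the digit strokes as high values on a zero background. A photo of dark ink on white paper has the opposite polarity, so it may need to be inverted (for example with 1.0 - arr) before prediction.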

In order to upload a file from local drive, we used the code:

from google.colab import files
uploaded = files.upload()

This prompts you to select a file. Click on “Choose Files”, select the file, and wait until it is 100% uploaded. You will see the name of the file once Colab has uploaded it.

To display the image file, we used the code:

from IPython.display import Image
Image('5img.jpeg', width=250, height=250)

Here, 5img.jpeg is the file name.

The model successfully predicted the value as 5.

Now, if we want to run the model again after a few days, we would have to run the whole code again, which is time-consuming.

In that case, you can use the saved model, i.e. project.h5.

So, before closing the Colab notebook, you can download the model from the Files panel (the folder icon).

Then, when you want to run the model again, all you have to do is upload the project.h5 file from your computer using the code:

from google.colab import files
uploaded = files.upload()

When the file is 100% uploaded, use the following code. After that, you can predict the digit for new images without running the whole code.

model = tf.keras.models.load_model('project.h5')

Link for reference https://colab.research.google.com/drive/10LzhqSlJx4bnCNT6C8llhuXTDuh_WQPG?usp=sharing

Thanks for reading!

Also published at https://medium.com/@officialgargijha/mnist-handwritten-digit-recognition-using-neural-network-2b729bacb0d5