Keras is a neural networks API that runs on top of TensorFlow, Theano, or CNTK. Essentially, Keras provides high-level building blocks for developing deep learning models and uses backend engines like TensorFlow to handle the low-level computation. As a “hello world” tutorial to Keras, we will be building a handwritten digit classifier using a convolutional neural network (CNN)!
NumPy is the core Python library for scientific computing, and Matplotlib is another library for creating data visualizations. If you are unfamiliar with Python and NumPy, I would highly recommend reading through this guide from the popular Stanford CS231n class on Convolutional Neural Networks as an additional resource to aid your learning.
While there are numerous ways to download these packages, my personal recommendation is to create a virtual environment (an isolated environment for your Python projects) and use Homebrew, a package manager, to install the respective packages. Here is a helpful tutorial on how to create and use virtual environments. Once you download Homebrew from here, you can use brew to install Python, Matplotlib, NumPy, Jupyter, and Keras.
Before we jump into coding, let’s understand the problem we are trying to solve. As humans, it is very easy for us to read and recognize a bunch of handwritten digits. Since this recognition is done unconsciously, we don’t realize how difficult this problem actually is. Now imagine teaching a computer how to recognize these digits and writing out a set of rules (otherwise known as an algorithm) to tell the computer how to distinguish each digit from another. This proves to be quite a difficult task!
Neural networks approach the problem of digit recognition very differently. Specifically, neural networks process a large dataset of handwritten digit images and then develop a model which essentially “learns” rules from these images for recognizing different digits. Thus, instead of feeding the computer a list of rules, the computer is actually coming up with its own rules and using them to accurately classify digits.
Now that we have our installation prerequisites and we understand the problem of digit recognition, we can start building our neural network! We are going to begin by importing packages and libraries that we need:
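Assuming the standalone keras package is installed, imports along these lines should cover everything used in this tutorial (Matplotlib is optional here, but handy if you want to visualize the digits):

```python
import numpy as np
import matplotlib.pyplot as plt   # optional: for visualizing the digit images

from keras.datasets import mnist              # the MNIST dataset ships with Keras
from keras.models import Sequential           # the Sequential model
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense
from keras.utils import np_utils              # for one-hot encoding the labels
```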
We will be using the popular MNIST dataset of handwritten digits. This dataset is composed of 70,000 greyscale images of digits (60,000 for training and 10,000 for testing). The training set is what the model actually learns from, and the testing set is used to check how well the model learned; we can use it to determine the accuracy of the model. The dataset can be loaded directly from the Keras library into respective train and test datasets like this:
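In its simplest form, this is a single call (the X arrays hold the images and the Y arrays hold the corresponding digit labels):

```python
# X_* holds the images, Y_* holds the digit labels (0-9)
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
```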
Let’s focus on the X datasets for now. We can check the type and shape of each dataset:
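A few print statements are enough here:

```python
print(type(X_train))
print(X_train.shape)
print(type(X_test))
print(X_test.shape)
```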
This should return something like the following:
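```
<class 'numpy.ndarray'>
(60000, 28, 28)
<class 'numpy.ndarray'>
(10000, 28, 28)
```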
This means that we have 60,000 samples in the training dataset with dimensions of 28 x 28 pixels each, and 10,000 samples in the testing dataset with dimensions of 28 x 28 pixels as well. Moreover, the images are loaded as NumPy arrays, and thus, various types of computations can be carried out on them.
We need to explicitly set the depth of the input images, where the depth means the number of color channels. A color image has three channels, whereas a greyscale MNIST image has just one. We also want to convert our data type to float32 and normalize the input values from the 0 to 255 range to a 0 to 1 scale. Here's how we do that:
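One way to write it, keeping the reshape explicit so the new channel dimension is easy to see:

```python
# add an explicit depth dimension: (samples, 28, 28) -> (samples, 28, 28, 1)
X_train = X_train.reshape(X_train.shape[0], 28, 28, 1)
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1)

# convert to float32 and rescale pixel values from 0-255 down to 0-1
X_train = X_train.astype('float32') / 255
X_test = X_test.astype('float32') / 255

print(X_train.shape, X_train.dtype, X_train.max())
print(X_test.shape, X_test.dtype, X_test.max())
```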
I’ve added print statements to demonstrate how both the shape and type change after preprocessing. Running this code should return something like this:
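```
(60000, 28, 28, 1) float32 1.0
(10000, 28, 28, 1) float32 1.0
```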
As you can see, the X_train and X_test arrays now have a different shape, a different data type, and a different scale. Now, let's move on to preprocessing the Y datasets. Just to clarify, the X datasets contain the actual images and the Y datasets contain their respective class labels (so we should have 10 different classes, one for each digit).
If we look at the shape of our class labels training data (Y_train), we will see that it is just a 1-D array with one label for each image. However, we want each image's label to be a vector of ten values, one for each digit, where the value is set to one if the image shows that digit and zero otherwise. This is what is called one-hot encoding.
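Keras provides a helper for this, to_categorical, which we imported earlier via np_utils. With prints before and after, the encoding looks roughly like this:

```python
print(Y_train.shape)                            # (60000,)

# one-hot encode the labels: e.g. the digit 3 becomes [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
Y_train = np_utils.to_categorical(Y_train, 10)
Y_test = np_utils.to_categorical(Y_test, 10)

print(Y_train.shape)                            # (60000, 10)
```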
The added print statements will show you how the shape changes from (60000,) to (60000, 10).
The next step is going to be building the neural network! We will be using Keras' core data structure, which is called a model and is defined as a way to organize layers in a neural network. Keras has two different ways of building a model: the Sequential model, which is a linear stack of layers, or the Functional API for more complex architectures. In this tutorial, we will be building a simple model using the Sequential model. For more information on the Functional API, take a look at the documentation.
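Creating the model itself is a single line:

```python
model = Sequential()
```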
The next step is adding layers to the model. We will not be going through the theory and math that underlie the architecture, as it requires a higher level of background and understanding. If you are interested in learning more details about layers and the order of these layers in the network, I would recommend reading up on neural networks. One helpful source is this textbook. Let's begin by adding our first layer:
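Based on the layer described just below (30 filters, a 5 x 5 kernel, a ReLU activation, and 28 x 28 x 1 inputs), the call might look like this:

```python
# first (input) layer: 30 filters, 5 x 5 kernels, ReLU activation, 28 x 28 x 1 inputs
model.add(Conv2D(30, kernel_size=(5, 5), activation='relu', input_shape=(28, 28, 1)))
```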
This is a convolutional layer that has 30 convolutional filters, each with a dimension (kernel size) of 5 x 5, and a rectifier (ReLU) activation function. The kernel size argument dictates the number of rows and columns the filter should have; if the number of rows and columns differ, you simply pass a (row, col) tuple for the kernel size argument. Since this is the first layer in the model, it is called the input layer, and it also requires the dimensions of the input images to be explicitly stated. In our case, this is 28 x 28 x 1. We can now add more layers:
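The pooling and dropout layers described next might be added like this:

```python
model.add(MaxPooling2D(pool_size=(2, 2)))   # 2 x 2 pooling window
model.add(Dropout(0.2))                     # drop 20% of the units during training
```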
The MaxPooling layer is a pooling layer that reduces the number of parameters in the model by sliding a filter with dimensions given by the pool_size argument (in this case 2 x 2) over the previous layer and taking the maximum of the four values in each window.
The Dropout layer randomly drops a fraction of the layer's units (in this case, the rate is 20%) during training to reduce overfitting.
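To finish the network, we flatten the feature maps and add fully connected layers. The 128 units in the hidden Dense layer and the softmax activation on the output are my own (fairly standard) choices; the description below only fixes the ten output units:

```python
model.add(Flatten())                        # 2D feature maps -> 1D vector
model.add(Dense(128, activation='relu'))    # hidden fully connected layer (128 units is an assumption)
model.add(Dense(10, activation='softmax'))  # output layer: one unit per digit class
```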
The Flatten layer simply converts the 2D output of the previous layers into a vector. The data needs to be in vector format for the next layer, Dense, to process it.
The Dense layer is a simple fully connected layer that has ‘units’ (otherwise known more colloquially as neurons) and an activation function. The output layer (or the last layer) is a Dense layer and has ten neurons (units) to match the number of classes.
Now that we have built our model, let's compile it, train it, and test it!
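The loss function (categorical cross-entropy) and optimizer (Adam) below are reasonable defaults for this kind of classifier rather than the only possible choices; the ten epochs match the results discussed below:

```python
# configure the model for training
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# train for ten epochs, using the test set to report validation metrics after each epoch
model.fit(X_train, Y_train, validation_data=(X_test, Y_test), epochs=10)

# evaluate on the test set: returns [loss, accuracy]
scores = model.evaluate(X_test, Y_test, verbose=0)
print("Loss: %.2f%%  Accuracy: %.2f%%" % (scores[0] * 100, scores[1] * 100))
```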
Running this code should return something like this:
Results of Model
These functions (compile, fit, and evaluate) are all part of the model class. Essentially, compile configures the model for training, fit trains the model for a given number of iterations (specified as epochs), and evaluate tests the model and returns metrics on how well it did. Since this is a beginner tutorial, we aren't going to go into much more detail about these functions.
Through the printed results, you can see that after ten iterations (epochs) our model has achieved an accuracy of 98.51% and a loss of 4.91%.
You have successfully created your first neural network that accurately classifies handwritten digits! Feel free to experiment with other layers, or even change the order of the layers, and see how that affects the model.