This tutorial will guide you step by step through training and deploying a deep learning model. Having scoured the internet far and wide, I found it difficult to find tutorials that take you from the beginning to the end of building and deploying deep learning models. While there are some excellent articles on various stages of this process (which I used in my own journey into deep learning), I wrote this tutorial to close what I view as an important gap.
This tutorial walks you through the entire process of training a model in TensorFlow and deploying it to Heroku — code available in the GitHub repo here.
The full list of the technologies we are going to use:
TensorFlow and Keras (and of course Python) have been increasingly adopted across industries and research communities:
Deep learning framework rankings computed by Jeff Hale, based on 11 data sources across 7 categories
TensorFlow has gradually increased its power score due to ease of use — it offers APIs for beginners and experts alike to quickly get into developing for desktop, mobile, web or cloud. For an easy introduction to TensorFlow see: (easy-tensorflow)
The Keras website explains why its user adoption rate has been soaring in 2018:
Since training and deployment are complicated and we want to keep it simple, I have divided this tutorial into 2 parts:
In order to benefit from this blog:
One of the hardest parts of solving a deep learning problem is having a well-prepared dataset. There are three general steps to preparing data:
1. Avoid sampling bias in your data.
In statistics, sampling bias is a bias in which a sample is collected in such a way that some members of the intended population are less likely to be included than others. It results in a biased sample, a non-random sample of a population (or non-human factors) in which all individuals, or instances, were not equally likely to have been selected. If this is not accounted for, results can be erroneously attributed to the phenomenon under study rather than to the method of sampling.
(Khan Academy provides great examples on how to identify bias in samples and surveys.)
2. Remove outliers from data.
Outliers are extreme values that deviate from the other observations in the data; they may indicate variability in a measurement, experimental errors, or a novelty. In other words, an outlier is an observation that diverges from the overall pattern of a sample.
3. Transform our dataset into a language a machine can understand: numbers. In this tutorial I will focus only on transforming the dataset, as the other two points each require a blog post of their own to cover in full detail.
To prepare a dataset you must of course first have a dataset. We are going to use the Fashion-MNIST dataset because it is already optimized and labeled for a classification problem.
(Read more about the Fashion-MNIST dataset here.)
The Fashion-MNIST dataset has 70,000 grayscale 28x28px images, separated into the following categories:
+-----------+---------------------+
| Label | Description |
+-----------+---------------------+
| 0 | T-shirt/top |
| 1 | Trouser |
| 2 | Pullover |
| 3 | Dress |
| 4 | Coat |
| 5 | Sandal |
| 6 | Shirt |
| 7 | Sneaker |
| 8 | Bag |
| 9 | Ankle boot |
+-----------+---------------------+
Fortunately, the majority of deep learning (DL) frameworks support the Fashion-MNIST dataset out of the box, including Keras. To download the dataset yourself and see other examples, you can visit the GitHub repo here.
from keras.datasets.fashion_mnist import load_data
# Load the Fashion-MNIST train data and test data
(x_train, y_train), (x_test, y_test) = load_data()
# output
x_train shape: (60000, 28, 28) y_train shape: (60000,)
x_test shape: (10000, 28, 28) y_test shape: (10000,)
By default, the load_data() function returns the training and testing datasets.
It is essential to split your data into training and testing sets.
Training data: is used to train the Neural Network (NN)
Testing data: is used to validate and optimize the results of the Neural Network during the training phase, by tuning and re-adjusting the hyperparameters.
Hyperparameters are parameters whose values are set before the learning process begins.
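Typical examples of hyperparameters are the learning rate, the batch size, and the number of epochs.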
After training a Neural Network, we run the trained model against our validation dataset to make sure that the model generalizes well and is not overfitting.
What is overfitting? :)
Overfitting means a model predicts the right result when tested against the training data, but otherwise fails to predict accurately. If, on the other hand, a model predicts incorrect results even for the training data, this is called underfitting. For a further explanation, read up on overfitting and underfitting.
Thus, we use the validation dataset to detect overfitting or underfitting. But most of the time we train the model multiple times in order to get a higher score on the training and validation datasets. Since we retrain the model based on the validation results, we can end up overfitting not only to the training dataset but also to the validation set. To avoid this, we use a third dataset that is never used during training: the testing dataset.
Here are some samples of the data:
norm_x_train = x_train.astype('float32') / 255
norm_x_test = x_test.astype('float32') / 255
Normalize the data dimensions so that they are of approximately the same scale. In general, normalization makes very deep NNs easier to train, especially convolutional and recurrent neural networks. Here are a nice explanatory video and an article.
from keras.utils import to_categorical
encoded_y_train = to_categorical(y_train, num_classes=10, dtype='float32')
encoded_y_test = to_categorical(y_test, num_classes=10, dtype='float32')
One-hot encoding is a representation of categorical variables as binary vectors. Here is the full explanation if you would like a deeper understanding, and do not hesitate to ask if you have a question.
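For example, with num_classes=10, to_categorical turns the label 3 (Dress) into the binary vector [0, 0, 0, 1, 0, 0, 0, 0, 0, 0].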
The MobileNet V2 model accepts one of the following input sizes: (96, 96), (128, 128), (160, 160), (192, 192), or (224, 224). In addition, the images have to be in 3-channel (RGB) format. Therefore, we need to resize and convert our images from (28 x 28) to (96 x 96 x 3).
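Here is a minimal sketch of such a conversion; the helper name resize_image and the use of TensorFlow image ops (assuming TensorFlow 2.x with eager execution) are my assumptions, not code taken from the repo:
import numpy as np
import tensorflow as tf  # assumption: TensorFlow 2.x, so tf.image.resize runs eagerly

def resize_image(image):
    # (28, 28) -> (28, 28, 1): add a channel axis
    image = np.expand_dims(image, axis=-1)
    # (28, 28, 1) -> (28, 28, 3): repeat the grayscale channel to get an RGB-like image
    image = np.repeat(image, 3, axis=-1)
    # (28, 28, 3) -> (96, 96, 3): resize to a shape MobileNet V2 accepts
    return tf.image.resize(image, (96, 96)).numpy()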
Running the previous code on all our data at once may eat up a lot of memory; therefore, we are going to use a generator. A Python generator is a function that returns an object (an iterator) that we can iterate over, one value at a time.
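Here is a minimal sketch of what the load_data_generator function used later might look like under these assumptions (the implementation in the repo may differ); it converts one batch at a time using the resize_image helper sketched above and loops forever, as Keras generators are expected to:
def load_data_generator(x, y, batch_size=64):
    num_samples = x.shape[0]
    while True:  # Keras consumes the generator indefinitely, one epoch at a time
        for offset in range(0, num_samples, batch_size):
            # Resize/convert only the current batch to keep memory usage low
            x_batch = np.array([resize_image(img) for img in x[offset:offset + batch_size]])
            y_batch = y[offset:offset + batch_size]
            yield x_batch, y_batch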
After we have split, normalized, and converted the dataset, we are now going to train a model.
There are many techniques for training a model; we will cover only one of them, though I believe it is one of the most important methods or strategies: Transfer Learning.
Transfer learning in deep learning means transferring knowledge from one domain to a similar one. In our example, I have chosen the MobileNet V2 model because it is faster to train and small in size. Most importantly, MobileNet is pre-trained on the ImageNet dataset.
ImageNet is an image dataset organized according to the WordNet hierarchy. Each meaningful concept in WordNet, possibly described by multiple words or word phrases, is called a “synonym set” or “synset”. There are more than 100,000 synsets in WordNet, majority of them are nouns (80,000+). In ImageNet, we aim to provide on average 1000 images to illustrate each synset. Images of each concept are quality-controlled and human-annotated.[2]
Since our dataset is, in a way, a subset of the ImageNet dataset, we are going to transfer the knowledge of this model to our dataset. A nice article that explains this in more detail: A Gentle Introduction to Transfer Learning for Deep Learning.
The code below is important to understand when working with Transfer Learning as a technique (here base_model stands for the pre-trained model instance):
for layer in base_model.layers:
    # trainable has to be False in order to freeze the layers
    layer.trainable = False  # or True
When using a pre-trained model (e.g. MobileNetV2 in our case), you need to pay close attention to a concept called Fine Tuning.
‘Fine Tuning’, generally, is when we freeze the weights of all the layers of the pre-trained neural network (on dataset A [e.g. ImageNet]) except the penultimate layer and train the neural network on dataset B [e.g. Fashion-MNIST], just to learn the representations on the penultimate layer. We usually replace the last (softmax) layer with another one of our choice (depending on the number of outputs we need for the new problem).[3]
In our case, we have 10 classes, so we have the following:
output_tensor = Dense(10, activation='softmax')(op)
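Putting these pieces together, a build_model() function (called when compiling the model below) might look roughly like this; the frozen MobileNetV2 base and the 10-class softmax output come from the snippets above, while the global-average-pooling layer in between is my assumption:
from keras.applications.mobilenet_v2 import MobileNetV2
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

def build_model():
    # Pre-trained MobileNetV2 base without its original ImageNet classifier head
    base_model = MobileNetV2(input_shape=(96, 96, 3), include_top=False, weights='imagenet')
    for layer in base_model.layers:
        # Freeze the pre-trained layers so only the new head is trained
        layer.trainable = False
    # Pool the convolutional features and add our own 10-class softmax head
    op = GlobalAveragePooling2D()(base_model.output)
    output_tensor = Dense(10, activation='softmax')(op)
    return Model(inputs=base_model.input, outputs=output_tensor)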
When do we need Fine Tuning?
from keras.optimizers import Adam
model = build_model()
model.compile(optimizer=Adam(),
              loss='categorical_crossentropy',
              metrics=['categorical_accuracy'])
Now we compile the model. Some compilation definitions:
The Adam optimization algorithm is an extension to stochastic gradient descent that has recently seen broader adoption for deep learning applications in computer vision and natural language processing.[4]
A loss function (categorical_crossentropy) is a measure of how good a prediction model does in terms of being able to predict the expected outcome. [5]
categorical_accuracy is a metric function that is used to judge the performance of your model.[6]
train_generator = load_data_generator(norm_x_train, encoded_y_train, batch_size=64)
model.fit_generator(generator=train_generator,
                    steps_per_epoch=900,
                    verbose=1,
                    epochs=5)
It is essential to understand the following when training any deep learning model.
Epoch is when an entire dataset is passed forward and backward through the neural network only once. [7]
Batch Size is the total number of training examples present in a single batch [7]. It goes along with the Python generator mentioned previously.
Iterations (steps_per_epoch) is the number of batches needed to complete one epoch [7].
For a more detailed understanding read — Epoch vs Batch Size vs Iterations.
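For example, with the 60,000 Fashion-MNIST training images and a batch size of 64, one full pass over the data corresponds to roughly 60,000 / 64 ≈ 938 iterations; the steps_per_epoch=900 used above therefore covers most of the training set in each epoch.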
94% accuracy after 5 epochs of training, but how would we do on the test dataset?
test_generator = load_data_generator(norm_x_test, encoded_y_test, batch_size=64)
model.evaluate_generator(generator=test_generator, steps=900, verbose=1)
86% seems reasonable for the amount of time spent training: 1 hour of CPU time.
Things you could try to improve the accuracy:
Make sure you save the model because we are going to use it in the next part.
model_name = "tf_serving_keras_mobilenetv2"
model.save(f"models/{model_name}.h5")
We have prepared the fashion dataset for the MobileNetV2 model. Further, we used the MobileNet model as our base model for Transfer Learning.
In part 2 we are going to prepare the model to be served by TensorFlow Serving. Then we will deploy the model to Heroku.
Questions/suggestions? — Leave them below in the comments.