A year ago, I started learning neural networks with TensorFlow. The journey was not as smooth as I expected. Thanks to Andrew Ng’s online course and several books, I had a basic understanding of the theory; however, when I tried to apply it to real-life projects, the syntax and API of TensorFlow sometimes confused me (maybe I wasn’t spending enough time with it).
To get early results for my project, I directly applied the TensorFlow computer vision API. However, when it came to customising models, it took me a long time to get everything sorted.
Until one day, I found MXNet Gluon.
Gluon is an interface for MXNet, the deep learning framework supported by Amazon. Gluon is similar to other high-level APIs like Keras, PyTorch and Chainer, but it has its own points to be loved. The biggest one is [HybridBlock](https://mxnet.incubator.apache.org/api/python/gluon.html#mxnet.gluon.HybridBlock): you can develop and debug easily with your Jupyter notebook in imperative mode, and enjoy the performance optimisation when you convert the block to symbolic with ease.

Transfer learning is a technique to reuse the learned representation of an existing model and apply it to a different but related domain.
The reason to use transfer learning is that it takes a long time and a lot of resources to train a neural network from scratch. Usually we use transfer learning in two ways: as a fixed feature extractor, or to initialise the parameters of a model that we then fine-tune.

Transfer learning as a topic can involve a long discussion. In this article, we will mainly look at using it to initialise parameters.
husky or akita
Given a photo of our favourite pet dog, is it a husky or an akita?
Why this problem? It’s better to work on a problem that interests you, and the dataset should be easy to find. (Huskies and Akitas are cute, and it’s easy to find their photos.)
pip install mxnet
pip install jupyter
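To confirm the installation works, a quick check in a notebook cell (this snippet is my own addition, not from the original post):

import mxnet as mx

print(mx.__version__)        # should print the installed MXNet version
print(mx.nd.ones((2, 3)))    # a small NDArray to confirm the runtime works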
Data source: Google Images, searched with the keywords husky and akita.
The dataset will be divided into 4 parts:

- sample: 16 images, for development and debugging
- train: 70%, to train the model
- validation: 15%, to diagnose the model, detect overfitting/underfitting, and choose between models if we have multiple versions
- test: 15%, to evaluate the accuracy of the model

MXNet provides an [im2rec.py](https://github.com/apache/incubator-mxnet/blob/master/tools/im2rec.py) script to generate the rec file; it requires a lst file to specify file locations and labels. The notebook for generating the lst file and rec file can be found here.
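If you prefer not to use the notebook, here is a rough sketch of how such a lst file can be written by hand. The im2rec lst format is tab-separated: an integer index, the label, and the image path relative to a root directory. The directory layout and label mapping below are my assumptions for illustration, not paths from the original post.

import os

def write_lst(root, classes, out_path):
    # write an im2rec-style .lst file: index<TAB>label<TAB>relative_path
    with open(out_path, 'w') as f:
        idx = 0
        for label, cls in enumerate(classes):
            cls_dir = os.path.join(root, cls)
            for name in sorted(os.listdir(cls_dir)):
                f.write('%d\t%f\t%s\n' % (idx, float(label), os.path.join(cls, name)))
                idx += 1

# assumed layout: ./images/train/husky/*.jpg and ./images/train/akita/*.jpg
write_lst('./images/train', ['husky', 'akita'], './data/train/dog.lst')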
After generating the dataset, you will have sample, train, validation and test folders under data, each containing its own lst and rec file.
We will use MobileNet for our transfer learning task. MobileNet is an efficient convolutional neural network architecture. It replaces the regular convolution layer with a 3x3 depthwise convolution followed by a 1x1 pointwise convolution, which reduces computational complexity.
regular convolution vs depthwise separable convolution (reference)
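To make the idea concrete, here is a small Gluon sketch (my own, not from the original post) comparing a regular convolution with the depthwise separable version MobileNet relies on; the channel sizes and input shape are arbitrary.

from mxnet import nd
from mxnet.gluon import nn

in_channels, out_channels = 32, 64

# regular 3x3 convolution: every output channel looks at every input channel
regular = nn.Conv2D(out_channels, kernel_size=3, padding=1, in_channels=in_channels)

# depthwise separable convolution: a 3x3 depthwise conv (groups == in_channels,
# one filter per channel) followed by a 1x1 pointwise conv that mixes channels
separable = nn.HybridSequential()
separable.add(
    nn.Conv2D(in_channels, kernel_size=3, padding=1,
              groups=in_channels, in_channels=in_channels),
    nn.Conv2D(out_channels, kernel_size=1, in_channels=in_channels),
)

regular.initialize()
separable.initialize()

x = nd.random.uniform(shape=(1, in_channels, 56, 56))
print(regular(x).shape)    # (1, 64, 56, 56)
print(separable(x).shape)  # (1, 64, 56, 56), with far fewer parameters

Counting weights shows the saving: the regular layer has 3x3x32x64 = 18,432 parameters, while the separable pair has 3x3x32 + 32x64 = 2,336.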
Thanks to the Gluon community, we have MobileNet pretrained on ImageNet. Let’s explore it a bit in a Jupyter notebook.
from mxnet.gluon.model_zoo.vision import mobilenet1_0

pretrained_net = mobilenet1_0(pretrained=True)
print(pretrained_net)
Once you run it, you will find that the model consists of two high-level blocks: features and output. Since the ImageNet version has an output shape of 1000 classes, we need to create a new model with an output shape of 2.
net = mobilenet1_0(classes=2)
To reuse the weights of the pretrained model, we can directly assign them to the features block of the new model and initialise only the new output block.
from mxnet import init

net.features = pretrained_net.features
net.output.initialize(init.Xavier())
That’s it ~
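A related trick, not covered in the original post: if you want to treat the pretrained features as a fixed feature extractor instead of fine-tuning them, you can freeze their gradients so that only the new output block gets updated during training.

# freeze the copied feature weights so the optimiser leaves them untouched
net.features.collect_params().setattr('grad_req', 'null')

# to fine-tune them again later, switch the gradient requests back on
# net.features.collect_params().setattr('grad_req', 'write')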
Before training the model, let’s create the data loader for our husky & akita dataset. We will use the image augmentation functions provided by MXNet to generate new samples and help avoid overfitting.
from mxnet.image import color_normalize
from mxnet import image, nd

train_augs = [
    image.ResizeAug(224),
    image.HorizontalFlipAug(0.5),   # flip the image horizontally
    image.BrightnessJitterAug(.3),  # randomly change the brightness
    image.HueJitterAug(.1)          # randomly change hue
]

test_augs = [
    image.ResizeAug(224)
]

def transform(data, label, augs):
    data = data.astype('float32')
    for aug in augs:
        data = aug(data)
    data = nd.transpose(data, (2, 0, 1))  # HWC -> CHW, as the network expects
    return data, nd.array([label]).asscalar().astype('float32')
Now we can create a data iterator with the augmentations we defined.
from mxnet.gluon.data.vision import ImageRecordDataset

train_rec = './data/train/dog.rec'
validation_rec = './data/validation/dog.rec'

trainIterator = ImageRecordDataset(
    filename=train_rec,
    transform=lambda X, y: transform(X, y, train_augs))
validationIterator = ImageRecordDataset(
    filename=validation_rec,
    transform=lambda X, y: transform(X, y, test_augs))
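Before wiring the dataset into a training loop, it can be worth peeking at one transformed sample. This sanity check is my own addition, not from the original post:

sample_img, sample_label = trainIterator[0]
print(sample_img.shape)   # channels-first (C, H, W) after the transform
print(sample_label)       # 0.0 or 1.0, depending on the label in the lst file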
Let’s define our training function.
from mxnet import gluon

def train(net, ctx, batch_size=64, epochs=10, learning_rate=0.01, wd=0.001):
    train_data = gluon.data.DataLoader(
        trainIterator, batch_size, shuffle=True)
    validation_data = gluon.data.DataLoader(
        validationIterator, batch_size)

    net.collect_params().reset_ctx(ctx)
    net.hybridize()

    loss = gluon.loss.SoftmaxCrossEntropyLoss()
    trainer = gluon.Trainer(net.collect_params(), 'sgd', {
        'learning_rate': learning_rate, 'wd': wd})

    train_util(net, train_data, validation_data,
               loss, trainer, ctx, epochs, batch_size)
The train_util function can be found here. It is the usual training loop that gets reused across projects, so I will not post it in full here.
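For completeness, here is a minimal sketch of what a train_util like this typically does. It is my own simplified version, not the author's exact code, and it assumes an evaluate_accuracy helper like the one sketched further below in the test section.

from mxnet import autograd, nd

def train_util(net, train_data, validation_data,
               loss_fn, trainer, ctx, epochs, batch_size):
    for epoch in range(epochs):
        train_acc, n = 0.0, 0
        for data, label in train_data:
            data = data.as_in_context(ctx)
            label = label.as_in_context(ctx)
            with autograd.record():        # record the forward pass
                output = net(data)
                loss = loss_fn(output, label)
            loss.backward()                # backpropagate
            trainer.step(data.shape[0])    # update the weights
            train_acc += nd.sum(output.argmax(axis=1) == label).asscalar()
            n += data.shape[0]
        val_acc = evaluate_accuracy(validation_data, net)
        print('epoch %d: train acc %.3f, validation acc %.3f'
              % (epoch, train_acc / n, val_acc))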
There are some key points to highlight:

- hybridize: transforms your imperative model into a symbolic one, so the computation graph can be optimised for faster training.
- Trainer: where we define our learning rate, optimisation algorithm and weight decay.

With everything in place, we can kick start the training!
After 5 epochs, we can see that the training accuracy and validation accuracy both went up.
training process
So we achieved 0.97 accuracy on the training set. However, we can’t use training accuracy to evaluate a model; the model may just be overfitting the distribution of the training data. (Once I purposely overfitted a training set to 99.9% accuracy, and it did poorly on real data.)
To evaluate the performance of our model, we need to use our test dataset, which is never used in training, model selection or parameter fine-tuning.
test_data_loader = gluon.data.DataLoader(testIterator, 64)
test_acc = evaluate_accuracy(test_data_loader, net)
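The testIterator and evaluate_accuracy used above are not defined in the snippets of this post. A minimal version could look like the following; the ./data/test/dog.rec path is my assumption, mirroring the train and validation paths.

import mxnet as mx

test_rec = './data/test/dog.rec'   # assumed path, mirroring train/validation
testIterator = ImageRecordDataset(
    filename=test_rec,
    transform=lambda X, y: transform(X, y, test_augs))

def evaluate_accuracy(data_iterator, net):
    # run the data on whatever device the network's parameters live on
    ctx = list(net.collect_params().values())[0].list_ctx()[0]
    acc = mx.metric.Accuracy()
    for data, label in data_iterator:
        output = net(data.as_in_context(ctx))
        acc.update(labels=[label.as_in_context(ctx)],
                   preds=[output.argmax(axis=1)])
    return acc.get()[1]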
The test accuracy is 0.93, so our model is not as good as we thought.
The validation accuracy reached 0.88 in the first epoch, which feels unreal.
Yes, and the reason is that husky is already a label in ImageNet, so the pretrained features carry a lot of relevant knowledge, and our dataset is not very big.

Why is there always a gap between the training accuracy and the validation accuracy?
It shows that we are overfitting our model. We can apply more regularisation by increasing weight_decay, or we should try to get more training data.
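For example, assuming the train function above is reused as-is, retraining with a stronger weight decay is a small change. The device selection line is my own addition, since the post never shows how ctx is created.

import mxnet as mx

# pick a GPU if one is available, otherwise fall back to CPU
ctx = mx.gpu() if mx.context.num_gpus() > 0 else mx.cpu()

# retrain with 10x the previous weight decay for stronger L2 regularisation
train(net, ctx, epochs=10, learning_rate=0.01, wd=0.01)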
This article shows how simple it is to use MXNet Gluon for transfer learning, and it can serve as a guide when you apply the technique to your own problem.
There are still many interesting topics I haven’t covered, such as customising blocks, building your own architecture and deploying models in different environments. I will try to cover them in future posts.
Source code: https://github.com/MrXu/transfer-learning-with-gluon