How can you set up your own Convolutional Neural Network? Let's try to solve that in this article. We will be working on the image segmentation problem that I discussed in the first part of this series.
There are a lot of libraries available for creating a Convolutional Neural Network. We will be choosing Keras and TensorFlow. The first question that comes to mind is:
Why these two specifically? Why not just TensorFlow?
In the machine learning library space there are a lot of options. TensorFlow, Theano, PyTorch, Caffe and Torch are a few of the notable ones. A big shoutout to PyTorch by Soumith Chintala and team. You guys created an awesome library. Hopefully you will take over the world (**evil grin**).
PyTorch planning to take over the world :P
Andrej Karpathy has high hopes for TensorFlow
All of these are low-level libraries: they deal with GPU/CPU acceleration and the optimisation of matrix computations, so building networks directly on top of them can become challenging. Keras is a high-level library. It helps you create layers of neurons and abstracts away the complexities of implementing the underlying calculations. Keras works with either Theano or TensorFlow as a backend. I chose TensorFlow as the backend since it has better community support.
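To get a feel for that abstraction, here is a minimal sketch using the Keras Sequential API (in the Keras 1.x style used throughout this article). The layer sizes are arbitrary, chosen only for illustration:

from keras.models import Sequential
from keras.layers import Dense, Activation

# a tiny binary classifier: 100 input features -> 64 hidden units -> 1 output
model = Sequential()
model.add(Dense(64, input_dim=100))
model.add(Activation('relu'))
model.add(Dense(1))
model.add(Activation('sigmoid'))

# the backend (Theano or TensorFlow) takes care of the actual
# matrix computation and GPU acceleration behind these calls
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

Not a single weight matrix or GPU kernel written by hand; that is what Keras buys you.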
KEras & TEnsorflow (KETE) combo rocks.
Let's get our hands dirty. Don't think about running this on your regular system; it will die while training on these datasets. So let's get an AWS server. (If you have an insane gaming rig, feel free to go ahead and set it up locally.) We will be using a g2.2xlarge instance from AWS. It comes with an NVIDIA GRID K520 GPU and costs USD 0.65/hr. Why did we choose this? Because it is the cheapest GPU instance available in the cloud, and it will still perform better than most of the hardware we have at home. Next up is which OS to use. Ubuntu 16.04 LTS would definitely make sense, but wait a minute. We will be using a pre-baked AMI that has a lot of tools built in; this way we can skip most of the setup. Search for the Deep Learning AMI from AWS. There are other good deep learning AMIs as well, so feel free to explore. At a minimum, we need an AMI with Python 2.7 and TensorFlow installed.
GPU instance and Deep Learning AMI on AWS
After you have selected the instance type and AMI, go ahead and create a key. You can use an existing key if you already have one; for this article, we will be creating one. Assume the name of the key file is deepkey.pem. Download the key and keep it somewhere safe. Launch the instance; it will take around 5-10 minutes to come up. In the meantime, change the permissions of the key to 400, otherwise ssh will not let you log in.
chmod 400 ~/deepkey.pem
Next, go to the list view of EC2 instances and select the instance that just got created. Copy the instance's Public DNS. It will look something like this: ec2-52-24-183-62.us-west-2.compute.amazonaws.com
# Next let's log in to the system
ssh ec2-user@ec2-52-24-183-62.us-west-2.compute.amazonaws.com -i ~/deepkey.pem

# The AMI might be a bit outdated, so it's always better to update
sudo yum update

# Install pip so we can get Keras
sudo yum install python-pip

# Upgrade the pip that got installed
sudo /usr/local/bin/pip install --upgrade pip

# Install Keras
sudo /usr/local/bin/pip install keras
By default, Keras gets installed with Theano as the configured backend. We are going to use TensorFlow, so let's change that. Open ~/.keras/keras.json and update it as shown below. The file should look like the section below.
{
    "image_dim_ordering": "tf",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow"
}
I hope you have followed all the steps through without errors. Let's test our installation: open python and import keras to try it out. The output should look something like below.
Test Keras installation
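For reference, this is roughly what that check looks like in a terminal (the exact banner text may vary with your Keras version):

$ python
>>> import keras
Using TensorFlow backend.

If you see the TensorFlow banner instead of a Theano one, the keras.json change took effect.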
So now you have Python, TensorFlow and Keras installed. The AMI also comes with Theano and other things pre-installed, but we are not going to use them. Don't bother uninstalling them; they won't interfere. Enough installation, let's dig into the code.
Don’t waste time installing, spend time on learning and implementing.
We are going to train a network to classify dogs and cats, using data from Kaggle. Before that, we will start by writing a simple model; this will help you get an understanding of how Keras works. I'll start with the code. If you notice, there are comments before each line of the code that explain, to some extent, what is happening in that particular line. To run this code, you can either use your own set of cat and dog images or download the sample data from Kaggle. You will have to sign up and join the Kaggle competition to be able to download the sample data. Here is the Kaggle link.
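One assumption worth spelling out before the code: flow_from_directory (used below) expects one sub-folder per class. So the downloaded images should be split into a layout like this, where the image counts match train_samples and validation_samples in the script:

data/
    train/
        cats/    # e.g. cat.0.jpg, cat.1.jpg, ...
        dogs/    # e.g. dog.0.jpg, dog.1.jpg, ...
    validation/
        cats/
        dogs/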
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense

# expected image size
img_width, img_height = 150, 150

# folder containing the images on which
# the network will train. The train folder
# has two sub folders, dogs and cats.
train_data_dir = 'data/train'

# folder containing the validation samples;
# folder structure is same as the training folder
validation_data_dir = 'data/validation'

# how many images to be considered for training
train_samples = 2000

# how many images to be used for validation
validation_samples = 800

# how many runs will the network make
# over the training set before starting on
# validation
epoch = 50

# ** Model Begins **
model = Sequential()
model.add(Convolution2D(32, 3, 3, input_shape=(3, img_width, img_height)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Convolution2D(32, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Convolution2D(64, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))
# ** Model Ends **

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

# this is the augmentation configuration we will use for training;
# we generate a lot of transformed images so that the model
# can handle variety in real world scenarios
train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

# this is the augmentation configuration we will use for testing:
# only rescaling
test_datagen = ImageDataGenerator(rescale=1./255)

# this section takes images from the folder
# and passes them on to the ImageDataGenerator,
# which then creates a lot of transformed versions
train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=32,
    class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_width, img_height),
    batch_size=32,
    class_mode='binary')

# this is where the actual training happens;
# it will take some time to run this step
model.fit_generator(
    train_generator,
    samples_per_epoch=train_samples,
    nb_epoch=epoch,
    validation_data=validation_generator,
    nb_val_samples=validation_samples)

model.save_weights('trial.h5')
The code is pretty self-explanatory. Replace the section between "Model Begins" and "Model Ends" with other models and you will have your very own classifier code. Let me walk you through it. First you import a few Keras dependencies. Then you define the image dimensions that will be passed to the network. After that you tell the code where the image sets are, both the training dataset and the validation dataset. Then you build the model, from "Model Begins" to "Model Ends". I am not going into the depths of the model, as it is a simple VGG-style stack of convolution layers. Details about that style of network architecture can be found in the following arXiv paper:
Very Deep Convolutional Networks for Large-Scale Image Recognition. K. Simonyan and A. Zisserman. arXiv:1409.1556
Next up in the code is generating a few transforms of the data. Here you shear, zoom and flip the images so that the network does not get overtrained on the exact pictures it has seen. You then create generators so that the code can read images from the specified folders. After that the processing starts: the system trains and validates for the number of epochs mentioned. Finally, we save the weights so that we can use them in future without having to train the network all over again. If you have further doubts, please highlight and ask questions. I'll try to answer them to the best of my knowledge.
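As a follow-up, here is a minimal sketch of reusing those saved weights to classify a single new image. It assumes the same `model` from the listing above has already been built and compiled in the session, and 'my_image.jpg' is just a placeholder path:

import numpy as np
from keras.preprocessing.image import load_img, img_to_array

# load the weights we saved after training into the same architecture
model.load_weights('trial.h5')

# 'my_image.jpg' stands in for whatever image you want to classify
img = load_img('my_image.jpg', target_size=(img_width, img_height))
x = img_to_array(img) / 255.0   # apply the same 1/255 rescaling as the generators
x = np.expand_dims(x, axis=0)   # turn the single image into a batch of one

# the sigmoid output is a probability; flow_from_directory assigns
# class indices alphabetically, so here cats map to 0 and dogs to 1
print(model.predict(x)[0][0])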
The above model is a simple one, and it is there only for the sake of a simpler explanation. Cat and dog classification might not be that successful with the amount of data we have, so we go for transfer learning. In transfer learning, we take a model that was trained to solve a similar problem statement and reuse its trained weights to solve a different statement altogether. Why does this work? Because the model we are going to reuse was itself trained for image classification, and the layers deep inside such a network learn generic features: they work at the level of detecting edges and curves. Hence the term transfer learning: you transfer learning from one problem statement to another. This alone might work well for us, but we can make it work better. We also train the top layers, the ones that deal with the actual classes being classified, on our own training dataset. We can call this dataset domain specific; it gives the network an understanding of exactly what we want to classify. So the code goes as follows:
import os
import h5py
import numpy as np
from keras.preprocessing.image import ImageDataGenerator
from keras import optimizers
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, ZeroPadding2D
from keras.layers import Activation, Dropout, Flatten, Dense

# path to the model weights files
weights_path = 'vgg16_weights.h5'
top_model_weights_path = 'fc_model.h5'
# dimensions of our images
img_width, img_height = 150, 150

train_data_dir = 'data/train'
validation_data_dir = 'data/validation'
nb_train_samples = 2000
nb_validation_samples = 800
nb_epoch = 50

# build the VGG16 network
model = Sequential()
model.add(ZeroPadding2D((1, 1), input_shape=(3, img_width, img_height)))

model.add(Convolution2D(64, 3, 3, activation='relu', name='conv1_1'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(64, 3, 3, activation='relu', name='conv1_2'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))

model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(128, 3, 3, activation='relu', name='conv2_1'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(128, 3, 3, activation='relu', name='conv2_2'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))

model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(256, 3, 3, activation='relu', name='conv3_1'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(256, 3, 3, activation='relu', name='conv3_2'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(256, 3, 3, activation='relu', name='conv3_3'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))

model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu', name='conv4_1'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu', name='conv4_2'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu', name='conv4_3'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))

model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu', name='conv5_1'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu', name='conv5_2'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, 3, 3, activation='relu', name='conv5_3'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))

# load the weights of the VGG16 network
# (trained on ImageNet, won the ILSVRC competition in 2014)
# note: when there is a complete match between your model definition
# and your weight savefile, you can simply call model.load_weights(filename)
assert os.path.exists(weights_path), 'Model weights not found (see "weights_path" variable in script).'
f = h5py.File(weights_path)
for k in range(f.attrs['nb_layers']):
    if k >= len(model.layers):
        # we don't look at the last (fully-connected) layers in the savefile
        break
    g = f['layer_{}'.format(k)]
    weights = [g['param_{}'.format(p)] for p in range(g.attrs['nb_params'])]
    model.layers[k].set_weights(weights)
f.close()
print('Model loaded.')

# build a classifier model to put on top of the convolutional model
top_model = Sequential()
top_model.add(Flatten(input_shape=model.output_shape[1:]))
top_model.add(Dense(256, activation='relu'))
top_model.add(Dropout(0.5))
top_model.add(Dense(1, activation='sigmoid'))

# note that it is necessary to start with a fully-trained
# classifier, including the top classifier,
# in order to successfully do fine-tuning
top_model.load_weights(top_model_weights_path)

# add the model on top of the convolutional base
model.add(top_model)

# set the first 25 layers (up to the last conv block)
# to non-trainable (weights will not be updated)
for layer in model.layers[:25]:
    layer.trainable = False

# compile the model with an SGD/momentum optimizer
# and a very slow learning rate
model.compile(loss='binary_crossentropy',
              optimizer=optimizers.SGD(lr=1e-4, momentum=0.9),
              metrics=['accuracy'])

# prepare data augmentation configuration
train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_height, img_width),
    batch_size=32,
    class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_height, img_width),
    batch_size=32,
    class_mode='binary')

# fine-tune the model
model.fit_generator(
    train_generator,
    samples_per_epoch=nb_train_samples,
    nb_epoch=nb_epoch,
    validation_data=validation_generator,
    nb_val_samples=nb_validation_samples)
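One thing this listing does not do is persist the fine-tuned weights. If you want to reuse them later without repeating the training run, add a save at the end (the file name here is just a suggestion):

# save the fine-tuned weights so the run doesn't have to be repeated
model.save_weights('vgg16_finetuned.h5')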
The weights for VGG16 can be acquired from my GitHub gist. You can also generate the fc_model weight file by running this code on your own dataset, using the same VGG16 weights from the link shared. You can tweak the number of epochs to get better learning, but don't go overboard, as that might lead to overfitting. I have been using this technique on a lot of practical use cases at my workplace. One use case is distinguishing between prescriptions and non-prescriptions: we reused the exact same model that classified cats and dogs, pre-trained on ImageNet, to classify prescriptions. I hope you can use this on practical cases in the real world too. Do respond with any interesting case that you have solved using this method.
This article borrows heavily from a blog post on the Keras blog. Do follow me on Twitter, and you can also sign up for a small and infrequent mailing list that I maintain. If you liked this article, please hit the ❤ button to recommend it; this will help other Medium users find it.