Before I came to Germany, I was fascinated by stories about how there are no speed limits on German roads. To my disappointment, speed limits do exist almost everywhere. There are only certain stretches of the highway (autobahn) that are designated as no-speed-limit zones, where cars and other vehicles can test the limits of their engineering.
What if a driver is speeding and suddenly enters a zone where a certain speed limit applies? The most common solution is for the driver to apply the brakes manually and reduce the car's speed. But hey, we're in 2017 and computers can now recognize more than cats and dogs! So let's try to make our computers recognize speed limit signs automatically!
I based my implementation on a tutorial called Introduction to Convolutional Neural Networks using TensorFlow and Keras by Oliver Zeigermann. It was a really good talk, and I liked that he explored the implementation from the perspective of an engineer rather than that of a researcher. I have been actively trying both approaches: learning some of the math while also learning how to implement it using various libraries.
One of the key things I discovered through this tutorial was Microsoft Azure Notebooks. During his tutorial, Zeigermann asked the audience to run the notebooks either locally or on the cloud. He said he preferred the cloud since that machine was more powerful than his own, and the same was true of my machine. Prior to this, I had always run Jupyter notebooks locally on my CPU and had issues with my machine heating up quickly. So I decided to try out Azure Notebooks, and I had a really good experience: they are fast, they can be set up in no time, and they are free! I searched for the catch, but there isn't one. For people just starting out in Python and data science, Azure Notebooks can help them get off the ground quickly.
The first task for implementing any data science or AI project is to find a suitable dataset. The dataset for this tutorial was provided by the presenter. He mentions in the video that the tutorial is based on a similar project by Waleed Abdulla, so I assume the data comes from that project, which in turn uses the Belgian Traffic Sign Dataset.
After obtaining the data, we still need to prepare it in a form our machine learning models can use. The images in the dataset are in the .ppm format. We convert them into a form usable for analysis using the skimage library. Here's the function that performs the conversion.
import os
import skimage.data

def load_data(data_dir):
    directories = [d for d in os.listdir(data_dir)
                   if os.path.isdir(os.path.join(data_dir, d))]
    labels = []
    images = []
    for d in directories:
        label_dir = os.path.join(data_dir, d)
        file_names = [os.path.join(label_dir, f)
                      for f in os.listdir(label_dir) if f.endswith(".ppm")]
        for f in file_names:
            images.append(skimage.data.imread(f))
            labels.append(int(d))
    return images, labels
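The call to this function isn't shown in the original code, so here is a minimal sketch of how one might invoke it; the directory name is a placeholder and depends on where you extracted the dataset.

# Hypothetical usage: "Training" is a placeholder for the
# directory the dataset was extracted into.
images, labels = load_data("Training")
print("Loaded {0} images with {1} unique labels.".format(len(images), len(set(labels))))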
After we load our data using the load_data function, we try to see how our images and labels are organized. We write a function that allows us to visualize our data.
import matplotlib
import matplotlib.pyplot as plt

def display_images_and_labels(images, labels):
    """Display the first image of each label."""
    unique_labels = set(labels)
    plt.figure(figsize=(15, 15))
    i = 1
    for label in unique_labels:
        # Pick the first image for each label.
        image = images[labels.index(label)]
        plt.subplot(8, 8, i)  # A grid of 8 rows x 8 columns
        plt.axis('off')
        plt.title("Label {0} ({1})".format(label, labels.count(label)))
        i += 1
        _ = plt.imshow(image)
    plt.show()

display_images_and_labels(images, labels)
This results in the following.
Speed limit images with their respective labels and counts
We see that our dataset has 6 different categories of speed limit images. We have images that display 30, 50, 70, 80, 100 and 120, with 79, 81, 68, 53, 41 and 57 samples of each category, respectively. This isn't a very large dataset, but it works well enough for our problem.
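As a quick sanity check (my own addition, not part of the tutorial), the same counts can be tallied directly from the labels list:

from collections import Counter

# Count how many samples each label has; this should match
# the counts shown in the plot above.
print(Counter(labels))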
Having observed the way our images are organized, let's now look at the shape of each image, along with its minimum and maximum RGB values.
for image in images[:5]:
    print("shape: {0}, min: {1}, max: {2}".format(image.shape, image.min(), image.max()))
Output:
shape: (21, 22, 3), min: 27, max: 248
shape: (23, 23, 3), min: 10, max: 255
shape: (43, 42, 3), min: 11, max: 254
shape: (61, 58, 3), min: 3, max: 255
shape: (28, 27, 3), min: 8, max: 65
As we can clearly see from the output, the shape of each image varies, and so do the minimum and maximum RGB values. Our neural network model expects all images to have the same shape. While resizing, let us also normalize the minimum and maximum RGB values to the range 0 to 1. We use skimage's transform module to resize the images to 64x64 pixels with 3 channels for RGB.
import skimage.transform

images64 = [skimage.transform.resize(image, (64, 64)) for image in images]
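A quick check (my own, not in the tutorial) confirms that resizing also rescaled the pixel values, since skimage's resize converts images to floats in the 0-1 range by default:

# Re-run the inspection loop on the resized images; every shape
# should now be (64, 64, 3) and values should lie in [0, 1].
for image in images64[:5]:
    print("shape: {0}, min: {1}, max: {2}".format(image.shape, image.min(), image.max()))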
Finally, we perform one last step of data preparation where we convert our images and labels to numpy arrays. We also use the to_categorical function to convert our labels into one-hot encoded arrays, which contain a 1 at the index of the represented category and 0 everywhere else.
import numpy as np
from keras.utils.np_utils import to_categorical

y = np.array(labels)
X = np.array(images64)
num_categories = 6
y = to_categorical(y, num_categories)
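To make the one-hot encoding concrete, here is a small illustrative example, assuming (as in this dataset) that the labels are the integers 0 through 5:

# Hypothetical example: three labels out of 6 categories.
to_categorical([0, 2, 5], 6)
# array([[ 1.,  0.,  0.,  0.,  0.,  0.],
#        [ 0.,  0.,  1.,  0.,  0.,  0.],
#        [ 0.,  0.,  0.,  0.,  0.,  1.]])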
After this stage, Zeigermann explained how he experimented with training a simple Keras model, which performed well on the training data but didn't do as well on the test data. He also discusses why RMSProp seems to be a better optimizer than methods like SGD, Adagrad, etc.
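The comparison code isn't shown in the talk, but a minimal sketch of how one might run it looks like this; build_model is a hypothetical helper standing in for whatever returns a fresh, uncompiled copy of the model, and the sketch assumes the train/test split shown later:

# Hypothetical sketch: try several optimizers on the same architecture.
# build_model() is assumed to return a fresh, uncompiled Keras model.
for optimizer in ['sgd', 'adagrad', 'rmsprop']:
    model = build_model()
    model.compile(optimizer=optimizer,
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    history = model.fit(X_train, y_train, nb_epoch=10,
                        batch_size=100, verbose=0)
    print(optimizer, history.history['acc'][-1])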
Finally, the experiments led him to Convolutional Neural Networks. Convolutional Neural Networks (CNNs) are a type of neural network that explicitly assumes the inputs are images. Much of the deep learning revolution in computer vision has been led by CNNs. Here's an example of their architecture.
CNN Architecture. Source: http://cs231n.github.io/convolutional-networks/
Since we are focusing on the application side of CNNs here, I will not go into the details of what each layer is and how to tune the hyperparameters. We build our CNN using the Keras library. Keras is a high-level API built on top of TensorFlow and Theano (though Theano is no longer maintained). Let us look at the code now.
from keras.models import Model
from keras.layers import Dense, Flatten, Input, Dropout
from keras.layers import Convolution2D, MaxPooling2D
from sklearn.model_selection import train_test_split

inputs = Input(shape=(64, 64, 3))

x = Convolution2D(32, 4, 4, border_mode='same', activation='relu')(inputs)
x = Convolution2D(32, 4, 4, border_mode='same', activation='relu')(x)
x = Convolution2D(32, 4, 4, border_mode='same', activation='relu')(x)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = Dropout(0.25)(x)

# one more block
x = Convolution2D(64, 4, 4, border_mode='same', activation='relu')(x)
x = Convolution2D(64, 4, 4, border_mode='same', activation='relu')(x)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = Dropout(0.25)(x)

x = Flatten()(x)
# fully connected, 256 nodes
x = Dense(256, activation='relu')(x)
x = Dropout(0.50)(x)

# softmax activation, 6 categories
predictions = Dense(6, activation='softmax')(x)

model = Model(input=inputs, output=predictions)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
# The 80/20 train/test split the text refers to below.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model.fit(X_train, y_train, nb_epoch=50, batch_size=100)
The model now runs for 50 epochs with a batch size of 100. It trains on 80% of the total data, split using sklearn's train_test_split function as shown above. Here's what we get after running for 50 epochs.
CNN output on training data after 50 epochs
The model converges with a loss of 0.2489 and an accuracy of about 92%. Not bad for a model without any hyperparameter tuning! Let us look at the accuracy of the model on the training dataset.
train_loss, train_accuracy = model.evaluate(X_train, y_train, batch_size=32)
train_loss, train_accuracy
We get an accuracy of 83.49% on the training data with a loss of 0.41. Let’s look at the performance on our testing dataset.
test_loss, test_accuracy = model.evaluate(X_test, y_test, batch_size=32)
test_loss, test_accuracy
We get an accuracy of about 68% with a loss of 1.29. This is not that great. But there is something interesting happening here, which Zeigermann mentioned as well. This was my second run of the model; on my previous run, I got a test accuracy of about 95%. Clearly, the model outputs are not deterministic. Perhaps there is a way to make them more stable; I will have to read more on that.
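One common first step toward more repeatable runs is to fix the random seeds before building the model. This is only a sketch; with TensorFlow 1.x (which Keras used at the time), GPU-level nondeterminism can still cause small run-to-run differences:

import random
import numpy as np
import tensorflow as tf

# Fix the seeds used by Python, NumPy and TensorFlow so that
# weight initialization and data shuffling are repeatable.
random.seed(42)
np.random.seed(42)
tf.set_random_seed(42)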
Finally, let us visualize some of the predictions of our model on the test dataset. We randomly sample 10 images from the test set and use matplotlib to plot them, showing the predicted class along with its ground-truth value. If the model's prediction is correct, we color it green; otherwise, red.
import random

random.seed(3)
sample_indexes = random.sample(range(len(X_test)), 10)
sample_images = [X_test[i] for i in sample_indexes]
sample_labels = [y_test[i] for i in sample_indexes]

# get the indices of the array using argmax
ground_truth = np.argmax(sample_labels, axis=1)
X_sample = np.array(sample_images)
prediction = model.predict(X_sample)
predicted_categories = np.argmax(prediction, axis=1)

# Display the predictions and the ground truth visually.
def display_prediction(images, true_labels, predicted_labels):
    fig = plt.figure(figsize=(10, 10))
    for i in range(len(true_labels)):
        truth = true_labels[i]
        prediction = predicted_labels[i]
        plt.subplot(5, 2, 1 + i)
        plt.axis('off')
        color = 'green' if truth == prediction else 'red'
        plt.text(80, 10, "Truth: {0}\nPrediction: {1}".format(truth, prediction),
                 fontsize=12, color=color)
        plt.imshow(images[i])

display_prediction(sample_images, ground_truth, predicted_categories)
As you can see below, our model correctly predicts 8 out of 10 randomly sampled images from the test dataset. Not bad! The previous time, when I got an accuracy of 95% on the test dataset, it predicted 10/10 images correctly!
This was a great project, and I definitely learned a lot. Thanks to Oliver Zeigermann for his tutorial (and the code) and to Waleed Abdulla for his blog post. I am looking forward to doing more walk-throughs and exploring this amazingly crazy world of Artificial Intelligence!