“Close up of modern metal sculpture of human face at Canary Wharf” by Clem Onojeghuo on Unsplash
The following contains notes and Python code, compiled from a lecture given by Jeremy Howard, co-founder of fast.ai. Many thanks to Jeremy and Rachel Thomas for building fast.ai and the fast.ai library, a high-level wrapper for PyTorch. The following code is based on the fast.ai library. For more information, watch the first lesson (of seven) in Practical Deep Learning For Coders, Part 1, which is publicly available free of charge. If you are keen to learn deep learning, you won’t regret it!
We’re going to use convolutional neural networks (CNNs) to allow our computer to see — something that is only possible thanks to deep learning.
We’re going to try to create a deep learning CNN model based on data from a previous Kaggle competition called Dogs vs Cats. There are 25,000 labelled dog and cat photos available for training, and 12,500 in the test set that we have to try to label for this competition. According to the Kaggle website, when this competition was launched (end of 2013):
“State of the art: The current literature suggests machine classifiers can score above 80% accuracy on this task”.
So if we can beat 80%, then we will be at the cutting edge as of 2013! Ok, let’s get going.
Put these at the top of every notebook, to get automatic reloading and inline plotting:
%reload_ext autoreload
%autoreload 2
%matplotlib inline
Here we import the libraries we need:
from fastai.imports import *
from fastai.transforms import *
from fastai.conv_learner import *
from fastai.model import *
from fastai.dataset import *
from fastai.sgdr import *
from fastai.plots import *
PATH = "data/dogscats/"
We set the size below to 224 because resnet uses 224 x 224 image sizes. More on this later:
sz=224
The dataset is available at http://files.fast.ai/data/dogscats.zip. You can download it directly onto your server by running the following line in your terminal:
wget http://files.fast.ai/data/dogscats.zip
You should put the data in a subdirectory of your Jupyter notebook's directory, called data/.
The fast.ai library will assume that you have train and valid directories. It also assumes that each dir will have subdirs for each class you wish to recognize (in this case, ‘cats’ and ‘dogs’).
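For reference, the expected directory layout looks something like this (a sketch based on the description above, not an exhaustive listing):

data/dogscats/
├── train/
│   ├── cats/
│   └── dogs/
└── valid/
    ├── cats/
    └── dogs/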
The code below shows the contents of the PATH folder; the ! prefix means run the command in bash.
!ls {PATH}
!ls {PATH}valid
The following code shows what’s inside the validation cats folder. Organizing labelled images into one folder per class like this is a standard way to share or provide image classification datasets.
files = !ls {PATH}valid/cats | head
files
# Example: show first cat image in the cats folder
img = plt.imread(f'{PATH}valid/cats/{files[0]}')  # an f-string builds the file path
plt.imshow(img);
I am cute
The code below shows what the raw data looks like. The image is a rank 3 tensor, i.e., a three-dimensional array (height x width x colour channels). Each cell holds the red, green, and blue pixel values, each between 0 and 255.
img.shape
Output: (198, 179, 3)
img[:4,:4]
Output:
array([[[ 29,  20,  23],
        [ 31,  22,  25],
        [ 34,  25,  28],
        [ 37,  28,  31]],

       [[ 60,  51,  54],
        [ 58,  49,  52],
        [ 56,  47,  50],
        [ 55,  46,  49]],

       [[ 93,  84,  87],
        [ 89,  80,  83],
        [ 85,  76,  79],
        [ 81,  72,  75]],

       [[104,  95,  98],
        [103,  94,  97],
        [102,  93,  96],
        [102,  93,  96]]], dtype=uint8)
We’re going to use a pre-trained model, that is, a model created by someone else to solve a different problem. Instead of building a model from scratch to solve a similar problem, we’ll use a model trained on ImageNet (1.2 million images and 1000 classes) as a starting point. The model is a Convolutional Neural Network (CNN), a type of neural network that builds state-of-the-art models for computer vision.
We will be using resnet34 as our pre-trained model. resnet34 is a version of the model that won the 2015 ImageNet competition. Here is more info on resnet models.
Here’s how to train and evaluate a dogs vs cats model in 4 lines of code, and in under 21 seconds. Under the syntax hood below is code/wrappers written by fast.ai. The fast.ai library is updated regularly and keeps up with cutting-edge deep learning research, so best practices are always used. It is also super fast (e.g., 10–60 seconds depending on your GPU) because it sits on top of PyTorch, a very flexible library written by Facebook.
arch=resnet34
data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz))
learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.fit(0.01, 3)
Output:
100%|██████████| 360/360 [00:57<00:00,  6.24it/s]
100%|██████████| 32/32 [00:05<00:00,  6.04it/s]
epoch    trn_loss    val_loss    accuracy
    0    0.045726    0.028603    0.989258
    1    0.039685    0.026488    0.990234
    2    0.041631    0.032590    0.990234
[0.032590486, 0.990234375]
The data object contains the training and validation data.

ImageClassifierData.from_paths reads in images and their labels, given as sub-folder names:

- path: a root path of the data (used for storing trained models, precomputed values, etc.)
- bs: batch size. Default 64.
- tfms: transformations (for data augmentation), e.g. the output of tfms_from_model. Default None.
- trn_name: name of the folder that contains training images. Default 'train'.
- val_name: name of the folder that contains validation images. Default 'valid'.
- test_name: name of the folder that contains test images. Default None.
- num_workers: number of workers. Default 8.

The learn object contains the model.

ConvLearner.pretrained takes:

- f: the architecture, e.g. resnet34
- data: the previously defined data object
- precompute: include/exclude precomputed activations. Default False.

learn.fit trains/fits the model with a given learning rate and number of epochs. In this instance, it is going to do 3 epochs with a 0.01 learning rate, meaning it is going to look at each image three times in total.

trn_loss and val_loss are the values of the cross-entropy loss function.
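To make the loss concrete, here is a minimal worked example of my own (not from the lecture) showing binary cross-entropy for a single prediction:

import numpy as np

# Binary cross-entropy for one prediction: small when the model is
# confident and correct, very large when confident and wrong
def binary_cross_entropy(y, p):
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

binary_cross_entropy(1, 0.99)  # ~0.01: predicted pr(dog)=0.99, label is dog
binary_cross_entropy(1, 0.01)  # ~4.61: predicted pr(dog)=0.01, label is dog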
How good is this model? Well, prior to this competition, the state of the art was 80% accuracy. But the competition resulted in a huge jump to 99.0% accuracy, with the author of a popular deep learning library winning the competition. Extraordinarily, less than 4 years later, we can now beat that result in seconds!
The above model can be used on any kind of picture, as long as it depicts things that people normally take photos of. However, things like pathology images or CT scans won’t do well with this model. There are some minor things we need to do to make those work; this will be covered in a subsequent notebook. Stay tuned!
As well as looking at the overall metrics, it’s also a good idea to look at examples of some of the predictions:
We will look at all of this shortly. But first, if we ever want to know about the data, we can look inside with a few of the following methods:
# Pull the labels (dependent variable)
data.val_y

# Pull the data classes
data.classes

# Pull the log predictions
log_preds = learn.predict()
log_preds.shape

# Pull the first 10 predictions
log_preds[:10]

# From log probabilities to 0 or 1
preds = np.argmax(log_preds, axis=1)

# pr(dog); i.e., anti-log
probs = np.exp(log_preds[:,1])
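As a quick sanity check (my own addition, not from the lecture), the accuracy printed by learn.fit can be reproduced directly from these predictions:

# Fraction of validation images whose predicted class matches the label;
# should match the accuracy column above (~0.99)
(preds == data.val_y).mean()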
Plotting functions
def rand_by_mask(mask):
    return np.random.choice(np.where(mask)[0], 4, replace=False)

def rand_by_correct(is_correct):
    return rand_by_mask((preds == data.val_y) == is_correct)

def plot_val_with_title(idxs, title):
    imgs = np.stack([data.val_ds[x][0] for x in idxs])
    title_probs = [probs[x] for x in idxs]
    print(title)
    return plots(data.val_ds.denorm(imgs), rows=1, titles=title_probs)

def plots(ims, figsize=(12,6), rows=1, titles=None):
    f = plt.figure(figsize=figsize)
    for i in range(len(ims)):
        sp = f.add_subplot(rows, len(ims)//rows, i+1)
        sp.axis('Off')
        if titles is not None: sp.set_title(titles[i], fontsize=16)
        plt.imshow(ims[i])

def load_img_id(ds, idx):
    return np.array(PIL.Image.open(PATH + ds.fnames[idx]))

# Redefined to plot full-size images loaded straight from disk
def plot_val_with_title(idxs, title):
    imgs = [load_img_id(data.val_ds, x) for x in idxs]
    title_probs = [probs[x] for x in idxs]
    print(title)
    return plots(imgs, rows=1, titles=title_probs, figsize=(16,8))

def most_by_mask(mask, mult):
    idxs = np.where(mask)[0]
    return idxs[np.argsort(mult * probs[idxs])[:4]]

def most_by_correct(y, is_correct):
    mult = -1 if (y == 1) == is_correct else 1
    return most_by_mask(((preds == data.val_y) == is_correct) & (data.val_y == y), mult)
Anything greater than 0.5 is dog; anything less than 0.5 is cat:
plot_val_with_title(rand_by_correct(True), "Correctly classified")
Ahhh
And here are some incorrectly classified images:
plot_val_with_title(rand_by_correct(False), "Incorrectly classified")
Dogs?
Next, the most correct cats and dogs (i.e., those with the highest probability that are correct):
plot_val_with_title(most_by_correct(0, True), "Most correct cats")
plot_val_with_title(most_by_correct(1, True), "Most correct dogs")
I like the grey ones
Now the most incorrect cats and dogs (i.e., those with the highest probability that are incorrect):
plot_val_with_title(most_by_correct(0, False), "Most incorrect cats")
plot_val_with_title(most_by_correct(1, False), "Most incorrect dogs")
Finally, the most uncertain predictions (i.e., those with probability closest to 0.5):

# probabilities closest to 0.5
most_uncertain = np.argsort(np.abs(probs - 0.5))[:4]
plot_val_with_title(most_uncertain, "Most uncertain predictions")
weird
nb: The images above that are dimensionally wrong (e.g., the rectangular ones) are skewing the results. We take care of this using a technique called data augmentation. More on that in a later post.
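As a quick preview, augmentation is switched on by passing aug_tfms to tfms_from_model; the exact settings below are just an example, not a recommendation:

# Preview of data augmentation (covered in a later post):
# transforms_side_on randomly flips photos taken from the side,
# and max_zoom randomly zooms in by up to 10%
tfms = tfms_from_model(arch, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
data = ImageClassifierData.from_paths(PATH, tfms=tfms)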
Pro tip: if you want to make the model better, take advantage of what it is doing well and fix the things it is doing badly. E.g., in another Jupyter notebook, try removing images that are just skewing the data, like cartoons, etc. If you figure out how to do this, let me know :)
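For instance, here is a minimal sketch of my own (the 2.0 threshold is an arbitrary choice for illustration) that flags images with extreme aspect ratios so you can review them by hand:

import os
import PIL

# Flag validation cat images with extreme aspect ratios (e.g., wide
# banners or cartoons) that may be skewing the results
folder = f'{PATH}valid/cats/'
for fname in os.listdir(folder):
    w, h = PIL.Image.open(folder + fname).size
    if max(w, h) / min(w, h) > 2.0:
        print(fname, (w, h))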
See the GitHub link below for all of the above work.
Thanks for reading!
GitHub: Stephen-Rimac/Image-classification-with-CNNs (github.com)