Welcome to Part 2 of fast.ai, where we will deal with Single Object Detection. Before we start, I would like to thank Jeremy Howard and Rachel Thomas for their efforts to democratize AI.
This part assumes you have a good understanding of Part 1. Here are the links; feel free to explore the first part of this series in the following order.
This blog post has been divided into two parts.
The dataset we will be using is PASCAL VOC (2007 version).
Let's get our hands dirty with the coding part.
<a href="https://medium.com/media/586dd25078ac6b3ad2f46329d0baeb14/href">https://medium.com/media/586dd25078ac6b3ad2f46329d0baeb14/href</a>
As with all Machine Learning projects, there are three things to focus on:
Step 1 focuses on getting the data into proper shape so that we can analyze it.
STEP 1: Classify and localize the largest object in each image. This step involves:
1.1. INSTALL THE PACKAGES
Let's install the packages and download the data using the commands shown below.
# Install the packages
# !pip install https://github.com/fastai/fastai/archive/master.zip
!pip install fastai==0.7.0
!pip install torchtext==0.2.3
!pip install opencv-python
!apt update && apt install -y libsm6 libxext6
!pip3 install http://download.pytorch.org/whl/cu80/torch-0.3.0.post4-cp36-cp36m-linux_x86_64.whl
!pip3 install torchvision
# Download the Data to the required folder
!mkdir data
!wget http://pjreddie.com/media/files/VOCtrainval_06-Nov-2007.tar -P data/
!wget https://storage.googleapis.com/coco-dataset/external/PASCAL_VOC.zip -P data/
!tar -xf data/VOCtrainval_06-Nov-2007.tar -C data/
!unzip data/PASCAL_VOC.zip -d data/
!rm -rf data/PASCAL_VOC.zip data/VOCtrainval_06-Nov-2007.tar
%matplotlib inline
%reload_ext autoreload
%autoreload 2
!pip install Pillow
from fastai.conv_learner import *
from fastai.dataset import *
from pathlib import Path
import json
import PIL
from matplotlib import patches, patheffects
Let's check what is present in our data. We will use pathlib, from the Python 3 standard library, for our paths and file access.
1.2. KNOW YOUR DATA USING Pathlib OBJECT.
The data folder contains different versions of Pascal VOC.
PATH = Path('data')
list((PATH/'PASCAL_VOC').iterdir())
# iterdir() helps in iterating through the directory of PASCAL_VOC
PATH is a Path object, which gives object-oriented access to a directory or file. It is part of the Python standard library module pathlib. To see which methods a Path offers, type PATH. and press TAB.
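To try out the `/` operator and `iterdir()` without downloading anything, here is a minimal sketch. It assumes a throwaway temporary directory; the file `notes.txt` is a made-up name for illustration and is not part of the dataset.

```python
import tempfile
from pathlib import Path

# Build a throwaway directory so the example runs without the dataset.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # The / operator joins path components and returns a new Path.
    demo_dir = root / 'PASCAL_VOC'
    demo_dir.mkdir()

    # 'notes.txt' is a hypothetical file, added only to have two entries.
    (demo_dir / 'pascal_train2007.json').write_text('{}')
    (demo_dir / 'notes.txt').write_text('hello')

    # iterdir() yields Path objects in arbitrary order, so sort the names.
    names = sorted(p.name for p in demo_dir.iterdir())
    suffixes = sorted(p.suffix for p in demo_dir.iterdir())

print(names)
print(suffixes)
```

Because `iterdir()` returns a generator, the post wraps it in `list(...)` to display the entries.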
Since we will be working only with pascal_train2007.json, let's check out the content of this file.
training_json = json.load((PATH/'PASCAL_VOC'/'pascal_train2007.json').open())
training_json.keys()
The file has four keys: images, type, annotations and categories. Storing the key names in variables lets us use tab completion instead of retyping the strings.
IMAGES,ANNOTATIONS,CATEGORIES = ['images', 'annotations', 'categories']
Let's see what each of these contains:
For easy access, let's pull the important parts out using dictionary comprehensions and a list comprehension.
FILE_NAME,ID,IMG_ID,CATEGORY_ID,BBOX = 'file_name','id','image_id','category_id','bbox'
categories = {o[ID]:o['name'] for o in training_json[CATEGORIES]}
# categories is a dictionary mapping each category ID to its class name.
# Let's check out all 20 categories using the command below.
categories
training_filenames = {o[ID]:o[FILE_NAME] for o in training_json[IMAGES]}
training_filenames
# contains the id and the filename of the images.
training_ids = [o[ID] for o in training_json[IMAGES]]
training_ids
# A list comprehension that collects the ids of all training images.
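The same comprehensions can be tried on a tiny stand-in for the JSON. This is only a sketch: the ids, file names and class names below are made up, and only the key names ('images', 'categories', 'id', 'file_name', 'name') come from the code above.

```python
# Toy stand-in for the loaded JSON; every value below is made up.
toy_json = {
    'images': [
        {'id': 1, 'file_name': 'a.jpg'},
        {'id': 2, 'file_name': 'b.jpg'},
    ],
    'categories': [
        {'id': 1, 'name': 'cat'},
        {'id': 2, 'name': 'dog'},
    ],
}

# Dictionary comprehensions: map each id to the field we care about.
toy_categories = {o['id']: o['name'] for o in toy_json['categories']}
toy_filenames = {o['id']: o['file_name'] for o in toy_json['images']}

# List comprehension: collect just the image ids, in file order.
toy_ids = [o['id'] for o in toy_json['images']]

print(toy_categories)
print(toy_filenames)
print(toy_ids)
```

Looking up `toy_filenames[2]` then gives the file name for image id 2 without scanning the whole list again.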
Now, let's check out the folder where we have all the images.
list((PATH/'VOCdevkit'/'VOC2007').iterdir())
# The JPEGImages folder is the one with all the images in it.
JPEGS = 'VOCdevkit/VOC2007/JPEGImages'
IMG_PATH = PATH/JPEGS
# Set the path of the Images as IMG_PATH
list(IMG_PATH.iterdir())[:5]
# Check out the first five images in the path
Note: Each image has a unique id associated with it, as shown above.
1.3. BOUNDING BOX
The main objective here is to bring our bounding boxes into a format that can be used for plotting. The bounding box coordinates are present in the annotations.
A bounding box is a box around an object in an image.
Initially, the bounding box coordinates represent (column, row, width, height), that is, the x and y of the top-left corner followed by the box size. Check out the image below.
The hw_bb() function (height/width to bounding box) converts these into the top-left and bottom-right corners in (row, column) order: (top row, left column, bottom row, right column).
def hw_bb(bb): return np.array([bb[1], bb[0], bb[3]+bb[1]-1, bb[2]+bb[0]-1])
Now, we will create a dictionary which has the image id as the key and a list of (bounding box, category_id) pairs as the value.
training_annotations = collections.defaultdict(lambda:[])
for o in training_json[ANNOTATIONS]:
    if not o['ignore']:
        bb = o[BBOX]
        bb = hw_bb(bb)
        training_annotations[o[IMG_ID]].append((bb,o[CATEGORY_ID]))
In the above chunk of code, we go through all the annotations and keep those that are not marked ignore. Each one is appended to the dictionary entry for its image id (the key) as a tuple of the bounding box (bbox) and the category_id (class).
One problem is that if a dictionary entry does not exist yet, we cannot append a bbox and class to it. To resolve this, we make use of Python's defaultdict with the line of code below.
training_annotations = collections.defaultdict(lambda:[])
It is a dictionary, but when we access a key that isn't present, defaultdict creates it and sets its value to whatever the supplied function returns. In this case that is an empty list. So every time we access a key in training_annotations that doesn't exist yet, defaultdict makes a new empty list and we can append to it.
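The difference from a plain dictionary is easy to see in a small sketch. The image ids and the string placeholders standing in for bounding boxes are made up; `defaultdict(list)` is used here as an equivalent spelling of `defaultdict(lambda:[])`.

```python
import collections

# A plain dict raises KeyError when we append to a key that was never set.
plain = {}
try:
    plain[12].append('box')
    plain_failed = False
except KeyError:
    plain_failed = True
print('plain dict raised KeyError:', plain_failed)

# defaultdict calls list() to build an empty list for each missing key.
grouped = collections.defaultdict(list)
grouped[12].append(('bbox_a', 7))   # made-up (bbox, category_id) pairs
grouped[12].append(('bbox_b', 15))
grouped[17].append(('bbox_c', 13))
print(dict(grouped))

# Caveat: merely reading a missing key also creates it.
# Use `in` when you only want to test for membership.
has_99_before = 99 in grouped
_ = grouped[99]
has_99_after = 99 in grouped
print(has_99_before, has_99_after)
```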
SUMMARY OF THE USEFUL IMAGE RELATED INFORMATION
Let's get into the details of the annotations of a particular image, as we can see in the snapshot below.
Some libraries take VOC-format bounding boxes, so the bb_hw() function converts the corners back into the original (column, row, width, height) format:
bb_voc = [155, 96, 196, 174]
bb_fastai = hw_bb(bb_voc)
# We won't be using the below function for now .
def bb_hw(a): return np.array([a[1],a[0],a[3]-a[1]+1,a[2]-a[0]+1])
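As a quick sanity check, the two conversions should undo each other. The sketch below repeats both functions so it runs on its own and uses the same example box as above. The `-1` and `+1` appear because the corner coordinates are inclusive: a box that starts at column 155 and is 196 pixels wide covers columns 155 through 350.

```python
import numpy as np

def hw_bb(bb):
    # [x, y, width, height] -> [top row, left col, bottom row, right col]
    return np.array([bb[1], bb[0], bb[3] + bb[1] - 1, bb[2] + bb[0] - 1])

def bb_hw(a):
    # [top row, left col, bottom row, right col] -> [x, y, width, height]
    return np.array([a[1], a[0], a[3] - a[1] + 1, a[2] - a[0] + 1])

bb_voc = [155, 96, 196, 174]
corners = hw_bb(bb_voc)
round_trip = bb_hw(corners)

print(corners.tolist())     # [96, 155, 269, 350]
print(round_trip.tolist())  # [155, 96, 196, 174]
```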
1.4. PLOTTING OF THE BOUNDING BOX AROUND THE OBJECT
Now we will focus on drawing a bounding box around an object in an image. We build the plot in steps, with a separate function for each step, and each step serves a definite purpose. Let's look at the purpose of each step first, and then at the overall flow.
<a href="https://medium.com/media/b1dc0c5f2173a92ff9a9e3a2f9af368f/href">https://medium.com/media/b1dc0c5f2173a92ff9a9e3a2f9af368f/href</a>
The code below shows the image and returns the axis on which we will draw everything else.
def show_img(im, figsize=None, ax=None):
    if not ax: fig,ax = plt.subplots(figsize=figsize)
    ax.imshow(im)
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    return ax
Draw a rectangle around the object in the image using the following code.
def draw_rect(ax, b):
    patch = ax.add_patch(patches.Rectangle(b[:2], *b[-2:], fill=False, edgecolor='white', lw=2))
    draw_outline(patch, 4)
draw_outline() makes the rectangle and the text visible regardless of the background, by drawing white shapes with a black outline.
def draw_outline(o, lw):
    o.set_path_effects([patheffects.Stroke(
        linewidth=lw, foreground='black'), patheffects.Normal()])
Write the class (category) of the object as text at the top-left corner of the bounding box.
def draw_text(ax, xy, txt, sz=14):
    text = ax.text(*xy, txt,
        verticalalignment='top', color='white', fontsize=sz, weight='bold')
    draw_outline(text, 1)
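Here is the flow for drawing a bounding box around the object in an image. It assumes im is an opened image and im0_a is one of its annotations, a (bounding box, category id) tuple as stored in training_annotations.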
ax = show_img(im)
b = bb_hw(im0_a[0])
draw_rect(ax, b)
draw_text(ax, b[:2], categories[im0_a[1]])
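The flow converts the box with bb_hw() before drawing because matplotlib's `Rectangle` takes the (x, y) corner followed by width and height. Here is a self-contained sketch of that drawing step. It assumes a synthetic grey image, a made-up label 'car', and the off-screen Agg backend, so it runs without the dataset or a display.

```python
import io

import matplotlib
matplotlib.use('Agg')  # off-screen backend so no window is needed
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import patches, patheffects

# Synthetic grey image standing in for a real photo (assumption).
im = np.full((400, 500, 3), 0.5)
# Example box in [x, y, width, height] order, as bb_hw() returns.
b = np.array([155, 96, 196, 174])

fig, ax = plt.subplots(figsize=(5, 4))
ax.imshow(im)
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)

# Rectangle takes the (x, y) corner first, then width and height.
rect = ax.add_patch(patches.Rectangle(b[:2], *b[-2:], fill=False,
                                      edgecolor='white', lw=2))
# A black stroke underneath keeps the white line visible on light areas.
rect.set_path_effects([patheffects.Stroke(linewidth=4, foreground='black'),
                       patheffects.Normal()])

# 'car' is a made-up label; the post looks it up in `categories`.
label = ax.text(*b[:2], 'car', verticalalignment='top', color='white',
                fontsize=14, weight='bold')
label.set_path_effects([patheffects.Stroke(linewidth=1, foreground='black'),
                        patheffects.Normal()])

# Render to an in-memory PNG instead of writing a file.
buf = io.BytesIO()
fig.savefig(buf, format='png')
png_size = buf.getbuffer().nbytes
plt.close(fig)
```

In a notebook with `%matplotlib inline`, the backend line and the `savefig` call can be dropped and the figure shows up directly.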
Let's wrap up the flow steps in functions as shown below:
def draw_im(im, ann):
    ax = show_img(im, figsize=(16,8))
    for b,c in ann:  # Destructure the annotations into bbox and class
        b = bb_hw(b)  # Convert it into appropriate coordinates
        draw_rect(ax, b)  # Draw rectangle bbox around it.
        draw_text(ax, b[:2], categories[c], sz=16)  # Write some text around it
def draw_idx(i):
    im_a = training_annotations[i]  # Grab the annotations with the help of the image id.
    im = open_image(IMG_PATH/training_filenames[i])  # Open that image
    print(im.shape)  # Print its shape
    draw_im(im, im_a)  # Draw the image with its boxes and labels
draw_idx(17)
# Draw the image with a particular id.
Let's wrap up the flow in detail here:
This is how we locate the objects in the images. The next step is to classify the largest item in the image. We will discuss it in detail in the next blog post.
A big shout-out to Anwesh Satapathy and Sharwon Pius for illustrating this problem in a simple way. Please check out their GitHub repo and the simplified roadmap to Single Object Detection.
<a href="https://medium.com/media/2b59a46fe51290a60be31d0a7b0c59db/href">https://medium.com/media/2b59a46fe51290a60be31d0a7b0c59db/href</a>
If you have any queries, feel free to send them to @ashiskumarpanda on Twitter or post them on the fastai forums.
If you like this post, feel free to hit the 👏 button 😄.
It is a really good feeling to be appreciated by Jeremy Howard. Check out what he has to say about my fast.ai Part 1 blog post, and make sure to have a look at it.
Great summary of the 2018 version of https://t.co/aQsW5afov6 - thanks for sharing @ashiskumarpanda ! https://t.co/jVUzpzp4EO