
Introductory Guide To Real-time Object Detection with Python

by @shiv


Researchers have been studying the possibilities of giving machines the ability to distinguish and identify objects through vision for years now. This particular domain, called Computer Vision or CV, has a wide range of modern-day applications.

From object detection on roads for autonomous cars to facial and body-language recognition that can flag possible criminal activity, CV has numerous uses in today’s world. Object detection is also, undeniably, one of the coolest applications of Computer Vision.

If you haven't yet started with Python or aren't familiar with OpenCV, refer to this free Python Cheat Sheet (240+ notes) and OpenCV Python Tutorial.

Modern-day CV tools can easily implement object detection on images or even on live stream videos. In this article, we will look at a simple demonstration of a real-time object detector using TensorFlow. 

Setting Up A Simple Object Detector


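Prerequisite: TensorFlow >= 1.15.0

Install TensorFlow by executing pip install tensorflow. The code below uses TensorFlow 1.x APIs (tf.Session, tf.GraphDef, tf.gfile); on a 2.x installation the same calls are available under tf.compat.v1.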

We are now good to go!
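To confirm that an installed version satisfies the stated minimum, a small check along these lines may help. This is only a sketch: the helper names version_tuple and meets_minimum are hypothetical, and the parsing assumes the usual major.minor.patch version strings.

```python
import re

MIN_TF_VERSION = "1.15.0"  # minimum version stated in this article


def version_tuple(version):
    """Turn '1.15.0' or '2.4.0-rc1' into a tuple of integers such as (1, 15, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad short versions such as '1.15' so tuples compare field by field.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum=MIN_TF_VERSION):
    """Return True when the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)
```

In a notebook you would call meets_minimum(tf.__version__) after importing TensorFlow. Comparing integer tuples rather than raw strings matters here, because as plain strings '1.9.0' would sort after '1.15.0'.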

Setting Up The Environment

Step 1. Download or clone the TensorFlow Object Detection code to your local machine from GitHub

Execute the following command in the terminal:

  • git clone

If you don’t have git installed on your machine, you can download the zip file from here.

Step 2. Installing the dependencies

The next step is to make sure that we have all the libraries and modules that we need to run the object detector on our machine.

Here is a list of libraries that the project depends on. (Most of the dependencies come with TensorFlow by default.)

  • Cython
  • contextlib2
  • pillow
  • lxml
  • matplotlib

If you find that any of these modules is missing, just execute pip install followed by the module name in your environment to install it.
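One way to find out which of the listed libraries are missing is to ask Python whether each module can be located. The sketch below uses only the standard library. The DEPENDENCIES mapping and the find_missing helper are illustrative names, not part of the TensorFlow repository.

```python
import importlib.util

# pip package name -> module name used in `import` statements.
# Pillow is the one that differs: it is installed as "pillow" but imported as "PIL".
DEPENDENCIES = {
    "Cython": "Cython",
    "contextlib2": "contextlib2",
    "pillow": "PIL",
    "lxml": "lxml",
    "matplotlib": "matplotlib",
}


def find_missing(dependencies=DEPENDENCIES):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in dependencies.items()
        if importlib.util.find_spec(module_name) is None
    ]


missing = find_missing()
if missing:
    print("Missing packages, install with: pip install " + " ".join(missing))
else:
    print("All listed dependencies are installed.")
```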

Step 3. Installing Protobuf compiler

Protobuf or Protocol buffers are Google’s language-neutral, platform-neutral, extensible mechanism for serializing structured data. It helps us define how we want our data to be structured and once structured it lets us easily write and read the structured data to and from a variety of data streams and using a variety of languages.

This is also a dependency for this project. You can learn more about Protobufs here. For now, we will install Protobuf on our machine.

Head to

Choose the appropriate version for your OS and copy the download link.

Open your terminal or command prompt, change directory to the cloned repository and execute the following commands in your terminal.

cd models/research \
wget -O \

    Note: Make sure that you decompress the file inside the models/research directory

Step 4. Compiling the Protobuf files

Execute the following command from the research/ directory to compile the Protocol Buffer definitions (the .proto files) into Python modules.

./bin/protoc object_detection/protos/*.proto --python_out=.
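The command above relies on the shell to expand *.proto into a list of files. Where the shell does not do that, the same invocation can be assembled in Python. The following is a minimal sketch under that assumption: build_protoc_command and compile_protos are hypothetical helpers, and the default protoc path simply mirrors the ./bin/protoc location used above.

```python
import glob
import os
import subprocess


def build_protoc_command(research_dir=".", protoc="./bin/protoc"):
    """Build the argument list for compiling every .proto file under object_detection/protos."""
    pattern = os.path.join(research_dir, "object_detection", "protos", "*.proto")
    proto_files = sorted(glob.glob(pattern))
    # Paths are made relative to research/, the directory the command is run from.
    relative = [
        os.path.relpath(path, research_dir).replace(os.sep, "/")
        for path in proto_files
    ]
    return [protoc, *relative, "--python_out=."]


def compile_protos(research_dir=".", protoc="./bin/protoc"):
    """Run protoc from research_dir; raises CalledProcessError if compilation fails."""
    command = build_protoc_command(research_dir, protoc)
    if len(command) == 2:  # only the executable and the output flag: nothing to compile
        raise FileNotFoundError("no .proto files found under " + research_dir)
    subprocess.run(command, cwd=research_dir, check=True)
```

Calling compile_protos() from the research/ directory should be equivalent to the shell command, with check=True making any compiler error stop the script.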

Implement Object Detection in Python

Now that we have all the dependencies installed, let’s use Python to implement Object Detection.

In the downloaded repository, change directory to the object_detection folder inside models/research. In this directory, you will find an IPython notebook named object_detection_tutorial.ipynb. This file is a demo for object detection which, on execution, will use the specified model (an "SSD with Mobilenet" model by default) to detect objects in two test images provided in the repository.

Given below is one of the test outputs:

Only minor changes are needed to detect objects from a live video stream. Make a new Jupyter notebook in the same folder and follow along with the code given below.

In [1]:

import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile
from distutils.version import StrictVersion
from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image
# This is needed since the notebook is stored in the object_detection folder.
sys.path.append("..")
from utils import ops as utils_ops
if StrictVersion(tf.__version__) < StrictVersion('1.12.0'):
    raise ImportError('Please upgrade your TensorFlow installation to v1.12.*.')

In [2]:

# This is needed to display the images.
get_ipython().run_line_magic('matplotlib', 'inline')

In [3]:

# Object detection imports
# Here are the imports from the object detection module.
from utils import label_map_util
from utils import visualization_utils as vis_util

In [4]:

# Model preparation 
# Any model exported using the `export_inference_graph.py` tool can be loaded here simply by changing `PATH_TO_FROZEN_GRAPH` to point to a new .pb file.
# By default we use an "SSD with Mobilenet" model here.
# See the detection model zoo for a list of other models that can be run out-of-the-box with varying speeds and accuracies.
# What model to download.
MODEL_NAME = 'ssd_mobilenet_v1_coco_2017_11_17'
# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_FROZEN_GRAPH = MODEL_NAME + '/frozen_inference_graph.pb'
# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('data', 'mscoco_label_map.pbtxt')

In [5]:

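# Download Model
# Assumption: MODEL_FILE and DOWNLOAD_BASE follow the upstream
# object_detection_tutorial notebook; adjust them if you use another model.
MODEL_FILE = MODEL_NAME + '.tar.gz'
DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'
opener = urllib.request.URLopener()
opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
tar_file = tarfile.open(MODEL_FILE)
# Extract only the frozen graph from the archive into the current directory.
for file in tar_file.getmembers():
    file_name = os.path.basename(file.name)
    if 'frozen_inference_graph.pb' in file_name:
        tar_file.extract(file, os.getcwd())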

In [6]:

# Load a (frozen) Tensorflow model into memory.
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')

In [7]:

# Loading label map
# Label maps map indices to category names, so that when our convolution network predicts `5`,
#we know that this corresponds to `airplane`.  Here we use internal utility functions, 
#but anything that returns a dictionary mapping integers to appropriate string labels would be fine
category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)
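The comment in the cell above notes that anything returning a dictionary from integers to string labels would work. As a rough illustration, the sketch below builds such a mapping by hand for the first five COCO classes. The helpers make_category_index and label_for are hypothetical, and the {'id': ..., 'name': ...} entry layout is assumed here to mirror what label_map_util produces.

```python
def make_category_index(names_by_id):
    """Build {class_id: {'id': class_id, 'name': label}} from a plain id -> label dict."""
    return {
        class_id: {"id": class_id, "name": name}
        for class_id, name in names_by_id.items()
    }


def label_for(class_id, index, default="unknown"):
    """Look up the display name for a predicted class id."""
    return index.get(class_id, {}).get("name", default)


# First five entries of the COCO label map, typed in by hand for illustration.
category_index_sketch = make_category_index(
    {1: "person", 2: "bicycle", 3: "car", 4: "motorcycle", 5: "airplane"}
)

print(label_for(5, category_index_sketch))
```

With this mapping, a prediction of 5 resolves to airplane, matching the example in the comment above.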

In [8]:

def run_inference_for_single_image(image, graph):
    with graph.as_default():
        with tf.Session() as sess:
            # Get handles to input and output tensors
            ops = tf.get_default_graph().get_operations()
            all_tensor_names = {output.name for op in ops for output in op.outputs}
            tensor_dict = {}
            for key in [
                  'num_detections', 'detection_boxes', 'detection_scores',
                  'detection_classes', 'detection_masks']:
                tensor_name = key + ':0'
                if tensor_name in all_tensor_names:
                    tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(tensor_name)
            if 'detection_masks' in tensor_dict:
                # The following processing is only for single image
                detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
                detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
                # Reframe is required to translate mask from box coordinates to image coordinates and fit the image size.
                real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
                detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1])
                detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1])
                detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
                detection_masks, detection_boxes, image.shape[1], image.shape[2])
                detection_masks_reframed = tf.cast(
                tf.greater(detection_masks_reframed, 0.5), tf.uint8)
                # Follow the convention by adding back the batch dimension
                tensor_dict['detection_masks'] = tf.expand_dims(
                                    detection_masks_reframed, 0)
            image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')
            # Run inference
            output_dict = sess.run(tensor_dict, feed_dict={image_tensor: image})
            # all outputs are float32 numpy arrays, so convert types as appropriate
            output_dict['num_detections'] = int(output_dict['num_detections'][0])
            output_dict['detection_classes'] = output_dict['detection_classes'][0].astype(np.int64)
            output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
            output_dict['detection_scores'] = output_dict['detection_scores'][0]
            if 'detection_masks' in output_dict:
                output_dict['detection_masks'] = output_dict['detection_masks'][0]
        return output_dict
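The dictionary returned by the function above holds parallel arrays of boxes, classes and scores. A post-processing step such as the one sketched below can turn them into a short list of confident detections in pixel coordinates. It assumes the boxes are normalized [ymin, xmin, ymax, xmax] values, as the Object Detection API emits them. The helper detections_above, its 0.5 default threshold and the example dictionary are illustrative only.

```python
def detections_above(output_dict, image_height, image_width, min_score=0.5):
    """Return (class_id, score, (top, left, bottom, right)) in pixels for confident detections."""
    results = []
    count = output_dict["num_detections"]
    boxes = output_dict["detection_boxes"][:count]
    classes = output_dict["detection_classes"][:count]
    scores = output_dict["detection_scores"][:count]
    for box, class_id, score in zip(boxes, classes, scores):
        if score < min_score:
            continue
        ymin, xmin, ymax, xmax = box  # assumed normalized to the range [0, 1]
        pixel_box = (
            int(ymin * image_height),
            int(xmin * image_width),
            int(ymax * image_height),
            int(xmax * image_width),
        )
        results.append((int(class_id), float(score), pixel_box))
    return results


# Hand-written stand-in for a real output_dict (plain lists instead of numpy arrays).
example = {
    "num_detections": 2,
    "detection_boxes": [[0.25, 0.5, 0.75, 1.0], [0.0, 0.0, 1.0, 1.0], [0, 0, 0, 0]],
    "detection_classes": [1, 3, 1],
    "detection_scores": [0.9, 0.3, 0.0],
}
print(detections_above(example, image_height=480, image_width=640))
```

The function only iterates and slices, so it accepts the numpy arrays from the real output as well as the plain lists used in this example.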

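In [9]:

import cv2
cam = cv2.VideoCapture(0)
rolling = True
while rolling:
    ret, image_np = cam.read()
    if not ret:  # no frame could be read from the webcam
        break
    image_np_expanded = np.expand_dims(image_np, axis=0)
    # Actual detection.
    output_dict = run_inference_for_single_image(image_np_expanded, detection_graph)
    # Visualization of the results of a detection.
    # Assumption: boxes are drawn with the vis_util helper imported earlier.
    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np, output_dict['detection_boxes'], output_dict['detection_classes'],
        output_dict['detection_scores'], category_index,
        instance_masks=output_dict.get('detection_masks'),
        use_normalized_coordinates=True, line_thickness=8)
    cv2.imshow('image', cv2.resize(image_np, (1000, 800)))
    if cv2.waitKey(25) & 0xFF == ord('q'):
        break
cam.release()
cv2.destroyAllWindows()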


When you run the Jupyter notebook, the system webcam will open up and the detector will look for all classes of objects that the original model has been trained to detect. Press q in the video window to stop.

For more project ideas, refer to Top 25 Computer Vision Project Ideas for 2020. Happy Learning!!

