From vehicle counting and smart parking systems to advanced driver-assistance systems (ADAS), the demand for detecting cars, buses, and motorbikes is increasing, and vehicle detection will soon be as common an application as face detection. These detectors need to run in real time to be usable in most real-world applications: nobody will rely on a driver-assistance system that cannot detect the cars ahead while driving.

In this post, I will show you how to implement your own car detector using two pre-trained models that are available for download: MobileNet SSD and Xailient Car Detector.

Before diving into the implementation, let’s get familiar with these models. Feel free to skip to the code and results if you wish.

MobileNet SSD

MobileNet is a lightweight deep neural network architecture designed for mobile and embedded vision applications. In many real-world applications, such as a self-driving car, recognition tasks need to be carried out in a timely fashion on a computationally limited device. MobileNet was developed in 2017 to meet this requirement.

The core layers of MobileNet are built on depthwise separable convolutions. The first layer, which is a full convolution, is the exception. To learn more about MobileNet, please refer to the paper (Howard et al., 2017).

Around the same time (2016), SSD (Single Shot MultiBox Detector) was developed by Wei Liu and co-authors, including researchers at Google, to meet the need for models that can run in real time on embedded devices without a significant trade-off in accuracy.

SSD takes a single shot, that is, one forward pass, to detect multiple objects within an image. The approach is based on a feed-forward convolutional network that produces a fixed-size collection of bounding boxes and scores for the presence of object class instances in those boxes.
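To make "a fixed-size collection of bounding boxes" concrete, here is a small sketch that counts the default boxes SSD predicts for a 300x300 input. The feature-map sizes and boxes per cell are the ones listed in the SSD paper for the VGG-16 based SSD300; a MobileNet-based variant uses different values, so treat the numbers as an illustration rather than a description of the model used later in this post. The function names are made up for this example.

```python
# (feature map side length, default boxes per cell) for SSD300 with a VGG-16
# base, as listed in the SSD paper. Other base networks use other values.
SSD300_LAYERS = [
    (38, 4),
    (19, 6),
    (10, 6),
    (5, 6),
    (3, 4),
    (1, 4),
]


def count_default_boxes(layers):
    """Total number of default boxes summed over all feature maps."""
    return sum(side * side * boxes_per_cell for side, boxes_per_cell in layers)


def count_output_values(layers, num_classes):
    """Each default box gets one score per class plus 4 box offsets."""
    return count_default_boxes(layers) * (num_classes + 4)


total_boxes = count_default_boxes(SSD300_LAYERS)
print("default boxes:", total_boxes)  # default boxes: 8732
# 21 = the 20 PASCAL VOC classes plus "background"
print("raw outputs:", count_output_values(SSD300_LAYERS, num_classes=21))
```

The count depends only on the network configuration, not on the image content, which is why the output size is fixed no matter how many objects are in the frame.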
SSD is composed of two parts: extracting feature maps, and applying convolution filters to detect objects.

SSD is designed to be independent of the base network, so it can run on top of base networks such as VGG or MobileNet. In the original paper, Wei Liu and team used the VGG-16 network as the base to extract feature maps. To learn more about SSD, please refer to the paper (Liu et al., 2016).

To further tackle the practical limitations of running resource-hungry, power-consuming neural networks on low-end devices in real-time applications, MobileNet was integrated into the SSD framework. When MobileNet is used as the base network in SSD, the result is MobileNet SSD.

The MobileNet SSD model was first trained on the COCO dataset and then fine-tuned on PASCAL VOC, reaching 72.7% mAP (mean average precision).

We’ll use a pre-trained MobileNet SSD model that was trained in the Caffe-SSD framework, downloaded from https://github.com/chuanqi305/MobileNet-SSD/.

MobileNet SSD for Real-time Car Detection

Step 1: Download the pre-trained MobileNet SSD Caffe model and prototxt: MobileNetSSD_deploy.caffemodel and MobileNetSSD_deploy.prototxt.

Step 2: Implement code to use MobileNet SSD

    import time
    import cv2 as cv
    import numpy as np
    import math

    # load our serialized model from disk
    print("Load MobileNetSSD model")

    prototxt_path = "MobileNetSSD_deploy.prototxt"
    model_path = "MobileNetSSD_deploy.caffemodel"

    # initialize the list of class labels MobileNet SSD was trained to detect
    CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
               "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
               "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
               "sofa", "train", "tvmonitor"]

    net = cv.dnn.readNetFromCaffe(prototxt_path, model_path)


    def process_frame_MobileNetSSD(next_frame):
        # RGB copy of the frame (unused below; the network is fed the
        # original BGR frame)
        rgb = cv.cvtColor(next_frame, cv.COLOR_BGR2RGB)
        (H, W) = next_frame.shape[:2]

        # convert the frame to a blob and pass the blob through the
        # network and obtain the detections
        blob = cv.dnn.blobFromImage(next_frame, size=(300, 300), ddepth=cv.CV_8U)
        net.setInput(blob, scalefactor=1.0 / 127.5, mean=[127.5, 127.5, 127.5])
        detections = net.forward()

        # loop over the detections
        for i in np.arange(0, detections.shape[2]):
            # extract the confidence (i.e., probability) associated
            # with the prediction
            confidence = detections[0, 0, i, 2]

            # filter out weak detections by ensuring the `confidence`
            # is greater than the minimum confidence
            if confidence > 0.7:
                # extract the index of the class label from the
                # detections list
                idx = int(detections[0, 0, i, 1])

                # if the class label is not a car, ignore it
                if CLASSES[idx] != "car":
                    continue

                # compute the (x, y)-coordinates of the bounding box
                # for the object
                box = detections[0, 0, i, 3:7] * np.array([W, H, W, H])
                (startX, startY, endX, endY) = box.astype("int")

                cv.rectangle(next_frame, (startX, startY), (endX, endY), (0, 255, 0), 3)

        return next_frame


    def VehicheDetection_UsingMobileNetSSD(filename):
        cap = cv.VideoCapture(filename)

        # Write output file
        frame_width = int(cap.get(cv.CAP_PROP_FRAME_WIDTH))
        frame_height = int(cap.get(cv.CAP_PROP_FRAME_HEIGHT))

        # Define the codec and create VideoWriter object
        fps = 20
        size = (int(frame_width), int(frame_height))
        fourcc = cv.VideoWriter_fourcc('m', 'p', '4', 'v')
        out = cv.VideoWriter()
        success = out.open('output_mobilenetssd.mov', fourcc, fps, size, True)

        frame_count = 0

        # start timer
        t1 = time.time()

        while True:
            # Reads the next video frame into memory
            ret, next_frame = cap.read()
            if not ret:
                break

            frame_count += 1
            next_frame = process_frame_MobileNetSSD(next_frame)

            # write frame
            out.write(next_frame)

            key = cv.waitKey(50)
            if key == 27:  # Hit ESC key to stop
                break

        # end timer
        t2 = time.time()

        # calculate FPS
        fps = str(float(frame_count / float(t2 - t1))) + ' FPS'

        print("\nMobileNetSSD Car Detector")
        print("Frames processed: {}".format(frame_count))
        print("Elapsed time: {:.2f}".format(float(t2 - t1)))
        print("FPS: {}".format(fps))

        cap.release()
        cv.destroyAllWindows()
        out.release()

(Parts of this code are inspired by the PyImageSearch blog.)

Experiments:

I ran the above code on two different devices:

- My dev machine, a Lenovo Yoga 920 running Ubuntu 18.04.
- A low-cost, resource-constrained device, a Raspberry Pi 3B+ running Raspbian Buster.
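Going back to the detection loop in the code above, the filtering logic can be looked at in isolation, without OpenCV or a model file. The sketch below assumes each detection row has the layout [image_id, class_id, confidence, x_min, y_min, x_max, y_max] with box coordinates normalized to the 0 to 1 range, which matches the indices the loop reads (1 for the class, 2 for the confidence, 3 to 6 for the box). The function name `select_cars` and the sample rows are made up for illustration.

```python
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
           "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
           "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
           "sofa", "train", "tvmonitor"]


def select_cars(detections, frame_w, frame_h, min_conf=0.7, target="car"):
    """Keep confident detections of `target` and scale boxes to pixels.

    `detections` is an iterable of rows in the assumed layout
    [image_id, class_id, confidence, x_min, y_min, x_max, y_max].
    """
    boxes = []
    for _image_id, class_id, conf, x1, y1, x2, y2 in detections:
        if conf <= min_conf:
            continue  # weak detection
        if CLASSES[int(class_id)] != target:
            continue  # some other object class
        # int() truncates, like the astype("int") call in the loop above
        boxes.append((int(x1 * frame_w), int(y1 * frame_h),
                      int(x2 * frame_w), int(y2 * frame_h)))
    return boxes


# Hypothetical detections for a 640x480 frame
sample = [
    [0, 7, 0.92, 0.125, 0.25, 0.5, 0.75],   # confident car: kept
    [0, 7, 0.40, 0.0, 0.0, 0.25, 0.25],     # car, but below threshold
    [0, 6, 0.95, 0.5, 0.5, 1.0, 1.0],       # confident bus: not a car
]
print(select_cars(sample, frame_w=640, frame_h=480))  # [(80, 120, 320, 360)]
```

Keeping the threshold and target class as parameters makes it easy to try, for example, a lower confidence cut-off or detecting "bus" as well, without touching the drawing code.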
Results:

On my dev machine, the Lenovo Yoga, MobileNet SSD reached an inference speed of 23.3 FPS. On the Raspberry Pi 3B+, the inference speed was 0.9 FPS, using all 4 cores.

Pretty dramatic. This experiment shows that if you have a powerful device to run MobileNet SSD, it performs well and meets the real-time requirement. But if your application is meant to be deployed on a computationally limited IoT/embedded device such as the Raspberry Pi, it does not seem to be a good fit for a real-time application.

Xailient

The Xailient model uses a selective attention approach to perform detection. It is inspired by the working mechanism of the human eye.

Xailient models are optimized to run on low-power devices that are memory and resource-constrained.

Now let’s see how the Xailient pre-trained car detector performs. We’ll use Xailient’s pre-trained car detector model downloaded from console.xailient.com.

Xailient Car Detector for Real-time Car Detection

Step 1: Download the pre-trained Car Detector model.

Step 2: Implement code to use the Xailient Car Detector model

    import time
    import cv2 as cv
    import numpy as np
    import math
    from xailient import dnn

    # initialize Xailient model
    print("Initialize Xailient model")
    THRESHOLD = 0.6  # Value between 0 and 1 for confidence score
    detectum = dnn.Detector()


    def process_frame_xailient(next_frame):
        # Extract bbox coords
        _, bboxes = detectum.process_frame(next_frame, THRESHOLD)

        # Loop through list (if empty this will be skipped) and overlay green bboxes
        # Format of bboxes is: xmin, ymin (top left), xmax, ymax (bottom right)
        for i in bboxes:
            cv.rectangle(next_frame, (i[0], i[1]), (i[2], i[3]), (0, 255, 0), 3)

        return next_frame


    def VehicheDetection_UsingXailient(filename):
        cap = cv.VideoCapture(filename)

        # Write output file
        frame_width = int(cap.get(cv.CAP_PROP_FRAME_WIDTH))
        frame_height = int(cap.get(cv.CAP_PROP_FRAME_HEIGHT))

        # Define the codec and create VideoWriter object
        fps = 20
        size = (int(frame_width), int(frame_height))
        fourcc = cv.VideoWriter_fourcc('m', 'p', '4', 'v')
        out = cv.VideoWriter()
        success = out.open('output_xailient.mov', fourcc, fps, size, True)

        frame_count = 0

        # start timer
        t1 = time.time()

        while True:
            # Reads the next video frame into memory
            ret, next_frame = cap.read()
            if not ret:
                break

            frame_count += 1
            next_frame = process_frame_xailient(next_frame)

            # write frame
            out.write(next_frame)

            key = cv.waitKey(50)
            if key == 27:  # Hit ESC key to stop
                break

        # end timer
        t2 = time.time()

        # calculate FPS
        fps = str(float(frame_count / float(t2 - t1))) + ' FPS'

        print("\nXailient Car Detector")
        print("Frames processed: {}".format(frame_count))
        print("Elapsed time: {:.2f}".format(float(t2 - t1)))
        print("FPS: {}".format(fps))

        cap.release()
        cv.destroyAllWindows()
        out.release()

Experiments:

I ran the above code on the same two devices:

- My dev machine, a Lenovo Yoga 920 running Ubuntu 18.04.
- A low-cost, resource-constrained device, a Raspberry Pi 3B+ running Raspbian Buster.

Results:

On the dev machine, there is a slight improvement in inference speed when using the Xailient Car Detector, even when only 1 core is used. On the Raspberry Pi, however, Xailient processes 8x more frames per second than MobileNet SSD with a single core.

Summarizing the results of both models:

- MobileNet SSD: 23.3 FPS on the dev machine; 0.9 FPS on the Raspberry Pi 3B+ (all 4 cores).
- Xailient Car Detector: slightly faster than MobileNet SSD on the dev machine (1 core); 8x more frames per second than MobileNet SSD on the Raspberry Pi 3B+ (1 core).

The video I used for this experiment was downloaded from Pexels.com.

In this post, we looked at the need for real-time detection models and briefly introduced MobileNet, SSD, MobileNet SSD, and Xailient, all of which were developed to solve the same challenge: running detection models on low-powered, resource-constrained IoT/embedded devices with the right balance of speed and accuracy. We used pre-trained MobileNet SSD and Xailient car detector models and performed experiments on two separate devices: a dev machine and a low-cost IoT device. Results show a slight speed improvement for the Xailient Car Detector over MobileNet SSD on the dev machine, and a significant improvement on the low-cost IoT device, even when only 1 core was used.

If you want to extend your car detection application to car tracking and speed estimation, here’s a very good blog by PyImageSearch (listed in the references below).

Originally published on xailient.com/blog

References

Howard, Andrew G., et al. "MobileNets: Efficient convolutional neural networks for mobile vision applications." arXiv preprint arXiv:1704.04861 (2017).
Liu, Wei, et al. "SSD: Single shot multibox detector." European Conference on Computer Vision. Springer, Cham, 2016.
https://www.pyimagesearch.com/2019/12/02/opencv-vehicle-detection-tracking-and-speed-estimation/
https://honingds.com/blog/ssd-single-shot-object-detection-mobilenet-opencv/
https://github.com/chuanqi305/MobileNet-SSD
https://mc.ai/object-detection-with-ssd-and-mobilenet/
https://machinethink.net/blog/mobilenet-v2/#:~:text=SSD%20is%20designed%20to%20be,detection%20portion%20of%20the%20network.

Previously published at https://www.xailient.com/post/real-time-vehicle-detection-with-mobilenet-ssd-and-xailient

Building Real-Time Vehicle Detection System
