We’re building a data-first marketplace, one where data and assets can be shared and traded.
The marketplace will contain everything in this report, and much more, for many more datasets. The report was created to help readers quickly find common resources and assets for a given dataset and a specific task, in this case dataset=COCO and task=object detection. I’m open to suggestions, questions, and criticism. Please email me or message me to start a conversation.
I have broken up the report into the following blog posts:
- Part 1: COCO Summary Card. Each link will take you to the longer report where you can learn more. The next three parts each cover a specific section of the report.
- Part 2 (this one): About COCO and examples and tutorials (companies / platforms / articles / more), including tools and platforms used to work with COCO (or object detection tasks): FiftyOne, DataTorch, Know Your Data (KYD), OpenCV, OpenVINO, CVAT, Roboflow, SuperAnnotate, OpenMMLab, Coral, Amazon, Facebook, Google, Microsoft, NVIDIA, Weights and Biases, Other (PyImageSearch, Immersive Limit, Tensorflow, Viso.ai)
- Part 3: Process - This part covers the tools and platforms that can be used for the different phases of data preparation and data processing involved in vision, object detection, and specifically COCO-related tasks. It also discusses synthetic data and data quality.
- Part 4: Models - This part gives a quick introduction to some pre-trained models and some corresponding readings.
If you have feedback, please review this link (Marketplace — Coming Soon | ReasoNets) and email me at [email protected]. Looking forward to starting a conversation.
Year released: The first version of MS COCO dataset was released in 2014.
License: Creative Commons Attribution 4.0 License.
COCO is a large-scale object detection, segmentation, and captioning dataset. COCO has several features:
- Object segmentation
- Recognition in context
- Superpixel stuff segmentation
- 330K images (>200K labeled)
- 1.5 million object instances
- 80 object categories
- 91 stuff categories
- 5 captions per image
- 250,000 people with keypoints
List of the COCO Object Classes: The COCO dataset includes 80 object classes, which are also the classes that models pre-trained on COCO detect. Click here to see the representation of these objects in the dataset.
The first version of MS COCO dataset was released in 2014. It contains 164,000 images split into training (83,000), validation (41,000) and test (41,000) sets. In 2015 an additional test set of 81,000 images was released, including all the previous test images and 40,000 new images.
Based on community feedback, in 2017 the training/validation split was changed from 83K/41K to 118K/5K. The new split uses the same images and annotations. The 2017 test set is a subset of 41K images of the 2015 test set. Additionally, the 2017 release contains a new unannotated dataset of 123K images.
Structure and format
The “COCO format” is a JSON structure with the following top-level sections, which hold the labels and metadata:
- Info: Provides a high-level description and versioning information about your dataset.
- Licenses: Provides a list of image licenses, each with a unique ID that images refer to. Each license states the terms under which an image may be used.
- Images: Provides a list of images and relevant metadata.
- Categories: Provides a list of classification categories and supercategories of objects that are present in an image, each with a unique ID. (Note if you want to use a model pretrained on COCO out of the box, then you’d need to follow the COCO classes/categories).
- Annotations: Provides annotations, each with a unique ID and the ID of the image it relates to. Each annotation holds the metadata for one object, such as its location, size, and category.
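To make the five sections concrete, here is a minimal sketch of a COCO-style annotation file built as a Python dictionary. All values (file name, IDs, box coordinates) are made up for illustration, the `labels_for_image` helper is a hypothetical name rather than part of any COCO tool, and the `area` field is simply set to the box area here.

```python
import json
from collections import defaultdict

# A toy COCO-style dataset. Every value below is invented for illustration.
coco = {
    "info": {"description": "Toy dataset", "version": "1.0", "year": 2022},
    "licenses": [{"id": 1, "name": "CC BY 4.0"}],
    "images": [
        {"id": 1, "file_name": "000001.jpg", "width": 640, "height": 480, "license": 1}
    ],
    "categories": [
        {"id": 1, "name": "person", "supercategory": "person"},
        {"id": 18, "name": "dog", "supercategory": "animal"},
    ],
    "annotations": [
        # bbox is [x, y, width, height] in pixels, measured from the top-left corner.
        {"id": 101, "image_id": 1, "category_id": 1,
         "bbox": [100.0, 50.0, 80.0, 200.0], "area": 16000.0, "iscrowd": 0},
        {"id": 102, "image_id": 1, "category_id": 18,
         "bbox": [300.0, 220.0, 120.0, 90.0], "area": 10800.0, "iscrowd": 0},
    ],
}


def labels_for_image(dataset, image_id):
    """Return (category name, bbox) pairs for one image."""
    names = {cat["id"]: cat["name"] for cat in dataset["categories"]}
    by_image = defaultdict(list)
    for ann in dataset["annotations"]:
        by_image[ann["image_id"]].append(ann)
    return [(names[ann["category_id"]], ann["bbox"]) for ann in by_image[image_id]]


# Round-trip through JSON text, as happens when an annotation file is read from disk.
loaded = json.loads(json.dumps(coco))
print(labels_for_image(loaded, 1))
```

The point of the sketch is the linkage: annotations point to images through `image_id` and to categories through `category_id`, so most tooling starts by building lookups like the two above.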
This dataset is used to set benchmarks for the following tasks: object detection, panoptic semantic segmentation, keypoint detection, dense pose estimation.
Object Detection: Objects are annotated with a bounding box and a class label.
Panoptic Semantic Segmentation: The boundaries of objects are labeled with a mask, and object classes are labeled with a class label.
Keypoint Detection: This task involves simultaneously detecting people and localizing their keypoints.
DensePose: Involves mapping all human pixels of an RGB image to the 3D surface of the human body.
In this document, we will mainly focus on object detection. Please read Object Detection in 2022: The Definitive Guide and A Beginner’s Guide to Object Detection for quick tutorials on object detection (more detailed tutorials are available throughout the document.)
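Since detection boxes are the focus, it helps to know how COCO stores them: each box is `[x, y, width, height]` in pixels, with `(x, y)` at the top-left corner. Other tools often expect corner coordinates (as in Pascal VOC) or a normalized center format (as in YOLO-style label files). The sketch below shows the arithmetic for both; the function names are hypothetical and not taken from any library.

```python
def coco_to_corners(bbox):
    """Convert COCO [x, y, w, h] to (xmin, ymin, xmax, ymax) in pixels."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


def coco_to_normalized_center(bbox, image_width, image_height):
    """Convert COCO [x, y, w, h] to (cx, cy, w, h), each scaled to the 0..1 range."""
    x, y, w, h = bbox
    return (
        (x + w / 2) / image_width,
        (y + h / 2) / image_height,
        w / image_width,
        h / image_height,
    )


box = [100.0, 50.0, 80.0, 200.0]  # made-up box in a 640x480 image
print(coco_to_corners(box))
print(coco_to_normalized_center(box, 640, 480))
```

Mixing up these conventions is a common source of boxes that look shifted or stretched, so it is worth checking which one a given model or tool expects.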
Average Precision (AP)
The following 12 metrics are used for characterizing the performance of an object detector on COCO:
Mean Average Precision (mAP) metric
Here is a quick (but very good) article on evaluation metrics for both object detection and COCO.
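These metrics all rest on Intersection over Union (IoU), the overlap ratio between a predicted box and a ground-truth box. COCO's primary AP is averaged over ten IoU thresholds, from 0.50 to 0.95 in steps of 0.05. The following is a minimal sketch of the IoU calculation for COCO-style `[x, y, width, height]` boxes, not the official evaluation code; the function names are hypothetical.

```python
def iou_xywh(box_a, box_b):
    """Intersection over Union for two boxes given as [x, y, width, height]."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Corners of the overlapping rectangle (if there is one).
    ix1, iy1 = max(ax, bx), max(ay, by)
    ix2, iy2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds over which COCO averages its primary AP metric.
COCO_IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_met(iou):
    """List the thresholds at which a prediction with this IoU counts as a match."""
    return [t for t in COCO_IOU_THRESHOLDS if iou >= t]


prediction = [2.0, 0.0, 10.0, 10.0]    # made-up boxes
ground_truth = [0.0, 0.0, 10.0, 10.0]
overlap = iou_xywh(prediction, ground_truth)
print(overlap, thresholds_met(overlap))
```

A prediction that matches at IoU 0.50 but not at 0.75 counts toward the looser metric only, which is why AP at a single threshold and AP averaged over all ten can differ a lot for the same model.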
Examples and tutorials (companies / platforms / articles / more)
[More to be added later]
COCO dataset can be found here: COCO dataset
Datasets can be found here: FiftyOne Dataset Zoo
Models can be found here: FiftyOne Model Zoo
FiftyOne provides the building blocks for optimizing your dataset analysis pipeline. You can visualize complex labels, evaluate your models, explore scenarios of interest, identify failure modes, find annotation mistakes, and much more. It is tightly integrated with CVAT for annotation and label refinement.
The COCO team has partnered with the open-source tool FiftyOne to make it easier to download, visualize, and evaluate the COCO dataset. It facilitates visualization and access to COCO data resources and serves as an evaluation tool for model analysis on COCO. Here’s the official documentation.
The FiftyOne tool has three components: the Python library, the App, and the Brain.
- FiftyOne Library: FiftyOne’s core library provides “a structured yet dynamic representation to explore your datasets”. It allows you to efficiently query and manipulate your dataset by adding custom tags, model predictions and more.
- FiftyOne App: The FiftyOne App is a graphical user interface that makes it easy to explore and rapidly gain intuition into your datasets. It allows you to visualize labels like bounding boxes and segmentations overlaid on the samples; sort, query and slice the dataset into any subset of interest; and more.
- FiftyOne Brain: The FiftyOne Brain is “a library of machine learning-powered capabilities that provide insights into your datasets and recommend ways to modify your datasets that will lead to measurably better performance of your models.” This is a closed-source solution.
- FiftyOne Quickstart Colab notebook.ipynb: This notebook provides a brief walkthrough of FiftyOne, highlighting features that help build datasets and computer vision models.
- The COCO Dataset: Best Practices for Downloading, Visualization, and Evaluation
- How to work with object detection datasets in COCO format: This post introduces FiftyOne as a way to visualize COCO and to ease access to COCO dataset resources and evaluation. You can a) download specific subsets of COCO, b) visualize the data and labels, and c) evaluate your models on COCO easily and in a few lines of code. [Detailed breakdown of tutorials in the report]
“Easily collaborate on custom computer vision datasets.”
DataTorch has an open-source (https://open.datatorch.io/) collaborative data annotation tool where you can plug in any cloud storage, annotate files with your team, and export in COCO and other formats. You can also work online on the platform. DataTorch is a developer tool for building computer vision models. DataTorch revolves around the management of projects, which encapsulate all of the data, people, and work related to a particular model.
- Import COCO Annotations | DataTorch Documentation
- Quickstart | DataTorch Documentation: Get started annotating a dataset and exporting it in COCO format right away.
- “Analyzing visual environments is a major objective of computer vision; it includes detecting what items are there, localizing them in 2D and 3D, identifying their properties, and describing their relationships. As a result, the dataset could be used to train item recognition and classification methods. COCO is frequently used to test the efficiency of real-time object recognition techniques. Modern neural networking modules can understand the COCO dataset’s structure. Contemporary AI-driven alternatives are not quite skillful in creating complete precision in findings that lead to a fact that the COCO dataset is a substantial reference point for CV to train, test, polish, and refine models for faster scaling of the annotation pipeline. The COCO standard specifies how your annotations and picture metadata are saved on disc at a substantial stage. Furthermore, the COCO dataset is an addition to transfer learning, in which the material utilized for one model is utilized to start another.”
- Building Computer Vision Datasets in Coco Format — Blog contains the YouTube tutorial to build computer vision dataset using Datatorch.
COCO dataset can be found here: COCO dataset
Datasets can be found here: 70 datasets supported by TensorFlow Datasets
KYD allows users to explore a dataset using information that wasn’t originally in it. “The tool annotates the existing data using machine learning models like Cloud Vision labels, Cloud Vision face detection, and general image quality metrics (e.g. sharpness and brightness).”
You cannot run Know Your Data on your own data yet. For now, Know Your Data works for image-based datasets supported by the TensorFlow Datasets API. Here are the official documentation and Github links.
- Explore the COCO dataset on KYD: “Know Your Data helps researchers, engineers, product teams, and decision makers understand datasets with the goal of improving data quality, and helping mitigate fairness and bias issues.”
- Explore the Coco Captions dataset: KYD allows you to explore fairness and bias issues by comparing features. You can see how labels correlate with protected entities.
- A Dataset Exploration Case Study with Know Your Data: “We demonstrate some of the functionality of a dataset exploration tool, Know Your Data (KYD), recently introduced at Google I/O, using the COCO Captions dataset as a case study. Using this tool, we find a range of gender and age biases in COCO Captions — biases that can be traced to both dataset collection and annotation practices. KYD is a dataset analysis tool that complements the growing suite of responsible AI tools being developed across Google and the broader research community. Currently, KYD only supports analysis of a small set of image datasets, but we’re working hard to make the tool accessible beyond this set.”
OpenCV (Open Source Computer Vision Library) is a library of programming functions mainly aimed at real-time computer vision. It is a software toolkit for processing images and video in real time, and it also provides analytics and machine learning capabilities. It was originally created in 2000 by Intel. GitHub: https://github.com/opencv/opencv. According to Intel, “using OpenCV developers can access many advanced computer vision algorithms used for image and video processing in 2D and 3D as part of their programs. The algorithms are otherwise only found in high-end image and video processing software.”
OpenCV provides several modules for working on computer vision problems that are supported on the current popular deep learning frameworks: Tensorflow, Keras, and PyTorch.
OpenCV’s trained models can be executed on CPUs or NVIDIA or Intel GPUs. OpenVINO (see below) optimizes running OpenCV capabilities on Intel hardware.
OpenCV has also launched both a) hardware devices called the OpenCV AI Kit (OAK) (OAK-1 or OAK-D) and b) the OpenCV AI Marketplace.
“OAK is a modular, open-source ecosystem composed of MIT-licensed hardware, software, and AI training — that allows you to embed the super-power of spatial AI plus accelerated computer vision functions into your product. OAK provides in a single, cohesive solution what would otherwise require cobbling together disparate hardware and software components.”
The marketplace is called modelplace.ai and was built with OAK in mind.
- Introduction to the COCO Dataset — OpenCV
- OpenCV: OpenCV Tutorials
- Courses — OpenCV
- Getting Started with OpenCV | LearnOpenCV: This series of posts will help you get started with OpenCV — the most popular computer vision library in the world. Also, check out Getting Started with PyTorch and Getting Started with Tensorflow / Keras.
- Official OpenCV Courses | LearnOpenCV: additional courses
- “In this OpenCV Weekly Webinar, Roboflow CEO Joseph Nelson joins OpenCV CEO Satya Mallick to discuss the fundamentals of deploying computer vision models, including common pitfalls and best practices. That includes deploying to the a web hosted API, to the edge, and even in-browser for live webcam use.”
- Using OpenCV AI Kit with Modelplace.AI To Create Real-time Reaction Videos: “We’re excited to show you some of the new site’s [modelplace.ai] features, and how to build a simple but elegant product using OAK and the OpenCV AI Marketplace.”
- Roboflow has created tutorial content on using OAK, including how to deploy to OAK-1 and how to use OAK-D with a custom model.
OpenVINO (Open Visual Inference and Neural Network Optimization) is an open-source toolkit for optimizing and deploying AI inference across various Intel hardware devices.
As noted above, OpenVINO optimizes running OpenCV capabilities on Intel hardware. Here’s the official documentation.
Models can be found here: Model Zoo
- Tutorials — OpenVINO™ documentation — Version(latest)
- OpenVINO series from learnopencv.com:
- Introduction to Intel OpenVINO Toolkit
- Post Training Quantization with OpenVino Toolkit
- Running OpenVino Models on Intel Integrated GPU
- Introduction to OpenVino Deep Learning Workbench
- Intel OpenVINO with OpenCV | by Sanchit Singh
The CVAT tool is part of the OpenVINO toolkit and was originally designed to accelerate the process of annotating videos and images for use in training computer vision algorithms.
- Computer Vision Annotation Tool (CVAT) — 2022 Overview — viso.ai
- How to use CVAT for computer vision [2022 updates] (roboflow.com)
- By Roboflow (see below): “We walkthrough how to use the Computer Vision Annotation Tool (CVAT), a free tool for labeling images open sourced by Intel, as well as labeling best practices. Learn how to creating bounding boxes and prepare your computer vision dataset from scratch.”
- Developing OpenCV’s CVAT (benhoff.net)
COCO dataset can be found here: COCO dataset
Datasets can be found here: Computer Vision Datasets
Models can be found here: Computer Vision Model Library
“The Roboflow Model Library contains pre-configured model architectures for easily training computer vision models. Just add the link from your Roboflow dataset and you’re ready to go! We even include the code to export to common inference formats like TFLite, ONNX, and CoreML.”
Roboflow empowers developers to build their own computer vision applications, no matter their skillset or experience. It provides all of the tools needed to convert raw images into a custom-trained computer vision model and deploy it for use in applications. Roboflow supports object detection and classification models. Here’s the official documentation.
- Check out for lots of interesting videos and tutorials.
- “In this video, we take a deep dive into the Microsoft Common Objects in Context Dataset (COCO). We show a COCO object detector live, COCO benchmark results, COCO example images, COCO class distribution, and more!”
- Complete Guide to Creating COCO Datasets | Udemy: “Build your own image datasets automatically with Python.”
- https://roboflow.com/formats/coco-json: “COCO format is not anywhere near universal and so you may find yourself needing to convert it to another format for a model (or export to COCO JSON from another format if you happen to be using a model that supports it). Roboflow is the universal tool for computer vision format conversion and can seamlessly input and output files in COCO JSON format.” The COCO dataset comes down in a special format called COCO JSON.
- This video walks through each step of building a working computer vision model.
- OpenCV related (see above): OpenCV has launched hardware devices called the OpenCV AI Kit (OAK). Roboflow has created tutorial content on using OAK, including how to deploy to OAK-1 and how to use OAK-D with a custom model.
https://www.superannotate.com/
COCO dataset can be found here: COCO Segmentation Dataset
Datasets can be found here: Computer Vision Datasets
SuperAnnotate is an end-to-end platform to annotate, version, and manage ground truth data.
Here’s the Github link.
The Datasets section has Computer Vision Datasets, which provides an “easily accessible way of exploring public datasets using SuperAnnotate’s data curation platform.” From there you can explore the COCO dataset. The SuperAnnotate Python SDK allows access to the platform without a web browser.
- SuperAnnotate Python SDK 4.3.4 documentation: This tutorial covers how to use the Python SDK.
- Visually explore the COCO Segmentation dataset
- Introduction to the COCO dataset
- Guide To SuperAnnotate — The Most Robust Image and Video Annotator Tool (analyticsindiamag.com): This tutorial covers the SuperAnnotate Desktop app and Python SDK. “SuperAnnotate platform provides end to end service for automating computer vision projects, starting from data engineering(generating high-quality training data) to model creation(training using neural networks). Allows project management through team creation and share via an API through Python SDK to measure progress. SuperAnnotate works with pixel-accurate annotations.”
Datasets can be found here: OpenMMLab Datasets
OpenMMLab is an open-source algorithm platform for computer vision. It has:
- released more than 20 high-quality projects and toolboxes in various research areas such as image classification, object detection, semantic segmentation, action recognition, etc.
- made public more than 300 algorithms and 2,300 checkpoints
- Github link: OpenMMLab · GitHub (open-source) [see MMDetection]
MMDetection is an open source object detection toolbox based on PyTorch. It is a part of the OpenMMLab project. It consists of:
- Training recipes for object detection and instance segmentation.
- 360+ pre-trained models to use for fine-tuning (or training afresh).
- Dataset support for popular vision datasets such as COCO, Cityscapes, LVIS and PASCAL VOC.
Major features of the toolbox
- You can construct a customized object detection framework by combining different modules.
- The toolbox directly supports popular and contemporary detection frameworks, e.g. Faster RCNN, Mask RCNN, RetinaNet, etc.
- All basic bbox and mask operations run on GPUs.
- The toolbox stems from the codebase developed by the MMDet team, who won the COCO Detection Challenge in 2018.
- Google Colab: object_detection — Colaboratory (google.com)
- Customize Datasets — MMDetection 2.24.1 documentation: In MMDetection, OpenMMLab recommends converting the data into COCO format and doing the conversion offline. The tutorial shows that, after converting your data, you only need to modify the config’s data annotation paths and classes.
- MMDetection: An Object Detection Python Tool — Analytics India Magazine: “MMDetection is a Python toolbox built as a codebase exclusively for object detection and instance segmentation tasks. It is built in a modular way with PyTorch implementation. There are numerous methods available for object detection and instance segmentation collected from various well-acclaimed models. It enables quick training and inference with quality. On the other hand, the toolbox contains weights for more than 200 pre-trained networks, making the toolbox an instant solution in the object detection domain.”
- Since MMDetection is a toolbox containing many pre-built models and each model has its own architecture, this toolbox defines a general architecture that can adapt to any model. This general architecture comprises the following parts: Backbone, Neck, DenseHead (AnchorHead/AnchorFreeHead), RoIExtractor, RoIHead (BBoxHead/MaskHead)
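As a rough illustration of the “Customize Datasets” point above, an MMDetection 2.x-style config is a plain Python file of nested dictionaries. The sketch below assumes a Faster R-CNN base config and a two-class custom dataset already converted to COCO format; the base config path, class names, and all file paths are hypothetical placeholders, and newer MMDetection versions use a different config layout.

```python
# Inherit everything from a base config (hypothetical relative path).
_base_ = "./faster_rcnn_r50_fpn_1x_coco.py"

# Dataset settings: keep the COCO dataset type, swap in your own classes and paths.
dataset_type = "CocoDataset"
classes = ("cat", "dog")  # hypothetical class names for a custom dataset

data = dict(
    train=dict(
        type=dataset_type,
        classes=classes,
        ann_file="data/custom/annotations/train.json",  # hypothetical path
        img_prefix="data/custom/images/train/",         # hypothetical path
    ),
    val=dict(
        type=dataset_type,
        classes=classes,
        ann_file="data/custom/annotations/val.json",    # hypothetical path
        img_prefix="data/custom/images/val/",           # hypothetical path
    ),
)

# Model settings: the detection head must predict the same number of classes.
model = dict(roi_head=dict(bbox_head=dict(num_classes=len(classes))))
```

Because the annotations are already in COCO format, nothing about the data pipeline itself changes; only the class tuple, the file paths, and the head's class count differ from the base config.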
Models can be found here: Models | Coral.
Coral is a complete toolkit to build products with local AI. “Our on-device inferencing capabilities allow you to build products that are efficient, private, fast and offline.” Coral has trained TensorFlow models for the Edge TPU for image classification, object detection, semantic segmentation, pose estimation, speech recognition.
- Models — Object Detection | Coral: “This page provides several trained models that are compiled for the Edge TPU, example code to run them, plus information about how to train your own model with TensorFlow.”
- Retrain SSD MobileNet V1 detector for the Edge TPU (TF1) — Colaboratory (google.com): “this tutorial shows you how to retrain a MobileNet V1 SSD model so that it detects two pets: Abyssinian cats and American Bulldogs (from the Oxford-IIIT Pets Dataset), using TensorFlow r1.15.”
[if I have time, I’ll add more hardware-specific and local / offline object detection or COCO-specific tutorials.]
- COCO format — Rekognition
- Transforming COCO datasets — Rekognition: “COCO is a format for specifying large-scale object detection, segmentation, and captioning datasets. This Python example shows you how to transform a COCO object detection format dataset into an Amazon Rekognition Custom Labels bounding box format manifest file. This section also includes information that you can use to write your own code.”
- Object detection with Detectron2 on Amazon SageMaker | AWS Machine Learning Blog: “In this post, we discuss Detectron2, an object detection and segmentation framework released by Facebook AI Research (FAIR), and its implementation on Amazon SageMaker to solve a dense object detection task for retail. This post includes an associated sample notebook, which you can run to demonstrate all the features discussed in this post. For more information, see the GitHub repository.”
- Object detection and model retraining with Amazon SageMaker and Amazon Augmented AI | AWS Machine Learning Blog: “In this post, we use Amazon SageMaker to build, train, and deploy an ML model for object detection and use Amazon Augmented AI (Amazon A2I) to build and render a custom worker template that allows reviewers to identify or review objects found in an image. You can also use Amazon Rekognition for object detection to identify objects from a predefined set of classes, or use Amazon Rekognition Custom Labels to train your custom model to detect objects and scenes in images that are specific to your business needs, simply by bringing your own data.”
- Run a SageMaker TensorFlow object detection model in batch mode | by Niels van den Berg: “For a computer vision project, I need to apply an object detection model on a large set of images. This blog post describes how this can be done in Amazon SageMaker using Batch Transform Jobs with the TensorFlow object detection model API.”
- How to Get Started With Facebook’s Detectron2 | by Rob: “The purpose of this guide is to show how to easily implement a pretrained Detectron2 model, able to recognize objects represented by the classes from the COCO (Common Object in COntext) dataset.”
- GitHub — facebookresearch/detectron2: Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
- End-to-end object detection with Transformers (facebook.com): “we are releasing Detection Transformers (DETR), an important new approach to object detection and panoptic segmentation. DETR completely changes the architecture compared with previous object detection systems. It is the first object detection framework to successfully integrate Transformers as a central building block in the detection pipeline.”
- See “DETR” in Models section.
- CO3D Dataset (facebook.com): “Common Objects in 3D (CO3D) is a dataset designed for learning category-specific 3D reconstruction and new-view synthesis using multi-view images of common object categories. The dataset has been introduced in our ICCV 2021 Paper. The CO3D dataset contains a total of 1.5 million frames from nearly 19,000 videos capturing objects from 50 MS-COCO categories. As such, it surpasses alternatives in terms of both the number of categories and objects.”
- Downloading, preprocessing, and uploading the COCO dataset | Cloud TPU | Google Cloud: “This topic describes how to prepare the COCO dataset for models that run on Cloud TPU.”
- Getting started with the built-in image object detection algorithm | AI Platform Training | Google Cloud: “In this tutorial, you train an image object detection model without writing any code. You submit the COCO dataset to AI Platform Training for training, and then you deploy the model on AI Platform Training to get predictions. The resulting model classifies common objects within images of complex everyday scenes.”
- Vision AI | Derive Image Insights via ML | Cloud Vision API | Google Cloud: Detect and classify multiple objects including the location of each object within the image. Learn more about object detection with Vision API and AutoML Vision.
- Create and explore datasets with labels — Azure Machine Learning | Microsoft Docs: “In this article, you’ll learn how to export the data labels from an Azure Machine Learning data labeling project and load them into popular formats such as, a pandas dataframe for data exploration.”
- Prepare data for computer vision tasks — Azure Machine Learning | Microsoft Docs: “In this article, you learn how to prepare image data for training computer vision models with automated machine learning in Azure Machine Learning.”
- Tutorial: AutoML- train object detection model — Azure Machine Learning | Microsoft Docs: “In this tutorial, you learn how to train an object detection model using Azure Machine Learning automated ML with the Azure Machine Learning CLI extension v2 or the Azure Machine Learning Python SDK v2 (preview). This object detection model identifies whether the image contains objects, such as a can, carton, milk bottle, or water bottle.”
- Build an Object Detection Solution with Microsoft Azure Custom Vision Service | Pluralsight: “The Microsoft cloud includes a collection of services that help you create advanced AI solutions. This course will teach you how to build an object detection solution with Azure Custom Vision.”
- Preparing State-of-the-Art Models for Classification and Object Detection with NVIDIA TAO Toolkit: “NVIDIA TAO Toolkit lets you take your own custom dataset and fine-tune it with one of the many popular network architectures to produce a task-specific model…With TAO Toolkit, you can achieve state-of-the-art accuracy using public datasets while maintaining high inference throughput for deployment. This post shows you how to train object detection and image classification models using TAO Toolkit to achieve the same accuracy as in the literature and open-sourced implementations. We trained on public datasets such as ImageNet, PASCAL VOC, and MS COCO as a comparison with published results in the literature or open-source community. This post discusses the complete workflow to reach state-of-the-art accuracy on several popular model architectures.”
- The NVIDIA Train, Adapt, and Optimize (TAO) Toolkit: TAO Toolkit
- Preparing Models for Object Detection with Real and Synthetic Data and NVIDIA TAO Toolkit: “In this post, we show you how we used the TAO Toolkit quantized-aware training and model pruning to accomplish this, and how to replicate the results yourself. We show you how to create an airplane detector, but you should be able to fine-tune the model for various satellite detection scenarios of your own.”
- Object Detection on GPUs in 10 Minutes: “This post covers what you need to get up to speed using NVIDIA GPUs to run high performance object detection pipelines quickly and efficiently.”
- GitHub — CHETHAN-CS/Nvidia-jetson-inference: “Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.”
Weights and Biases
- Choosing the right model for object detection — Weights & Biases (wandb.ai): “Today we’ll try out a couple of different models by comparing their performance on a custom dataset. One of the most important steps is to visualize the performance metrics on the go to get a good idea of what’s working and what’s not. We’ll use Weights and Biases (WandB) to log and visualize performance metrics.”
Other / Independent
- Immersive Limit: “A detailed walkthrough of the COCO Dataset JSON Format, specifically for object detection (instance segmentations).”
- Create COCO Annotations From Scratch
- PyImageSearch: PyTorch object detection with pre-trained networks: “In this tutorial, you will learn how to perform object detection with pre-trained networks using PyTorch. Utilizing pre-trained object detection networks, you can detect and recognize 90 common objects that your computer vision application will “see” in everyday life.” [And so many more for object detection: https://pyimagesearch.com/?s=object+detection]
- Tensorflow: Custom object detection in the browser using TensorFlow.js: In this post, we are going to develop an end-to-end solution using TensorFlow to train a custom object-detection model in Python, then put it into production, and run real-time inferences in the browser through TensorFlow.js.
- Viso.ai: What is COCO dataset?: “Everything you need to know about the popular Microsoft COCO dataset that is widely used for machine learning Projects. We will cover what you can do with MS COCO and what makes it different from alternatives such as Google’s OID (Open Images Dataset).”