We (ReasoNets) are building a dataset-first marketplace focusing on the end-to-end machine learning pipeline.
Buyers will be able to find and use the datasets and assets they need, and sellers will be able to earn money while building those datasets and assets.
Our goal: incentivize efficient and effective data usage, processing, and actions. Sign up for updates!
Take a look at the report to quickly find common data resources and/or assets for dataset=COCO, task=object detection. We're open to suggestions, questions, and criticism - let's start a conversation.
We’re building a data-first marketplace, one where data and assets can be shared and traded.
The marketplace will contain all that this report contains (and much more for a lot more datasets). The report was created to help readers quickly find common resources and/or assets for a given dataset and a specific task, in this case dataset=COCO, task=object detection. I’m open to suggestions, questions, and criticism. Please email me or message me to start a conversation.
I have broken up the report into the following blogs:
Part 1: COCO Summary Card. Each link will take you to the longer report where you can learn more. The next three parts each cover a specific section of the report.
Part 2 (this one): About COCO and examples and tutorials (companies / platforms / articles / more), including tools and platforms used to work with COCO (or object detection tasks): FiftyOne, DataTorch, Know Your Data (KYD), OpenCV, OpenVINO, CVAT, Roboflow, SuperAnnotate, OpenMMLab, Coral, Amazon, Facebook, Google, Microsoft, NVIDIA, Weights and Biases, Other (PyImageSearch, Immersive Limit, Tensorflow, Viso.ai)
Part 3: Process - This part is about the tools and platforms that can be used for different phases of data preparation or data processing involved in vision, object detection, and specifically COCO-related tasks. It will also discuss synthetic data and data quality.
Part 4: Models - This part is about a quick introduction to some pre-trained models and some corresponding readings.
API: COCO API: This package provides Matlab, Python, and Lua APIs that assist in loading, parsing, and visualizing the annotations in COCO. The Matlab and Python APIs are complete; the Lua API provides only basic functionality.
Description
COCO is a large-scale object detection, segmentation, and captioning dataset. COCO has several features:
Object segmentation
Recognition in context
Superpixel stuff segmentation
330K images (>200K labeled)
1.5 million object instances
80 object categories
91 stuff categories
5 captions per image
250,000 people with keypoints
List of the COCO Object Classes: The COCO dataset covers 80 object classes, which are the classes that a model pretrained on COCO can detect. Click here to see the representation of these objects in the dataset.
The first version of MS COCO dataset was released in 2014. It contains 164,000 images split into training (83,000), validation (41,000) and test (41,000) sets. In 2015 an additional test set of 81,000 images was released, including all the previous test images and 40,000 new images.
Based on community feedback, in 2017 the training/validation split was changed from 83K/41K to 118K/5K. The new split uses the same images and annotations. The 2017 test set is a subset of 41K images of the 2015 test set. Additionally, the 2017 release contains a new unannotated dataset of 123K images.
Structure and format
The “COCO format” is the following JSON structure, which also includes labels and metadata:
Info: Provides a high-level description and versioning information about your dataset.
Licenses: Provides a list of image licenses with unique IDs to be specified by your images. It specifies the copyright to use the image.
Images: Provides a list of images and relevant metadata.
Categories: Provides a list of classification categories and supercategories of objects that are present in an image, each with a unique ID. (Note if you want to use a model pretrained on COCO out of the box, then you’d need to follow the COCO classes/categories).
Annotations: Provides annotations each with a unique ID and the image ID it relates to. This contains the metadata about the categories related to an object, such as the location, size, and object category.
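To make the five sections concrete, here is a minimal sketch of a COCO-style annotation document built and indexed in Python. The file name, image size, license entry, and box values are invented for illustration. The field names follow the published COCO detection format, in which `bbox` is `[x, y, width, height]` in pixels. The helper names `index_annotations` and `labels_for_image` are hypothetical and are not part of the COCO API.

```python
import json
from collections import defaultdict

# A tiny, hand-written COCO-style document. All values are illustrative.
coco = {
    "info": {"description": "Toy dataset", "version": "1.0", "year": 2022},
    "licenses": [
        {"id": 1, "name": "CC BY 4.0",
         "url": "https://creativecommons.org/licenses/by/4.0/"},
    ],
    "images": [
        {"id": 1, "file_name": "street.jpg", "width": 640, "height": 480,
         "license": 1},
    ],
    "categories": [
        {"id": 1, "name": "person", "supercategory": "person"},
        {"id": 3, "name": "car", "supercategory": "vehicle"},
    ],
    "annotations": [
        # Each annotation points at one image and one category.
        {"id": 10, "image_id": 1, "category_id": 1,
         "bbox": [100.0, 120.0, 50.0, 150.0], "area": 7500.0, "iscrowd": 0},
        {"id": 11, "image_id": 1, "category_id": 3,
         "bbox": [300.0, 200.0, 200.0, 100.0], "area": 20000.0, "iscrowd": 0},
    ],
}


def index_annotations(doc):
    """Group annotations by the image they belong to."""
    by_image = defaultdict(list)
    for ann in doc["annotations"]:
        by_image[ann["image_id"]].append(ann)
    return by_image


def labels_for_image(doc, image_id):
    """Return the category names annotated on one image, in file order."""
    names = {cat["id"]: cat["name"] for cat in doc["categories"]}
    return [names[ann["category_id"]]
            for ann in index_annotations(doc)[image_id]]


# The structure is plain JSON, so it survives a serialize/parse round trip.
restored = json.loads(json.dumps(coco))
print(labels_for_image(restored, 1))  # ['person', 'car']
```

Grouping annotations by `image_id` up front is the usual first step, because the file stores images and annotations in two separate flat lists.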
Tasks
This dataset is used to set benchmarks for the following tasks: object detection, panoptic semantic segmentation, keypoint detection, dense pose estimation.
Object Detection: Objects are annotated with a bounding box and a class label.
Panoptic Semantic Segmentation: Object boundaries are labeled with a mask, and each object is assigned a class label.
Keypoint Detection: This task involves simultaneously detecting people and localizing their keypoints.
DensePose: Involves mapping all human pixels of an RGB image to the 3D surface of the human body.
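For the object detection task, predicted boxes are typically compared with annotated boxes using intersection over union (IoU). The sketch below computes IoU for two boxes in the COCO `[x, y, width, height]` layout. The function names and the example boxes are assumptions made for illustration; this is not the official COCO evaluation code, which additionally averages precision over several IoU thresholds.

```python
def xywh_to_xyxy(box):
    """Convert a COCO-style [x, y, width, height] box to corner coordinates."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def iou(box_a, box_b):
    """Intersection over union of two [x, y, width, height] boxes."""
    ax1, ay1, ax2, ay2 = xywh_to_xyxy(box_a)
    bx1, by1, bx2, by2 = xywh_to_xyxy(box_b)

    # Width and height of the overlap, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


ground_truth = [0, 0, 10, 10]
prediction = [5, 0, 10, 10]   # shifted right by half a box width
print(iou(ground_truth, prediction))  # roughly one third
```

A prediction is usually counted as a match only when its IoU with a ground-truth box of the same class exceeds a chosen threshold, for example 0.5.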
FiftyOne provides the building blocks for optimizing your dataset analysis pipeline. You can visualize complex labels, evaluate your models, explore scenarios of interest, identify failure modes, find annotation mistakes, and much more. It is tightly integrated with CVAT for annotation and label refinement.
The COCO team has partnered with the open-source tool FiftyOne to make it easier to download, visualize, and evaluate the COCO dataset. It facilitates visualization and access to COCO data resources and serves as an evaluation tool for model analysis on COCO. Here’s the official documentation.
The FiftyOne tool has three components: the Python library, the App, and the Brain.
FiftyOne Library: FiftyOne’s core library provides “a structured yet dynamic representation to explore your datasets”. It allows you to efficiently query and manipulate your dataset by adding custom tags, model predictions and more.
FiftyOne App: The FiftyOne App is a graphical user interface that makes it easy to explore and rapidly gain intuition into your datasets. It allows you to visualize labels like bounding boxes and segmentations overlaid on the samples; sort, query and slice the dataset into any subset of interest; and more.
FiftyOne Brain: The FiftyOne Brain is “a library of machine learning-powered capabilities that provide insights into your datasets and recommend ways to modify your datasets that will lead to measurably better performance of your models.” This is a closed-source solution.
Tutorials
FiftyOne Quickstart Colab notebook.ipynb: This notebook provides a brief walkthrough of FiftyOne, highlighting features that help build datasets and computer vision models.
How to work with object detection datasets in COCO format: This post introduces FiftyOne as a way to visualize COCO and to simplify access to its resources and evaluation. You can (a) download specific subsets of COCO, (b) visualize the data and labels, and (c) evaluate your models on COCO in a few lines of code. [Detailed breakdown of tutorials in the report]
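As a rough sketch of the "download a specific subset" step, the function below wraps FiftyOne's dataset zoo loader. It assumes the `fiftyone` package is installed and that the zoo dataset name `coco-2017` is available. The wrapper name `load_coco_subset`, the chosen classes, and the sample count are hypothetical choices for illustration. The import is done inside the function so the snippet can be read and loaded without the dependency.

```python
def load_coco_subset(classes=("person", "car"), max_samples=50):
    """Load a small slice of the COCO 2017 validation split with FiftyOne.

    Assumes `pip install fiftyone`; the first call downloads the needed files.
    """
    import fiftyone.zoo as foz  # imported lazily: heavy optional dependency

    return foz.load_zoo_dataset(
        "coco-2017",
        split="validation",
        label_types=["detections"],  # bounding boxes only
        classes=list(classes),       # keep samples containing these classes
        max_samples=max_samples,     # cap the download size
    )


# Typical usage (not executed here because it downloads data):
#   import fiftyone as fo
#   dataset = load_coco_subset()
#   session = fo.launch_app(dataset)  # opens the FiftyOne App
```

Restricting by class and sample count keeps the download small, which is convenient for a first look at the labels in the App.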
DataTorch
About
“Easily collaborate on custom computer vision datasets.”
DataTorch has an open source (https://open.datatorch.io/) collaborative data annotation tool where you can plug in any cloud storage, annotate files with your team, and export in COCO and other formats. You can also work online on the platform. DataTorch is a developer tool for building computer vision models. It revolves around the management of projects, which encapsulate all of the data, people, and work related to a particular model.
Building Computer Vision Datasets in Coco Format — YouTube: “Analyzing visual environments is a major objective of computer vision; it includes detecting what items are there, localizing them in 2D and 3D, identifying their properties, and describing their relationships. As a result, the dataset could be used to train item recognition and classification methods. COCO is frequently used to test the efficiency of real-time object recognition techniques. Modern neural networking modules can understand the COCO dataset’s structure. Contemporary AI-driven alternatives are not quite skillful in creating complete precision in findings that lead to a fact that the COCO dataset is a substantial reference point for CV to train, test, polish, and refine models for faster scaling of the annotation pipeline. The COCO standard specifies how your annotations and picture metadata are saved on disc at a substantial stage. Furthermore, the COCO dataset is an addition to transfer learning, in which the material utilized for one model is utilized to start another.”
KYD allows users to explore the dataset by information that wasn’t originally in the dataset. “The tool annotates the existing data using machine learning models like Cloud Vision labels, Cloud Vision face detection, and general image quality metrics (e.g. sharpness and brightness).”
You cannot run Know Your Data on your own data yet. For now, Know Your Data works for image-based datasets supported by the TensorFlow Datasets API. Here are the official documentation and Github links.
Tutorials
Explore the COCO dataset on KYD: “Know Your Data helps researchers, engineers, product teams, and decision makers understand datasets with the goal of improving data quality, and helping mitigate fairness and bias issues.”
Explore the Coco Captions dataset: KYD allows you to explore fairness and bias issues by comparing features. You can see how labels correlate with protected entities.
A Dataset Exploration Case Study with Know Your Data: “We demonstrate some of the functionality of a dataset exploration tool, Know Your Data (KYD), recently introduced at Google I/O, using the COCO Captions dataset as a case study. Using this tool, we find a range of gender and age biases in COCO Captions — biases that can be traced to both dataset collection and annotation practices. KYD is a dataset analysis tool that complements the growing suite of responsible AI tools being developed across Google and the broader research community. Currently, KYD only supports analysis of a small set of image datasets, but we’re working hard to make the tool accessible beyond this set.”
OpenCV (Open Source Computer Vision Library) is a library of programming functions mainly aimed at real-time computer vision. It is a software toolkit for processing images and video in real time, and it also provides analytics and machine learning capabilities. It was originally created in 2000 by Intel. Github: https://github.com/opencv/opencv. According to Intel, “using OpenCV developers can access many advanced computer vision algorithms used for image and video processing in 2D and 3D as part of their programs. The algorithms are otherwise only found in high-end image and video processing software.”
OpenCV provides several modules for working on computer vision problems, and these are supported by the current popular deep learning frameworks: TensorFlow, Keras, and PyTorch.
OpenCV’s trained models can be executed on CPUs or NVIDIA or Intel GPUs. OpenVINO (see below) optimizes running OpenCV capabilities on Intel hardware.
OAK is a modular, open-source ecosystem composed of MIT-licensed hardware, software, and AI training — that allows you to embed the super-power of spatial AI plus accelerated computer vision functions into your product. OAK provides in a single, cohesive solution what would otherwise require cobbling together disparate hardware and software components.
The marketplace is called modelplace.ai and was built with OAK in mind.
OpenCV + Roboflow: Getting Edge-y — Computer Vision Deployment Techniques — YouTube: “In this OpenCV Weekly Webinar, Roboflow CEO Joseph Nelson joins OpenCV CEO Satya Mallick to discuss the fundamentals of deploying computer vision models, including common pitfalls and best practices. That includes deploying to the a web hosted API, to the edge, and even in-browser for live webcam use.”
OpenVINO (Open Visual Inference and Neural Network Optimization) is an open-source toolkit for optimizing and deploying AI inference (across various Intel specific hardware devices).
OpenVINO optimizes running OpenCV capabilities on Intel hardware. Here’s the official documentation.
The CVAT tool is part of the OpenVINO toolkit and was originally designed to accelerate the process of annotating videos and images for use in training computer vision algorithms.
How to Label Images for Object Detection with CVAT — YouTube: By Roboflow (see below): “We walkthrough how to use the Computer Vision Annotation Tool (CVAT), a free tool for labeling images open sourced by Intel, as well as labeling best practices. Learn how to creating bounding boxes and prepare your computer vision dataset from scratch.”
“The Roboflow Model Library contains pre-configured model architectures for easily training computer vision models. Just add the link from your Roboflow dataset and you’re ready to go! We even include the code to export to common inference formats like TFLite, ONNX, and CoreML.”
Roboflow empowers developers to build their own computer vision applications, no matter their skillset or experience. We provide all of the tools needed to convert raw images into a custom trained computer vision model and deploy it for use in applications. Roboflow supports object detection and classification models. Here’s the official documentation.
Exploring The COCO Dataset (YouTube): “In this video, we take a deep dive into the Microsoft Common Objects in Context Dataset (COCO). We show a COCO object detector live, COCO benchmark results, COCO example images, COCO class distribution, and more!”
https://roboflow.com/formats/coco-json: “COCO format is not anywhere near universal and so you may find yourself needing to convert it to another format for a model (or export to COCO JSON from another format if you happen to be using a model that supports it). Roboflow is the universal tool for computer vision format conversion and can seamlessly input and output files in COCO JSON format.” The COCO dataset is distributed in a special format called COCO JSON.
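As one illustration of the kind of format conversion described above, the sketch below turns a COCO pixel box into the normalized center-based box used by YOLO-style label files. The choice of target format and both helper names are assumptions made for this example; it is not Roboflow's implementation.

```python
def coco_bbox_to_yolo(bbox, image_width, image_height):
    """Convert COCO [x, y, width, height] in pixels to
    (x_center, y_center, width, height), each normalized to the 0..1 range."""
    x, y, w, h = bbox
    x_center = (x + w / 2) / image_width
    y_center = (y + h / 2) / image_height
    return (x_center, y_center, w / image_width, h / image_height)


def contiguous_class_index(categories):
    """Map COCO category ids (which have gaps) to 0-based contiguous indices."""
    ordered_ids = sorted(cat["id"] for cat in categories)
    return {cat_id: index for index, cat_id in enumerate(ordered_ids)}


# A 50x150 box whose top-left corner is at (100, 120) in a 640x480 image.
print(coco_bbox_to_yolo([100, 120, 50, 150], 640, 480))
# (0.1953125, 0.40625, 0.078125, 0.3125)

print(contiguous_class_index([{"id": 1}, {"id": 3}, {"id": 90}]))
# {1: 0, 3: 1, 90: 2}
```

The id remapping matters because COCO category ids are not consecutive, while many training pipelines expect class indices that start at zero with no gaps.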
SuperAnnotate Datasets has Computer Vision Datasets, which provides an “easily accessible way of exploring public datasets using SuperAnnotate’s data curation platform.” From there you can explore the COCO dataset. The SuperAnnotate Python SDK allows access to the platform without a web browser.
Guide To SuperAnnotate — The Most Robust Image and Video Annotator Tool (analyticsindiamag.com): This tutorial covers the SuperAnnotate Desktop app and Python SDK. “SuperAnnotate platform provides end to end service for automating computer vision projects, starting from data engineering(generating high-quality training data) to model creation(training using neural networks). Allows project management through team creation and share via an API through Python SDK to measure progress. SuperAnnotate works with pixel-accurate annotations.”
OpenMMLab is an open-source algorithm platform for computer vision. It has:
released more than 20 high-quality projects and toolboxes in various research areas such as image classification, object detection, semantic segmentation, action recognition, etc.
made public more than 300 algorithms and 2,300 checkpoints
Customize Datasets — MMDetection 2.24.1 documentation: In MMDetection, OpenMMLab recommends converting the data into COCO format and doing the conversion offline. The tutorial shows that, after converting your data, you only need to modify the config’s data annotation paths and classes.
MMDetection: An Object Detection Python Tool — Analytics India Magazine: “MMDetection is a Python toolbox built as a codebase exclusively for object detection and instance segmentation tasks. It is built in a modular way with PyTorch implementation. There are numerous methods available for object detection and instance segmentation collected from various well-acclaimed models. It enables quick training and inference with quality. On the other hand, the toolbox contains weights for more than 200 pre-trained networks, making the toolbox an instant solution in the object detection domain.”
Since MMDetection is a toolbox containing many pre-built models and each model has its own architecture, this toolbox defines a general architecture that can adapt to any model. This general architecture comprises the following parts: Backbone, Neck, DenseHead (AnchorHead/AnchorFreeHead), RoIExtractor, RoIHead (BBoxHead/MaskHead)
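The dataset customization step mentioned above (changing only the annotation paths and classes after converting to COCO format) might look like the following config fragment. It follows the MMDetection 2.x config style, where configs are plain Python files. The class names and every path are hypothetical placeholders, and a real config would normally also inherit from a base model config and set the model head's number of classes to match.

```python
# Sketch of an MMDetection 2.x style dataset override.
# Class names and paths are placeholders, not real files.
dataset_type = "CocoDataset"
classes = ("can", "carton", "milk_bottle")

data = dict(
    train=dict(
        type=dataset_type,
        classes=classes,  # restrict the dataset to these categories
        ann_file="data/my_dataset/annotations/train.json",
        img_prefix="data/my_dataset/train/",
    ),
    val=dict(
        type=dataset_type,
        classes=classes,
        ann_file="data/my_dataset/annotations/val.json",
        img_prefix="data/my_dataset/val/",
    ),
    test=dict(
        type=dataset_type,
        classes=classes,
        ann_file="data/my_dataset/annotations/val.json",
        img_prefix="data/my_dataset/val/",
    ),
)
```

Because the annotations are already in COCO format, the built-in `CocoDataset` type can read them and no custom dataset class is needed.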
Coral is a complete toolkit to build products with local AI. “Our on-device inferencing capabilities allow you to build products that are efficient, private, fast and offline.” Coral has trained TensorFlow models for the Edge TPU for image classification, object detection, semantic segmentation, pose estimation, speech recognition.
Transforming COCO datasets — Rekognition: “COCO is a format for specifying large-scale object detection, segmentation, and captioning datasets. This Python example shows you how to transform a COCO object detection format dataset into an Amazon Rekognition Custom Labels bounding box format manifest file. This section also includes information that you can use to write your own code.”
Object detection with Detectron2 on Amazon SageMaker | AWS Machine Learning Blog: “In this post, we discuss Detectron2, an object detection and segmentation framework released by Facebook AI Research (FAIR), and its implementation on Amazon SageMaker to solve a dense object detection task for retail. This post includes an associated sample notebook, which you can run to demonstrate all the features discussed in this post. For more information, see the GitHub repository.”
How to Get Started With Facebook’s Detectron2 | by Rob: “The purpose of this guide is to show how to easily implement a pretrained Detectron2 model, able to recognize objects represented by the classes from the COCO (Common Object in COntext) dataset.”
End-to-end object detection with Transformers (facebook.com): “we are releasing Detection Transformers (DETR), an important new approach to object detection and panoptic segmentation. DETR completely changes the architecture compared with previous object detection systems. It is the first object detection framework to successfully integrate Transformers as a central building block in the detection pipeline.”
CO3D Dataset (facebook.com): “Common Objects in 3D (CO3D) is a dataset designed for learning category-specific 3D reconstruction and new-view synthesis using multi-view images of common object categories. The dataset has been introduced in our ICCV 2021 Paper. The CO3D dataset contains a total of 1.5 million frames from nearly 19,000 videos capturing objects from 50 MS-COCO categories. As such, it surpasses alternatives in terms of both the number of categories and objects.”
Getting started with the built-in image object detection algorithm | AI Platform Training | Google Cloud: “In this tutorial, you train an image object detection model without writing any code. You submit the COCO dataset to AI Platform Training for training, and then you deploy the model on AI Platform Training to get predictions. The resulting model classifies common objects within images of complex everyday scenes.”
Tutorial: AutoML- train object detection model — Azure Machine Learning | Microsoft Docs: “In this tutorial, you learn how to train an object detection model using Azure Machine Learning automated ML with the Azure Machine Learning CLI extension v2 or the Azure Machine Learning Python SDK v2 (preview). This object detection model identifies whether the image contains objects, such as a can, carton, milk bottle, or water bottle.”
Preparing State-of-the-Art Models for Classification and Object Detection with NVIDIA TAO Toolkit: “NVIDIA TAO Toolkit lets you take your own custom dataset and fine-tune it with one of the many popular network architectures to produce a task-specific model…With TAO Toolkit, you can achieve state-of-the-art accuracy using public datasets while maintaining high inference throughput for deployment. This post shows you how to train object detection and image classification models using TAO Toolkit to achieve the same accuracy as in the literature and open-sourced implementations. We trained on public datasets such as ImageNet, PASCAL VOC, and MS COCO as a comparison with published results in the literature or open-source community. This post discusses the complete workflow to reach state-of-the-art accuracy on several popular model architectures.”
The NVIDIA Train, Adapt, and Optimize (TAO) Toolkit: TAO Toolkit
Preparing Models for Object Detection with Real and Synthetic Data and NVIDIA TAO Toolkit: “In this post, we show you how we used the TAO Toolkit quantized-aware training and model pruning to accomplish this, and how to replicate the results yourself. We show you how to create an airplane detector, but you should be able to fine-tune the model for various satellite detection scenarios of your own.”
Object Detection on GPUs in 10 Minutes: “This post covers what you need to get up to speed using NVIDIA GPUs to run high performance object detection pipelines quickly and efficiently.”
GitHub — CHETHAN-CS/Nvidia-jetson-inference: “Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.”
Weights and Biases
Choosing the right model for object detection — Weights & Biases (wandb.ai): “Today we’ll try out a couple of different models by comparing their performance on a custom dataset. One of the most important steps is to visualize the performance metrics on the go to get a good idea of what’s working and what’s not. We’ll use Weights and Biases (WandB) to log and visualize performance metrics.”
PyImageSearch: PyTorch object detection with pre-trained networks: “In this tutorial, you will learn how to perform object detection with pre-trained networks using PyTorch. Utilizing pre-trained object detection networks, you can detect and recognize 90 common objects that your computer vision application will “see” in everyday life.” [And so many more for object detection: https://pyimagesearch.com/?s=object+detection]
Tensorflow: Custom object detection in the browser using TensorFlow.js: In this post, we are going to develop an end-to-end solution using TensorFlow to train a custom object-detection model in Python, then put it into production, and run real-time inferences in the browser through TensorFlow.js.
Viso.ai: What is COCO dataset?: “Everything you need to know about the popular Microsoft COCO dataset that is widely used for machine learning Projects. We will cover what you can do with MS COCO and what makes it different from alternatives such as Google’s OID (Open Images Dataset).”