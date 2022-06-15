We (ReasoNets) are building a dataset-first marketplace focusing on the end-to-end machine learning pipeline. Buyers will be able to find and use the datasets and assets they need, and sellers will be able to earn money while building those datasets and assets. Our goal: incentivize efficient and effective data usage, processing, and actions. Sign up for updates! Take a look at the report to quickly find common data resources and/or assets for dataset=COCO, task=object detection. We're open to suggestions, questions, and criticism, so let's start a conversation.

We’re building a data-first marketplace, one where data and assets can be shared and traded.

The report was created to help readers quickly find common resources and/or assets for a given dataset and a specific task, in this case dataset=COCO, task=object detection.

I have broken up the report into the following blog posts:

Part 1: COCO Summary Card. Each link takes you to the longer report, where you can learn more. Each of the next three parts covers a specific section of the report.

Part 2 (this one): About COCO, plus examples and tutorials (companies / platforms / articles / more), including tools and platforms used to work with COCO (or object detection tasks): FiftyOne, DataTorch, Know Your Data (KYD), OpenCV, OpenVINO, CVAT, Roboflow, SuperAnnotate, OpenMMLab, Coral, Amazon, Facebook, Google, Microsoft, NVIDIA, Weights and Biases, Other (PyImageSearch, Immersive Limit, TensorFlow, Viso.ai).

Part 3: Process. This part covers the tools and platforms that can be used in the different phases of data preparation or data processing involved in vision, object detection, and specifically COCO-related tasks. It also discusses synthetic data and data quality.

Part 4: Models. This part gives a quick introduction to some pre-trained models, along with corresponding readings.

If you have feedback, please email me at [email protected]

Who: Microsoft

Year released: The first version of the MS COCO dataset was released in 2014.

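License: The annotations are licensed under a Creative Commons Attribution 4.0 License. The images themselves are not owned by COCO; their use must follow the Flickr Terms of Use.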

Links

Website: https://cocodataset.org/

GitHub: cocodataset.github.io

Paper: Microsoft COCO: Common Objects in Context

API: COCO API: This package provides Matlab, Python, and Lua APIs that assist in loading, parsing, and visualizing the annotations in COCO. The Matlab and Python APIs are complete; the Lua API provides only basic functionality.
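To give a feel for what the COCO API does, here is a small standard-library sketch of the kind of lookups it provides (in the Python package, pycocotools, these are methods such as getCatIds, getImgIds, getAnnIds and loadAnns on the COCO class). The annotation data and the TinyCocoIndex helper are made up for illustration; this is not the pycocotools implementation.

```python
import json
from collections import defaultdict

# A tiny, made-up annotation file in COCO layout (real files are much larger).
COCO_JSON = """
{
  "images": [{"id": 1, "file_name": "a.jpg", "width": 640, "height": 480},
             {"id": 2, "file_name": "b.jpg", "width": 640, "height": 480}],
  "categories": [{"id": 1, "name": "person", "supercategory": "person"},
                 {"id": 18, "name": "dog", "supercategory": "animal"}],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 1, "bbox": [10, 20, 50, 100], "area": 5000, "iscrowd": 0},
    {"id": 11, "image_id": 1, "category_id": 18, "bbox": [200, 150, 80, 60], "area": 4800, "iscrowd": 0},
    {"id": 12, "image_id": 2, "category_id": 1, "bbox": [30, 40, 60, 120], "area": 7200, "iscrowd": 0}
  ]
}
"""


class TinyCocoIndex:
    """Hypothetical helper: indexes annotations by image and by category."""

    def __init__(self, dataset):
        self.anns = {a["id"]: a for a in dataset["annotations"]}
        self.cats = {c["id"]: c for c in dataset["categories"]}
        self.img_to_anns = defaultdict(list)
        self.cat_to_imgs = defaultdict(set)
        for ann in dataset["annotations"]:
            self.img_to_anns[ann["image_id"]].append(ann["id"])
            self.cat_to_imgs[ann["category_id"]].add(ann["image_id"])

    def cat_ids(self, names):
        """Category IDs whose name is in `names`."""
        return [cid for cid, c in self.cats.items() if c["name"] in names]

    def img_ids(self, cat_id):
        """Sorted IDs of images containing at least one object of `cat_id`."""
        return sorted(self.cat_to_imgs.get(cat_id, set()))

    def ann_ids(self, img_id, cat_ids=None):
        """Annotation IDs for one image, optionally filtered by category."""
        ids = self.img_to_anns.get(img_id, [])
        if cat_ids is None:
            return list(ids)
        return [i for i in ids if self.anns[i]["category_id"] in cat_ids]


index = TinyCocoIndex(json.loads(COCO_JSON))
person = index.cat_ids(["person"])        # [1]
print(index.img_ids(person[0]))           # [1, 2]
print(index.ann_ids(1, cat_ids=person))   # [10]
```

The real COCO class builds similar dictionaries when it loads an annotation file, which is why repeated queries are cheap once the file is parsed.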

Description

COCO is a large-scale object detection, segmentation, and captioning dataset. COCO has several features:

Object segmentation

Recognition in context

Superpixel stuff segmentation

330K images (>200K labeled)

1.5 million object instances

80 object categories

91 stuff categories

5 captions per image

250,000 people with keypoints

List of the COCO Object Classes: The COCO object detection annotations cover 80 object classes, which are also the classes that COCO-pretrained models predict. Click here to see the representation of these objects in the dataset.
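A practical detail behind the “80 categories” number: in the detection annotation files the category_id values are not contiguous. They range from 1 (person) to 90 (toothbrush) with gaps, because the original paper defined 91 categories and only 80 of them were annotated. Many training frameworks expect class indices 0 to N-1, so a mapping is usually built from the categories list. A minimal sketch using a three-entry excerpt of that list (build_label_maps is a hypothetical helper name):

```python
# Excerpt of the "categories" list from a COCO instances file (the real list has 80 entries).
categories = [
    {"id": 1, "name": "person", "supercategory": "person"},
    {"id": 2, "name": "bicycle", "supercategory": "vehicle"},
    {"id": 90, "name": "toothbrush", "supercategory": "indoor"},
]


def build_label_maps(cats):
    """Map sparse COCO category IDs to contiguous indices 0..N-1, plus index -> name."""
    ordered = sorted(cats, key=lambda c: c["id"])
    cat_to_idx = {c["id"]: i for i, c in enumerate(ordered)}
    idx_to_name = [c["name"] for c in ordered]
    return cat_to_idx, idx_to_name


cat_to_idx, idx_to_name = build_label_maps(categories)
print(cat_to_idx)       # {1: 0, 2: 1, 90: 2}
print(idx_to_name[2])   # toothbrush
```

Sorting by ID keeps the mapping stable regardless of the order in which categories appear in the file.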

The first version of the MS COCO dataset was released in 2014. It contains 164,000 images split into training (83,000), validation (41,000), and test (41,000) sets. In 2015 an additional test set of 81,000 images was released, including all the previous test images and 40,000 new images.

Based on community feedback, in 2017 the training/validation split was changed from 83K/41K to 118K/5K. The new split uses the same images and annotations. The 2017 test set is a subset of 41K images of the 2015 test set. Additionally, the 2017 release contains a new unannotated dataset of 123K images.
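Because the 2017 split reuses the 2014 training and validation images, the totals match. With the exact image counts of the official releases (82,783 in train2014, 40,504 in val2014, 118,287 in train2017 and 5,000 in val2017), the arithmetic works out as:

```latex
\underbrace{82{,}783}_{\text{train2014}} + \underbrace{40{,}504}_{\text{val2014}}
  = 123{,}287
  = \underbrace{118{,}287}_{\text{train2017}} + \underbrace{5{,}000}_{\text{val2017}}
```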

Structure and format

The “COCO format” is a JSON structure that holds both labels and metadata. Its top-level sections (the keys info, licenses, images, categories, and annotations) are:

Info: Provides a high-level description and versioning information about your dataset.

Licenses: Provides a list of image licenses, each with a unique ID that your images can reference. Each license specifies the terms under which the image may be used.

Images: Provides a list of images and relevant metadata.

Categories: Provides a list of classification categories and supercategories of objects that are present in an image, each with a unique ID. (Note: if you want to use a model pretrained on COCO out of the box, you need to follow the COCO classes/categories.)

Annotations: Provides a list of annotations, each with a unique ID and the ID of the image it relates to. Each annotation contains the metadata about an object, such as its location, size, and object category.
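To make the structure concrete, here is a minimal hand-written document with the five sections. The field names inside each entry follow the official instances files (for example, bbox is [x, y, width, height]), but all values are placeholders. check_references is a hypothetical helper that verifies the ID links between sections, which are easy to break when you assemble a COCO file yourself.

```python
import json

# A minimal, hand-written COCO-style document; all values are illustrative placeholders.
coco = {
    "info": {"description": "Toy dataset", "version": "1.0", "year": 2022},
    "licenses": [{"id": 1, "name": "CC BY 4.0",
                  "url": "https://creativecommons.org/licenses/by/4.0/"}],
    "images": [{"id": 1, "file_name": "000001.jpg", "width": 640,
                "height": 480, "license": 1}],
    "categories": [{"id": 1, "name": "person", "supercategory": "person"}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 1,
                     "bbox": [100.0, 120.0, 50.0, 80.0],  # [x, y, width, height]
                     "area": 4000.0, "iscrowd": 0}],
}


def check_references(doc):
    """List annotations pointing at unknown images/categories and images with unknown licenses."""
    image_ids = {img["id"] for img in doc["images"]}
    cat_ids = {cat["id"] for cat in doc["categories"]}
    license_ids = {lic["id"] for lic in doc.get("licenses", [])}
    problems = []
    for ann in doc["annotations"]:
        if ann["image_id"] not in image_ids:
            problems.append(f"annotation {ann['id']}: unknown image_id {ann['image_id']}")
        if ann["category_id"] not in cat_ids:
            problems.append(f"annotation {ann['id']}: unknown category_id {ann['category_id']}")
    for img in doc["images"]:
        if "license" in img and img["license"] not in license_ids:
            problems.append(f"image {img['id']}: unknown license {img['license']}")
    return problems


text = json.dumps(coco, indent=2)          # what you would write to e.g. instances.json
print(check_references(json.loads(text)))  # []
```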

Tasks

This dataset is used to set benchmarks for the following tasks: object detection, panoptic segmentation, keypoint detection, and dense pose estimation.

Object Detection: Objects are annotated with a bounding box and a class label.

Panoptic Segmentation: Every pixel is labeled with a class; pixels belonging to countable objects (“things”) are also assigned to an instance, so each object gets its own mask.

Keypoint Detection: This task involves simultaneously detecting people and localizing their keypoints.

DensePose: This task involves mapping all human pixels of an RGB image to the 3D surface of the human body.
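For object detection, each COCO bbox is stored as [x, y, width, height] in absolute pixel coordinates, where (x, y) is the top-left corner. Other tools often expect corner coordinates [x_min, y_min, x_max, y_max] or normalized center coordinates (as in YOLO-style labels), so conversions come up constantly. A small sketch; the function names are illustrative, not from any particular library:

```python
def coco_to_corners(bbox):
    """[x, y, w, h] (COCO) -> [x_min, y_min, x_max, y_max]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def corners_to_coco(box):
    """[x_min, y_min, x_max, y_max] -> [x, y, w, h] (COCO)."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]


def coco_to_normalized_center(bbox, img_w, img_h):
    """[x, y, w, h] in pixels -> [cx, cy, w, h] scaled to 0..1 (YOLO-style)."""
    x, y, w, h = bbox
    return [(x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h]


bbox = [100, 120, 50, 80]
print(coco_to_corners(bbox))                      # [100, 120, 150, 200]
print(coco_to_normalized_center(bbox, 640, 480))
```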

In this document, we will mainly focus on object detection. Please read Object Detection in 2022: The Definitive Guide and A Beginner’s Guide to Object Detection for quick tutorials on object detection (more detailed tutorials are available throughout the document).

Evaluation metrics

Average Precision (AP)

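The following 12 metrics are used for characterizing the performance of an object detector on COCO:

AP, AP at IoU=.50, and AP at IoU=.75 (the primary metric, AP, is averaged over 10 IoU thresholds from .50 to .95 in steps of .05)

AP for small (area < 32² pixels), medium (32² < area < 96²), and large (area > 96²) objects

AR (average recall) given 1, 10, and 100 detections per image

AR for small, medium, and large objects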

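Mean Average Precision (mAP): COCO averages AP over all categories, which is traditionally called mean average precision (mAP); the COCO evaluation makes no distinction between AP and mAP.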

Here is a quick (but very good) article on evaluation metrics for both object detection and COCO.
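To make the AP definition concrete: a detection counts as a true positive when its IoU with a not-yet-matched ground-truth box of the same class reaches the threshold; AP is the area under the interpolated precision/recall curve, which COCO samples at 101 recall points, and the headline AP averages this over the IoU thresholds 0.50:0.05:0.95. The toy sketch below follows that recipe for a single category only. It skips crowd regions, area ranges and the per-image detection limit that pycocotools' COCOeval handles, so use COCOeval for real numbers.

```python
def iou(box_a, box_b):
    """Intersection over union of two COCO boxes in [x, y, width, height] form."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def average_precision(gts, dets, iou_thr):
    """101-point interpolated AP for ONE category at ONE IoU threshold.

    gts:  list of (image_id, bbox) ground-truth boxes
    dets: list of (image_id, bbox, score) detections
    """
    if not gts:
        return 0.0
    matched = set()  # indices of ground truths already claimed by a detection
    tp_flags = []
    # Greedy matching, highest-scoring detections first.
    for img_id, box, _ in sorted(dets, key=lambda d: d[2], reverse=True):
        best_j, best_iou = -1, iou_thr
        for j, (g_img, g_box) in enumerate(gts):
            if g_img != img_id or j in matched:
                continue
            overlap = iou(box, g_box)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched.add(best_j)
        tp_flags.append(best_j >= 0)

    # Cumulative precision and recall after each detection.
    precision, recall, tp = [], [], 0
    for k, is_tp in enumerate(tp_flags, start=1):
        tp += is_tp
        precision.append(tp / k)
        recall.append(tp / len(gts))

    # Make precision non-increasing from right to left (interpolation).
    for k in range(len(precision) - 2, -1, -1):
        precision[k] = max(precision[k], precision[k + 1])

    # Sample precision at recall = 0.00, 0.01, ..., 1.00 (0 where recall is never reached).
    total = 0.0
    for r in (i / 100 for i in range(101)):
        total += next((p for p, rc in zip(precision, recall) if rc >= r), 0.0)
    return total / 101


def coco_style_ap(gts, dets):
    """Mean of AP over the ten IoU thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = [0.50 + 0.05 * i for i in range(10)]
    return sum(average_precision(gts, dets, t) for t in thresholds) / len(thresholds)


gts = [(1, [0, 0, 10, 10]), (1, [20, 20, 10, 10])]
dets = [(1, [0, 0, 10, 10], 0.9)]           # finds one of the two objects
print(round(coco_style_ap(gts, dets), 4))   # 0.505
```

Finding one of two objects perfectly gives recall 0.5, so only the 51 recall points from 0.00 to 0.50 contribute, hence 51/101.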

Examples and tutorials (companies / platforms / articles / more)

FiftyOne

About

COCO dataset can be found here: COCO dataset

Datasets can be found here: FiftyOne Dataset Zoo

Models can be found here: FiftyOne Model Zoo





FiftyOne provides the building blocks for optimizing your dataset analysis pipeline. You can visualize complex labels, evaluate your models, explore scenarios of interest, identify failure modes, find annotation mistakes, and much more. It is tightly integrated with CVAT for annotation and label refinement.
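FiftyOne does this kind of inspection interactively. As a much simpler, standard-library illustration of what “finding annotation mistakes” can mean in COCO data (not FiftyOne's API), the sketch below flags boxes that are degenerate or extend past the image border. The function name, the min_size threshold and the sample annotations are assumptions for illustration.

```python
def find_suspicious_boxes(annotations, images, min_size=1.0):
    """Flag COCO annotations whose bbox is degenerate or leaves the image.

    annotations: list of dicts with "id", "image_id", "bbox" ([x, y, w, h])
    images: dict mapping image id -> (width, height)
    """
    issues = []
    for ann in annotations:
        x, y, w, h = ann["bbox"]
        width, height = images[ann["image_id"]]
        if w < min_size or h < min_size:
            issues.append((ann["id"], "degenerate box"))
        elif x < 0 or y < 0 or x + w > width or y + h > height:
            issues.append((ann["id"], "outside image"))
    return issues


images = {1: (640, 480)}
annotations = [
    {"id": 1, "image_id": 1, "bbox": [10, 10, 100, 100]},   # fine
    {"id": 2, "image_id": 1, "bbox": [600, 400, 80, 50]},   # runs past the right/bottom edge
    {"id": 3, "image_id": 1, "bbox": [50, 50, 0, 20]},      # zero width
]
print(find_suspicious_boxes(annotations, images))
# [(2, 'outside image'), (3, 'degenerate box')]
```

In practice you may want a small tolerance on the border check, since box coordinates are stored as floats and can overshoot the image edge by a fraction of a pixel.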





The COCO team has partnered with the open-source tool FiftyOne to make it easier to download, visualize, and evaluate the COCO dataset. It facilitates visualization and access to COCO data resources and serves as an evaluation tool for model analysis on COCO. Here’s the official documentation.





The FiftyOne tool has three components: the Python library, the App, and the Brain.

FiftyOne Library: FiftyOne’s core library provides “a structured yet dynamic representation to explore your datasets”. It allows you to efficiently query and manipulate your dataset by adding custom tags, model predictions and more.

FiftyOne App: The FiftyOne App is a graphical user interface that makes it easy to explore and rapidly gain intuition into your datasets. It allows you to visualize labels like bounding boxes and segmentations overlaid on the samples; sort, query and slice the dataset into any subset of interest; and more.

FiftyOne Brain: The FiftyOne Brain is “a library of machine learning-powered capabilities that provide insights into your datasets and recommend ways to modify your datasets that will lead to measurably better performance of your models.” This is a closed-source solution.

Tutorials

DataTorch

About

“Easily collaborate on custom computer vision datasets.”





DataTorch has an open-source collaborative data annotation tool (https://open.datatorch.io/) where you can plug in any cloud storage, annotate files with your team, and export in COCO and other formats. You can also work online on the platform. DataTorch is a developer tool for building computer vision models. It revolves around the management of projects, which encapsulate all of the data, people, and work related to a particular model.

Tutorials

Import COCO Annotations | DataTorch Documentation

Quickstart | DataTorch Documentation: Get started annotating a dataset and exporting it in COCO format right away.

Building Computer Vision Datasets in Coco Format — YouTube: “Analyzing visual environments is a major objective of computer vision; it includes detecting what items are there, localizing them in 2D and 3D, identifying their properties, and describing their relationships. As a result, the dataset could be used to train item recognition and classification methods. COCO is frequently used to test the efficiency of real-time object recognition techniques. Modern neural networking modules can understand the COCO dataset’s structure. Contemporary AI-driven alternatives are not quite skillful in creating complete precision in findings that lead to a fact that the COCO dataset is a substantial reference point for CV to train, test, polish, and refine models for faster scaling of the annotation pipeline. The COCO standard specifies how your annotations and picture metadata are saved on disc at a substantial stage. Furthermore, the COCO dataset is an addition to transfer learning, in which the material utilized for one model is utilized to start another.”

Building Computer Vision Datasets in Coco Format — Blog: contains the YouTube tutorial on building a computer vision dataset using DataTorch.
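DataTorch exports COCO directly. If you instead have labels in some other simple format, converting them yourself mostly means assigning IDs and switching the box layout. A minimal sketch, assuming a hypothetical input format of corner-coordinate boxes with class names per image (this is not DataTorch's exporter, and to_coco is an illustrative name):

```python
import json


def to_coco(records, class_names):
    """Convert simple per-image records into a COCO-style dict.

    records: list of dicts like
        {"file_name": str, "width": int, "height": int,
         "boxes": [(x_min, y_min, x_max, y_max, class_name), ...]}
    class_names: list of class names; category IDs are assigned starting at 1.
    """
    cat_ids = {name: i for i, name in enumerate(class_names, start=1)}
    coco = {
        "images": [],
        "annotations": [],
        "categories": [{"id": cid, "name": name, "supercategory": "none"}
                       for name, cid in cat_ids.items()],
    }
    ann_id = 1
    for img_id, rec in enumerate(records, start=1):
        coco["images"].append({"id": img_id, "file_name": rec["file_name"],
                               "width": rec["width"], "height": rec["height"]})
        for x1, y1, x2, y2, name in rec["boxes"]:
            w, h = x2 - x1, y2 - y1
            coco["annotations"].append({
                "id": ann_id,
                "image_id": img_id,
                "category_id": cat_ids[name],
                "bbox": [x1, y1, w, h],  # COCO stores [x, y, width, height]
                "area": w * h,           # box area; COCO uses mask area when polygons exist
                "iscrowd": 0,
            })
            ann_id += 1
    return coco


records = [{"file_name": "img_001.jpg", "width": 640, "height": 480,
            "boxes": [(10, 20, 110, 220, "cat"), (300, 100, 400, 180, "dog")]}]
coco = to_coco(records, ["cat", "dog"])
print(json.dumps(coco["annotations"][1]))
# {"id": 2, "image_id": 1, "category_id": 2, "bbox": [300, 100, 100, 80], "area": 8000, "iscrowd": 0}
```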

Know Your Data (KYD)

About

COCO dataset can be found here: COCO dataset

Datasets can be found here: 70 datasets supported by TensorFlow Datasets





KYD lets users explore a dataset using information that wasn’t originally in it: “The tool annotates the existing data using machine learning models like Cloud Vision labels, Cloud Vision face detection, and general image quality metrics (e.g. sharpness and brightness).”





You cannot run Know Your Data on your own data yet. For now, Know Your Data works for image-based datasets supported by the TensorFlow Datasets API. Here are the official documentation and GitHub links.

Tutorials

Explore the COCO dataset on KYD: “Know Your Data helps researchers, engineers, product teams, and decision makers understand datasets with the goal of improving data quality, and helping mitigate fairness and bias issues.”

Explore the Coco Captions dataset: KYD allows you to explore fairness and bias issues by comparing features. You can see how labels correlate with protected entities.

A Dataset Exploration Case Study with Know Your Data: “We demonstrate some of the functionality of a dataset exploration tool, Know Your Data (KYD), recently introduced at Google I/O, using the COCO Captions dataset as a case study. Using this tool, we find a range of gender and age biases in COCO Captions — biases that can be traced to both dataset collection and annotation practices. KYD is a dataset analysis tool that complements the growing suite of responsible AI tools being developed across Google and the broader research community. Currently, KYD only supports analysis of a small set of image datasets, but we’re working hard to make the tool accessible beyond this set.”

OpenCV

About

OpenCV (Open Source Computer Vision Library) is a library of programming functions mainly aimed at real-time computer vision. It is a software toolkit for processing real-time images and video, and it also provides analytics and machine learning capabilities. It was originally created in 2000 by Intel. GitHub: https://github.com/opencv/opencv. According to Intel, “using OpenCV developers can access many advanced computer vision algorithms used for image and video processing in 2D and 3D as part of their programs. The algorithms are otherwise only found in high-end image and video processing software.”





OpenCV provides several modules for working on computer vision problems, including a deep neural network (dnn) module that can run models trained with popular deep learning frameworks such as TensorFlow, Keras, and PyTorch.

Models run through OpenCV can be executed on CPUs or on NVIDIA or Intel GPUs. OpenVINO (see below) optimizes running OpenCV capabilities on Intel hardware.





OpenCV has also launched (a) hardware devices called the OpenCV AI Kit (OAK), available as OAK-1 or OAK-D, and (b) the OpenCV AI Marketplace.

“OAK is a modular, open-source ecosystem composed of MIT-licensed hardware, software, and AI training — that allows you to embed the super-power of spatial AI plus accelerated computer vision functions into your product. OAK provides in a single, cohesive solution what would otherwise require cobbling together disparate hardware and software components.”





The marketplace is called modeplace.ai and was built with OAK in mind.

Tutorials

OpenVINO

About

OpenVINO (Open Visual Inference and Neural Network Optimization) is an open-source toolkit for optimizing and deploying AI inference across various Intel hardware devices.





OpenVINO optimizes running OpenCV capabilities on Intel hardware. Here’s the official documentation.





Models can be found here: Model Zoo

Tutorials

CVAT

About

The CVAT tool is part of the OpenVINO toolkit and was originally designed to accelerate the process of annotating videos and images for use in training computer vision algorithms.

Tutorials

Roboflow

About

COCO dataset can be found here: COCO dataset

Datasets can be found here: Computer Vision Datasets

Models can be found here: Computer Vision Model Library





“The Roboflow Model Library contains pre-configured model architectures for easily training computer vision models. Just add the link from your Roboflow dataset and you’re ready to go! We even include the code to export to common inference formats like TFLite, ONNX, and CoreML.”

Roboflow aims to let developers build their own computer vision applications, no matter their skill set or experience, and provides all of the tools needed to convert raw images into a custom-trained computer vision model and deploy it for use in applications. Roboflow supports object detection and classification models. Here’s the official documentation.

Tutorials

SuperAnnotate

About

COCO dataset can be found here: COCO Segmentation Dataset

Datasets can be found here: Computer Vision Datasets





SuperAnnotate is an end-to-end platform to annotate, version, and manage ground truth data.

Here’s the GitHub link.





SuperAnnotate’s Computer Vision Datasets page provides an “easily accessible way of exploring public datasets using SuperAnnotate’s data curation platform.” From there you can explore the COCO dataset. The SuperAnnotate Python SDK provides access to the platform without a web browser.

Tutorials

OpenMMLab

About

Datasets can be found here: OpenMMLab Datasets





OpenMMLab is an open-source algorithm platform for computer vision. It has:

Released more than 20 high-quality projects and toolboxes in various research areas, such as image classification, object detection, semantic segmentation, and action recognition.

Made public more than 300 algorithms and 2,300 checkpoints.

GitHub link: OpenMMLab · GitHub (open-source) [see MMDetection]

MMDetection is an open source object detection toolbox based on PyTorch. It is a part of the OpenMMLab project. It consists of:

Training recipes for object detection and instance segmentation.

360+ pre-trained models to use for fine-tuning (or training afresh).

Dataset support for popular vision datasets such as COCO, Cityscapes, LVIS and PASCAL VOC.

Major features of the toolbox

You can construct a customized object detection framework by combining different modules.

The toolbox directly supports popular and contemporary detection frameworks, e.g. Faster RCNN, Mask RCNN, RetinaNet, etc.

All basic bbox and mask operations run on GPUs.

The toolbox stems from the codebase developed by the MMDet team, who won the COCO Detection Challenge in 2018.

Tutorials

Google Colab: object_detection — Colaboratory (google.com)

Customize Datasets — MMDetection 2.24.1 documentation: In MMDetection, OpenMMLab recommends converting your data into COCO format and doing the conversion offline. The tutorial shows that, after the conversion, you only need to modify the config’s data annotation paths and classes.

MMDetection: An Object Detection Python Tool — Analytics India Magazine: “MMDetection is a Python toolbox built as a codebase exclusively for object detection and instance segmentation tasks. It is built in a modular way with PyTorch implementation. There are numerous methods available for object detection and instance segmentation collected from various well-acclaimed models. It enables quick training and inference with quality. On the other hand, the toolbox contains weights for more than 200 pre-trained networks, making the toolbox an instant solution in the object detection domain.” Since MMDetection is a toolbox containing many pre-built models and each model has its own architecture, this toolbox defines a general architecture that can adapt to any model. This general architecture comprises the following parts: Backbone, Neck, DenseHead (AnchorHead/AnchorFreeHead), RoIExtractor, RoIHead (BBoxHead/MaskHead).
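As a rough illustration of the Customize Datasets advice, here is a sketch of an MMDetection 2.x-style config for a custom dataset already converted to COCO format. The class names, file paths and the config file name are hypothetical, and the _base_ path assumes the file sits in a subfolder of MMDetection's configs/ directory. MMDetection 3.x reorganized these fields, so check the documentation for your installed version.

```python
# my_faster_rcnn_config.py (hypothetical name): MMDetection 2.x-style config.
# Inherit a COCO-trained Faster R-CNN config, then override the data and class count.
_base_ = '../faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py'

dataset_type = 'CocoDataset'
classes = ('cat', 'dog')  # hypothetical class names; must match your COCO categories

data = dict(
    train=dict(
        type=dataset_type,
        classes=classes,
        ann_file='data/my_dataset/annotations/train.json',  # placeholder paths
        img_prefix='data/my_dataset/train/'),
    val=dict(
        type=dataset_type,
        classes=classes,
        ann_file='data/my_dataset/annotations/val.json',
        img_prefix='data/my_dataset/val/'),
    test=dict(
        type=dataset_type,
        classes=classes,
        ann_file='data/my_dataset/annotations/val.json',
        img_prefix='data/my_dataset/val/'))

# The COCO base config predicts 80 classes; shrink the box head to our class count.
model = dict(roi_head=dict(bbox_head=dict(num_classes=len(classes))))
```

Because MMDetection 2.x merges a config with its _base_, only the keys listed here are overridden; the backbone, training schedule and data pipelines come from the inherited Faster R-CNN config.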

Coral

About

Models can be found here: Models | Coral.





Coral is a complete toolkit to build products with local AI. “Our on-device inferencing capabilities allow you to build products that are efficient, private, fast and offline.” Coral has trained TensorFlow models for the Edge TPU for image classification, object detection, semantic segmentation, pose estimation, and speech recognition.

Tutorials

Weights and Biases

Choosing the right model for object detection — Weights & Biases (wandb.ai): “Today we’ll try out a couple of different models by comparing their performance on a custom dataset. One of the most important steps is to visualize the performance metrics on the go to get a good idea of what’s working and what’s not. We’ll use Weights and Biases (WandB) to log and visualize performance metrics.”

