A Definitive Guide to Building Training Data for Computer Vision
Tech giants like Google, Microsoft, Amazon, and Facebook have declared AI-first product strategies. The AI effect has influenced the product roadmaps of enterprise companies, which now launch prominent AI-based applications each quarter to automate their business processes. Computer Vision, specifically, is being widely explored and applied across industries, from traditional banking to cutting-edge self-driving cars.
Amazing, isn’t it?
But how does one start implementing Computer Vision, or CV for short? The major steps are as follows:
- Collect lots of data
- Label it
- Get GPUs: training ML models requires huge computational resources
- Choose an algorithm -> Train your model -> Test it -> Teach the model what it doesn’t know yet
- Repeat the previous step until you get acceptable quality
Each of these five steps has its own list of technical and operational challenges. In this article, I will help you out with #2 (labeling the training data) to get you started.
I have written about the ways you can start gathering training data. This depends on the use case you plan to work on.
Popular Use cases for Computer Vision:
1. Self-driving cars (Waymo, Tesla, Cruise): An autonomous vehicle needs to identify what’s in front of it (and behind!), be it another car, a road sign, a pedestrian or even a stray chicken chasing its dreams.
2. Drones: Amazon wants to deliver your groceries via drones pretty soon. Drones need to know what’s in front of them so that they don’t bump into a bird or an electrical cable. Drones are also used by the military for security surveillance and reconnaissance operations.
3. Mapping & Satellites (Mapbox, HERE, Orbital Insight): The amount of image data captured by satellites is blowing up! This data is being used to identify survivors in hurricane-affected regions, to enhance maps and even to predict Walmart’s sales based on the number of cars parked in its parking lots!
4. Robotics: CV is being used for industrial automation and to build robots that can identify and pick items off a shelf, or even play soccer.
5. OCR for BFSI (banking, financial services and insurance): Document transcription for credit rating, loan processing, transcribing handwritten data and more.
6. Medicine: In computer-assisted surgery, it’s important to detect the surgical tools in the images, using a checklist that combines managerial and technical Computer Vision methods.
7. Agriculture Technology: Makoto Koike, a former embedded-systems designer from Japan, started helping out at his parents’ cucumber farm and was amazed by the effort it takes to sort cucumbers by size, shape, color and other attributes, so he began applying machine learning and deep learning to the task.
The first step in the process I mentioned earlier is collecting data. If you’re just getting started, there are some great free and paid standard datasets:
Existing Open Labeled Dataset Repositories:
- Common Objects in Context (COCO)
- Google’s Open Images
- The University of Edinburgh School of Informatics’ CVonline: Image Databases
- Yet Another Computer Vision Index To Datasets (YACVID)
- CV datasets on GitHub
- UCI Machine Learning Repository
- Udacity self-driving car datasets
- Cityscapes Dataset
- Autonomous driving dataset by Comma.ai
- MNIST handwritten digits dataset
These datasets serve as a good starting point for anyone looking to get started with learning ML. They are even useful if you want to build a simple model for a side project. But for most practical purposes they just don’t cut it.
The real edge of your CV model can only be developed by collecting proprietary training data similar to the data you expect your final model to work well on. This data is often nuanced and different from generally available datasets.
There are many different ways to collect data. You could scrape the internet, use data captured by your users (like Facebook and Google Photos do), use data collected from car cameras (like Waymo and Tesla do), or even buy datasets from resellers!
Labeling the Data
Once you have the data, you need to label it. There are primarily two things you need to be concerned about here:
- How do you label the data?
- Who labels the data?
Note: The data for the use cases mentioned above is usually images, videos or even 3D point clouds in the case of LIDAR-equipped cars. For the sake of simplicity, we’ll only consider images for now.
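To make "labeling an image" concrete, here is a minimal sketch of what a labeled image might look like on disk. It loosely follows the layout of COCO-style detection files, where each bounding box is stored as [x, y, width, height] in pixels. The file name, categories and coordinates are made up for illustration, and the helper make_bbox_annotation is hypothetical, not part of any library.

```python
import json


def make_bbox_annotation(annotation_id, image_id, category_id, x, y, width, height):
    """Build one COCO-style bounding-box record (hypothetical helper)."""
    return {
        "id": annotation_id,
        "image_id": image_id,
        "category_id": category_id,
        "bbox": [x, y, width, height],  # top-left corner plus size, in pixels
        "area": width * height,
        "iscrowd": 0,
    }


# All values below are invented for illustration.
dataset = {
    "images": [
        {"id": 1, "file_name": "dashcam_0001.jpg", "width": 1280, "height": 720}
    ],
    "categories": [
        {"id": 1, "name": "car"},
        {"id": 2, "name": "pedestrian"},
    ],
    "annotations": [
        make_bbox_annotation(1, image_id=1, category_id=1, x=412, y=300, width=220, height=140),
        make_bbox_annotation(2, image_id=1, category_id=2, x=900, y=280, width=60, height=170),
    ],
}

print(json.dumps(dataset, indent=2))
```

Whatever tool you choose, it helps to check early on that it can export to a structured format like this, since your training code will have to read it.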
Choosing Image Annotation Tools
Lots of image annotation tools are available online. However, selecting the right one for your needs can be a problem. Here are some pointers to consider when selecting a tool.
Factors to consider:
- Tool setup time and effort
- Labeling Accuracy
- Labeling Speed
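For bounding boxes, one common way to put a number on labeling accuracy is intersection over union (IoU): the overlap between an annotator's box and a trusted reference box, divided by the area the two cover together. The sketch below assumes boxes are given as (x_min, y_min, x_max, y_max) tuples. That convention and the function name are choices made for this example, not something a particular tool prescribes.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlapping region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


reference_box = (0, 0, 10, 10)   # trusted label (made-up numbers)
annotator_box = (5, 0, 15, 10)   # annotator's label, shifted to the right
print(round(iou(reference_box, annotator_box), 3))
```

An IoU of 1.0 means the boxes match exactly and 0.0 means they don't overlap at all. The threshold that counts as "accurate enough" depends on your use case.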
Most Popular Image Annotation Tools (under MIT License):
Comma Coloring: Helps train the machine learning behind Comma.ai’s self-driving technology. You are presented with a photo from a car dash-cam and asked to color in sections of the image: which part is the sky, which part is the road, where the traffic signs are, and so on. They’ve also open-sourced the code behind the project.
Annotorious: Helps annotate images and label them. Add drawings and comments to images on your web page. You can get started with less than three lines of code. It is MIT-licensed and free to use in commercial and non-commercial projects.
LabelMe: Helps you build image databases for computer vision research. You can contribute to the database by visiting the annotation tool.
Other image annotation tools (free to use):
- Alp’s Labeling Tools for Deep Learning
- VGG Image Annotator (VIA)
- LEAR Image Annotation tool by Alexander Kläser
- Image Annotator Plugin for Drupal
- Demon Image Annotation Plugin for WordPress
- Landmarker.io, Sloth, vatic, ViPER-GT, Fiji, MediaTeam GTEditor, LabelD and Imglab
Building Custom Annotation Tools from scratch
If open tools don’t fit your needs, you might have to put in engineering resources to customize them or even build something from scratch. This is understandably very costly and no one wants to do this unless it’s necessary.
Specialized Annotation Tools
Companies like Playment build special tools which incorporate the best practices learned from annotating thousands of images every day across a variety of scenarios & use-cases. We have world class UX designers constantly improving the annotators’ experience and making the annotation process more efficient.
Choosing Labor Pools
You could hire an intern, ask your colleagues to help you out or, if you have the money and the time, set up an operations team. But none of these options can scale as you grow.
You could just outsource it and relax. Easy, isn’t it?
Not really. You’ll need to hire a BPO that understands AI, onboard them onto your tool, train them on annotation best practices, build more tools to view their work, build QA models to ensure labeling accuracy, and make sure they’re not cutting corners and that you’re getting bang for your buck. Doesn’t seem like you’ll be able to relax much, though.
Or you could reach out to a specialist like Playment. You’ll just need to share the data, a few gold standard examples and labeling guidelines. Playment takes care of the rest. We label training data for mid- to large-scale enterprises across the world, with enterprise-grade SLAs. What that means is that you get assured quality and turnaround times at scale.
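As a rough illustration of how gold standard examples can be used for QA, the sketch below scores an annotator by comparing their labels against a small set of images whose correct labels are already known. It is a simplified, hypothetical example for image-level class labels. The function name, image ids and the 0.9 threshold are assumptions, and it is not a description of how Playment or any other vendor does QA.

```python
def gold_accuracy(submitted, gold):
    """Fraction of gold-standard items the annotator labeled correctly.

    Both arguments map an image id to a class label. Items the annotator
    skipped count as wrong; items outside the gold set are ignored.
    """
    if not gold:
        raise ValueError("gold set must not be empty")
    correct = sum(
        1 for image_id, label in gold.items() if submitted.get(image_id) == label
    )
    return correct / len(gold)


# Made-up example data.
gold = {"img1": "car", "img2": "pedestrian", "img3": "sign", "img4": "car"}
submitted = {"img1": "car", "img2": "car", "img3": "sign", "img4": "car", "img5": "car"}

score = gold_accuracy(submitted, gold)
needs_review = score < 0.9  # assumed threshold; tune for your project
print(score, needs_review)
```

In practice the gold examples are usually mixed in with regular tasks so that annotators can't tell which ones are being checked.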
What to choose? When?
In some situations it makes sense to label a small dataset in-house and to outsource once the data becomes huge.
Here’s something we hear our clients say very often:
“Hi team. We need this service BADLY. We are going through MTurk at the moment but our team needs something simpler”
No outsourcing firm or agent can solve scale for 100,000 image annotations in a short amount of time. Crowdsourcing provides scale. But traditional crowdsourcing platforms like Amazon Mechanical Turk are merely microtask freelancing marketplaces where all the effort of task creation, worker incentivization and QA falls on the task creator. This is like eating a pie from the crusty end!
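To give a sense of the QA work a task creator takes on, one simple and widely used technique is to send the same image to several workers and keep a label only when enough of them agree. The sketch below shows a plain majority vote. The function name and the 0.6 agreement threshold are assumptions made for this example.

```python
from collections import Counter


def consensus(labels, min_agreement=0.6):
    """Return the majority label, or None if agreement is too low.

    `labels` holds the labels different workers gave to the same image.
    """
    if not labels:
        return None
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes / len(labels) >= min_agreement else None


# Made-up worker responses for two images.
print(consensus(["car", "car", "truck"]))  # two of three workers agree
print(consensus(["car", "truck"]))         # split vote, needs another look
```

Images that come back as None can be sent to more workers or to an expert reviewer. All of that routing is extra effort you have to build and run yourself on a bare marketplace.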
A fully managed solution like Playment is a hybrid model for annotating images for training data. Using Human+AI, we automate the annotation process by sending tasks to the right cloud labor, which leads to superior object detection.
Here’s How Playment Works
From determining crowd capacity and creating workflows to handling task design, instructions, qualifying, managing and paying annotators, and QA, this approach requires the least amount of effort from the customer (by far). With guaranteed enterprise SLAs, you get better quality than an in-house annotator with the scale and speed of crowdsourcing, minus the time and effort at your end.