Image annotation is one of the most important tasks in computer vision. With numerous applications, computer vision essentially strives to give a machine eyes – the ability to see and interpret the world. At times, machine learning projects seem to unlock futuristic technology we never thought possible. AI-powered applications like augmented reality, automatic speech recognition, and neural machine translation have the potential to change lives and businesses around the world. Likewise, the technologies that computer vision can give us (autonomous vehicles, facial recognition, unmanned drones) are extraordinary.
However, none of these amazing computer vision technologies would be possible without image annotation. This article explains what image annotation is and describes five image annotation services provided by training data companies around the world.
Image annotation is the human-powered task of annotating an image with labels. These labels are predetermined by the AI engineer and are chosen to give the computer vision model information about what is shown in the image.
Depending on the project, the number of labels on each image can vary. Some projects require only one label to represent the content of an entire image (image classification). Other projects require multiple objects to be tagged within a single image, each with a different label.
How Does Image Annotation Work?
To create annotated images, you need three things: images, trained annotators, and an annotation platform.
Most image annotation projects begin with sourcing and training annotators to perform the annotation tasks. AI is a highly specialized field, but annotating AI training data doesn't always have to be. While you need advanced education in machine learning to build a self-driving car, you don't need a master's degree to draw boxes around cars in images (bounding box annotation). Thus, most annotators don't have degrees in machine learning.
However, these annotators should be thoroughly trained on the specifications and guidelines of each annotation project, as every company will have different requirements. Once trained, the annotators get to work annotating hundreds or thousands of images on a platform dedicated to image annotation. This platform is software that should provide all the tools needed for the specific type of annotation being performed.
Bounding Boxes

With 2D bounding boxes, annotators draw a box around the object they want to annotate within the image. Sometimes all of the target objects will be of the same type, e.g., "Please draw boxes around every bicycle in this image."
Other times, there may be more than one type of target object: "Please draw boxes around every car, pedestrian, and bicycle in this image." In those cases, after drawing each box, the annotator must choose from a list of labels to attribute to the object within it.
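To make this concrete, here is a minimal sketch of what a single annotated image might look like as a data record. The field names and the [x, y, width, height] box convention are assumptions for illustration (loosely following the COCO style); every annotation platform defines its own schema.

```python
# A hypothetical 2D bounding box annotation for one image.
# Coordinates are in pixels, with the origin at the top-left corner.
annotation = {
    "image_id": "street_0042.jpg",
    "objects": [
        # [x, y, width, height] of each box, plus the label the
        # annotator picked from the project's predefined list.
        {"label": "car",        "bbox": [412, 230, 180, 95]},
        {"label": "pedestrian", "bbox": [95, 210, 40, 110]},
        {"label": "bicycle",    "bbox": [610, 250, 70, 60]},
    ],
}
```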
Also known as cuboids, 3D bounding boxes are much like 2D bounding boxes except that they can also show the approximate depth of the target objects. As with 2D bounding boxes, annotators draw boxes around the target objects, placing anchor points at each of the object's edges. When a portion of the target object is blocked from view, the annotator approximates the location of the blocked edge(s).
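As a rough sketch, a cuboid annotation might store a center point, dimensions, and an orientation rather than a flat rectangle. The fields below are illustrative assumptions, not a standard format:

```python
# A hypothetical 3D bounding box (cuboid) annotation.
# Unlike a 2D box, it encodes approximate depth and orientation.
cuboid = {
    "label": "car",
    "center": {"x": 4.2, "y": 1.1, "z": 12.7},  # meters, in the sensor frame
    "dimensions": {"width": 1.8, "height": 1.5, "length": 4.3},
    "rotation_yaw": 0.35,  # heading angle in radians
    "occluded": True,      # some edges were approximated because part
                           # of the object was blocked from view
}
```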
Image Classification

Whereas bounding boxes deal with annotating multiple objects in an image, image classification is the process of associating an entire image with a single label. A simple example of image classification is labeling types of animals. Annotators are given images of animals and asked to classify each image based on the animal species.
Feeding this annotated image data to a computer vision model would teach the model the visual characteristics unique to each type of animal. In theory, the model would then be able to categorize new unannotated animal images into the proper species categories.
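In data terms, the output of image classification is about as simple as annotation gets: one label per image, drawn from a fixed set of classes. A minimal sketch, with invented file names and classes:

```python
# Predefined classes chosen by the AI engineer.
CLASSES = ["cat", "dog", "horse", "rabbit"]

# Each image receives exactly one label from that set.
classification_labels = {
    "img_001.jpg": "dog",
    "img_002.jpg": "cat",
    "img_003.jpg": "horse",
}

# Sanity check: every label must come from the predefined list.
assert all(label in CLASSES for label in classification_labels.values())
```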
Lines and Splines

As the name suggests, lines and splines annotation is the labeling of straight or curved lines on images. Annotators are tasked with annotating lanes, sidewalks, power lines, and other boundary indicators. Images annotated with lines and splines are mainly used for lane and boundary recognition, and they are also often used for trajectory planning in drones.
From autonomous vehicles and drones to warehouse robotics and more, line and spline annotations are useful in a wide variety of use cases.
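Under the hood, a line or spline annotation is typically just an ordered list of points that the platform connects (and, for a spline, smooths into a curve). A minimal sketch, with invented field names:

```python
# A hypothetical lane annotation: an ordered list of (x, y) pixel
# coordinates tracing the lane marking from near to far.
lane_annotation = {
    "label": "lane_boundary",
    "type": "spline",  # "line" for straight segments
    "points": [(120, 710), (188, 560), (240, 430), (278, 330)],
}
```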
Polygons

Sometimes target objects with irregular shapes can't be easily annotated with bounding boxes or cuboids. Polygon annotation allows annotators to plot a point on each vertex of the target object, so that all of the object's exact edges can be annotated, regardless of its shape.
As with bounding boxes, the pixels within the annotated edges are then tagged with a label describing the target object.
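A polygon annotation can be stored much like a spline: an ordered list of vertices, except that the last point connects back to the first to close the shape. A minimal illustrative sketch:

```python
# A hypothetical polygon annotation for an irregularly shaped object.
# Vertices are (x, y) pixel coordinates; the polygon is implicitly
# closed by joining the last vertex back to the first.
polygon_annotation = {
    "label": "pond",
    "vertices": [(34, 88), (70, 52), (133, 61), (160, 120),
                 (128, 170), (61, 158)],
}
```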
Semantic Segmentation

Bounding boxes, cuboids, and polygons all deal with annotating individual objects in an image. Semantic segmentation, by contrast, is the annotation of every pixel within an image. Instead of being given a list of objects to annotate, annotators are given a list of segment labels to divide the image into.
A good example is semantic segmentation in traffic images for autonomous vehicles. A typical semantic segmentation task could ask annotators to “segment the image by vehicles, bicycles, pedestrians, obstacles, sidewalks, roads, and buildings”.
Each segment is usually indicated by a unique color code. Annotators draw lines around the pixels they want to annotate and select the appropriate label, and the end result is a color-coded map of the image's segments. (Example image via medium.com/intro-to-artificial-intelligence)
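In its stored form, a semantic segmentation annotation is usually a mask the same size as the image, where every pixel holds a class ID that maps to a display color. A minimal sketch using NumPy; the class IDs and colors below are illustrative assumptions:

```python
import numpy as np

# Illustrative mapping from class ID to (name, display color).
SEGMENT_CLASSES = {
    0: ("road",       (128, 64, 128)),
    1: ("sidewalk",   (244, 35, 232)),
    2: ("vehicle",    (0, 0, 142)),
    3: ("pedestrian", (220, 20, 60)),
}

# The annotation itself: one class ID per pixel (here a tiny 4x6 image).
mask = np.zeros((4, 6), dtype=np.uint8)  # everything starts as "road"
mask[0, :] = 3                           # top row labeled "pedestrian"
mask[1:3, 2:5] = 2                       # a block of "vehicle" pixels

# Render the mask to an RGB image using each segment's color code.
rgb = np.zeros(mask.shape + (3,), dtype=np.uint8)
for class_id, (_, color) in SEGMENT_CLASSES.items():
    rgb[mask == class_id] = color
```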
Hopefully, this article helped you understand the basics of five in-demand image annotation services in machine learning. If you’re looking for more reading on image annotation and AI, be sure to check out:
Image Annotation - an overview
Image Annotation for Video Games