Image Annotation Types For Computer Vision And Its Use Cases

by Andy Gough, September 7th, 2019
There are many types of image annotations for computer vision out there, and each one of these annotation techniques has different applications.

Are you curious about what you can accomplish with these various annotation techniques? Let’s take a look at the different annotation methods used for computer vision applications, along with some unique use cases for these different computer vision annotation types.

Types Of Image Annotation

Before we dive into use cases for computer vision image annotation, we need to be acquainted with the different image annotation methods themselves. Let’s analyze the most common image annotation techniques.

1. Bounding Boxes

Bounding boxes are one of the most commonly used types of image annotation in all of computer vision, thanks in part to their versatility and simplicity. Bounding boxes enclose objects and assist the computer vision network in locating objects of interest. They are easy to create, declared by simply specifying X and Y coordinates for the upper left and bottom right corners of the box. 

Bounding boxes can be applied to almost any conceivable object, and they can substantially improve the accuracy of an object detection system.
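To make the two-corner definition concrete, here is a minimal sketch in Python. The class name, field names, and the "car" label are illustrative assumptions, not part of any particular annotation tool. It stores the upper-left and bottom-right corners and converts them to the (x, y, width, height) layout that some tools use instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned box stored as two corners, in pixel coordinates."""
    x_min: float  # upper-left corner, x
    y_min: float  # upper-left corner, y
    x_max: float  # bottom-right corner, x
    y_max: float  # bottom-right corner, y
    label: str = "object"  # hypothetical class name

    def __post_init__(self):
        # Image coordinates usually grow rightward and downward,
        # so the second corner must be larger on both axes.
        if self.x_max <= self.x_min or self.y_max <= self.y_min:
            raise ValueError("bottom-right corner must lie below and right of upper-left")

    @property
    def width(self) -> float:
        return self.x_max - self.x_min

    @property
    def height(self) -> float:
        return self.y_max - self.y_min

    def to_xywh(self) -> tuple:
        """Convert to an (x, y, width, height) layout."""
        return (self.x_min, self.y_min, self.width, self.height)


box = BoundingBox(40, 30, 200, 150, label="car")
print(box.to_xywh())  # (40, 30, 160, 120)
```

Validating the corner order up front is a design choice here: it catches swapped corners at annotation time rather than during training.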

2. Polygonal Segmentation

Another type of image annotation is polygonal segmentation, and the theory behind it is just an extension of the theory behind bounding boxes. Polygonal segmentation tells a computer vision system where to look for an object, but thanks to using complex polygons and not simply a box, the object’s location and boundaries can be determined with much greater accuracy. 

The advantage of using polygonal segmentation over bounding boxes is that it cuts out much of the noise/unnecessary pixels around the object that can potentially confuse the classifier.
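One rough way to see how much background a box drags in is to compare the polygon's area with the area of the box that encloses it. The sketch below assumes a simple (non-self-intersecting) polygon given as a list of pixel coordinates; the triangular outline is a made-up example.

```python
def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def enclosing_box_area(points):
    """Area of the tightest axis-aligned box around the polygon."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


# Hypothetical triangular outline (for example, a road sign), in pixels.
triangle = [(0, 0), (100, 0), (50, 80)]

poly_area = polygon_area(triangle)        # 4000.0
box_area = enclosing_box_area(triangle)   # 8000
background_fraction = 1 - poly_area / box_area
print(background_fraction)  # 0.5
```

In this example, half of the pixels inside the bounding box do not belong to the object at all, which is the noise a polygon annotation leaves out.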

3. Line Annotation

Line annotation involves the creation of lines and splines, which are used primarily to delineate boundaries between one part of an image and another. Line annotation is used when a region that needs to be annotated can be conceived of as a boundary, but it is too small or thin for a bounding box or other type of annotation to make sense. 

Lines and splines are easy to annotate and are commonly used in situations like training warehouse robots to recognize differences between parts of a conveyor belt, or training autonomous vehicles to recognize lanes.
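A line annotation can be stored as an ordered list of points (a polyline). The sketch below is one possible representation; the dictionary keys and the "lane_boundary" label are assumptions for illustration. It also computes the length of the annotated line.

```python
import math


def polyline_length(points):
    """Sum of the straight-line distances between consecutive points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


# Hypothetical lane boundary, as ordered (x, y) pixel coordinates.
lane = {
    "label": "lane_boundary",
    "points": [(0, 0), (3, 4), (3, 10)],
}

print(polyline_length(lane["points"]))  # 11.0
```

A spline could be approximated the same way by sampling it into enough points that the straight segments follow the curve closely.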

4. Landmark Annotation

A fourth type of image annotation for computer vision systems is landmark annotation, sometimes referred to as dot annotation, owing to the fact that it involves the creation of dots/points across an image. Just a few dots can be used to label objects in images containing many small objects, but it is common for many dots to be joined together to represent the outline or skeleton of an object. 

The size of the dots can be varied, and larger dots are sometimes used to distinguish important/landmark areas from surrounding areas. 
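As a rough illustration of dots joined into a skeleton, the sketch below stores named landmarks plus a list of which pairs to connect. The landmark names, coordinates, and connections are invented for the example and do not follow any specific dataset's convention.

```python
# Hypothetical landmark names and pixel coordinates for one person.
keypoints = {
    "head": (50, 10),
    "neck": (50, 30),
    "left_hand": (20, 60),
    "right_hand": (80, 60),
}

# Pairs of landmarks that are joined to form the skeleton.
skeleton = [("head", "neck"), ("neck", "left_hand"), ("neck", "right_hand")]


def skeleton_segments(keypoints, skeleton):
    """Return the line segments implied by the skeleton.

    Pairs that reference a landmark missing from `keypoints`
    (for example, an occluded hand) are skipped.
    """
    return [
        (keypoints[a], keypoints[b])
        for a, b in skeleton
        if a in keypoints and b in keypoints
    ]


segments = skeleton_segments(keypoints, skeleton)
print(len(segments))  # 3
```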

5. 3D Cuboids

3D cuboids are a powerful type of image annotation, similar to bounding boxes in that they distinguish where a classifier should look for objects. However, 3D cuboids have depth in addition to height and width. 

Anchor points are typically placed at the edges of the item, and the space between the anchors is filled in with a line. This creates a 3D representation of the object, which means the computer vision system can learn to distinguish features like volume and position in a 3D space.
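The sketch below shows one simplified way to describe a cuboid: a center point plus width, height, and depth, from which the eight corner anchors and the volume follow. It assumes an axis-aligned cuboid with no rotation and coordinates in an arbitrary 3D unit; real annotation tools often add an orientation angle and work with corners projected into the 2D image.

```python
from itertools import product


def cuboid_corners(center, size):
    """Eight corners of an axis-aligned cuboid.

    center: (x, y, z) of the cuboid's middle
    size:   (width, height, depth)
    """
    cx, cy, cz = center
    w, h, d = size
    return [
        (cx + sx * w / 2, cy + sy * h / 2, cz + sz * d / 2)
        for sx, sy, sz in product((-1, 1), repeat=3)
    ]


def cuboid_volume(size):
    w, h, d = size
    return w * h * d


# Hypothetical vehicle-sized cuboid, 10 units in front of the camera.
size = (2, 1.5, 4)
corners = cuboid_corners(center=(0, 0, 10), size=size)

print(len(corners))         # 8
print(cuboid_volume(size))  # 12.0
```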

6. Semantic Segmentation

Semantic segmentation is a form of image annotation that involves separating an image into different regions, assigning a label to every pixel in an image. 

Regions of an image that carry different semantic meanings/definitions are considered separate from other regions. For example, one portion of an image could be “sky”, while another could be “grass”. The key idea is that regions are defined based on semantic information, and that the image classifier gives a label to every pixel that comprises that region. 
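In practice, a semantic segmentation label is often stored as a mask: a grid the same size as the image where each cell holds a class ID. The tiny mask and the class IDs below are made up for illustration, using the "sky" and "grass" example from above.

```python
from collections import Counter

# Hypothetical mapping from class ID to class name.
CLASS_NAMES = {0: "sky", 1: "grass"}

# A 3x4 "image" where every pixel carries exactly one class ID.
mask = [
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 1],
]


def pixels_per_class(mask):
    """Count how many pixels were labeled with each class."""
    counts = Counter(label for row in mask for label in row)
    return {CLASS_NAMES[class_id]: n for class_id, n in counts.items()}


print(pixels_per_class(mask))  # {'sky': 6, 'grass': 6}
```

Note that no pixel is left unlabeled, which is what separates this from box or polygon annotation, where only the objects of interest are marked.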

Use Cases For Image Annotation Types

1. Bounding Boxes

Bounding boxes are used in computer vision image annotation for the purpose of helping networks localize objects. Models that localize and classify objects benefit from bounding boxes. Common uses for bounding boxes include any situation where objects are being checked for collisions against each other. 
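A collision check between two boxes comes down to testing whether they overlap. The sketch below assumes boxes given as (x_min, y_min, x_max, y_max) tuples. It also computes intersection over union (IoU), a measure commonly used to compare a predicted box with an annotated one; IoU is an addition for illustration and is not discussed in this article.

```python
def intersection_area(a, b):
    """Overlap area of two boxes given as (x_min, y_min, x_max, y_max)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    # A negative width or height means the boxes do not overlap on that axis.
    return max(w, 0) * max(h, 0)


def boxes_collide(a, b):
    return intersection_area(a, b) > 0


def iou(a, b):
    """Intersection over union: 0.0 for disjoint boxes, 1.0 for identical ones."""
    inter = intersection_area(a, b)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical boxes in pixel coordinates.
car = (0, 0, 10, 10)
truck = (5, 5, 15, 15)
sign = (20, 20, 25, 25)

print(boxes_collide(car, truck))  # True
print(boxes_collide(car, sign))   # False
```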

An obvious application of bounding boxes and object detection is autonomous driving, where systems must be able to locate vehicles on the road. Bounding boxes can also be applied to situations like tagging objects on construction sites to help analyze site safety, or helping robots recognize objects in different environments.

Use cases for bounding boxes include: 

Using drone footage to monitor the progress of construction projects, from the initial laying of the foundation all the way through to completion, when the house is ready for move-in.

Recognizing food products and other items in grocery stores to automate aspects of the checkout process. 

Detecting exterior vehicle damage, enabling detailed analysis of vehicles when insurance claims are made.

2. Polygonal Segmentation

Polygonal segmentation is the process of annotating objects using many complex polygons, allowing the capturing of objects with irregular shapes. When precision is of importance, polygonal segmentation is used over bounding boxes. Because polygons can capture the outline of an object, they eliminate the noise that can be found within a bounding box, something that can potentially throw off the accuracy of the model. 

Polygonal segmentation is useful in autonomous driving, where it can highlight irregularly shaped objects like logos and street signs, and locate cars more precisely than bounding boxes can. Polygonal segmentation is also helpful for tasks where many irregularly shaped objects must be annotated with precision, such as object detection in images collected by satellites and drones. If the goal is to detect objects like water features with precision, polygonal segmentation should be used over bounding boxes.

Notable use cases for polygonal segmentation in computer vision include:

Annotating the many irregularly shaped objects found in cityscapes like cars, trees and pools. 

Polygonal segmentation can also make the detection of objects easier. For instance, Polygon-RNN, a polygon annotation tool, shows significant improvement in both speed and accuracy compared to the traditional methods used to annotate irregular shapes, namely semantic segmentation.

3. Line Annotation

Because line annotation concerns itself with drawing attention to lines in an image, it is best used whenever important features are linear in appearance. 

Autonomous driving is a common use case for line annotation, as it can be used to delineate lanes on the road. Similarly, line annotation can be used to instruct industrial robots where to place certain objects, designating a target zone as the area between two lines. Bounding boxes could theoretically be used for these purposes, but line annotation is a much cleaner solution, as it avoids much of the noise that comes with using bounding boxes.

Notable computer vision use cases of line annotation include the automatic detection of crop rows and even the tracking of insect leg positions. 

4. Landmark Annotation

Because landmark annotation/dot annotation draws small dots that represent objects, one of its primary uses is in detecting and quantifying small objects. For instance, aerial views of cities may require the use of landmark detection to find objects of interest like cars, houses, trees, or ponds. 
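Quantifying small objects from dot annotations can be as simple as counting the dots per label. The sketch below assumes each annotation is a (label, (x, y)) pair; the labels and coordinates are invented for the example.

```python
from collections import Counter

# Hypothetical dot annotations from one aerial image: (label, (x, y)).
dots = [
    ("car", (12, 40)),
    ("tree", (88, 15)),
    ("car", (60, 72)),
    ("house", (33, 90)),
    ("car", (75, 20)),
]

# One dot per object, so counting dots counts objects.
counts = Counter(label for label, _ in dots)

print(counts["car"])   # 3
print(counts["pond"])  # 0 (labels that never appear count as zero)
```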

That said, landmark annotation can have other uses as well. Combining many landmarks together can create outlines of objects, like a connect-the-dots puzzle. These dot outlines can be used to recognize facial features or analyze the motion and posture of people.

Common computer vision use cases for landmark annotation are:

Face Recognition, thanks to the fact that tracking multiple landmarks can make the recognition of emotions and other facial features easier. 

Landmark annotation is also used in the field of biology for geometric morphometrics.

5. 3D Cuboids

3D cuboids are used when a computer vision system must not only recognize an object but also predict the general shape and volume of that object. Most frequently, 3D cuboids are used when a computer vision system is developed for an autonomous system capable of locomotion, as it must make predictions about objects in its surrounding environment.

Use cases for 3D cuboids in computer vision include the development of computer vision systems for autonomous vehicles and mobile robots.

6. Semantic Segmentation

A potentially unintuitive fact about semantic segmentation is that it’s basically a form of classification, but the classification is just being done on every pixel in a desired region rather than an object. When this is considered, it becomes easy to use semantic segmentation for any task where sizable, discrete regions must be classified/recognized. 

Autonomous driving is one application of semantic segmentation, where the vehicle’s AI must distinguish between sections of road and sections of grass or sidewalk. 

Additional computer vision use cases for semantic segmentation, outside of autonomous driving, include:

Analysis of crop fields to detect weeds and specific crop types. 

Recognition of medical images for diagnosis, cell detection, and blood flow analysis.

Monitoring forests and jungles for deforestation and ecosystem damage to improve conservation efforts.


Almost anything you want to do with computer vision can be accomplished; it's just a matter of selecting the right tools for the job. Now that you’ve become more acquainted with the various types of image annotation and possible use cases for them, the best thing to do is to experiment by implementing them and seeing which annotation techniques work best for your application.