Preamble
This post is a high-level exploration of the most common ways of implementing image-based deep learning (often referred to as image-based Artificial Intelligence or AI), basic annotation approaches, types of annotation and levels of automation for this task.
This article introduces topics that we will dive deeper into in follow-up posts. It can be used as a helpful guide for people looking to implement image-based AIs, or for those starting their research and coming to grips with the buzzwords being thrown around. For the sake of sanity, we have simplified some of the concepts below.
Image-based AIs are trained using labelled data, also referred to as ‘ground truth’ or ‘annotated’ data. There are multiple types of ‘annotation’ for different data science models; they vary and include things like ‘key-point’ annotation, ‘interpolation’, ‘pose estimation’, and so on. For the purpose of this post, we will focus on the four most commonly used types of annotation (Figure 1):
Figure 1 — types of annotation (not an exhaustive list)
Object detection — the ‘noise’ is the sand included in the bounding box
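To make these annotation types a bit more concrete, here is a minimal sketch of how the same object might be recorded as a bounding box, as a semantic region and as an individual instance. The field names, labels and coordinates are purely illustrative assumptions (loosely COCO-inspired), not the schema of any particular tool:

```python
# Purely illustrative: field names, labels and coordinates are assumptions
# (loosely COCO-inspired), not the schema of any particular annotation tool.
annotations_for_one_image = {
    "image_id": 17,
    # Object detection: a rectangle per object; background inside the box
    # (the sand in the clip above) is unavoidably included.
    "object_detection": [
        {"label": "animal", "bbox": [412, 230, 96, 80]},   # [x, y, width, height]
    ],
    # Semantic segmentation: class regions, with no distinction between
    # individual objects of the same class.
    "semantic_segmentation": [
        {"label": "animal", "polygon": [[415, 233], [470, 240], [490, 290], [430, 300]]},
    ],
    # Instance segmentation: every object outlined and identified separately.
    "instance_segmentation": [
        {"label": "animal", "instance_id": 1,
         "polygon": [[415, 233], [470, 240], [490, 290], [430, 300]]},
    ],
}
```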
NOTE: the latest method, ‘panoptic’ segmentation, combines semantic and instance segmentation in a single model.
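As a rough illustration of the difference: a semantic mask stores a class per pixel, an instance mask stores an object id per pixel, and a panoptic annotation keeps both. The toy NumPy arrays below are invented for illustration; real panoptic formats (such as the COCO panoptic format) encode this differently:

```python
import numpy as np

# Toy 4x4 masks, invented for illustration only.
semantic = np.array([[0, 1, 1, 0],     # semantic segmentation: one class id per
                     [0, 1, 1, 0],     # pixel, so two objects of the same class
                     [0, 0, 0, 0],     # are indistinguishable
                     [0, 1, 1, 0]])

instance = np.array([[0, 1, 1, 0],     # instance segmentation: one object id per
                     [0, 1, 1, 0],     # pixel, so the two objects are kept apart
                     [0, 0, 0, 0],
                     [0, 2, 2, 0]])

# Panoptic segmentation carries both at once: (class id, object id) per pixel.
panoptic = np.stack([semantic, instance], axis=-1)
print(panoptic[3, 1])                  # -> [1 2]: class 1, object 2
```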
Manual segmentation — label an object in a minute
As you can see, instance and semantic segmentation are time-consuming, as one needs to manually outline the exact target object, point for point with a ‘polygon’ or even pixel for pixel with a ‘mask’. This is also why they are so error-prone. In fact, the best annotators in the world have an error rate of 4–6%, while the average person sits around 8–9%. This error rate makes a significant difference to the performance of the resulting AI and is often what blocks projects from making it through the proof-of-concept phase.
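The error rates above describe labelling mistakes as a whole, but a common way to quantify how closely an annotated outline matches a reference is intersection-over-union (IoU). The short NumPy sketch below uses invented masks and numbers, purely to show how even a thin misplaced strip of pixels erodes the score:

```python
import numpy as np

def mask_iou(pred: np.ndarray, truth: np.ndarray) -> float:
    """Intersection-over-union between two boolean masks."""
    intersection = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return float(intersection) / float(union) if union else 1.0

# Two hypothetical 100x100 masks that disagree on a thin 3-pixel strip.
truth = np.zeros((100, 100), dtype=bool)
truth[20:80, 20:80] = True                  # reference outline
pred = np.zeros((100, 100), dtype=bool)
pred[20:80, 23:80] = True                   # annotator missed a 3-pixel strip

print(round(mask_iou(pred, truth), 3))      # 0.95: a few misplaced pixels add up
```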
Now imagine that the target objects are complex, such as organic cells or mechanical items. Further, what if the margin for error is slim, because the consequences of a wrong decision from the model can be dire or even fatal? It is usually in these non-trivial cases that segmentation has the most utility and is required to achieve a high-performing model.
70% of the work required to build an image-based AI is annotation work. If you see an AI working in practice (e.g. autonomous driving), know that it has taken millions of hours of human labelling to create enough data to train that neural network to the point where the team felt confident enough to put it into production. Even then, there is more often than not a need to relabel or to label additional data after the model is deployed.
The benefit of automating this manual work is highest when experts are needed to annotate the images. Typical use cases include medical and biological imaging, robotics, quality assurance, advanced materials and agriculture. Think of cases where you are building an AI to assist a human who took many years to become an expert in that domain.
The goal of automation in machine vision annotation is to determine the outline of an object from the fewest user inputs possible. In this section, we will largely be referring to automating segmentation tasks, as these are generally the most labour-intensive.
Levels of automation in this context can be outlined by what is being estimated:
Level 1: the outline of a single object
Level 2: the outlines of all objects in a single image
Level 3: the outlines of all objects in all images of a project
The ultimate goal is to accurately estimate the outline of all objects in all images for a given project.
Level 1 tools aim to automate the annotation of a single object as much as possible. They build on classic computer vision methods popularised by the well-known ‘OpenCV’ framework, on tools familiar from Photoshop, and even on some novel AI-based approaches. Examples of Level 1 tools include:
DEXTR — label a full image in minutes
NOTE: annotation tools often claim ‘automated labelling’ with features like DEXTR. However, DEXTR is still a manual tool that relies on having been pre-trained on generic datasets and gives you one suggestion per object. Don’t get us wrong, the tool is great and has its uses for reaching Level 1 automation, but it is a far cry from complete ‘automated labelling’.
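As a concrete example of the classic computer-vision family of Level 1 tools mentioned above, OpenCV’s GrabCut estimates an object outline from nothing more than a rough rectangle. The sketch below is illustrative only; the image path and rectangle are placeholders, and this is not how DEXTR or any particular product works:

```python
import cv2
import numpy as np

# Level 1 style interaction: the user supplies a loose rectangle, and classic
# computer vision estimates the object outline inside it.
image = cv2.imread("image.jpg")                   # placeholder path
mask = np.zeros(image.shape[:2], dtype=np.uint8)
rect = (50, 40, 300, 260)                         # (x, y, width, height) drawn by the user

bgd_model = np.zeros((1, 65), dtype=np.float64)   # internal GrabCut state
fgd_model = np.zeros((1, 65), dtype=np.float64)
cv2.grabCut(image, mask, rect, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT)

# Pixels marked as (probable) foreground become the suggested object mask,
# which the annotator can then refine by hand.
object_mask = np.where(
    (mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0
).astype(np.uint8)
```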
At Level 2, you try to annotate all objects in an image in one action. This is close to the current cutting edge of deep learning. The time savings compared to Level 1 are drastic, as the required human input decreases radically. However, this automation requires a higher level of confidence than Level 1, so the implication is that you start an annotation project using Level 1 tools until the Level 2 tools are ready to be deployed.
Instance segmentation assistant — label a full image in a few seconds
Level 2 automation is achieved with the use of AI assistants. These assistants learn in the background while you annotate. Once they reach a certain confidence score, you can start to use them and get suggestions not only for individual objects but for a complete image. The assistant retrains and improves as more images are completed.
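Conceptually, such an assistant boils down to two things: retraining on the images you finish, and only surfacing suggestions above some confidence cut-off. The sketch below is a hypothetical illustration of that idea; the class names, the 100-image warm-up and the 0.9 threshold are our own assumptions, not Hasty’s actual implementation:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Suggestion:
    label: str
    polygon: List[Tuple[int, int]]
    score: float                      # model confidence for this object

class AnnotationAssistant:
    CONFIDENCE_THRESHOLD = 0.9        # assumed cut-off before suggestions appear

    def __init__(self):
        self.completed_images = []    # images the user has finished by hand

    def observe(self, finished_image) -> None:
        """Record a finished image; a real tool would retrain the underlying
        segmentation model in the background at this point."""
        self.completed_images.append(finished_image)

    def ready(self) -> bool:
        """Only start helping once enough training signal has accumulated."""
        return len(self.completed_images) >= 100

    def suggest(self, raw_predictions: List[Suggestion]) -> List[Suggestion]:
        """Propose annotations for a whole image, keeping only confident objects."""
        if not self.ready():
            return []
        return [s for s in raw_predictions if s.score >= self.CONFIDENCE_THRESHOLD]
```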
When annotation has been automated to Level 3, you as a user should be able to annotate a collection of images, or even a complete project, in a matter of seconds. The expectation here is that you just click a button and all images in the project get annotated.
Finish an entire dataset in seconds…
Although extremely powerful, Level 3 tools also come with challenges. For example, say you annotate a dataset of 10 000 images of animals, 1 000 of which have already been annotated, and the Level 3 tool has a hard time differentiating between frogs and toads. The 9 000 images that you auto-annotate with the tool might then have serious quality issues: what should be classified as frogs are now toads and vice versa, and the annotations are unusable. This is a classification error, only one of four types of error that can occur; the others are generated artefacts, inaccurate segmentations and missed objects.
Thus, to use a Level 3 tool, you need to be very certain that the results will be accurate and that the error percentage is very low (<0.5%). This certainty can be reached by taking user behaviour during Level 2 automation into account, for example whether users make minor or no adjustments to Level 2 suggestions, and by looking at things like model confidence levels.
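One hypothetical way to express that gating is to estimate the error rate from how often Level 2 suggestions were accepted without changes, and only allow Level 3 once that estimate drops below the 0.5% mark. The function names and numbers below are illustrative assumptions, not a description of any shipping product:

```python
def estimated_error_rate(accepted_unchanged: int, total_suggestions: int) -> float:
    """Share of Level 2 suggestions the user had to correct."""
    if total_suggestions == 0:
        return 1.0          # no evidence yet: assume the worst
    return 1.0 - accepted_unchanged / total_suggestions

def level3_allowed(accepted_unchanged: int, total_suggestions: int,
                   max_error_rate: float = 0.005) -> bool:
    """Enable whole-project auto-annotation only below ~0.5% estimated error."""
    return estimated_error_rate(accepted_unchanged, total_suggestions) <= max_error_rate

# Example: 9 960 of 10 000 suggestions were accepted as-is -> 0.4% estimated
# error, so Level 3 could be switched on.
print(level3_allowed(accepted_unchanged=9_960, total_suggestions=10_000))   # True
```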
At Hasty, we are working towards a Level 3 tool, but it is still under development and will need a few more months before we introduce it to users. This is where features like our ‘Error finder’ become critical, which will be the topic of a whole new post…