
Consensus in Data Annotation: How to Ensure Accuracy and Objectivity

by Keymakr, March 24th, 2025


Too Long; Didn't Read

Consensus is achieved by gathering the opinions of multiple experts. Google, Tesla, Amazon, and Meta actively use consensus-based annotation to improve AI performance. Google Health applies consensus to enhance diagnostic accuracy. Tesla uses consensus to label data from autopilot cameras.


The consensus method plays a key role in data annotation when it is necessary to ensure high accuracy and reduce subjectivity in labeling. Based on Keymakr’s experience, implementing a consensus approach with multiple experts in specific cases can reduce annotation errors by 30–50%. Consensus minimizes mistakes, automates quality control, and helps create benchmark datasets — especially critical in high-responsibility areas such as medicine and autonomous driving.

Tatiana Verbitskaya, a technical solution architect at Keymakr, talks about how this method works and the projects in which it has been successfully applied.

How It Works

Consensus is achieved by gathering the opinions of multiple experts. When defining “ground truth” data, it is vital to establish an agreed-upon standard of accuracy. Consensus is critical when training a model on subjective data, such as color and shape, or when high accuracy is required. This method is actively used in the early stages when the model has not yet been trained on sufficient data or when additional training is needed, particularly for specific cases (e.g., subjective judgments). Additionally, consensus is critical in large-scale projects, such as annotating data for self-driving cars or monitoring transportation, as it enhances precision while reducing errors.

Key Principles of Consensus:

Odd Number of Experts: To avoid deadlocks, consensus relies on an odd number of annotators, which guarantees a majority on yes/no decisions even when annotators disagree.
Disagreement Analysis: This method doesn’t just rely on the majority vote but also considers the frequency of disagreements. If discrepancies are too significant, the data may be flagged for additional review or excluded from model training altogether.
Error Detection Mechanisms: Even consensus-based data can contain errors when cases are highly subjective or ambiguous, so consensus labels still need additional checks.
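The first two principles can be sketched in a few lines of Python. This is a minimal illustration, not Keymakr's implementation: the function name `consensus_label` and the `min_agreement` threshold of 0.6 are assumptions chosen for the example.

```python
from collections import Counter

def consensus_label(votes, min_agreement=0.6):
    """Return (label, agreement, needs_review) for one item's votes.

    min_agreement is a hypothetical threshold: items whose winning
    label has a smaller share of the votes are flagged for review.
    """
    if len(votes) % 2 == 0:
        raise ValueError("use an odd number of annotators to avoid ties")
    # most_common(1) returns the label with the highest vote count
    label, count = Counter(votes).most_common(1)[0]
    agreement = count / len(votes)
    return label, agreement, agreement < min_agreement

# Two of three annotators agree: accepted under the assumed threshold
print(consensus_label(["person", "person", "shadow"]))

# Three different answers: no real majority, so the item is flagged
print(consensus_label(["car", "truck", "van"]))
```

An odd panel only rules out ties on two-way choices. With three or more possible labels, votes can still split evenly, which is why the sketch also checks the agreement share before accepting a label.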

Global technology leaders like Google, Tesla, Amazon, and Meta actively use consensus-based annotation to improve AI model performance. Google Health, for instance, applies multiple radiologist annotations to X-rays to enhance diagnostic accuracy. Tesla uses consensus to label data from autopilot cameras, reducing training errors in autonomous driving. Amazon SageMaker Ground Truth incorporates consensus annotation in NLP, computer vision, and satellite imagery analysis, while Meta employs it for facial and object recognition projects.

Medical Consensus: An Annotation Council

One of the most critical applications of consensus is in medical image annotation for disease diagnosis. Experts say radiologists’ diagnoses can vary by as much as 20–30%, directly impacting patient outcomes. When a consensus-based approach is employed — where multiple radiologists independently annotate images and their inputs are aggregated based on expertise-weighted scoring — annotation accuracy can be improved by up to 40%.
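Expertise-weighted aggregation can be illustrated with a short Python sketch. The article does not describe how weights are assigned, so the weights, annotator names, and the function `weighted_consensus` below are all hypothetical.

```python
from collections import defaultdict

def weighted_consensus(annotations, weights):
    """Pick the label with the largest total expertise weight.

    annotations: {annotator: label}
    weights:     {annotator: positive float}, hypothetical expertise scores
    Returns (label, share of the total weight behind that label).
    """
    scores = defaultdict(float)
    for annotator, label in annotations.items():
        scores[label] += weights[annotator]
    total = sum(scores.values())
    best = max(scores, key=scores.get)
    return best, scores[best] / total

# Illustrative data: one senior reader outweighs two junior readers
annotations = {"r1": "nodule", "r2": "normal", "r3": "nodule"}
weights = {"r1": 1.0, "r2": 3.0, "r3": 1.0}
print(weighted_consensus(annotations, weights))
```

In this example the weighted result differs from a plain majority vote, which is the design trade-off of weighting: the outcome depends heavily on how the weights are chosen.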

Keymakr actively applies this approach in complex medical projects to ensure precise image labeling for AI models trained to detect complex pathologies. The process is built on the Keylabs platform, where teams can compare the opinions of several experts, identify discrepancies, and form high-precision datasets. This approach significantly increases the reliability of algorithms used in automated diagnostics, minimizing the risk of misdiagnosis.

Consensus in Copyright Content Usage Monitoring

Currently, Keymakr collaborates with SoundAware, a company that deploys automated music recognition technology to identify copyrighted music usage. The team reviews 10,000 URLs to assess the presence of copyrighted material.

Video platforms are filled with content that may contain copyrighted material, such as music, movie scenes, or TV show fragments. Because of the vast amount of data and the subjective nature of copyright interpretation, manually analyzing each video is impractical.

However, Keymakr identifies cases where copyrighted content is used or modified in ways that automated systems cannot yet detect reliably. These include parodies, fan art, and homages.

To eliminate subjectivity, Keymakr employs a consensus-based approach: each video is evaluated by multiple independent experts who answer the following questions:

Does the video contain copyrighted music?
Does it feature scenes from a movie or TV show?
Has the content been modified, such as through editing or remixing?

Based on the experts' responses, a final decision is made regarding potential copyright issues.
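One way such answers could be combined is a per-question majority vote, sketched below in Python. The article does not specify the actual decision rule, so the question keys, the function `review_video`, and the rule that a majority "yes" on music or movie/TV scenes flags the video are assumptions made for illustration.

```python
# Hypothetical keys for the three questions listed above
QUESTIONS = ("copyrighted_music", "movie_or_tv_scene", "modified")

def review_video(responses):
    """Aggregate yes/no answers from several experts for one video.

    responses: list of dicts, one per expert, mapping each question
    key to True (yes) or False (no).
    """
    summary = {}
    for question in QUESTIONS:
        yes_votes = sum(1 for r in responses if r[question])
        # A question is answered "yes" when more than half the experts say so
        summary[question] = yes_votes > len(responses) / 2
    # Assumed rule: flag the video if either content question is "yes"
    summary["flag_for_rights_check"] = (
        summary["copyrighted_music"] or summary["movie_or_tv_scene"]
    )
    return summary

responses = [
    {"copyrighted_music": True, "movie_or_tv_scene": False, "modified": True},
    {"copyrighted_music": True, "movie_or_tv_scene": False, "modified": False},
    {"copyrighted_music": False, "movie_or_tv_scene": False, "modified": True},
]
print(review_video(responses))
```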

Such projects are essential for enforcing copyright and ensuring rights holders receive fair compensation. Additionally, this process helps companies specializing in content monitoring refine their algorithms and accelerate the detection of copyrighted material.

Consensus in Vehicle and Pedestrian Tracking

Consensus is also widely applied in AI training for autonomous vehicles, particularly in object recognition on roads (e.g., other vehicles, pedestrians, traffic signs). For instance, a camera might capture a pedestrian in motion, and human annotators might disagree on whether the object is a person or a shadow. Consensus ensures precise labeling in such scenarios.

The Keymakr team recently worked on analyzing camera footage to track vehicles. The task was to follow a vehicle's movement across several cameras at a crossroads and ensure that the system correctly identified the same vehicle in different frames.

The cameras recorded a single object (a car) at several points. Several experts viewed the footage from the different cameras and assessed whether it showed the same car, since perceptions of its appearance (for example, color or make) could differ. The data was used to train the model only if five annotators confirmed the object’s identity; otherwise, it was excluded from the dataset. This reduced the number of false alarms and increased the accuracy of car recognition systems, which is important for urban safety and automatic traffic control systems.
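The filtering rule described above can be sketched as follows. The function `split_tracks`, the track IDs, and the reading of "five annotators confirmed" as "at least five confirmations" are assumptions for illustration only.

```python
def split_tracks(track_votes, required=5):
    """Separate tracks that enough annotators confirmed from the rest.

    track_votes: {track_id: list of bool}, where True means the
    annotator judged the sightings to show the same vehicle.
    required: number of confirmations needed (five in the article).
    """
    kept, excluded = [], []
    for track_id, votes in track_votes.items():
        # sum() counts True votes because True == 1 in Python
        if sum(votes) >= required:
            kept.append(track_id)
        else:
            excluded.append(track_id)
    return kept, excluded

# Hypothetical tracks: one fully confirmed, one with a dissenting vote
track_votes = {
    "car_17": [True, True, True, True, True],
    "car_42": [True, True, True, False, True],
}
kept, excluded = split_tracks(track_votes)
print("train on:", kept)
print("excluded:", excluded)
```

A strict rule like this trades dataset size for label reliability: ambiguous tracks are dropped rather than risk teaching the model an incorrect match.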

The same approach can be applied to identify people in shopping malls or on the streets. Cameras capture movement by analyzing, for example, the color of clothes, height, or other characteristics. This method is used for:

Enhanced security monitoring
Crime prevention
Retail visitor behavior analysis
Crowd flow assessment in public areas

The Future of Consensus in AI

The future of consensus-based data annotation is promising, particularly as AI models become more complex and data volume grows. The global Data Annotation and Labeling Market is projected to reach $3.6 billion by 2027, and many companies are adopting multi-layered annotation verification to enhance data quality. Studies show that models trained on datasets with consensus annotation demonstrate significantly higher accuracy than models trained on single-source labeling.

Despite the development of automatic annotation and generative AI, the human factor remains key: subjectivity and annotation disagreements necessitate multi-stage validation. Therefore, the consensus method will continue to be used, ensuring data reliability and reducing errors in critical areas such as autonomous systems, medicine, and financial analysis.