An Image Engineer’s Notes, Part 6: How Objective Data and Human Perception Define Image Quality

Written by yogurt67 | Published 2026/03/24

TL;DR: Image quality validation combines objective measurements such as MTF, SNR, and Delta E with subjective human perception, using a dual-loop process to optimize the real-world image experience.

The Dual Axis of Image Quality Validation – Objective and Subjective

In the field of image engineering, judging the quality of an image (Image Quality, IQ) is never a single-dimensional task. It relies on precise physical measurement data (Objective Validation) and is inseparable from human perception of image aesthetics and experience (Subjective Validation).

In other words, image quality validation is essentially a problem where "physical measurement" and "human visual perception" intertwine. In engineering practice, image quality validation can usually be viewed as a three-layer architecture:

The bottom layer consists of quantifiable physical measurements, the middle layer involves industry benchmark evaluations and official certifications, and the top layer is user subjective perception. This article will delve into these two major validation methodologies and, based on them, introduce mainstream industry benchmark standards and platform certifications.

1. Objective Validation: Quantification and Standardization of Data

Objective validation involves quantifying the physical characteristics of an image through standardized laboratory environments, test charts, and professional instruments. Its core objectives include: providing repeatable test results, establishing comparable metrics across products, and minimizing human subjective interference.

1.1 Standardized Laboratory Environment: The Cornerstone of Precise Measurement

Establishing a standardized and repeatable laboratory test environment is the cornerstone for precise objective validation of image quality. Currently, the most representative test systems in the industry include DXOMARK and VCX Forum.

  • Lighting System [DXOMARK Multispectral Lighting System (MLS)]: This is a multispectral light source specifically designed for Analyzer systems, capable of precisely simulating spectral distributions from low color temperature A light (2856K) to high color temperature D65 (6500K).

  • Image Engineering iQ-Flatlight / LE7: This is a light source commonly used in VCX-certified laboratories. The LE7 uses LED technology with 20 independently controllable LED channels, capable of simulating extremely low light environments (< 1 Lux).

  • Test Charts: Standardized physical reference targets used to evaluate various aspects of an imaging system's performance, such as resolution, color accuracy, and noise. They provide consistent and repeatable visual input for objective measurements.

  • Automated Test Equipment:

    • DXOMARK Six-axis Robot Arm: Used to precisely control camera position, angle, and motion trajectory, achieving highly repeatable tests.
    • Image Engineering iQ-Testbench: An automated rail system that precisely controls the distance between the camera and the chart for testing at different focal lengths and focusing distances.

1.2 Core Objective Metrics: Understanding Quality from Data

Objective metrics are the cornerstone of image quality validation, breaking down complex visual perception into quantifiable physical parameters. Understanding the definition, measurement methods, and correlation with human perception of these metrics is key for image engineers to develop "quality intuition."

1.2.1 Resolution & Sharpness

Resolution refers to the ability of an imaging system to capture fine details, typically measured in Line Pairs per Millimeter (LP/mm) or Line Widths per Picture Height (LW/PH). However, a simple resolution value does not fully reflect the sharpness perceived by the human eye, which is where the concept of sharpness comes in.

  • MTF (Modulation Transfer Function) & SFR (Spatial Frequency Response): MTF describes the contrast transfer capability of an imaging system at different spatial frequencies, representing a theoretical system characteristic. A higher MTF curve indicates a stronger ability of the system to capture details. SFR is the frequency response curve measured from actual images (e.g., slanted-edge test charts), providing more comprehensive frequency response information. In practice, SFR is often used to approximate MTF, and thus common metrics like MTF50 (the spatial frequency at which contrast drops to 50%) are mostly derived from SFR curves. A higher MTF50 value means the image appears sharper, but excessively high MTF50 may be accompanied by overshoot, leading to unnatural "halos" or "edge enhancement" effects.

  • Acutance: This is a sharpness metric that is closer to human perception, taking into account the shape of the MTF curve, overshoot, and human eye sensitivity to different spatial frequencies. A higher Acutance value means the image is perceived as clearer and more natural by the human eye. Overall, Acutance can more comprehensively reflect human perception of image sharpness and is a more accurate sharpness metric than a simple MTF50.
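To make the two metrics above concrete, the sketch below derives MTF50 from a sampled SFR curve and computes a simple CSF-weighted acutance. The SFR data and the CSF shape are toy placeholders, not measurements from any real system:

```python
import numpy as np

def _trapz(y, x):
    """Trapezoidal integration (written out to avoid NumPy-version-specific names)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def mtf50(freqs, sfr):
    """Spatial frequency (cycles/pixel) where the SFR drops to 0.5.

    freqs: ascending spatial frequencies; sfr: measured SFR values (1.0 at DC).
    Linearly interpolates between the two samples bracketing 0.5.
    """
    for i in range(1, len(sfr)):
        if sfr[i] <= 0.5:
            t = (sfr[i - 1] - 0.5) / (sfr[i - 1] - sfr[i])
            return freqs[i - 1] + t * (freqs[i] - freqs[i - 1])
    return freqs[-1]  # SFR never fell below 0.5 in the measured range

def acutance(freqs, sfr, csf):
    """CSF-weighted area under the SFR curve, normalized so a perfect
    system (SFR == 1 everywhere) scores 1.0. `csf` is a contrast
    sensitivity function sampled at the same frequencies."""
    return _trapz(sfr * csf, freqs) / _trapz(csf, freqs)

# Toy example: an SFR that falls off linearly from 1.0 to 0.0 at Nyquist
freqs = np.linspace(0.0, 0.5, 11)            # cycles/pixel up to Nyquist
sfr = np.linspace(1.0, 0.0, 11)
csf = np.exp(-((freqs - 0.15) ** 2) / 0.01)  # hypothetical CSF peak

print(round(mtf50(freqs, sfr), 3))  # → 0.25: contrast hits 50% halfway to Nyquist
print(round(acutance(freqs, sfr, csf), 3))
```

Because acutance weights the whole curve by the CSF, two systems with identical MTF50 but different roll-off shapes can score differently, which is exactly why it tracks perceived sharpness better.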

1.2.2 Noise & Texture

Noise is unwanted random variation in an image that reduces image purity and detail visibility. Texture refers to regular details in an image, such as fabric weave or skin pores. The challenge in image processing is to suppress noise while preserving or even enhancing texture.

  • SNR (Signal-to-Noise Ratio): SNR is a metric that measures the ratio of image signal strength to noise strength. A higher SNR value means a purer image. However, SNR is a physical quantity and does not consider human eye sensitivity to different frequency noise.
  • Visual Noise (or Perceptual Noise): This is a noise metric that is more consistent with human perception, weighting noise based on human eye sensitivity to different spatial frequencies and luminance areas. For example, the human eye is more sensitive to low-frequency noise (e.g., color patches) than high-frequency noise (e.g., fine grain). Visual Noise is typically measured by applying perceptual weighting functions, such as the contrast sensitivity function (CSF) used in the ISO 15739 visual noise measurement, to simulate the human eye's response.
  • Texture Loss: During noise suppression, excessive noise reduction algorithms often lead to the loss of image details (especially high-frequency textures), making the image appear smooth and lacking texture. Texture loss is usually measured by analyzing Dead Leaves or other high-frequency texture charts, evaluating the degree to which the system preserves texture details after noise reduction.
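A minimal illustration of how SNR is measured in practice: capture a flat, uniformly lit gray patch and compare its mean level to its standard deviation. The patch below is simulated with Gaussian noise purely for demonstration:

```python
import numpy as np

def patch_snr_db(patch):
    """SNR of a uniformly lit gray patch: mean signal over noise std-dev, in dB."""
    signal = patch.mean()
    noise = patch.std(ddof=1)
    return 20.0 * np.log10(signal / noise)

rng = np.random.default_rng(0)
# Simulate an 18% gray patch (code value ~118 on an 8-bit scale) with sigma=2 noise
patch = 118.0 + rng.normal(0.0, 2.0, size=(100, 100))
print(round(patch_snr_db(patch), 1))  # ≈ 35.4 dB, i.e. 20*log10(118/2)
```

Note this plain SNR weights all noise frequencies equally; a visual-noise measurement would first filter the patch through a CSF-based weighting before computing the statistic.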

1.2.3 Color Accuracy

Color accuracy measures the ability of an imaging system to reproduce true colors. Accurate colors are crucial for the realism and aesthetics of an image.

  • Delta E (ΔE): This is a metric that measures the perceived difference between two colors. A smaller Delta E value indicates that two colors are closer, and it is harder for the human eye to distinguish them. Commonly used versions in the industry include Delta E 76, Delta E 94, and Delta E 2000. Among them, Delta E 2000 is currently the standard that best matches human visual perception characteristics, as it considers the human eye's varying sensitivity to different color regions. It is worth noting that Delta E differs from the earlier commonly used Delta C (chromaticity difference); Delta E includes the difference in luminance (L), which can more comprehensively reflect the color deviation perceived by the human eye, and is therefore more widely adopted in modern image quality evaluation.

  • AWB (Auto White Balance) Stability: The goal of automatic white balance is to restore white objects in an image to white under different lighting conditions, ensuring color accuracy. AWB stability evaluates the degree of color shift and recovery speed of the system when switching between different light sources. Unstable AWB can lead to color casts or color jumps in the image.

  • Color Saturation and Hue Accuracy: In addition to Delta E, the ability of the imaging system to reproduce different color saturations and hues is also evaluated, ensuring that colors are vibrant yet natural, without noticeable color casts.
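The simplest member of the Delta E family, CIE 1976, is just the Euclidean distance in CIELAB, which also makes the contrast with a chroma-only Delta C easy to see: the L* term is included. The Lab values below are made up for illustration; production pipelines typically use a library implementation of Delta E 2000, which is considerably more involved:

```python
import math

def delta_e_76(lab1, lab2):
    """CIE 1976 color difference: Euclidean distance in CIELAB.
    Includes the lightness (L*) difference, unlike a chroma-only Delta C."""
    return math.dist(lab1, lab2)

# Hypothetical reference patch vs. a slightly shifted reproduction (L*, a*, b*)
reference = (52.0, 10.0, -8.0)
measured = (53.0, 12.0, -8.5)
print(round(delta_e_76(reference, measured), 2))  # → 2.29
```

A Delta E around 2 is near the threshold where a trained observer starts to notice the difference in side-by-side viewing, which is why chart-based color tests often target sub-3 averages.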

1.2.4 Dynamic Range & Exposure

Dynamic range refers to the range between the brightest and darkest areas that an imaging system can simultaneously capture. Images with high dynamic range can present more details in both highlights and shadows. Exposure refers to the overall brightness level of an image.

  • Dynamic Range (DR): Typically expressed in EV (Exposure Value) or dB (decibel), it measures the range of distinguishable details from the darkest to the brightest that an imaging system can capture. Cameras with high dynamic range can better preserve sky details and shadow textures when shooting high-contrast scenes (e.g., backlit).

  • Tone Mapping Consistency: When the dynamic range of an imaging system exceeds that of the display device, tone mapping is required to compress the image into the display range. Tone mapping consistency evaluates the stability and naturalness of the system's handling of image brightness and contrast under different lighting conditions, avoiding overexposure or underexposure, and loss of detail.

  • HDR (High Dynamic Range) Performance: For cameras supporting HDR functionality, their ability to process high-contrast scenes in HDR mode is evaluated, including highlight overexposure suppression, shadow detail enhancement, color reproduction, and ghosting control.

  • Exposure Accuracy and Stability: Evaluates the accuracy and stability of automatic exposure (AE) in different lighting environments, ensuring appropriate image brightness without being too bright or too dark.
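Given the definition above, converting a sensor's usable signal range into EV and dB is a one-line calculation each. The full-well and read-noise figures below are hypothetical:

```python
import math

def dynamic_range(max_signal, noise_floor):
    """Dynamic range given the brightest usable signal (e.g., full-well
    capacity in electrons) and the noise floor (e.g., read noise in electrons).
    Returns (stops/EV, decibels)."""
    ratio = max_signal / noise_floor
    ev = math.log2(ratio)             # each EV doubles the signal range
    db = 20.0 * math.log10(ratio)     # amplitude ratio in decibels
    return ev, db

# Hypothetical sensor: 32,000 e- full well, 2 e- read noise
ev, db = dynamic_range(32000, 2)
print(round(ev, 1), round(db, 1))  # → 14.0 84.1
```

The same arithmetic explains the rule of thumb that 1 EV ≈ 6 dB: doubling the signal ratio adds one stop and 20·log10(2) ≈ 6.02 dB.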

1.3 Objective Test Software: A Powerful Tool for Data Analysis

Objective test software converts the raw image data collected in the laboratory into quantifiable metrics. These tools provide automated analysis workflows, ensuring data accuracy and consistency.

  • Imatest: One of the most widely used image quality analysis software in the industry, supporting various test charts and metric analyses, such as MTF, Noise, Color Accuracy, etc. Its powerful customization features make it a valuable tool in the R&D phase.

  • DXOMARK Analyzer: DXOMARK's proprietary analysis platform, tightly integrated with its test protocols, can automate complex test sequences and generate detailed reports that comply with DXOMARK scoring standards.

  • Image Engineering iQ-Analyzer: Analysis software commonly used in VCX-certified laboratories, designed specifically for Image Engineering's test charts and equipment, providing highly automated analysis workflows to ensure the transparency and repeatability of VCX evaluations.

2. Subjective Validation: The Final Judgment of Human Perception

The ultimate judgment of image quality comes from the human eye. Because the Human Visual System (HVS) is non-linear, objective numbers alone cannot fully predict perceived quality; subjective evaluation aims to simulate the real user experience and bridge the gap between laboratory data and real-world scenarios.

2.1 Subjective Evaluation Methods: Capturing Perception from Lab to Real-world Scenes

Subjective evaluation is an indispensable part of image quality validation, aiming to simulate the visual experience of real users and capture perceptual differences that are difficult to quantify with objective data. A comprehensive subjective evaluation method should cover the following key dimensions:

2.1.1 Evaluation Environment and Scene Selection

  • Controlled Lab Environment:

    • In the laboratory, precise control of light sources (e.g., DXOMARK MLS, Image Engineering iQ-Flatlight/LE7), color temperature, illuminance, and background ensures the repeatability and consistency of test conditions. This helps isolate variables and accurately evaluate the impact of specific image parameters on subjective perception.
    • Common test scenes include: standard static scenes (e.g., portraits, landscapes, still life), low-light scenes, high dynamic range scenes, etc., using standardized props and models.

  • Real-world Scenes:

    • Although laboratory data is precise, it cannot fully simulate complex light changes, object textures, motion blur, and user shooting habits in the real world. Therefore, conducting actual shooting tests in diverse real-world scenes (e.g., outdoor daylight, indoor mixed light, night scenes, sports scenes) is crucial.
    • These tests focus more on evaluating the comprehensive performance of the camera in actual use scenarios, such as the stability and accuracy of auto exposure, auto white balance, and auto focus.

2.1.2 Evaluation Process and Standardized Methods

  • Golden Sample Benchmarking:

    • Select one or more recognized "golden samples" (usually industry-leading products or carefully tuned reference images) as benchmarks for subjective evaluation. Images from all products under test will be compared with the Golden Sample to assess their quality differences and merits.
    • This helps establish internal quality standards and guides the direction of image tuning.

  • Side-by-Side (SBS) Comparison:

    • Display images from different products or processed by different algorithms side-by-side on a high-quality screen, allowing evaluators to directly observe and compare their differences. This method effectively captures human eye sensitivity to subtle differences.

    • Evaluators typically rate or rank independently on dimensions such as resolution, noise, color, contrast, skin tone, and artifacts.

  • Flicker Test:

    • Display two or more images alternately at a very fast speed (e.g., several times per second). The human eye will perceive a "flicker" for any differences between the images. This method is very effective for detecting subtle color shifts, brightness inconsistencies, or noise differences, as the human eye is more sensitive to dynamic changes than static observation.

  • User Experience (UX) Study:

    • In addition to pure image quality, subjective evaluation should also incorporate the user's overall experience. This includes shutter lag, focus speed and accuracy, HDR processing speed, continuous shooting fluidity, intuitive interface, etc.
    • Through questionnaires, focus group interviews, eye-tracking, etc., collect user feedback on overall product satisfaction and pain points, combining subjective perception with actual operational feelings.

2.1.3 Perceptual Dimension Decomposition and Scoring Standards

Subjective evaluation typically decomposes image quality into multiple perceptual dimensions and sets detailed scoring standards for each dimension:

  • Color Rendering: Evaluates color saturation, accuracy, white balance stability, and color consistency under different lighting conditions. Special attention is paid to the naturalness and attractiveness of skin tone.

  • Detail & Sharpness: Evaluates image clarity, texture preservation, and the presence of artifacts caused by excessive sharpening (e.g., halos, jagged edges).

  • Noise & Graininess: Evaluates image purity, noise type (e.g., luminance noise, chrominance noise), distribution, and visibility. It also evaluates whether noise reduction algorithms cause detail loss.

  • Exposure & Contrast: Evaluates whether the overall image brightness is appropriate, the degree of highlight and shadow detail preservation, and the sense of depth and dimensionality of the image.

  • Artifacts & Distortion: Detects unnatural defects in the image, such as chromatic aberration, distortion, flare, ghosting, and compression artifacts.

2.1.4 Evaluators and Data Analysis

  • Expert Evaluation: Conducted by experienced image engineers or professional photographers who have a deep understanding of image quality, can accurately identify problems, and provide specific improvement suggestions. This type of evaluation is typically used in the early stages of product development and tuning.

  • Crowdsourcing / User Study: Collects broader market perception data by recruiting a large number of ordinary users for evaluation. This helps understand product acceptance across different user groups and discover common problems that experts might overlook.

  • Data Analysis: Statistical analysis of subjective evaluation data, such as calculating mean scores, standard deviations, and correlating with objective data to establish mapping models between objective metrics and subjective perception.

2.2 Validation Iteration Loop: Dual-Loop Model Driving Image Quality Improvement

Image quality validation and optimization form a continuous, iterative process. Once objective data and subjective perception are integrated, it becomes a more complex and efficient "dual-loop iterative model." This model not only covers the precise control of the laboratory but also extends to the user experience in the real world, ensuring optimal product performance in various scenarios.

  • Inner Loop - Lab & Expert Evaluation:

    • Lab Objective Measurement: In a highly controlled laboratory environment, precise objective data collection (e.g., MTF, SNR, Delta E) is performed using standard charts and instruments.
    • Algorithm Optimization: Based on objective data analysis results, ISP algorithms (e.g., noise reduction, sharpening, color management) are initially tuned and optimized.
    • Expert Subjective Review: Experienced image engineers or professional evaluators conduct rigorous subjective evaluations in a controlled environment, such as Side-by-Side (SBS) Comparison or Flicker Test, to capture subtle perceptual differences and provide specific tuning suggestions. The focus of this loop is rapid iteration and precise problem identification.

  • Outer Loop - Real-world & User Experience:

    • Real-world Field Testing: Apply algorithms optimized in the inner loop to product prototypes, conducting actual shooting tests in diverse real-world scenes (e.g., outdoor, indoor, low light, high dynamic range) to evaluate their stability and robustness in complex environments.
    • Algorithm Refinement: Further refine algorithms based on field test results to solve specific problems that may arise in the real world, such as adaptability of auto exposure/white balance, motion blur processing, etc.
    • User Experience Study: Collect overall user perception and operational experience feedback on image quality from ordinary users through large-scale user tests, questionnaires, focus groups, etc. This helps discover common problems that experts might overlook and ensures the product meets market expectations.

The connection between Inner and Outer Loops: The results and insights from the expert subjective review in the inner loop serve as important references for real-world field testing, guiding the direction of outer loop tests. Conversely, the results of user experience studies in the outer loop feed back into laboratory objective measurements and algorithm optimization in the inner loop, forming a complete closed loop that drives continuous improvement in image quality. This dual-loop iteration ensures that image quality not only performs excellently in laboratory data but also achieves the best balance in the perceived experience of real users.

3. The Bridge between Objective Metrics and Subjective Perception: Dual-Axis Iteration and Correlation

There is no simple linear correspondence between objective metrics and subjective perception, but the correlation is strong. The table below shows how core objective metrics influence the final perception of the human eye:

Objective and subjective validation form a continuously iterative closed loop, jointly driving the improvement of product image quality. This dual-axis process, like a precise converter, transforms cold data into warm perception. Its core mechanism is shown in the figure below:

4. Industry Benchmarks and Official Certifications: Comprehensive Embodiment and Market Threshold

In the practice of image engineering, validation systems can be divided into two main categories: "general benchmark tests" and "application platform certifications." The former focuses on the ultimate physical performance and perceptual modeling of imaging systems, while the latter focuses on user experience and stability in specific application scenarios (e.g., remote collaboration).

4.1 Industry Benchmarks

  • IEEE P1858 CPIQ (Perception Modeling Oriented):

    • Core Philosophy: Establishes a unified evaluation standard based on human visual perception models. The IEEE P1858 CPIQ (Camera Phone Image Quality) standard aims to create a mathematical mapping model from physical measurement to visual perception.
    • Technical Characteristics:
      • Introduces JND (Just Noticeable Difference) as a unified quality unit.
      • Maps objective metrics (e.g., SNR, MTF, color error) to Quality Loss.
      • Establishes human visual models, such as Acutance (visual sharpness) and Visual Noise (visual noise).
    • Core Objective: To convert complex physical measurements into understandable "perceptual quality differences." For example, 1 JND represents the smallest difference detectable by approximately 75% of observers. Therefore, CPIQ leans more towards a standardized perceptual modeling framework, commonly used in image quality research and industry standard setting.
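To make the JND/Quality-Loss idea concrete, here is a deliberately oversimplified mapping from a "lower is better" objective metric to a loss in JND units. The linear shape and the 10-JND scaling are hypothetical placeholders, not the published P1858 functions:

```python
def quality_loss_jnd(metric_value, just_acceptable, just_noticeable):
    """Map a 'lower is better' objective metric (e.g., a visual noise value)
    to a perceptual quality loss in JNDs: zero loss at or below the
    just-noticeable level, scaled linearly so the just-acceptable level
    costs a fixed number of JNDs. Purely illustrative."""
    if metric_value <= just_noticeable:
        return 0.0
    # Hypothetical calibration: 10 JNDs of loss at the just-acceptable threshold
    return 10.0 * (metric_value - just_noticeable) / (just_acceptable - just_noticeable)

print(quality_loss_jnd(1.0, just_acceptable=5.0, just_noticeable=1.0))  # → 0.0
print(quality_loss_jnd(3.0, just_acceptable=5.0, just_noticeable=1.0))  # → 5.0
```

The appeal of the JND unit is that losses from different metrics (noise, sharpness, color error) become commensurable and can be summed into a single perceptual quality estimate.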

  • DXOMARK (Comprehensive User Experience Oriented):

    • Core Philosophy: Lab Objective + Real Scene Perceptual. DXOMARK's evaluation system emphasizes the combination of laboratory measurements and real-world scene evaluation, aiming to provide a comprehensive assessment that closely reflects the actual user experience of consumers.
    • Technical Characteristics: Its evaluation method combines rigorous laboratory objective measurements with diverse real-world scene perceptual evaluations. The scoring weights are dynamically adjusted based on market usage behavior to reflect consumers' true experiences and preferences in different shooting scenarios (e.g., portraits, low light, HDR, zoom), thereby providing a comprehensive and market-oriented image quality metric.

  • VCX-Forum (Industrial Transparency Oriented):

    • Core Philosophy: Fully automated and highly repeatable. The VCX Forum's design philosophy is to establish a test system that is completely transparent, repeatable, and free from human subjective factors, ensuring industrial-grade objectivity and evaluation efficiency.
    • Technical Characteristics: Its evaluation method focuses on standardized, fully automated test processes, and ensures the fairness and reliability of test results through public test frameworks and rigorous repeatability verification. The VCX's design goal is to maximize the objectivity and transparency of test results, making it particularly valued by industry supply chains and operator procurement systems, providing highly reliable image quality evaluation.

4.2 Official Platform Certifications: From Communication Stability to Image Experience

In addition to image quality benchmark tests, many imaging devices need to pass official certifications from specific platforms to enter their ecosystems. These certifications are not aimed at ranking image quality but at ensuring the stability, compatibility, and user experience of devices in actual communication scenarios. Their test logic primarily revolves around Real-Time Communication (RTC) processes, focusing on verifying image behavior and system integration capabilities in conference scenarios.

  • Microsoft RTC Framework:

    • Core: Ensures camera stability and predictability in real-time communication.
    • Focus: Image capture frame rate stability, Adaptive Bitrate (ABR), image quality preservation during encoding/decoding, and video stream stability under varying network conditions.

  • Microsoft Teams Certification:

    • Core: Targets enterprise video conferencing scenarios, emphasizing "face visibility" and "call immediacy."
    • Focus: Face-centric exposure, low-latency video processes, white balance stability, low-light performance, and USB UVC compatibility.

  • Zoom Rooms Certification:

    • Core: Targets conference room equipment, emphasizing multi-camera collaboration and high-resolution video stability.
    • Focus: Consistent exposure and white balance during multi-camera collaboration, stable frame rate and resolution during long calls, and readability of faces and text details under high compression.

  • Google Meet Certification:

    • Core: Emphasizes cross-platform interoperability and natural color reproduction.
    • Focus: Cross-device compatibility (Windows, macOS, ChromeOS), naturalness of skin tone and overall color, low-latency video transmission, audio-video synchronization, and video stream stability.

  • Cisco Webex Certification:

    • Core: Targets enterprise-grade collaboration environments, emphasizing enterprise-grade stability and image tonality.
    • Focus: Stability during long meetings, dynamic range and image tonality under complex lighting, clarity of documents and text, and consistent image processing in multi-person views.

The table below summarizes the core differences between industry benchmarks and official certifications:

Conclusion: Balancing Data and Perception, Building Image Engineers' Quality Intuition

In summary, image quality validation is a complex and precise interdisciplinary engineering endeavor. It requires us not only to master the objective data of physical measurements but also to deeply understand the subjective subtleties of human visual perception. The "Three-Layer Architecture" (physical measurement, industry benchmarks, user perception) and the "Dual-Loop Iterative Model" (inner loop in the laboratory and outer loop in the real world) elaborated in this article are the practical paths that organically combine these two core elements.

For image engineers, the true challenge and value lie in how to transform cold, objective data into warm and compelling subjective experiences. This is not merely a stacking of technologies but a fusion of art and science, requiring continuous data analysis, rich subjective evaluation experience, and a macroscopic understanding of the overall imaging system architecture to build a unique Image Quality Intuition. Only with this intuition can engineers find the optimal balance between complex performance, cost, and user experience, ultimately creating outstanding imaging products that exceed expectations and truly meet user needs.

Disclaimer: This article's content is based on the author's years of experience in image engineering practice, and all content, including text and images, is based on the author's experience and publicly available information, aiming to provide technical exchange and reference in the field of image quality validation. The standards, test methods, and product names mentioned in this article are for illustrative purposes only and do not represent any form of recommendation or endorsement. All illustrations, unless otherwise specified, are AI-generated. Readers should carefully evaluate relevant information based on their own needs and professional judgment. The author is not responsible for any direct or indirect losses arising from the use of the content of this article.


Written by yogurt67 | Chien Hui Chiang (known as Yogurt) is a Lead Image Engineer and digital imaging specialist in digital camera products.
Published by HackerNoon on 2026/03/24