Authors:
(1) Joshua P. Ebenezer, Student Member, IEEE, Laboratory for Image and Video Engineering, The University of Texas at Austin, Austin, TX, 78712, USA, contributed equally to this work (e-mail: joshuaebenezer@utexas.edu);
(2) Zaixi Shang, Student Member, IEEE, Laboratory for Image and Video Engineering, The University of Texas at Austin, Austin, TX, 78712, USA, contributed equally to this work;
(3) Yixu Chen, Amazon Prime Video;
(4) Yongjun Wu, Amazon Prime Video;
(5) Hai Wei, Amazon Prime Video;
(6) Sriram Sethuraman, Amazon Prime Video;
(7) Alan C. Bovik, Fellow, IEEE, Laboratory for Image and Video Engineering, The University of Texas at Austin, Austin, TX, 78712, USA.

Table of Links
Abstract and Introduction
Related Work
Details of Subjective Study
Subjective Analysis
Objective Assessment
Conclusion, Acknowledgment and References

II. RELATED WORK

To the best of our knowledge, no existing studies compare the subjective qualities of videos as the dynamic range, resolution, and compression levels are all varied. Existing databases such as LIVE Livestream [2], LIVE ETRI [3], LIVE YTHFR [4], AVT UHD [5], and APV LBMFR [6] study the subjective quality of professionally generated SDR videos under conditions of downsampling, compression, and source distortions. Other datasets including Konvid-1k [7], YouTube UGC [8], and LSVQ [9] study the quality of SDR user-generated content (UGC). UGC databases are typically much larger than those that study the quality of professional-grade content because the underlying studies can be conducted online via crowdsourcing, owing to looser requirements on the display devices, resolution, and bitrate. LIVE HDR [10], LIVE AQ HDR [11], and APV HDR Sports [12] are recent databases that study the quality of professionally created HDR videos that have been downsampled and compressed at various resolutions and bitrates.

Each of the above-mentioned databases studies the quality of either SDR or HDR videos that have been subjected to distortions, but not both. Here we present the first subjective study that compares the quality of HDR and SDR videos of the same content that have been processed by downscaling and compression. We conducted the study on a variety of display devices using different technologies and having differing capabilities.

While subjective human scores from studies like the one we conducted are considered the gold standard of video quality, conducting such studies is expensive and not scalable. Objective video quality metrics, by contrast, are designed and trained to automatically predict video quality and can be quite economical and scalable. These fall into two categories: Full-Reference (FR) and No-Reference (NR) models.
FR VQA models take as input both pristine and distorted videos to measure the quality of the distorted videos. NR metrics only have access to distorted videos when predicting quality, hence designing them is a more challenging problem. NR VQA models are relevant for video source inspection as well as when measuring quality with no available source video.

PSNR measures the peak signal-to-noise ratio between a reference frame and a distorted version of the same frame. SSIM [13] incorporates luminance, contrast, and structure features to predict the quality of distorted images. VMAF [14] models the statistics of the wavelet coefficients of video frames, as well as the detail losses from distortions. SpEED [15] measures the difference in entropy of bandpass coefficients of reference and distorted videos. STRRED [16] models the statistics of space-time video wavelet coefficients altered by distortions. STGREED [17] measures differences in temporal and spatial entropy arising from distortions to model the quality of videos having varying frame rates and bitrates.

BRISQUE [18], VBLIINDS [19], VIDEVAL [20], RAPIQUE [21], ChipQA [22], HDR ChipQA [23], and NIQE [24] are NR video quality metrics that rely on neurostatistical models of visual perception. Pristine videos are known to follow certain regular statistics when processed using visual neural models. Distortions predictably alter the statistics of perceptually processed videos, allowing for the design of accurate VQA models. RAPIQUE combines features developed under these models with (semantic) video features provided by a pre-trained deep network. TLVQM [25] explicitly models common distortions such as compression, blur, and flicker, using a variety of spatial and temporal filters and heuristics.

This paper is available on arXiv under a CC 4.0 license.
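
As an illustration of the simplest FR metric mentioned in the related work, PSNR between a reference frame and a distorted frame can be sketched as follows. This is a minimal sketch, not code from the paper: the function name `psnr`, the representation of frames as 2-D lists of pixel values, and the default `peak` of 255 (8-bit content) are all assumptions made for the example. For 10-bit content the peak would instead be 1023.

```python
import math


def psnr(reference, distorted, peak=255.0):
    """Return PSNR in dB between two equally sized frames.

    Frames are assumed to be 2-D lists of pixel values; `peak` is the
    assumed maximum possible pixel value (255 for 8-bit content).
    """
    if len(reference) != len(distorted):
        raise ValueError("frames must have the same number of rows")

    squared_error = 0.0
    count = 0
    for ref_row, dist_row in zip(reference, distorted):
        if len(ref_row) != len(dist_row):
            raise ValueError("frames must have the same number of columns")
        for r, d in zip(ref_row, dist_row):
            squared_error += (r - d) ** 2
            count += 1

    if count == 0:
        raise ValueError("frames must not be empty")

    mse = squared_error / count  # mean squared error over all pixels
    if mse == 0:
        return math.inf  # identical frames: PSNR is unbounded
    return 10.0 * math.log10(peak ** 2 / mse)


# Usage: a frame that differs from the reference by 1 at every pixel.
reference_frame = [[0, 0], [0, 0]]
distorted_frame = [[1, 1], [1, 1]]
print(round(psnr(reference_frame, distorted_frame), 2))
```

Because PSNR depends only on the pixel-wise error, it needs the pristine reference frame, which is what makes it an FR rather than an NR measure.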

HDR or SDR? A Study of Scaled and Compressed Videos: Related Work
