Authors:
(1) Wanru Zhao, University of Cambridge and Shanghai AI Laboratory, with equal contribution;
(2) Yaxin Du, Shanghai Jiao Tong University, with equal contribution;
(3) Nicholas D. Lane, University of Cambridge and Flower Labs;
(4) Siheng Chen, Shanghai AI Laboratory and Shanghai Jiao Tong University;
(5) Yanfeng Wang, Shanghai AI Laboratory and Shanghai Jiao Tong University.

Table of Links
Abstract and Introduction
Motivation and Setup: How low-quality data affects the performance of Collaborative Training
Proposed Workflow for Data Quality Control
Experiments
Conclusion and Future Work, and References
A. Related Work
B. Heterogeneity Settings
C. Experimental Details
D. Ablation study of Unified Scoring with Anchor Data
E. Examples for Low- and High-Quality Data

D ABLATION STUDY OF UNIFIED SCORING WITH ANCHOR DATA

We perform a comparative analysis of the number of selected samples and the proportion of low-quality data among them. The comparison contrasts selection based on a unified score threshold derived from anchor data (the fourth bar in Figure 4) with methods that set the threshold using knowledge of the overall proportion of low-quality data.

In Figure 4, the second bar represents a method in which each client sorts its samples by score and discards the lowest-scored ones, with the number discarded set by the proportion of low-quality data in the entire FL training set. Although this approach needs no score threshold and relies only on the known proportion of low-quality data, it is impractical and less effective under heterogeneous data quality, because the share of low-quality data differs from client to client. The third bar shows the results of applying a global score threshold chosen to match the exact global proportion of low-quality data. This method requires the server to aggregate the scores of all clients in the FL framework and to know the proportion of low-quality data, which may raise privacy concerns. The results indicate that selection by anchor score consistently yields a lower proportion of low-quality data than the other two methods.
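
The three selection rules compared in Figure 4 can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The following are all hypothetical:

- the `Sample` type;
- the score scale (higher means better quality);
- the hand-made client data;
- the use of the mean anchor score as the unified threshold.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Sample:
    score: float       # hypothetical quality score; higher = better (assumption)
    low_quality: bool  # ground-truth label, used only to evaluate a selection


Clients = Dict[str, List[Sample]]


def per_client_drop(clients: Clients, low_frac: float) -> Clients:
    """Second bar: every client drops its own lowest-scored `low_frac` share."""
    kept = {}
    for name, samples in clients.items():
        ranked = sorted(samples, key=lambda s: s.score)
        n_drop = int(low_frac * len(ranked))
        kept[name] = ranked[n_drop:]
    return kept


def global_threshold(clients: Clients, low_frac: float) -> Clients:
    """Third bar: the server pools all client scores and cuts at the global share."""
    pooled = sorted(s.score for samples in clients.values() for s in samples)
    n_drop = int(low_frac * len(pooled))
    if n_drop >= len(pooled):
        return {name: [] for name in clients}
    threshold = pooled[n_drop]  # lowest score that survives the cut
    return {name: [s for s in samples if s.score >= threshold]
            for name, samples in clients.items()}


def anchor_threshold(clients: Clients, anchor_scores: List[float]) -> Clients:
    """Fourth bar: one shared threshold from anchor data (mean anchor score, assumed)."""
    threshold = mean(anchor_scores)
    return {name: [s for s in samples if s.score >= threshold]
            for name, samples in clients.items()}


def summarize(kept: Clients) -> Tuple[int, float]:
    """Return (number of selected samples, proportion of them that are low quality)."""
    selected = [s for samples in kept.values() for s in samples]
    if not selected:
        return 0, 0.0
    n_low = sum(s.low_quality for s in selected)
    return len(selected), n_low / len(selected)


# Toy, hand-made data with heterogeneous quality: client A is mostly clean,
# client B mostly noisy; 4 of the 8 samples overall are low quality.
CLIENTS: Clients = {
    "A": [Sample(0.9, False), Sample(0.8, False), Sample(0.7, False), Sample(0.3, True)],
    "B": [Sample(0.75, False), Sample(0.2, True), Sample(0.1, True), Sample(0.25, True)],
}

if __name__ == "__main__":
    print("per-client drop:", summarize(per_client_drop(CLIENTS, 0.5)))    # (4, 0.25)
    print("global threshold:", summarize(global_threshold(CLIENTS, 0.5)))  # (4, 0.0)
    print("anchor threshold:", summarize(anchor_threshold(CLIENTS, [0.5, 0.7])))  # (4, 0.0)
```

In this toy example, the per-client cut discards a clean sample from client A and keeps a noisy one from client B, which illustrates why a single global proportion fits heterogeneous clients poorly. The global cut and the anchor cut both keep only clean samples here. However, the global cut needs every client's scores and the global low-quality proportion, while the anchor cut needs only the shared anchor scores. The numbers are illustrative and do not reproduce Figure 4.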

This paper is available on arXiv under CC BY 4.0 DEED license.

Part of HackerNoon's growing list of open-source research papers, promoting free access to academic material.

Enhancing Data Quality: Ablation Study of Unified Scoring With Anchor Data
