Authors:
(1) Diptarka Chakraborty, National University of Singapore, Singapore
(2) Sourav Chakraborty, Indian Statistical Institute, Kolkata;
(3) Gunjan Kumar, National University of Singapore, Singapore;
(4) Kuldeep S. Meel, University of Toronto, Toronto. Table of Links Abstract and 1 Introduction 2 Notations and Preliminaries 3 Related Work 4 An Efficient One-Round Adaptive Algorithm and 4.1 High-Level Overview 4.2 Algorithm Description 4.3 Technical Analysis 5 Conclusion 6 Acknowledgements and References A Missing Proofs B An O(log log n)-query fully adaptive algorithm Abstract 1 Introduction Evaluating different properties of an unknown object is a fundamental challenge in statistics. When dealing with large objects, it becomes essential to determine these properties by making only a limited number of queries to the object. In the case of unknown objects being probability distributions, the goal is to assess whether the input distribution(s) possess specific properties or deviate significantly (i.e., “ε-far” for some ε > 0) from meeting them. All of this needs to be accomplished while minimizing the number of queries made (also known as query complexity) to the distribution(s). Probability distributions are crucial subjects of study, and distribution testing has remained central to sublinear algorithms and modern data analysis since its introduction [GGR98, GR11, BFR+13]. Early investigations into distribution testing primarily utilized the SAMP query model, which only allows drawing samples from the given distribution(s). However, for testing many interesting properties, the SAMP model proves to be restrictive, as evidenced by strong polynomial (in domain size) lower bounds on the sample complexity. To overcome this limitation, several alternative query models have been proposed over the past decade. Among these models, the conditional sampling model (COND) [CFGM13, CRS14] has been extensively studied. This model permits drawing samples from the input distribution(s) conditioned on any arbitrary subset of the domain. Various distribution testing problems have been explored under the COND model [FJO+15, KT19, Nar21, CKM23] and certain variants of it like subcube conditioning model [BC18, CCK+21, CJLW21]. Moreover, the COND model and its variants have recently found applications in the areas like formal methods and machine learning (e.g., [CM19, MPC20, GJM22]). In this work, we study the equivalence testing problem, one of the most fundamental problems in distribution testing. In this problem, given query access to two (unknown) distributions P and Q, the objective is to decide whether they are equal or ε-far from each other in the total variation distance. For this problem, a query-optimal algorithm is already known in the COND model. However, the primary challenge with the query-efficient algorithm/tester in the COND model is its inherent sequential or adaptive nature. A tester is considered non-adaptive if it can generate all its queries based solely on the input parameter (in this case, the domain size) and its internal randomness, without relying on previous query responses. Non-adaptive testers are generally favored in practical situations because they can make multiple queries simultaneously. Exploring the balance between the degree of adaptivity and query complexity is a captivating area of research. This curiosity prompted [CG18] to delve deeper into adaptive testing, introducing a nuanced approach by permitting a limited number of adaptive stages or rounds. This multi-stage or bounded-round adaptivity concept finds resonance in various other problems, including group testing [DHH00, DMT13, EBE20], submodular function maximization [BS18, CQ19], compressed sensing and sparse recovery [NSWZ18, KP19], multi-armed bandits problem [AAAK17]. In this work, we initiate the study of bounded-round adaptivity in the context of the equivalence testing problem and provide a query-efficient one-round adaptive tester. This paper is available on arxiv under CC BY-NC-SA 4.0 DEED license. Authors: (1) Diptarka Chakraborty, National University of Singapore, Singapore (2) Sourav Chakraborty, Indian Statistical Institute, Kolkata; (3) Gunjan Kumar, National University of Singapore, Singapore; (4) Kuldeep S. Meel, University of Toronto, Toronto. Authors: Authors: (1) Diptarka Chakraborty, National University of Singapore, Singapore (2) Sourav Chakraborty, Indian Statistical Institute, Kolkata; (3) Gunjan Kumar, National University of Singapore, Singapore; (4) Kuldeep S. Meel, University of Toronto, Toronto. Table of Links Abstract and 1 Introduction Abstract and 1 Introduction 2 Notations and Preliminaries 2 Notations and Preliminaries 3 Related Work 3 Related Work 4 An Efficient One-Round Adaptive Algorithm and 4.1 High-Level Overview 4 An Efficient One-Round Adaptive Algorithm and 4.1 High-Level Overview 4.2 Algorithm Description 4.2 Algorithm Description 4.3 Technical Analysis 4.3 Technical Analysis 5 Conclusion 5 Conclusion 6 Acknowledgements and References 6 Acknowledgements and References A Missing Proofs A Missing Proofs B An O(log log n)-query fully adaptive algorithm B An O(log log n)-query fully adaptive algorithm Abstract 1 Introduction Evaluating different properties of an unknown object is a fundamental challenge in statistics. When dealing with large objects, it becomes essential to determine these properties by making only a limited number of queries to the object. In the case of unknown objects being probability distributions, the goal is to assess whether the input distribution(s) possess specific properties or deviate significantly (i.e., “ε-far” for some ε > 0) from meeting them. All of this needs to be accomplished while minimizing the number of queries made (also known as query complexity) to the distribution(s). Probability distributions are crucial subjects of study, and distribution testing has remained central to sublinear algorithms and modern data analysis since its introduction [GGR98, GR11, BFR+13]. Early investigations into distribution testing primarily utilized the SAMP query model, which only allows drawing samples from the given distribution(s). However, for testing many interesting properties, the SAMP model proves to be restrictive, as evidenced by strong polynomial (in domain size) lower bounds on the sample complexity. To overcome this limitation, several alternative query models have been proposed over the past decade. Among these models, the conditional sampling model (COND) [CFGM13, CRS14] has been extensively studied. This model permits drawing samples from the input distribution(s) conditioned on any arbitrary subset of the domain. Various distribution testing problems have been explored under the COND model [FJO+15, KT19, Nar21, CKM23] and certain variants of it like subcube conditioning model [BC18, CCK+21, CJLW21]. Moreover, the COND model and its variants have recently found applications in the areas like formal methods and machine learning (e.g., [CM19, MPC20, GJM22]). In this work, we study the equivalence testing problem, one of the most fundamental problems in distribution testing. In this problem, given query access to two (unknown) distributions P and Q, the objective is to decide whether they are equal or ε-far from each other in the total variation distance. For this problem, a query-optimal algorithm is already known in the COND model. However, the primary challenge with the query-efficient algorithm/tester in the COND model is its inherent sequential or adaptive nature. A tester is considered non-adaptive if it can generate all its queries based solely on the input parameter (in this case, the domain size) and its internal randomness, without relying on previous query responses. Non-adaptive testers are generally favored in practical situations because they can make multiple queries simultaneously. Exploring the balance between the degree of adaptivity and query complexity is a captivating area of research. This curiosity prompted [CG18] to delve deeper into adaptive testing, introducing a nuanced approach by permitting a limited number of adaptive stages or rounds. This multi-stage or bounded-round adaptivity concept finds resonance in various other problems, including group testing [DHH00, DMT13, EBE20], submodular function maximization [BS18, CQ19], compressed sensing and sparse recovery [NSWZ18, KP19], multi-armed bandits problem [AAAK17]. In this work, we initiate the study of bounded-round adaptivity in the context of the equivalence testing problem and provide a query-efficient one-round adaptive tester. This paper is available on arxiv under CC BY-NC-SA 4.0 DEED license. This paper is available on arxiv under CC BY-NC-SA 4.0 DEED license. available on arxiv

Part of HackerNoon's growing list of open-source research papers, promoting free access to academic material.

Explore The Power of Bounded Adaptivity Through Equivalence Testing

About Author

Comments

TOPICS

THIS ARTICLE WAS FEATURED IN

Related Stories

A Data-centric Approach to Class-specific Bias in Image Data Augmentation: Abstract and Intro

Understand The Distributions Used In Equivalence Testing

SAMP Model Vs COND Model In Equivalence Testing

EquivTester Samples In Equivalence Testing Algorithm

Technical Analysis Of The Correctness And Complexity Of EquivTester

How to Obtain A Fully Adaptive Algorithm With Sample Complexity O˜(log log n)

A Data-centric Approach to Class-specific Bias in Image Data Augmentation: Abstract and Intro

Understand The Distributions Used In Equivalence Testing

SAMP Model Vs COND Model In Equivalence Testing

EquivTester Samples In Equivalence Testing Algorithm

Technical Analysis Of The Correctness And Complexity Of EquivTester

How to Obtain A Fully Adaptive Algorithm With Sample Complexity O˜(log log n)

Light-Mode

Classic

Newspaper

Minty

Dark-Mode

Neon Noir

Minty

HN StartUps