
An Efficient One-Round Adaptive Algorithm In Equivalence Testing


Too Long; Didn't Read

This paper presents an efficient one-round adaptive tester for the equivalence testing problem in the COND model, making at most Õ(log n) queries; an O(log log n)-query fully adaptive algorithm appears in Appendix B.

Authors:

(1) Diptarka Chakraborty, National University of Singapore, Singapore;

(2) Sourav Chakraborty, Indian Statistical Institute, Kolkata;

(3) Gunjan Kumar, National University of Singapore, Singapore;

(4) Kuldeep S. Meel, University of Toronto, Toronto.

Abstract and 1 Introduction

2 Notations and Preliminaries

3 Related Work

4 An Efficient One-Round Adaptive Algorithm and 4.1 High-Level Overview

4.2 Algorithm Description

4.3 Technical Analysis

5 Conclusion

6 Acknowledgements and References

A Missing Proofs

B An O(log log n)-query fully adaptive algorithm

4 An Efficient One-Round Adaptive Algorithm

Our main contribution is a one-round adaptive tester for the equivalence testing problem in the COND model that makes at most Õ(log n) queries.
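For readers unfamiliar with the model, the following minimal Python sketch shows the oracle interface a COND tester interacts with. The class name `CondOracle` and the simulation via an explicit probability vector are illustrative assumptions, not part of the paper; a real tester only gets black-box query access.

```python
import random

class CondOracle:
    """Hypothetical interface for a COND-model oracle over {0, ..., n-1}.

    A COND query takes a subset S of the domain and returns a sample
    drawn from the underlying distribution conditioned on landing in S.
    The explicit probability vector exists here only to simulate the
    oracle; a tester never sees it.
    """

    def __init__(self, probs):
        self.probs = probs  # probs[i] = mass of element i (simulation only)

    def cond(self, S):
        # Draw from the distribution restricted (and renormalized) to S.
        S = list(S)
        weights = [self.probs[i] for i in S]
        return random.choices(S, weights=weights, k=1)[0]

    def sample(self):
        # An ordinary sample is a COND query on the whole domain.
        return self.cond(range(len(self.probs)))
```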



In the remainder of this paper, we prove this result. We start with the high-level idea behind our algorithm and then provide a formal description together with a detailed analysis.

4.1 High-Level Overview

A first attempt at designing an equivalence tester is to pick a sample, say i, from P and compare the probability masses P(i) and Q(i). If P and Q are far (in total variation distance), then with sufficiently high probability, the masses of i in the two distributions differ significantly. However, the issue is that estimating the probability mass of the element i can be very expensive. To bypass this issue, the idea is to sample a subset S of the domain, hoping that the following two properties hold:


• the probability mass of S in P is comparable to the probability mass of S in Q, and


• the probability mass of i (in P) is similar to the probability mass of S.


Assuming both of the above statements hold, one can use conditional sampling (conditioned on S ∪ {i}) to compare P(i)/P(S ∪ {i}) and Q(i)/Q(S ∪ {i}). Since we assumed P(S) is similar to Q(S), with high enough probability P(i)/P(S ∪ {i}) and Q(i)/Q(S ∪ {i}) will differ enough. And since we assumed that P(i) is comparable to P(S), one can estimate P(i)/P(S ∪ {i}) and upper bound Q(i)/Q(S ∪ {i}) using only a few samples, which suffices to distinguish P from Q when the two distributions are far. The remaining issue is how to take care of these two assumptions.
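To make the comparison step concrete, here is a hedged sketch building on the `CondOracle` interface above. The query budget `num_queries` and the threshold `gap` are placeholder parameters chosen for illustration; the paper's analysis in Section 4.3 determines the actual values.

```python
def estimate_conditional_mass(oracle, i, S, num_queries=500):
    """Estimate P(i) / P(S ∪ {i}) via repeated COND queries on S ∪ {i}.

    The empirical frequency of seeing i concentrates around the true
    conditional mass (by a Chernoff bound) once num_queries is large.
    """
    T = set(S) | {i}
    hits = sum(1 for _ in range(num_queries) if oracle.cond(T) == i)
    return hits / num_queries


def looks_different(p_oracle, q_oracle, i, S, gap=0.1):
    # One-shot comparison: a gap between the two conditional masses is
    # evidence that P != Q (threshold here is illustrative only).
    p_hat = estimate_conditional_mass(p_oracle, i, S)
    q_hat = estimate_conditional_mass(q_oracle, i, S)
    return abs(p_hat - q_hat) > gap
```

Note that this comparison is cheap precisely when P(i) is comparable to P(S), since then i appears among the conditional samples often enough to estimate its conditional mass with few queries.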



• (Case 2) There is a large "tail probability," due to which concentration is not possible. (The notion of tail probability is formalized in Section 4.3.)



This paper is available on arxiv under CC BY-NC-SA 4.0 DEED license.