Sample-Optimal Locally Private Hypothesis Selection and the Provable Benefits of Interactivity
arXiv:2312.05645v2 Announce Type: replace
Abstract: We study the problem of hypothesis selection under the constraint of local differential privacy. Given a class $mathcal{F}$ of $k$ distributions and a set of i.i.d. samples from an unknown distribution $h$, the goal of hypothesis selection is to pick a distribution $hat{f}$ whose total variation distance to $h$ is comparable with the best distribution in $mathcal{F}$ (with high probability). We devise an $varepsilon$-locally-differentially-private ($varepsilon$-LDP) algorithm that uses $Thetaleft(frac{k}{alpha^2min {varepsilon^2,1}}right)$ samples to guarantee that $d_{TV}(h,hat{f})leq alpha + 9 min_{fin mathcal{F}}d_{TV}(h,f)$ with high probability. This sample complexity is optimal for $varepsilon<1$, matching the lower bound of Gopi et al. (2020). All previously known algorithms for this problem required $Omegaleft(frac{klog k}{alpha^2min { varepsilon^2 ,1}} right)$ samples to work.
Moreover, our result demonstrates the power of interaction for $varepsilon$-LDP hypothesis selection. Namely, it breaks the known lower bound of $Omegaleft(frac{klog k}{alpha^2min { varepsilon^2 ,1}} right)$ for the sample complexity of non-interactive hypothesis selection. Our algorithm breaks this barrier using only $Theta(log log k)$ rounds of interaction.
To prove our results, we define the notion of emph{critical queries} for a Statistical Query Algorithm (SQA) which may be of independent interest. Informally, an SQA is said to use a small number of critical queries if its success relies on the accuracy of only a small number of queries it asks. We then design an LDP algorithm that uses a smaller number of critical queries.