
deriving and analyzing a learning-augmented algorithm for multiple quantiles we show numerous comparative advantages, including:
1. Minimal assumptions about the data, in our case even fewer than needed by the unaugmented baseline.
2. Existing tools for studying the robustness of algorithms to noisy predictions (Lykouris & Vassilvitskii, 2021).
3. Co-designing algorithms with predictions together with methods for learning those predictions from data (Khodak et al., 2022), which we show is crucial for both the public-private and sequential release settings.
As part of this analysis we derive a learning-augmented extension of the ApproximateQuantiles (AQ) method of Kaplan et al. (2022) that (nearly) matches its worst-case guarantees while being much better if a natural measure $U_x(w)$ of prediction quality is small. By studying $U_x$, we make the following contributions to multiple quantiles:
1. The first robust algorithm, even for one quantile, that avoids assuming the data is bounded on some interval, specifically by using a heavy-tailed prior.
2. A provable way of ensuring robustness to poor priors, without losing the consistency of good ones.
3. A novel connection between DP quantiles and censored regression that leads to (a) a public-private release algorithm and (b) a sequential release scheme, both with runtime and error guarantees.
Finally, we integrate these techniques to significantly
improve the accuracy of public-private and sequential
quantile release on several real and synthetic datasets.
2. Related work
There has been significant work on incorporating external information to improve DP methods. A major line of work is the public-private framework, where we have access to public data that is related in some way to the private data (Liu et al., 2021; Amid et al., 2022; Li et al., 2022; Bie et al., 2022; Bassily et al., 2022). The use of public data can be viewed as using a prediction, but such work starts by making (often strong) distributional assumptions on the public and private data; we instead derive instance-dependent upper bounds with minimal assumptions that we then apply to such public-private settings. Furthermore, our framework allows us to ensure robustness to poor predictions without distributional assumptions, and to derive learning algorithms using training data that may itself be sensitive.
Another approach is to treat DP mechanisms (e.g., the exponential mechanism) as Bayesian posterior sampling (Dimitrakakis et al., 2017; Geumlek et al., 2017; Seeman et al., 2020). Our work can be viewed as an adaptation where we give explicit prior-dependent utility bounds; to our knowledge, no such guarantees exist in the literature. Moreover, while our focus is quantile estimation, the predictions-based framework that we advocate is much broader, as many DP methods—including for multiple quantiles—combine multiple queries that must be considered jointly.
Our approach for augmenting DP with external information centers on the algorithms-with-predictions framework, where past work has focused on using predictions to improve metrics related to time, space, and communication complexity. We make use of existing techniques from this literature, including robustness-consistency tradeoffs (Lykouris & Vassilvitskii, 2021) and the online learning of predictions (Khodak et al., 2022). Tuning DP algorithms has been an important topic in private machine learning, e.g. for hyperparameter tuning (Chaudhuri & Vinterbo, 2013) and federated learning (Andrew et al., 2021), but these works have not, to our knowledge, considered incorporating per-instance predictions.
The specific task we focus on is DP quantiles, a well-studied problem (Gillenwater et al., 2021; Kaplan et al., 2022), but we are not aware of prior work that adds outside information. We also make the important contribution of an effective method for removing data-boundedness assumptions. Our algorithm builds upon the state-of-the-art work of Kaplan et al. (2022), which is also our main source for empirical comparison.
3. Augmenting a private algorithm
The basic requirement for a learning-augmented algorithm is that the cost $C_x(w)$ of running it on an instance $x$ with prediction $w$ should be upper bounded—usually up to constant or logarithmic factors—by a metric $U_x(w)$ of the quality of the prediction on the instance. We denote this by $C_x \lesssim U_x$. In our work the cost $C_x(w)$ will be the error of a privately released statistic, as compared to some ground truth.
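For intuition, one illustrative instantiation for multiple quantiles, modeled on the rank-based error measure standard in prior work on DP quantiles (e.g. Kaplan et al., 2022; the notation here is ours), is the maximum rank error of the released values:
$$C_x(w) = \max_{j \in [m]} \big|\#\{i : x_i \le o_j\} - q_j n\big|,$$
where $o_1, \dots, o_m$ are the estimates of the quantiles $q_1, \dots, q_m$ released by the algorithm when run with prediction $w$ on the $n$ points of $x$.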
We will use the following privacy notion:
Definition 3.1 (Dwork & Roth (2014)). Algorithm $\mathcal{A}$ is $(\varepsilon, \delta)$-differentially private if for all subsets $S$ of its range, $\Pr\{\mathcal{A}(x) \in S\} \le e^{\varepsilon} \Pr\{\mathcal{A}(\tilde{x}) \in S\} + \delta$ whenever $x \sim \tilde{x}$ are neighboring, i.e. they differ in at most one element.
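As a standard sanity check of this definition (a textbook example, not specific to this paper): for a query $f$ of sensitivity $1$, i.e. $|f(x) - f(\tilde{x})| \le 1$ for all neighboring $x \sim \tilde{x}$, the Laplace mechanism $\mathcal{A}(x) = f(x) + \mathrm{Lap}(1/\varepsilon)$ is $(\varepsilon, 0)$-DP, since its output density changes by at most a factor of $e^{\varepsilon}$ between neighbors:
$$\frac{p_{\mathcal{A}(x)}(t)}{p_{\mathcal{A}(\tilde{x})}(t)} = \exp\!\big(\varepsilon\,(|t - f(\tilde{x})| - |t - f(x)|)\big) \le \exp\!\big(\varepsilon\,|f(x) - f(\tilde{x})|\big) \le e^{\varepsilon}.$$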
Using $\varepsilon$-DP to denote $(\varepsilon, 0)$-DP, the broad goal of this work will be to reduce the error $C_x(w)$ of $\varepsilon$-DP multiple quantile release while fixing the privacy level $\varepsilon$.
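For concreteness, the following is a minimal sketch of the standard exponential-mechanism approach to releasing a single quantile under $\varepsilon$-DP, the primitive that methods like ApproximateQuantiles apply repeatedly. It requires the bounded-data assumption ($x \subseteq [\texttt{lo}, \texttt{hi}]$) that our heavy-tailed priors are designed to remove; the interface and names are our own illustration, not library code.

import numpy as np

def dp_quantile(x, q, eps, lo, hi, rng=None):
    """eps-DP release of the q-th quantile of x via the exponential
    mechanism: the utility of an output o is minus its rank error,
    which is constant on each interval between consecutive data points."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.sort(np.clip(x, lo, hi))
    edges = np.concatenate(([lo], x, [hi]))    # endpoints of the n+1 intervals
    n = len(x)
    utils = -np.abs(np.arange(n + 1) - q * n)  # rank-error utility, sensitivity 1
    with np.errstate(divide="ignore"):         # zero-length intervals get -inf
        logp = np.log(np.diff(edges)) + 0.5 * eps * utils
    logp -= logp.max()                         # stabilize before exponentiating
    probs = np.exp(logp)
    i = rng.choice(n + 1, p=probs / probs.sum())
    return rng.uniform(edges[i], edges[i + 1]) # uniform draw within the interval

For example, dp_quantile(data, 0.5, 1.0, -100.0, 100.0) releases an estimate of the median with $\varepsilon = 1$; sampling an interval and then a point within it keeps the runtime dominated by the $O(n \log n)$ sort.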
3.1. Problem formulation
A good guarantee for a learning-augmented algorithm will have several important properties that formally separate its performance from naive upper bounds $U_x \gtrsim C_x$. The first, consistency, requires it to be a reasonable indicator of strong performance in the limit of perfect prediction:
Definition 3.2. A learning-augmented guarantee $C_x \lesssim U_x$ is $c_x$-consistent if $C_x(w) \le c_x$ whenever $U_x(w) = 0$.
Here $c_x$ is a prediction-independent quantity that should depend weakly or not at all on problem difficulty (in the case of quantiles, the minimum separation between data