
predictions for a set of test examples into a fixed number of bins and computes the expected deviation between the true fraction of positives and the predicted probabilities within each bin.¹
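To make the binned computation concrete, the following minimal sketch (our own illustration, not the paper's implementation; the function and variable names are ours) evaluates the ECE of footnote 1 with NumPy:

```python
import numpy as np

def expected_calibration_error(scores, labels, num_bins=10):
    """Binned ECE per footnote 1: sum_j P(j) * |o(j) - e(j)|.

    scores: predicted probabilities in [0, 1]; labels: 0/1 ground truth.
    """
    scores, labels = np.asarray(scores), np.asarray(labels)
    bins = np.linspace(0.0, 1.0, num_bins + 1)
    # Assign each example to a bin by its predicted probability.
    idx = np.clip(np.digitize(scores, bins) - 1, 0, num_bins - 1)
    ece = 0.0
    for j in range(num_bins):
        mask = idx == j
        if not mask.any():
            continue
        p_j = mask.mean()            # P(j): fraction of examples in bin j
        o_j = labels[mask].mean()    # o(j): true fraction of positives
        e_j = scores[mask].mean()    # e(j): mean predicted probability
        ece += p_j * abs(o_j - e_j)
    return ece
```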
Precision, Recall, Accuracy.
Given a classifier that makes (binary) predictions $\mathrm{pred}(x_i)$ of the ground-truth label $y_i$ on $M$ examples $x_i$, standard classifier quality metrics include:
• Accuracy: the fraction of correctly predicted examples, i.e., $|\{i : \mathrm{pred}(x_i) = y_i\}|/M$.
• Recall: the fraction of positive examples that are predicted positive (a.k.a. the true positive rate), i.e., $|\{i : \mathrm{pred}(x_i) = y_i = 1\}| / |\{i : y_i = 1\}|$.
• Precision: the fraction of examples predicted positive that are truly positive, i.e., $|\{i : \mathrm{pred}(x_i) = y_i = 1\}| / |\{i : \mathrm{pred}(x_i) = 1\}|$.
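As a quick reference, these three metrics follow directly from the definitions above; the sketch below (our own, assuming 0/1 predictions and labels held centrally, before any federation) illustrates them:

```python
import numpy as np

def binary_metrics(pred, y):
    """Accuracy, recall, and precision for 0/1 predictions and labels.

    Assumes at least one true positive label and one positive prediction,
    otherwise recall/precision are undefined (division by zero).
    """
    pred, y = np.asarray(pred), np.asarray(y)
    accuracy = np.mean(pred == y)
    recall = np.sum((pred == 1) & (y == 1)) / np.sum(y == 1)
    precision = np.sum((pred == 1) & (y == 1)) / np.sum(pred == 1)
    return accuracy, recall, precision
```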
ROC AUC (Receiver Operating Characteristic Area Under the Curve) is often used to capture the quality of a trained ML classifier. Plotting the true positive rate (TPR) against the false positive rate (FPR) as the classification threshold varies generates the ROC curve, and the ROC AUC (AUC for short) represents the area under this curve. The AUC can be computed in several equivalent ways. Given a set of labeled examples $E = \{(x, y)\}$ with $\pm 1$ labels $y$, the AUC equals the probability that a uniformly-selected positive example ($y = 1$) is ranked above a uniformly-selected negative example ($y = -1$) by the classifier's score $s(\cdot)$. Let $N = |\{(x, -1) \in E\}|$ be the number of negative examples, and $P = |\{(x, 1) \in E\}|$ the number of positive examples. Then (Hand & Till, 2001)

$$\mathrm{AUC} = \frac{1}{PN} \sum_{(x,1) \in E} \sum_{(z,-1) \in E} \mathbb{I}[s(x) > s(z)] \qquad (2)$$
This expression of AUC simplifies computation, but compares pairs of examples across clients. We avoid such
distributed interactions by evaluating metrics via histograms.
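For intuition, a direct centralized evaluation of Eq. (2) takes a few lines of NumPy; note that this is exactly the pairwise, cross-example computation that the histogram-based approach avoids. The sketch below is our own illustration; it gives ties no credit, whereas practical implementations often award ties half credit:

```python
import numpy as np

def auc_pairwise(scores, labels):
    """AUC per Eq. (2): the fraction of (positive, negative) pairs in which
    the positive example receives the strictly higher score s(x)."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    pos = scores[labels == 1]   # scores of positive examples (y = +1)
    neg = scores[labels == -1]  # scores of negative examples (y = -1)
    # Compare every positive score against every negative score.
    return np.mean(pos[:, None] > neg[None, :])
```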
2.2 Privacy models
Federated Privacy via Secure Aggregation (Federated) assumes that each client holds one (scalar or vector) value $x^{(i)}$. In practice, clients may hold multiple values, but this can easily be handled in our protocols by working with their sum. In what follows, we assume a single value, since this represents the hardest case for ensuring privacy and security. The Secure Aggregation protocol computes the sum $X = \sum_{i=1}^{M} x^{(i)}$ without revealing any intermediate values. Various cryptographic protocols provide distributed implementations of Secure Aggregation (Segal et al., 2017; Bell et al., 2020), or aggregation can be performed by a trusted aggregator, e.g., a server with secure hardware (Zhao et al., 2021). While secure aggregation provides a practical and simple-to-understand privacy primitive, it does not fully protect against a knowledgeable adversary. In particular, an adversary who knows the inputs of all clients except one can subtract the values they already hold from the revealed sum $X$ to recover the remaining input. Hence, DP guarantees are sought for additional protection.
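To illustrate the primitive (this is a toy, non-cryptographic sketch of pairwise additive masking, not the actual protocols of Segal et al. or Bell et al., which operate over a finite group and handle client dropouts):

```python
import numpy as np

rng = np.random.default_rng(0)
M = 5
x = rng.integers(0, 100, size=M)          # each client i holds x^(i)

# Pairwise masks: clients i < j agree on a random r_ij;
# client i adds it to its value and client j subtracts it.
masks = np.zeros((M, M), dtype=np.int64)
for i in range(M):
    for j in range(i + 1, M):
        r = rng.integers(0, 1_000_000)
        masks[i, j], masks[j, i] = r, -r

# Each client submits only its masked value, which individually looks random.
submitted = x + masks.sum(axis=1)

# The masks cancel in the sum, so the aggregator learns only X = sum_i x^(i).
assert submitted.sum() == x.sum()
```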
Distributed Differential Privacy (DistDP). The model of differential privacy (DP) requires the output of a computation to be a sample from a random variable, so that for two inputs that are close, their outputs have similar probabilities; in our case, closeness is captured as inputs that vary by the addition or removal of one item (event-level privacy) (Dwork & Roth, 2014). DP is often achieved by adding calibrated noise from a suitable statistical distribution to the exact answer: in our setting, this is instantiated by introducing distributed noise at each client which sums to give discrete Laplace noise with variance $O(1/\varepsilon^2)$. The resulting (expected) absolute error for count queries is $O(1/\varepsilon)$, and for means of $M$ values is $O(1/(\varepsilon M))$. The distributed noise is achieved by sampling from the difference of two (discrete) Pólya distributions (Balle et al., 2020), which neatly combines with secure aggregation so that only the noisy sum is revealed.²
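The sketch below simulates this idea, using the standard reduction of discrete Laplace noise to a difference of negative binomial (Pólya) variates with shape $1/M$; the exact parameterization may differ in detail from Balle et al. (2020), and the calibration shown assumes a count query with sensitivity 1:

```python
import numpy as np

rng = np.random.default_rng(0)
M, eps = 1000, 1.0
alpha = np.exp(-eps)   # discrete Laplace parameter for sensitivity-1 counts

# Each client samples the difference of two negative binomial (Pólya)
# variates with shape 1/M. Summed over M clients, each side becomes a
# Geometric variate, so the aggregated noise is discrete Laplace(alpha).
noise_per_client = (rng.negative_binomial(1.0 / M, 1 - alpha, size=M)
                    - rng.negative_binomial(1.0 / M, 1 - alpha, size=M))

# Only this total (added to the true sum inside secure aggregation)
# is ever revealed; its variance is O(1/eps^2).
total_noise = noise_per_client.sum()
```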
Local Differential Privacy (LocalDP). Local Differential Privacy can be understood as applying the DP definition at the level of an individual client: each client creates a message based on their input, so that whichever input the client holds, the probability of producing each possible message is similar. For a set of $M$ clients who each hold a binary value $b_i$, we can estimate the sum and the mean of the $b_i$ under $\varepsilon$-LDP by applying randomized response (Warner, 1965). For the sum over $M$ clients, the variance is $O(M/\varepsilon^2)$, and so the absolute error is proportional to $\sqrt{M}/\varepsilon$. After rescaling by $M$, the variance of the mean is $O(1/(M\varepsilon^2))$, and so the absolute error is proportional to $1/(\varepsilon\sqrt{M})$ (Yang et al., 2020b).
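A minimal simulation of this estimator (our own sketch of Warner's mechanism, with the standard truth probability $e^{\varepsilon}/(e^{\varepsilon}+1)$) illustrates the debiasing step:

```python
import numpy as np

rng = np.random.default_rng(0)
M, eps = 10_000, 1.0
b = rng.integers(0, 2, size=M)          # each client's private bit b_i

# Randomized response: report the true bit with probability
# p = e^eps / (e^eps + 1), otherwise report its complement.
p = np.exp(eps) / (np.exp(eps) + 1)
keep = rng.random(M) < p
reports = np.where(keep, b, 1 - b)

# Debias: E[report] = (1 - p) + (2p - 1) * mean, so invert the linear map.
est_mean = (reports.mean() - (1 - p)) / (2 * p - 1)
```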
These bounds hold for histograms via “frequency oracles”, when each client holds one out of $B$ possibilities: we use Optimal Unary Encoding (Wang et al., 2017) and build a (private) view of the frequency histogram, which can be normalized to get an empirical PMF.
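For illustration, a sketch of the OUE frequency oracle follows, using the perturbation parameters $p = 1/2$ and $q = 1/(e^{\varepsilon}+1)$ from Wang et al. (2017); the simulation harness around them is our own:

```python
import numpy as np

rng = np.random.default_rng(0)
M, B, eps = 10_000, 8, 1.0
v = rng.integers(0, B, size=M)          # each client holds one of B values

# Optimal Unary Encoding: one-hot encode, then keep each 1-bit with
# probability p = 1/2 and flip each 0-bit to 1 with q = 1/(e^eps + 1).
p, q = 0.5, 1.0 / (np.exp(eps) + 1)
onehot = np.eye(B, dtype=int)[v]
flip = rng.random(onehot.shape)
noisy = np.where(onehot == 1, flip < p, flip < q).astype(int)

# Debias the per-bucket counts and normalize to an empirical PMF.
counts = (noisy.sum(axis=0) - M * q) / (p - q)
clipped = np.clip(counts, 0, None)
pmf = clipped / clipped.sum()
```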
In what follows, we treat $\varepsilon$ as a fixed parameter close to 1 (for both the LocalDP and DistDP cases).
¹ Formally, the ECE is defined by Naeini et al. (2015) as $\sum_{j=1}^{K} P(j)\,|o(j) - e(j)|$, where $P(j)$ is the (empirical) probability that an example falls in the $j$th bucket (out of $K$), $o(j)$ is the true fraction of positive examples that fall in the $j$th bin, and $e(j)$ is the mean probability predicted by the model for that bin.
² Other privacy noise is possible via Binomial (Dwork et al., 2006) or Skellam (Agarwal et al., 2021) noise addition.