
Differential privacy (DP) first gained traction as it met the urgent need for rigour and quantifiability in privacy protection [14]. In short, DP bounds the change in the distribution of outputs of a query made on a dataset under an alteration of one data point. The following definition formalizes this notion.
Definition 1.1 ([14]) A randomized algorithm $A$, taking a dataset consisting of individuals as its input, is $(\epsilon, \delta)$-differentially private if, for any pair of datasets $S$ and $S'$ that differ in the record of a single individual and any event $E$,
$$\mathbb{P}[A(S) \in E] \le e^{\epsilon}\, \mathbb{P}[A(S') \in E] + \delta.$$
When $\delta = 0$, $A$ is called $\epsilon$-differentially private ($\epsilon$-DP).
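A standard way to satisfy this definition with $\delta = 0$ is the Laplace mechanism, which adds noise of scale $\Delta/\epsilon$ to a query whose value changes by at most $\Delta$ when a single record is altered. The sketch below is our own minimal illustration (the function name and the counting-query example are not from the original text):
\begin{verbatim}
import numpy as np

def laplace_mechanism(query_value, sensitivity, epsilon, rng=None):
    """Release query_value with Laplace noise of scale sensitivity/epsilon.

    If the query changes by at most `sensitivity` when one record changes,
    this release is epsilon-DP (with delta = 0).
    """
    rng = rng or np.random.default_rng()
    return query_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Example: privately release a count (sensitivity 1) with epsilon = 0.5.
private_count = laplace_mechanism(1234, sensitivity=1.0, epsilon=0.5)
\end{verbatim}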
While the notion of $(\epsilon, \delta)$-DP has wide applications [12, 16, 10, 21], there are a few notable drawbacks to this framework. One is the poor interpretability of $(\epsilon, \delta)$-DP: unlike other concepts in machine learning, DP should not remain a black box. Privacy guarantees are intended for human interpretation and so must be understandable by the users they affect and by regulatory entities. A second drawback is $(\epsilon, \delta)$-DP's inferior composition properties and lack of versatility. Here, "composition" refers to the ability of DP properties to be inherited when DP algorithms are combined and used as building blocks. As an example, the training of deep learning models involves gradient evaluations and weight updates: each of these steps can be treated as a building block. It is natural to expect that a DP learning algorithm can be built using differentially private versions of these components. However, composition properties cannot generally be well characterized within the framework of $(\epsilon, \delta)$-DP, leading to very loose composition theorems.
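For intuition, the simplest ("basic") composition theorem within the $(\epsilon, \delta)$-DP framework merely adds the per-step parameters, which becomes very loose over the many steps of a training run. The sketch below is our own illustration of that accounting, not a method proposed in this paper:
\begin{verbatim}
def basic_composition(eps_per_step, delta_per_step, num_steps):
    """Basic composition: epsilons and deltas of the composed steps add up.

    The resulting bound is valid but typically far too pessimistic
    when num_steps is large.
    """
    return eps_per_step * num_steps, delta_per_step * num_steps

# Example: 1000 gradient steps, each (0.01, 1e-7)-DP, already gives epsilon = 10.
total_eps, total_delta = basic_composition(0.01, 1e-7, 1000)
\end{verbatim}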
To overcome the drawbacks of $(\epsilon, \delta)$-DP, numerous variants have been developed, including the hypothesis-testing-based $f$-DP [35, 13], the moments-accountant-based Rényi DP [28], as well as concentrated DP and its variants [9, 8]. Despite their very different perspectives, all of these DP variants can be fully characterized by an infinite union of $(\epsilon, \delta)$-DP guarantees. In particular, there is a two-way embedding between $f$-DP and the infinite union of $(\epsilon, \delta)$-DP guarantees: any guarantee provided by an infinite union of $(\epsilon, \delta)$-DP can be fully characterized by $f$-DP and vice versa [13]. Consequently, $f$-DP has the versatility to treat all of the above notions as special cases.
In addition to its versatility, $f$-DP is more interpretable than other DP paradigms because it considers privacy protection from an attacker's perspective. Under $f$-DP, an attacker is challenged with the hypothesis-testing problem
$$H_0: \text{the underlying dataset is } S \quad \text{versus} \quad H_1: \text{the underlying dataset is } S'$$
and given the output of an algorithm $A$, where $S$ and $S'$ are neighbouring datasets. The harder this testing problem is, the less privacy leakage $A$ has. To see this, consider the dilemma that the attacker is facing. The attacker must reject either $H_0$ or $H_1$ based on the given output of $A$: this means the attacker must select a subset $R_0$ of $\mathrm{Range}(A)$ and reject $H_0$ if the sampled output is in $R_0$ (or must otherwise reject $H_1$). The attacker is more likely to incorrectly reject $H_0$ (a type I error) when $R_0$ is large. Conversely, if $R_0$ is small, the attacker is more likely to incorrectly reject $H_1$ (a type II error). We say that an algorithm $A$ is $f$-DP if, for any $\alpha \in [0,1]$, no attacker can simultaneously bound the probability of a type I error below $\alpha$ and the probability of a type II error below $f(\alpha)$. Such an $f$ is called a trade-off function and controls the strength of the privacy protection.
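A single $(\epsilon, \delta)$-DP guarantee corresponds to the trade-off function $f_{\epsilon,\delta}(\alpha) = \max\{0,\ 1 - \delta - e^{\epsilon}\alpha,\ e^{-\epsilon}(1 - \delta - \alpha)\}$ [13]. As a minimal sketch (our own illustration, not part of the original text), this curve can be evaluated directly:
\begin{verbatim}
import numpy as np

def tradeoff_eps_delta(alpha, eps, delta):
    """Trade-off function of one (eps, delta)-DP guarantee: the smallest
    achievable type II error for any test with type I error alpha."""
    alpha = np.asarray(alpha)
    return np.maximum.reduce([
        np.zeros_like(alpha),
        1.0 - delta - np.exp(eps) * alpha,
        np.exp(-eps) * (1.0 - delta - alpha),
    ])

alphas = np.linspace(0.0, 1.0, 101)
curve = tradeoff_eps_delta(alphas, eps=1.0, delta=1e-5)  # trade-off curve points
\end{verbatim}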
The versatility afforded by $f$ can be unwieldy in practice. Although $f$-DP is capable of handling composition and can embed other notions of differential privacy, it is not convenient for representing safety levels as a curve amenable to human interpretation. Gaussian differential privacy (GDP), as a parametric family of $f$-DP guarantees, provides a balance between interpretability and versatility. GDP guarantees are parameterized by a single value $\mu$ and use the trade-off function $f(\alpha) = \Phi(\Phi^{-1}(1 - \alpha) - \mu)$, where $\Phi$ is the cumulative distribution function of the standard normal distribution. With this choice of $f$, the hypothesis-testing problem faced by the attacker is as hard as distinguishing between $N(0,1)$ and $N(\mu, 1)$ on the basis of a single observation. Aside from its visual interpretation, GDP also has unique composition theorems: the composition of a $\mu_1$-GDP and a $\mu_2$-GDP algorithm is, as expected, $\sqrt{\mu_1^2 + \mu_2^2}$-GDP. This property generalizes readily to $n$-fold composition. GDP also satisfies a special central limit theorem implying that all hypothesis-testing-based definitions of privacy converge to GDP in the limit of many compositions. Readers are referred to [13] for more information.
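As a small illustration of the GDP formulas above (our own sketch, using scipy for the normal CDF and quantile function):
\begin{verbatim}
import numpy as np
from scipy.stats import norm

def gdp_tradeoff(alpha, mu):
    """Trade-off function of mu-GDP: f(alpha) = Phi(Phi^{-1}(1 - alpha) - mu)."""
    return norm.cdf(norm.ppf(1.0 - np.asarray(alpha)) - mu)

def compose_gdp(mus):
    """Composition of mu_i-GDP algorithms is sqrt(mu_1^2 + ... + mu_n^2)-GDP."""
    return float(np.sqrt(np.sum(np.square(mus))))

alphas = np.linspace(0.0, 1.0, 101)
curve = gdp_tradeoff(alphas, mu=1.0)     # trade-off curve of 1-GDP
mu_total = compose_gdp([0.5, 0.5, 1.0])  # privacy parameter after composition
\end{verbatim}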