Composition of Differential Privacy &
Privacy Amplification by Subsampling
Thomas Steinke
October 27, 2022
Abstract
This chapter is meant to be part of the book “Differential Privacy for Artificial
Intelligence Applications.” We give an introduction to the most important property of
differential privacy – composition: running multiple independent analyses on the data
of a set of people will still be differentially private as long as each of the analyses is
private on its own – as well as the related topic of privacy amplification by subsampling.
This chapter introduces the basic concepts and gives proofs of the key results needed to
apply these tools in practice.
Google Research. steinke@google.com
Contents

1 Introduction
2 Basic Composition
   2.1 Is Basic Composition Optimal?
3 Privacy Loss Distributions
   3.1 Privacy Loss of Gaussian Noise Addition
   3.2 Statistical Hypothesis Testing Perspective
   3.3 Approximate DP & the Privacy Loss Distribution
4 Composition via the Privacy Loss Distribution
   4.1 Concentrated Differential Privacy
   4.2 Adaptive Composition & Postprocessing
   4.3 Composition of Approximate (ε, δ)-DP
5 Asymptotic Optimality of Composition
6 Privacy Amplification by Subsampling
   6.1 Subsampling for Pure or Approximate DP
   6.2 Addition or Removal versus Replacement for Neighbouring Datasets
   6.3 Subsampling & Composition
   6.4 Rényi Differential Privacy
   6.5 Sharp Privacy Amplification by Poisson Subsampling for Rényi DP
   6.6 Analytic Rényi DP Bound for Privacy Amplification by Poisson Subsampling
   6.7 How to Use Privacy Amplification by Subsampling
7 Historical Notes & Further Reading
Acknowledgements
References
1 Introduction
Our data is subject to many different uses. Many entities will have access to our data,
including government agencies, healthcare providers, employers, technology companies, and
financial institutions. Those entities will perform many different analyses that involve our
data and those analyses will be updated repeatedly over our lifetimes. The greatest risk
to privacy is that an attacker will combine multiple pieces of information from the same
or different sources and that the combination of these will reveal sensitive details about us.
Thus we cannot study privacy leakage in a vacuum; it is important that we can reason about
the accumulated privacy leakage over multiple independent analyses.
As a concrete example to keep in mind, consider the following simple differencing attack:
Suppose your employer provides healthcare benefits. The employer pays for these benefits and
thus may have access to summary statistics like how many employees are currently receiving
pre-natal care or are currently being treated for cancer. Your pregnancy or cancer status is
highly sensitive information, but intuitively the aggregated count is not sensitive as it is not
specific to you. However, this count may be updated on a regular basis and your employer
may notice that the count increased on the day you were hired or on the day you took off for
a medical appointment. This example shows how multiple pieces of information – the date of
your hire or medical appointment, the count before that date, and the count afterwards –
can be combined to reveal sensitive information about you, despite each piece of information
seeming innocuous on its own. Attacks could combine many different statistics from multiple
sources and hence we need to be careful to guard against such attacks, which leads us to
differential privacy.
Differential privacy has strong composition properties – if multiple independent analyses
are run on our data and each analysis is differentially private on its own, then the combination
of these analyses is also differentially private. This property is key to the success of differential
privacy. Composition enables building complex differentially private systems out of simple
differentially private subroutines. Composition allows the re-use of data over time without
fear of a catastrophic privacy failure. And, when multiple entities use the data of the same
individuals, they do not need to coordinate to prevent an attacker from learning private
details of individuals by combining the information released by those entities. To prevent the
above differencing attack, we could independently perturb each count to make it differentially
private; then taking the difference of two counts would be sufficiently noisy to obscure your
pregnancy or cancer status.
Composition is quantitative. The differential privacy guarantee of the overall system will
depend on the number of analyses and the privacy parameters that they each satisfy. The
exact relationship between these quantities can be complex. There are various composition
theorems that give bounds on the overall parameters in terms of the parameters of the parts
of the system. In this chapter, we will study several composition theorems (including the
relevant proofs) and we will also look at some examples that demonstrate how to apply the
composition theorems and why we need them.
Composition theorems provide privacy bounds for a given system. A system designer
must use composition theorems to design systems that simultaneously give good privacy and
good utility (i.e., good statistical accuracy). This process is often called "privacy budgeting" or
“privacy accounting.” Intuitively, the system designer has some privacy constraint (i.e., the
overall system must satisfy some final privacy guarantee) which can be viewed as analogous to
a monetary budget that must be divided amongst the various parts of the system. Composition
theorems provide the accounting rules for this budget. Allocating more of the budget to
some part of the system makes that part more accurate, but then less budget is available
for other parts of the system. Thus the system designer must also make a value judgement
about which parts of the system to prioritize.
2 Basic Composition

The simplest composition theorem is what is known as basic composition. This applies to pure $\varepsilon$-DP (although it can be extended to approximate $(\varepsilon, \delta)$-DP). Basic composition says that, if we run $k$ independent $\varepsilon$-DP algorithms, then the composition of these is $k\varepsilon$-DP. More generally, we have the following result.
Theorem 1 (Basic Composition). Let $M_1, M_2, \ldots, M_k : \mathcal{X}^n \to \mathcal{Y}$ be randomized algorithms. Suppose $M_j$ is $\varepsilon_j$-DP for each $j \in [k]$. Define $M : \mathcal{X}^n \to \mathcal{Y}^k$ by $M(x) = (M_1(x), M_2(x), \ldots, M_k(x))$, where each algorithm is run independently. Then $M$ is $\varepsilon$-DP for $\varepsilon = \sum_{j=1}^k \varepsilon_j$.
Proof. Fix an arbitrary pair of neighbouring datasets $x, x' \in \mathcal{X}^n$ and output $y \in \mathcal{Y}^k$. To establish that $M$ is $\varepsilon$-DP, we must show that $e^{-\varepsilon} \le \frac{\mathbb{P}[M(x)=y]}{\mathbb{P}[M(x')=y]} \le e^{\varepsilon}$. By independence, we have
$$\frac{\mathbb{P}[M(x) = y]}{\mathbb{P}[M(x') = y]} = \frac{\prod_{j=1}^k \mathbb{P}[M_j(x) = y_j]}{\prod_{j=1}^k \mathbb{P}[M_j(x') = y_j]} = \prod_{j=1}^k \frac{\mathbb{P}[M_j(x) = y_j]}{\mathbb{P}[M_j(x') = y_j]} \le \prod_{j=1}^k e^{\varepsilon_j} = e^{\sum_{j=1}^k \varepsilon_j} = e^{\varepsilon},$$
where the inequality follows from the fact that each $M_j$ is $\varepsilon_j$-DP and, hence, $e^{-\varepsilon_j} \le \frac{\mathbb{P}[M_j(x)=y_j]}{\mathbb{P}[M_j(x')=y_j]} \le e^{\varepsilon_j}$. Similarly, $\prod_{j=1}^k \frac{\mathbb{P}[M_j(x)=y_j]}{\mathbb{P}[M_j(x')=y_j]} \ge \prod_{j=1}^k e^{-\varepsilon_j} = e^{-\varepsilon}$, which completes the proof.
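In code, Theorem 1 makes privacy accounting a matter of addition. The following minimal sketch (our own illustration, not from the chapter; the function name is hypothetical) tracks the total privacy budget:

```python
def basic_composition(epsilons):
    """Basic composition (Theorem 1): running independent eps_j-DP
    algorithms on the same dataset yields a (sum of eps_j)-DP algorithm."""
    return sum(epsilons)

# Example: ten independent 0.1-DP analyses compose to a 1.0-DP release.
total_eps = basic_composition([0.1] * 10)
assert abs(total_eps - 1.0) < 1e-12
```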
Basic composition is already a powerful result, despite its simple proof; it establishes the versatility of differential privacy and allows us to begin reasoning about complex systems in terms of their building blocks. For example, suppose we have $k$ functions $f_1, \ldots, f_k : \mathcal{X}^n \to \mathbb{R}$, each of sensitivity 1. For each $j \in [k]$, we know that adding $\mathsf{Laplace}(1/\varepsilon)$ noise to the value of $f_j(x)$ satisfies $\varepsilon$-DP. Thus, if we add independent $\mathsf{Laplace}(1/\varepsilon)$ noise to each value $f_j(x)$ for all $j \in [k]$, then basic composition tells us that releasing this vector of $k$ noisy values satisfies $k\varepsilon$-DP. If we want the overall system to be $\varepsilon$-DP, then we should add independent $\mathsf{Laplace}(k/\varepsilon)$ noise to each value $f_j(x)$.
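As a concrete sketch of this recipe (our own code, assuming NumPy; `release_counts` is a hypothetical helper, not from the chapter), each of the $k$ sensitivity-1 values receives independent $\mathsf{Laplace}(k/\varepsilon)$ noise:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def release_counts(values, eps_total):
    """Release k sensitivity-1 values under eps_total-DP via basic composition.

    Each value gets independent Laplace noise of scale k/eps_total, so each
    coordinate is (eps_total/k)-DP and the whole vector is eps_total-DP.
    """
    values = np.asarray(values, dtype=float)
    k = len(values)
    scale = k / eps_total  # noise scale grows linearly with k
    return values + rng.laplace(loc=0.0, scale=scale, size=k)

noisy = release_counts([120, 45, 7], eps_total=1.0)
```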
2.1 Is Basic Composition Optimal?

If we want to release $k$ values each of sensitivity 1 (as above) and have the overall release be $\varepsilon$-DP, then, using basic composition, we can add $\mathsf{Laplace}(k/\varepsilon)$ noise to each value. The variance of the noise for each value is $2k^2/\varepsilon^2$, so the standard deviation is $\sqrt{2}\,k/\varepsilon$. (For example, with $k = 100$ and $\varepsilon = 1$, the standard deviation is $\sqrt{2} \cdot 100 \approx 141$, even though each value has sensitivity only 1.) In other words, the scale of the noise must grow linearly with the number of values $k$ if the overall privacy guarantee and each value's sensitivity are held fixed. It is natural to wonder whether the scale of the Laplace noise can be reduced by improving the basic composition result. We now show that this is not possible.
For each $j \in [k]$, let $M_j : \mathcal{X}^n \to \mathbb{R}$ be the algorithm that releases $f_j(x)$ with $\mathsf{Laplace}(k/\varepsilon)$ noise added. Let $M : \mathcal{X}^n \to \mathbb{R}^k$ be the composition of these $k$ algorithms. Then $M_j$ is $(\varepsilon/k)$-DP for each $j \in [k]$ and basic composition tells us that $M$ is $\varepsilon$-DP. The question is whether $M$ satisfies a better DP guarantee than this – i.e., does $M$ satisfy $\varepsilon'$-DP for some $\varepsilon' < \varepsilon$?

Suppose we have neighbouring datasets $x, x' \in \mathcal{X}^n$ such that $f_j(x) = f_j(x') + 1$ for each $j \in [k]$. Let $y = (a, a, \ldots, a) \in \mathbb{R}^k$ for some $a \ge \max_{j \in [k]} f_j(x)$. Then
$$\begin{aligned}
\frac{\mathbb{P}[M(x) = y]}{\mathbb{P}[M(x') = y]}
&= \frac{\prod_{j=1}^k \mathbb{P}[f_j(x) + \mathsf{Laplace}(k/\varepsilon) = y_j]}{\prod_{j=1}^k \mathbb{P}[f_j(x') + \mathsf{Laplace}(k/\varepsilon) = y_j]}
= \prod_{j=1}^k \frac{\mathbb{P}[\mathsf{Laplace}(k/\varepsilon) = y_j - f_j(x)]}{\mathbb{P}[\mathsf{Laplace}(k/\varepsilon) = y_j - f_j(x')]} \\
&= \prod_{j=1}^k \frac{\frac{\varepsilon}{2k} \exp\!\left(-\frac{\varepsilon}{k} |y_j - f_j(x)|\right)}{\frac{\varepsilon}{2k} \exp\!\left(-\frac{\varepsilon}{k} |y_j - f_j(x')|\right)}
= \prod_{j=1}^k \frac{\exp\!\left(-\frac{\varepsilon}{k} (y_j - f_j(x))\right)}{\exp\!\left(-\frac{\varepsilon}{k} (y_j - f_j(x'))\right)} \qquad \text{(since $y_j \ge f_j(x)$ and $y_j \ge f_j(x')$)} \\
&= \prod_{j=1}^k \exp\!\left(\frac{\varepsilon}{k} (f_j(x) - f_j(x'))\right)
= \exp\!\left(\frac{\varepsilon}{k} \sum_{j=1}^k (f_j(x) - f_j(x'))\right) = e^{\varepsilon}.
\end{aligned}$$
This shows that basic composition is optimal. For this example, we cannot prove a better
guarantee than what is given by basic composition.
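As a numerical sanity check (our own code, assuming NumPy and SciPy; not part of the chapter), we can evaluate the log-likelihood ratio for such a worst-case output and confirm it equals $\varepsilon$:

```python
import numpy as np
from scipy.stats import laplace

def log_likelihood_ratio(y, fx, fxp, eps):
    """log(P[M(x) = y] / P[M(x') = y]) where M_j adds Laplace(k/eps) noise."""
    k = len(y)
    scale = k / eps
    return np.sum(laplace.logpdf(y - fx, scale=scale)
                  - laplace.logpdf(y - fxp, scale=scale))

k, eps = 5, 1.0
fx = np.linspace(1.0, 5.0, k)    # f_j(x)
fxp = fx - 1.0                   # f_j(x') = f_j(x) - 1, the worst case
y = np.full(k, fx.max() + 3.0)   # y_j >= max(f_j(x), f_j(x')) for all j
print(log_likelihood_ratio(y, fx, fxp, eps))  # ≈ 1.0 = eps
```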
Is there some other way to improve upon basic composition that circumvents this example? Note that we assumed that there are neighbouring datasets $x, x' \in \mathcal{X}^n$ such that $f_j(x) = f_j(x') + 1$ for each $j \in [k]$. In some settings, no such worst-case datasets exist. In that case, instead of scaling the noise linearly with $k$, we can scale the Laplace noise according to the $\ell_1$ sensitivity
$$\Delta_1 := \sup_{\substack{x, x' \in \mathcal{X}^n \\ \text{neighbouring}}} \sum_{j=1}^k |f_j(x) - f_j(x')|.$$
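To see how $\Delta_1$ can be much smaller than $k$, consider a standard example (ours, not from this excerpt): histogram queries. If each person contributes to exactly one of the $k$ counts $f_1, \ldots, f_k$, then adding or removing one person changes a single count by 1, so $\Delta_1 = 1$ even though each $f_j$ individually has sensitivity 1 and there are $k$ of them; adding $\mathsf{Laplace}(\Delta_1/\varepsilon)$ noise to each count then suffices for $\varepsilon$-DP, with no dependence on $k$.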
Instead of adding assumptions to the problem, we will look more closely at the example above. We showed that there exists some output $y \in \mathbb{R}^k$ such that $\frac{\mathbb{P}[M(x)=y]}{\mathbb{P}[M(x')=y]} = e^{\varepsilon}$. However, such outputs $y$ are very rare, as we require $y_j \ge \max\{f_j(x), f_j(x')\}$ for each $j \in [k]$, where $y_j = f_j(x) + \mathsf{Laplace}(k/\varepsilon)$. Thus, in order to observe an output $y$ such that the likelihood ratio is maximal, all of the $k$ Laplace noise samples must be positive, which happens with probability $2^{-k}$.
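A small simulation (our own, assuming NumPy) confirms this: each Laplace sample is positive with probability 1/2, independently, so all $k$ are positive with probability $2^{-k}$:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
k, trials = 5, 200_000

# One row of k Laplace noise draws per trial; the maximal likelihood
# ratio is attained only when every draw in a row is positive.
noise = rng.laplace(size=(trials, k))
frac = np.mean(np.all(noise > 0, axis=1))
print(frac, 2.0 ** -k)  # both approximately 0.031
```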
The fact that outputs $y$ with maximal likelihood ratio are exceedingly rare suggests that the worst-case analysis of pure DP is overly pessimistic here; making this intuition precise is the starting point for the privacy loss distributions of the next section.