1 Introduction
Our data is subject to many different uses. Many entities will have access to our data,
including government agencies, healthcare providers, employers, technology companies, and
financial institutions. Those entities will perform many different analyses that involve our
data, and those analyses will be updated repeatedly over our lifetimes. The greatest risk
to privacy is that an attacker will combine multiple pieces of information, from the same
source or from different sources, and that this combination will reveal sensitive details about us.
Thus we cannot study privacy leakage in a vacuum; it is important that we can reason about
the accumulated privacy leakage over multiple independent analyses.
As a concrete example to keep in mind, consider the following simple differencing attack:
Suppose your employer provides healthcare benefits. The employer pays for these benefits and
thus may have access to summary statistics like how many employees are currently receiving
pre-natal care or are currently being treated for cancer. Your pregnancy or cancer status is
highly sensitive information, but intuitively the aggregated count is not sensitive as it is not
specific to you. However, this count may be updated on a regular basis and your employer
may notice that the count increased on the day you were hired or on the day you took off for
a medical appointment. This example shows how multiple pieces of information – the date of
your hire or medical appointment, the count before that date, and the count afterwards –
can be combined to reveal sensitive information about you, despite each piece of information
seeming innocuous on its own. Attacks could combine many different statistics from multiple
sources and hence we need to be careful to guard against such attacks, which leads us to
differential privacy.
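To make the differencing attack concrete, here is a minimal sketch in Python; the records, names, and field names are hypothetical and purely illustrative.

```python
# Hypothetical employee records; names and fields are illustrative only.
employees_before = [
    {"name": "Ana", "prenatal_care": False},
    {"name": "Bob", "prenatal_care": False},
    {"name": "Cyd", "prenatal_care": True},
]
# After the hire date, exactly one new record has been added.
employees_after = employees_before + [{"name": "You", "prenatal_care": True}]

def count_prenatal(records):
    """Exact count of employees currently receiving pre-natal care."""
    return sum(r["prenatal_care"] for r in records)

# The employer sees only the two aggregate counts...
count_before = count_prenatal(employees_before)
count_after = count_prenatal(employees_after)

# ...but, knowing that exactly one person was hired in between, the
# difference of the counts reveals that person's status exactly.
print("New hire receiving pre-natal care:", count_after - count_before == 1)
```

Neither count is specific to the new hire, yet their difference is.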
Differential privacy has strong composition properties – if multiple independent analyses
are run on our data and each analysis is differentially private on its own, then the combination
of these analyses is also differentially private. This property is key to the success of differential
privacy. Composition enables building complex differentially private systems out of simple
differentially private subroutines. Composition allows data to be re-used over time without
fear of a catastrophic privacy failure. And, when multiple entities use the data of the same
individuals, they do not need to coordinate to prevent an attacker from learning private
details of individuals by combining the information released by those entities. To prevent the
above differencing attack, we could independently perturb each count to make it differentially
private; then the difference of two such counts would be sufficiently noisy to obscure your
pregnancy or cancer status.
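One standard way to realize this fix is to add Laplace noise to each count before releasing it. The sketch below uses illustrative parameter choices; the noise scale 1/ε is the textbook calibration for a count, whose value changes by at most 1 when one record is added or removed.

```python
import numpy as np

rng = np.random.default_rng()

def dp_count(flags, epsilon):
    """Release a noisy count of True flags.

    A count has sensitivity 1 (it changes by at most 1 when one record
    is added or removed), so Laplace noise with scale 1/epsilon makes
    each release epsilon-differentially private.
    """
    return sum(flags) + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Hypothetical pre-natal care flags before and after one new hire.
flags_before = [False, False, True]
flags_after = flags_before + [True]

# Each release is 0.5-DP on its own; by composition, publishing both
# is 1.0-DP, and their difference no longer pinpoints the new hire.
noisy_before = dp_count(flags_before, epsilon=0.5)
noisy_after = dp_count(flags_after, epsilon=0.5)
print(f"Noisy difference: {noisy_after - noisy_before:+.2f}")
```

Because the two noise draws are independent, the difference of the releases carries the noise of both, which is what masks the one-record change.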
Composition is quantitative. The differential privacy guarantee of the overall system will
depend on the number of analyses and the privacy parameters that they each satisfy. The
exact relationship between these quantities can be complex. There are various composition
theorems that give bounds on the overall parameters in terms of the parameters of the parts
of the system. In this chapter, we will study several composition theorems (including the
relevant proofs) and we will also look at some examples that demonstrate how to apply the
composition theorems and why we need them.
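As a preview of the quantitative form these bounds take, the most basic composition theorem simply adds the privacy parameters: if k analyses are run and the i-th analysis is ε_i-differentially private on its own, then their combination is ε-differentially private for

ε = ε_1 + ε_2 + · · · + ε_k.

For instance, ten analyses each satisfying ε = 0.1 together satisfy ε = 1. The more sophisticated composition theorems studied in this chapter can improve substantially on this additive bound.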
Composition theorems provide privacy bounds for a given system. A system designer
must use composition theorems to design systems that simultaneously give good privacy and utility.