
1 Introduction
Data-driven predictions inform policy decisions that directly impact individuals. Proponents
argue that by understanding patterns from the past, decisions can be optimized to improve
future outcomes, to the benefit of individuals and institutions [KLMO15]. In the US educational
system, for instance, early warning systems (EWS) have become a key tool used by states to
combat low graduation rates [BB19,US 16]. The rationale for using such systems is clear. Given
a predictor that, for each student, estimates the likelihood of graduation, school districts can
identify high-risk students at a young age, directing resources to improve individuals’ outcomes,
and in turn, the districts’ graduation rates. Despite compelling arguments, reliably predicting
life outcomes remains a largely unsolved problem in machine learning.
A key challenge in utilizing predictions to inform decisions is that, often, predictions
influence the outcomes they’re meant to forecast. In the education example above, districts
consider predictions of graduation with the intention of affecting graduation outcomes. In this
situation—where predictions determine interventions, which influence outcomes—accuracy can
be a paradoxical notion. If a predictor correctly identifies high-risk individuals as likely to suffer
negative outcomes, after successful interventions, the individuals’ outcomes will be positive and
the initial predictions will appear inaccurate. To apply data-driven tools effectively, decision-
makers must resolve an apparent tension between the objectives of forecasting individuals’
outcomes reliably and steering individuals to achieve better outcomes.
Recent work of [PZMH20] introduced performative prediction to contend with the fact that
predictions not only forecast, but also shape the world. Informally, a prediction problem is
performative if the act of prediction influences the distribution on individual-outcome pairs.
From early warning systems, to online content recommendations, to public health advisories:
across many contexts, individuals respond to predictions in a manner that changes the likelihood
of possible outcomes (successful graduation, increased click rate, or decreased disease caseload).
In their original work on the subject, [PZMH20] frame the goal of performative prediction
through loss minimization. In this framing, the ultimate goal is to learn a performatively optimal
decision rule. A decision rule
hpo
is performatively optimal if it achieves the minimal expected
loss (within some class of decision rules H) over the distribution that it induces,
hpo ∈argmin
h∈H
E
(x,y)∼D(h)[`(x,h(x),y)].(1)
Here, $\mathcal{D}(h)$ is the distribution over $(x,y)$ pairs observed in response to deploying $h$.
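As a minimal illustration of objective (1), consider the following sketch. It uses a toy, made-up distribution map $\mathcal{D}(h)$ (not from the paper) in which deploying a higher constant risk score lowers the chance of a negative outcome, mimicking interventions triggered by predictions; the performative risk of each hypothesis is estimated by Monte Carlo, and the optimum is found by brute-force search over a finite class. All names and the response model here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_from_D(h, n=10_000):
    """Toy distributional response D(h): deploying score h shifts the
    outcome probability (interventions prompted by high predicted risk
    improve outcomes). Purely illustrative, not the paper's model."""
    x = rng.uniform(0, 1, size=n)           # individual features
    p_y = np.clip(x - 0.5 * h, 0.0, 1.0)    # deployed prediction shifts outcomes
    y = rng.binomial(1, p_y)                # binary outcomes under D(h)
    return x, y

def performative_risk(h):
    """Monte Carlo estimate of E_{(x,y)~D(h)}[loss(x, h(x), y)]
    for the constant predictor h(x) = h, with squared loss."""
    x, y = sample_from_D(h)
    return np.mean((h - y) ** 2)

# Exhaustive search over a finite hypothesis class H, as in Eq. (1).
H = np.linspace(0, 1, 21)
risks = [performative_risk(h) for h in H]
h_po = H[int(np.argmin(risks))]
print(f"performatively optimal h: {h_po:.2f}")
```

Note that the minimizer trades off fitting the induced outcomes against the way the deployed score itself moves those outcomes, which is exactly what distinguishes (1) from ordinary risk minimization over a fixed distribution.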
For generality’s sake, performative prediction makes minimal restrictions on how the distribution may respond to a chosen decision rule. In particular, the choice to deploy a hypothesis $h$ may change the joint distribution $(x,y) \sim \mathcal{D}(h)$ over individual-outcome pairs essentially arbitrarily.¹
This generality enables us to write a broad range of prediction problems—including
supervised learning [SB14], strategic classification [HMPW16], and causal inference [MMH20]—
as special cases of performative prediction. In all, [PZMH20] establishes a powerful framework
for reasoning about settings where the distribution of examples responds to the predictions.
While powerful, the framework has two noticeable limitations. First, achieving performative optimality is hard. Without any assumptions on the distributional response $\mathcal{D}(\cdot)$, achieving performative optimality requires exhaustive search over the hypothesis class $\mathcal{H}$. Furthermore, even under strong structural assumptions on the distributional response and choice of loss $\ell$, it
¹[PZMH20] assume only a Lipschitzness condition, where similar hypotheses $h$ and $h'$ give rise to similar distributions $\mathcal{D}(h)$ and $\mathcal{D}(h')$, measured in Wasserstein (earth mover’s) distance.