consideration of baseline covariates (features) [Kal17]. However, by considering covariates for each
individual, and using additional assumptions of smoothness, substantial gains can be made in terms
of the variance of the treatment effect estimate via alternative assignment procedures. The most
common approach attempts to minimize imbalance, i.e., the difference between the baseline covariates
in the treatment and control groups [ADR21,Kal17,MR12].
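To make the notion of imbalance concrete, here is a minimal sketch. It assumes one common way to quantify imbalance: the Euclidean norm of the difference between the treatment-group and control-group covariate means. The functions `imbalance` and `best_of_k_assignment` are hypothetical names. The second function is a simple rerandomization-style heuristic that keeps the least imbalanced of `k` random balanced assignments. It is used here only for illustration and is not one of the cited designs.

```python
import numpy as np

def imbalance(X: np.ndarray, z: np.ndarray) -> float:
    """Norm of the difference between treatment and control covariate means.

    X: (n, d) covariate matrix; z: length-n vector, +1 = treatment, -1 = control.
    """
    treated = X[z == 1]
    control = X[z == -1]
    return float(np.linalg.norm(treated.mean(axis=0) - control.mean(axis=0)))

def best_of_k_assignment(X: np.ndarray, k: int = 200, seed: int = 0) -> np.ndarray:
    """Draw k random balanced assignments and keep the least imbalanced one.

    Illustrative rerandomization-style heuristic, not a cited design.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Balanced template: floor(n/2) treated, the rest control.
    base = np.array([1] * (n // 2) + [-1] * (n - n // 2))
    best_z, best_val = None, np.inf
    for _ in range(k):
        z = rng.permutation(base)   # a uniformly random balanced assignment
        val = imbalance(X, z)
        if val < best_val:
            best_z, best_val = z, val
    return best_z
```

Because the minimum is taken over `k` draws, the selected assignment is never more imbalanced than the first random draw. Designs in the cited literature pursue the same goal with stronger guarantees.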
While experimental designs that minimize imbalance increase the power of an experiment for
a given pool of subjects, there are many practical applications where the experimenter wishes to
minimize the total number of subjects who are placed into the experiment. For example, in medicine,
clinical trials may carry nontrivial risk to patients. In industrial applications, experiments may be
costly, either because the changes being tested degrade the user experience or because they incur
direct monetary costs.
In this paper, we examine the problem of selecting a subset of $s$ individuals from a larger
population and assigning treatments such that the estimated treatment effect has a small error. We
consider two different estimands: individual treatment effect (ITE) and average treatment effect
(ATE).
A bit more formally, we represent the $d$ covariates of a population of $n$ individuals using
$X \in \mathbb{R}^{n \times d}$. We assume that the treatment and control values, denoted by
$y^1, y^0 \in \mathbb{R}^n$, are functions of the covariates, i.e., $y^1 = f(X, \boldsymbol{\zeta}^1)$ and
$y^0 = g(X, \boldsymbol{\zeta}^0)$, where $\boldsymbol{\zeta}^0, \boldsymbol{\zeta}^1 \in \mathbb{R}^n$ are noise vectors. The ITE for
the $i$th individual is $y^1_i - y^0_i$, and the ATE is the average of all the ITE values. We further assume a
linear model, i.e., the functions $f, g$ are linear in $X$ and $\boldsymbol{\zeta}^1, \boldsymbol{\zeta}^0$. The goal is to pick a subset of
$s$ individuals and partition this subset into control and treatment groups. For an individual $i$ in the
treatment group, we measure $y^1_i$, and for an individual $j$ in the control group, we measure $y^0_j$. From this
small set of measurements, we seek to estimate the ITE or ATE over the full population.
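The following is a minimal simulation of this setup. It assumes the additive linear form $y^1 = X\beta^1 + \boldsymbol{\zeta}^1$ and $y^0 = X\beta^0 + \boldsymbol{\zeta}^0$, where the coefficient vectors `beta1` and `beta0`, the noise scale, and the sizes `n`, `d`, `s` are hypothetical values chosen for illustration. The subset is chosen and split uniformly at random here, which is a placeholder rather than the design proposed in this paper. The point is the measurement pattern: each selected individual reveals exactly one of its two outcomes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, s = 1000, 5, 100            # population size, covariates, sample budget (illustrative)

X = rng.normal(size=(n, d))       # covariates, one row per individual
beta1 = rng.normal(size=d)        # hypothetical treatment coefficients
beta0 = rng.normal(size=d)        # hypothetical control coefficients
zeta1 = 0.1 * rng.normal(size=n)  # noise on treatment outcomes
zeta0 = 0.1 * rng.normal(size=n)  # noise on control outcomes

y1 = X @ beta1 + zeta1            # treatment outcomes (never all observed)
y0 = X @ beta0 + zeta0            # control outcomes (never all observed)

ite = y1 - y0                     # individual treatment effects, one per individual
ate = ite.mean()                  # average treatment effect over the population

# Placeholder design: choose s individuals at random and split them in half.
subset = rng.choice(n, size=s, replace=False)
treated, control = subset[: s // 2], subset[s // 2:]

# Only one outcome per selected individual is revealed.
observed_y1 = y1[treated]
observed_y0 = y0[control]
```

An estimator then only has access to `X`, the index sets, `observed_y1`, and `observed_y0`, and must recover `ite` or `ate` from them.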
Without parametric assumptions, ITE estimation is not feasible [SJS17]. We focus on linear
models in particular because they play a central role in developing theory; for example, in the
literature on optimal designs in active learning, much of the foundational theory is built around
linear models. Identifying estimators based on linearity assumptions is also an active area of study
in the causal inference literature [HSSZ19, WDTT16].
Our setup is similar to active learning [Set09], where the goal is to minimize the number of
individual labels accessed when solving linear regression or other downstream tasks. The key
difference is that we must both select a subset of individuals and, for each selected individual $i$,
measure only one of two labels: $y^1_i$ or $y^0_i$. In particular, ITE estimation can be thought of as
solving two simultaneous active linear regression problems, one for the treatment outcomes and one
for the control outcomes. Thus, standard active learning-based approaches, such as
[CP19, CDL13, M+11], fall short. Even when $s$ equals the population size $n$, i.e., when active
learning becomes trivial, our problem does not: we must still partition the full population into
treatment and control groups. Overall, sample-constrained treatment effect estimation via efficiently
designed randomized controlled trials has received little attention compared to approaches that use
observational data, such as [JTvA+21, QWZ21, SSS+19].
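The "two simultaneous regressions" view can be sketched directly. Assuming the linear model above, a plain ordinary least squares fit on each group yields estimated coefficients for the treatment and control outcomes. The ITE estimate for every individual in the population is the difference of the two predictions. The function `estimate_ite` is a hypothetical name, and plain OLS is an assumption for illustration rather than the estimator analyzed in this paper. The usage below is noiseless, so exact recovery is expected whenever each group's covariate matrix has full column rank.

```python
import numpy as np

def estimate_ite(X, treated, control, y1_obs, y0_obs):
    """Estimate every individual's ITE from one observed outcome per selected individual.

    X: (n, d) covariates for the whole population.
    treated, control: disjoint index arrays of selected individuals.
    y1_obs: treatment outcomes measured for `treated`.
    y0_obs: control outcomes measured for `control`.
    """
    # Regression 1: treatment outcomes on the treated individuals' covariates.
    beta1_hat, *_ = np.linalg.lstsq(X[treated], y1_obs, rcond=None)
    # Regression 2: control outcomes on the control individuals' covariates.
    beta0_hat, *_ = np.linalg.lstsq(X[control], y0_obs, rcond=None)
    # Predict both potential outcomes for everyone and take the difference.
    return X @ (beta1_hat - beta0_hat)

# Usage on a noiseless toy population (illustrative values).
rng = np.random.default_rng(0)
n, d = 500, 4
X = rng.normal(size=(n, d))
beta1, beta0 = rng.normal(size=d), rng.normal(size=d)
y1, y0 = X @ beta1, X @ beta0
idx = rng.choice(n, size=40, replace=False)
treated, control = idx[:20], idx[20:]
ite_hat = estimate_ite(X, treated, control, y1[treated], y0[control])
ate_hat = ite_hat.mean()
```

Note that the choice of which individuals go into `treated` versus `control` affects both regressions at once. This is why the partition still matters even when every individual is selected.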
1.1 Our Contributions
For ITE estimation, we propose an algorithm using leverage score sampling [Woo14], which is a
popular approach to subset selection for fast linear algebraic computation. For ATE estimation, we
employ a recursive application of a covariate balancing design [HSSZ19]. We provide a theoretical
analysis in terms of root mean squared error (ITE) and deviation error (ATE).
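For readers unfamiliar with leverage score sampling, here is a minimal sketch of the standard technique for a single least-squares problem. The leverage score of row $i$ is the squared norm of row $i$ of an orthonormal basis for the column space of $X$. The scores sum to $d$ when $X$ has full column rank. Rows are sampled i.i.d. with probability $p_i = \ell_i / d$ and rescaled by $1/\sqrt{s\,p_i}$ before solving least squares, so only the $s$ sampled labels are ever queried. The function names are hypothetical. This sketch does not reproduce the paper's ITE algorithm, which must additionally split the sampled individuals between treatment and control.

```python
import numpy as np

def leverage_scores(X: np.ndarray) -> np.ndarray:
    """Leverage scores of the rows of X (assumes X has full column rank)."""
    Q, _ = np.linalg.qr(X)          # reduced QR: Q is (n, d) with orthonormal columns
    return np.sum(Q**2, axis=1)     # l_i = squared norm of row i of Q; scores sum to d

def leverage_sample(X: np.ndarray, s: int, rng: np.random.Generator):
    """Sample s row indices i.i.d. with probability p_i proportional to l_i.

    Returns the indices and the importance weights 1 / sqrt(s * p_i).
    """
    scores = leverage_scores(X)
    p = scores / scores.sum()
    idx = rng.choice(X.shape[0], size=s, replace=True, p=p)
    weights = 1.0 / np.sqrt(s * p[idx])
    return idx, weights

def sampled_least_squares(X: np.ndarray, y: np.ndarray, s: int, rng: np.random.Generator):
    """Approximately solve min_b ||X b - y|| while reading only s entries of y."""
    idx, w = leverage_sample(X, s, rng)
    # Rescale sampled rows and labels, then solve the small weighted problem.
    b, *_ = np.linalg.lstsq(w[:, None] * X[idx], w * y[idx], rcond=None)
    return b
```

Rows with high leverage are the ones that least-squares fits depend on most, so sampling them preferentially keeps the reduced problem a good proxy for the full one while reading only `s` labels.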