
Shalev Shaer, Gal Maman, Yaniv Romano
where the function g(a, b)∈[−m, m]is antisymmetric
g(a, b) = −g(b, a), satisfying g(a, b)>0if b > a and
g(a, b1)≥g(a, b2)for b1≥b2. For example, g(a, b) =
m·sign(b−a). The hyper-parameter 0< m ≤1controls
the magnitude of the score. As in the knockoff filter, our
design of gensures it follows the flip sign property, requir-
ing that a swap of the original feature Xtand its dummy
˜
Xtwill flip the sign of Wt(Candes et al., 2018).
Under the alternative, one should interpret a strictly posi-
tive betting score Wt>0as a successful bet, which will
increase our wealth. This means that we gain some evi-
dence that Xtcarries extra predictive power about Ytbe-
yond what is already known in Zt. Analogously, a strictly
negative Wt<0indicates an erroneous bet, which will
reduce our wealth even though the null is false. Crucially,
under the null, Wtwill be zero on average, no matter how
accurate ˆ
ftis. In other words, it is impossible to have a
systematically positive Wtwhen H0is true.
Lemma 2. Under the same conditions as in Lemma 1, if
H0is true then EH0[Wt| Ft−1]=0for all t∈N.
The core idea behind the proof of the above lemma is that,
under the null, Wthas a symmetric distribution about zero
conditional on Ft−1, and thereby its expected value is zero;
see (Ramdas et al., 2020) for a related property of symmet-
ric distributions. In particular, Wtis equally likely to have
positive and negative values, which is a well-known result
in the knockoff literature with the important difference that
in our case we show it holds conditionally on Ft−1.
Armed with the betting score Wtat time t, we turn to define
a test martingale {St:t∈N0}for H0. The martingale
can be thought of as the wealth process, initialized by toy
money S0= 1, and our ultimate goal is to maximize it. We
begin with defining the base martingale as follows:
Sv
t:=
t
Y
j=1
(1 + v·Wj),(5)
where v∈[0,1] is a fixed amount of toy money that we are
willing to risk at step t.1Proposition 2 in Supplementary
Section C.1 shows that {Sv
t:t∈N0}in (5) is a valid test
martingale. As a result, following Ville’s inequality in (3),
one can monitor Sv
tand control the type-I error for any
choice of v∈[0,1]. Importantly, the amount of toy money
vthat we risk when placing the bet affects the power.
The above immediately raises the question of how should
we choose v? Ideally, we want to set the best constant
v∗so that Sv∗
tis maximized under the alternative. The
problem is that we are not allowed to look at the current
1We can set a different vtfor each time step, yet vtmust be
chosen without looking at the current (Xt, Yt, Zt)as otherwise
the test will cease to be valid. Intuitively, in such a case one can
always set vt= 0 when Wtis negative and vt= 1 otherwise, and
increase the wealth regardless on whether the null is true or false.
betting score Wt, so it is impossible to find such an ideal
data-dependent v∗in foresight. As a thought experiment,
consider the simplest choice for gas the sign function for
which Wt∈ {+1,−1}, and suppose we adopt an aggres-
sive betting strategy with v= 1. With this choice, when
we win a bet we will increase Sv
tby the maximal amount
possible at step t. However, if we lose a bet even once, we
will have Sv
t= 0, resulting in a powerless test; to see this,
assign Wt=−1in (5). We give a concrete example that
visualizes this discussion in Supplementary Section C.2.
As a way out, we formulate a powerful betting strategy us-
ing the mixture-method of Shekhar and Ramdas (2021),
which is intimately connected to universal portfolio opti-
mization (Cover, 2011). The mixture-method is defined as
the average over Sv
tfor all v∈[0,1]:
St=Z1
0
Sv
t·h(v)dv, (6)
where h(v)is a probability density function (pdf) whose
support is on the [0,1] interval, e.g., a uniform distribution.
We adopt the mixture method betting strategy to formu-
late our test martingale since it has appealing power prop-
erties, which we discuss soon. Before doing so, however,
we shall first prove that the test martingale in (6) is valid.
The theorem presented below states that by monitoring St
one can safely reject the null the first time Stexceeds 1/α,
while rigorously controlling the type-I error simultaneously
for all optional stopping times. This result holds in finite
samples, without making any modeling assumptions on the
conditional distribution of Y|X, and for any machine
learning model ˆ
ft, which we use to bet against the null. The
proof follows (Shekhar and Ramdas, 2021, Section 2.2).
Theorem 1. Under the same conditions as in Lemma 1, if
the null hypothesis H0is true then for any α∈(0,1),
PH0(∃t:St≥1/α)≤α.
Having established the validity of the test, we turn to dis-
cuss the key advantage of the mixture method betting strat-
egy. The idea behind this approach is that one of the base
martingales Sv
tin (6) must hit the best constant v∗, which,
in turn, drives the average martingale Stupwards. We
demonstrate this visually in Supplementary Section C.2. In
fact, Shekhar and Ramdas (2021) proved that Stis not only
dominated by Sv∗
t, but can also provably form a consistent
test that achieves power one in the limit of infinite data.
Proposition 1 (Shekhar and Ramdas (2021)).If
lim inft→∞ 1
tPt
s=1 Ws>0under the alternative
H1. Then, PH1(∃t:St≥1/α) = 1 for any α∈(0,1).
The condition of lim inft→∞ 1
tPt
s=1 Ws>0implies that
it suffices that only on average the predictive model will be
able to tell apart the original and dummy triplets, so at the
limit of infinite data we will attain a consistent test.