
details referenced in parentheses):
1. Model the training dynamics (Appendix C.4): Train the network to convergence on the clean data, save the network weights, and use the empirical neural tangent kernel at these weights as our model of the network training dynamics (a toy sketch of this step follows the list).
2. Initialization (Appendix B.2): Use greedy initialization to find an initial set of poison images.
3. Optimization (Appendices B.1.2 and B.3): Improve the initial set of poison images using a gradient-based optimizer.
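As a minimal, self-contained sketch of step 1, the JAX snippet below trains a small network to convergence on clean data and keeps the converged weights; the two-layer MLP, full-batch gradient descent, and synthetic data are illustrative placeholders, not the architecture or training setup used in our experiments (see Appendix C.4).

```python
# Toy sketch of step 1: train a small network to convergence on clean data and
# keep the converged weights, at which the empirical NTK will be evaluated.
# The MLP, optimizer, and synthetic data below are placeholders for illustration.
import jax
import jax.numpy as jnp

def init_params(key, d_in, d_hidden):
    k1, k2 = jax.random.split(key)
    return {
        "W1": jax.random.normal(k1, (d_in, d_hidden)) / jnp.sqrt(d_in),
        "b1": jnp.zeros(d_hidden),
        "W2": jax.random.normal(k2, (d_hidden, 1)) / jnp.sqrt(d_hidden),
        "b2": jnp.zeros(1),
    }

def f(params, x):
    # Scalar-valued network f(x; theta), applied row-wise to a batch x.
    h = jnp.tanh(x @ params["W1"] + params["b1"])
    return (h @ params["W2"] + params["b2"]).squeeze(-1)

def train_loss(params, X, y):
    # Squared loss on the clean training data.
    return 0.5 * jnp.mean((f(params, X) - y) ** 2)

@jax.jit
def gd_step(params, X, y, lr=1e-2):
    grads = jax.grad(train_loss)(params, X, y)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

# Synthetic stand-in for the clean training data (X_d, y_d).
key = jax.random.PRNGKey(0)
X_d = jax.random.normal(key, (128, 10))
y_d = jnp.sin(X_d[:, 0])

params = init_params(key, d_in=10, d_hidden=64)
for _ in range(5000):       # "to convergence" for this toy problem
    params = gd_step(params, X_d, y_d)
theta_star = params         # weights at which the empirical NTK is evaluated
```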
Background on neural tangent kernels: The NTK of a scalar-valued neural network $f$ is the kernel associated with the feature map $\phi(x) = \nabla_\theta f(x; \theta)$. The NTK was introduced by Jacot et al. (2018), who showed that the NTK remains stationary during the training of feed-forward neural networks in the infinite-width limit. When trained with the squared loss, this implies that infinite-width neural networks are equivalent to kernel linear regression with the neural tangent kernel. Since then, the NTK has been extended to other architectures (Li et al., 2019; Du et al., 2019b; Alemohammad et al., 2020; Yang, 2020), computed in closed form (Li et al., 2019; Novak et al., 2020), and compared to finite neural networks (Lee et al., 2020; Arora et al., 2019). The closed-form predictions of the NTK offer a computational convenience which has been leveraged for data distillation (Nguyen et al., 2020; 2021), meta-learning (Zhou et al., 2021), and subset selection (Borsos et al., 2020). For finite networks, the kernel is not stationary, and its time evolution has been studied by Fort et al. (2020), Long (2021), and Seleznova & Kutyniok (2022). We call the NTK of a finite network with $\theta$ chosen at some point during training the network's empirical NTK. Although the empirical NTK cannot exactly model the full training dynamics of finite networks, Du et al. (2018; 2019a) give some non-asymptotic guarantees.
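As a concrete illustration, the following JAX sketch computes the empirical NTK $K(x, x') = \langle \nabla_\theta f(x; \theta), \nabla_\theta f(x'; \theta) \rangle$ at a fixed set of weights by explicitly materializing per-example parameter gradients; this is the simplest (and most memory-hungry) way to do it and is meant only to clarify the definition, with `f` and `theta_star` referring to the toy model from the earlier sketch.

```python
# Empirical NTK of a scalar-valued network f(x; theta) at fixed weights:
# K(x, x') = <grad_theta f(x; theta), grad_theta f(x'; theta)>.
# Materializing the full Jacobian is shown only for clarity.
import jax
import jax.numpy as jnp

def param_jacobian(f, params, X):
    # Stack the feature maps phi(x) = grad_theta f(x; theta) as rows of a matrix,
    # flattening each per-example gradient pytree into a vector.
    def f_single(p, x):
        return f(p, x[None, :])[0]
    per_example_grads = jax.vmap(jax.grad(f_single), in_axes=(None, 0))(params, X)
    leaves = jax.tree_util.tree_leaves(per_example_grads)
    return jnp.concatenate([leaf.reshape(X.shape[0], -1) for leaf in leaves], axis=1)

def empirical_ntk(f, params, X1, X2):
    # |X1| x |X2| kernel matrix with entries <phi(X1_i), phi(X2_j)>.
    return param_jacobian(f, params, X1) @ param_jacobian(f, params, X2).T

# Example (using f and theta_star from the previous sketch):
# K_dd = empirical_ntk(f, theta_star, X_d, X_d)
```

In practice, libraries such as neural-tangents provide utilities for both empirical and closed-form (infinite-width) NTK computations.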
Bi-level optimization with NTK: Let $(X_d, y_d)$ and $(X_p, y_p)$ denote the clean and poison training examples, respectively, $(X_t, y_t)$ denote clean test examples, and $(X_a, y_a)$ denote test data with the trigger applied and the target label. Our goal is to construct poison examples $X_p$ with target label $y_p = y_{\text{target}}$ that, when trained on together with the clean examples, produce a model which (i) is accurate on clean test data $X_t$ and (ii) predicts the target label for poison test data $X_a$. This naturally leads to the following bi-level optimization problem:
$$\min_{X_p} \; \mathcal{L}_{\text{backdoor}}\!\left(f\!\left(X_{ta};\ \operatorname*{argmin}_{\theta} \mathcal{L}\big(f(X_{dp}; \theta),\, y_{dp}\big)\right),\ y_{ta}\right), \tag{1}$$
where we denote concatenation with subscripts, $X_{dp}^{\top} = \left[X_d^{\top} \;\; X_p^{\top}\right]$, and similarly for $X_{ta}$, $y_{ta}$, and $y_{dp}$. To ensure our objective is differentiable and to permit closed-form kernel predictions, we use the squared loss $\mathcal{L}(\hat{y}, y) = \mathcal{L}_{\text{backdoor}}(\hat{y}, y) = \frac{1}{2}\left\|\hat{y} - y\right\|_2^2$.
Still, such bi-level optimizations are typically challenging to solve (Bard, 1991; 2013). Differentiating directly through the inner optimization $\operatorname*{argmin}_{\theta} \mathcal{L}\big(f(X_{dp}; \theta), y_{dp}\big)$ with respect to the corrupted training data $X_p$ is impractical for two reasons: (i) backpropagating through an iterative process incurs a significant performance penalty, even when using advanced checkpointing techniques (Walther & Griewank, 2004), and (ii) the gradients obtained by backpropagating through SGD are too noisy to be useful (Hospedales et al., 2020). To overcome these challenges, we propose to use a closed-form kernel to model the training dynamics of the neural network. This dramatically simplifies and stabilizes our loss, which becomes
$$\mathcal{L}_{\text{backdoor}}\big(K_{dp,dpta},\, y_{dpta}\big) = \frac{1}{2}\left\| y_{dp}^{\top} K_{dp,dp}^{-1} K_{dp,ta} - y_{ta} \right\|_2^2, \tag{2}$$
where we plugged in the closed-form solution of the inner optimization from the kernel linear regression model, which we can easily differentiate with respect to $K_{dp,dpta}$. We use $K : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ to denote a kernel function of choice, $K(X, X')$ to denote the $|X| \times |X'|$ kernel matrix with $K(X, X')_{i,j} = K(X_i, X'_j)$, and subscripts as shorthand for block matrices, e.g., $K_{a,dp} = \left[K(X_a, X_d)\;\; K(X_a, X_p)\right]$. This simplification does not come for free, as kernel-designed poisons might not generalize to the neural network training that we desire to backdoor. Empirically demonstrating in Section 3 that there is little loss in transferring our attack to neural networks is one of our main goals (see Table 2).
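To make Equation (2) and the gradient-based optimization of step 3 concrete, the following JAX sketch evaluates the closed-form backdoor loss and differentiates it with respect to the poison images. The RBF kernel is only a differentiable stand-in for the empirical NTK, and the small ridge term and helper names are illustrative assumptions rather than part of our method.

```python
# Sketch of the closed-form loss in Equation (2) and its gradient with respect
# to the poison images X_p. The RBF kernel is a differentiable stand-in for the
# empirical NTK; the ridge term is only for numerical stability of the solve.
import jax
import jax.numpy as jnp

def rbf_kernel(X1, X2, bandwidth=1.0):
    sq = jnp.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return jnp.exp(-sq / (2.0 * bandwidth ** 2))

def backdoor_loss(X_p, y_p, X_d, y_d, X_t, y_t, X_a, y_a,
                  kernel=rbf_kernel, ridge=1e-6):
    # Concatenate clean and poison training data: X_dp, y_dp.
    X_dp = jnp.concatenate([X_d, X_p], axis=0)
    y_dp = jnp.concatenate([y_d, y_p], axis=0)
    # Clean test points and triggered test points, evaluated jointly: X_ta, y_ta.
    X_ta = jnp.concatenate([X_t, X_a], axis=0)
    y_ta = jnp.concatenate([y_t, y_a], axis=0)
    # Closed-form kernel-regression predictions on X_ta:
    # y_hat = K_{ta,dp} K_{dp,dp}^{-1} y_dp,
    # i.e. the transpose of y_dp^T K_{dp,dp}^{-1} K_{dp,ta} in Equation (2).
    K_dpdp = kernel(X_dp, X_dp) + ridge * jnp.eye(X_dp.shape[0])
    K_tadp = kernel(X_ta, X_dp)
    y_hat = K_tadp @ jnp.linalg.solve(K_dpdp, y_dp)
    # L_backdoor = 1/2 || y_hat - y_ta ||_2^2  (Equation 2).
    return 0.5 * jnp.sum((y_hat - y_ta) ** 2)

# Step 3 then follows the gradient of this loss with respect to the poisons:
grad_wrt_poisons = jax.grad(backdoor_loss, argnums=0)
# X_p = X_p - step_size * grad_wrt_poisons(X_p, y_p, X_d, y_d, X_t, y_t, X_a, y_a)
```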