
clean data and operates them in a black-box manner, the backdoors are still retained. Moreover, because it is independent of model parameters and training data, EVAS is naturally robust against defenses such as model inspection and input filtering.
To realize EVAS, we define a novel metric based on the neural tangent kernel (Chen et al., 2021), which effectively indicates the exploitable vulnerability of a given arch; further, we integrate this metric into the NAS-without-training framework (Mellor et al., 2021; Chen et al., 2021). The resulting search method efficiently identifies candidate arches without requiring model training or backdoor testing. To verify EVAS's empirical effectiveness, we evaluate it on benchmark datasets and show:
(i) EVAS successfully finds arches with exploitable vulnerability, (ii) the injected backdoors may be
explained by arch-level “shortcuts” that recognize trigger patterns, and (iii) EVAS demonstrates high
evasiveness, transferability, and robustness against defenses. Our findings show the feasibility of exploiting NAS as a new attack vector to implement previously improbable attacks, raise concerns about the current practice of NAS in security-sensitive domains, and point to potential directions for developing effective mitigations.
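To make the training-free scoring concrete, below is a minimal NumPy sketch of an NTK-based arch score in the style of NAS-without-training (Chen et al., 2021). The toy two-layer ReLU network, all function names, and the probe-batch sizes are our own illustration, not the paper's implementation; the idea is that the condition number of the empirical NTK over a small probe batch characterizes an arch without any training.

```python
import numpy as np

def param_grads(x, W1, W2):
    # f(x) = W2 @ relu(W1 @ x), scalar output; return the flattened
    # gradient of f with respect to (W1, W2)
    h = W1 @ x                          # pre-activations, shape (hidden,)
    mask = (h > 0).astype(float)        # ReLU derivative
    dW2 = np.maximum(h, 0.0)            # df/dW2 = relu(W1 @ x)
    dW1 = np.outer(W2 * mask, x)        # df/dW1[i, j] = W2[i] * 1[h_i > 0] * x[j]
    return np.concatenate([dW1.ravel(), dW2])

def ntk_condition_number(X, W1, W2):
    # Empirical NTK: Theta[i, j] = <grad_theta f(x_i), grad_theta f(x_j)>;
    # the score is the condition number lambda_max / lambda_min
    G = np.stack([param_grads(x, W1, W2) for x in X])   # (n, n_params)
    theta = G @ G.T
    eig = np.linalg.eigvalsh(theta)                     # ascending eigenvalues
    return eig[-1] / max(eig[0], 1e-12)

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 10))                 # 8 probe inputs, 10 features
W1 = rng.standard_normal((16, 10)) / np.sqrt(10) # random init, no training
W2 = rng.standard_normal(16) / np.sqrt(16)
score = ntk_condition_number(X, W1, W2)
```

A lower condition number is associated with higher trainability, so ranking many randomly initialized candidate arches by such a score avoids training any of them; EVAS's metric additionally targets exploitable vulnerability rather than accuracy alone.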
2 RELATED WORK
Next, we survey the literature relevant to this work.
Neural arch search.
The existing NAS methods can be categorized along three dimensions: search space, search
strategy, and performance measure. Search space – early methods focus on the chain-of-layer
structure (Baker et al., 2017), while recent work proposes to search for motifs of cell structures (Zoph
et al., 2018; Pham et al., 2018; Liu et al., 2019). Search strategy – early methods rely on either random
search (Jozefowicz et al., 2015) or Bayesian optimization (Bergstra et al., 2013), which are limited in
model complexity; recent work mainly uses the approaches of reinforcement learning (Baker et al.,
2017) or neural evolution (Liu et al., 2019). Performance measure – one-shot NAS has emerged as a
popular approach: it treats all candidate arches as different sub-graphs of a super-net
(i.e., the one-shot model) and shares weights between them (Liu et al., 2019). Despite
the intensive research on NAS, its security implications are largely unexplored. Recent work shows
that NAS-generated models tend to be more vulnerable to various malicious attacks than manually
designed ones (Pang et al., 2022; Devaguptapu et al., 2021). This work explores another dimension:
whether NAS itself can be exploited as an attack vector to launch new attacks, complementing the existing
studies on the security of NAS.
Backdoor attacks and defenses.
Backdoor attacks inject malicious backdoors into the victim’s
model during training and activate such backdoors at inference, which can be categorized along attack
targets – input-specific (Shafahi et al., 2018), class-specific (Tang et al., 2020), or any-input (Gu
et al., 2017), attack vectors – polluting training data (Liu et al., 2018) or releasing infected models (Ji
et al., 2018), and optimization metrics – attack effectiveness (Pang et al., 2020), transferability (Yao
et al., 2019), or attack evasiveness (Chen et al., 2017). To mitigate such threats, many defenses have
also been proposed, which can be categorized according to their strategies (Pang et al., 2022): input
filtering purges poisoning samples from training data (Tran et al., 2018); model inspection determines
whether a given model is backdoored (Liu et al., 2019; Wang et al., 2019); and input inspection
detects trigger inputs at inference time (Gao et al., 2019). Most attacks and defenses above focus
on backdoors implemented in the space of model parameters. Concurrent to this work, Bober-Irizar
et al. (2022) explore using neural arches to implement backdoors by manually designing “trigger
detectors” in the arches and activating such detectors using poisoning data during training. This work
investigates using NAS to directly search for arches with exploitable vulnerability, which represents a
new direction of backdoor attacks.
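To illustrate what an arch-level "trigger detector" amounts to, the following is our own toy sketch, not the construction of Bober-Irizar et al. (2022): a branch with fixed weights that responds maximally when a specific patch appears anywhere in the input, here approximated by a sliding normalized cross-correlation (a convolution whose kernel matches the trigger).

```python
import numpy as np

def trigger_response(img, trigger, eps=1e-8):
    # Max normalized cross-correlation of the trigger patch over the image;
    # an architecture-level detector branch would compute something similar
    # with convolution weights fixed by the arch rather than learned.
    th, tw = trigger.shape
    t = trigger - trigger.mean()
    best = -1.0
    for i in range(img.shape[0] - th + 1):
        for j in range(img.shape[1] - tw + 1):
            p = img[i:i + th, j:j + tw]
            p = p - p.mean()
            best = max(best, float((p * t).sum() /
                                   (np.linalg.norm(p) * np.linalg.norm(t) + eps)))
    return best

rng = np.random.default_rng(1)
trigger = rng.standard_normal((5, 5))
clean = rng.standard_normal((16, 16))
stamped = clean.copy()
stamped[2:7, 3:8] = trigger            # plant the trigger patch in the image
```

A triggered input drives the response toward 1 while clean inputs stay well below it; routing this signal into the network's output is what turns a structural detector into a backdoor. EVAS, by contrast, searches for arches in which such shortcuts arise without manual design.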
3 EVAS
Next, we present EVAS, a new backdoor attack leveraging NAS to find neural arches with exploitable
vulnerability. We begin by introducing the threat model.