
network initialisation, a novel observation which makes our LINAC defence possible. (2) The seed of the random number generator used for initialising and computing INRs is shown to be an effective and compact private key, since withholding this information hinders a suite of standard adversarial attacks widely used for robustness evaluations. (3) We report our systematic efforts to circumvent the LINAC defence with transfer attacks and a series of adaptive attacks designed to expose and exploit potential weaknesses of LINAC. (4) To the same end we propose the novel Parametric Bypass Approximation (PBA) attack strategy, which is valid under our threat model and applicable to other defences that use secret keys. We demonstrate its effectiveness by invalidating an existing key-based defence that was previously assumed to be robust.
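For intuition only, the following JAX sketch illustrates how a single integer seed can deterministically fix the random initialisation of a coordinate MLP (an INR), so that withholding the seed withholds the ability to reproduce the transform. The layer sizes, sine activation, and helper names (e.g. init_inr_params) are illustrative assumptions, not the implementation used in this paper.

```python
# Illustrative sketch only: a secret integer seed acts as a compact private key
# by determining the random initialisation of a coordinate MLP (an INR).
# Layer sizes and the sine non-linearity are assumptions for exposition.
import jax
import jax.numpy as jnp

def init_inr_params(secret_seed, sizes=(2, 64, 64, 3)):
    """Initialise INR weights deterministically from the secret seed."""
    key = jax.random.PRNGKey(secret_seed)  # the seed is the private key
    params = []
    for d_in, d_out in zip(sizes[:-1], sizes[1:]):
        key, w_key, b_key = jax.random.split(key, 3)
        w = jax.random.uniform(w_key, (d_in, d_out),
                               minval=-1.0, maxval=1.0) / jnp.sqrt(d_in)
        b = jax.random.uniform(b_key, (d_out,), minval=-1.0, maxval=1.0)
        params.append((w, b))
    return params

def inr_apply(params, coords):
    """Map pixel coordinates of shape (N, 2) to RGB predictions of shape (N, 3)."""
    h = coords
    for w, b in params[:-1]:
        h = jnp.sin(h @ w + b)  # sine activations, SIREN-style
    w, b = params[-1]
    return h @ w + b

# An attacker without the seed cannot reproduce this initialisation, and hence
# cannot recompute any encoding derived from the resulting activations.
params = init_inr_params(secret_seed=1234)
```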
2. Related Work
Adversarial Robustness. Much progress has been made towards robust image classifiers along the adversarial training route (Madry et al., 2018), which has been extensively explored and is well reviewed, e.g. in (Schott et al., 2019; Pang et al., 2020; Gowal et al., 2020; Rebuffi et al., 2021).
While such approaches can be effective against current attacks, a complementary line of work investigates certified defences, which offer guarantees of robustness around examples for some well-defined sets (Wong & Kolter, 2018; Raghunathan et al., 2018; Cohen et al., 2019). Indeed, many such works acknowledge the need for complementary approaches, irrespective of the success of adversarial training and the well-understood difficulties in combining methods (He et al., 2017). The prolific work on defences against adversarial perturbations has spurred the development of stronger attacks (Carlini & Wagner, 2017b; Brendel et al., 2018; Andriushchenko et al., 2020) and the standardisation of evaluation strategies for threat models of interest (Athalye et al., 2018; Croce & Hein, 2020), including adaptive attacks (Tramer et al., 2020). Alongside the empirical progress towards building robust predictors, this line of research has yielded an improved understanding of current deep learning models (Ilyas et al., 2019; Engstrom et al., 2019), the limitations of effective adversarial robustness techniques (Jacobsen et al., 2018), and the data required to train them (Schmidt et al., 2018).
Athalye et al. (2018) show that a number of defences primarily hinder gradient-based adversarial attacks by obfuscating gradients. Various forms are identified, such as gradient shattering (Goodfellow et al., 2014), gradient masking (Papernot et al., 2017), exploding and vanishing gradients (Song et al., 2018b), stochastic gradients (Dhillon et al., 2018), and a number of input transformations aimed at countering adversarial examples, including noise-filtering approaches using PCA or image quilting (Guo et al., 2018), the Saak transform (Song et al., 2018a), low-pass filtering (Shaham et al., 2018), matrix estimation (Yang et al., 2019) and JPEG compression (Dziugaite et al., 2016; Das et al., 2017; 2018). Indeed, many such defences have been proposed, as reviewed by Niu et al. (2020); they have ranked highly in competitions (Kurakin et al., 2018), and many have since been shown to be less robust than previously thought, e.g. by Athalye et al. (2018) and Tramer et al. (2020), who use adaptive attacks to demonstrate that several input transformations offer little to no robustness.
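To make the notion of obfuscated gradients concrete, the sketch below wraps a toy classifier in a non-differentiable colour-depth quantisation, a generic stand-in for the input transformations cited above rather than any specific defence; the input gradient used by gradient-based attacks then vanishes almost everywhere.

```python
# Illustrative sketch of "gradient shattering": a non-differentiable input
# transformation (here, simple colour-depth quantisation, a generic stand-in
# for the cited input-transformation defences) blocks useful input gradients.
import jax
import jax.numpy as jnp

def quantise(x, levels=8):
    """Round intensities to a few levels; piecewise constant, so its gradient is 0 a.e."""
    return jnp.round(x * (levels - 1)) / (levels - 1)

def defended_loss(w, x, y):
    logits = quantise(x).reshape(-1) @ w  # toy linear "classifier"
    return -jax.nn.log_softmax(logits)[y]

# The input gradient that attacks such as PGD rely on comes out as all zeros:
x = jnp.full((8, 8, 3), 0.5)
w = jax.random.normal(jax.random.PRNGKey(0), (8 * 8 * 3, 10))
g = jax.grad(defended_loss, argnums=1)(w, x, 0)  # all zeros: "shattered" signal
```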
To build on such insights, it is worth identifying the “ingredients” essential to the success of adversarial attacks. Most effective attacks, including adaptive ones, assume the ability to approximate the outputs of the targeted model for arbitrary inputs. This is reasonable when applying the correct transformation is tractable for the attacker. Hence, denying access to such computations seems to be a promising direction for hindering adversarial attacks. AprilPyone & Kiya (2020; 2021b) and MaungMaung & Kiya (2021) borrow standard practice from cryptography and assume that an attacker has full knowledge of the defence’s algorithm and parameters, short of a small number of bits which make up a private key. Another critical “weakness” of such input denoising defences is that they can be approximated by the identity mapping for the purpose of computing gradients (Athalye et al., 2018). Even complex parametric approaches, which learn stochastic generative models of the input distribution, are susceptible to reparameterisation and Expectation-over-Transformation (EoT) attacks in the white-box setting. Thus, it is worth investigating whether non-parametric, lossy and fully deterministic input transformations exist such that downstream models can still perform tasks of interest to high accuracy, while known and novel attack strategies, including adaptive attacks, are either ruled out or at least substantially hindered.
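The identity-approximation trick of Athalye et al. (2018), often referred to as BPDA, can be sketched as follows: the transform is applied in the forward pass, but gradients are routed through the identity, so the shattered signal from the previous sketch becomes informative again. The quantisation transform and toy classifier are again illustrative assumptions.

```python
# Sketch of the identity approximation (BPDA-style) from Athalye et al. (2018):
# use the non-differentiable transform in the forward pass, but back-propagate
# as if it were the identity (a straight-through estimator).
import jax
import jax.numpy as jnp

def quantise(x, levels=8):
    return jnp.round(x * (levels - 1)) / (levels - 1)

def identity_approx(x):
    """Forward: quantise(x); backward: identity."""
    return x + jax.lax.stop_gradient(quantise(x) - x)

def attacked_loss(w, x, y):
    logits = identity_approx(x).reshape(-1) @ w  # same toy classifier as above
    return -jax.nn.log_softmax(logits)[y]

# The input gradient is now non-zero, so standard attacks (e.g. PGD) can be run
# against the defended model as if the transformation were absent.
x = jnp.full((8, 8, 3), 0.5)
w = jax.random.normal(jax.random.PRNGKey(0), (8 * 8 * 3, 10))
g = jax.grad(attacked_loss, argnums=1)(w, x, 0)  # useful attack direction
```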
Implicit Neural Representations. Neural networks have been used to parameterise many kinds of signals; see the work by Sitzmann (2020) for an extensive list, with remarkable recent advances in scene representations (Mildenhall et al., 2020) and image processing (Sitzmann et al., 2020). INRs have typically been used in isolation, one per image or scene, rather than for generalisation across images. Some exceptions exist in unsupervised learning, e.g. Skorokhodov et al. (2021) parameterise GAN decoders such that they directly output INRs of images, rather than colour intensities for all pixels. In this paper we show that INRs can be used to discover functional decompositions of RGB images which enable generalisation comparable to learning on the original signal encoding (i.e. RGB).
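As a concrete illustration of fitting an INR to a single image, the sketch below trains a small coordinate MLP to map (x, y) pixel coordinates to RGB values; the architecture, sine activations, and plain gradient-descent loop are assumptions for exposition, not the exact setup used in this paper.

```python
# Sketch: fit an INR (coordinate MLP with sine activations) to one RGB image.
import jax
import jax.numpy as jnp

def init_params(key, sizes=(2, 64, 64, 3)):
    params = []
    for d_in, d_out in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        w = jax.random.normal(sub, (d_in, d_out)) / jnp.sqrt(d_in)
        params.append((w, jnp.zeros(d_out)))
    return params

def inr(params, coords):
    h = coords
    for w, b in params[:-1]:
        h = jnp.sin(30.0 * (h @ w + b))  # SIREN-style sine activations
    w, b = params[-1]
    return h @ w + b

def mse(params, coords, rgb):
    return jnp.mean((inr(params, coords) - rgb) ** 2)

# Fit a toy 32x32 image; the trained weights become a functional encoding of it.
key = jax.random.PRNGKey(0)
image = jax.random.uniform(key, (32, 32, 3))  # stand-in for a real image
ys, xs = jnp.meshgrid(jnp.linspace(-1.0, 1.0, 32),
                      jnp.linspace(-1.0, 1.0, 32), indexing="ij")
coords = jnp.stack([xs, ys], axis=-1).reshape(-1, 2)
rgb = image.reshape(-1, 3)
params = init_params(key)
grad_fn = jax.jit(jax.grad(mse))
for _ in range(500):  # plain gradient descent, for illustration only
    grads = grad_fn(params, coords, rgb)
    params = jax.tree_util.tree_map(lambda p, g: p - 1e-4 * g, params, grads)
```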