
(e.g., [16, 35]). As a result, recent reports [38, 57] focusing on the integration of ML in practice reveal that: “I Never Thought About Securing My Machine Learning Systems” [26]. This is not surprising: if ML can be so easily broken, then why invest resources in increasing its security through –unreliable– defenses?
Sovereign entities (e.g., [3, 4]) are endorsing the development of “trustworthy” ML systems; yet, any enhancement should be economically justified. No system is foolproof (ML-based or not [29]), and guaranteeing protection against omnipotent attackers is an enticing but unattainable objective. In our case, a security system should increase the cost incurred by an attacker to achieve their goal [66]. Real attackers have a cost/benefit mindset [99]: they may try to evade a detector, but only if doing so yields positive returns. In reality, worst-case scenarios are an exception—not the norm.
Our paper is inspired by several recent works that pointed out some ‘inconsistencies’ in the adversarial attacks carried out by prior studies. Pierazzi et al. [78] observe that real attackers operate in the “problem-space”, i.e., the perturbations they can introduce are subject to physical constraints. If such constraints are not met, and hence the perturbation is introduced in the “feature-space” (e.g., [68]), then there is a risk of generating an adversarial example that is not physically realizable [92]. Apruzzese et al. [14], however, highlight that even ‘impossible’ perturbations can be applied, but only if the attacker has internal access to the data-processing pipeline of the target system. Nonetheless, Biggio and Roli suggest that ML security should focus on “anticipating the most likely threats” [24]. Only after proactively assessing the impact of such threats can a suitable countermeasure be developed—if required.
We aim to promote the development of secure ML systems. However, meeting Biggio and Roli’s recommendation presents two tough challenges for research papers. First, it is necessary to devise a realistic threat model which portrays adversarial attacks that are not only physically realizable, but also economically viable. Devising such a threat model, however, requires a detailed security analysis of the specific cyberthreat addressed by the detector—while factoring in the resources that attackers are willing to invest. Second, it is necessary to evaluate the impact of the attack by crafting the corresponding perturbations. Doing so is difficult if the threat model assumes an attacker operating in the problem-space, because such perturbations must be applied on raw data, i.e., before any preprocessing occurs—and such raw data is hard to find.
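To make the distinction concrete, the sketch below contrasts the two settings for a toy ML-PWD. The feature set, the extract_features helper, and the example URLs are hypothetical and serve only as an illustration; they are not the pipeline evaluated in this paper.

```python
from urllib.parse import urlparse

def extract_features(url: str, html: str) -> dict:
    """Toy preprocessing: map raw data (URL + HTML) to a feature vector."""
    hostname = urlparse(url).hostname or ""
    return {
        "url_length": len(url),
        "host_is_ip": hostname.replace(".", "").isdigit(),
        "num_anchors": html.count("<a "),
        "has_form": "<form" in html.lower(),
    }

raw_url = "http://31.3.3.7/login"
raw_html = "<html><form action='harvest.php'>...</form></html>"

# Problem-space perturbation: the attacker changes the raw website itself
# (here, serving it from a registered domain instead of a bare IP); the
# feature values change only as a consequence of the preprocessing pipeline.
evasive_url = "http://secure-login.example.com/login"
problem_space = extract_features(evasive_url, raw_html)

# Feature-space perturbation: the attacker flips a feature value directly,
# which presupposes internal access to the data-processing pipeline and may
# yield a sample that cannot be realized as an actual webpage.
feature_space = extract_features(raw_url, raw_html)
feature_space["host_is_ip"] = False  # direct manipulation of the feature

print(problem_space)
print(feature_space)
```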
In this paper, we tackle both of these challenges. In particular, we focus on ML systems for Phishing Website Detection (PWD). Countering phishing – still a major threat today [8, 53] – is an endless struggle. Blocklists can be easily evaded [91], and to cope with adaptive attackers some detectors are equipped with ML (e.g., [90]). Yet, as shown by Liang et al. [61], even such ML-PWD can be “cracked” by oblivious attackers—if they invest enough effort to reverse engineer the entire ML-PWD. Indeed, we address ML-PWD because prior work (e.g., [23, 40, 59, 85]) assumed threat models that hardly resemble a real scenario. Phishing, by nature, is meant to be cheap [54] and most attempts end up in failure [71]. It is unlikely¹ that a phisher invests many resources just to evade an ML-PWD: even if a website is not detected, the user may be ‘hooked’, but is not ‘phished’ yet. As a result, the state-of-the-art on adversarial ML for PWD is immature—from a pragmatic perspective.
Contribution and Organization. Let us explain how we aim to spearhead the security enhancements to ML-PWD. We begin by introducing the fundamental concepts (PWD, ML, and adversarial ML) at the base of our paper in §2, which also serves as a motivation. Then, we make the following four contributions.
• We formalize the evasion-space of adversarial attacks against ML-PWD (§3), rooted in exhaustive analyses of a generic ML-PWD. Such evasion-space explains ‘where’ a perturbation can be introduced to fool an ML-PWD. Our formalization highlights that even adversarial samples created by direct feature manipulation can be realistic, validating all the attacks performed by past work.
¹It is unlikely, but not impossible. Hence, as recommended by Arp et al. [20], it is positive that such cases have also been studied by prior work.