Multi-view Representation Learning from Malware
to Defend Against Adversarial Variants
James Lee Hu*
Department of Management Information Systems
University of Arizona
Tucson, USA
jameshu@arizona.edu
Mohammadreza Ebrahimi*
School of Information Systems and Management
University of South Florida
Tampa, USA
ebrahimim@usf.edu
Weifeng Li
Department of Management Information Systems
University of Georgia
Athens, USA
weifeng.li@uga.edu
Xin Li*
Department of Computer Science
University of Arizona
Tucson, USA
xinli2@arizona.edu
Hsinchun Chen
Department of Management Information Systems
University of Arizona
Tucson, USA
hsinchun@arizona.edu
Abstract—Deep learning-based adversarial malware detectors
have yielded promising results in detecting never-before-seen
malware executables without relying on expensive dynamic behavior
analysis and sandboxing. Despite these capabilities, such detectors
have been shown to be vulnerable to adversarial malware variants:
meticulously modified, functionality-preserving versions of original
malware executables generated by machine learning. Due to the
nature of these adversarial modifications, attack methods often use
a single view of malware executables (i.e., the binary/hexadecimal
view) to generate adversarial malware variants. This provides an
opportunity for defenders (i.e., malware detectors) to detect the
adversarial variants by utilizing more than one view of a malware
file (e.g., the source code view in addition to the binary view). The
rationale behind this idea is that while the adversary focuses on the
binary view, certain characteristics of the malware file in the source
code view remain untouched, which enables detection of the
adversarial malware variants. To capitalize on this opportunity, we propose
Adversarially Robust Multiview Malware Defense (ARMD), a
novel multi-view learning framework to improve the robustness
of DL-based malware detectors against adversarial variants. Our
experiments on three renowned open-source deep learning-based
malware detectors across six common malware categories show
that ARMD is able to improve the adversarial robustness by up
to seven times on these malware detectors.
Index Terms—Multi-View Learning, Adversarial Machine
Learning, Adversarial Malware Variants, Deep Learning-based
Malware Detectors, Adversarial Robustness
*: Corresponding author
Acknowledgments: This material is based upon work supported by the
National Science Foundation (NSF) under the Secure and Trustworthy Cyberspace
(1936370), Cybersecurity Innovation for Cyberinfrastructure (1917117), and
Cybersecurity Scholarship-for-Service (1921485) programs.
I. INTRODUCTION
Recent studies have shown that deep learning (DL)-based malware
detectors are susceptible to attacks from Adversarial
Malware Generation (AMG) techniques [1], [3], [4]. These
AMG methods automatically generate adversarial malware
variants to evade a targeted malware detector. The generated
malware samples can be used to improve a malware detector’s
robustness against such attacks [2], [4]. Most AMG methods
employ additive modifications, injecting bytes into the
malware binary to generate functionality-preserving evasive
variants [11]. However, these AMG methods are often limited
to operating on the binary view of a malware sample and
do not account for other views representing malware samples
(e.g., a malware’s source code). Thus, the multi-view (MV)
nature of malware could be leveraged to improve detector
robustness against these adversarial variants. As such, we
expect MV learning to boost adversarial robustness.
MV learning refers to a branch of machine learning models
that processes multiple distinct representations from the same
instance of input data [13]. When applied to malware detec-
tion, MV learning has been shown to significantly improve
malware detection accuracy [14], [15]. These models often
leverage a mechanism to combine different views into a
single representation, known as a fusion mechanism [16].
However, the impact of MV learning, and its different fusion
mechanisms, on detector models’ adversarial robustness is un-
clear. We hypothesize that MV learning can detect adversarial
variants evasive to single-view detectors by extracting features
from malware views untouched by AMG methods.

arXiv:2210.15429v1 [cs.CR] 25 Oct 2022

Thus, in this study, we propose Adversarially Robust Multiview
Malware Defense (ARMD), an MV learning framework to
improve the robustness of DL-based malware detectors against
adversarial variants.
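To make this intuition concrete, the simplest form of MV learning is late fusion: features extracted independently from each view are concatenated into a single representation before classification. The sketch below is purely illustrative; the feature dimensions, the linear scorer, and the random feature vectors are assumptions for exposition, not ARMD's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)

def classify_multiview(binary_feats, source_feats, w, b):
    """Late fusion: concatenate per-view features, then apply a linear scorer."""
    fused = np.concatenate([binary_feats, source_feats])  # single joint representation
    score = fused @ w + b                                 # logit
    return 1.0 / (1.0 + np.exp(-score))                   # probability of "malicious"

# Hypothetical 4-dimensional features from each view.
binary_feats = rng.normal(size=4)
source_feats = rng.normal(size=4)
w = rng.normal(size=8)   # fused dimension = 4 + 4
prob = classify_multiview(binary_feats, source_feats, w, b=0.0)
assert 0.0 < prob < 1.0
```

Because the score depends on both views, an attacker who perturbs only the binary-view features must still contend with the unchanged source-code-view features.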
In the remainder of this manuscript, first, we review AMG,
DL-based malware detectors, MV learning, fusion mecha-
nisms, and highway layers. Subsequently, we detail the com-
ponents of our proposed framework and its contribution. We
then conduct several experiments to evaluate the performance
of ARMD. Lastly, we highlight promising future directions.
II. LITERATURE REVIEW
Five areas of research are examined. First, we review extant
AMG studies as the overarching area for our study. Second,
we examine DL-based Malware Detectors as an effective
type of AI model to detect malicious samples. Third, we
review MV Learning as a potential way to boost a DL-based
detector’s adversarial robustness. Fourth, we investigate Fusion
Mechanisms to determine their impact on an MV Learning
model’s adversarial robustness. Lastly, we review Highway
Layers as a potential remedy for the shortcomings of existing
fusion mechanisms regarding adversarial robustness.
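For reference, a highway layer gates how much of a learned transform versus the unchanged input passes through: y = T(x) * H(x) + (1 - T(x)) * x, where T is a sigmoid gate. A minimal NumPy sketch follows; the tanh transform and the toy, untrained weights are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway_layer(x, W_h, b_h, W_t, b_t):
    """y = T(x) * H(x) + (1 - T(x)) * x, with H a tanh affine map, T a sigmoid gate."""
    H = np.tanh(x @ W_h + b_h)      # candidate transform
    T = sigmoid(x @ W_t + b_t)      # transform gate in (0, 1)
    return T * H + (1.0 - T) * x    # gated mix of transform and identity

d = 4
x = np.ones(d)
# With a strongly negative gate bias, T is near 0 and the layer passes x through.
y = highway_layer(x, np.eye(d), np.zeros(d), np.eye(d), np.full(d, -20.0))
assert np.allclose(y, x, atol=1e-3)
```

The identity path is what makes the gate useful for fusion: a view's features can flow through unmodified when the gate decides the transform is unhelpful.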
A. Adversarial Malware Generation (AMG)
AMG aims to perturb malware samples and generate vari-
ants that evade malware detectors. Among the prevailing AMG
methods, append attacks (a form of additive modification)
are the most practical due to their high chance of preserving
the functionality of the original malware executable [11]. We
summarize selected significant append-based prior work based
on their data source, attack method used, and the view(s) of
the malware sample they operate under in Table I.
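An append attack can be expressed in a few lines: bytes written past the end of an executable's mapped sections are never loaded at run time, so functionality is preserved while the byte-level representation a detector sees changes. The toy header and payload below are placeholders; real attacks choose the appended bytes adversarially to lower the detector's score.

```python
def append_attack(malware_bytes: bytes, payload: bytes) -> bytes:
    """Additive modification: appended bytes sit outside the executable's
    mapped sections, so execution is unaffected while detector features shift."""
    return malware_bytes + payload

original = b"MZ\x90\x00" + b"\x00" * 60   # toy stand-in for a PE header
variant = append_attack(original, b"\xcc" * 128)

assert variant.startswith(original)       # original content is untouched
assert len(variant) == len(original) + 128
```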
Three major observations are made from Table I. First, the
majority of studies use VirusTotal, an online malware
repository, as the source of their malware samples [1], [2], [3],
[4], [5], [6], [8], [10], [11], [12]. Second, regarding the selected
attack methods, notable examples include the simple append
attack [11], attacks using randomly generated perturbations
[5], and attacks using specific perturbations that lower a
malware detector's score [6]. More advanced methods incorporate
machine learning techniques (Genetic Programming [2],
[7], Gradient Descent [5], and Dynamic Programming [9])
and advanced DL-based techniques (Generative
Adversarial Networks [10], Deep Reinforcement Learning [1],
[8], [12], and Generative Recurrent Neural Networks [3], [4]).
Third, and most importantly, most AMG methods operate
within only a single view of the malware. Many of these AMG
methods operate in the binary view [2], [3], [4], [5], [6], [7],
[8], [11]. A few studies delved into AMG attacks on other
views (e.g., the source code view [9] and the API call view
[10]). The main exceptions are two Deep RL-based AMG
studies [1], [12]. These two studies include multiple different
perturbations in their RL action space, a few of which result
in simultaneous binary and source code edits. Overall, we
observe that most AMG studies operate within only a single
view of the malware. As such, when attacking an MV malware
detector, these AMG methods are expected to be rendered
ineffective because their perturbations affect only part
of the malware detector's input.
B. Deep Learning-based (DL-based) Malware Detectors
Fig. 1. MalConv Architecture
DL-based malware detectors have shown high performance
in malware categorization [17], [18], [19]. One such well-known
detector is MalConv, a widely-used open-source DL-based
malware detector operating only in the binary view of the malware executable.
TABLE I
SELECTED SIGNIFICANT PRIOR RESEARCH ON AMG APPEND ATTACKS AGAINST MALWARE DETECTORS
Year Author(s) Data Source Attack Method View
2021 Ebrahimi et al. [1] VirusTotal Deep RL Binary & Source Code
2021 Demetrio et al. [2] VirusTotal Genetic programming Binary
2021 Hu et al. [3] VirusTotal GPT2 Binary
2020 Ebrahimi et al. [4] VirusTotal Generative RNN Binary
2019 Castro et al. [5] VirusTotal Random perturbations Binary
2019 Chen et al. [6] VirusShare, Malwarebenchmark Enhanced random perturbations Binary
2019 Dey et al. [7] Contagio PDF malware dump Genetic programming Binary
2019 Fang et al. [8] VirusTotal Deep RL Binary
2019 Park et al. [9] Malmig & MMBig Dynamic programming Source Code
2019 Rosenberg et al. [10] VirusTotal GAN API Call
2019 Suciu et al. [11] VirusTotal, Reversing Labs, FireEye Append attack Binary
2018 Anderson et al. [12] VirusTotal Deep RL Binary & Source Code
Note: RNN: Recurrent Neural Network; NN: Neural Network; GAN: Generative Adversarial Network; RL: Reinforcement Learning
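As Fig. 1 suggests, MalConv embeds raw bytes, applies a gated (convolutional) transformation over byte windows, performs global max pooling, and classifies with a fully connected layer [17]. The NumPy forward pass below is a simplified approximation with toy dimensions and random, untrained weights; it sketches the architecture, not the published implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def malconv_forward(byte_seq, params):
    """Simplified MalConv: embed -> gated conv -> global max pool -> dense."""
    E, Wc, Wg, Wd = params
    x = E[byte_seq]                            # (seq_len, emb_dim) byte embeddings
    seq_len, k = x.shape[0], Wc.shape[0]
    # Non-overlapping byte windows, mimicking MalConv's strided convolution.
    windows = [x[i:i + k].ravel() for i in range(0, seq_len - k + 1, k)]
    conv = np.array([w @ Wc.reshape(-1, Wc.shape[-1]) for w in windows])
    gate = sigmoid(np.array([w @ Wg.reshape(-1, Wg.shape[-1]) for w in windows]))
    gated = conv * gate                        # gated activation per window
    pooled = gated.max(axis=0)                 # global max pooling over windows
    return sigmoid(pooled @ Wd)                # maliciousness score in (0, 1)

emb_dim, k, channels = 8, 4, 16               # toy sizes; the real model is larger
params = (
    rng.normal(size=(256, emb_dim)),          # byte embedding table
    rng.normal(size=(k, emb_dim, channels)),  # convolution filter weights
    rng.normal(size=(k, emb_dim, channels)),  # gate filter weights
    rng.normal(size=channels),                # dense output weights
)
score = malconv_forward(rng.integers(0, 256, size=64), params)
assert 0.0 < score < 1.0
```

Because every input byte feeds the same pooled representation, bytes appended by an AMG attack directly shift the features this detector sees, which is precisely why such attacks target the binary view.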