Multi-view Representation Learning from Malware
to Defend Against Adversarial Variants
James Lee Hu*
Department of Management Information Systems
University of Arizona
Tucson, USA
jameshu@arizona.edu
Mohammadreza Ebrahimi*
School of Information Systems and Management
University of South Florida
Tampa, USA
ebrahimim@usf.edu
Weifeng Li
Department of Management Information Systems
University of Georgia
Athens, USA
weifeng.li@uga.edu
Xin Li*
Department of Computer Science
University of Arizona
Tucson, USA
xinli2@arizona.edu
Hsinchun Chen
Department of Management Information Systems
University of Arizona
Tucson, USA
hsinchun@arizona.edu
Abstract—Deep learning-based adversarial malware detectors
have yielded promising results in detecting never-before-seen
malware executables without relying on expensive dynamic behavior
analysis and sandboxing. Despite these capabilities, such detectors
have been shown to be vulnerable to adversarial malware variants:
meticulously modified, functionality-preserving versions of original
malware executables generated by machine learning. Due to the
nature of these adversarial modifications, attack methods often use
a single view of malware executables (i.e., the binary/hexadecimal
view) to generate adversarial malware variants. This provides an
opportunity for defenders (i.e., malware detectors) to detect the
adversarial variants by utilizing more than one view of a malware
file (e.g., the source code view in addition to the binary view). The
rationale behind this idea is that while the adversary focuses on the
binary view, certain characteristics of the malware file in the source
code view remain untouched, which enables detection of the
adversarial malware variants. To capitalize on this opportunity, we propose
Adversarially Robust Multiview Malware Defense (ARMD), a
novel multi-view learning framework to improve the robustness
of DL-based malware detectors against adversarial variants. Our
experiments on three renowned open-source deep learning-based
malware detectors across six common malware categories show
that ARMD is able to improve the adversarial robustness by up
to seven times on these malware detectors.
Index Terms—Multi-View Learning, Adversarial Machine
Learning, Adversarial Malware Variants, Deep Learning-based
Malware Detectors, Adversarial Robustness
*: Corresponding author
Acknowledgments: This material is based upon work supported by the
National Science Foundation (NSF) under the Secure and Trustworthy Cyberspace
(1936370), Cybersecurity Innovation for Cyberinfrastructure (1917117), and
Cybersecurity Scholarship-for-Service (1921485) programs.
I. INTRODUCTION
Recent studies have shown that deep learning (DL)-based malware
detectors are susceptible to attacks from Adversarial
Malware Generation (AMG) techniques [1], [3], [4]. These
AMG methods automatically generate adversarial malware
variants to evade a targeted malware detector. The generated
malware samples can be used to improve a malware detector’s
robustness against such attacks [2], [4]. Most AMG methods
employ additive modifications, injecting bytes into the
malware binary to generate functionality-preserving evasive
variants [11]. However, these AMG methods are often limited
to operating on the binary view of a malware sample and
do not account for other views representing malware samples
(e.g., a malware’s source code). Thus, the multi-view (MV)
nature of malware could be leveraged to improve detector
robustness against these adversarial variants. As such, we
expect MV learning to boost adversarial robustness.
MV learning refers to a branch of machine learning models
that processes multiple distinct representations from the same
instance of input data [13]. When applied to malware detec-
tion, MV learning has been shown to significantly improve
malware detection accuracy [14], [15]. These models often
leverage a mechanism to combine different views into a
single representation, known as a fusion mechanism [16].
However, the impact of MV learning, and its different fusion
mechanisms, on detector models’ adversarial robustness is un-
clear. We hypothesize that MV learning can detect adversarial
variants evasive to single-view detectors by extracting features
from malware views untouched by AMG methods.

arXiv:2210.15429v1 [cs.CR] 25 Oct 2022

Thus, in this study, we propose Adversarially Robust Multiview
Malware Defense (ARMD), an MV learning framework to
improve the robustness of DL-based malware detectors against
adversarial variants.
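To make this intuition concrete, the simplest form of MV learning is late fusion: features extracted independently from each view are concatenated into a single representation before classification. The sketch below is purely illustrative; the feature dimensions, the linear scorer, and the random feature vectors are assumptions for exposition, not ARMD's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)

def classify_multiview(binary_feats, source_feats, w, b):
    """Late fusion: concatenate per-view features, then apply a linear scorer."""
    fused = np.concatenate([binary_feats, source_feats])  # single joint representation
    score = fused @ w + b                                 # logit
    return 1.0 / (1.0 + np.exp(-score))                   # probability of "malicious"

# Hypothetical 4-dimensional features from each view.
binary_feats = rng.normal(size=4)
source_feats = rng.normal(size=4)
w = rng.normal(size=8)   # fused dimension = 4 + 4
prob = classify_multiview(binary_feats, source_feats, w, b=0.0)
assert 0.0 < prob < 1.0
```

Because the score depends on both views, an attacker who perturbs only the binary-view features must still contend with the unchanged source-code-view features.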
In the remainder of this manuscript, first, we review AMG,
DL-based malware detectors, MV learning, fusion mecha-
nisms, and highway layers. Subsequently, we detail the com-
ponents of our proposed framework and its contribution. We
then conduct several experiments to evaluate the performance
of ARMD. Lastly, we highlight promising future directions.
II. LITERATURE REVIEW
Five areas of research are examined. First, we review extant
AMG studies as the overarching area for our study. Second,
we examine DL-based Malware Detectors as an effective
type of AI model to detect malicious samples. Third, we
review MV Learning as a potential way to boost a DL-based
detector’s adversarial robustness. Fourth, we investigate Fusion
Mechanisms to determine their impact on an MV Learning
model’s adversarial robustness. Lastly, we review Highway
Layers as a potential remedy for the shortcomings of existing
fusion mechanisms regarding adversarial robustness.
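For reference, a highway layer gates how much of a learned transform versus the unchanged input passes through: y = T(x) * H(x) + (1 - T(x)) * x, where T is a sigmoid gate. A minimal NumPy sketch follows; the tanh transform and the toy, untrained weights are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway_layer(x, W_h, b_h, W_t, b_t):
    """y = T(x) * H(x) + (1 - T(x)) * x, with H a tanh affine map, T a sigmoid gate."""
    H = np.tanh(x @ W_h + b_h)      # candidate transform
    T = sigmoid(x @ W_t + b_t)      # transform gate in (0, 1)
    return T * H + (1.0 - T) * x    # gated mix of transform and identity

d = 4
x = np.ones(d)
# With a strongly negative gate bias, T is near 0 and the layer passes x through.
y = highway_layer(x, np.eye(d), np.zeros(d), np.eye(d), np.full(d, -20.0))
assert np.allclose(y, x, atol=1e-3)
```

The identity path is what makes the gate useful for fusion: a view's features can flow through unmodified when the gate decides the transform is unhelpful.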
A. Adversarial Malware Generation (AMG)
AMG aims to perturb malware samples and generate vari-
ants that evade malware detectors. Among the prevailing AMG
methods, append attacks (a form of additive modification)
are the most practical due to their high chance of preserving
the functionality of the original malware executable [11]. We
summarize selected significant append-based prior work based
on their data source, attack method used, and the view(s) of
the malware sample they operate under in Table I.
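An append attack can be expressed in a few lines: bytes written past the end of an executable's mapped sections are never loaded at run time, so functionality is preserved while the byte-level representation a detector sees changes. The toy header and payload below are placeholders; real attacks choose the appended bytes adversarially to lower the detector's score.

```python
def append_attack(malware_bytes: bytes, payload: bytes) -> bytes:
    """Additive modification: appended bytes sit outside the executable's
    mapped sections, so execution is unaffected while detector features shift."""
    return malware_bytes + payload

original = b"MZ\x90\x00" + b"\x00" * 60   # toy stand-in for a PE header
variant = append_attack(original, b"\xcc" * 128)

assert variant.startswith(original)       # original content is untouched
assert len(variant) == len(original) + 128
```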
Three major observations are made from Table I. First, the
majority of studies use VirusTotal, an online malware
repository, as the source of their malware samples [1], [2], [3],
[4], [5], [6], [8], [10], [11], [12]. Second, regarding the selected
attack methods, notable examples include the simple append
attack [11], attacks using randomly generated perturbations
[5], and attacks using specific perturbations that lower a
malware detector's score [6]. More advanced methods incorporate
machine learning techniques (Genetic Programming [2],
[7], Gradient Descent [5], and Dynamic Programming [9])
and advanced DL-based techniques (Generative
Adversarial Networks [10], Deep Reinforcement Learning [1],
[8], [12], and Generative Recurrent Neural Networks [3], [4]).
Third, and most importantly, most AMG methods operate
within only a single view of the malware. Many of these AMG
methods operate in the binary view [2], [3], [4], [5], [6], [7],
[8], [11]. A few studies delved into AMG attacks on other
views (e.g., the source code view [9] and the API call view
[10]). The main exceptions are two Deep RL-based AMG
studies [1], [12]. These two studies include multiple different
perturbations in their RL action space, a few of which result
in simultaneous binary and source code edits. Overall, we
observe that most AMG studies operate within only a single
view of the malware. As such, when attacking an MV malware
detector, these AMG methods are expected to be rendered
ineffective because their perturbations affect only part
of the malware detector's input.
B. Deep Learning-based (DL-based) Malware Detectors
Fig. 1. MalConv Architecture
DL-based malware detectors have shown high performance
in malware categorization [17], [18], [19]. One such well-known
detector is MalConv, a widely-used open-source DL-based
malware detector operating only in the binary view of the malware executable.
TABLE I
SELECTED SIGNIFICANT PRIOR RESEARCH ON AMG APPEND ATTACKS AGAINST MALWARE DETECTORS
Year Author(s) Data Source Attack Method View
2021 Ebrahimi et al. [1] VirusTotal Deep RL Binary & Source Code
2021 Demetrio et al. [2] VirusTotal Genetic programming Binary
2021 Hu et al. [3] VirusTotal GPT2 Binary
2020 Ebrahimi et al. [4] VirusTotal Generative RNN Binary
2019 Castro et al. [5] VirusTotal Random perturbations Binary
2019 Chen et al. [6] VirusShare, Malwarebenchmark Enhanced random perturbations Binary
2019 Dey et al. [7] Contagio PDF malware dump Genetic programming Binary
2019 Fang et al. [8] VirusTotal Deep RL Binary
2019 Park et al. [9] Malmig & MMBig Dynamic programming Source Code
2019 Rosenberg et al. [10] VirusTotal GAN API Call
2019 Suciu et al. [11] VirusTotal, Reversing Labs, FireEye Append attack Binary
2018 Anderson et al. [12] VirusTotal Deep RL Binary & Source Code
Note: RNN: Recurrent Neural Network; NN: Neural Network; GAN: Generative Adversarial Network; RL: Reinforcement Learning
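As Fig. 1 suggests, MalConv embeds raw bytes, applies a gated (convolutional) transformation over byte windows, performs global max pooling, and classifies with a fully connected layer [17]. The NumPy forward pass below is a simplified approximation with toy dimensions and random, untrained weights; it sketches the architecture, not the published implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def malconv_forward(byte_seq, params):
    """Simplified MalConv: embed -> gated conv -> global max pool -> dense."""
    E, Wc, Wg, Wd = params
    x = E[byte_seq]                            # (seq_len, emb_dim) byte embeddings
    seq_len, k = x.shape[0], Wc.shape[0]
    # Non-overlapping byte windows, mimicking MalConv's strided convolution.
    windows = [x[i:i + k].ravel() for i in range(0, seq_len - k + 1, k)]
    conv = np.array([w @ Wc.reshape(-1, Wc.shape[-1]) for w in windows])
    gate = sigmoid(np.array([w @ Wg.reshape(-1, Wg.shape[-1]) for w in windows]))
    gated = conv * gate                        # gated activation per window
    pooled = gated.max(axis=0)                 # global max pooling over windows
    return sigmoid(pooled @ Wd)                # maliciousness score in (0, 1)

emb_dim, k, channels = 8, 4, 16               # toy sizes; the real model is larger
params = (
    rng.normal(size=(256, emb_dim)),          # byte embedding table
    rng.normal(size=(k, emb_dim, channels)),  # convolution filter weights
    rng.normal(size=(k, emb_dim, channels)),  # gate filter weights
    rng.normal(size=channels),                # dense output weights
)
score = malconv_forward(rng.integers(0, 256, size=64), params)
assert 0.0 < score < 1.0
```

Because every input byte feeds the same pooled representation, bytes appended by an AMG attack directly shift the features this detector sees, which is precisely why such attacks target the binary view.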