
The State-of-the-Art in AI-Based Malware Detection
Techniques: A Review
Adam Wolsey
Abstract
Artificial Intelligence techniques have evolved rapidly in recent years, revolutionising the
approaches used to fight against cybercriminals. But as the cyber security field has progressed,
so has malware development, making it an economic imperative to strengthen businesses’
defensive capability against malware attacks. This review aims to outline the state-of-the-art
AI techniques used in malware detection and prevention, providing an in-depth analysis of the
latest studies in this field. The algorithms investigated consist of Shallow Learning, Deep
Learning and Bio-Inspired Computing, applied to a variety of platforms, such as PC, cloud,
Android and IoT. This survey also touches on the rapid adoption of AI by cybercriminals as a
means to create ever more advanced malware and exploit the AI algorithms designed to defend
against them.
1. Introduction
The coronavirus pandemic has dramatically increased worldwide cyber-attacks, a phenomenon
that has increasingly been termed the ‘cyber pandemic’ [1] and is expected to cause USD 10.5
trillion in annual damage by 2025 [2]. At the same time, the new business model of
‘working from home’ imposed by the pandemic has substantially increased most organisations’
threat exposure [3].
More alarmingly, prior to the pandemic, an estimated 20% of cyber attackers used previously
unseen malware or attack techniques, with many of these consisting of machine learning models
that adapt to the environment to remain undetected. This proportion rose to 35% during the
pandemic [4]. Combined, these trends indicate a negative direction in cybercrime, set to impact
a large segment of global businesses in all industries and expected to advance at an
unprecedented rate in the next decade.
With the exponential growth of technology, data and computing power, it is fundamental that
more sophisticated tools are used to tackle rising modern problems. As humans cannot handle
the growing complexity on their own, the dependency on AI has become unavoidable. It is
predicted that the market for AI within cybersecurity will grow from USD 3.92 billion in 2017
to USD 34.81 billion by 2025 [5].
Furthermore, a survey conducted by the Capgemini Research Institute found that 69% of
organisations think AI is necessary to respond to cyberattacks [6]. Now more than ever, AI is
attracting greater attention from the public and private sectors. However, its power will
inevitably fall into the hands of cybercriminals, creating the next generation of AI-powered
malware.
This unprecedented, AI-powered upsurge in cybercriminal capabilities makes it all the more
important for security experts to identify the types of detected malware swiftly and accurately.
But despite significant advances in AI-driven malware detection methods, the current rate of
progress is inadequate, and greater efforts to outpace cybercriminals are required.
Malware (short for malicious software) comes in many categories, such as viruses, worms,
spyware, trojans and ransomware, but almost always has the goal of compromising systems
and data or holding a victim to ransom.
Traditionally, malware detection was based entirely on comparing continuous byte sequences
(called ‘signatures’) of a suspected malware file to the signatures of known malware held in a
database. Over time, as newer, ‘polymorphic’ malware appeared, signature-based detection
became less effective and was superseded by next-generation, heuristic-based and
behaviour-based detection methods relying on machine learning models [7]. Currently, all the
top malware-detection solutions (also called Endpoint Detection and Response, or EDR) are
underpinned by machine learning algorithms [8].
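The signature comparison described above can be sketched in a few lines. This is a minimal illustration only: the signature names and byte patterns are invented, and the XOR step is a toy stand-in for polymorphic re-encoding, not a model of any real malware family.

```python
# Hypothetical signature database: name -> byte sequence (invented for illustration).
SIGNATURES = {
    "Demo.Dropper.A": bytes.fromhex("deadbeef4d5a9000"),
    "Demo.Worm.B": b"\x90\x90\xeb\xfe\xcc",
}


def scan_bytes(data: bytes, signatures: dict[str, bytes] = SIGNATURES) -> list[str]:
    """Return the names of all signatures that occur as contiguous bytes in data."""
    return sorted(name for name, pattern in signatures.items() if pattern in data)


def mutate(data: bytes, key: int) -> bytes:
    """XOR every byte with key: a toy stand-in for polymorphic re-encoding."""
    return bytes(b ^ key for b in data)


sample = b"header" + SIGNATURES["Demo.Worm.B"] + b"payload"
print(scan_bytes(sample))                # the known byte sequence is found
print(scan_bytes(mutate(sample, 0x5A)))  # the re-encoded copy matches nothing
```

The second call returns an empty list even though the payload is functionally the same file, which is the weakness of exact byte matching that the paragraph above attributes to polymorphic malware.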
This literature review aims to present and analyse the latest efforts in developing novel, more
effective ways to use artificial intelligence for malware detection, with the aim of providing
comprehensive guidance for subsequent research.
This review can, therefore, assist researchers in developing an understanding of the malware
detection field, as well as the new developments and research directions explored by the
scientific community to address this complex challenge.
There are numerous papers outlining AI-based malware detection techniques. However,
because this review focuses on the most recent trends in AI-based malware detection, papers
published before 2016 are outside the scope of this survey.
The remainder of this paper is structured as follows: Sections 2 and 3 provide an overview of
the main approaches to malware analysis and detection; Section 4 defines feature extraction
and selection; Sections 5, 6 and 7 review the most recent papers using shallow learning, deep
learning and bio-inspired AI algorithms for malware detection in host-based, cloud and IoT
environments; Section 8 presents the state-of-the-art machine learning approaches for Android
malware detection; and Section 9 describes malicious uses of AI for the purpose of designing
malware. Finally, Section 10 provides conclusions, identifying potential challenges and future
directions for the use of AI in malware detection.
2. Malware Analysis Approaches
Malware analysis is the foundation of malware detection and essential for developing effective
malware detection techniques. Without malware analysis, which provides insight into the
classification and functionality of the malicious file, detecting malware could not be achieved.
Malware analysis is divided into static, dynamic and hybrid approaches, as described below
[9]:
Static analysis, whereby a suspected malicious file is inspected and analysed without
executing it, based on extracted low-level information such as system calls, the control
flow graph, and the data flow graph. Static analysis produces a low number of false
positives; however, it fails to detect unknown malware that uses code obfuscation.
Dynamic analysis, whereby a suspected malicious file is inspected at runtime, usually
within a sandbox (an isolated virtual machine designed for testing purposes, where
malware can be executed without affecting system resources). The advantage is that the
malware can be executed and analysed; therefore, unknown malware can be
successfully detected. However, dynamic analysis is time-consuming and produces a
high level of false positives.
Hybrid analysis, whereby characteristics of static and dynamic analysis are combined
to overcome the challenges of the two.
This paper will refer to these three types of analysis to describe the basis of the methods
employed in state-of-the-art machine learning-based malware detection systems.
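As a rough illustration of static analysis and its weakness against obfuscation, the sketch below inspects a byte blob without executing it. It assumes printable strings as the extracted information, chosen only because they need no disassembler; the text above names system calls and control and data flow graphs instead. The blob contents and the XOR key are invented for the example.

```python
import re


def printable_strings(data: bytes, min_len: int = 5) -> list[str]:
    """Extract runs of printable ASCII of at least min_len characters."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]


# Invented file content: an API name and a URL embedded between binary bytes.
blob = b"\x00\x01CreateRemoteThread\x00\x02ab\x00http://example.invalid/c2\x00"

# Toy obfuscation (assumption): every byte XOR-ed with 0xFF.
obfuscated = bytes(b ^ 0xFF for b in blob)

print(printable_strings(blob))        # the API name and the URL are visible
print(printable_strings(obfuscated))  # nothing readable survives
```

Neither call runs the file, so the inspection is cheap and safe, but the obfuscated copy yields no usable information. Dynamic analysis avoids this because the code must decode itself before it can run.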
3. Malware Detection Techniques
As previously mentioned, malware detection techniques can be classified into three broad
categories: signature-based, heuristic-based, and behaviour-based. These methods rely on
results from malware analysis, and each method has its unique advantages and challenges [9]:
Signature-based detection: this technique uses a known list of indicators of
compromise (IOCs), which include specific byte sequences, API calls, file hashes,
malicious domains or network attack patterns. Signature-based detection does not
require machine learning models; however, it is incapable of detecting previously
unknown or encrypted malware.
Behaviour-based detection: this technique involves monitoring a suspected executable
file in an isolated environment and collecting all exhibited behaviours, then extracting
useful features from which a machine learning model can classify the malicious
behaviour.
Heuristic-based detection: this technique relies on generating rules based on the
results of the static/dynamic analysis to guide the inspection of the extracted data to
support the proposed malware detection model. Such rules can either be generated
manually (relying on the expertise of the security analysts) or automatically, using
machine learning or tools such as YARA.
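To make the idea of manually generated heuristic rules concrete, the following sketch scores an analysis report against a small rule set. Everything in it is hypothetical: the rule descriptions, weights, threshold and report fields are invented for illustration and are not taken from any tool or from the reviewed papers.

```python
# Hypothetical analyst-written rules: (description, predicate, weight).
RULES = [
    ("writes to an autorun location", lambda r: "autorun_write" in r["behaviours"], 3),
    ("high byte entropy (possible packing)", lambda r: r["entropy"] > 7.2, 2),
    ("contacts many hosts", lambda r: r["hosts_contacted"] > 20, 2),
]
THRESHOLD = 4  # assumed cut-off; in practice this would be tuned on labelled data


def evaluate(report: dict) -> tuple[int, list[str]]:
    """Return the total weight and descriptions of every rule the report triggers."""
    hits = [(desc, weight) for desc, pred, weight in RULES if pred(report)]
    return sum(weight for _, weight in hits), [desc for desc, _ in hits]


def is_suspicious(report: dict) -> bool:
    """Flag the report when the summed rule weights reach the threshold."""
    return evaluate(report)[0] >= THRESHOLD


packed_dropper = {"behaviours": {"autorun_write"}, "entropy": 7.6, "hosts_contacted": 3}
text_editor = {"behaviours": set(), "entropy": 5.1, "hosts_contacted": 2}

print(evaluate(packed_dropper), is_suspicious(packed_dropper))
print(evaluate(text_editor), is_suspicious(text_editor))
```

The report fields combine static results (entropy) with dynamic ones (observed behaviours, hosts contacted), mirroring the description above. Replacing the hand-set weights with learned ones is one way such rules could be generated automatically.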
4. Feature Extraction and Selection
Throughout this paper, the term ‘features’ will be frequently used in relation to various machine
learning algorithms. In machine learning, feature extraction or feature engineering is the
process of transforming raw data into numerical features that can be understood by the
machine learning algorithm. Feature extraction is necessary to improve the model’s
effectiveness, as applying machine learning directly to raw data is generally ineffective.
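A minimal sketch of feature extraction, assuming a byte histogram and Shannon entropy as the numerical features. These are common choices for raw binaries but are an assumption here, not the feature set of any particular reviewed paper.

```python
import math
from collections import Counter


def byte_features(data: bytes) -> list[float]:
    """Map raw bytes to 257 numbers: a normalised byte histogram plus its entropy."""
    if not data:
        return [0.0] * 257
    counts = Counter(data)  # iterating over bytes yields integers 0..255
    total = len(data)
    histogram = [counts.get(value, 0) / total for value in range(256)]
    # Shannon entropy in bits per byte, ranging from 0 (constant) to 8 (uniform).
    entropy = sum(-p * math.log2(p) for p in histogram if p > 0)
    return histogram + [entropy]


features = byte_features(b"\x00\x00\xff\xff")
print(len(features), features[0], features[255], features[256])
```

Whatever the file size, the output has a fixed length of 257, which is what allows a learning algorithm to compare files that differ in length.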
On the other hand, feature selection is the process of removing unnecessary features to assist
with developing a predictive model.
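One simple form of feature selection is a variance filter, sketched below under the assumption that a feature which takes the same value for every sample cannot help separate classes. The data are invented, and the reviewed papers may use other selection criteria.

```python
from statistics import pvariance


def select_features(rows: list[list[float]], min_variance: float = 0.0) -> list[int]:
    """Return the indices of columns whose variance across samples exceeds min_variance."""
    columns = list(zip(*rows))
    return [i for i, column in enumerate(columns) if pvariance(column) > min_variance]


def project(rows: list[list[float]], keep: list[int]) -> list[list[float]]:
    """Keep only the selected columns of every sample."""
    return [[row[i] for i in keep] for row in rows]


# Three samples with three features each; the first feature is constant.
samples = [
    [1.0, 0.0, 3.0],
    [1.0, 1.0, 3.0],
    [1.0, 0.0, 2.0],
]
kept = select_features(samples)
print(kept)
print(project(samples, kept))
```

The constant first column is dropped and the two informative columns are kept, so the model sees a smaller input without losing any information that distinguishes the samples.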
In a machine learning-based malware detection system, feature extraction and selection are
critical steps in developing the system. Tables II, IV, V and VI below detail the features
extracted by the authors of each paper evaluated in this review.
5. Shallow Learning-Based Classification Methods for Malware Detection
Shallow learning (SL) generally comprises the majority of machine learning models proposed
prior to 2006 and, more specifically, any machine learning models not classified as deep
learning. SL approaches traditionally depend heavily on features manually designed to solve a
given task [10].
Nevertheless, shallow learning algorithms are still widely used, including in cyber security
more broadly and malware detection in particular, as seen in Table II.
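The dependence of shallow learning on manually designed features can be illustrated with a k-nearest-neighbour classifier. The choice of k-NN is an assumption made for illustration, as are the two features (byte entropy and a count of suspicious imports) and all training values.

```python
import math
from collections import Counter

# Invented training set: ((entropy, suspicious_import_count), label).
TRAIN = [
    ((7.8, 12), "malware"),
    ((7.5, 9), "malware"),
    ((7.9, 15), "malware"),
    ((5.1, 1), "benign"),
    ((4.8, 0), "benign"),
    ((5.5, 2), "benign"),
]


def knn_predict(train, query, k: int = 3) -> str:
    """Return the majority label among the k training points nearest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_predict(TRAIN, (7.6, 10)))
print(knn_predict(TRAIN, (5.0, 1)))
```

The classifier has no trainable representation of its own: its accuracy depends entirely on how well the two hand-picked features separate the classes. The features are left unscaled here for brevity, so the import count dominates the distance; a real system would normalise them first.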
Fig. 1 below lists the most commonly used SL-based classification methods.
Fig. 1 Common shallow learning algorithms
Table I provides explanations for the acronyms used in Fig. 1: