

The State-of-the-Art in AI-Based Malware Detection

Techniques: A Review

Adam Wolsey

Abstract

Artificial Intelligence techniques have evolved rapidly in recent years, revolutionising the

approaches used to fight against cybercriminals. But as the cyber security field has progressed,

so has malware development, making it an economic imperative to strengthen businesses’

defensive capability against malware attacks. This review aims to outline the state-of-the-art

AI techniques used in malware detection and prevention, providing an in-depth analysis of the

latest studies in this field. The algorithms investigated consist of Shallow Learning, Deep

Learning and Bio-Inspired Computing, applied to a variety of platforms, such as PC, cloud,

Android and IoT. This survey also touches on the rapid adoption of AI by cybercriminals as a

means to create ever more advanced malware and exploit the AI algorithms designed to defend

against them.

1. Introduction

The coronavirus pandemic has dramatically increased cyber-attacks worldwide, a phenomenon

increasingly termed the ‘cyber pandemic’ [1]. Its damage costs are expected to reach USD 10.5

trillion annually by 2025 [2]. At the same time, the new business model of

‘working from home’ imposed by the pandemic has substantially increased most organisations’

threat exposure [3].

More alarmingly, prior to the pandemic, an estimated 20% of cyber attackers used previously

unseen malware or attack techniques, with many of these consisting of machine learning models

that adapt to the environment to remain undetected. This proportion rose to 35% during the

pandemic [4]. Combined, these trends point to a worsening cybercrime landscape that is set to affect

a large segment of global businesses across all industries and is expected to escalate at an

unprecedented rate over the next decade.

With the exponential growth of technology, data and computing power, it is essential that

more sophisticated tools be used to tackle these emerging problems. As humans cannot handle

the growing complexity on their own, dependence on AI has become unavoidable. It is

predicted that the market for AI within cybersecurity will grow from USD 3.92 billion in 2017

to USD 34.81 billion by 2025 [5].

Furthermore, a survey conducted by the Capgemini Research Institute found that 69% of

organisations think AI is necessary to respond to cyberattacks [6]. Now more than ever, AI is

attracting greater attention from the public and private sectors. However, its power will

inevitably fall into the hands of cybercriminals, creating the next generation of AI-powered

malware.

This unprecedented, AI-powered upsurge in cybercriminal capabilities makes it all the more

important for security experts to identify the types of detected malware swiftly and accurately.

But despite significant advances in AI-driven malware detection methods, the current rate of

progress is inadequate, and greater efforts to outpace cybercriminals are required.

Malware (short for malicious software) comes in many categories – such as viruses, worms,

spyware, trojans, ransomware, etc. – but almost always has the goal of compromising systems

and data or holding a victim to ransom.

Traditionally, malware detection was based entirely on comparing contiguous byte sequences

(called ‘signatures’) of a suspected malware file to the signatures of known malware held in a

database. Over time, as newer, ‘polymorphic’ malware appeared, signature-based detection

became less effective and was superseded by next-generation, heuristic-based and behavioural-

based detection methods relying on machine learning models [7]. Currently, all the top

malware-detection solutions (also called Endpoint Detection and Response – EDR) are

underpinned by machine learning algorithms [8].
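
To make the signature comparison described above concrete, the following is a minimal Python sketch rather than a description of any real product. The signature names and byte patterns are invented for illustration, and the XOR re-encoding is only a crude stand-in for the way polymorphic malware changes its byte-level appearance.

```python
# Hypothetical signature database: name -> contiguous byte sequence.
# These patterns are invented for illustration; they are not real malware signatures.
SIGNATURES = {
    "Example.Dropper.A": bytes.fromhex("deadbeef4d5a9000"),
    "Example.Worm.B": b"EVIL_PAYLOAD_MARKER",
}


def scan_bytes(data: bytes, signatures: dict[str, bytes] = SIGNATURES) -> list[str]:
    """Return the sorted names of all signatures whose byte sequence occurs in data."""
    return sorted(name for name, pattern in signatures.items() if pattern in data)


def mutate(data: bytes, key: int = 0x5A) -> bytes:
    """XOR every byte with a key (an assumed, simplistic model of re-encoding)."""
    return bytes(b ^ key for b in data)


sample = b"\x00\x01" + SIGNATURES["Example.Worm.B"] + b"\xff"
print(scan_bytes(sample))          # ['Example.Worm.B']
print(scan_bytes(mutate(sample)))  # []
```

The second call returns an empty list because the re-encoded bytes no longer contain any stored pattern, which illustrates why exact byte matching struggles once the same payload is stored in a different encoding.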

This literature review aims to present and analyse the latest efforts in developing novel, more

effective ways to use artificial intelligence for malware detection, with the aim of providing

comprehensive guidance for subsequent research.

This review can, therefore, assist researchers in developing an understanding of the malware

detection field, as well as the new developments and research directions explored by the

scientific community to address this complex challenge.

Numerous papers outline AI-based malware detection techniques. However,

because this review focuses on the most recent trends in AI-based malware detection, papers

published before 2016 are outside the scope of this survey.

The remainder of this paper is structured as follows: Sections 2 and 3 provide an overview of

the main approaches to malware analysis and detection; Section 4 defines feature extraction

and selection; Sections 5, 6 and 7 review the most recent papers using shallow learning, deep

learning and bio-inspired AI algorithms for malware detection in host-based, cloud and IoT

environments; Section 8 presents the state-of-the-art machine learning approaches for Android

malware detection; and Section 9 describes malicious uses of AI for the purpose of designing

malware. Finally, Section 10 provides conclusions, identifying potential challenges and future

directions for the use of AI in malware detection.

2. Malware Analysis Approaches

Malware analysis is the foundation of malware detection and essential for developing effective

malware detection techniques. Without malware analysis, which provides insight into the

classification and functionality of the malicious file, detecting malware could not be achieved.

Malware analysis is divided into static, dynamic and hybrid approaches, as described below

[9]:

• Static analysis, whereby a suspected malicious file is inspected and analysed without

executing it, based on extracted low-level information such as system calls, the control

flow graph, and the data flow graph. Static analysis produces a low number of false

positives; however, it fails to detect unknown malware that uses code obfuscation.

• Dynamic analysis, whereby a suspected malicious file is inspected at runtime, usually

within a sandbox (an isolated virtual machine designed for testing purposes, where

malware can be executed without affecting system resources). The advantage is that the

malware’s behaviour is observed as it executes; therefore, unknown malware can be

detected. However, dynamic analysis is time-consuming and produces a

high rate of false positives.

• Hybrid analysis, whereby characteristics of static and dynamic analysis are combined

to overcome the challenges of the two.

This paper will refer to these three types of analysis to describe the basis of the methods

employed in state-of-the-art machine learning-based malware detection systems.
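
As a rough illustration of static analysis, the Python sketch below inspects a file's bytes without ever executing them. The two measurements chosen here (printable strings and byte entropy) are assumptions made for the example, not the specific low-level features listed above; high entropy is commonly treated as a hint that content may be packed or encrypted, but it is not proof of malice. The sample bytes are made up.

```python
import math
import re
from collections import Counter


def printable_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Extract runs of printable ASCII, similar in spirit to the Unix `strings` tool."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]


def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for constant data, 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Made-up bytes standing in for a suspicious file; nothing here is executed.
sample = b"\x00\x01MZ\x90\x00http://example.invalid/payload\x00\x02cmd.exe /c\x00"
print(printable_strings(sample))  # ['http://example.invalid/payload', 'cmd.exe /c']
print(round(byte_entropy(sample), 2))
```

Features like these are cheap to compute, but an obfuscated sample can hide its strings, which is consistent with the limitation of static analysis noted above.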

3. Malware Detection Techniques

As previously mentioned, malware detection techniques can be classified into three broad

categories: signature-based, heuristic-based, and behavioural-based. These methods rely on

results from malware analysis, and each method has its unique advantages and challenges [9]:

• Signature-based detection – this technique uses a known list of indicators of

compromise (IOCs), which include specific byte sequences, API calls, file hashes,

malicious domains or network attack patterns. Signature-based detection does not require

machine learning models; however, it is incapable of detecting previously unknown or

encrypted malware.

• Behavioural-based detection – this technique involves monitoring a suspected executable

file in an isolated environment, collecting all exhibited behaviours, and then extracting

useful features from which a machine learning model can classify the malicious

behaviour.

• Heuristic-based detection – this technique generates rules from the results of

static/dynamic analysis; these rules guide the inspection of the extracted data and

support the proposed malware detection model. Such rules can either be generated

manually (relying on the expertise of security analysts) or automatically, using

machine learning or tools such as YARA.
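
The heuristic approach can be sketched as a small set of hand-written rules applied to an analysis report. Everything in the Python example below is hypothetical: the report fields (`behaviours`, `imports`, `max_section_entropy`), the rule names, the weights and the threshold are assumptions chosen for illustration, and the code does not reproduce YARA's rule language.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    weight: int
    predicate: Callable[[dict], bool]


# Hypothetical analyst-written rules over a hypothetical static/dynamic report.
RULES = [
    Rule("writes_to_autorun_key", 3,
         lambda r: "registry_autorun" in r.get("behaviours", [])),
    Rule("high_entropy_section", 2,
         lambda r: r.get("max_section_entropy", 0.0) > 7.2),
    Rule("imports_process_injection_api", 3,
         lambda r: {"VirtualAllocEx", "WriteProcessMemory"} <= set(r.get("imports", []))),
]


def evaluate(report: dict, rules: list[Rule] = RULES, threshold: int = 5) -> tuple[bool, list[str]]:
    """Flag the report if the summed weight of fired rules reaches the threshold."""
    fired = [rule for rule in rules if rule.predicate(report)]
    score = sum(rule.weight for rule in fired)
    return score >= threshold, [rule.name for rule in fired]


suspicious = {
    "behaviours": ["registry_autorun"],
    "max_section_entropy": 6.1,
    "imports": ["VirtualAllocEx", "WriteProcessMemory", "CreateFileW"],
}
print(evaluate(suspicious))  # (True, ['writes_to_autorun_key', 'imports_process_injection_api'])
print(evaluate({}))          # (False, [])
```

Returning the names of the fired rules alongside the verdict keeps the decision explainable, which is one reason manually written rules remain attractive to analysts.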

4. Feature Extraction and Selection

Throughout this paper, the term ‘features’ will be frequently used in relation to various machine

learning algorithms. In machine learning, feature extraction or feature engineering is the

process of transforming raw data into numerical features that can be ‘understood’ by the

machine learning algorithm. Feature extraction is necessary to improve the model’s

effectiveness, as applying machine learning directly to raw data is generally ineffective.
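
As a minimal sketch of feature extraction, the Python example below turns raw API call traces into fixed-length numerical vectors by counting call bigrams. The choice of bigram counts and the example traces are assumptions made for illustration; the papers reviewed here use a variety of other features.

```python
from collections import Counter


def ngrams(calls: list[str], n: int = 2) -> list[tuple[str, ...]]:
    """Return all length-n windows over a call trace (empty if the trace is too short)."""
    return [tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)]


def build_vocabulary(traces: list[list[str]], n: int = 2) -> list[tuple[str, ...]]:
    """Collect every n-gram seen in the traces, in a fixed (sorted) order."""
    return sorted({gram for trace in traces for gram in ngrams(trace, n)})


def vectorise(trace: list[str], vocab: list[tuple[str, ...]], n: int = 2) -> list[int]:
    """Map one trace to a count vector with one position per vocabulary n-gram."""
    counts = Counter(ngrams(trace, n))
    return [counts[gram] for gram in vocab]


# Illustrative traces only; real traces would come from a sandbox run.
traces = [
    ["CreateFile", "WriteFile", "CloseHandle"],
    ["CreateFile", "WriteFile", "WriteFile", "CloseHandle"],
]
vocab = build_vocabulary(traces)
print([vectorise(trace, vocab) for trace in traces])  # [[1, 1, 0], [1, 1, 1]]
```

Every trace maps to a vector of the same length, so the output can be passed directly to a classifier that expects numerical input.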

On the other hand, feature selection is the process of removing unnecessary features to assist

with developing a predictive model.
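
One simple way to illustrate feature selection is a variance filter, which discards features that take the same value in every sample and so cannot help a model separate classes. The Python sketch below is only one of many possible selection strategies, and the feature names and values are invented for the example.

```python
from statistics import pvariance


def select_by_variance(rows: list[list[float]], names: list[str], threshold: float = 0.0):
    """Keep only the columns whose population variance exceeds the threshold."""
    columns = list(zip(*rows))
    keep = [i for i, column in enumerate(columns) if pvariance(column) > threshold]
    kept_names = [names[i] for i in keep]
    reduced = [[row[i] for i in keep] for row in rows]
    return kept_names, reduced


# Invented feature matrix: one row per sample, one column per feature.
names = ["has_pe_header", "calls_crypto_api", "num_sections"]
rows = [
    [1, 0, 3],
    [1, 1, 3],
    [1, 0, 4],
]
kept_names, reduced = select_by_variance(rows, names)
print(kept_names)  # ['calls_crypto_api', 'num_sections']
print(reduced)     # [[0, 3], [1, 3], [0, 4]]
```

Here `has_pe_header` is constant across all samples, so it is removed, while the two features that vary are retained.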

In a machine learning-based malware detection system, feature extraction and selection are

critical steps in developing the system. Tables II, IV, V and VI below detail the features

extracted by the authors of each paper evaluated in this review.

5. Shallow Learning-Based Classification Methods for Malware Detection

Shallow learning (SL) generally refers to the majority of machine learning models proposed

prior to 2006 or, more precisely, to any machine learning model not classified as deep

learning. SL approaches traditionally depend heavily on features manually designed to solve a

given task [10].

Nevertheless, shallow learning algorithms are still widely used, including in cyber security

more broadly and malware detection in particular, as seen in Table II.
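
To show how a shallow model consumes manually designed features, here is a minimal Python sketch of k-nearest neighbours, a classic non-deep classifier. It is used here purely as an example and is not claimed to be the method of any paper in this review. The two features (a file entropy value and a count of suspicious imports) and all training values are invented.

```python
import math
from collections import Counter


def knn_predict(train: list[tuple[list[float], str]], query: list[float], k: int = 3) -> str:
    """Label the query by majority vote among its k nearest training points."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Invented training set: ([entropy, suspicious_import_count], label).
train = [
    ([7.8, 5], "malware"),
    ([7.5, 4], "malware"),
    ([7.9, 6], "malware"),
    ([4.2, 0], "benign"),
    ([5.0, 1], "benign"),
    ([4.8, 0], "benign"),
]
print(knn_predict(train, [7.6, 5]))  # malware
print(knn_predict(train, [4.5, 0]))  # benign
```

The classifier has no learned representation of its own: its accuracy depends entirely on how well the hand-designed features (and their relative scales) separate the classes, which is the dependency on manual feature design noted above.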

Fig. 1 below lists the most commonly used SL-based classification methods.

Fig. 1 Common shallow learning algorithms

Table I provides explanations for the acronyms used in Fig. 1:
