
extracting TTP is error-prone and inefficient [14]. Cybersecu-
rity researchers have proposed automated extraction of TTP
from threat reports (e.g. [5,14,15,28,29,34]). Moreover,
the MITRE [2] organization uses an open-source tool [3]
for finding TTPs in threat reports. These TTP extraction efforts use natural language processing (NLP) along with supervised and unsupervised machine learning (ML) techniques to classify text into the corresponding TTPs. However, no comparison among these existing efforts has been conducted,
and the research has not involved an established ground truth
dataset [34], highlighting the need for a comparison of under-
lying methods of existing TTP extraction work. A comparative study among these methods would provide cybersecurity researchers and practitioners with a baseline for choosing the best method for TTP extraction and for identifying room for improvement.
Rahman et al. systematically surveyed the literature and
obtained ten TTP extraction studies [34]. None of these
studies compared their work with a common baseline and
only two of these studies [5,15] compared their results with one another. In our work, we first select five studies [5,14,21,
28,29] from the ten based on inclusion criteria (Section 3.2)
and implement the underlying methods of the five selected
studies. We then compare the methods' performance in classifying text (i.e., attack procedure descriptions) into the corresponding attack techniques. Moreover, as the number of attack techniques is growing due to the evolution of attacks, we also investigate (i) how the methods perform given that the
dataset has class imbalance problems (existence of majority
and minority classes); and (ii) how the methods perform when we increase the number of classification labels (labels are the names of the techniques to be classified from attack procedure descriptions).
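The classification task described above, mapping an attack procedure description to a technique label, can be illustrated with a minimal sketch. This is a hypothetical bag-of-words nearest-match classifier with toy example texts, not the actual method of any surveyed study; the technique names merely follow ATT&CK naming style.

```python
# Hypothetical sketch: classify a procedure description by cosine
# similarity between bag-of-words vectors (NOT any surveyed study's method).
import math
from collections import Counter

def bow(text):
    """Lowercased bag-of-words vector as a token -> count mapping."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy labeled procedure texts; labels follow ATT&CK-style technique names.
examples = {
    "OS Credential Dumping": "dump credentials password hashes from lsass memory",
    "Scheduled Task/Job": "create scheduled task to run malware for persistence",
    "Exfiltration Over C2 Channel": "send stolen files over the command and control channel",
}

def classify(text):
    """Return the technique whose example text is most similar."""
    v = bow(text)
    return max(examples, key=lambda t: cosine(v, bow(examples[t])))

print(classify("the backdoor dumps password hashes from memory"))
# -> OS Credential Dumping
```

The surveyed methods replace this toy representation with richer NLP features and supervised or unsupervised ML classifiers, but the input/output contract is the same: procedure text in, technique label out.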
The goal of this study is to aid cybersecurity researchers and practitioners in choosing attack technique extraction methods for monitoring and sharing threat intelligence by comparing the underlying methods from the TTP extraction studies in the literature. We investigate the following research questions
(RQs):
RQ1: Classification performance. How do the TTP extraction methods perform in classifying textual descriptions of attack procedures to attack techniques across different classifiers?

RQ2: Effect of class imbalance mitigation. What is the effect on the performance of the compared TTP extraction methods when oversampling is applied to mitigate class imbalance?

RQ3: Effect of increase in class labels. How do the TTP extraction methods perform when the number of class labels is increased exponentially?
We implement the underlying methods of these five studies:
[5,14,21,28,29]. We construct a pipeline for comparing the methods within the same machine learning workflow. We run the comparison using a dataset constructed from the
MITRE ATT&CK framework [24]. We also run the methods
on oversampled data to investigate how the effect of class
imbalance can be mitigated. Finally, we use six different multiclass classification settings (n = 2, 4, 8, 16, 32, 64, where n denotes the number of class labels) to investigate how the methods perform in classifying a large number of available TTPs. We list our contributions below.
• A comparative study of the five TTP extraction methods from the literature. This article, to the best of our knowledge, is the first study to conduct direct comparisons of the TTP extraction methods.
• A sensitivity analysis on the effect of oversampling and multiclass classification on the compared methods. Our work investigates these two important aspects of classification because the number of techniques exceeds one hundred and the technique enumeration is updated continually, resulting in majority and minority classes.
• A pipeline for conducting the comparison settings, which ensures the methods are executed in the same machine learning workflow. We also make our dataset and implementation source code available at [4] for future researchers. The pipeline, along with the dataset and implementation sources, serves as a baseline for cybersecurity researchers to test and compare the performance of future TTP extraction methods.
• We provide recommendations on how the methods can be improved for better extraction performance.
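The two parts of the sensitivity analysis can be sketched as follows. This is a minimal illustration with hypothetical helper functions, not the paper's actual pipeline: (i) random oversampling duplicates minority-class samples until every class matches the majority class, and (ii) the label settings grow exponentially as n = 2, 4, 8, ..., 64.

```python
# Hypothetical sketch of the two comparison settings (NOT the paper's code):
# (i) random oversampling to balance classes, (ii) exponential label subsets.
import random
from collections import Counter

def oversample(samples):
    """Duplicate minority-class samples at random until every class
    reaches the size of the largest class."""
    rng = random.Random(0)  # fixed seed for reproducibility
    by_label = {}
    for text, label in samples:
        by_label.setdefault(label, []).append((text, label))
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced

def label_settings(all_labels, max_n=64):
    """Nested label subsets of size 2, 4, 8, ... up to max_n."""
    n, settings = 2, []
    while n <= min(max_n, len(all_labels)):
        settings.append(all_labels[:n])
        n *= 2
    return settings

# Toy dataset: six majority-class samples and one minority-class sample.
data = [("t%d" % i, "majority") for i in range(6)] + [("t6", "minority")]
balanced = oversample(data)
print(Counter(label for _, label in balanced))  # both classes now size 6
```

With 64 or more labels, `label_settings` yields six settings of sizes 2, 4, 8, 16, 32, and 64, matching the n values used in the comparison.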
Example 1: Excerpt from a threat report on the SolarWinds attack showing attackers' actions in bold text

After an initial dormant period of up to two weeks, it retrieves and executes commands, called “Jobs”, that include the ability to transfer files, execute files, profile the system, reboot the machine, and disable system services. The malware masquerades its network traffic as the Orion Improvement Program (OIP) protocol and stores reconnaissance results within legitimate plugin configuration files allowing it to blend in with legitimate SolarWinds activity. The backdoor uses multiple obfuscated blocklists to identify forensic and anti-virus tools running as processes, services, and drivers.

Source: FireEye [7]
The rest of the article is organized as follows. In Section 2,
we discuss a few key concepts relevant to this study. In Sections 3 and 4, we discuss our process to identify the selected