From Threat Reports to Continuous Threat Intelligence: A Comparison of Attack
Technique Extraction Methods from Textual Artifacts
Md Rayhanur Rahman, Laurie Williams
mrahman@ncsu.edu, lawilli3@ncsu.edu
North Carolina State University
Abstract
The cyberthreat landscape is continuously evolving. Hence,
continuous monitoring and sharing of threat intelligence have
become a priority for organizations. Threat reports, published
by cybersecurity vendors, contain detailed descriptions of at-
tack Tactics, Techniques, and Procedures (TTP) written in
an unstructured text format. Extracting TTP from these reports helps cybersecurity practitioners and researchers learn about and adapt to evolving attacks and plan threat mitigation. Researchers have proposed TTP extraction methods in the literature; however, not all of these methods have been compared to one another or to a baseline. The goal of this study is to aid cybersecurity researchers and practitioners in choosing attack technique extraction methods for monitoring and sharing threat intelligence by comparing the underlying methods from the TTP extraction studies in the literature. In
this work, we identify ten existing TTP extraction studies from
the literature and implement five methods from the ten stud-
ies. We find two methods, based on Term Frequency-Inverse
Document Frequency(TFIDF) and Latent Semantic Indexing
(LSI), outperform the other three methods with a F1 score of
84% and 83%, respectively. We observe the performance of
all methods in F1 score drops in the case of increasing the
class labels exponentially. We also implement and evaluate
an oversampling strategy to mitigate class imbalance issues.
Furthermore, oversampling improves the classification perfor-
mance of TTP extraction. We provide recommendations from
our findings for future cybersecurity researchers, such as the
construction of a benchmark dataset from a large corpus; and
the selection of textual features of TTP. Our work, along with
the dataset and implementation source code, can work as a
baseline for cybersecurity researchers to test and compare the
performance of future TTP extraction methods.
1 Introduction
Information technology (IT) systems have been gaining
continuous attention from threat actors with financial mo-
tives [19] and organized backing (i.e., state sponsored [26]).
For example, in 2021, Sonatype reported that software sup-
ply chain attacks increased by 650% in 2020 from the previous year [40]. Moreover, cyberattacks on the Colonial Pipeline [46], JBS [31], and Ireland's health services [27] show that threat actors can destabilize millions of people's lives through fuel price surges, food supply shortages, and disruptions in healthcare services. Thwarting cyberattacks has become
more complicated as the threat landscape evolves rapidly.
Hence, continuous monitoring and sharing of threat intelli-
gence has become a priority, as emphasized in Section 2(iv)
of the US Executive Order 14028: Improving the Nation’s
cybersecurity: “service providers share cyber threat and inci-
dent information with agencies, doing so, where possible, in
industry-recognized formats for incident response and remediation." [18].
Threat reports, published by cybersecurity vendors and researchers, contain detailed descriptions of how malicious actors utilize specific tactics, apply relevant techniques, and carry out procedures to perform an attack, known as Tactics, Techniques, and Procedures (TTP) (see Section 2.2) [24,42,44], to
launch cyberattacks. Consider a threat report from FireEye de-
scribing the attack procedures of the Solarwinds supply chain
attack [7] in Example 1, where we show the attackers’ actions
in bold text. One of the observed (mentioned in the report)
TTP is T1518.001: Security Software Discovery which al-
lows an attacker to bypass the security defense by discovering
security software running in the system [23]. The rise in cy-
berattack incidents with evolving attack techniques results in
a growing number and volume of threat reports. Extracting the
TTP from threat reports can help cybersecurity practitioners
and researchers with cyberattack characterization, detection,
and mitigation [14] from the past knowledge of cyberattacks.
Analyzing TTP also helps cybersecurity practitioners in con-
tinuous monitoring and sharing of threat intelligence. For
example, organizations can learn how to adapt to the evolu-
tion of cyberattacks. Cybersecurity red and blue teams also
benefit in threat hunting by threat intelligence sharing [44],
attack profiling [29], and forecasting [37].
Threat reports contain a large amount of text and manually
arXiv:2210.02601v1 [cs.CR] 5 Oct 2022
extracting TTP is error-prone and inefficient [14]. Cybersecu-
rity researchers have proposed automated extraction of TTP
from threat reports (e.g. [5,14,15,28,29,34]). Moreover,
the MITRE [2] organization uses an open-source tool [3]
for finding TTP from threat reports. These TTP extraction works use natural language processing (NLP) along with supervised and unsupervised machine learning (ML) techniques to classify text into the corresponding TTPs. However, no comparison among these existing works has been conducted, and the research has not involved an established ground truth
dataset [34], highlighting the need for a comparison of the underlying methods of existing TTP extraction work. A comparative study among these methods would provide cybersecurity researchers and practitioners with a baseline for choosing the best method for TTP extraction and for identifying room for improvement.
Rahman et al. systematically surveyed the literature and
obtained ten TTP extraction studies [34]. None of these
studies compared their work with a common baseline and
only two of these studies [5,15] compared their results with
one another. In our work, we first select five studies [5,14,21,
28,29] from the ten based on inclusion criteria (Section 3.2)
and implement the underlying methods of the five selected
studies. We then compare the performance of classifying text
(i.e., attack procedure description) to the corresponding attack
techniques. Moreover, as the number of attack techniques is growing due to the evolution of attacks, we
also investigate (i) how the methods perform given that the
dataset has class imbalance problems (existence of majority
and minority classes); and (ii) how the methods perform when
we increase the classification labels (labels are the name of
techniques that would be classified from attack procedure
descriptions).
The goal of this study is to aid cybersecurity researchers and practitioners in choosing attack technique extraction methods for monitoring and sharing threat intelligence by comparing the underlying methods from the TTP extraction studies in the literature. We investigate the following research questions (RQs):
RQ1: Classification performance. How do the TTP extraction methods perform in classifying textual descriptions of attack procedures to attack techniques across different classifiers?

RQ2: Effect of class imbalance mitigation. What is the effect on the performance of the compared TTP extraction methods when oversampling is applied to mitigate class imbalance?

RQ3: Effect of increase in class labels. How do the TTP extraction methods perform when the number of class labels is increased exponentially?
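The oversampling examined in RQ2 can be illustrated with a minimal sketch (pure Python, hypothetical data; the strategy evaluated in the study may differ in detail): minority classes are randomly resampled with replacement until every class matches the majority class size.

```python
import random
from collections import Counter

def oversample(samples, labels, seed=0):
    """Randomly duplicate minority-class samples until every class
    matches the size of the largest (majority) class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(v) for v in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        out_x.extend(xs)
        out_y.extend([y] * len(xs))
        # Draw (target - len(xs)) extra samples with replacement
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(extra)
        out_y.extend([y] * len(extra))
    return out_x, out_y

# Hypothetical imbalanced dataset: 4 'T1059' samples vs 1 'T1518' sample
x, y = oversample(["a", "b", "c", "d", "e"],
                  ["T1059", "T1059", "T1059", "T1059", "T1518"])
print(Counter(y))  # both classes now have 4 samples
```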
We implement the underlying methods of these five studies:
[5,14,21,28,29]. We construct a pipeline for comparing
the methods on the same machine learning workflow. We
run the comparison utilizing a dataset constructed from the
MITRE ATT&CK framework [24]. We also run the methods
on oversampled data to investigate how the effect of class
imbalance can be mitigated. Finally, we use six different
multiclass classification settings (n = 2, 4, 8, 16, 32, 64, where n denotes the number of class labels) to investigate how the methods perform in classifying a large number of available TTP. We list our contributions below.
- A comparative study of the five TTP extraction methods from the literature. This article, to the best of our knowledge, is the first study to conduct direct comparisons of the TTP extraction methods.
- A sensitivity analysis of the effect of using oversampling and multiclass classification on the compared methods. Our work investigates these two important aspects of classification because the number of techniques is more than one hundred and the technique enumeration is being updated gradually, resulting in majority and minority classes.
- A pipeline for conducting the comparison settings, which ensures the methods are executed in the same machine learning workflow. We also make our dataset and implementation source code available at [4] for future researchers. The pipeline, along with the dataset and implementation sources, serves as a baseline for cybersecurity researchers to test and compare the performance of future TTP extraction methods.
- Recommendations on how the methods can be improved for better extraction performance.
Example 1: Excerpt from a threat report on the SolarWinds attack showing the attackers' actions in bold text

After an initial dormant period of up to two weeks, it **retrieves** and **executes** commands, called "Jobs", that include the ability to **transfer** files, **execute** files, **profile** the system, **reboot** the machine, and **disable** system services. The malware **masquerades** its network traffic as the Orion Improvement Program (OIP) protocol and **stores** reconnaissance results within legitimate plugin configuration files, allowing it to blend in with legitimate SolarWinds activity. The backdoor uses multiple obfuscated blocklists to **identify** forensic and anti-virus tools running as processes, services, and drivers.

Source: FireEye [7]
The rest of the article is organized as follows. In Section 2, we discuss a few key concepts relevant to this study. In Sections 3 and 4, we discuss our process to identify the selected studies for comparison. In Section 5, we discuss our methodology for designing and running the experiment. In Sections 6 and 7, we report and discuss our observations from the experiment. In Sections 9 and 8, we identify several limitations of our work, followed by highlighting potential future research paths. In Section 10, we discuss related work in the literature, followed by concluding the article in Section 11. We report supplementary information in the Appendix.
2 Key Concepts
In this section, we discuss several key concepts relevant in
the context of our study.
2.1 Threat Intelligence:
Threat intelligence - also known as Cyberthreat intelligence
(CTI) - is defined as "evidence-based knowledge, including context, mechanisms, indicators, implications, and actionable advice about an existing or emerging menace or hazard to assets that can be used to inform decisions regarding the subject's response to that menace or hazard" [22]. Threat intelligence can be used to forecast, prevent, and defend against attacks.
2.2 Tactics, techniques and procedures (TTP):
Tactics are high level goals of an attacker, whereas techniques
are lower level descriptions of the execution of the attack in
the context of a given tactic [24,44]. Procedures are the lowest-level, step-by-step execution of an attack. TTP
can be used to profile or analyze the lifecycle of an attack on
a targeted system. For example, privilege escalation is a tactic
for gaining elevated permission on a system. One technique
for privilege escalation can be access token manipulation [24].
An attacker can gain elevated privileges in a system by tampering with the access token to bypass the access control mechanism.
An example procedure is an attacker manipulating an access
token by using Metasploit’s named-pipe impersonation [24].
2.3 ATT&CK:
The MITRE [2] organization developed ATT&CK [24], a
framework derived from real-world observations of adversarial TTP deployed by attack groups. ATT&CK contains
an enumeration of high level attack stages known as tactics.
Each tactic has an enumeration of corresponding techniques,
and each technique has associated procedure description(s).
Procedures are written in unstructured text and describe how
a particular technique has been used by the attacker to gain an
objective of the corresponding tactic to launch a cyberattack.
ATT&CK was first introduced in 2013 to model the lifecycle
and common TTP utilized by threat actors in launching APT
(advanced persistent threat) attacks. In our research, we uti-
lized Version 9 of the ATT&CK framework which consists of
14 Tactics, 170 Techniques, and 8,104 procedures.
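For illustration, the (technique, procedure) pairs in such a dataset can be assembled from ATT&CK's machine-readable STIX export, where procedure descriptions are attached to techniques via "uses" relationships. The sketch below treats the exact schema as an assumption and runs on a toy, STIX-like bundle rather than the real export.

```python
def extract_procedures(stix_objects):
    """Map ATT&CK technique IDs (e.g., 'T1518.001') to the free-text
    procedure descriptions attached to them via 'uses' relationships."""
    # Technique STIX id -> ATT&CK id (taken from external_references)
    techniques = {}
    for obj in stix_objects:
        if obj.get("type") == "attack-pattern":
            for ref in obj.get("external_references", []):
                if ref.get("source_name") == "mitre-attack":
                    techniques[obj["id"]] = ref["external_id"]
    pairs = []
    for obj in stix_objects:
        if (obj.get("type") == "relationship"
                and obj.get("relationship_type") == "uses"
                and obj.get("target_ref") in techniques
                and obj.get("description")):
            pairs.append((techniques[obj["target_ref"]], obj["description"]))
    return pairs

# Toy STIX-like bundle (shape is an assumption, not real ATT&CK data)
sample = [
    {"type": "attack-pattern", "id": "attack-pattern--1",
     "external_references": [{"source_name": "mitre-attack",
                              "external_id": "T1518.001"}]},
    {"type": "relationship", "relationship_type": "uses",
     "target_ref": "attack-pattern--1",
     "description": "The backdoor lists running security products."},
]
print(extract_procedures(sample))
```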
3 Selection of TTP extraction methods
In this section, we discuss the methodology for selecting
and comparing the TTP extraction methods in five studies
[5,14,21,28,29] found in the literature.
3.1 Finding TTP extraction work from the literature:
Rahman et al. [34] systematically collected automated
threat intelligence extraction-related studies from scholarly
databases and found 64 relevant studies. From these, the first
author of this paper identified ten studies that extracted TTP
from the text automatically using NLP and ML techniques.
We select these ten works as potential candidates for our comparison study and refer to them as the candidate set.
In the Appendix, Table 7, we list the bibliographic information
of the candidate set.
3.2 Inclusion criteria for TTP extraction work:
A comprehensive comparison of TTP extraction methods is
not a straightforward task. One difficulty in setting up the
study is to find a labelled and universally agreed upon dataset.
Moreover, constructing such a dataset is inherently challenging, as the set of TTP is subject to change as attacks evolve. Another challenge is to determine
whether the extraction should be performed on the sentence
level or paragraph level. Finally, in the candidate set, TTP ex-
traction methods were designed targeting different use cases,
such as transforming the extracted TTP to structured threat
intelligence formats [14] or building a knowledge graph [32].
Hence, not every study in the candidate set is able to extract all known TTP. We therefore define the following inclusion criteria:
1. All methods selected for the comparison can work on the same textual artifacts.
2. Besides labelling the text with the corresponding technique, no other manual labelling is required for comparison.
3. All methods can be compared using the same set of technique names, which will be used as labels for the classification tasks.
Id  | Dataset type | Dataset source | # threat reports | NLP/ML techniques and features
S1* | Data breach incident reports | Github APTnotes [6] and custom search engine [1] | 327 | Latent Semantic Indexing (LSI)
S2* | APT attack reports | Github APTnotes | 445 | Dependency parsing, TFIDF of independent noun phrases
S3  | APT attack reports | Github APTnotes | 50 | Named entity recognition (NER)
S4* | APT attack reports | Github APTnotes | 18,257 | TFIDF
S5  | Malware report | Github APTnotes, Microsoft/Adobe Security Bulletins, National Vulnerability Database description | 474 | NER, Cybersecurity ontology
S6* | APT attack reports | Attack technique dataset (source not reported) | 200 | LSI
S7  | Computer security literature and Android developer documentation | IEEE S&P, CCS, USENIX articles, Android API [12] | 1,068 | Dependency parsing
S8  | - | - | 18 | NER, Dependency parsing, Basilisk
S9* | Malware report | Symantec threat reports | 17,000 | Dependency parsing, BM25
S10 | Malware report | Symantec threat reports | 2,200 | Dependency parsing, BM25

Id with (*) symbol denotes that the study is selected for comparison

Table 1: Datasets and methods used in candidate set
3.3 Filtering the TTP extraction work for comparison:
In Table 1, we report the dataset type, dataset source, and
relevant NLP/ML techniques used for our candidate set. Next,
we report how we filter the candidate set.
- We drop S3, S5, and S8 because Named Entity Recognition (NER) labelling of words from the text is required (violates inclusion criterion 2).
- We drop S7 because this work (a) uses Android development documentation (violates inclusion criterion 1), and (b) extracts features for Android-specific malware only (violates inclusion criterion 3).
- We drop S10 because the work requires additional manual work to identify relevant verbs and objects from Wikipedia articles on computing and cybersecurity related concepts (violates inclusion criterion 2).
Finally, we keep the remaining work for our comparison study: S1, S2, S4, S6, and S9. S1 and S6 utilized Latent Semantic Indexing (LSI) [20]; S2 and S4 utilized Term Frequency-Inverse Document Frequency (TFIDF); and S9 utilized dependency parsing and BM25.
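As a rough sketch of the two dominant feature representations among the selected studies (using scikit-learn; the toy corpus is ours, purely illustrative):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

corpus = [  # toy procedure descriptions (illustrative only)
    "malware enumerates antivirus processes and services",
    "backdoor discovers security software on the host",
    "attacker escalates privileges via access token manipulation",
    "token impersonation grants elevated system privileges",
]

# TFIDF (S2, S4): sparse term-weight vectors over the vocabulary
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(corpus)

# LSI (S1, S6): TFIDF followed by truncated SVD into a low-rank topic space
lsi = TruncatedSVD(n_components=2, random_state=0)
X_lsi = lsi.fit_transform(X_tfidf)

print(X_tfidf.shape, X_lsi.shape)  # (4, vocabulary size) and (4, 2)
```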
4 Overview of the selected studies for comparison
We report a brief overview of the studies selected for compar-
ison followed by observed similarities and dissimilarities.
S1: The authors used the data breach incident reports produced by cybersecurity vendors and then searched for high-level attack patterns in those reports. The authors used the ATT&CK framework for the common vocabulary of attack pattern names. They used LSI for searching the attack pattern names from the texts. Finally, they correlated these searched attack patterns with the responsible APT actor groups.
S2: The authors used APT attack related articles as the dataset and the MITRE ATT&CK framework for the common vocabulary of TTP. They extracted independent noun phrases, i.e., noun phrases that appear in the corpus at least once without being part of a larger noun phrase, and computed TFIDF vectors of these noun phrases. Finally, using these vectors, they retrieved the most relevant set of articles associated with specific TTP keywords such as data breach and privilege escalation.
S4: The authors used APT attack-related articles and Symantec threat reports as the dataset and the MITRE ATT&CK framework for the common vocabulary of TTP. They computed TFIDF vectors of the articles and then applied three bias correction techniques: kernel mean matching [13], the Kullback-Leibler importance estimation procedure [43], and relative density ratio estimation. Finally, they used an SVM classifier on the bias-corrected data.
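A minimal sketch of the core of S4's approach, TFIDF features fed to an SVM, is shown below; the bias-correction step is omitted, and the texts and technique labels are hypothetical:

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy training texts and technique labels (illustrative only);
# S4's bias-correction step (kernel mean matching etc.) is not shown.
texts = [
    "enumerates antivirus processes", "discovers security software",
    "manipulates access tokens", "impersonates a token for elevation",
]
labels = ["T1518.001", "T1518.001", "T1134", "T1134"]

# TFIDF vectorization followed by a linear SVM classifier
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(texts, labels)
print(clf.predict(["lists installed security products"]))
```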
S6: The authors used advanced persistent threat (APT) attack related online articles as the dataset and the MITRE ATT&CK framework for the common vocabulary of TTP. They first computed the TFIDF vectors of the descriptions of TTP. Then they applied LSI to the articles to retrieve a set of topics. After that, for each article, the authors computed the cosine similarity score between the TFIDF vectors of each TTP and the retrieved topics and used these similarity scores as features. Finally, the authors used two multi-label classification techniques named Binary Relevance and Label Powerset [36,41].
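One way to realize S6's similarity-based features can be sketched as follows (scikit-learn; the texts are toy examples, and representing the retrieved "topics" as an LSI subspace is our simplification of the paper's procedure):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

articles = ["backdoor hides traffic in legitimate protocol",
            "malware checks for antivirus tools before running",
            "attacker steals tokens to elevate privileges"]
ttp_descriptions = ["security software discovery",
                    "access token manipulation"]

# Shared vocabulary, LSI topic space fitted on the articles
vec = TfidfVectorizer().fit(articles + ttp_descriptions)
svd = TruncatedSVD(n_components=2, random_state=0).fit(vec.transform(articles))

# Feature matrix: one row per article, one column per TTP, each entry the
# cosine similarity between article and TTP description in the LSI space.
art_topics = svd.transform(vec.transform(articles))
ttp_topics = svd.transform(vec.transform(ttp_descriptions))
features = cosine_similarity(art_topics, ttp_topics)
print(features.shape)  # (3, 2)
```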