
extracting TTP is error-prone and inefficient [14]. Cybersecu-
rity researchers have proposed automated extraction of TTP
from threat reports (e.g. [5,14,15,28,29,34]). Moreover,
the MITRE [2] organization uses an open-source tool [3]
for finding TTPs in threat reports. These TTP extraction efforts use natural language processing (NLP) along with supervised and unsupervised machine learning (ML) techniques to classify text into the corresponding TTPs. However, no comparison among these existing efforts has been conducted,
and the research has not involved an established ground truth
dataset [34], highlighting the need for a comparison of under-
lying methods of existing TTP extraction work. A comparative study among these methods would provide cybersecurity researchers and practitioners with a baseline for choosing the best method for TTP extraction and for identifying room for improvement.
Rahman et al. systematically surveyed the literature and
obtained ten TTP extraction studies [34]. None of these
studies compared their work with a common baseline and
only two of these studies [5,15] compared their results with one another. In our work, we first select five studies [5,14,21,
28,29] from the ten based on inclusion criteria (Section 3.2)
and implement the underlying methods of the five selected
studies. We then compare the methods' performance in classifying text (i.e., attack procedure descriptions) into the corresponding attack techniques. Moreover, as the number of attack techniques is growing due to the evolution of attacks, we also investigate (i) how the methods perform given that the
dataset has class imbalance problems (existence of majority
and minority classes); and (ii) how the methods perform when we increase the number of classification labels (labels are the names of the techniques to be classified from attack procedure descriptions).
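The classification task described above, mapping an attack procedure description to a technique label, can be illustrated with a minimal sketch. This is a hypothetical bag-of-words nearest-match classifier with toy example texts, not the actual method of any surveyed study; the technique names merely follow ATT&CK naming style.

```python
# Hypothetical sketch: classify a procedure description by cosine
# similarity between bag-of-words vectors (NOT any surveyed study's method).
import math
from collections import Counter

def bow(text):
    """Lowercased bag-of-words vector as a token -> count mapping."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy labeled procedure texts; labels follow ATT&CK-style technique names.
examples = {
    "OS Credential Dumping": "dump credentials password hashes from lsass memory",
    "Scheduled Task/Job": "create scheduled task to run malware for persistence",
    "Exfiltration Over C2 Channel": "send stolen files over the command and control channel",
}

def classify(text):
    """Return the technique whose example text is most similar."""
    v = bow(text)
    return max(examples, key=lambda t: cosine(v, bow(examples[t])))

print(classify("the backdoor dumps password hashes from memory"))
# -> OS Credential Dumping
```

The surveyed methods replace this toy representation with richer NLP features and supervised or unsupervised ML classifiers, but the input/output contract is the same: procedure text in, technique label out.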
The goal of this study is to aid cybersecurity researchers and practitioners in choosing attack technique extraction methods for monitoring and sharing threat intelligence by comparing the underlying methods from the TTP extraction studies in the literature. We investigate the following research questions
(RQs):
RQ1: Classification performance. How do the TTP extraction methods perform in classifying textual descriptions of attack procedures to attack techniques across different classifiers?

RQ2: Effect of class imbalance mitigation. What is the effect on the performance of the compared TTP extraction methods when oversampling is applied to mitigate class imbalance?

RQ3: Effect of increase in class labels. How do the TTP extraction methods perform when the number of class labels is increased exponentially?
We implement the underlying methods of these five studies:
[5,14,21,28,29]. We construct a pipeline for comparing the methods within the same machine learning workflow. We run the comparison using a dataset constructed from the
MITRE ATT&CK framework [24]. We also run the methods
on oversampled data to investigate how the effect of class
imbalance can be mitigated. Finally, we use six different multiclass classification settings (n = 2, 4, 8, 16, 32, 64, where n denotes the number of class labels) to investigate how the methods perform in classifying a large number of available TTPs. We list our contributions below.
• A comparative study of the five TTP extraction methods from the literature. This article, to the best of our knowledge, is the first study to conduct direct comparisons of the TTP extraction methods.
• A sensitivity analysis on the effect of oversampling and multiclass classification on the compared methods. Our work investigates these two important aspects of classification because the number of techniques exceeds one hundred and the technique enumeration is updated continually, resulting in majority and minority classes.
• A pipeline for conducting the comparison settings, which ensures the methods are executed in the same machine learning workflow. We also make our dataset and implementation source code available at [4] for future researchers. The pipeline, along with the dataset and implementation sources, serves as a baseline for cybersecurity researchers to test and compare the performance of future TTP extraction methods.
• We provide recommendations on how the methods can be improved for better extraction performance.
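The two parts of the sensitivity analysis can be sketched as follows. This is a minimal illustration with hypothetical helper functions, not the paper's actual pipeline: (i) random oversampling duplicates minority-class samples until every class matches the majority class, and (ii) the label settings grow exponentially as n = 2, 4, 8, ..., 64.

```python
# Hypothetical sketch of the two comparison settings (NOT the paper's code):
# (i) random oversampling to balance classes, (ii) exponential label subsets.
import random
from collections import Counter

def oversample(samples):
    """Duplicate minority-class samples at random until every class
    reaches the size of the largest class."""
    rng = random.Random(0)  # fixed seed for reproducibility
    by_label = {}
    for text, label in samples:
        by_label.setdefault(label, []).append((text, label))
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced

def label_settings(all_labels, max_n=64):
    """Nested label subsets of size 2, 4, 8, ... up to max_n."""
    n, settings = 2, []
    while n <= min(max_n, len(all_labels)):
        settings.append(all_labels[:n])
        n *= 2
    return settings

# Toy dataset: six majority-class samples and one minority-class sample.
data = [("t%d" % i, "majority") for i in range(6)] + [("t6", "minority")]
balanced = oversample(data)
print(Counter(label for _, label in balanced))  # both classes now size 6
```

With 64 or more labels, `label_settings` yields six settings of sizes 2, 4, 8, 16, 32, and 64, matching the n values used in the comparison.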
Example 1: Excerpt from a threat report on the SolarWinds attack showing attackers' actions in bold text

After an initial dormant period of up to two weeks, it retrieves and executes commands, called “Jobs”, that include the ability to transfer files, execute files, profile the system, reboot the machine, and disable system services. The malware masquerades its network traffic as the Orion Improvement Program (OIP) protocol and stores reconnaissance results within legitimate plugin configuration files allowing it to blend in with legitimate SolarWinds activity. The backdoor uses multiple obfuscated blocklists to identify forensic and anti-virus tools running as processes, services, and drivers.

Source: FireEye [7]
The rest of the article is organized as follows. In Section 2,
we discuss a few key concepts relevant to this study. In Sections 3 and 4, we discuss our process to identify the selected