
datasets to which the reference texts could contribute, which is currently not leveraged by related work.
Goal. This work aims to use as much textual data as possible to predict the CVSS vector of a vulnerability, in order to achieve the most accurate estimation of the CVSS vector possible. It should be possible to use not only the short description of the vulnerability, but also other types of texts, such as Twitter posts and news articles, for prediction in case of a new vulnerability. Possible sources of textual information about vulnerabilities should be found and categorized. We aim to answer the following research questions: Where can relevant textual information on vulnerabilities be found outside vulnerability databases (RQ1)? and To which degree are public data sources beyond vulnerability databases suitable for predicting the CVSS vector (RQ2)? This will clarify whether there are typical sources that regularly report on current vulnerabilities and whether these are suitable as a basis for building a dataset for training an ML system.
Here, a first impression shall be gained by a rough manual search, and then the sources referenced in the databases shall be analyzed automatically with regard to the type and scope of the references (e.g., blog posts, patch notes, GitHub issues). With the help of the texts, an ML model for predicting the CVSS vector is to be trained. The data must be filtered and cleaned for this purpose. The ML model shall use Deep Learning and build on state-of-the-art models. The model is evaluated and compared to previous work.
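
To make the prediction target concrete, a CVSS vector can be read as one categorical label per base metric. The following is a minimal sketch, assuming the CVSS v3.1 base vector string format; the helper name `parse_cvss_vector` is hypothetical and is not taken from this work.

```python
from typing import Dict

# Allowed values per CVSS v3.1 base metric (assumption: base metrics only).
BASE_METRICS = {
    "AV": {"N", "A", "L", "P"},  # Attack Vector
    "AC": {"L", "H"},            # Attack Complexity
    "PR": {"N", "L", "H"},       # Privileges Required
    "UI": {"N", "R"},            # User Interaction
    "S": {"U", "C"},             # Scope
    "C": {"H", "L", "N"},        # Confidentiality impact
    "I": {"H", "L", "N"},        # Integrity impact
    "A": {"H", "L", "N"},        # Availability impact
}


def parse_cvss_vector(vector: str) -> Dict[str, str]:
    """Hypothetical helper: split a vector string into {metric: value}."""
    parts = vector.split("/")
    if not parts[0].startswith("CVSS:3"):
        raise ValueError(f"unsupported vector prefix: {vector!r}")
    labels = {}
    for part in parts[1:]:
        metric, _, value = part.partition(":")
        if metric in BASE_METRICS:
            if value not in BASE_METRICS[metric]:
                raise ValueError(f"invalid value {value!r} for metric {metric}")
            labels[metric] = value
    missing = BASE_METRICS.keys() - labels.keys()
    if missing:
        raise ValueError(f"missing base metrics: {sorted(missing)}")
    return labels
```

Under this reading, predicting the vector amounts to eight separate classification decisions, one per metric, each over a small fixed set of values.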
Contributions. The contribution to current research is an analysis of the references contained in the databases. This analysis categorizes the references in terms of certain characteristics and their suitability for ML models and can serve as a starting point for further work on the use of the references (C1). A method that collects and processes the text contained on the referenced web pages is presented. In addition, a system is implemented and evaluated that, unlike previous work, such as Elbaz et al. [12] and Kuehn et al. [22], uses more extensive text from the references in addition to descriptions of vulnerabilities from the databases (C2). This method for predicting CVSS vectors surpasses the current state of the art. Further, we present an extensive explainability analysis of our trained models as part of our evaluation (C3).
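
As an illustration of what categorizing references can look like in practice, the sketch below groups reference URLs by host name. It is only a rough approximation under stated assumptions: the category names and the host-to-category mapping are hypothetical and do not reproduce the characteristics analyzed in this work.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical mapping from host names to coarse reference types.
HOST_CATEGORIES = {
    "github.com": "code hosting / issue tracker",
    "twitter.com": "social media",
    "exploit-db.com": "exploit database",
    "zerodayinitiative.com": "advisory",
}


def categorize_reference(url):
    """Return a coarse category for a reference URL, or 'other' if unknown."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]  # treat www.example.org and example.org alike
    return HOST_CATEGORIES.get(host, "other")


def count_categories(urls):
    """Count how many reference URLs fall into each category."""
    return Counter(categorize_reference(u) for u in urls)
```

A count of this kind gives a first overview of which source types dominate the references; judging their textual suitability for an ML model would require inspecting the page contents as well.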
Outline. The state of the art in research is considered in §2, followed by a preliminary analysis of the references included in the NVD (see also §3). Requirements for references and the texts contained in them are defined and consequently the individual references are evaluated, resulting in a selection of references. §4 explains the procedure for collecting the texts from the references and presents a system for retrieving, processing, and storing the texts. §5 evaluates the ML system, while §6 discusses and compares the results with other work. Finally, a conclusion is drawn in §7.
2 RELATED WORK
This section gives an overview of the state of the art in research. We focus on literature dealing with the prediction of CVSS vectors, scores, or levels. In addition, work that uses sources other than the NVD in this context is considered. Automated assessment should provide a time advantage over the assessment by human experts. In this regard, different papers come to different conclusions regarding the duration of the assessment, and the exact methodology is not always clear. Elbaz et al. [12] state for the observed period from 2007 to 2019 that 90% of vulnerabilities were assessed within just under 30 days, with a median of only one day, while Chen et al. [7] indicate an average of 132 days between publication and assessment for an observed period of 23 months in 2018 and 2019.
NVD, CVSS, Information Sources. Johnson et al. [20] perform a statistical analysis of CVSS vectors in different databases containing vulnerabilities. In doing so, they show that despite different sources, the CVSS vector is always comparable and, consequently, seems to be robust. They state that the NVD is the most robust information source for CVSS information. On the other hand, Dong et al. [10] show that information in the NVD itself is sometimes inconsistent and propose a system that relies on external sources to find, for example, missing versions of the software in question in the NVD. Accordingly, Kuehn et al. [22] present an information quality metric for vulnerability databases and improve several drawbacks in the NVD. In addition to vulnerability databases, other sources of information are used in vulnerability management. Sabottke et al. [29] use Twitter to predict whether a vulnerability will actually be exploited. Almukaynizi et al. [1] go a step further and use other data sources, such as ExploitDB⁴ and Zero Day Initiative⁵. However, no text is used; instead, the mere existence of an article about a vulnerability is used as a feature for the ML model.
CVSS Prediction. A large number of works deal with the prediction of CVSS vectors, scores, or levels starting from text. As one of the first works, Yamamoto et al. [37] use sLDA [26] to predict the CVSS vector based on the descriptions. For predicting the score, Khazaei et al. [21] use Support Vector Machines (SVMs), random forests [4], and fuzzy logic. Spanos and Angelis [33] predict the CVSS vector using random forests and boosting [13]. DL is first used in this context by Han et al. [17]. By using a Convolutional Neural Network (CNN), no feature engineering is required. However, in doing so, the model only determines the CVSS severity level from the options Critical, High, Medium, and Low. Gawron et al. [14] use DL in addition to Naive Bayes, but here the result is a CVSS vector. Twitter serves as the data source for Chen et al. [6]. The ML model is based on Long Short-Term Memory (LSTM) [18] and predicts the CVSS score. Sahin and Tosun [30] also improve on the Han et al. [17] approach by using an LSTM. Gong et al. [15] show a multi-task learning method that sets up multiple classifiers on a single Neural Network (NN), making it more efficient. Liu et al. [25] use the Chinese equivalent, the China National Vulnerability Database of Information Security (CNNVD), as the data source rather than the NVD. Jiang and Atif [19] take scores not only from the NVD but also from other sources as a basis for their prediction of the score. The work of Elbaz et al. [12] focuses on a particularly tractable classification of the CVSS vector. Therefore, they do not use dimension reduction techniques. Kuehn et al. [22] use DL to predict the CVSS vector, based on the NVD's descriptions, with the goal to aid security experts in their final decision. The most recent approach was proposed by Shahid and Debar [32], which uses a separate classifier based on Bidirectional Encoder Representations from
⁴ https://www.exploit-db.com/
⁵ https://www.zerodayinitiative.com/