Analysis Approach [14], [15]. In this research, we have opted
for the dynamic analysis approach for its ability to detect and
classify ransomware based on behavioral patterns regardless of
the code obfuscation techniques deployed by the ransomware
programmers [16], [17]. The main contributions of this paper
are:
• Develop a web crawler, ‘GetRansomware’, to automate
collecting the Windows Portable Executable (PE) files of
15 different ransomware families from the ransomware
repository. The crawler automates searching for and
downloading the samples and cuts down the manual
workload; no prior work has targeted this
scenario.
• Develop our dataset and conduct feature selection through
a two-phase feature engineering process that includes
‘Feature Extraction’ from the sample binaries and
‘Feature Selection’ to select the most important features for
each ML classifier.
• Develop, evaluate, and compare the performance of six
state-of-the-art supervised Machine Learning models.
Our approach uses Recursive Feature Elimination
with Cross-Validation (RFECV) to select the
significant features and RandomizedSearchCV to select
the optimum hyperparameter values for each ML
classifier. We thereby attempt to optimize each model’s
performance before the comparison is made.
• Present a post-hoc analysis of the best-performing
model using ‘SHapley Additive exPlanations’ (SHAP)
values to assess the transparency and trustworthiness of
the model’s predictions. This analysis shows which
features are most dominant in detecting and
classifying the ransomware families. While explainability
has been widely presented in malware detection
scenarios, to the best of the authors’ knowledge, no
prior work has presented the explainability of models that
consider only ransomware families.
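The feature-selection and tuning steps named in the third contribution can be illustrated with a minimal sketch, assuming scikit-learn is the toolkit in use. The synthetic data from make_classification, the decision tree used as a stand-in classifier, the parameter ranges, and the helper name select_and_tune are all assumptions made for illustration; they are not the dataset, models, or search space of this paper.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier


def select_and_tune(X, y, seed=0):
    """Hypothetical helper: RFECV feature selection, then randomized search."""
    cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)

    # Step 1: recursively drop the weakest feature and keep the subset
    # with the best cross-validated accuracy.
    selector = RFECV(
        DecisionTreeClassifier(random_state=seed),
        step=1,
        cv=cv,
        scoring="accuracy",
    )
    X_selected = selector.fit_transform(X, y)

    # Step 2: sample hyperparameter combinations at random and keep the
    # best one. The candidate values below are illustrative only.
    search = RandomizedSearchCV(
        DecisionTreeClassifier(random_state=seed),
        param_distributions={
            "max_depth": [2, 4, 8, None],
            "min_samples_leaf": [1, 2, 5, 10],
        },
        n_iter=5,
        cv=cv,
        scoring="accuracy",
        random_state=seed,
    )
    search.fit(X_selected, y)
    return selector, search


# Synthetic three-class data standing in for extracted behavioral features.
X, y = make_classification(
    n_samples=150,
    n_features=12,
    n_informative=4,
    n_redundant=2,
    n_classes=3,
    random_state=0,
)
selector, search = select_and_tune(X, y)
print("features kept:", selector.n_features_)
print("best parameters:", search.best_params_)
```

In a real experiment, a held-out test set would be kept apart from both steps so that the reported performance is not inflated by the selection and tuning process.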
The rest of this paper is structured as follows: Section II
discusses the related works. Section III presents our proposed
method. Section IV presents the experimental results and
discussion. Section V presents our model’s explainability.
Section VI concludes the paper with directions for future
work.
II. RELATED WORKS
Most researchers prefer the dynamic analysis approach
because it can delineate the behaviors of the ransomware in
a more explicit manner. Maniath et al. [6] analyzed the API
call sequences of 157 ransomware samples and presented an
LSTM-based ransomware detection method. Despite achieving
96.67% accuracy, this work does not fully report the
ransomware families/variants or the number of benign samples
used in the experiment. Vinayakumar et al. [7] proposed an
MLP-based ransomware detection method focusing on API
call frequency and achieved 100% and 98% accuracy for
binary and multi-class classification, respectively. However,
they deployed a simple MLP network that failed to distinguish
between CryptoWall and CryptoLocker ransomware. Z. Chen et al. [8]
used the API Call Flow Graph (CFG) generated from the
extracted API sequences of 83 ransomware and 83 benign
samples. Despite achieving 98.2% accuracy using the
Logistic Regression model, the work is based on a small
dataset that includes only four ransomware families. Also,
graph-similarity analysis requires more computational power
than some systems can provide. Takeuchi et al. [9] used
API call sequences extracted from 276 ransomware and 312
benign files to identify zero-day ransomware attacks. Although
the work achieved 97.48% accuracy with a Support
Vector Machine, its accuracy decreases when the
standardized vector representation is used because the
dataset is not diverse enough. Using the Intel Pin Tool, Bae et al. [10]
extracted the API call sequences from 1000 ransomware, 900
malware, and 300 benign files. Their sequential process
includes generating an n-gram sequence, an input vector, and
the Class Frequency Non-Class Frequency (CF-NCF) for every
sample before fitting their model. Despite obtaining 98.65%
accuracy using the Random Forest classifier, the model’s
performance could be improved with the help of deception-based
techniques. Hwang et al. [11] analyzed the API call sequences
of 2507 ransomware and 3886 benign files. They used two
Markov chains, one for ransomware and another for benign
software, to capture the API call sequence patterns. They then
used a Random Forest to compensate for the Markov chains and
to control the false positive rate (FPR) and false negative rate
(FNR) for better performance. Despite achieving
97.3% accuracy, their model produces a high FPR, which could be
reduced with the help of signature-based techniques.
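The two-chain idea described for Hwang et al. [11] can be illustrated with a minimal sketch. It assumes first-order Markov chains with add-one smoothing and scores a trace by the difference in log-likelihood under the two chains. The toy API traces, the function names, and the smoothing choice are assumptions made for illustration, and the Random Forest stage is omitted; this is not the authors' implementation.

```python
import math
from collections import defaultdict


def train_markov_chain(sequences, vocabulary, smoothing=1.0):
    """Estimate first-order transition probabilities with additive smoothing."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1.0
    chain = {}
    for prev in vocabulary:
        total = sum(counts[prev].values()) + smoothing * len(vocabulary)
        chain[prev] = {
            nxt: (counts[prev][nxt] + smoothing) / total for nxt in vocabulary
        }
    return chain


def log_likelihood(chain, seq):
    """Sum of log transition probabilities along the sequence."""
    return sum(math.log(chain[prev][nxt]) for prev, nxt in zip(seq, seq[1:]))


def markov_score(ransom_chain, benign_chain, seq):
    """Positive when the trace is more likely under the ransomware chain."""
    return log_likelihood(ransom_chain, seq) - log_likelihood(benign_chain, seq)


# Toy traces for illustration only; they are not taken from any dataset.
ransom_traces = [
    ["CreateFile", "ReadFile", "CryptEncrypt", "WriteFile", "DeleteFile"],
    ["CreateFile", "ReadFile", "CryptEncrypt", "WriteFile", "CloseHandle"],
]
benign_traces = [
    ["CreateFile", "ReadFile", "CloseHandle"],
    ["CreateFile", "WriteFile", "CloseHandle"],
]
vocabulary = sorted({api for t in ransom_traces + benign_traces for api in t})

ransom_chain = train_markov_chain(ransom_traces, vocabulary)
benign_chain = train_markov_chain(benign_traces, vocabulary)

suspicious = ["CreateFile", "ReadFile", "CryptEncrypt", "WriteFile"]
ordinary = ["CreateFile", "ReadFile", "CloseHandle"]
print(markov_score(ransom_chain, benign_chain, suspicious))
print(markov_score(ransom_chain, benign_chain, ordinary))
```

A score such as this could serve as one input feature for a downstream classifier. The sketch assumes every API name in a scored trace already appears in the vocabulary.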
Several researchers chose the static analysis
approach to detect ransomware. Baldwin and Dehghantanha [12]
analyzed the opcode characteristics of 5 crypto-ransomware
families and 350 benign samples. Using the WEKA toolset,
they achieved an accuracy of 96.5% in classifying the
five crypto-ransomware families and benign software
with the Support Vector Machine classifier.
However, their work could be improved by
extending the dataset and extracting those groups of opcodes
identified during the evaluation of attribute selection. Zhang et
al. [13] analyzed the opcode-based characteristics of 1787
ransomware samples from 8 different ransomware families and
100 benign samples. Their technique converted opcode sequences
into N-gram sequences and then computed Term Frequency-Inverse
Document Frequency (TF-IDF) values. Five ML classifiers were
evaluated with 10-fold cross-validation, among which the Random
Forest classifier achieved the highest accuracy of 91.43%. However,
their model could not distinguish between Reveton, CryptoWall, and
Locky.
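The N-gram and TF-IDF steps described for Zhang et al. [13] can be illustrated with a minimal sketch. It assumes bigrams, term frequency defined as count divided by the number of N-grams in the sample, and inverse document frequency defined as log(N / df). The toy opcode sequences and function names are assumptions made for illustration; the exact weighting used in [13] is not stated here.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of contiguous n-grams as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_vectors(documents, n=2):
    """Build one TF-IDF vector per opcode sequence over a shared vocabulary."""
    grams_per_doc = [Counter(ngrams(doc, n)) for doc in documents]
    vocabulary = sorted({g for counts in grams_per_doc for g in counts})
    n_docs = len(documents)
    # Document frequency: number of samples in which each n-gram occurs.
    doc_freq = Counter(g for counts in grams_per_doc for g in counts)
    idf = {g: math.log(n_docs / doc_freq[g]) for g in vocabulary}
    vectors = []
    for counts in grams_per_doc:
        total = sum(counts.values()) or 1  # guard against very short samples
        vectors.append([counts[g] / total * idf[g] for g in vocabulary])
    return vocabulary, vectors


# Toy opcode sequences for illustration only.
samples = [
    ["push", "mov", "call", "pop"],
    ["push", "mov", "xor", "xor", "pop"],
]
vocabulary, vectors = tfidf_vectors(samples, n=2)
for gram, weight in zip(vocabulary, vectors[0]):
    print(f"{gram}: {weight:.4f}")
```

N-grams that occur in every sample, such as "push mov" here, receive zero weight, so the resulting vectors emphasize opcode patterns that differ between samples.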
Some researchers adopted a hybrid analysis approach that
combines the features extracted from the dynamic and static
analyses. Subedi et al. [14] used both dynamic and static
analysis on the library, assembly, and function calls. Moreover,
they developed a new analysis tool, CRSTATIC, which was
used to build signatures that could classify ransomware
families with the help of reverse engineering.
However, they analyzed only 450 samples of ransomware