In the current study, we also evaluated SARS-CoV-2 Spike mutations as an additional evaluation and a demonstration for future work in this field. SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) is the etiological agent of the COVID-19 (coronavirus disease 2019) pandemic. SARS-CoV-2 is a Betacoronavirus and a member of the sarbecovirus sublineage16. The virus genome is a single-stranded RNA (ssRNA) of approximately 30 kb in length. SARS-CoV-2 contains different genes, including ORF1a/b, Spike (S), Envelope (E), Membrane (M), Nucleoprotein (N), and accessory ORFs17. The virus binds to cells through attachment of the S protein to the cellular receptor ACE2 (angiotensin-converting enzyme 2)18. The S protein is the most important antigenic part of the virus19.
In recent years, Artificial Intelligence (AI) algorithms have achieved human or even super-human performance on tasks such as image classification20, text classification21, and action recognition22. Anomaly Detection (AD) is a sub-domain of AI concerned with learning a representation space of normal data and exploiting the learned representation to detect anomalous samples at test time. Because labeling anomalous samples is challenging, owing to their high cost or rarity, most methods in this domain use only normal samples for training; this setting is called unsupervised AD. Alternatively, one may use a very limited number of labeled anomalous samples in the training process, which is called semi-supervised AD23.
Unsupervised24–28 and semi-supervised29,30 anomaly detection methods have recently achieved satisfactory results in a variety of domains, such as image, text, time-series, and video. Deep Semi-Supervised Anomaly Detection (DeepSAD)29, a recently proposed semi-supervised AD method, showed that semi-supervised anomaly detectors are significantly superior to supervised classification algorithms, specifically when the training dataset is complex and the number of normal samples is much higher than the number of anomalous ones. This is because anomaly detectors attempt to find a compact representation space for the normal samples while maximizing the margin between normal and abnormal ones. This helps them learn the most general and distinctive features of the normal samples, rather than relying overly on the contrast between normal and anomalous samples to classify them.
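For reference, the published DeepSAD objective29 takes roughly the following form (a sketch of its general shape rather than the exact notation used later in this paper), where $\phi(\cdot;\mathcal{W})$ is the network, $c$ is the hypersphere center, the first sum runs over the $n$ unlabeled (assumed mostly normal) samples, the second over the $m$ labeled samples $(\tilde{x}_j,\tilde{y}_j)$ with $\tilde{y}_j \in \{-1,+1\}$, and $\eta$, $\lambda$ are weighting hyperparameters:

$$\min_{\mathcal{W}} \; \frac{1}{n+m}\sum_{i=1}^{n}\big\lVert \phi(x_i;\mathcal{W})-c\big\rVert^{2} \;+\; \frac{\eta}{n+m}\sum_{j=1}^{m}\Big(\big\lVert \phi(\tilde{x}_j;\mathcal{W})-c\big\rVert^{2}\Big)^{\tilde{y}_j} \;+\; \frac{\lambda}{2}\sum_{\ell=1}^{L}\big\lVert \mathcal{W}^{\ell}\big\rVert_{F}^{2}$$

For $\tilde{y}_j = +1$ a labeled sample is pulled toward $c$ exactly like an unlabeled one, while for $\tilde{y}_j = -1$ the inverse squared distance is minimized, pushing labeled anomalies away from the center.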
Since in mutation prediction tasks the number of unmutated samples is much higher than the number of mutated ones, the problem can be formulated as an anomaly detection task. In this formulation, unmutated and mutated samples are treated as normal and anomalous samples, respectively. The benefits of this approach are two-fold. First, a semantically meaningful representation can be learned even with a small number of training samples, which makes generalization to unseen test-time samples possible. Second, as finding and labeling mutated viruses is an expensive and time-consuming process, anomaly detectors can work well with only a limited number of labeled anomalous (mutated) training samples, or with none at all23.
Motivated by this, we propose the first anomaly detection framework for predicting virus mutations. We use the Long Short-Term Memory (LSTM)31 neural network in combination with the Deep Semi-Supervised Anomaly Detection (DeepSAD) loss29 to not only learn long-term input dependencies, but also find a semantic representation space for the mutated and unmutated training samples. Figure 1 shows the overall architecture of the proposed method. We conduct extensive experiments to show the effectiveness of our method in improving the average recall, F1-score, precision, and Area Under the Curve (AUC) on three different publicly available Influenza datasets.
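To make this training setup concrete, the following is a minimal PyTorch sketch, not the authors' released code, of an LSTM encoder trained with a DeepSAD-style loss. The module names, dimensions, one-hot encoding over 20 amino acids, and hyperparameters (hidden_dim, eta, the toy labels) are all illustrative assumptions.

```python
# Minimal sketch: LSTM encoder + DeepSAD-style loss (illustrative, not the paper's code).
import torch
import torch.nn as nn

class LSTMEncoder(nn.Module):
    """Encodes a protein-sequence window (one-hot, 20 amino acids) into a fixed-size embedding."""
    def __init__(self, input_dim=20, hidden_dim=64, embed_dim=32):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        # Bias-free projection, following the Deep SVDD/DeepSAD convention.
        self.proj = nn.Linear(hidden_dim, embed_dim, bias=False)

    def forward(self, x):            # x: (batch, seq_len, input_dim)
        _, (h_n, _) = self.lstm(x)   # h_n: (1, batch, hidden_dim), final hidden state
        return self.proj(h_n.squeeze(0))

def deepsad_loss(z, c, y, eta=1.0, eps=1e-6):
    """DeepSAD-style objective: pull unlabeled/normal embeddings toward the center c,
    push labeled anomalies away via the inverse squared distance.
    y: 0 = unlabeled, +1 = labeled normal (unmutated), -1 = labeled anomalous (mutated)."""
    dist2 = torch.sum((z - c) ** 2, dim=1)
    loss = dist2[y >= 0].sum()                   # unlabeled + labeled normal
    anomalies = dist2[y == -1]
    if anomalies.numel() > 0:
        loss = loss + eta * (1.0 / (anomalies + eps)).sum()
    return loss / z.size(0)

# Toy usage: batch of 8 random "sequences" of length 50 over 20 amino acids.
encoder = LSTMEncoder()
x = torch.randn(8, 50, 20)
y = torch.tensor([0, 0, 0, 0, 0, 0, -1, 1])      # mostly unlabeled samples
with torch.no_grad():
    c = encoder(x).mean(dim=0)                   # center initialized as the mean embedding
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
opt.zero_grad()
loss = deepsad_loss(encoder(x), c, y)
loss.backward()
opt.step()
```

At test time, the squared distance of an embedding to $c$ would serve as the anomaly (mutation) score.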
Background
For the sake of clarity, we discuss some important prerequisites from the deep learning literature in this section. First, Recurrent Neural Network architectures, such as LSTMs31, are discussed. Then, a brief introduction to anomaly detection methods is presented.
Recurrent Neural Networks (RNN): RNNs are broadly used to model sequential dependencies in data, where the sequence may be formed by temporal or spatial arrangements. Early RNN architectures, such as the vanilla RNN, struggle to memorize long-term dependencies. To address this issue, alternative architectures, such as LSTM31 networks, bi-directional RNNs32, and Gated Recurrent Units (GRUs)33, have been introduced. All these approaches attempt to summarize previous inputs into a hidden state that is updated at each time step $t$. The retained information is regulated by learned parameters, or gates. For instance, the LSTM network consists of LSTM cells. Each cell contains a state, $h_t$, and a memory, $s_t$. These are updated based on three gates: the input gate, $i_t$, the forget gate, $f_t$, and the output gate, $o_t$. The input gate selects which memory dimensions to modify (Eq. 2). The forget gate decides which memory dimensions should be discarded at the next time step (Eq. 1). The output gate decides which dimensions of the memory should be transferred to the state (Eq. 3). The memory and state vectors are updated based on these gates and the activation values produced through the tanh activation (Eqs. 4, 5). Specifically, the new memory consists of the previous memory dimensions that are not forgotten, plus the input activation values selected by the input gate. Finally, the state consists of the memory activation values that are selected by the output gate.
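Concretely, a standard LSTM cell, which the referenced Eqs. (1)–(5) presumably follow, performs the updates below, where $x_t$ is the input at time $t$, $\sigma$ is the sigmoid function, $\odot$ denotes element-wise multiplication, and the $W$, $U$, and $b$ terms are learned parameters:

$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \quad (1)$$
$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i) \quad (2)$$
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \quad (3)$$
$$s_t = f_t \odot s_{t-1} + i_t \odot \tanh(W_s x_t + U_s h_{t-1} + b_s) \quad (4)$$
$$h_t = o_t \odot \tanh(s_t) \quad (5)$$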
Note that a sigmoid activation function is used in the gates to map the gate outputs to values between zero and one, which models the selection: a gate output of 1 represents complete selection of an embedding dimension, and a value of 0 corresponds to completely discarding it.