
viral evolution.
What controls the natural selection of SARS-CoV-2 remains to be understood. The mechanism of
SARS-CoV-2 evolution was elusive at the beginning of the COVID-19 pandemic. Indeed, the life cycle of
SARS-CoV-2 is extremely sophisticated, involving viral entry into host cells, the release of the viral genome,
the synthesis of viral NSPs, RNA replication, the transcription, translation, and synthesis of viral structural
proteins, and the packaging, assembly, and release of new viruses [6]. SARS-CoV-2 mutations occur nearly
randomly on all of its 29 genes, as shown in Figure 2. Nonetheless, in early 2020, we hypothesized that
SARS-CoV-2 natural selection is controlled through infectivity-strengthening mutations [5], which primarily
occur at the viral spike (S) protein receptor-binding domain (RBD) that binds to host angiotensin-converting
enzyme 2 (ACE2) to facilitate viral cell entry [7–11]. Our hypothesis was initially supported by our
genotyping of 15,140 SARS-CoV-2 genomes extracted from patients. We demonstrated that among 89 unique
RBD mutations, the observed frequencies of infectivity-strengthening mutations outpace those of
infectivity-weakening ones over time. Our infectivity-strengthening mechanism of natural selection was
proven beyond doubt in April 2021 with 506,768 SARS-CoV-2 genomes isolated from patients [12].
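The frequency comparison described above can be illustrated with a minimal sketch. It assumes that each observed RBD mutation has already been labeled as infectivity-strengthening or infectivity-weakening. The records, the mutation names (`mutA`, `mutB`, `mutC`), and the function `cumulative_counts` are hypothetical placeholders for illustration only; they are not data or code from our genotyping pipeline.

```python
from collections import Counter
from datetime import date

# Hypothetical toy observations: (collection date, RBD mutation, assumed effect label).
# The labels are illustrative placeholders, not results from the paper.
OBSERVATIONS = [
    (date(2020, 3, 1), "mutA", "strengthening"),
    (date(2020, 3, 15), "mutB", "weakening"),
    (date(2020, 4, 2), "mutA", "strengthening"),
    (date(2020, 4, 20), "mutA", "strengthening"),
    (date(2020, 5, 5), "mutB", "weakening"),
    (date(2020, 5, 9), "mutC", "strengthening"),
]


def cumulative_counts(observations):
    """Return {(year, month): Counter(effect -> cumulative count)} in time order."""
    running = Counter()
    timeline = {}
    for day, _mutation, effect in sorted(observations):
        running[effect] += 1
        # Store a snapshot so that later months do not overwrite earlier ones.
        timeline[(day.year, day.month)] = Counter(running)
    return timeline


timeline = cumulative_counts(OBSERVATIONS)
for month, counts in timeline.items():
    print(month, counts["strengthening"], counts["weakening"])
```

In this toy input, the cumulative count of strengthening observations pulls ahead of the weakening count as time advances, which is the kind of trend the comparison looks for in real genotyping data.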
[Figure 2 plot: 29,290 single mutations in 3,607,461 hCoV-19 genomes. x-axis: gene position (NSP1, NSP3, NSP4, 3CL, NSP6, RdRp, Helicase, S, ORF3a, E, N); y-axis: ln(Frequency). Enabled by data from GISAID.]
Figure 2: Illustration of unique mutations on SARS-CoV-2 genomes extracted from patients. Each dot represents a unique
mutation. The x-axis is the gene position of a mutation and the y-axis represents its observed frequency in the natural
logarithmic scale.
However, we found that not all of the most frequently observed RBD mutations strengthen viral infectivity [13].
This exception arose in mid to late 2021, when a large portion of the population in many
developed countries had been vaccinated. By genotyping 2,298,349 complete SARS-CoV-2 genomes, we
discovered vaccination-induced antibody-resistant mutations, which make the virus less infectious [13]. This
discovery leads to a complementary mechanism of natural selection, namely antibody resistance.
In other words, viral evolution also favors RBD mutations that enable the virus to escape
antibody protection generated by vaccination or infection in a population.
Omicron was the first variant driven by both the infectivity-strengthening and
antibody-resistance mechanisms [13]. It has 32 mutations on the S protein, the main antigenic target of
antibodies [14]. Among them, 15 are on the RBD, leading to a dramatic increase in SARS-CoV-2
infectivity, vaccine breakthrough, and antibody resistance [15]. The World Health Organization (WHO)
declared Omicron a variant of concern (VOC) on November 26, 2021. On December 1, 2021, when
no experimental data were available, we announced our topological artificial intelligence (AI) predictions
based on the genotyping of viral genomes, biophysics, experimental data on protein-protein interactions,
algebraic topology, and deep learning [15]. We predicted that Omicron is about 2.8 times as infectious as the
Delta variant and has a nearly 90% likelihood of escaping vaccines, which would compromise essentially all of existing