Enrichment Score: a better quantitative metric for evaluating the enrichment capacity of molecular docking models
Ian Scott Knight
ian.knight@ucsf.edu
Slava Naprienko
naprienko@stanford.edu
John J. Irwin
john.irwin@ucsf.edu
October 2022
Abstract
The standard quantitative metric for evaluating enrichment capacity known as LogAUC
depends on a cutoff parameter that controls what the minimum value of the
log-scaled x-axis is. Unless this parameter is chosen carefully for a given ROC curve,
one of the two following problems occurs: either (1) some fraction of the first inter-
decoy intervals of the ROC curve are simply thrown away and do not contribute to
the metric at all, or (2) the very first inter-decoy interval contributes too much to
the metric at the expense of all following inter-decoy intervals. We fix this problem
with LogAUC by showing a simple way to choose the cutoff parameter based on
the number of decoys which forces the first inter-decoy interval to always have a
stable, sensible contribution to the total value. Moreover, we introduce a normal-
ized version of LogAUC known as enrichment score, which (1) enforces stability
by selecting the cutoff parameter in the manner described, (2) yields scores which
are more intuitively meaningful, and (3) allows reliably accurate comparison of the
enrichment capacities exhibited by different ROC curves, even those produced using
different numbers of decoys. Finally, we demonstrate the advantage of enrichment
score over unbalanced metrics using data from a real retrospective docking study
performed using the program DOCK 3.7 on the target receptor TRYB1 included in
the DUDE-Z benchmark.
1 Introduction
In the field of computational drug discov-
ery, the technique of molecular docking is
widely used to predict how small molecules
will interact with larger molecules (typically
receptor proteins)[1]. The more favorable
a given interaction is predicted to be, the
higher the small molecule’s expected bind-
ing affinity with the receptor. Generally, a
higher predicted binding affinity is taken to
mean that the small molecule is more likely
to actually bind to that receptor. A small
molecule that actually binds to a receptor is
called a ligand.
arXiv:2210.10905v4 [q-bio.QM] 22 May 2023
A given docking program constitutes
a docking model that estimates the bind-
ing affinity of candidate small molecules
to a receptor of interest by performing
a constrained minimization of the model’s
scoring function[1]. Many scoring func-
tions have been proposed[1][2][3][4][5][6][7],
ranging from force field calculations to
knowledge-based methods and even machine
learning methods.
The quality of a docking model is mea-
sured by how well the model can distin-
guish ligands from non-ligands[1][19]. This
is done by examining how the model scores
a dataset of small molecules consisting of
(1) known ligands (called actives) and (2)
molecules known or expected not to bind to
the receptor (known as decoys). The prac-
tice of using a dataset of actives and decoys
to measure how well a docking model can
distinguish between ligands and non-ligands
is known as retrospective docking[8]. This
is the principal model validation technique
available for docking models. Once prop-
erly validated, a docking model may be used
to perform prospective docking, which means
scoring molecules of unknown activity[8]. A
prospective docking screen usually involves
the scoring of many millions or even billions
of molecules[19].
Decoys can be generated for a given
receptor in a number of ways[19][11],
but there are also several established
datasets of actives and decoys available as
benchmarks[8][9][10]. Decoy sets are often
designed to be particularly difficult to dis-
criminate from actives for certain targets,
such that if a given docking model can suc-
cessfully discriminate between them in the
setting of retrospective docking then this
warrants a stronger belief in the model’s use-
fulness as a tool for enrichment in the setting
of prospective docking[19].
An important goal in the field of com-
putational drug discovery is to find a
way to quantitatively measure the ca-
pacity of a given docking model to en-
rich a set of molecules by reliably pre-
dicting mostly favorable interactions for
actives and mostly unfavorable interac-
tions for decoys[14]. There are a num-
ber of enrichment metrics that have been
developed[14][16][18]. For example, the met-
ric known as enrichment factor (EF) cap-
tures the idea of enrichment by equating it
to the proportion of actives present in some
top fraction of best scoring molecules[15].
Quantitative metrics like EF also allow re-
searchers to compare the performance of dif-
ferent docking models on the same dataset
of actives and decoys[16].
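As a rough illustration of the idea behind EF, the sketch below computes the proportion of actives among the top-scoring fraction of molecules and divides it by the proportion of actives in the whole dataset. This normalization is one common convention and is an assumption here, since no formula is given above. The function name `enrichment_factor`, its parameters, and the toy data are hypothetical, and lower scores are assumed to be more favorable.

```python
def enrichment_factor(scores, is_active, top_fraction=0.01):
    """Ratio of the active rate in the top-scoring fraction to the overall active rate.

    Assumes lower scores are more favorable. The normalization by the overall
    active rate is one common convention, not necessarily the cited definition.
    """
    if len(scores) != len(is_active) or not scores:
        raise ValueError("scores and is_active must be non-empty and equal in length")
    if not 0.0 < top_fraction <= 1.0:
        raise ValueError("top_fraction must lie in (0, 1]")
    total_actives = sum(1 for b in is_active if b)
    if total_actives == 0:
        raise ValueError("at least one active is required")
    # Rank molecules from best (lowest) to worst (highest) score.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0])
    # Size of the top fraction, rounded to the nearest count, at least one molecule.
    n_top = max(1, round(top_fraction * len(ranked)))
    actives_in_top = sum(1 for _, active in ranked[:n_top] if active)
    top_rate = actives_in_top / n_top
    overall_rate = total_actives / len(ranked)
    return top_rate / overall_rate

# Toy data: 2 actives and 8 decoys; both actives sit in the best-scoring 20%.
toy_scores = [-9.1, -8.7, -5.0, -4.2, -4.0, -3.9, -3.1, -2.5, -2.0, -1.0]
toy_labels = [True, True, False, False, False, False, False, False, False, False]
ef_top20 = enrichment_factor(toy_scores, toy_labels, top_fraction=0.2)
```

In this toy case the top 20% consists only of actives while the dataset as a whole is 20% actives, so the ratio is five. Selecting the whole dataset gives a ratio of one.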
The metric known as LogAUC [13] is
one of the most popular metrics for eval-
uating the quality of molecular docking
models[9][10][19][21]. However, it comes
with a significant drawback: it depends on a
cutoff parameter[13][17]. This cutoff parameter
sets the minimum value of the log-scaled x-axis;
some minimum is unavoidable because a logarithmic
axis cannot include x = 0. Unless this parameter is
chosen carefully depending on the dataset, one
of the two following situations occurs: either
(1) some fraction of the first inter-decoy in-
tervals of the ROC curve are simply thrown
away and do not contribute to the metric
at all, or (2) the very first inter-decoy inter-
val contributes too much to the metric (even
compared to the contribution of the second
inter-decoy interval), and this comes at the
expense of all inter-decoy intervals following
the first interval.
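To make the two situations concrete, the sketch below computes how much horizontal extent each inter-decoy interval occupies on a log10-scaled x-axis once a cutoff is applied. It assumes n equally weighted decoys, so that the false positive rate steps through 1/n, 2/n, ..., 1. The function name `log_widths` and the cutoff value 0.001 are illustrative choices, not values taken from the text.

```python
import math

def log_widths(num_decoys, cutoff):
    """Log10 width of each inter-decoy interval ((j-1)/n, j/n] clipped to [cutoff, 1].

    Intervals lying entirely below the cutoff get width 0.0 (they are discarded).
    """
    if num_decoys < 1 or not 0.0 < cutoff < 1.0:
        raise ValueError("need num_decoys >= 1 and 0 < cutoff < 1")
    widths = []
    for j in range(1, num_decoys + 1):
        left = max((j - 1) / num_decoys, cutoff)  # clip the left edge at the cutoff
        right = j / num_decoys
        widths.append(math.log10(right / left) if right > left else 0.0)
    return widths

# Situation (2): few decoys, small cutoff, so the first interval dominates.
few = log_widths(num_decoys=50, cutoff=0.001)
# few[0] = log10(0.02 / 0.001) ~ 1.30, while few[1] = log10(2) ~ 0.30

# Situation (1): many decoys, same cutoff, so the earliest intervals vanish.
many = log_widths(num_decoys=10_000, cutoff=0.001)
discarded = sum(1 for w in many if w == 0.0)
```

With 50 decoys the first interval is more than four times as wide as the second on the log axis, and it grows without bound as the cutoff shrinks. With 10,000 decoys the same cutoff removes every interval whose right edge is at or below 0.001.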
We fix this problem with LogAUC by
showing a simple way to choose the cutoff
parameter based on the number of decoys
which forces the first inter-decoy interval to
always have a stable, sensible contribution
to the total value. Moreover, we introduce a
normalized version of LogAUC known as en-
richment score, which (1) enforces stability
by selecting the cutoff parameter in the manner
described, (2) is more intuitively meaningful,
and (3) allows reliably accurate comparison
of the enrichment capacities exhibited
by different ROC curves.
2 Receiver Operating Characteristic
The statistical tool known as the receiver operating
characteristic (also known as the “ROC curve”)
is a plot of how the true positive rate and
false positive rate of an indexed family of bi-
nary classifiers change as the index is varied.
The ROC curve itself is a step function:
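\[
\mathrm{ROC}(x) =
\begin{cases}
y_1, & \text{if } k_0 < x \le k_1, \\
y_2, & \text{if } k_1 < x \le k_2, \\
\;\vdots \\
y_n, & \text{if } k_{n-1} < x \le k_n,
\end{cases}
\]
where $y_i$ are true positive rates and $k_i$ are
false positive rates.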
Figure 1: An illustration of how to interpret
the receiver operating characteristic.[22]
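The step-function form of the ROC curve can be evaluated at an arbitrary x by locating the interval from k_{i-1} (exclusive) to k_i (inclusive) that contains it. The following is a minimal sketch under the assumption that the false positive rates k_1, ..., k_n are strictly increasing; the function name `roc_step` and the example values are hypothetical.

```python
from bisect import bisect_left

def roc_step(x, fprs, tprs, k0=0.0):
    """Evaluate ROC(x) = y_i for k_{i-1} < x <= k_i.

    fprs holds k_1..k_n (assumed strictly increasing) and tprs holds y_1..y_n.
    """
    if len(fprs) != len(tprs) or not fprs:
        raise ValueError("fprs and tprs must be non-empty and equal in length")
    if not k0 < x <= fprs[-1]:
        raise ValueError("x must satisfy k_0 < x <= k_n")
    # bisect_left returns the position of the first k_i with k_i >= x,
    # which is exactly the interval whose right edge is closed.
    return tprs[bisect_left(fprs, x)]

fprs = [0.25, 0.5, 0.75, 1.0]   # k_1 .. k_4
tprs = [0.5, 0.5, 1.0, 1.0]     # y_1 .. y_4
value = roc_step(0.6, fprs, tprs)
```

Using `bisect_left` rather than `bisect_right` matches the closed right endpoint of each interval: a value of x equal to k_i is assigned y_i, not y_{i+1}.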
To compute the ROC, we usually use the
family of binary classifiers indexed on the
real line where each particular classifier’s index
$i$ specifies a cutoff value below which it
predicts True (“active”) and otherwise predicts
False (“decoy”):
\[
\mathrm{classifier}_i(x) =
\begin{cases}
\text{True}, & \text{if } x < i, \\
\text{False}, & \text{if } x \ge i.
\end{cases}
\]
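One way to picture this indexed family is as a function that, given an index, returns the corresponding classifier. The sketch below is illustrative only; the name `make_classifier` and the example score are hypothetical.

```python
from typing import Callable

def make_classifier(index: float) -> Callable[[float], bool]:
    """Return the classifier with the given index: True ("active") iff score < index."""
    def classifier(score: float) -> bool:
        return score < index
    return classifier

# Three members of the family applied to the same hypothetical docking score.
score = -7.5
predictions = [make_classifier(i)(score) for i in (-10.0, -7.5, -5.0)]
```

Only the classifier whose index lies strictly above the score predicts True, which reflects the strict inequality in the definition.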
Now that we have our family of binary
classifiers, we determine the true positive
rate and false positive rate of each classifier
in the family using the following dataset collected
for a given set of molecules: (1) each
molecule’s docking model score $s_i \in \mathbb{R}$ and
(2) its true active/decoy status $b_i \in \mathbb{B}$, where
$\mathbb{B} = \{\text{True}, \text{False}\}$ is the set of Booleans.
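A minimal sketch of this computation follows. It assumes lower scores are more favorable and places one classifier index just above each distinct observed score, which is enough to visit every distinct pair of rates. The function name `roc_points` and the toy data are hypothetical.

```python
def roc_points(scores, is_active):
    """Return (false_positive_rate, true_positive_rate) pairs, one per distinct score.

    Each pair corresponds to a classifier whose index sits just above one of the
    observed scores, so every molecule with that score or lower is predicted True.
    """
    num_actives = sum(1 for b in is_active if b)
    num_decoys = len(is_active) - num_actives
    if num_actives == 0 or num_decoys == 0:
        raise ValueError("need at least one active and one decoy")
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0])
    points = [(0.0, 0.0)]  # index below every score: nothing is predicted True
    true_pos = false_pos = 0
    for position, (score, active) in enumerate(ranked):
        if active:
            true_pos += 1
        else:
            false_pos += 1
        # Emit a point only after the last molecule sharing this score (tie handling).
        is_last_of_tie = position + 1 == len(ranked) or ranked[position + 1][0] != score
        if is_last_of_tie:
            points.append((false_pos / num_decoys, true_pos / num_actives))
    return points

scores = [-9.0, -8.0, -7.0, -6.0, -5.0]
labels = [True, False, True, False, False]
points = roc_points(scores, labels)
```

Molecules with identical scores cannot be separated by any index, so they are counted together before a point is emitted.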
This description of the family of binary
classifiers typically used for computing ROC
assumes that the docking model being used
indicates more favorable predictions (i.e.
those more likely to be “active”) by assign-
ing lower scores. For example, if the dock-
ing model score is an approximation of the
energy of the physical system formed by
the scored molecule bound to the receptor,
then a more negative score will indicate a
more negative energy, which is more favor-
able. This is why the above classifier func-
tion predicts True for input below its index
rather than above. If the docking model be-
ing used indicates more favorable predictions
by assigning higher scores rather than lower
scores, then the classifier function should be
modified to instead predict True for input
above its index.
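The two score conventions can be handled by a single flag, as in the sketch below. The name `make_directional_classifier` and the flag `lower_is_better` are hypothetical. The example also checks that predicting True above an index is equivalent to negating both score and index and keeping the original "below" rule.

```python
import operator

def make_directional_classifier(index, lower_is_better=True):
    """Predict True below the index (lower is better) or above it (higher is better)."""
    compare = operator.lt if lower_is_better else operator.gt
    return lambda score: compare(score, index)

# A hypothetical model in which higher scores are more favorable.
higher_model = make_directional_classifier(0.8, lower_is_better=False)
# Equivalent formulation: negate score and index, keep the original "<" rule.
negated_model = make_directional_classifier(-0.8, lower_is_better=True)
same = all(higher_model(s) == negated_model(-s) for s in (0.1, 0.8, 0.95))
```

Because of this equivalence, code written for the lower-is-better convention can be reused for a higher-is-better model by negating the scores first.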
3 Linear-log plot and LogAUC
Calculated using a given docking model’s
predictions for a given set of molecules, the
ROC curve is often used to analyze the
trade-off between true positive rate and false