a docking model that estimates the binding affinity of candidate small molecules to a receptor of interest by performing a constrained minimization of the model’s scoring function[1]. Many scoring functions have been proposed[1][2][3][4][5][6][7], ranging from force field calculations to knowledge-based methods and even machine learning methods.
The quality of a docking model is measured by how well the model can distinguish ligands from non-ligands[1][19]. This is done by examining how the model scores a dataset of small molecules consisting of (1) known ligands (called actives) and (2) molecules known or expected not to bind to the receptor (known as decoys). The practice of using a dataset of actives and decoys to measure how well a docking model can distinguish between ligands and non-ligands is known as retrospective docking[8]. This is the principal model validation technique available for docking models. Once properly validated, a docking model may be used to perform prospective docking, which means scoring molecules of unknown activity[8]. A prospective docking screen usually involves the scoring of many millions or even billions of molecules[19].
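As a concrete illustration, a retrospective evaluation amounts to pooling the actives and decoys, ranking them by docking score, and tracing how quickly the actives are recovered. The following minimal sketch is our own illustration rather than code from any cited work; it assumes NumPy, a lower-is-better score convention, and no tied scores.

import numpy as np

def roc_points(active_scores, decoy_scores):
    # Pool actives (label 1) and decoys (label 0) and rank them by
    # docking score, best (most negative) scores first.
    scores = np.concatenate([active_scores, decoy_scores])
    labels = np.concatenate([np.ones(len(active_scores)),
                             np.zeros(len(decoy_scores))])
    labels = labels[np.argsort(scores)]
    # Fraction of actives recovered (TPR) and decoys passed (FPR)
    # after each position in the ranked list.
    tpr = np.cumsum(labels) / len(active_scores)
    fpr = np.cumsum(1 - labels) / len(decoy_scores)
    return fpr, tpr

A model that ranks most actives ahead of most decoys yields a curve that rises steeply at small false positive rates, which is precisely the regime emphasized by the enrichment metrics discussed below.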
Decoys can be generated for a given receptor in a number of ways[19][11], but several established datasets of actives and decoys are also available as benchmarks[8][9][10]. Decoy sets are often designed to be particularly difficult to discriminate from actives for certain targets; if a given docking model can nevertheless discriminate between them in the setting of retrospective docking, this warrants a stronger belief in the model’s usefulness as a tool for enrichment in the setting of prospective docking[19].
An important goal in the field of computational drug discovery is to quantitatively measure the capacity of a given docking model to enrich a set of molecules by reliably predicting mostly favorable interactions for actives and mostly unfavorable interactions for decoys[14]. A number of enrichment metrics have been developed for this purpose[14][16][18]. For example, the metric known as enrichment factor (EF) captures the idea of enrichment by equating it to the proportion of actives present in some top fraction of the best-scoring molecules[15]. Quantitative metrics like EF also allow researchers to compare the performance of different docking models on the same dataset of actives and decoys[16].
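For concreteness, one common formulation of EF (notation ours; conventions vary slightly across the literature) normalizes the proportion of actives found among the top fraction χ of best-scoring molecules by the proportion of actives in the whole dataset:

\[
\mathrm{EF}_{\chi} = \frac{n_{\mathrm{act}}(\chi)\,/\,(\chi N)}{N_{\mathrm{act}}\,/\,N},
\]

where \(n_{\mathrm{act}}(\chi)\) is the number of actives among the \(\chi N\) best-scoring molecules, \(N_{\mathrm{act}}\) is the total number of actives, and \(N\) is the total number of molecules. Under this convention, a value of 1 corresponds to random selection and larger values indicate enrichment.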
The metric known as LogAUC[13] is one of the most popular metrics for evaluating the quality of molecular docking models[9][10][19][21]. However, it comes with a significant drawback: it depends on a cutoff parameter[13][17]. This parameter sets the minimum value of the log-scaled x-axis, a lower bound that is unavoidable once a logarithmic scale is used, since the logarithm diverges as the false positive rate approaches zero. Unless this parameter is chosen carefully for the dataset at hand, one of two problems arises: either (1) some fraction of the first inter-decoy intervals of the ROC curve is simply thrown away and does not contribute to the metric at all, or (2) the very first inter-decoy interval contributes too much to the metric (even compared to the contribution of the second inter-decoy interval), at the expense of all inter-decoy intervals that follow it.
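To make the role of the cutoff concrete, one common way of writing LogAUC (normalization conventions vary across implementations; we write λ for the cutoff, notation ours) integrates the true positive rate over a base-10 log-scaled false positive rate axis that starts at λ:

\[
\mathrm{LogAUC}_{\lambda} = \frac{1}{\log_{10}(1/\lambda)} \int_{\lambda}^{1} \mathrm{TPR}(x)\, \mathrm{d}\bigl(\log_{10} x\bigr),
\]

where \(\mathrm{TPR}(x)\) is the true positive rate at false positive rate \(x\). With \(N_{\mathrm{dec}}\) decoys, the first decoy appears at \(x = 1/N_{\mathrm{dec}}\), so when \(\lambda < 1/N_{\mathrm{dec}}\) the region before the first decoy spans \(\log_{10}(1/(\lambda N_{\mathrm{dec}}))\) units of the log-scaled axis. Choosing λ larger than \(1/N_{\mathrm{dec}}\) discards part or all of the earliest intervals (situation 1), while choosing λ much smaller than \(1/N_{\mathrm{dec}}\) lets the first interval dominate the integral (situation 2).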
We fix this problem with LogAUC by showing a simple way to choose the cutoff parameter based on the number of decoys, which forces the first inter-decoy interval to always make a stable, sensible contribution to the total value. Moreover, we introduce a normalized version of LogAUC known as enrichment score, which (1) enforces stability by selecting the cutoff parameter in the man-