
Preprint

EDGE: KNOWLEDGE-DRIVEN NEW DRUG RECOMMENDATION

Zhenbang Wu¹, Huaxiu Yao², Zhe Su³, David M Liebovitz⁴, Lucas M Glass⁵, James Zou², Chelsea Finn², Jimeng Sun¹

ABSTRACT

Drug recommendation assists doctors in prescribing personalized medications to patients based on their health conditions. Existing drug recommendation solutions adopt the supervised multi-label classification setup and only work with existing drugs that have sufficient prescription data from many patients. However, newly approved drugs do not have much historical prescription data, so existing drug recommendation methods cannot be applied to them. To address this, we formulate new drug recommendation as a few-shot learning problem. Yet directly applying existing few-shot learning algorithms faces two challenges: (1) complex relations among diseases and drugs, and (2) numerous false-negative patients who were eligible but did not yet use the new drugs. To tackle these challenges, we propose EDGE, which can quickly adapt to the recommendation for a new drug with limited prescription data from a few support patients. EDGE maintains a drug-dependent multi-phenotype few-shot learner to bridge the gap between existing and new drugs. Specifically, EDGE leverages the drug ontology to link new drugs to existing drugs with similar treatment effects and learns ontology-based drug representations. These drug representations are used to customize the metric space of the phenotype-driven patient representations, which are composed of a set of phenotypes capturing complex patient health status. Lastly, EDGE eliminates the false-negative supervision signal using an external drug-disease knowledge base. We evaluate EDGE on two real-world datasets: public EHR data (MIMIC-IV) and private industrial claims data. Results show that EDGE achieves a 7.3% improvement in ROC-AUC over the best baseline.

1 INTRODUCTION

With the wide adoption of electronic health records (EHR) and advances in deep learning, there are great opportunities to assist clinical decisions with deep learning models and so improve resource utilization, healthcare quality, and patient safety (Xiao et al., 2018). Drug recommendation is an essential application that aims to assist doctors in recommending personalized medications to patients based on their health conditions. Existing drug recommendation methods typically formulate the task as a supervised multi-label classification problem (Zhang et al., 2017; Zitnik et al., 2018; Shang et al., 2019b; Yang et al., 2021; Wu et al., 2022; Tan et al., 2022b). They are often trained on massive prescription data to learn patient representations, and they use the learned representations to predict medications (i.e., labels). In reality, however, new drugs come to the market all the time. For example, the U.S. Food and Drug Administration (FDA) approves a wide range of new drugs every year (FDA, 2022). Most of these newly approved drugs do not have much historical data to support model training (Blass, 2021). Even if sufficient prescription data for new drugs exists, existing models must be periodically retrained or updated to recommend them, which is expensive and complex. As a result, existing drug recommendation methods can only recommend the same set of drugs seen during training and are no longer applicable when new drugs appear.

To address this, we formulate the recommendation of new drugs as a few-shot classification problem. Given a new drug with limited prescription data from a few support patients (e.g., from clinical trials (Duijnhoven et al., 2013)), the model should quickly adapt to the recommendation for this drug.

¹University of Illinois Urbana-Champaign, ²Stanford University, ³Zhejiang University, ⁴Northwestern University, ⁵IQVIA. Corresponding authors: zw12@illinois.edu

arXiv:2210.05572v1 [cs.LG] 11 Oct 2022


Meta-learning approaches have been widely used for such problems. They learn how to quickly adapt the classifier to a new label unseen during training, given only a few support examples (Finn et al., 2017; Snell et al., 2017). However, most prior meta-learning works focus on vision- or language-related tasks. In new drug recommendation, applying existing meta-learning algorithms faces the following challenges. (1) Complex relations among diseases and drugs: diseases and medicines can have inherent and higher-order relations. Deciding whether to prescribe a drug to a specific patient depends on many factors, such as disease progression, comorbidities, ongoing treatments, individual drug response, and drug side effects. General meta-learning algorithms do not explicitly capture such dependencies. (2) Numerous false-negative patients: many drugs can treat the same disease, but usually only one of them is prescribed. For any given drug, there are many false-negative patients who were eligible but did not yet use the new drug (e.g., due to drug availability, doctor's preference, or insurance coverage). These numerous false-negative supervision signals can substantially confuse model learning, especially in the few-shot setting.

To address these challenges, we introduce EDGE, a drug-dependent multi-phenotype few-shot learner that quickly adapts to the recommendation for a new drug with limited support patients. Specifically, because drugs within the same category often have similar treatment effects, EDGE uses the drug ontology for drug representation learning to link new drugs with existing drugs. Further, EDGE learns multi-phenotype patient representations to capture the complex patient health status from different aspects, such as chronic diseases, current symptoms, and ongoing treatments. Given a new drug with a few support patients, EDGE makes recommendations by performing a drug-dependent phenotype-level comparison between the representations of query patients and the corresponding support prototypes. Lastly, to reduce the false-negative supervision signal, EDGE leverages the MEDI (Wei et al., 2013) drug-disease knowledge base to guide the negative sampling process.

The main contributions of this work include:

• To the best of our knowledge, this is the first work to formulate the task of new drug recommendation;

• We propose a meta-learning framework, EDGE, that solves this problem by considering complicated relations among diseases and drugs and by eliminating numerous false-negative patients;

• We conduct extensive experiments on the public EHR data MIMIC-IV (Johnson et al., 2020) and private industrial claims data. Results show that our approach achieves improvements of 5.6% in ROC-AUC, 6.3% in Precision@100, and 5.5% in Recall@100 when providing recommended patient lists for new drugs. We also include detailed analyses and ablation studies that show the effectiveness of the multi-phenotype patient representation, the drug-dependent patient distance, and knowledge-guided negative sampling.

2 PROBLEM FORMULATION AND PRELIMINARIES

Denote the set of all drugs as $\mathcal{M}$. The goal of drug recommendation is to prescribe drugs in $\mathcal{M}$ that are suitable for a patient with a record $v = [c_1, \ldots, c_V]$. The record consists of a list of diseases (and procedures), and $V$ is the total number of diseases and procedures in the record $v$. Prior works (Zhang et al., 2017; Shang et al., 2019b; Yang et al., 2021; Tan et al., 2022b) formulate drug recommendation as a multi-label classification problem by generating a multi-hot output of size $|\mathcal{M}|$. However, this formulation assumes that the drug label space $\mathcal{M}$ remains unchanged after training, so it is not applicable when new drugs appear. Thus, we propose the following alternative formulation for new drug recommendation.

Assume the entire drug set $\mathcal{M}$ is partitioned into a set of existing drugs $\mathcal{M}_{old}$ and a set of new drugs $\mathcal{M}_{new}$, where $\mathcal{M}_{old} \cap \mathcal{M}_{new} = \emptyset$. Each existing drug $m_i \in \mathcal{M}_{old}$ has sufficient patients using the drug $m_i$ (e.g., from EHR data). Each new drug $m_t \in \mathcal{M}_{new}$ is associated with two patient sets. The first is a small support set $S_t = \{v_j\}_{j=1}^{N_s}$ consisting of patients using the drug $m_t$ (e.g., from clinical trials). The second is an unlabeled query patient set $Q_t = \{v_j\}_{j=1}^{N_q}$. Here $N_s$ and $N_q$ are the numbers of patients in the support and query sets, respectively. The goal of new drug recommendation is to train a model $f_\phi(\cdot)$, parameterized by $\phi$, on existing drugs $\mathcal{M}_{old}$. The model should then adapt to a new drug $m_t \in \mathcal{M}_{new}$ given the small support set $S_t$ and make correct recommendations on the query set $Q_t$.
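The formulation can be pictured as a small data structure. The sketch below is a hypothetical illustration of one such structure, not the authors' code. It models a record $v$ as a list of code strings and checks the disjointness of $\mathcal{M}_{old}$ and $\mathcal{M}_{new}$. It also bundles a new drug $m_t$ with its support set $S_t$ and query set $Q_t$. The names `NewDrugTask` and `check_partition`, and the example ICD-10 codes, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Set

Record = List[str]  # a patient record v = [c_1, ..., c_V] of disease/procedure codes


def check_partition(m_old: Set[str], m_new: Set[str]) -> None:
    """Enforce M_old ∩ M_new = ∅, as the formulation requires."""
    overlap = m_old & m_new
    if overlap:
        raise ValueError(f"drugs listed as both existing and new: {sorted(overlap)}")


@dataclass
class NewDrugTask:
    """Hypothetical container: one new drug m_t with S_t and Q_t."""
    drug: str
    support: List[Record]  # S_t: N_s patients known to use m_t
    query: List[Record]    # Q_t: N_q unlabeled patients to be scored

    def __post_init__(self) -> None:
        if not self.support:
            raise ValueError("a few-shot task needs at least one support patient")

    @property
    def n_support(self) -> int:  # N_s
        return len(self.support)

    @property
    def n_query(self) -> int:  # N_q
        return len(self.query)


# Illustrative usage with drug names from the text and made-up records.
check_partition({"belsomra", "dayvigo"}, {"quviviq"})
task = NewDrugTask("quviviq", support=[["G47.00"], ["G47.00", "F32.9"]], query=[["I10"]])
print(task.n_support, task.n_query)  # 2 1
```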

To reduce clutter, we use a unified notation for both diseases and procedures. Since we focus on record-level prediction, "patient" and "record" are used interchangeably.


[Figure 1 diagram. Panels: Ontology-Enhanced Drug Representation; Multi-Phenotype Patient Representation; Drug-Dependent Patient Distance. Legend: Medication, Disease, Procedure, Patient Record, Phenotype, Query, Pos/Neg Support.]

Figure 1: EDGE learns the ontology-enhanced drug representation $\mathbf{h}$ and multi-phenotype patient representations $\{\mathbf{g}^{(l)}\}_{l=1}^{3}$. For a new drug $m_1$, EDGE decides whether to prescribe it to a query patient $v_q$. It does so by performing a drug-dependent phenotype-level comparison between the multi-phenotype query representations $\{\mathbf{g}_q^{(l)}\}_{l=1}^{3}$ and the corresponding support prototypes $\{\mathbf{p}^{(l)}\}_{l=1}^{3}$.

Our work is inspired by the prototypical network (Snell et al., 2017). It learns a representation model $f_\phi(\cdot)$ such that patients using a specific drug cluster around a prototype representation. Recommendation can then be performed by computing the distance to the prototype. To give the model the ability to adapt to new drugs with limited support patients, the prototypical network trains the model via episodic training, where each episode is designed to mimic the low-data testing regime. Concretely, an episode is formed by first sampling an existing drug $m_i$ from $\mathcal{M}_{old}$ and then sampling a set of patients using the drug $m_i$. The sampled patients are divided into two disjoint sets: (1) a support set $S_i$ used to calculate the prototype, and (2) a query set $Q_i$ used to calculate the loss.
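Episode construction can be sketched in a few lines. The sketch assumes patients are identified by ID strings and that negatives are drawn uniformly at random from patients not using $m_i$. The negative sets $S_i'$ and $Q_i'$ are the ones introduced with Eq. (2). The function name `sample_episode` and the toy data are hypothetical. EDGE itself replaces the uniform negative draw with knowledge-guided sampling.

```python
import random
from typing import Dict, List, Tuple


def sample_episode(
    drug_to_users: Dict[str, List[str]],  # existing drug m_i -> IDs of patients using it
    all_patients: List[str],              # IDs of every training patient
    n_support: int,
    n_query: int,
    rng: random.Random,
) -> Tuple[str, List[str], List[str], List[str], List[str]]:
    """Return one episode (m_i, S_i, Q_i, S_i', Q_i')."""
    # 1) Sample an existing drug m_i from M_old.
    drug = rng.choice(sorted(drug_to_users))
    users = drug_to_users[drug]
    # 2) Sample patients using m_i and split them into disjoint S_i and Q_i.
    picked = rng.sample(users, n_support + n_query)
    support, query = picked[:n_support], picked[n_support:]
    # 3) Assumed uniform negative sampling from patients not using m_i.
    user_set = set(users)
    pool = [p for p in all_patients if p not in user_set]
    negs = rng.sample(pool, n_support + n_query)
    return drug, support, query, negs[:n_support], negs[n_support:]


# Toy data: two existing drugs, ten patients.
drug_to_users = {"d1": ["p1", "p2", "p3", "p4"], "d2": ["p5", "p6", "p7", "p8"]}
patients = [f"p{k}" for k in range(1, 11)]
drug, S, Q, S_neg, Q_neg = sample_episode(drug_to_users, patients, 2, 2, random.Random(0))
```

Each call draws a fresh drug, which is what lets the model practice adapting from a handful of support patients.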

From the support set $S_i$, the prototypical network calculates the prototype representation as

$$\mathbf{p} = \frac{1}{|S_i|} \sum_{j \in S_i} f_\phi(v_j), \quad \mathbf{p} \in \mathbb{R}^e, \tag{1}$$

where $\mathbf{p}$ is an $e$-dimensional vector in the metric space, and $|\cdot|$ denotes cardinality. Next, given a query patient $v_q$, the probability of recommending drug $m_i$ is measured by the distance $d(\cdot)$ between its representation and the corresponding prototypes as

$$p_\phi(y_q = + \mid v_q) = \frac{\exp\big(-d(f_\phi(v_q), \mathbf{p})\big)}{\exp\big(-d(f_\phi(v_q), \mathbf{p})\big) + \exp\big(-d(f_\phi(v_q), \mathbf{p}')\big)}, \tag{2}$$

where $\mathbf{p}'$ is the negative prototype obtained from another negative support set $S_i'$ of patients not using the drug $m_i$ (i.e., negative sampling). The loss is computed as the negative log-likelihood (NLL) loss $\mathcal{L}(\phi) = -\log p_\phi(y_q = * \mid v_q)$ of the true label $* \in \{+, -\}$. The model $f_\phi(\cdot)$ is optimized on both the query set $Q_i$ and another negative query set $Q_i'$ obtained via negative sampling (in the same way as $S_i'$).
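Eqs. (1) and (2) and the NLL loss can be sketched with plain Python lists. The distance $d(\cdot)$ is not pinned down here. The sketch assumes squared Euclidean distance, the usual choice in Snell et al. (2017), whereas EDGE later replaces it with a drug-dependent distance. The encoder `toy_encoder` is a hypothetical stand-in for $f_\phi$ that counts codes from a fixed vocabulary. Eq. (2) is computed as a log-softmax so that the loss stays finite even when one probability underflows.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Record = List[str]


def prototype(support: Sequence[Record], f_phi: Callable[[Record], Vector]) -> Vector:
    """Eq. (1): mean embedding of the support patients."""
    if not support:
        raise ValueError("support set must be non-empty")
    embs = [f_phi(v) for v in support]
    return [sum(col) / len(embs) for col in zip(*embs)]


def sq_euclidean(a: Vector, b: Vector) -> float:
    # Assumed d(., .): squared Euclidean distance.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def log_probs(z_q: Vector, p_pos: Vector, p_neg: Vector) -> Tuple[float, float]:
    """Eq. (2) in log space: log-softmax over [-d(z_q, p), -d(z_q, p')]."""
    logits = [-sq_euclidean(z_q, p_pos), -sq_euclidean(z_q, p_neg)]
    m = max(logits)  # subtract the max before exp for numerical stability
    lse = m + math.log(sum(math.exp(l - m) for l in logits))
    return logits[0] - lse, logits[1] - lse


def nll_loss(z_q: Vector, p_pos: Vector, p_neg: Vector, label: str) -> float:
    """L(phi) = -log p_phi(y_q = * | v_q) for the true label * in {'+', '-'}."""
    log_pos, log_neg = log_probs(z_q, p_pos, p_neg)
    return -(log_pos if label == "+" else log_neg)


# Hypothetical encoder: per-code counts over a fixed vocabulary (e = 3).
VOCAB = ["I10", "E11", "G47.0"]


def toy_encoder(record: Record) -> Vector:
    return [float(record.count(code)) for code in VOCAB]


p_pos = prototype([["I10", "E11"], ["I10"]], toy_encoder)      # [1.0, 0.5, 0.0]
p_neg = prototype([["G47.0"], ["G47.0", "E11"]], toy_encoder)  # [0.0, 0.5, 1.0]
z_q = toy_encoder(["I10"])                                     # [1.0, 0.0, 0.0]
print(math.exp(log_probs(z_q, p_pos, p_neg)[0]))  # ≈ 0.8808, i.e. 1 / (1 + e^-2)
```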

3 KNOWLEDGE-DRIVEN NEW DRUG RECOMMENDATION

In this section, we introduce EDGE, which can adapt to new drugs with limited support patients via a drug-dependent multi-phenotype few-shot learner. Specifically, EDGE consists of the following modules:

(1) an ontology-enhanced drug encoder that fuses ontology information into the drug representation to link new drugs to existing drugs with similar treatment effects;
(2) a multi-phenotype patient encoder that represents each patient with a set of phenotype-level representations to capture the patient's complex health status;
(3) a drug-dependent distance measure that learns drug-dependent phenotype importance scores to customize the patient similarity;
(4) knowledge-guided negative sampling that eliminates the false-negative supervision signal.

Figure 1 provides an illustration of EDGE. In the following, we describe how EDGE decides whether to prescribe a drug $m_i$ to a query patient $v_q$, given a small set of support patients $S_i$ using the drug $m_i$.
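Module (4) is described here only at a high level. To make the idea concrete, the sketch below assumes one simple rule, which may differ from EDGE's actual procedure. A patient who does not use the drug, but whose record contains a disease that a drug-disease knowledge base (such as MEDI) lists as an indication for it, is treated as a possible false negative. Such patients are excluded from the negative pool. The function name, the `indications` mapping, and the codes are all hypothetical.

```python
import random
from typing import Dict, List, Set


def knowledge_guided_negatives(
    drug: str,
    patient_diseases: Dict[str, Set[str]],  # patient ID -> disease codes in the record
    indications: Dict[str, Set[str]],       # drug -> diseases it treats (MEDI-style pairs)
    users: Set[str],                        # IDs of patients known to use `drug`
    k: int,
    rng: random.Random,
) -> List[str]:
    """Hypothetical rule: exclude non-users with an indicated disease, then sample k."""
    treats = indications.get(drug, set())
    pool = sorted(
        pid for pid, diseases in patient_diseases.items()
        if pid not in users and not (diseases & treats)
    )
    return rng.sample(pool, min(k, len(pool)))


patient_diseases = {
    "p1": {"G47.00"},         # uses the drug
    "p2": {"I10"},            # unrelated condition: kept as a negative
    "p3": {"G47.00", "I10"},  # insomnia but no prescription: possible false negative
    "p4": {"E11.9"},          # unrelated condition: kept as a negative
}
indications = {"quviviq": {"G47.00"}}
negs = knowledge_guided_negatives("quviviq", patient_diseases, indications, {"p1"}, 5, random.Random(0))
print(sorted(negs))  # ['p2', 'p4']
```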

3.1 ONTOLOGY-ENHANCED DRUG REPRESENTATION LEARNING

Although many new drugs have not yet been used regularly in clinical practice, they still belong to the same drug category (from a drug ontology) as some existing drugs and share similar treatment effects, which implicitly indicates similar patient populations. For example, the newly approved Quviviq for treating insomnia belongs to the same category (Orexin Receptor Antagonist) as some existing drugs, such as Belsomra and Dayvigo, which are also sleeping aids. Here, we leverage the drug ontology to enrich the drug representation by attentively combining the drug itself and its ancestors (e.g., higher-level drug categories).

Concretely, for the drug $m_i$, we obtain its basic embedding $\mathbf{m}_i \in \mathbb{R}^e$ by feeding its description into Clinical-BERT (Alsentzer et al., 2019). Then, following Choi et al. (2017), we use the basic embeddings of drug $m_i$ and its ancestors to calculate the ontology-enriched drug representation as

$$\mathbf{h} = \sum_{j \in A_i} \alpha_{i,j}\, \mathbf{m}_j, \quad \mathbf{h} \in \mathbb{R}^e, \tag{3}$$

where $A_i$ denotes the set containing drug $m_i$ and its ancestors. The attention score $\alpha_{i,j}$ represents the importance of ancestor $m_j$ for drug $m_i$ and is calculated as

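$$\alpha_{i,j} = \frac{\exp\big(f_{\phi_a}(\mathbf{m}_i \oplus \mathbf{m}_j)\big)}{\sum_{k \in A_i} \exp\big(f_{\phi_a}(\mathbf{m}_i \oplus \mathbf{m}_k)\big)}, \quad \alpha_{i,j} \in [0, 1], \tag{4}$$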

where $\oplus$ denotes the concatenation operator, and $f_{\phi_a}(\cdot): \mathbb{R}^{2e} \to \mathbb{R}$ is a two-layer fully connected neural network with Tanh activation. In this way, we fuse the ontology information into the representation $\mathbf{h}$ for drug $m_i$. This representation is later used to customize the metric space of the phenotype-driven patient representations, introduced next.
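Eqs. (3) and (4) amount to softmax attention over the drug and its ontology ancestors, followed by a weighted sum of their embeddings. The sketch below assumes a few details that the text leaves open. Tanh is placed on the single hidden layer of $f_{\phi_a}$. The weights are passed in as plain lists. The basic embeddings are toy 2-d vectors rather than Clinical-BERT outputs. The ontology node names `orexin_ra` and `hypnotics` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def f_phi_a(x: Sequence[float], W1: List[Vector], b1: Vector, w2: Vector, b2: float) -> float:
    """Two-layer scorer R^{2e} -> R with a Tanh hidden layer (assumed layout)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def ontology_representation(
    drug: str,
    ancestors: Dict[str, List[str]],  # drug -> ancestor categories in the ontology
    emb: Dict[str, Vector],           # basic embeddings m_j
    params: Tuple,                    # (W1, b1, w2, b2) for f_phi_a
) -> Tuple[Vector, List[float]]:
    """Eqs. (3)-(4): h = sum_{j in A_i} alpha_ij * m_j with softmax attention."""
    A_i = [drug] + ancestors.get(drug, [])  # the drug itself plus its ancestors
    m_i = emb[drug]
    # m_i ⊕ m_j: concatenating two e-dim lists gives a 2e-dim input.
    scores = [f_phi_a(m_i + emb[j], *params) for j in A_i]
    top = max(scores)  # stabilize the softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [x / total for x in exps]
    h = [sum(a * emb[j][k] for a, j in zip(alphas, A_i)) for k in range(len(m_i))]
    return h, alphas


emb = {"quviviq": [1.0, 0.0], "orexin_ra": [0.0, 1.0], "hypnotics": [1.0, 1.0]}
ancestors = {"quviviq": ["orexin_ra", "hypnotics"]}
zero = ([[0.0] * 4 for _ in range(3)], [0.0] * 3, [0.0] * 3, 0.0)
h, alphas = ontology_representation("quviviq", ancestors, emb, zero)
# With all-zero weights every score ties, so attention is uniform and h is the mean embedding.
```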

3.2 MULTI-PHENOTYPE PATIENT REPRESENTATION LEARNING

Patient health status involves many factors, such as disease progression, comorbidities, ongoing treatments, individual drug response, and drug side effects. Encoding each patient into a single vector may not capture the complete information, especially for patients with complex health conditions. Therefore, we define a set of phenotypes and represent each patient with a set of phenotype vectors. Each phenotype can provide helpful guidance in patient representation learning and further benefit new drug recommendation.

Specifically, for every support/query patient $v$ with a list of diseases $[c_1, \ldots, c_V]$, EDGE first computes the contextualized disease representations by applying the embedding function $f_{\phi_r}(\cdot)$ as

$$[\mathbf{r}_1, \ldots, \mathbf{r}_V] = f_{\phi_r}([c_1, \ldots, c_V]), \quad \mathbf{r}_j \in \mathbb{R}^e, \tag{5}$$

where $\mathbf{r}_j$ is the contextualized representation for disease $c_j$. We model $f_{\phi_r}(\cdot)$ using a bidirectional gated recurrent unit (GRU) because of its popularity in prior works (Zhang et al., 2017; Shang et al., 2019b; Yang et al., 2021). Our experiments also show results with a multilayer perceptron (MLP) and a Transformer (Vaswani et al., 2017).

Next, we leverage domain knowledge to group diseases into different phenotypes. To obtain the representation for the $l$-th phenotype, we take the representations of all diseases that belong to that phenotype, project them to a lower dimension, and calculate their mean representation as

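$$\mathbf{g}^{(l)} = \frac{1}{|\mathcal{G}^{(l)}|} \sum_{j \in \mathcal{G}^{(l)}} f_{\phi_g}(\mathbf{r}_j), \quad \mathbf{g}^{(l)} \in \mathbb{R}^g, \tag{6}$$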

where $\mathcal{G}^{(l)}$ represents the set of diseases whose phenotype is $l$, $f_{\phi_g}(\cdot): \mathbb{R}^e \to \mathbb{R}^g$ is a single-layer neural network, and $g < e$. We show results with different values of $g$ in the experiments. If $\mathcal{G}^{(l)}$ is empty, we use the pooled sequence representation as a substitute. The phenotypes are extracted from the Clinical Classifications Software (CCS) (H. CUP, 2010), which provides 511 phenotypes in total. In this way, each support/query patient is represented with a set of phenotype vectors $\{\mathbf{g}^{(l)}\}_{l=1}^{L}$.
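Eq. (6) groups projected disease representations by phenotype and averages each group. The sketch below takes the $\mathbf{r}_j$ as given, since the GRU of Eq. (5) is out of scope. It replaces the single-layer $f_{\phi_g}$ with a hypothetical fixed linear map from $\mathbb{R}^3$ to $\mathbb{R}^2$. The text does not say exactly how the pooled fallback for an empty $\mathcal{G}^{(l)}$ is formed. The sketch therefore assumes it is the mean of $\mathbf{r}_1, \ldots, \mathbf{r}_V$ passed through the same projection, so that every $\mathbf{g}^{(l)}$ lies in $\mathbb{R}^g$. The code-to-phenotype mapping is illustrative only.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def phenotype_vectors(
    codes: Sequence[str],                 # c_1..c_V
    r: Sequence[Vector],                  # r_1..r_V from Eq. (5)
    phenotype_of: Dict[str, int],         # code -> phenotype index l (CCS-style grouping)
    n_phenotypes: int,                    # L
    project: Callable[[Vector], Vector],  # stand-in for f_phi_g: R^e -> R^g
) -> List[Vector]:
    """Eq. (6): g^(l) = mean of f_phi_g(r_j) over j in G^(l), for l = 0..L-1."""
    groups: Dict[int, List[Vector]] = {l: [] for l in range(n_phenotypes)}
    for code, r_j in zip(codes, r):
        if code in phenotype_of:
            groups[phenotype_of[code]].append(project(r_j))
    # Assumed fallback for an empty G^(l): project the mean-pooled sequence representation.
    pooled = project(mean(r))
    return [mean(groups[l]) if groups[l] else pooled for l in range(n_phenotypes)]


def project(v: Vector) -> Vector:
    # Hypothetical linear map R^3 -> R^2 standing in for the learned layer.
    return [v[0] + v[1], v[2]]


codes = ["E11.9", "E11.65", "J45.909"]
r = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
phenotype_of = {"E11.9": 0, "E11.65": 0, "J45.909": 1}
g = phenotype_vectors(codes, r, phenotype_of, 3, project)
# g[0] == [1.0, 0.0], g[1] == [0.0, 1.0]; g[2] is the pooled fallback.
```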

Based on the multi-phenotype patient representations, we further calculate the phenotype-level prototypes from the support set $S_i$ of drug $m_i$, revising Equation (1) as

$$\mathbf{p}^{(l)} = \frac{1}{|S_i|} \sum_{j \in S_i} \mathbf{g}_j^{(l)}, \quad \mathbf{p}^{(l)} \in \mathbb{R}^g, \tag{7}$$
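Eq. (7) applies the prototype averaging of Eq. (1) separately to each phenotype. A minimal sketch follows, assuming each support patient's phenotype vectors are stored as a list indexed by $l$. The function name `phenotype_prototypes` and the toy values are hypothetical.

```python
from typing import List

Vector = List[float]


def phenotype_prototypes(support_g: List[List[Vector]]) -> List[Vector]:
    """Eq. (7): p^(l) = (1/|S_i|) * sum_{j in S_i} g_j^(l) for every phenotype l.

    support_g[j][l] holds g_j^(l), the l-th phenotype vector of support patient j.
    """
    if not support_g:
        raise ValueError("support set must be non-empty")
    n_phenotypes = len(support_g[0])
    return [
        [sum(col) / len(support_g) for col in zip(*(patient[l] for patient in support_g))]
        for l in range(n_phenotypes)
    ]


patient_a = [[1.0, 0.0], [0.0, 2.0]]  # g_a^(1), g_a^(2) (hypothetical values, g = 2)
patient_b = [[3.0, 0.0], [0.0, 0.0]]
print(phenotype_prototypes([patient_a, patient_b]))  # [[2.0, 0.0], [0.0, 1.0]]
```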

E.g., Ibuprofen is a nonsteroidal anti-inflammatory drug that is used for treating pain, fever, and inflammation.
