
Drug repositioning for Alzheimer’s disease with transfer learning

Yetao Wu, Han Liu, Jie Yan, Xiaolin Hu

Department of Computer Science and Technology, Institute for Artificial Intelligence, State Key Laboratory of Intelligent Technology and Systems, BNRist, THBI, Tsinghua University, Beijing, China

xlhu@tsinghua.edu.cn

Abstract

Deep learning and DRUG-seq (Digital RNA with perturbation of genes) have attracted attention in drug discovery. However, the public DRUG-seq dataset is too small to train a deep neural network from scratch. Inspired by transfer learning, we pretrain a drug efficacy prediction neural network with the Library of Integrated Network-based Cellular Signatures (LINCS) L1000 data and then fine-tune it with DRUG-seq data from human neural cells. After training, the model is used for virtual screening to find potential drugs for Alzheimer’s disease (AD) treatment. We identify 27 candidate drugs for AD treatment, including Irsogladine (PDE4 inhibitor), Tasquinimod (selective HDAC4 inhibitor), and Suprofen (dual COX-1/COX-2 inhibitor).

Keywords

Drug repositioning · Alzheimer’s disease · Transfer learning · Deep learning · DRUG-seq · L1000

1 Introduction

Alzheimer’s disease is a common, complex neurodegenerative disease (Lee and Kim 2020), and developing drugs for it is extremely challenging. Drug repositioning is attractive for AD drug development because the toxicity, pharmacokinetic, and pharmacodynamic profiles of an existing drug have already been characterized (Kwon et al. 2019). Recently, deep learning methods have shown great potential for drug discovery (Jang and Cho 2019) and repositioning (Pham et al. 2022; Pham et al. 2021; Zhu et al. 2021). In particular, several research groups have predicted drug efficacy from the L1000 dataset (Subramanian et al. 2017), which contains gene expression profiles of cells treated with different compounds (Pham et al. 2022; Pham et al. 2021; Zhu et al. 2021).

Although the L1000 dataset is widely used, it poses three common problems. First, most L1000 data are obtained from tumor cell lines that carry many mutations, so when the data are used to predict gene expression profiles of normal cells, the cellular context may differ. Second, because the L1000 method and commercial services based on it are not widespread, it is difficult to obtain customized L1000 data for a specific cell type. Third, the L1000 dataset contains many unreliable and noisy gene expression profiles (Pham et al. 2021).

RNA-seq methods and commercial services are now widely available. DRUG-seq, which is based on RNA-seq, is a reliable and cost-effective tool for comprehensive transcriptome readout in high-throughput drug screening, costing US$2–4 per sample (Ye et al. 2018). However, because DRUG-seq is a new technique, no large public DRUG-seq datasets exist yet, and training a deep neural network from scratch on a small public DRUG-seq dataset is very difficult. We therefore propose to use transfer learning, which applies knowledge gained on one task to a related task, to reduce the amount of DRUG-seq data needed to train the network.

Because both the DRUG-seq and L1000 datasets consist of gene expression profiles that capture cellular responses to different compounds (Jeong et al. 2017), the similarity and correlation between the two datasets can be exploited for transfer learning. In this study, we pretrained a drug efficacy prediction neural network with the L1000 data and then fine-tuned it with DRUG-seq data from human neural cells. After training, the model was used to predict profiles for new chemicals in the DrugBank database. These profiles were then used for virtual screening to find potential drugs for Alzheimer’s disease treatment. To the best of our knowledge, this is the first use of DRUG-seq data in deep learning.
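The pretrain-then-fine-tune idea can be illustrated with a deliberately small sketch. It is not DeepCE or the authors' network: it assumes a single-output linear regressor trained by stochastic gradient descent, and synthetic data in which a large "source" set stands in for L1000 and a small, slightly shifted "target" set stands in for DRUG-seq. All names (`make_data`, `train`, `source_w`, `target_w`, and so on) are hypothetical.

```python
import random

def predict(w, b, x):
    # Linear model with one output (e.g., the response of a single gene)
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train(data, w, b, lr=0.01, epochs=5):
    """Stochastic gradient descent on squared error, starting from (w, b)."""
    w = list(w)  # copy so the caller's weights are not modified
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, b, x) - y
            for i in range(len(w)):
                w[i] -= lr * err * x[i]
            b -= lr * err
    return w, b

def mse(data, w, b):
    return sum((predict(w, b, x) - y) ** 2 for x, y in data) / len(data)

def make_data(n, true_w, true_b, noise, rng):
    """Synthetic (features, target) pairs drawn from a noisy linear rule."""
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in true_w]
        y = sum(wi * xi for wi, xi in zip(true_w, x)) + true_b + rng.gauss(0.0, noise)
        data.append((x, y))
    return data

rng = random.Random(0)
source_w = [1.0, -2.0, 0.5, 3.0]  # hypothetical source task (stand-in for L1000)
target_w = [1.2, -1.8, 0.4, 3.1]  # related, shifted target task (stand-in for DRUG-seq)
source = make_data(2000, source_w, 0.0, 0.5, rng)
target_train = make_data(10, target_w, 0.2, 0.5, rng)  # very small target training set
target_test = make_data(500, target_w, 0.2, 0.5, rng)

zeros = [0.0] * len(source_w)
w_pre, b_pre = train(source, zeros, 0.0)        # 1) pretrain on the large source set
w_ft, b_ft = train(target_train, w_pre, b_pre)  # 2) fine-tune from pretrained weights
w_sc, b_sc = train(target_train, zeros, 0.0)    # 3) baseline: same budget from scratch

mse_ft = mse(target_test, w_ft, b_ft)
mse_sc = mse(target_test, w_sc, b_sc)
print(f"fine-tuned MSE {mse_ft:.3f} vs from-scratch MSE {mse_sc:.3f}")
```

Under these assumptions the fine-tuned model reaches a much lower test error than the same small training budget from zero initialization, because pretraining on the related source task already places the weights near the target solution.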

2 Methods

Datasets

In the following paragraphs, we introduce the datasets used in this study: STRING, DrugBank, L1000, the DRUG-seq dataset of human neural cells, and transcriptome data from Alzheimer’s disease patients.

The high-quality L1000 dataset

Following previous work (Pham et al. 2021; Qiu et al. 2020), we conducted experiments on a selected high-quality subset of the L1000 dataset. Because the original L1000 dataset contains many unreliable and noisy gene expression profiles, the selected high-quality subset yields much better prediction performance than the original dataset (Pham et al. 2021). The subset consists of 1,944 training samples (284 chemicals), 556 development samples (92 chemicals), and 502 testing samples (92 chemicals) (Pham et al. 2021).
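Because each set is described by both a sample count and a chemical count, one plausible reading is that whole chemicals were assigned to a single set; the text does not state this. Assuming such a chemical-level split, it can be sketched as follows. `split_by_chemical` and the toy data are hypothetical; for the real subset the chemical counts would be 284, 92, and 92.

```python
import random
from collections import defaultdict

def split_by_chemical(samples, n_train, n_dev, seed=0):
    """Assign whole chemicals (not individual profiles) to train/dev/test.

    samples: list of (chemical_id, profile) pairs.
    n_train, n_dev: number of chemicals in train and dev; the rest go to test.
    """
    by_chem = defaultdict(list)
    for chem, profile in samples:
        by_chem[chem].append((chem, profile))
    chems = sorted(by_chem)              # deterministic order before shuffling
    random.Random(seed).shuffle(chems)
    parts = (chems[:n_train],
             chems[n_train:n_train + n_dev],
             chems[n_train + n_dev:])
    return tuple([s for c in part for s in by_chem[c]] for part in parts)

# Toy data: 10 hypothetical chemicals with 3 profiles each
toy = [(f"chem{i}", [float(i), float(r)]) for i in range(10) for r in range(3)]
train, dev, test = split_by_chemical(toy, n_train=6, n_dev=2)
print(len(train), len(dev), len(test))  # 18 6 6
```

Splitting by chemical rather than by profile keeps replicate profiles of one compound out of the evaluation sets, so reported performance reflects generalization to unseen chemicals.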

The STRING database

STRING is a database of known and predicted protein–protein interactions, including direct and indirect associations (Pham et al. 2021; Szklarczyk et al. 2019). In a previous study (Pham et al. 2021), the human protein–protein interaction network, consisting of approximately 12,000,000 interactions (edges) among 19,000 proteins (nodes), was extracted from STRING to compute vector representations for the 978 L1000 genes. The drug-target vector representations used in this study were also computed from the STRING database (Pham et al. 2021; Szklarczyk et al. 2019). Details of how these representations were generated can be found in the previous literature (Pham et al. 2021).
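The embedding method itself is deferred to the previous literature (Pham et al. 2021). Purely as a stand-in, the sketch below shows one common way to derive a network-based vector for each gene: a random walk with restart (RWR) from that gene, whose long-run visiting probabilities over all nodes form the gene's vector. This is a swapped-in technique, not necessarily the one used in the cited work; the function names, the restart probability of 0.5, and the toy graph are assumptions. A network with about 19,000 nodes and 12 million edges would need sparse-matrix code and usually a dimensionality-reduction step.

```python
def rwr_profile(adj, start, restart=0.5, iters=100):
    """Random walk with restart on an undirected graph.

    adj: dict mapping each node to the set of its neighbours.
    Returns a dict node -> long-run visiting probability for walks that
    jump back to `start` with probability `restart` at every step.
    """
    nodes = sorted(adj)
    p = {n: 0.0 for n in nodes}
    p[start] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        for n in nodes:
            mass = (1.0 - restart) * p[n]
            if adj[n]:
                share = mass / len(adj[n])
                for m in adj[n]:
                    nxt[m] += share
            else:
                nxt[start] += mass  # dangling node: send its mass back to the start
        nxt[start] += restart       # restart step; total probability stays 1
        p = nxt
    return p

def embed_genes(adj, **kwargs):
    """Each gene's vector is its RWR profile, listed in a fixed node order."""
    nodes = sorted(adj)
    return {g: [rwr_profile(adj, g, **kwargs)[n] for n in nodes] for g in nodes}

# Toy path graph A - B - C - D (hypothetical gene names)
toy = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
vectors = embed_genes(toy)
print([round(v, 3) for v in vectors["A"]])  # [0.578, 0.311, 0.089, 0.022]
```

Genes that sit close together in the interaction network receive similar diffusion profiles, which is the property such representations are meant to capture.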

The DRUG-seq dataset

We used the DRUG-seq data from a previous study (Rodriguez et al. 2021), in which human neural cells were treated with compounds or DMSO for 24 h. We used 661 samples, 75% of which were used for training. Raw sequencing data are available from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRP301436), and processed data are available from the Gene Expression Omnibus (GSE164788) (Rodriguez et al. 2021).
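As a check on the numbers, 75% of 661 samples is 495.75, so a 75/25 split gives 495 or 496 training samples depending on rounding; the text states neither the rounding nor whether samples of the same compound were kept together. A minimal sketch, assuming a plain sample-level shuffle with floor rounding (`split_samples` and the seed are hypothetical):

```python
import random

def split_samples(items, train_frac=0.75, seed=0):
    """Shuffle a copy of `items` and cut it into training and held-out parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(train_frac * len(shuffled))  # floor: int(0.75 * 661) == 495
    return shuffled[:n_train], shuffled[n_train:]

sample_ids = list(range(661))  # placeholder indices for the 661 DRUG-seq samples
train_ids, held_out_ids = split_samples(sample_ids)
print(len(train_ids), len(held_out_ids))  # 495 166
```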

Expression profiles (RNA-seq)

Expression profiles from both Alzheimer’s disease patients and healthy controls were obtained from two previous studies (Mizuno et al. 2021; Nativio et al. 2020).

The first expression dataset was downloaded from NCBI (GSE173955) (Mizuno et al. 2021). Postmortem hippocampi from 8 AD and 10 healthy subjects were processed with the Illumina TruSeq Stranded mRNA LT Sample Prep Kit and sequenced on a HiSeq 1500 according to the manufacturer’s protocol (Mizuno et al. 2021). Details of the gene-level differential expression analysis between AD and non-AD hippocampi can be found in the original study (Mizuno et al. 2021). Because not all 978 L1000 genes appear in the resulting differential expression gene list, only genes present in both the L1000 dataset and this list were considered when comparing with drug-induced gene expression profiles.
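Restricting the comparison to shared genes can be sketched as follows. The text does not say how drug-induced and disease profiles are compared; this sketch assumes a Pearson correlation over the shared genes, with a strongly negative value read as the drug reversing the disease signature. `reversal_score` is a hypothetical helper, and the gene names and values are toy placeholders.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        raise ValueError("constant input: correlation undefined")
    return cov / math.sqrt(va * vb)

def reversal_score(drug_profile, disease_signature, min_genes=3):
    """Correlate drug and disease changes over the genes both dicts contain.

    Both arguments map gene symbol -> expression change. A strongly negative
    score means the drug moves the shared genes opposite to the disease.
    """
    shared = sorted(set(drug_profile) & set(disease_signature))
    if len(shared) < min_genes:
        raise ValueError(f"only {len(shared)} shared genes")
    score = pearson([drug_profile[g] for g in shared],
                    [disease_signature[g] for g in shared])
    return score, shared

# Toy values: G4 is missing from the drug profile and G5 from the disease signature
disease = {"G1": 1.0, "G2": -0.5, "G3": 2.0, "G4": 0.3}
drug = {"G1": -0.9, "G2": 0.6, "G3": -1.8, "G5": 0.7}
score, shared = reversal_score(drug, disease)
print(shared, round(score, 3))  # ['G1', 'G2', 'G3'] -1.0
```

Genes present in only one of the two profiles are simply dropped, which mirrors restricting the comparison to genes found in both the L1000 gene set and the differential expression list.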

The second expression dataset was downloaded from NCBI (GSE159699) (Nativio et al. 2020). Postmortem hippocampi from 10 AD patients and 12 aged non-AD controls were processed with the NEBNext Ultra Directional RNA Library Prep Kit for Illumina (NEB) and sequenced on the NextSeq 500 platform (Illumina) according to the manufacturer’s protocol (Nativio et al. 2020).
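A disease signature from case and control samples like these is often summarized per gene as a log2 fold change. The sketch below is a simplified stand-in, not the differential expression pipeline of the original studies: it assumes already-normalized expression values, a pseudocount of 1, and Welch's t statistic as a per-gene effect measure. `disease_signature` is a hypothetical helper, and the gene names and numbers are toy placeholders.

```python
import math
from statistics import mean, variance

def disease_signature(ad, control, pseudocount=1.0):
    """Per-gene (log2 fold change, Welch t statistic) for AD versus control.

    ad, control: dict gene -> list of normalized expression values, one per subject.
    Only genes measured in both groups are returned.
    """
    sig = {}
    for gene in sorted(set(ad) & set(control)):
        a, c = ad[gene], control[gene]
        lfc = math.log2((mean(a) + pseudocount) / (mean(c) + pseudocount))
        se = math.sqrt(variance(a) / len(a) + variance(c) / len(c))
        t = (mean(a) - mean(c)) / se if se > 0 else 0.0
        sig[gene] = (lfc, t)
    return sig

# Toy data: G1 is raised in AD, G2 is unchanged
ad = {"G1": [15, 17, 16], "G2": [5, 6, 4]}
control = {"G1": [7, 8, 9, 8], "G2": [5, 5, 6, 4]}
sig = disease_signature(ad, control)
print(sig["G2"])  # (0.0, 0.0)
```

With 10 AD and 12 control subjects, as in this dataset, each gene's lists would hold 10 and 12 values respectively.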

The DrugBank database

The DrugBank database, which contains information about 11,179 drugs and their targets (Pham et al. 2021; Wishart et al. 2006), is a well-known, comprehensive, freely accessible database used in many cheminformatics and bioinformatics tasks (Wishart et al. 2006). In this study, we predicted gene expression profiles for drugs in DrugBank and then used them to screen for potential drugs for Alzheimer’s disease treatment.
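Screening a drug library then amounts to scoring every drug's predicted profile against the disease signature and keeping the most strongly opposed candidates. A minimal sketch, assuming predicted profiles are already aligned to the signature's gene order and using cosine similarity as the score (the text does not specify the measure); `screen` and the drug names are hypothetical placeholders, not DrugBank identifiers.

```python
import heapq
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def screen(predicted, signature, k):
    """Return the k (drug, score) pairs whose predicted profiles most oppose the signature.

    predicted: dict drug name -> predicted expression changes, in the same
    gene order as `signature`. Lower (more negative) scores rank first.
    """
    scores = {drug: cosine(profile, signature) for drug, profile in predicted.items()}
    return heapq.nsmallest(k, scores.items(), key=lambda item: item[1])

signature = [1.0, -1.0, 0.5]       # toy disease signature over 3 genes
predicted = {                      # toy model outputs for 3 hypothetical drugs
    "drug_A": [-1.0, 1.0, -0.5],   # exact reversal
    "drug_B": [1.0, -1.0, 0.5],    # mimics the disease
    "drug_C": [0.2, 0.1, -0.3],    # weakly opposed
}
top = screen(predicted, signature, k=2)
print([name for name, _ in top])  # ['drug_A', 'drug_C']
```

`heapq.nsmallest` keeps only the k best candidates while scanning, which stays cheap even for a library of about 11,000 drugs.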

DeepCE
