Drug repositioning for Alzheimer’s disease
with transfer learning
Yetao Wu, Han Liu, Jie Yan, Xiaolin Hu
Department of Computer Science and Technology, Institute for Artificial
Intelligence, State Key Laboratory of Intelligent Technology and Systems, BNRist,
THBI, Tsinghua University, Beijing, China
xlhu@tsinghua.edu.cn
Abstract
Deep Learning and DRUG-seq (Digital RNA with perturbation of genes) have
attracted attention in drug discovery. However, the public DRUG-seq dataset is too
small to be used for directly training a deep learning neural network from scratch.
Inspired by the transfer learning technique, we pretrain a drug efficacy prediction
neural network model with the Library of Integrated Network-based Cell-Signature
(LINCS) L1000 data and then use human neural cell DRUG-seq data to fine-tune it.
After training, the model is used for virtual screening to find potential drugs for
Alzheimer’s disease (AD) treatment. Finally, we identify 27 potential drugs for AD
treatment, including Irsogladine (PDE4 inhibitor), Tasquinimod (HDAC4 selective
inhibitor), and Suprofen (dual COX-1/COX-2 inhibitor).
Keywords
Drug repositioning · Alzheimer’s disease · Transfer learning · Deep
learning · DRUG-seq · L1000
1 Introduction
Alzheimer’s disease (AD) is a common, complex neurodegenerative disease (Lee and Kim
2020) for which drug development is extremely challenging. Drug repositioning is
attractive for AD drug development because the toxicity, pharmacokinetic, and
pharmacodynamic profiles of an existing drug are already well characterized (Kwon et al. 2019).
Recently, deep learning methods have shown great potential for drug discovery (Jang
and Cho 2019) and repositioning (Pham et al. 2022; Pham et al. 2021; Zhu et al.
2021). In particular, several research groups have predicted drug efficacy from the
L1000 dataset (Subramanian et al. 2017), which contains gene expression profiles
measured in response to different compound treatments (Pham et al. 2022; Pham et
al. 2021; Zhu et al. 2021).
Despite being widely used, there are three common problems encountered when
utilizing the L1000 dataset. First, most of the L1000 data is obtained from tumor cell
lines which carry many mutations. When the L1000 data is applied to predict the gene
expression profile of normal cells, the cellular context may differ. Second, because the L1000
method and commercial services based on it are not widely available, it is difficult to obtain
customized L1000 data for a specific cell type. Third, the L1000 dataset contains many
unreliable and noisy gene expression profiles (Pham et al. 2021).
RNA-seq methods and commercial services are now widely available. Built on
RNA-seq, DRUG-seq is a reliable and cost-effective tool for comprehensive
transcriptome readout in high-throughput drug screening, costing 2 to 4 US dollars
per sample (Ye et al. 2018). However, because DRUG-seq is a new technique, no
large public DRUG-seq datasets exist yet, and it is very difficult to train a deep
neural network from scratch on a small public DRUG-seq dataset. Inspired by
transfer learning (applying knowledge gained in one task to a related task), we
propose to use it to reduce the amount of training data needed
when training neural networks on DRUG-seq data.
Since the DRUG-seq dataset and the L1000 dataset both consist of gene expression
profiles (Jeong et al. 2017) that capture the responses to different compound
treatments, we can exploit the similarity and correlation between the
two datasets for transfer learning. In this study, we
pretrained a drug efficacy prediction neural network model on the L1000 data and
then fine-tuned it on human neural cell DRUG-seq data. After training, the model
was used to predict profiles for new chemicals in the DrugBank database. These
profiles were then used for virtual screening to find potential drugs for Alzheimer’s
disease treatment. To the best of our knowledge, this is the first time that
DRUG-seq data has been used in deep learning.
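The two-stage idea (pretrain on a large source dataset, then fine-tune on a small target dataset) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: a tiny linear model trained by gradient descent stands in for the neural network, and the synthetic `source` and `target` sets stand in for the L1000 and DRUG-seq data. It is not the model or the data used in this study.

```python
import random

def predict(weights, bias, x):
    """Linear prediction: dot(weights, x) + bias."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def mse(weights, bias, data):
    """Mean squared error over a list of (x, y) pairs."""
    return sum((predict(weights, bias, x) - y) ** 2 for x, y in data) / len(data)

def train(weights, bias, data, lr, epochs):
    """Full-batch gradient descent on MSE, starting from the given parameters."""
    weights = list(weights)  # copy so the caller's parameters are untouched
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in data:
            err = predict(weights, bias, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += 2 * err * xj / n
            grad_b += 2 * err / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias

def make_data(rng, true_w, true_b, n, noise):
    """Synthetic regression data (hypothetical stand-in for expression profiles)."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1) for _ in true_w]
        y = sum(w * xi for w, xi in zip(true_w, x)) + true_b + rng.gauss(0, noise)
        data.append((x, y))
    return data

rng = random.Random(0)
# Large source task and a small, related but shifted target task (both synthetic).
source = make_data(rng, [1.0, -2.0, 0.5], 0.0, 500, 0.1)
target = make_data(rng, [1.2, -1.8, 0.5], 0.3, 20, 0.1)

# Stage 1: pretrain from scratch on the large source dataset.
w0, b0 = train([0.0, 0.0, 0.0], 0.0, source, lr=0.1, epochs=200)
# Stage 2: fine-tune the pretrained parameters on the small target dataset,
# using a smaller learning rate and fewer epochs.
w1, b1 = train(w0, b0, target, lr=0.05, epochs=100)

# For brevity the error is measured on the fine-tuning set itself;
# a real evaluation would use held-out target samples.
print("target MSE before fine-tuning:", mse(w0, b0, target))
print("target MSE after fine-tuning: ", mse(w1, b1, target))
```

The design point is that stage 2 starts from the stage 1 parameters rather than from zero, so the small target dataset only has to correct the difference between the two tasks.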
2 Methods
Datasets
In the following paragraphs we introduce several datasets used in this study, including
STRING, DrugBank, L1000, DRUG-seq dataset of human neural cells, and
transcriptome data of Alzheimer’s disease patients.
The high-quality L1000 dataset
Following previous work (Pham et al. 2021; Qiu et al. 2020), experiments
were conducted on a selected high-quality L1000 dataset in this study. Because the
original L1000 dataset contains many unreliable and noisy gene expression profiles,
using the selected high-quality dataset gives much better prediction performance
than using the original L1000 dataset (Pham et al. 2021). The
selected high-quality L1000 dataset consists of 1944 training samples (284
chemicals), 556 development samples (92 chemicals), and 502 test samples (92
chemicals) (Pham et al. 2021).
The STRING database
STRING is a database of known and predicted protein–protein interactions, including
direct and indirect associations (Pham et al. 2021; Szklarczyk et al. 2019). The human
protein–protein interaction network, which consists of approximately 12,000,000
interactions (edges) and 19,000 proteins (nodes), was extracted from the STRING
database to compute vector representations for the 978 L1000 genes in a previous
study (Pham et al. 2021). The drug-target vector representations used in this study
were also computed from the STRING database (Pham et al. 2021; Szklarczyk et al.
2019). Details of how these representations were generated are given in the previous
literature (Pham et al. 2021).
The DRUG-seq dataset
We used the DRUG-seq data from a previous study (Rodriguez et al. 2021), in which
human neural cells were treated with compounds or DMSO for 24 h. A total of 661
samples were used, 75% of which were used for training. Raw sequencing data can
be found at the National Center for Biotechnology Information (NCBI) Sequence
Read Archive (SRP301436) (Rodriguez et al. 2021). Processed data can be found on
Gene Expression Omnibus (GSE164788) (Rodriguez et al. 2021).
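As a minimal sketch of the 75% training split, the snippet below shuffles sample identifiers with a fixed seed and cuts the list at the 75% mark. The function name `split_samples`, the seed, the placeholder identifiers, and the choice to round down are all assumptions; the text does not state how the split was drawn or rounded.

```python
import random

def split_samples(sample_ids, train_fraction=0.75, seed=0):
    """Shuffle sample IDs reproducibly and split them into train and held-out parts."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * train_fraction)  # rounds down (an assumption)
    return ids[:n_train], ids[n_train:]

# Placeholder identifiers standing in for the 661 DRUG-seq samples.
sample_ids = [f"sample_{i}" for i in range(661)]
train_ids, heldout_ids = split_samples(sample_ids)
print(len(train_ids), len(heldout_ids))
```

Under the round-down assumption, 661 samples yield 495 training samples and 166 held-out samples.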
Expression profiles (RNA-Seq)
Expression profiles from both Alzheimer’s disease patients and healthy
controls were obtained from two previous studies (Mizuno et al. 2021; Nativio et al.
2020).
The first expression profile dataset was downloaded from the NCBI
(GSE173955) (Mizuno et al. 2021). Postmortem human hippocampi from 8 AD and
10 healthy subjects were processed with the Illumina TruSeq stranded mRNA LT Sample
Prep kit, and the libraries were then sequenced on a HiSeq1500 according to the
manufacturer’s protocol (Mizuno et al. 2021). Details of gene-level differential
expression between AD and non-AD hippocampi can be found in the previous
study (Mizuno et al. 2021). Not all 978 L1000 genes appeared in the result.
Therefore, only genes that appeared in both the L1000 dataset and the list of
differentially expressed genes were considered when comparing with drug-induced gene
expression profiles.
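The gene-matching step described above can be sketched as a simple intersection. The gene names and log2 fold-change values below are toy placeholders, and `shared_genes` is a hypothetical helper, not code from this study.

```python
def shared_genes(l1000_genes, de_genes):
    """Return genes present in both collections, preserving the L1000 order."""
    de_set = set(de_genes)
    return [gene for gene in l1000_genes if gene in de_set]

# Toy placeholders: an ordered L1000 gene list and a differential expression
# table mapping gene name to log2 fold change (AD vs. non-AD).
l1000_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
de_log2fc = {"GENE_B": 1.3, "GENE_D": -0.8, "GENE_X": 2.1}

common = shared_genes(l1000_genes, de_log2fc)
# Disease signature restricted to the shared genes, in L1000 order, so that it
# lines up index by index with a drug-induced profile over the same genes.
disease_signature = [de_log2fc[gene] for gene in common]
print(common, disease_signature)
```

Keeping the L1000 order matters because a predicted drug profile is typically a vector indexed by gene position; filtering both vectors with the same ordered list keeps them aligned.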
The second expression profile dataset was downloaded from the
NCBI (GSE159699) (Nativio et al. 2020). Postmortem human hippocampi from 10
AD and 12 aged non-AD control subjects were processed with the NEBNext Ultra
Directional RNA Library Prep Kit for Illumina (NEB), and the libraries were then
sequenced on the NextSeq 500 platform (Illumina) according to the manufacturer’s
protocol (Nativio et al. 2020).
The DrugBank database
The DrugBank database which consists of information about 11,179 drugs and their
targets (Pham et al. 2021; Wishart et al. 2006) is a famous, comprehensive, freely
accessible database used in many cheminformatics and bioinformatics tasks(Wishart
et al. 2006). In this study, we predicted gene expression profiles for drugs in
DrugBank. And then they were used to screen potential drugs for Alzheimer’s disease
treatment.
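One common way to run such a screen is to rank drugs by how strongly their predicted profile anti-correlates with the disease signature, on the reasoning that a drug reversing disease-associated expression changes is a candidate. The sketch below uses Pearson correlation for this. The scoring function is an assumption (this section does not specify one), and the drug names and values are toy placeholders.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def rank_drugs(disease_signature, predicted_profiles):
    """Sort drugs from most negative to most positive correlation with the disease."""
    scores = {
        drug: pearson(disease_signature, profile)
        for drug, profile in predicted_profiles.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])

# Toy disease signature and predicted drug-induced profiles over the same genes.
disease_signature = [1.0, -0.5, 2.0, -1.5]
predicted_profiles = {
    "drug_a": [-1.0, 0.5, -2.0, 1.5],  # exact reversal of the signature
    "drug_b": [0.9, -0.4, 2.1, -1.6],  # mimics the signature
    "drug_c": [0.2, 0.3, -0.1, 0.1],   # weak, partly opposing effect
}

ranking = rank_drugs(disease_signature, predicted_profiles)
for drug, score in ranking:
    print(f"{drug}: {score:+.3f}")
```

In this toy example the drug that reverses the signature ranks first and the one that mimics it ranks last; a rank-based measure such as Spearman correlation could be substituted without changing the structure.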
DeepCE