A Methodology for the Prediction of Drug Target
Interaction using CDK Descriptors
Tanya Liyaqat1, Tanvir Ahmad1, and Chandni Saxena2
1Jamia Millia Islamia University, New Delhi, India
tanyaliyaqat791@gmail.com
tahmad2@jmi.ac.in
2The Chinese University of Hong Kong, Hong Kong SAR
csaxena@cse.cuhk.edu.hk
Abstract. Detecting probable Drug Target Interactions (DTIs) is a critical task in drug discovery. Conventional DTI studies are expensive, labor-intensive, and time-consuming, so there is strong motivation to build computational techniques that can reliably predict possible DTIs. Although several methods have been developed for this purpose, numerous interactions remain undiscovered and prediction accuracy is still low. To meet these challenges, we propose a DTI prediction model built on the molecular structures of drugs and the sequences of target proteins. In the proposed model, we use the Simplified Molecular-Input Line-Entry System (SMILES) strings of drugs to create CDK descriptors, Molecular ACCess System (MACCS) fingerprints, and Electrotopological state (Estate) fingerprints, and the amino-acid sequences of targets to compute the Pseudo Amino Acid Composition (PseAAC). We aim to evaluate the performance of DTI prediction models that use CDK descriptors. For comparison, we use benchmark data and evaluate the models' performance with two widely used fingerprints, MACCS fingerprints and Estate fingerprints. The evaluation shows that CDK descriptors are superior at predicting DTIs. The proposed method also significantly outperforms previously published techniques.
Keywords: Drug Target Interactions · CatBoost · CDK descriptors · Molecular fingerprints
1 Introduction
Drug target interaction (DTI) prediction is a prominent task in drug discovery and research. It entails detecting possible links between chemical compounds and protein targets, which act as a guide in the preliminary phases of drug discovery and development. Wet-lab experiments are labor-intensive and require a significant amount of money [22]. According to statistics, each new molecular entity costs around 1.8 billion USD, and the approval of a new drug application usually takes at least 9 years [8]. As a result, high-efficiency computational techniques for predicting drug target interactions based on Machine Learning (ML) and Deep Learning (DL) have attracted considerable attention in recent years [2]. A drug target interaction is the binding of a drug to a site on its target that alters the target's function. Any chemical molecule that alters the body's physiology when swallowed, ingested, or inhaled is referred to as a drug or medicine. Targets, on the other hand, are biomolecules such as proteins, nucleic acids, or lipids whose function the drug is intended to modify. Ion channels, enzymes, nuclear receptors, and G-protein coupled receptors are among the most common biological targets. To treat a disease, a drug may, for example, inhibit a target enzyme's function to prevent certain catalytic processes from occurring in the human body, typically by blocking the enzyme from interacting with its substrates. The process of identifying novel therapeutic molecules for given targets relies heavily on DTI prediction [24]. Feature-based computational techniques for DTI prediction have gained significant attention over the years, and the availability of structural information about chemical compounds, in the form of fingerprints or descriptors, has played an important role in this. However, most studies use fingerprints rather than descriptors, so it is important to compare the two representations and identify the better alternative. We provide more details about feature-based techniques in Section 2.
arXiv:2210.11482v1 [q-bio.QM] 20 Oct 2022
Given the widely recognized value of molecular structure information for DTI prediction, we aim to evaluate the performance of CDK (Chemistry Development Kit) descriptors against two widely used fingerprints, namely Molecular ACCess System (MACCS) and Electrotopological state (Estate) fingerprints. We use the Simplified Molecular-Input Line-Entry System (SMILES) strings of drugs to obtain CDK descriptors, MACCS fingerprints, and Estate fingerprints, and the proposed model uses the Pseudo amino acid composition of targets, derived from their amino-acid sequences via the iFeature webserver [4]. The purpose is to evaluate the impact of employing CDK descriptors for DTI prediction, so we compare the models' performance with each of the three drug representations on benchmark data. This work mainly focuses on feature extraction and processing, followed by a systematic prediction methodology based on machine learning; specifically, we use the Categorical Boosting (CatBoost) classifier to make predictions. For validation, we compare our proposed model with several recently proposed models. The results reveal that the proposed DTI prediction model identifies drug-target interactions more accurately using CDK descriptors than using MACCS or Estate fingerprints.
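The PseAAC features above are produced with the iFeature webserver [4]. To illustrate what such a descriptor contains, the following minimal sketch computes Chou's type-I PseAAC (20 amino-acid frequencies plus λ sequence-order correlation factors) in plain Python. It rests on stated assumptions: it uses a single property, the Kyte-Doolittle hydropathy scale, instead of the property set of Chou's original formulation; the values `lam=3` and `w=0.05` are illustrative; and the function name `pseaac` is hypothetical, not part of iFeature.

```python
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy index. ASSUMPTION: a single stand-in property,
# not the property set iFeature or Chou's original PseAAC uses.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = sorted(KD)  # fixed order of the 20 standard residues


def _standardize(prop):
    """Shift and scale a property to zero mean and unit std over the 20 residues."""
    values = list(prop.values())
    mu, sd = mean(values), pstdev(values)
    return {aa: (v - mu) / sd for aa, v in prop.items()}


def pseaac(seq, lam=3, w=0.05, props=(KD,)):
    """Hypothetical helper: type-I PseAAC vector with 20 + lam components."""
    seq = seq.upper()
    if any(aa not in KD for aa in seq):
        raise ValueError("non-standard residue in sequence")
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    std_props = [_standardize(p) for p in props]

    def corr(a, b):
        # Average squared difference of the standardized properties of two residues.
        return sum((p[b] - p[a]) ** 2 for p in std_props) / len(std_props)

    # Sequence-order correlation factors theta_1 .. theta_lam.
    thetas = [
        sum(corr(seq[i], seq[i + j]) for i in range(len(seq) - j)) / (len(seq) - j)
        for j in range(1, lam + 1)
    ]
    freqs = [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]
```

For a sequence longer than λ, `pseaac(seq, lam=3)` returns 23 non-negative numbers that sum to 1: the first 20 are reweighted amino-acid frequencies and the last 3 encode sequence order, which plain amino-acid composition discards.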
We organize the paper as follows. Section 2 offers an overview of computational approaches to DTI prediction and highlights recent methods closely related to our work. Section 3 provides details about the datasets and feature encodings. Section 4 describes our proposed methodology and gives a brief overview of the CatBoost algorithm. Evaluation metrics and performance results are presented in Sections 5 and 6, respectively. Finally, we conclude the paper in Section 7.
2 Computational Approaches to DTI Prediction
In this section, we provide an overview of computational approaches to DTI prediction and highlight work closely related to our proposed methodology. Computational strategies for predicting DTIs can be broadly divided into ligand-based, docking-based, and chemogenomic approaches [11].
Fig. 1: A brief taxonomy of computational approaches to DTI prediction
Ligand-based. The rationale behind ligand-based techniques is that structurally similar compounds tend to bind similar biological targets and to have similar properties. These techniques start with a single molecule or a group of chemicals known to be active against the target and are further guided by structure-activity relationships. However, this strategy has certain drawbacks. Because protein sequence information is not used in the prediction process, the discovery of new interactions is limited to the already identified ligand and protein families [9].
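The similarity principle behind ligand-based techniques can be sketched as a nearest-neighbour check: a query compound is predicted to interact with a target if it is sufficiently similar to a ligand already known to be active against that target. This is a minimal sketch under assumptions: fingerprints are modelled as Python sets of on-bit indices, the Tanimoto coefficient is the similarity measure, and the function names, bit indices, ligand names, and the 0.5 threshold are all hypothetical. No protein information enters the computation, which is exactly the limitation noted above.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)


def rank_by_similarity(query_fp, known_actives):
    """Rank a target's known active ligands by similarity to the query compound."""
    return sorted(
        ((name, tanimoto(query_fp, fp)) for name, fp in known_actives.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )


def predict_interaction(query_fp, known_actives, threshold=0.5):
    """Predict an interaction if the most similar known active reaches the threshold.

    The threshold value is a hypothetical choice for illustration.
    """
    best = max((tanimoto(query_fp, fp) for fp in known_actives.values()), default=0.0)
    return best >= threshold


# Usage with toy fingerprints (hypothetical bit indices).
actives = {"ligand_1": {1, 4, 7, 9}, "ligand_2": {2, 3, 8}}
query = {1, 4, 9, 12}
print(rank_by_similarity(query, actives))  # [('ligand_1', 0.6), ('ligand_2', 0.0)]
print(predict_interaction(query, actives))  # True
```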
Docking-based. The docking-based approach, on the other hand, uses the 3D structures of proteins and chemical compounds to estimate their likelihood of interaction [14,6]. However, the 3D structures of some proteins, such as many membrane proteins, are unknown, which limits the applicability of this approach [20].
Chemogenomic. Chemogenomic approaches use drug and protein information together to predict interactions. To infer probable interactions, they create a shared subspace by unifying the biochemical space of drugs and the genomic space of targets. The main benefit of this approach is that it exploits the large amount of biological data freely accessible from public repositories [33]. Chemogenomic approaches are roughly divided into network-based methods and feature-based methods. Network-based approaches integrate data such as drug-drug interactions, protein-protein interactions, drug-disease associations, and drug-target interactions from multiple sources into a single unified framework to boost DTI prediction [3,12,16,17,35]. For instance, Wan et al. [28] devised an end-to-end technique named NeoDTI that integrates data from omics networks and learns topology-preserving representations of drugs and targets. Recent years have also seen rapid growth of ML models based on knowledge graphs (KGs). Mohamed et al. [19] propose TriModel, which learns KG embeddings of drugs and targets from multi-modal heterogeneous data and uses the resulting model scores to infer novel DTIs. Feature-based approaches, on the other hand, represent each drug-target pair as an array of descriptors. Drugs and proteins are transformed into corresponding descriptors based on their chemical or sequence properties, and the individual features of a drug and a target are combined into a single 1D array that forms the input to these approaches [15,21,34].
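For the feature-based setting described above, a minimal sketch of how drug-target pairs become 1D input vectors follows. It assumes each drug is already encoded as a numeric vector (for example, descriptors or fingerprint bits) and each target as a numeric vector (for example, PseAAC), and that the drug and target features are simply concatenated. It also treats every pair not listed as a known interaction as a negative example; this is a common simplification, not a detail taken from this paper. The function names and toy values are hypothetical.

```python
from itertools import product


def pair_vector(drug_features, target_features):
    """Concatenate a drug feature vector and a target feature vector into one 1D array."""
    return list(drug_features) + list(target_features)


def build_dataset(drugs, targets, known_pairs):
    """Build (X, y) over all drug-target pairs.

    ASSUMPTION: pairs in known_pairs are labelled 1 and every other pair 0,
    i.e. unknown pairs are treated as non-interacting.
    """
    X, y = [], []
    for d, t in product(sorted(drugs), sorted(targets)):
        X.append(pair_vector(drugs[d], targets[t]))
        y.append(1 if (d, t) in known_pairs else 0)
    return X, y


# Usage with hypothetical toy features: 2 drug features, 3 target features.
drugs = {"D1": [0.1, 2.0], "D2": [1.5, 0.3]}
targets = {"T1": [0.05, 0.2, 0.7], "T2": [0.3, 0.3, 0.4]}
X, y = build_dataset(drugs, targets, known_pairs={("D1", "T2")})
print(len(X), len(X[0]), y)  # 4 5 [0, 1, 0, 0]
```

The resulting `X` and `y` could then be passed to any binary classifier; the model in this paper uses CatBoost.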
Most researchers prefer feature-based computational models that predict DTIs from the structural information of drugs, typically encoded as molecular fingerprints: bit strings in which each bit indicates the presence or absence of a specific substructure. For example, Han et al. [27]