

A Methodology for the Prediction of Drug Target Interaction using CDK Descriptors

Tanya Liyaqat1, Tanvir Ahmad1, and Chandni Saxena2

1Jamia Millia Islamia University, New Delhi, India

tanyaliyaqat791@gmail.com

tahmad2@jmi.ac.in

2The Chinese University of Hong Kong, Hong Kong SAR

csaxena@cse.cuhk.edu.hk

Abstract. Detecting probable Drug Target Interactions (DTIs) is a critical task in drug discovery. Conventional DTI studies are expensive, labor-intensive, and time-consuming, so there is strong motivation to build computational techniques that can reliably predict possible DTIs. Although some methods have been developed for this purpose, numerous interactions are yet to be discovered, and prediction accuracy is still low. To meet these challenges, we propose a DTI prediction model built on the molecular structure of drugs and the sequence of target proteins. In the proposed model, we use the Simplified Molecular-Input Line-Entry System (SMILES) to create CDK descriptors, Molecular ACCess System (MACCS) fingerprints, and Electrotopological state (Estate) fingerprints, and we use the amino-acid sequences of targets to obtain the Pseudo Amino Acid Composition (PseAAC). We aim to evaluate the performance of DTI prediction models that use CDK descriptors. For comparison, we use benchmark data and evaluate the models' performance on two widely used fingerprints, MACCS and Estate. The evaluation shows that CDK descriptors are superior at predicting DTIs. The proposed method also significantly outperforms previously published techniques.

Keywords: Drug Target Interactions · CatBoost · CDK descriptors · Molecular fingerprints

1 Introduction

Drug target interaction (DTI) prediction is a prominent task in drug discovery and research. It entails detecting possible links between chemical compounds and protein targets, which guides the preliminary phases of drug discovery and development. Experiments carried out in wet labs are labor-intensive and costly [22]. According to statistics, each novel molecular entity costs around 1.8 billion USD, and the approval of a new drug application usually requires at least 9 years [8]. As a result, efficient computational techniques that predict drug target interactions using Machine Learning (ML) and Deep Learning (DL) have attracted considerable attention in recent years [2]. A drug target interaction is the binding of a drug to a site on a target that alters the target's function. Any chemical molecule that causes an alteration in the body's physiology when swallowed, ingested, or inhaled is referred to as a medication or medicine. Targets, on the other hand, are elements such as nucleic acids or lipids that the drug is intended to modify. Ion channels, enzymes, nuclear receptors, and G-protein coupled receptors are among the most popular biological targets. To treat illnesses and ailments, the medicine inhibits the target's function in order to prevent certain catalytic processes from occurring in the human body. This is accomplished by preventing the target from interacting with particular molecules known as substrates. The drug discovery procedure that detects novel therapeutic molecules for targets relies heavily on DTI prediction [24]. Feature-based computational techniques for DTI prediction have gained significant attention over the years. The availability of structural information about chemical compounds in the form of fingerprints or descriptors has played an important role. However, most studies favor fingerprints over descriptors. Hence, it becomes important to compare their performance and identify the better alternative. We provide more details about feature-based techniques in Section 2.

arXiv:2210.11482v1 [q-bio.QM] 20 Oct 2022

Considering the widely accepted usefulness of the structural information of molecules, we aim to evaluate the performance of CDK descriptors against two widely used fingerprints, namely Molecular ACCess System (MACCS) and Electrotopological state (Estate) fingerprints. The proposed model uses Pseudo Amino Acid Composition (PseAAC) derived from the amino-acid sequences of targets via the iFeature webserver [4]. We use the Simplified Molecular-Input Line-Entry System (SMILES) representation of drugs to obtain CDK descriptors, MACCS fingerprints, and Estate fingerprints. The purpose is to evaluate the impact of employing CDK descriptors for DTI prediction, so we compare the models' performance with CDK descriptors against their performance with MACCS and Estate fingerprints on benchmark data. This work mainly focuses on feature extraction and processing, followed by a systematic prediction methodology based on machine learning. Specifically, we use the Categorical Boosting (CatBoost) classifier to make predictions. For validation, we compare our proposed model with several recently proposed models. The results reveal that the proposed DTI prediction model identifies drug-target interactions more accurately using CDK descriptors than using MACCS or Estate fingerprints.
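To give a rough sense of how a target sequence becomes a fixed-length feature vector, the following minimal sketch computes the plain amino acid composition of a sequence. This is a deliberate simplification and an assumption on our part: PseAAC additionally includes sequence-order correlation terms, which are omitted here, and the sketch does not reproduce the output of the iFeature webserver. The function name and the toy sequence are hypothetical.

```python
from collections import Counter
from typing import List

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> List[float]:
    """Return the fraction of each standard amino acid in `sequence`.

    Simplified stand-in for PseAAC (assumption): the sequence-order
    correlation terms that PseAAC adds are left out on purpose.
    """
    sequence = sequence.upper()
    # Count only residues that belong to the 20 standard amino acids.
    counts = Counter(residue for residue in sequence if residue in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    # One entry per amino acid, in the fixed order of AMINO_ACIDS.
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical toy sequence, not taken from the benchmark data.
composition = amino_acid_composition("MKTAYIAKQR")
```

The resulting vector always has 20 entries that sum to one, regardless of sequence length, which is what allows sequences of different lengths to be fed to the same classifier.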

We organize the paper as follows. Section 2 offers an overview of computational approaches to DTI prediction and highlights recent methods closely related to our work. Section 3 provides details about the datasets and feature encodings. Section 4 describes our proposed methodology and gives a brief overview of the CatBoost algorithm. Evaluation metrics and performance results are presented in Section 5 and Section 6, respectively. Finally, we conclude the paper in Section 7.

2 Computational Approaches to DTI Prediction

In this section, we provide an overview of computational approaches to DTI prediction and highlight work closely related to our proposed methodology. The computational strategies for predicting DTIs can be broadly divided into ligand-based, docking-based, and chemogenomic approaches [11].

Fig. 1: A brief taxonomy of computational approaches to DTI prediction

Ligand-based. The rationale behind ligand-based techniques is that similar compounds bind to similar biological targets and have similar properties. The approach starts with a single molecule or a group of chemicals known to be effective against the target and is further guided by structure-activity relationships. However, this strategy has certain drawbacks. Because protein sequence information is not used in the prediction process, newly discovered interactions are limited to connections between known ligands and protein families [9].
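The similarity principle behind ligand-based methods is commonly quantified with the Tanimoto coefficient between two molecular fingerprints. The sketch below is only an illustration under our own assumptions: each fingerprint is represented as a set of on-bit indices, the example bit sets are made up, and the text above does not state which similarity measure the cited ligand-based methods use.

```python
from typing import Set


def tanimoto(bits_a: Set[int], bits_b: Set[int]) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    if not bits_a and not bits_b:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    shared = len(bits_a & bits_b)
    # |A intersect B| / |A union B|, with the union computed by inclusion-exclusion.
    return shared / (len(bits_a) + len(bits_b) - shared)


# Hypothetical fingerprints: a ligand known to bind the target and a candidate.
known_ligand = {1, 4, 7, 9}
candidate = {1, 4, 8, 9}
score = tanimoto(known_ligand, candidate)  # 3 shared bits out of 5 distinct bits
```

A candidate whose score against known ligands exceeds a chosen threshold would be proposed as a likely binder; the threshold itself is a design choice that this sketch does not fix.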

Docking-based. The docking-based approach, on the other hand, uses the 3D structures of proteins and chemical compounds to estimate how likely they are to interact [14,6]. However, some proteins, such as membrane proteins, have unknown 3D structures, which makes this approach less applicable [20].

Chemogenomic. Chemogenomic approaches use drug and protein information together to predict interactions. To infer probable interactions, a shared subspace is created by unifying the biochemical space of drugs and the genomic space of targets. The main benefit of this method is that it uses a significant amount of biological data that is freely accessible from public repositories [33]. Chemogenomic approaches are roughly divided into network-based methods and feature-based methods. Network-based approaches integrate data such as drug-drug, protein-protein, drug-disease, and drug-target interactions from multiple sources into a single unified framework to improve DTI prediction [3,12,16,17,35]. For instance, Wan et al. [28] devised an end-to-end technique called NeoDTI that combines data from omics networks and learns topology-preserving representations of drugs and targets. Recent years have seen rapid growth in ML models based on knowledge graphs (KGs). Mohammad et al. [19] propose triModel, a model based on KG embeddings that derives novel DTIs from scores computed on embeddings of drugs and targets learned from multi-modal heterogeneous data. Feature-based approaches, on the other hand, represent each drug-target pair as an array of descriptors. Drugs and proteins are transformed into corresponding descriptors based on their chemical properties. The individual features of a drug and a target are then combined into a 1D array that forms the input to these approaches [15,21,34].
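The feature-based representation described above can be sketched as a simple concatenation of a drug vector and a target vector. This is a minimal illustration under our own assumptions: the identifiers, the vector values, and the choice to label every pair without a known interaction as negative are all hypothetical and are not taken from the methods cited above.

```python
from itertools import product
from typing import Dict, List, Sequence, Set, Tuple


def pair_features(drug_vec: Sequence[float], target_vec: Sequence[float]) -> List[float]:
    """Concatenate drug and target descriptors into one 1D feature array."""
    return list(drug_vec) + list(target_vec)


def build_dataset(
    drugs: Dict[str, Sequence[float]],
    targets: Dict[str, Sequence[float]],
    known_pairs: Set[Tuple[str, str]],
) -> Tuple[List[List[float]], List[int]]:
    """Build one row per drug-target pair, with label 1 for known interactions.

    Assumption for this sketch: pairs without a known interaction get label 0.
    """
    rows, labels = [], []
    for drug_id, target_id in product(drugs, targets):
        rows.append(pair_features(drugs[drug_id], targets[target_id]))
        labels.append(1 if (drug_id, target_id) in known_pairs else 0)
    return rows, labels


# Hypothetical toy inputs: 3-bit drug fingerprints and 2-value target descriptors.
drugs = {"D1": [1, 0, 1], "D2": [0, 1, 1]}
targets = {"T1": [0.2, 0.8], "T2": [0.5, 0.5]}
known_pairs = {("D1", "T1"), ("D2", "T2")}
rows, labels = build_dataset(drugs, targets, known_pairs)
```

Each row has the length of the drug vector plus the length of the target vector, so any standard classifier that accepts fixed-length numeric input can be trained on `rows` and `labels`.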

Most researchers prefer feature-based computational models for DTI prediction that focus on the structural information of drugs in the form of molecular fingerprints, which are bit strings indicating the presence of specific substructures. For example, Han et al. [27]
