2 T. Liyaqat et al.
swallowed, ingested, or inhaled is referred to as a medication or medicine. On the other
hand, targets consist of biological elements, such as nucleic acids or lipids, that the
medicine is intended to modify.
Ion channels, enzymes, nuclear receptors, and G-protein coupled receptors are among
the most common biological targets. To treat illnesses and ailments, the medicine inhibits
the target’s function in order to prevent certain catalytic processes from occurring in the
human body. This is accomplished by preventing the target from interacting with the
particular molecules, known as substrates, on which it acts. The drug discovery process,
which identifies novel therapeutic molecules for targets, relies heavily on DTI
prediction [24]. Feature-based computational techniques for DTI prediction have gained
significant attention over the years.
The availability of structural information about chemical compounds in the form of
fingerprints or descriptors has played an important role in this development. However,
most studies favor fingerprints over descriptors. Hence, it is important to compare their
performance and identify the better alternative. We provide more details about
feature-based techniques in
Section 2.
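To make the distinction between the two feature types concrete, the following toy sketch contrasts a binary fingerprint (presence or absence of structural keys) with numeric descriptors (values computed from the structure). It is not MACCS, Estate, or CDK: the key list `TOY_KEYS` and the functions `atoms`, `toy_fingerprint`, and `toy_descriptors` are hypothetical names introduced only for illustration, and the parser assumes simple SMILES strings with organic-subset atoms and no bracket atoms.

```python
import re

# Hypothetical structural keys, for illustration only. Real MACCS keys are
# substructure patterns, not bare element symbols.
TOY_KEYS = ["N", "O", "S", "Cl", "F"]

# Two-letter halogens are tried first, then single-letter organic-subset atoms
# (upper case = aliphatic, lower case = aromatic). Bracket atoms are NOT handled.
ATOM_PATTERN = re.compile(r"Cl|Br|[BCNOSPFI]|[cnosp]")


def atoms(smiles: str) -> list[str]:
    """Return the element symbols of the heavy atoms in a simple SMILES string."""
    return [token.capitalize() for token in ATOM_PATTERN.findall(smiles)]


def toy_fingerprint(smiles: str) -> list[int]:
    """Binary vector: bit i is 1 if element TOY_KEYS[i] occurs in the molecule."""
    present = set(atoms(smiles))
    return [1 if key in present else 0 for key in TOY_KEYS]


def toy_descriptors(smiles: str) -> dict[str, float]:
    """Numeric descriptors: heavy-atom count and fraction of non-carbon atoms."""
    symbols = atoms(smiles)
    if not symbols:
        raise ValueError("no atoms recognised in SMILES string")
    hetero = sum(1 for symbol in symbols if symbol != "C")
    return {
        "heavy_atoms": float(len(symbols)),
        "heteroatom_fraction": hetero / len(symbols),
    }


if __name__ == "__main__":
    aspirin = "CC(=O)Oc1ccccc1C(=O)O"
    print(toy_fingerprint(aspirin))   # bits for N, O, S, Cl, F
    print(toy_descriptors(aspirin))
```

For aspirin, the fingerprint records only that oxygen is present, whereas the descriptors retain quantities (13 heavy atoms, 4 of them heteroatoms). This difference in information content is what motivates comparing the two representations.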
Given the widely accepted usefulness of molecular structure information, we
aim to evaluate the performance of CDK descriptors against two widely used fingerprints,
namely Molecular ACCess System (MACCS) and Electrotopological state (Estate)
fingerprints. The proposed model represents targets by their pseudo amino acid
composition, derived from the targets’ amino acid sequences via the iFeature web
server [4]. We use drug Simplified Molecular-Input Line-Entry System (SMILES) strings to
obtain CDK descriptors, MACCS fingerprints, and Estate fingerprints. The purpose is to
evaluate the impact of employing CDK descriptors for DTI prediction, so we compare the
models’ performance with these descriptors against their performance with the two
fingerprints on benchmark data. This work mainly focuses on feature extraction and
processing, followed by a systematic prediction methodology based on machine learning.
Specifically, we use the Categorical Boosting (CatBoost) classifier to make predictions. For
validation, we compare our proposed model with several recently proposed models. The
results indicate that the proposed DTI prediction model identifies drug-target interactions
more accurately using CDK descriptors than using MACCS or Estate fingerprints.
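As a rough illustration of how a drug-target pair can be turned into a single feature vector, the sketch below computes the plain amino acid composition of a target sequence and joins it to a drug feature vector. Two points are assumptions rather than the paper's stated method: plain amino acid composition is only the 20 frequency components on which pseudo amino acid composition builds (the sequence-order terms computed by iFeature are omitted), and simple concatenation of drug and target features is assumed. The function names and the example sequence are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence: str) -> list[float]:
    """Return the normalised frequency of each standard residue.

    Simplification: this is plain amino acid composition, not the full
    pseudo amino acid composition (no sequence-order correlation terms).
    Non-standard characters are ignored.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def pair_features(drug_vector, target_vector) -> list[float]:
    """Join drug and target features into one vector (assumed: concatenation)."""
    return list(drug_vector) + list(target_vector)


if __name__ == "__main__":
    drug_bits = [0, 1, 0, 0, 1]          # placeholder drug features
    target_sequence = "MKVLAAGIVGL"      # hypothetical sequence
    features = pair_features(drug_bits, amino_acid_composition(target_sequence))
    print(len(features))                 # 5 drug features + 20 target features
```

A vector of this kind, labelled as interacting or non-interacting, is the sort of input a classifier such as CatBoost would consume. Swapping the drug part between descriptors and fingerprints while keeping the target part fixed mirrors the comparison described above.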
We organize the paper as follows. Section 2 offers an overview of computational
approaches to DTI prediction and highlights recent methods closely related to our work.
Section 3 provides details about the datasets and feature encodings. Section 4
describes our proposed methodology and gives a brief overview of the CatBoost algorithm.
Evaluation metrics and performance results are presented in Section 5 and Section 6,
respectively. Finally, we conclude the paper in Section 7.
2 Computational Approaches to DTI Prediction
In this section, we provide an overview of computational approaches to DTI prediction
and highlight work closely related to our proposed methodology. Computational
strategies for predicting DTIs can be broadly divided into ligand-based,
docking-based, and chemogenomic approaches [11].
Ligand-based. The rationale behind ligand-based techniques is that similar compounds
bind to similar biological targets and have similar properties. It starts with a
single molecule or a group of chemicals known to be effective against the target and