self-interpretable models are being developed, such as
self-explainable neural networks (SENN) [2] and Proto-
type Graph Neural Network (ProtGNN) [46]. Only the
latter can be applied to the graph prediction problem.
However, ProtGNN is designed for classification problems only, since it requires a fixed assignment of prototypes to classes. In a regression problem, by contrast, the model predicts a single continuous label, making such an assignment impossible. To overcome the limited applicability of ProtGNN, we introduce the Prototypical
Graph Regression Soft Trees (ProGReST) model that is
suitable for a graph regression problem, common in the
molecular property prediction [41]. It employs prototypical parts [6] (in this paper, we use the terms “prototypical parts” and “prototypes” interchangeably) that preserve information about activation patterns and ensure
intrinsic interpretability (see Fig. 1). Prototypes are
derived from the training examples and used to explain
the model’s decision. To build a model with prototypes,
we use Soft Neural Trees [8].
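The combination of prototypes and a soft tree can be sketched as follows: each internal node holds a prototype, the similarity between an input embedding and that prototype determines a soft routing probability, and the leaves hold scalar regression values. This is a minimal illustrative sketch, not the exact ProGReST formulation; the node indexing, similarity function, and routing rule below are our assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def soft_tree_predict(z, prototypes, leaf_values):
    """Route an embedding z through a complete binary soft tree.

    prototypes  : list of 2^d - 1 internal-node prototype vectors
    leaf_values : array of 2^d scalar regression values
    Returns the probability-weighted average of the leaf values.
    """
    probs = {0: 1.0}                            # probability of reaching each node
    for i, p in enumerate(prototypes):
        d = np.sum((z - p) ** 2)
        sim = np.log((d + 1.0) / (d + 1e-4))    # ProtoPNet-style similarity score
        right = sigmoid(sim)                    # high similarity -> route right
        probs[2 * i + 1] = probs[i] * (1.0 - right)
        probs[2 * i + 2] = probs[i] * right
    first_leaf = len(prototypes)                # leaves follow the internal nodes
    leaf_probs = np.array([probs[first_leaf + j] for j in range(len(leaf_values))])
    return float(leaf_probs @ leaf_values)
```

The prediction is thus a soft mixture of leaf values, with each internal decision traceable to a prototype comparison.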
Since the regression task is more challenging than classification, it requires more training epochs for the model to converge. Moreover, prototypical-part-based methods periodically apply a projection operation [6, 46] to enforce the closeness of prototypes to the training data.
In ProtGNN, projection is based on the Monte Carlo Tree Search (MCTS) algorithm, which requires substantial computation time to find meaningful prototypes. In ProGReST, we propose a proxy projection to reduce the training time and perform the MCTS-based projection only at the end of training to ensure the full interpretability of the derived prototypes.
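The idea of a proxy projection can be sketched as snapping each prototype to its nearest training embedding in latent space, which is cheap compared to an MCTS search over subgraphs. This is a minimal sketch under assumed array shapes; the function and variable names are illustrative, not the paper's exact procedure.

```python
import numpy as np

def proxy_projection(prototypes, train_embeddings):
    """Snap each prototype to its nearest training embedding.

    prototypes       : (P, D) array of learned prototype vectors
    train_embeddings : (N, D) array of embeddings from the training set
    Returns the projected prototypes and the indices of the chosen embeddings,
    so each prototype can later be traced back to a concrete training example.
    """
    # pairwise squared distances, shape (P, N), via broadcasting
    dists = np.sum((prototypes[:, None, :] - train_embeddings[None, :, :]) ** 2, axis=-1)
    nearest = np.argmin(dists, axis=1)
    return train_embeddings[nearest], nearest
```

Because it is a single nearest-neighbour lookup, this operation can be run frequently during training without the cost of a subgraph search.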
ProGReST achieves state-of-the-art results on five cheminformatics datasets for molecular property prediction and provides intuitive explanations of its predictions in the form of a tree. Moreover, we consulted chemists on the findings of ProGReST to validate the usability of our model.
Our contributions can be summarized as follows:
• we introduce ProGReST, a self-explainable prototype-based model for regression of molecular properties,
• we employ a tree-based model to derive meaningful prototypes,
• we define a novel proxy projection function that substantially accelerates the training process.
2 Related Works
2.1 Molecular property prediction The accurate
prediction of molecular properties is critical in chemical
modeling. In machine learning, chemical compounds
can be described using calculated molecular descrip-
tors, which are computed as a function of the compound
structure [37]. Many successful applications of machine
learning in drug discovery utilize chemical structures di-
rectly by employing molecular fingerprints [5] or molec-
ular graphs as an input to the model [9].
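A molecular fingerprint can be illustrated as a fixed-length bit vector into which substructure identifiers are hashed. The toy sketch below takes precomputed substructure labels as input; real fingerprints such as ECFP enumerate circular atom environments directly from the molecular graph.

```python
import zlib

def hashed_fingerprint(substructures, n_bits=16):
    """Toy folded fingerprint: hash each substructure label to a bit position.
    Collisions (two substructures sharing one bit) are accepted, as in real
    folded fingerprints."""
    bits = [0] * n_bits
    for s in substructures:
        # crc32 gives a deterministic hash, unlike Python's salted hash()
        bits[zlib.crc32(s.encode()) % n_bits] = 1
    return bits

fp = hashed_fingerprint(["C=O", "C-H", "O-H"])
```

The fixed length makes fingerprints convenient inputs for classical machine learning models, at the cost of losing the explicit graph structure.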
Currently, molecular graphs are the preferred representation in cheminformatics because they capture the nonlinear structure of the data. In a molecular
graph, atoms are represented as nodes, and chemical bonds as edges. Each atom is annotated with atomic features that encode its chemical symbol and other relevant properties [32]. This graphical
representation can be processed by graph neural net-
works that learn the molecule-level vector representa-
tion of the compound and use it for property prediction.
Graph neural networks usually implement the message
passing scheme [11], in which information is passed be-
tween nodes along the edges, and the atom features are
updated [45]. However, more recent architectures focus
on modeling long-range dependencies between atoms,
e.g. by implementing graph transformers [26].
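The message passing scheme above can be sketched on a toy molecular graph; the feature set, sum aggregation, and update rule below are simplified assumptions rather than a specific architecture from the literature.

```python
import numpy as np

# Toy molecular graph for formaldehyde (CH2O): nodes are atoms, undirected
# edges are bonds. Atom features here are one-hot element type (C, O, H);
# real models use much richer atomic feature sets.
atoms = ["C", "O", "H", "H"]
one_hot = {"C": [1, 0, 0], "O": [0, 1, 0], "H": [0, 0, 1]}
X = np.array([one_hot[a] for a in atoms], dtype=float)
edges = [(0, 1), (0, 2), (0, 3)]           # C=O, C-H, C-H bonds

def message_passing_step(X, edges, W):
    """One message-passing update: each atom sums its neighbours' features,
    mixes them with its own via a weight matrix, then applies ReLU."""
    agg = np.zeros_like(X)
    for u, v in edges:                     # bonds pass messages both ways
        agg[u] += X[v]
        agg[v] += X[u]
    return np.maximum(0.0, (X + agg) @ W)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 3))
H = message_passing_step(X, edges, W)      # updated atom representations
graph_embedding = H.sum(axis=0)            # sum readout: molecule-level vector
```

Stacking such steps lets information propagate over longer bond paths, which is exactly the locality limitation that graph transformers relax.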
2.2 Interpretability of deep learning Methods
explaining deep learning models can be divided into the
post-hoc and interpretable ones [34]. The former create an external explainer that reveals the reasoning process of a black-box model. Post-hoc methods include saliency maps [3], which highlight crucial input parts, and Concept Activation Vectors (CAV), which use concepts to explain neural network predictions [17]. Other methods analyze the output of the model under perturbations of the input [33] or determine the contribution of a given feature to a prediction [44]. Implementing post-hoc methods is straightforward since they require no intervention into the model architecture. However, they can produce biased and unreliable explanations [1].
That is why recent work increasingly focuses on designing self-explainable models [2] that make the decision process directly visible. A widely used self-explainable model, ProtoPNet [6], has a hidden layer of prototypes representing activation patterns.
Many works extend ProtoPNet, such as TesNet [38], which employs a Grassmann manifold to find prototypes. Methods like ProtoPShare [35], ProtoPool [36], and ProtoTree [29] reduce the number of used prototypes. These solutions are widely adopted in various fields, such as medical imaging [18] and graph classification [46]. Yet none of them considers regression.
3 ProGReST
3.1 Architecture The architecture of ProGReST,
depicted in Fig. 2, consists of a graph representation
Copyright ©2023 by SIAM
Unauthorized reproduction of this article is prohibited