the-art (SOTA) graph kernel methods and outperform deep
learning methods for graph classification tasks [14].
1.2 Pairwise Relationships for Scientific Discovery
Accurate link prediction for applied real-world problems
has important consequences for the understanding and
interpretation of the complex systems that they represent.
Algorithms capable of accurately predicting missing links
enable data mining applications, accelerate network data
collection, and improve network model validation. Recent
trends have seen improved performance in link predictions
through multi-sided recommendation, model calibration,
model stacking, and/or large-scale ensembles.
In the work of Berlusconi et al., link prediction was lever-
aged to identify possible missing links in a criminal network
by considering multi-sided similarity measures of pairs of
nodes in the network and inferring them a contrario with the
assumption that putative social ties will be characterized by
opposite features [16].
Model “stacking” is an ensemble approach in which a
meta-model learns how to leverage the predictions of
individual component predictors [17]; it differs from
conventional bagging and boosting. Unlike bagging, in
stacking, the contributing component models are typically
diverse (e.g. a variety of algorithms or (deep) learning mod-
els) and fit on the same dataset (as opposed to a sampling of
the training dataset). Unlike boosting, in stacking, a single
model is used to learn how to best combine the predictions
from the contributing models as opposed to a sequence of
models that correct the predictions of prior models.
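
The stacking idea described above can be sketched for link prediction as follows. This is a minimal illustration under stated assumptions, not the procedure of [17] or [18]: the toy graph, the choice of two topological base predictors (common neighbours and the Jaccard index), the logistic-regression meta-model, and all function names are hypothetical choices made only for this example.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Two diverse component predictors (assumed for illustration).
def common_neighbors(adj, u, v):
    return float(len(adj[u] & adj[v]))

def jaccard(adj, u, v):
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

BASE_PREDICTORS = [common_neighbors, jaccard]

def base_scores(adj, pair):
    # One score per component predictor; these become the meta-model's inputs.
    return [predictor(adj, *pair) for predictor in BASE_PREDICTORS]

def fit_meta_model(X, y, lr=0.5, epochs=500):
    # Logistic regression by batch gradient descent; w[0] is the bias.
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wk * xk for wk, xk in zip(w[1:], xi))) - yi
            grad[0] += err
            for k, xk in enumerate(xi, start=1):
                grad[k] += err * xk
        w = [wk - lr * gk / len(X) for wk, gk in zip(w, grad)]
    return w

def stacked_probability(w, adj, pair):
    x = base_scores(adj, pair)
    return sigmoid(w[0] + sum(wk * xk for wk, xk in zip(w[1:], x)))

# Toy observed graph: two communities joined by the bridge (3, 4).
observed_edges = [(0, 2), (0, 3), (1, 2), (1, 3), (3, 4),
                  (4, 5), (4, 6), (4, 7), (5, 7), (6, 7)]
adj = {node: set() for node in range(8)}
for u, v in observed_edges:
    adj[u].add(v)
    adj[v].add(u)

# Links hidden from the observed graph (positives) and true non-links (negatives).
held_out_links = [(0, 1), (5, 6), (2, 3)]
non_links = [(0, 5), (1, 6), (2, 7)]
pairs = held_out_links + non_links
labels = [1] * len(held_out_links) + [0] * len(non_links)

# Every component predictor sees the same data; one meta-model combines them.
weights = fit_meta_model([base_scores(adj, p) for p in pairs], labels)
p_link = stacked_probability(weights, adj, (0, 1))
p_non_link = stacked_probability(weights, adj, (0, 5))
```

Here `p_link` and `p_non_link` are evaluated on pairs that were also used to fit the meta-model, which keeps the sketch short; a realistic evaluation would score pairs held out from meta-model training as well.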
Ghasemian et al. reported a systematic evaluation of
203 individual link predictor algorithms, representing three
popular families of methods, applied to a large corpus of 550
structurally diverse networks from six scientific domains
[18]. Excitingly, the stacked models achieved (near) optimal
levels of accuracy on synthetically generated datasets for
which the maximally achievable level of performance was
known. When trained on real-world datasets, the stacked
meta-classifiers were consistently superior to the
component models [18]. These findings demonstrate the
broad utility of stacked meta-classification methods on di-
verse problem sets.
Finally, model calibration is crucial in high-stakes scenar-
ios such as drug-target interaction (DTI) prediction where
end-users need trustworthy and interpretable decisions. In
a binary classification formulation (e.g. positive prediction
indicates a putative interaction and a negative prediction
indicates non-interaction) probability calibration is impor-
tant when the confidence in a given prediction must make
probabilistic sense. For example, if a given model predicts
a fact is true with 80% confidence, the model should be
correct 80% of the time. Adherence to this property is
evaluated by means of calibration/reliability curves which
plot the model’s mean predicted value along the x-axis and
the fraction of positives along the y-axis with the identity
function representing perfect calibration. Tabacof and
Costabello applied Platt scaling [19] and isotonic
regression [20] to calibrate knowledge graph embedding
models [21], and Wang et al. proposed methods broadly
applicable to graph neural networks [22].
Calibration methods are a form of post-processing/model
stacking for refining prediction scores, which is conceptually
similar to RP.
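
The reliability curve described above can be computed with a few lines of code. The following is a minimal sketch, assuming equal-width probability bins; the function name `reliability_curve`, the bin count, and the toy data are hypothetical and chosen only for illustration.

```python
def reliability_curve(y_true, y_prob, n_bins=5):
    """Return (mean predicted value, fraction of positives, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for label, prob in zip(y_true, y_prob):
        # Equal-width bins over [0, 1]; a probability of exactly 1.0 goes in the last bin.
        index = min(int(prob * n_bins), n_bins - 1)
        bins[index].append((prob, label))
    curve = []
    for members in bins:
        if not members:
            continue
        mean_predicted = sum(prob for prob, _ in members) / len(members)
        fraction_positive = sum(label for _, label in members) / len(members)
        curve.append((mean_predicted, fraction_positive, len(members)))
    return curve

# Toy data: ten predictions at 0.8 (eight correct) and ten at 0.2 (five correct).
y_prob = [0.8] * 10 + [0.2] * 10
y_true = [1] * 8 + [0] * 2 + [1] * 5 + [0] * 5

curve = reliability_curve(y_true, y_prob)
for mean_predicted, fraction_positive, count in curve:
    print(f"predicted {mean_predicted:.2f}  observed {fraction_positive:.2f}  (n={count})")
```

In this toy data the 0.8 bin lies on the identity line (80% of those predictions are positives), whereas the 0.2 bin lies above it: the model assigns 20% confidence to a group in which half of the pairs are positives, so its scores in that range are miscalibrated.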
1.3 Reciprocal Perspective for Biomedical Discovery
The fields of bioinformatics and computational biology have
been central application domains for the study and ex-
emplification of network-based methods; these approaches
have been widely used to investigate biological systems at
various scales, whether in macroscopic ecological dynam-
ics down to microscopic and molecular interaction studies
[23]. The Reciprocal Perspective (RP) methodology was
first discovered and investigated within these domains by
reframing several applications as pairwise link prediction
problems.
Briefly, the RP framework is a cascaded classifier that
refines raw pairwise link prediction scores by considering
the context of all possible link scores involving either el-
ement of the pair. To provide a concrete example, con-
sider the task of predicting all protein-protein interactions
(PPI) among an organism’s n proteins. From all the n(n+1)/2
possible pairs of proteins, a subset are known to interact
(positive) or not interact (negative) through experimental
validation studies. These known links are useful for training
and evaluating a PPI prediction algorithm, denoted fΘ(x).
RP begins by applying this initial predictor, fΘ(x), to infer
prediction scores between all n(n+1)/2 possible pairs
(including those that are known, via a cross-validation schema).
This results in a complete K_n weighted-edge graph of
predicted scores from fΘ(x) with a corresponding complete
adjacency matrix of predicted scores that we denote the
Comprehensive Prediction Matrix (CPM). A given row i
sliced from the matrix is an n×1 vector of all scores between
protein i and every other protein in the proteome. Similarly,
a given column j sliced from the matrix is a 1×n vector
of all the scores between protein j and every other protein
in the proteome (including protein i in cell i, j). These
vectors, when each sorted in rank-order by monotonically
decreasing score, represent a pair of One-to-All (O2A) score
curves. The pair of O2A curves share only a single common
value representing the query pair i, j (identical in value,
but usually differing in sorted rank). These O2A curves
(which we also refer to as protein i’s perspective, and protein
j’s perspective) typically exhibit a characteristic “S”- or “L”-shaped
distribution with a baseline that we can attribute to
non-interacting pairs. The single common point between
the two reciprocal perspectives enables a number of numeric
features to be computed characterizing the location of that
point in the broader context of all the inferred scores within
the two distributions and with respect to their baselines.
Intuitively, we wish to identify pairs whose shared
score is relatively high with respect to each
perspective’s baseline. Thus, the RP framework extracts, for
any pair of proteins i, j, a new numerical vector of O2A-
derived features that can be leveraged to subsequently train
and evaluate a cascaded predictor that refines the original
predictions. Additionally, the cascaded predictor can
serve as a means of combining multiple experts (CME):
fusing the RP feature vectors obtained from the CPMs
generated by numerous initial predictors yields a
stacked generalization schema.
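
To make the construction above concrete, the following minimal sketch slices the two O2A curves for a query pair out of a small CPM and derives a few rank- and baseline-based features. The 4×4 score matrix, the use of the median as the baseline estimate, and the feature names are assumptions made for illustration; they are not the feature set defined by the RP framework.

```python
from statistics import median

def count_pairs(n):
    # Number of unordered pairs among n elements, self-pairs included: n(n+1)/2.
    return n * (n + 1) // 2

def rp_features(cpm, i, j):
    """Illustrative features locating score (i, j) within both O2A curves."""
    score = cpm[i][j]
    # One-to-All curves: row i and column j, sorted by decreasing score.
    perspective_i = sorted(cpm[i], reverse=True)
    perspective_j = sorted((row[j] for row in cpm), reverse=True)
    features = {"score": score}
    for name, curve in (("i", perspective_i), ("j", perspective_j)):
        rank = curve.index(score) + 1      # 1-based rank of the shared score
        baseline = median(curve)           # assumed stand-in for the curve's baseline
        features[f"rank_{name}"] = rank
        features[f"fractional_rank_{name}"] = rank / len(curve)
        features[f"above_baseline_{name}"] = score - baseline
    return features

# Hypothetical symmetric CPM for n = 4 proteins; protein 1 scores highly with everyone.
cpm = [
    [0.10, 0.90, 0.20, 0.15],
    [0.90, 0.05, 0.80, 0.85],
    [0.20, 0.80, 0.10, 0.12],
    [0.15, 0.85, 0.12, 0.08],
]

features = rp_features(cpm, 1, 2)
for name, value in features.items():
    print(name, value)
```

For the query pair (1, 2) the shared score of 0.80 ranks third from protein 1's perspective but first from protein 2's perspective, illustrating how the same value can occupy different positions in the two curves. Feature vectors of this kind, computed for each labelled pair, would then form the training input of the cascaded predictor.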