
Our contribution.
In this work, we focus on calibrating GNNs for the node classification task [14, 40]. First, we aim at understanding the specific challenges posed by GNNs by conducting
a systematic study on the calibration qualities of GNN node predictions. Our study reveals five
factors that influence the calibration performance of GNNs: general under-confident tendency,
diversity of nodewise predictive distributions, distance to training nodes, relative confidence level,
and neighborhood similarity. Second, we develop the Graph Attention Temperature Scaling (GATS)
approach, which is designed to account for the aforementioned influential factors. GATS
generates nodewise temperatures that calibrate GNN predictions based on the graph topology. Third,
we conduct a series of GNN calibration experiments and empirically verify the effectiveness of GATS
in terms of calibration, data-efficiency, and expressivity.
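To make the role of nodewise temperatures concrete before GATS is described in detail, the following is a minimal PyTorch-style sketch of how per-node temperatures would be applied to GNN logits. The tensor names (logits, node_temps) are placeholders; this is not the GATS architecture itself, only the scaling step that its outputs feed into.

import torch
import torch.nn.functional as F

def apply_nodewise_temperatures(logits: torch.Tensor,
                                node_temps: torch.Tensor) -> torch.Tensor:
    # logits: [num_nodes, num_classes]; node_temps: [num_nodes, 1], entries > 0.
    # Dividing each node's logits by its own temperature rescales the confidences
    # while leaving the per-node class ranking (and hence accuracy) unchanged.
    return F.softmax(logits / node_temps, dim=-1)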
2 Related work
For standard multi-class classification tasks, a variety of post-hoc calibration methods have been
proposed in order to make neural networks uncertainty-aware: temperature scaling (TS) [7], ensemble
temperature scaling (ETS) [43], multi-class isotonic regression (IRM) [43], Dirichlet calibration [19],
spline calibration [8], etc. Additionally, calibration has been formulated for regression tasks [17].
More generally, instead of transforming logits after training a classifier, a plethora of methods exists
that modify either the model architecture or the training process itself. This includes methods that are
based on the Bayesian paradigm [12, 1, 6, 22, 42], evidential theory [33], adversarial calibration [37], and
model ensembling [20]. A common caveat of these methods is the trade-off between accuracy and
calibration, which oftentimes do not go hand in hand. Post-hoc methods like temperature scaling, on
the other hand, are accuracy preserving: they leave the per-node logit rankings unaltered.
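As an illustration of why such post-hoc methods preserve accuracy, the sketch below fits a single scalar temperature on held-out validation logits by minimizing the negative log-likelihood, in the spirit of temperature scaling [7]. It is a minimal example under assumed placeholder names (val_logits, val_labels), not a reference implementation.

import torch
import torch.nn.functional as F

def fit_temperature(val_logits: torch.Tensor, val_labels: torch.Tensor,
                    lr: float = 0.01, steps: int = 200) -> torch.Tensor:
    # Optimize log T so that T = exp(log_t) stays positive.
    log_t = torch.zeros(1, requires_grad=True)
    optimizer = torch.optim.Adam([log_t], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = F.cross_entropy(val_logits / log_t.exp(), val_labels)
        loss.backward()
        optimizer.step()
    return log_t.exp().detach()

# Since T is a positive scalar, argmax(logits / T) == argmax(logits):
# label predictions are unchanged and only the confidences are rescaled.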
Calibration of GNNs is currently a substantially less explored topic. Nodewise post-hoc calibration
on GNNs using methods developed for the multi-class setting has been empirically evaluated by
Teixeira et al. [36]. They show that these methods, which perform uniform calibration of nodewise
predictions, are unable to produce calibrated predictions for some harder tasks. Wang et al. [41]
observe that GNNs tend to be under-confident, in contrast to the majority of multi-class classifiers,
which are generally overconfident [7]. Based on their findings, Wang et al. [41] propose the CaGCN
approach, which attaches a GCN on top of the backbone GNN for calibration. Some approaches
improve the uncertainty estimation of GNNs by adjusting model training. This includes Bayesian
learning approaches [45, 10] and methods based on evidential theory [46, 35].
3 Problem setup for GNN calibration
We consider the problem of calibrating GNNs for node classification tasks: given a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, the training data consist of nodewise input features $\{x_i\}_{i \in \mathcal{V}} \in \mathcal{X}$ and ground-truth labels
$\{y_i\}_{i \in \mathcal{L}} \in \mathcal{Y} = \{1, \dots, K\}$ for a subset $\mathcal{L} \subset \mathcal{V}$ of nodes, and the goal is to predict the labels
$\{y_i\}_{i \in \mathcal{U}} \in \mathcal{Y}$ for the remaining nodes $\mathcal{U} = \mathcal{V} \setminus \mathcal{L}$. A graph neural network tackles the problem
by producing nodewise probabilistic forecasts $\hat{p}_i$. These forecasts yield the corresponding label
predictions $\hat{y}_i := \arg\max_y \hat{p}_i(y)$ and confidences $\hat{c}_i := \max_y \hat{p}_i(y)$. The GNN is calibrated when
its probabilistic forecasts are reliable, e.g., predictions with confidence $0.8$ should be correct
$80\%$ of the time. Formally, a GNN is perfectly calibrated [41] if
\[
\forall c \in [0, 1], \quad \mathbb{P}\left(y_i = \hat{y}_i \mid \hat{c}_i = c\right) = c. \tag{1}
\]
In practice, we quantify the calibration quality with the expected calibration error (ECE) [27, 7]. We
follow the commonly used definition from Guo et al. [7], which uses an equal-width binning scheme to
estimate the calibration error for any node subset $\mathcal{N} \subset \mathcal{V}$: the predictions are regrouped according to $M$
equally spaced confidence intervals, i.e. $(B_1, \dots, B_M)$ with $B_m = \{ j \in \mathcal{N} \mid \frac{m-1}{M} < \hat{c}_j \le \frac{m}{M} \}$, and
the expected calibration error of the GNN forecasts is defined as
\[
\mathrm{ECE} = \sum_{m=1}^{M} \frac{|B_m|}{|\mathcal{N}|} \left| \mathrm{acc}(B_m) - \mathrm{conf}(B_m) \right|, \quad \text{with} \tag{2}
\]
\[
\mathrm{acc}(B_m) = \frac{1}{|B_m|} \sum_{i \in B_m} \mathbf{1}(y_i = \hat{y}_i) \quad \text{and} \quad \mathrm{conf}(B_m) = \frac{1}{|B_m|} \sum_{i \in B_m} \hat{c}_i. \tag{3}
\]