
method can provide post-hoc uncertainty quantification. As long as the input graph and a GCN are provided, our method can always quantify node uncertainty without any epochs of model training.
The major contributions of this paper are summarized as follows.
• Problem definition. To the best of our knowledge, we provide the first frequentist-based analysis of GCN uncertainty and formally define the JuryGCN problem.
• Algorithm and analysis. We propose JuryGCN to quantify jackknife uncertainty on GCN. The key idea is to leverage a jackknife estimator to construct a leave-one-out predictive confidence interval for each node, where the leave-one-out predictions are estimated using influence functions with respect to the model parameters.
• Experimental evaluations. We demonstrate the effectiveness of JuryGCN through extensive experiments on real-world graphs in active learning on node classification and semi-supervised node classification.
2 PROBLEM DEFINITION
In this section, we first introduce preliminary knowledge on the Graph Convolutional Network (GCN), predictive uncertainty and jackknife resampling. Then, we formally define the problem of jackknife uncertainty quantification on GCN (JuryGCN).
Unless otherwise specified, we use bold upper-case letters for matrices (e.g., $\mathbf{A}$), bold lower-case letters for vectors (e.g., $\mathbf{x}$), calligraphic letters for sets (e.g., $\mathcal{G}$) and fraktur font for high-dimensional tensors (e.g., $\mathfrak{H}$). We use superscript $T$ for matrix transpose and superscript $-1$ for matrix inversion, i.e., $\mathbf{A}^T$ and $\mathbf{A}^{-1}$ are the transpose and inverse of $\mathbf{A}$, respectively. We use conventions similar to PyTorch in Python for indexing. For example, $\mathbf{A}[i, j]$ represents the entry of $\mathbf{A}$ at the $i$-th row and $j$-th column; $\mathbf{A}[i, :]$ and $\mathbf{A}[:, j]$ denote the $i$-th row and $j$-th column of $\mathbf{A}$, respectively.
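For readers less familiar with this convention, the following tiny PyTorch snippet (the matrix is a toy example of ours, purely illustrative) mirrors the notation:

```python
import torch

A = torch.arange(12.).reshape(3, 4)   # a toy 3x4 matrix

entry = A[1, 2]    # A[i, j]: entry at the i-th row and j-th column
row_i = A[1, :]    # A[i, :]: the entire i-th row
col_j = A[:, 2]    # A[:, j]: the entire j-th column
```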
2.1 Preliminaries
1 – Graph Convolutional Network (GCN). Let $\mathcal{G} = \{\mathcal{V}, \mathbf{A}, \mathbf{X}\}$ denote a graph whose node set is $\mathcal{V}$, adjacency matrix is $\mathbf{A}$ and node feature matrix is $\mathbf{X}$. For the $l$-th hidden layer in an $L$-layer GCN, we assume $\mathbf{E}^{(l)}$ is the output node embeddings (where $\mathbf{E}^{(0)} = \mathbf{X}$) and $\mathbf{W}^{(l)}$ is the weight matrix. Mathematically, the graph convolution at the $l$-th hidden layer can be represented by $\mathbf{E}^{(l)} = \sigma(\hat{\mathbf{A}} \mathbf{E}^{(l-1)} \mathbf{W}^{(l)})$, where $\sigma$ is the activation and $\hat{\mathbf{A}} = \tilde{\mathbf{D}}^{-\frac{1}{2}} (\mathbf{A} + \mathbf{I}) \tilde{\mathbf{D}}^{-\frac{1}{2}}$ is the renormalized graph Laplacian with $\tilde{\mathbf{D}}$ being the degree matrix of $(\mathbf{A} + \mathbf{I})$.
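For concreteness, the following is a minimal dense-matrix sketch of one such graph convolution layer in PyTorch; ReLU is chosen as the activation for illustration, and all names, shapes and the toy graph are assumptions of this sketch rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def renormalized_adjacency(A: torch.Tensor) -> torch.Tensor:
    """A_hat = D_tilde^{-1/2} (A + I) D_tilde^{-1/2}, with D_tilde the degree matrix of (A + I)."""
    A_tilde = A + torch.eye(A.shape[0])
    deg = A_tilde.sum(dim=1)                      # node degrees of (A + I)
    d_inv_sqrt = torch.diag(deg.pow(-0.5))
    return d_inv_sqrt @ A_tilde @ d_inv_sqrt

def gcn_layer(A_hat: torch.Tensor, E_prev: torch.Tensor, W: torch.Tensor) -> torch.Tensor:
    """One graph convolution: E^{(l)} = sigma(A_hat @ E^{(l-1)} @ W^{(l)})."""
    return F.relu(A_hat @ E_prev @ W)

# Toy usage: a 2-layer forward pass with random weights on a 2-node graph.
A = torch.tensor([[0., 1.], [1., 0.]])
X = torch.randn(2, 4)                             # E^{(0)} = X
W1, W2 = torch.randn(4, 8), torch.randn(8, 3)
A_hat = renormalized_adjacency(A)
E1 = gcn_layer(A_hat, X, W1)
E2 = gcn_layer(A_hat, E1, W2)
```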
2 – Uncertainty quantification is one of the cornerstones of safety-critical applications. It provides an accurate quantification of how confident a mining model is in its predictions. In general, uncertainty can be divided into two types: aleatoric uncertainty and epistemic uncertainty [1]. Aleatoric uncertainty (or data uncertainty) refers to the variability in mining results due to the inherent randomness of the input data (e.g., noise), which is irreducible; whereas epistemic uncertainty (or model uncertainty) measures how well the mining model fits the training data due to the lack of knowledge on the optimal model parameters, which is reducible by increasing the size of the training data.
3 – Jackknife resampling is a classic method to estimate the bias and variance of a population [41]. It often relies on a jackknife estimator, which is built by leaving out an observation from the entire population (i.e., leave-one-out) and evaluating the error of the model re-trained on the population with that observation held out. Suppose we have (1) a set of $n$ data points $\mathcal{D} = \{(\mathbf{x}_i, y_i) \mid i = 1, \ldots, n\}$, (2) a test point $(\mathbf{x}_{\text{test}}, y_{\text{test}})$, (3) a mining model $f_\theta(\cdot)$ parameterized by $\theta$ (e.g., a neural network), where $f_\theta(\mathbf{x})$ is the prediction on input feature $\mathbf{x}$, and (4) a target coverage level $(1-\alpha)$ such that the label $y$ is covered by the predictive confidence interval with probability $(1-\alpha)$. Mathematically, the confidence interval constructed by the naive jackknife [13] is upper bounded by $\mathcal{C}^+(\mathbf{x}_{\text{test}}) = Q_{1-\alpha}(\mathcal{R}^+)$ and lower bounded by $\mathcal{C}^-(\mathbf{x}_{\text{test}}) = Q_\alpha(\mathcal{R}^-)$, where $Q_\alpha$ finds the $\alpha$-quantile of a set, $\mathcal{R}^\gamma = \{f_\theta(\mathbf{x}_{\text{test}}) + \gamma \cdot |y_{\text{test}} - f_{\theta_{-i}}(\mathbf{x}_{\text{test}})| \mid i = 1, \ldots, n\}$ for $\gamma \in \{-, +\}$, and $|y_{\text{test}} - f_{\theta_{-i}}(\mathbf{x}_{\text{test}})|$ is the error residual of the model re-trained on the dataset $\mathcal{D} \setminus \{(\mathbf{x}_i, y_i)\}$ (i.e., parameterized by $\theta_{-i}$).¹ Hence, $\mathcal{R}^+$ and $\mathcal{R}^-$ represent the sets of upper and lower uncertainty bounds on the original model prediction (i.e., $f_\theta(\mathbf{x}_{\text{test}})$).
Furthermore, jackknife+ [3] constructs the predictive confidence interval for exchangeable data as
$$\mathcal{C}^+(\mathbf{x}_{\text{test}}) = Q_{1-\alpha}(\mathcal{P}^+), \qquad \mathcal{C}^-(\mathbf{x}_{\text{test}}) = Q_\alpha(\mathcal{P}^-) \qquad (1)$$
where $\mathcal{P}^\gamma$ for $\gamma \in \{-, +\}$ is defined as $\mathcal{P}^\gamma = \{f_{\theta_{-i}}(\mathbf{x}_{\text{test}}) + \gamma \cdot |y_i - f_{\theta_{-i}}(\mathbf{x}_i)| \mid i = 1, \ldots, n\}$. Similarly, $\mathcal{P}^-$ and $\mathcal{P}^+$ represent the sets of the lower and upper uncertainty bounds of the leave-one-out prediction $f_{\theta_{-i}}(\mathbf{x}_{\text{test}})$, respectively. Under the assumption of data exchangeability, jackknife+ theoretically yields a $(1-2\alpha)$ coverage rate.
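To make Eq. (1) concrete, below is a minimal sketch (not the authors' implementation) of how a jackknife+ interval could be computed once the leave-one-out predictions and residuals are available. The names are hypothetical, and np.quantile is used as a plain empirical quantile, whereas the formal $(1-2\alpha)$ guarantee is stated for finite-sample-corrected quantiles.

```python
import numpy as np

def jackknife_plus_interval(loo_preds_test: np.ndarray,
                            loo_residuals: np.ndarray,
                            alpha: float = 0.1):
    """Jackknife+ interval of Eq. (1), given
       loo_preds_test[i] = f_{theta_{-i}}(x_test)      (leave-one-out prediction at x_test)
       loo_residuals[i]  = |y_i - f_{theta_{-i}}(x_i)| (leave-one-out training residual).
       Returns (Q_alpha(P^-), Q_{1-alpha}(P^+))."""
    p_minus = loo_preds_test - loo_residuals   # the set P^-
    p_plus  = loo_preds_test + loo_residuals   # the set P^+
    return np.quantile(p_minus, alpha), np.quantile(p_plus, 1 - alpha)

# Toy usage with synthetic leave-one-out outputs.
rng = np.random.default_rng(0)
loo_preds = 2.0 + 0.1 * rng.standard_normal(100)
loo_resid = np.abs(0.3 * rng.standard_normal(100))
print(jackknife_plus_interval(loo_preds, loo_resid, alpha=0.1))
```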
2.2 Problem Definition
Existing works on deterministic uncertainty quantification for graph neural networks (GNNs) mainly rely on changing the training procedure of a vanilla GNN (i.e., a graph neural network without consideration of uncertainty) [40, 49]. As such, given a well-trained GNN, they require epoch(s) of re-training to quantify its uncertainty. Nevertheless, such re-training costs a lot of computational resources, especially when the model has already been deployed in an operational environment. Additionally, it remains a challenging problem to further comprehend the predictive results of a GNN from the perspective of uncertainty and to answer the following question: to what extent is the GNN model confident in its current prediction? Therefore, it is essential to investigate uncertainty quantification in a post-hoc manner, i.e., quantifying uncertainty without further (re-)training of the model.
Regarding post-hoc uncertainty quantification for IID (i.e., non-graph) data, Alaa and van der Schaar [2] propose a frequentist-based method inspired by jackknife resampling. It uses high-order influence functions to quantify the impact of a data point on the underlying neural network. Given that the parameters of a neural network are derived by learning from data points, high-order influence functions are capable of understanding how much a data point will affect the model parameters. Then, the change in model parameters can be used to infer the uncertainty of the corresponding data point on the neural network by a jackknife estimator [3]. Under mild assumptions (such as algorithmic stability and the IID/exchangeability of data), the naive jackknife estimator [13] and its variants [3] bear strong theoretical guarantees on coverage, i.e., the constructed confidence interval covers the true label with a high probability.
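As an illustration of this principle only: the sketch below uses the standard first-order influence-function approximation $\theta_{-i} \approx \hat{\theta} + \frac{1}{n} H_{\hat{\theta}}^{-1} \nabla_\theta \mathcal{L}(z_i, \hat{\theta})$ rather than the high-order estimator of [2], and the names loss_fn, params and data are hypothetical. It shows how leave-one-out parameters can be approximated without re-training.

```python
import torch

def loo_params_first_order(loss_fn, params, data, i, damping=1e-3):
    """First-order influence-function approximation of the leave-one-out
    parameters: theta_{-i} ~= theta + (1/n) * H^{-1} * grad L(z_i, theta),
    where H is the Hessian of the average training loss at theta.
    The explicit Hessian is for a small, flat parameter vector only;
    practical implementations use Hessian-vector products instead."""
    n = len(data)
    avg_loss = lambda p: sum(loss_fn(p, z) for z in data) / n
    H = torch.autograd.functional.hessian(avg_loss, params)        # (d, d)
    H = H + damping * torch.eye(H.shape[0])                        # damping keeps H invertible
    grad_i = torch.autograd.functional.jacobian(
        lambda p: loss_fn(p, data[i]), params)                     # (d,)
    return params + torch.linalg.solve(H, grad_i) / n
```

Such approximate leave-one-out parameters can then be plugged into the jackknife-style intervals described above, which is the spirit of the post-hoc procedure discussed here.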
Building upon jackknife resampling [34] and the general principle outlined in [2], we seek to bridge the gap between frequentist-based uncertainty quantification and graph neural networks. To
¹ We use $\gamma$ to represent the symbol before the leave-one-out error, i.e., $f_\theta(\mathbf{x}_{\text{test}}) - |y_{\text{test}} - f_{\theta_{-i}}(\mathbf{x}_{\text{test}})|$ when $\gamma = -$, or $f_\theta(\mathbf{x}_{\text{test}}) + |y_{\text{test}} - f_{\theta_{-i}}(\mathbf{x}_{\text{test}})|$ otherwise.