JuryGCN: antifying Jackknife Uncertainty on Graph
Convolutional Networks
Jian Kang*, Qinghai Zhou*, and Hanghang Tong
University of Illinois at Urbana-Champaign, {jiank2, qinghai2, htong}@illinois.edu
* Equal contribution.
ABSTRACT
Graph Convolutional Network (GCN) has exhibited strong empirical performance in many real-world applications. The vast majority of existing works on GCN primarily focus on the accuracy while ignoring how confident or uncertain a GCN is with respect to its predictions. Despite being a cornerstone of trustworthy graph mining, uncertainty quantification on GCN has not been well studied and the scarce existing efforts either fail to provide deterministic quantification or have to change the training procedure of GCN by introducing additional parameters or architectures. In this paper, we propose the first frequentist-based approach, named JuryGCN, for quantifying the uncertainty of GCN, where the key idea is to quantify the uncertainty of a node as the width of the confidence interval constructed by a jackknife estimator. Moreover, we leverage influence functions to estimate the change in GCN parameters without re-training, in order to scale up the computation. The proposed JuryGCN is capable of quantifying uncertainty deterministically without modifying the GCN architecture or introducing additional parameters. We perform extensive experimental evaluation on real-world datasets in the tasks of both active learning and semi-supervised node classification, which demonstrates the efficacy of the proposed method.
CCS CONCEPTS
• Information systems → Data mining.
KEYWORDS
Graph neural networks, uncertainty quantification, jackknife
ACM Reference Format:
Jian Kang, Qinghai Zhou, and Hanghang Tong. 2022. JuryGCN: Quantifying Jackknife Uncertainty on Graph Convolutional Networks. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '22), August 14–18, 2022, Washington, DC, USA. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3534678.3539286
1 INTRODUCTION
Graph Convolutional Network (GCN) has become a prevalent learning paradigm in many real-world applications, including financial fraud detection [45], drug discovery [19] and traffic prediction [7]. To date, the vast majority of existing works do not take into account the uncertainty of a GCN regarding its prediction, which is alarming, especially in high-stakes scenarios.
For example, in automated financial fraud detection, it is vital to let an expert banker take control if a GCN-based detector is highly uncertain about its predictions, in order to prevent wrong decisions on suspending banking account(s).
A well-established study on uncertainty quantification of GCN could bring several crucial benefits. First, it is a cornerstone of trustworthy graph mining. Uncertainty quantification aims to understand to what extent the model is likely to be incorrect, and thus provides a natural remedy to questions like: how uncertain is a GCN in its own predictions? Second, an accurate quantification of GCN uncertainty could potentially answer how to improve GCN predictions by leveraging its uncertainty in many graph mining tasks. For example, in active learning on graphs, nodes with high uncertainty could be selected as the most valuable nodes to query the oracle; in node classification, the node uncertainty could help calibrate the confidence of GCN predictions, thereby improving the overall classification accuracy.
Important as it could be, very few studies on uncertainty quantification of GCN exist, and they mainly focus on two different directions: Bayesian-based approaches and deterministic quantification-based approaches. Bayesian-based approaches [21, 48] either drop edge(s) with certain sampling strategies or leverage a random graph model (e.g., the stochastic block model) to assign edge probabilities for training. However, these models fall short in explicitly quantifying the uncertainty on model predictions. Another type of methods, i.e., deterministic quantification-based approaches [30, 40, 49], directly quantifies uncertainty by parameterizing a Dirichlet distribution as a prior to estimate the posterior distribution under a Bayesian framework. Nevertheless, this changes the training procedure of a graph neural network by introducing additional parameters (e.g., parameters for the Dirichlet distribution) or additional architectures (e.g., a teacher network) in order to precisely estimate the uncertainty.
To address the aforementioned limitations, we provide the first study on frequentist-based analysis of the GCN uncertainty, which we term the JuryGCN problem. Building upon the general principle of jackknife (leave-one-out) resampling [34], the jackknife uncertainty of a node is defined as the width of the confidence interval constructed by a jackknife estimator when leaving the corresponding node out. In order to estimate the GCN parameters without exhaustively re-training the GCN, we leverage influence functions [28] to quantify the change in GCN parameters by infinitesimally upweighting the loss of a training node. Compared with existing works, our method brings several advantages. First, our method provides deterministic uncertainty quantification, which is not available in Bayesian-based approaches [21, 48]. Second, different from existing works on deterministic uncertainty quantification [30, 40, 49], our method does not introduce any additional parameters or components in the GCN architecture. Third, our method can provide post-hoc uncertainty quantification.
As long as the input graph and a GCN are provided, our method can always quantify node uncertainty without any epoch(s) of model training.
The major contributions of this paper are summarized as follows.
• Problem definition. To our best knowledge, we provide the first frequentist-based analysis of GCN uncertainty and formally define the JuryGCN problem.
• Algorithm and analysis. We propose JuryGCN to quantify jackknife uncertainty on GCN. The key idea is to leverage a jackknife estimator to construct a leave-one-out predictive confidence interval for each node, where the leave-one-out predictions are estimated using the influence functions with respect to model parameters.
• Experimental evaluations. We demonstrate the effectiveness of JuryGCN through extensive experiments on real-world graphs in active learning on node classification and semi-supervised node classification.
2 PROBLEM DEFINITION
In this section, we first introduce preliminary knowledge on the Graph Convolutional Network (GCN), predictive uncertainty and jackknife resampling. Then, we formally define the problem of jackknife uncertainty quantification on GCN (JuryGCN).
Unless otherwise specified, we use bold upper-case letters for matrices (e.g., $\mathbf{A}$), bold lower-case letters for vectors (e.g., $\mathbf{x}$), calligraphic letters for sets (e.g., $\mathcal{G}$) and fraktur font for high-dimensional tensors. We use superscript $T$ for matrix transpose and superscript $-1$ for matrix inversion, i.e., $\mathbf{A}^T$ and $\mathbf{A}^{-1}$ are the transpose and inverse of $\mathbf{A}$, respectively. We use conventions similar to PyTorch in Python for indexing. For example, $\mathbf{A}[i, j]$ represents the entry of $\mathbf{A}$ at the $i$-th row and $j$-th column; $\mathbf{A}[i, :]$ and $\mathbf{A}[:, j]$ denote the $i$-th row and $j$-th column of $\mathbf{A}$, respectively.
2.1 Preliminaries
1 – Graph Convolutional Network (GCN). Let $\mathcal{G} = \{\mathcal{V}, \mathbf{A}, \mathbf{X}\}$ denote a graph whose node set is $\mathcal{V}$, adjacency matrix is $\mathbf{A}$ and node feature matrix is $\mathbf{X}$. For the $l$-th hidden layer in an $L$-layer GCN, we assume $\mathbf{E}^{(l)}$ is the matrix of output node embeddings (where $\mathbf{E}^{(0)} = \mathbf{X}$) and $\mathbf{W}^{(l)}$ is the weight matrix. Mathematically, the graph convolution at the $l$-th hidden layer can be represented by $\mathbf{E}^{(l)} = \sigma(\hat{\mathbf{A}} \mathbf{E}^{(l-1)} \mathbf{W}^{(l)})$, where $\sigma$ is the activation function and $\hat{\mathbf{A}} = \tilde{\mathbf{D}}^{-\frac{1}{2}} (\mathbf{A} + \mathbf{I}) \tilde{\mathbf{D}}^{-\frac{1}{2}}$ is the renormalized graph Laplacian with $\tilde{\mathbf{D}}$ being the degree matrix of $(\mathbf{A} + \mathbf{I})$.
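For concreteness, the following is a minimal dense-matrix sketch of this propagation rule in PyTorch. The helper names (renormalized_adjacency, GCNLayer) are our own illustration rather than code from the paper, and a practical implementation would use sparse matrix operations.

```python
import torch

def renormalized_adjacency(A: torch.Tensor) -> torch.Tensor:
    """Build A_hat = D~^{-1/2} (A + I) D~^{-1/2} from a dense adjacency matrix A."""
    A_tilde = A + torch.eye(A.shape[0])            # add self-loops: A + I
    deg = A_tilde.sum(dim=1)                       # diagonal of the degree matrix D~
    d_inv_sqrt = deg.pow(-0.5)
    d_inv_sqrt[torch.isinf(d_inv_sqrt)] = 0.0      # guard against isolated nodes
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]

class GCNLayer(torch.nn.Module):
    """One graph convolution: E^(l) = sigma(A_hat @ E^(l-1) @ W^(l))."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.W = torch.nn.Linear(in_dim, out_dim, bias=False)  # weight matrix W^(l)

    def forward(self, A_hat: torch.Tensor, E: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.W(A_hat @ E))       # ReLU activation, as in Section 3.1
```

Stacking $L$ such layers with $\mathbf{E}^{(0)} = \mathbf{X}$ reproduces the $L$-layer GCN considered in this paper.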
2 – Uncertainty quantification is one of the cornerstones of safety-critical applications. It provides an accurate quantification of how confident a mining model is towards its predictions. In general, uncertainty can be divided into two types: aleatoric uncertainty and epistemic uncertainty [1]. Aleatoric uncertainty (or data uncertainty) refers to the variability in mining results due to the inherent randomness in input data, which is irreducible due to the complexity of the input data (e.g., noise); whereas epistemic uncertainty (or model uncertainty) measures how well the mining model fits the training data due to the lack of knowledge on the optimal model parameters, which is reducible by increasing the size of the training data.
3 – Jackknife resampling is a classic method to estimate the bias and variance of a population [41]. It often relies on a jackknife estimator which is built by leaving out an observation from the entire population (i.e., leave-one-out) and evaluating the error of the model re-trained on the held-out population. Suppose we have (1) a set of $n$ data points $\mathcal{D} = \{(\mathbf{x}_i, y_i) \mid i = 1, \ldots, n\}$, (2) a test point $(\mathbf{x}_{\text{test}}, y_{\text{test}})$, (3) a mining model $f_{\theta}(\cdot)$ parameterized by $\theta$ (e.g., a neural network), where $f_{\theta}(\mathbf{x})$ is the prediction on input feature $\mathbf{x}$, and (4) a target coverage level $(1 - \alpha)$ such that the label $y$ is covered by the predictive confidence interval with probability $(1 - \alpha)$. Mathematically, the confidence interval constructed by the naive jackknife [13] is upper bounded by $\mathcal{C}^{+}(\mathbf{x}_{\text{test}}) = Q_{1-\alpha}(\mathcal{R}^{+})$ and lower bounded by $\mathcal{C}^{-}(\mathbf{x}_{\text{test}}) = Q_{\alpha}(\mathcal{R}^{-})$, where $Q_{\alpha}$ finds the $\alpha$ quantile of a set, $\mathcal{R}^{\gamma} = \{f_{\theta}(\mathbf{x}_{\text{test}}) + \gamma \cdot |y_{\text{test}} - f_{\theta_{-i}}(\mathbf{x}_{\text{test}})| \mid i = 1, \ldots, n\}$ for $\gamma \in \{-, +\}$, and $|y_{\text{test}} - f_{\theta_{-i}}(\mathbf{x}_{\text{test}})|$ is the error residual of the model re-trained on the dataset $\mathcal{D} \setminus \{(\mathbf{x}_i, y_i)\}$ (i.e., parameterized by $\theta_{-i}$).¹ Hence, $\mathcal{R}^{+}$ and $\mathcal{R}^{-}$ represent the sets of upper and lower uncertainty bounds on the original model prediction (i.e., $f_{\theta}(\mathbf{x}_{\text{test}})$). Furthermore, jackknife+ [3] constructs the predictive confidence interval for exchangeable data as
$$\mathcal{C}^{+}(\mathbf{x}_{\text{test}}) = Q_{1-\alpha}(\mathcal{P}^{+}), \qquad \mathcal{C}^{-}(\mathbf{x}_{\text{test}}) = Q_{\alpha}(\mathcal{P}^{-}) \qquad (1)$$
where $\mathcal{P}^{\gamma}$ for $\gamma \in \{-, +\}$ is defined as $\mathcal{P}^{\gamma} = \{f_{\theta_{-i}}(\mathbf{x}_{\text{test}}) + \gamma \cdot |y_i - f_{\theta_{-i}}(\mathbf{x}_i)| \mid i = 1, \ldots, n\}$. Similarly, $\mathcal{P}^{-}$ and $\mathcal{P}^{+}$ represent the sets of the lower and upper uncertainty bounds of the leave-one-out prediction $f_{\theta_{-i}}(\mathbf{x}_{\text{test}})$, respectively. Under the assumption of data exchangeability, it theoretically yields a $(1 - 2\alpha)$ coverage rate.

¹ We use $\gamma$ to represent the sign before the leave-one-out error, i.e., $f_{\theta}(\mathbf{x}_{\text{test}}) - |y_{\text{test}} - f_{\theta_{-i}}(\mathbf{x}_{\text{test}})|$ when $\gamma = -$, or $f_{\theta}(\mathbf{x}_{\text{test}}) + |y_{\text{test}} - f_{\theta_{-i}}(\mathbf{x}_{\text{test}})|$ otherwise.
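To illustrate Eq. (1), here is a minimal NumPy sketch of the jackknife+ interval for a scalar regression model. It is our own illustration, not code from the paper, and it assumes the $n$ leave-one-out models have already been fitted and their test predictions and residuals collected into arrays.

```python
import numpy as np

def jackknife_plus_interval(loo_preds: np.ndarray,
                            loo_residuals: np.ndarray,
                            alpha: float = 0.1):
    """Construct the jackknife+ interval of Eq. (1) for one test point.

    loo_preds[i]     -- f_{theta_-i}(x_test): prediction of the model re-trained
                        without the i-th training point
    loo_residuals[i] -- |y_i - f_{theta_-i}(x_i)|: leave-one-out error residual
    """
    lower = np.quantile(loo_preds - loo_residuals, alpha)      # C^-(x_test) = Q_alpha(P^-)
    upper = np.quantile(loo_preds + loo_residuals, 1 - alpha)  # C^+(x_test) = Q_{1-alpha}(P^+)
    return lower, upper
```

Note that, unlike the naive jackknife, the interval is centered around the leave-one-out predictions themselves, which is what gives jackknife+ its $(1-2\alpha)$ coverage guarantee under exchangeability.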
2.2 Problem Definition
Existing works on deterministic uncertainty quantification on a graph neural network (GNN) mainly rely on changing the training procedure of a vanilla GNN (i.e., a graph neural network without consideration of uncertainty) [40, 49]. As such, given a well-trained GNN, it requires epoch(s) of re-training to quantify its uncertainty. Nevertheless, such re-training costs significant computational resources, especially when the model has already been deployed in an operational environment. Additionally, it remains a challenging problem to further comprehend the predictive results of a GNN from the perspective of uncertainty, and to answer the following question: to what extent is the GNN model confident of the current prediction? Therefore, it is essential to investigate uncertainty quantification in a post-hoc manner, i.e., quantifying uncertainty without further (re-)training of the model.
Regarding post-hoc uncertainty quantification for IID (i.e., non-graph) data, Alaa and van der Schaar [2] propose a frequentist-based method inspired by jackknife resampling. It uses high-order influence functions to quantify the impact of a data point on the underlying neural network. Given that the parameters of a neural network are derived by learning from data points, high-order influence functions are capable of capturing how much a data point will affect the model parameters. Then, the change in model parameters can be used to infer the uncertainty of the corresponding data point on the neural network by a jackknife estimator [3]. Under mild assumptions (such as the algorithmic stability assumption and the IID/exchangeability of data), the naive jackknife estimator [13] and its variants [3] bear strong theoretical guarantees in terms of coverage, such that the confidence interval will cover the true model parameters with high probability.
Building upon jackknife resampling [34] and the general principle outlined in [2], we seek to bridge the gap between frequentist-based uncertainty quantification and graph neural networks.
JuryGCN: antifying Jackknife Uncertainty on Graph Convolutional Networks KDD ’22, August 14–18, 2022, Washington, DC, USA
be specic, given an input graph and a GCN, we aim to estimate
the uncertainty of a node as the impact of leaving out its loss when
computing the overall loss. Formally, we dene the problem of
jackknife uncertainty quantication on GCN, which is referred to
as JuryGCN problem.
Problem 1. JuryGCN: Jackknife Uncertainty Quantification on Graph Convolutional Network.
Given: (1) an undirected graph $\mathcal{G} = \{\mathcal{V}, \mathbf{A}, \mathbf{X}\}$; (2) an $L$-layer GCN with the set of weights $\Theta$; (3) a task-specific loss function $R(\mathcal{G}, \mathcal{Y}, \Theta)$, where $\mathcal{Y}$ is the set of node labels.
Find: an uncertainty score $\mathcal{U}_{\Theta}(u)$ for any node $u$ in graph $\mathcal{G}$ with respect to the GCN parameters $\Theta$ and the task-specific loss function $R(\mathcal{G}, \mathcal{Y}, \Theta)$.
3 JURYGCN: MEASURE AND COMPUTATION
In this section, we start by discussing the general strategy of quantifying jackknife uncertainty with influence functions, and formally define the node jackknife uncertainty. After that, we present mathematical analysis of the influence function computation for GCN.
3.1 Jackknife Uncertainty of GCN
In this paper, we consider an $L$-layer GCN with ReLU activation for node-level tasks (e.g., node classification). We further assume that the nodes of the input graph $\mathcal{G}$ are exchangeable data.²
We rst observe that the loss function of many node-level tasks
can often be decomposed into a set of node-specic subproblems.
Mathematically, it can be written as
Θ=argmin
Θ
𝑅(G,Y
train,Θ)=argmin
Θ
1
|Vtrain|
𝑣Vtrain
𝑟(𝑣, y𝑣,Θ)
(2)
where
Vtrain ⊆ V
is the set of training nodes,
|Vtrain|
is the num-
ber of training nodes,
Y
train ={y𝑣|𝑣∈ Vtrain}
is the set of ground-
truth training labels and
y𝑣
is the label of node
𝑣
. In this case,
the overall loss function
𝑅(G,Y
train,Θ)
is decomposed into several
subproblems, each of which minimizes the node-specic loss func-
tion
𝑟(𝑣, y𝑣,Θ)
for a node
𝑣
. An example is the cross entropy with
node-specic loss as
𝑟(𝑣, y𝑣,Θ)=Í𝑐
𝑖=1y𝑣[𝑐]log GCN(𝑣, Θ) [𝑐]
,
where
𝑐
is the number of classes,
GCN(𝑣, Θ)
is the vector of pre-
dicted probabilities for each class using the GCN with parameters
Θ
. If we upweight the importance of optimizing the loss of a node
𝑖
with some small constant 𝜖, we have the following loss function.
Θ
𝜖,𝑖 =argmin
Θ
𝜖𝑟 (𝑖, y𝑖,Θ) + 1
|Vtrain|
𝑣Vtrain
𝑟(𝑣, y𝑣,Θ)(3)
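The decomposition of Eq. (2) and the $\epsilon$-upweighting of Eq. (3) translate directly into code. Below is a minimal sketch of our own (the function name overall_loss and its arguments are illustrative, not from the paper) using the cross-entropy example above:

```python
import torch
import torch.nn.functional as F

def overall_loss(logits: torch.Tensor, labels: torch.Tensor,
                 train_idx: torch.Tensor, eps: float = 0.0,
                 i: int = None) -> torch.Tensor:
    """Decomposed loss of Eq. (2), with the optional eps-upweighting of Eq. (3).

    logits    -- GCN outputs for all nodes (one row per node)
    labels    -- integer class labels
    train_idx -- indices of the training nodes V_train
    i         -- (training) node whose loss is upweighted by eps, if any
    """
    per_node = F.cross_entropy(logits[train_idx], labels[train_idx],
                               reduction='none')   # r(v, y_v, Theta) for v in V_train
    loss = per_node.mean()                         # Eq. (2): average over |V_train|
    if i is not None:
        # Eq. (3): add eps * r(i, y_i, Theta) on top of the overall loss
        loss = loss + eps * F.cross_entropy(logits[i:i+1], labels[i:i+1])
    return loss
```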
Inuence function is a powerful approach to evaluate the depen-
dence of the estimator on the value of the data examples [
28
,
52
]. In
order to obtain
Θ
𝜖,𝑖
without re-training the GCN, we leverage the
inuence function [
28
], which is essentially the Taylor expansion
over the model parameters.
Θ
𝜖,𝑖 Θ+𝜖IΘ(𝑖)(4)
where
IΘ(𝑖)=𝑑Θ
𝜖,𝑖
𝑑𝜖 |𝜖=0
is the inuence function with respect to
node
𝑖
. The inuence function
IΘ(𝑖)
can be further computed using
the classical result in [9] as
IΘ(𝑖)=H1
ΘΘ𝑟(𝑖, y𝑖,Θ)(5)
² A sequence of random variables is exchangeable if and only if the joint distribution of the random variables remains unchanged regardless of their ordering in the sequence [44]. This assumption is commonly used in random graph models, e.g., the Erdős–Rényi model [14] and the stochastic block model [22].
where $\mathbf{H}_{\Theta} = \frac{1}{|\mathcal{V}_{\text{train}}|} \nabla^2_{\Theta} R(\mathcal{G}, \mathcal{Y}_{\text{train}}, \Theta)$ is the Hessian matrix with respect to the model parameters $\Theta$. For a training node $i$, by setting $\epsilon = -\frac{1}{|\mathcal{V}_{\text{train}}|}$, Eq. (4) efficiently estimates the leave-one-out (LOO) parameters $\Theta_{\epsilon,i}$ when leaving out the loss of node $i$. After that, by simply switching the original model parameters to $\Theta_{\epsilon,i}$, we estimate the leave-one-out error $\text{err}_i$ of node $i$ as follows:
$$\text{err}_i = \|\mathbf{y}_i - \text{GCN}(i, \Theta_{\epsilon,i})\|_2 \qquad (6)$$
where $\text{GCN}(u, \Theta_{\epsilon,i})$ represents the output of node $u$ using the GCN with leave-one-out parameters $\Theta_{\epsilon,i}$.
With Eq. (6), we use jackknife+ [3], which requires the data to be exchangeable rather than IID, to construct the confidence interval of a node $u$. Mathematically, the lower bound $\mathcal{C}^{-}_{\Theta}(u)$ and upper bound $\mathcal{C}^{+}_{\Theta}(u)$ of the predictive confidence interval of node $u$ are
$$\mathcal{C}^{-}_{\Theta}(u) = Q_{\alpha}\big(\{\|\text{GCN}(u, \Theta_{\epsilon,i})\|_2 - \text{err}_i \mid i \in \mathcal{V}_{\text{train}} \setminus \{u\}\}\big)$$
$$\mathcal{C}^{+}_{\Theta}(u) = Q_{1-\alpha}\big(\{\|\text{GCN}(u, \Theta_{\epsilon,i})\|_2 + \text{err}_i \mid i \in \mathcal{V}_{\text{train}} \setminus \{u\}\}\big) \qquad (7)$$
where $Q_{\alpha}$ and $Q_{1-\alpha}$ are the $\alpha$ and $(1-\alpha)$ quantiles of a set. Since a wide confidence interval of node $u$ means that the model is less confident with respect to node $u$, it implies that node $u$ has high uncertainty. Following this intuition, the uncertainty of node $u$ can be naturally quantified by the width of the corresponding confidence interval (Eq. (7)). Since the uncertainty is quantified using the confidence interval constructed by a jackknife estimator, we term it jackknife uncertainty, which is formally defined in Definition 1.
Definition 1 (Node Jackknife Uncertainty). Given an input graph $\mathcal{G}$ with node set $\mathcal{V}$, a set of training nodes $\mathcal{V}_{\text{train}} \subseteq \mathcal{V}$ and an $L$-layer GCN with parameters $\Theta$, for $i \in \mathcal{V}_{\text{train}}$ and $u \in \mathcal{V}$, we assume (1) the nodes are exchangeable, and denote that (2) the LOO parameters are $\Theta_{\epsilon,i}$, (3) the error is defined as in Eq. (6) and (4) the lower bound $\mathcal{C}^{-}_{\Theta}(u)$ and the upper bound $\mathcal{C}^{+}_{\Theta}(u)$ of the predictive confidence interval are defined as in Eq. (7). The jackknife uncertainty of node $u$ is
$$\mathcal{U}_{\Theta}(u) = \mathcal{C}^{+}_{\Theta}(u) - \mathcal{C}^{-}_{\Theta}(u) \qquad (8)$$
We note that Alaa and van der Schaar [2] leverage high-order influence functions to quantify the jackknife uncertainty for IID data. Though Eq. (4) shares the same form as in [2] when the order is up to 1, our work bears three subtle differences. First, [2] views the model parameters as statistical functionals of the data distribution and exploits a von Mises expansion over the data distribution to estimate the LOO parameters,³ which is fundamentally different from our Taylor expansion-based estimation. Specifically, the von Mises expansion requires that the perturbed data distribution lie in a convex set of the original data distribution and all possible empirical distributions [16]. Since the node distribution of a graph is often unknown, the basic assumption of the von Mises expansion might not hold on graph data. However, our definition relies on a Taylor expansion over the model parameters, which are often drawn independently from Gaussian distribution(s). Thus, our method is able to generalize to graphs. Second, [2] works for regression or binary classification tasks by default, whereas we target more general learning settings on graphs (e.g., multi-class node classification). Third, jackknife uncertainty is always able to quantify aleatoric uncertainty and epistemic uncertainty simultaneously on
³ A statistical functional is a map from a distribution to a real number.