
method can provide post-hoc uncertainty quantification. As long as the input graph and a GCN are provided, our method can always quantify node uncertainty without any epochs of model training.
The major contributions of this paper are summarized as follows.
• Problem definition. To the best of our knowledge, we provide the first frequentist-based analysis of GCN uncertainty and formally define the JuryGCN problem.
• Algorithm and analysis. We propose JuryGCN to quantify jackknife uncertainty on GCN. The key idea is to leverage a jackknife estimator to construct a leave-one-out predictive confidence interval for each node, where the leave-one-out predictions are estimated using influence functions with respect to the model parameters.
• Experimental evaluations. We demonstrate the effectiveness of JuryGCN through extensive experiments on real-world graphs in active learning on node classification and semi-supervised node classification.
2 PROBLEM DEFINITION
In this section, we first introduce preliminary knowledge on the Graph Convolutional Network (GCN), predictive uncertainty and jackknife resampling. Then, we formally define the problem of jackknife uncertainty quantification on GCN (JuryGCN).
Unless otherwise specified, we use bold upper-case letters for matrices (e.g., $\mathbf{A}$), bold lower-case letters for vectors (e.g., $\mathbf{x}$), calligraphic letters for sets (e.g., $\mathcal{G}$) and fraktur font for high-dimensional tensors (e.g., $\mathfrak{H}$). We use superscript $T$ for matrix transpose and superscript $-1$ for matrix inversion, i.e., $\mathbf{A}^T$ and $\mathbf{A}^{-1}$ are the transpose and inverse of $\mathbf{A}$, respectively. We use conventions similar to PyTorch in Python for indexing. For example, $\mathbf{A}[i, j]$ represents the entry of $\mathbf{A}$ at the $i$-th row and $j$-th column; $\mathbf{A}[i, :]$ and $\mathbf{A}[:, j]$ denote the $i$-th row and $j$-th column of $\mathbf{A}$, respectively.
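For readers less familiar with this convention, the following tiny PyTorch snippet (the matrix is a toy example of ours, purely illustrative) mirrors the notation:

```python
import torch

A = torch.arange(12.).reshape(3, 4)   # a toy 3x4 matrix

entry = A[1, 2]    # A[i, j]: entry at the i-th row and j-th column
row_i = A[1, :]    # A[i, :]: the entire i-th row
col_j = A[:, 2]    # A[:, j]: the entire j-th column
```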
2.1 Preliminaries
1 – Graph Convolutional Network (GCN). Let $\mathcal{G} = \{\mathcal{V}, \mathbf{A}, \mathbf{X}\}$ denote a graph whose node set is $\mathcal{V}$, adjacency matrix is $\mathbf{A}$ and node feature matrix is $\mathbf{X}$. For the $l$-th hidden layer in an $L$-layer GCN, we assume $\mathbf{E}^{(l)}$ is the output node embeddings (where $\mathbf{E}^{(0)} = \mathbf{X}$) and $\mathbf{W}^{(l)}$ is the weight matrix. Mathematically, the graph convolution at the $l$-th hidden layer can be represented by $\mathbf{E}^{(l)} = \sigma(\hat{\mathbf{A}} \mathbf{E}^{(l-1)} \mathbf{W}^{(l)})$, where $\sigma$ is the activation and $\hat{\mathbf{A}} = \tilde{\mathbf{D}}^{-\frac{1}{2}} (\mathbf{A} + \mathbf{I}) \tilde{\mathbf{D}}^{-\frac{1}{2}}$ is the renormalized graph Laplacian with $\tilde{\mathbf{D}}$ being the degree matrix of $(\mathbf{A} + \mathbf{I})$.
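For concreteness, the following is a minimal dense-matrix sketch of one such graph convolution layer in PyTorch; ReLU is chosen as the activation for illustration, and all names, shapes and the toy graph are assumptions of this sketch rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def renormalized_adjacency(A: torch.Tensor) -> torch.Tensor:
    """A_hat = D_tilde^{-1/2} (A + I) D_tilde^{-1/2}, with D_tilde the degree matrix of (A + I)."""
    A_tilde = A + torch.eye(A.shape[0])
    deg = A_tilde.sum(dim=1)                      # node degrees of (A + I)
    d_inv_sqrt = torch.diag(deg.pow(-0.5))
    return d_inv_sqrt @ A_tilde @ d_inv_sqrt

def gcn_layer(A_hat: torch.Tensor, E_prev: torch.Tensor, W: torch.Tensor) -> torch.Tensor:
    """One graph convolution: E^{(l)} = sigma(A_hat @ E^{(l-1)} @ W^{(l)})."""
    return F.relu(A_hat @ E_prev @ W)

# Toy usage: a 2-layer forward pass with random weights on a 2-node graph.
A = torch.tensor([[0., 1.], [1., 0.]])
X = torch.randn(2, 4)                             # E^{(0)} = X
W1, W2 = torch.randn(4, 8), torch.randn(8, 3)
A_hat = renormalized_adjacency(A)
E1 = gcn_layer(A_hat, X, W1)
E2 = gcn_layer(A_hat, E1, W2)
```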
2 – Uncertainty quantification is one of the cornerstones of safety-critical applications. It provides an accurate quantification of how confident a mining model is in its predictions. In general, uncertainty can be divided into two types: aleatoric uncertainty and epistemic uncertainty [1]. Aleatoric uncertainty (or data uncertainty) refers to the variability in mining results due to the inherent randomness of the input data (e.g., noise), which is irreducible; whereas epistemic uncertainty (or model uncertainty) measures how well the mining model fits the training data due to the lack of knowledge on the optimal model parameters, which is reducible by increasing the size of the training data.
3 – Jackknife resampling is a classic method to estimate the bias and variance of a population [41]. It often relies on a jackknife estimator, which is built by leaving out an observation from the entire population (i.e., leave-one-out) and evaluating the error of the model re-trained on the population with that observation held out. Suppose we have (1) a set of $n$ data points $\mathcal{D} = \{(\mathbf{x}_i, y_i) \mid i = 1, \ldots, n\}$, (2) a test point $(\mathbf{x}_{\text{test}}, y_{\text{test}})$, (3) a mining model $f_\theta(\cdot)$ parameterized by $\theta$ (e.g., a neural network), where $f_\theta(\mathbf{x})$ is the prediction on input feature $\mathbf{x}$, and (4) a target coverage level $(1-\alpha)$ such that the label $y$ is covered by the predictive confidence interval with probability $(1-\alpha)$. Mathematically, the confidence interval constructed by the naive jackknife [13] is upper bounded by $\mathcal{C}^+(\mathbf{x}_{\text{test}}) = Q_{1-\alpha}(\mathcal{R}^+)$ and lower bounded by $\mathcal{C}^-(\mathbf{x}_{\text{test}}) = Q_\alpha(\mathcal{R}^-)$, where $Q_\alpha$ finds the $\alpha$-quantile of a set, $\mathcal{R}^\gamma = \{f_\theta(\mathbf{x}_{\text{test}}) + \gamma \cdot |y_{\text{test}} - f_{\theta_{-i}}(\mathbf{x}_{\text{test}})| \mid i = 1, \ldots, n\}$ for $\gamma \in \{-, +\}$, and $|y_{\text{test}} - f_{\theta_{-i}}(\mathbf{x}_{\text{test}})|$ is the error residual of the model re-trained on the dataset $\mathcal{D} \setminus \{(\mathbf{x}_i, y_i)\}$ (i.e., parameterized by $\theta_{-i}$).¹ Hence, $\mathcal{R}^+$ and $\mathcal{R}^-$ represent the sets of upper and lower uncertainty bounds on the original model prediction (i.e., $f_\theta(\mathbf{x}_{\text{test}})$).
Furthermore, jackknife+ [3] constructs the predictive confidence interval for exchangeable data as
$$\mathcal{C}^+(\mathbf{x}_{\text{test}}) = Q_{1-\alpha}(\mathcal{P}^+), \qquad \mathcal{C}^-(\mathbf{x}_{\text{test}}) = Q_\alpha(\mathcal{P}^-) \qquad (1)$$
where $\mathcal{P}^\gamma$ for $\gamma \in \{-, +\}$ is defined as $\mathcal{P}^\gamma = \{f_{\theta_{-i}}(\mathbf{x}_{\text{test}}) + \gamma \cdot |y_i - f_{\theta_{-i}}(\mathbf{x}_i)| \mid i = 1, \ldots, n\}$. Similarly, $\mathcal{P}^-$ and $\mathcal{P}^+$ represent the sets of the lower and upper uncertainty bounds of the leave-one-out prediction $f_{\theta_{-i}}(\mathbf{x}_{\text{test}})$, respectively. Under the assumption of data exchangeability, jackknife+ theoretically yields a $(1-2\alpha)$ coverage rate.
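To make Eq. (1) concrete, below is a minimal sketch (not the authors' implementation) of how a jackknife+ interval could be computed once the leave-one-out predictions and residuals are available. The names are hypothetical, and np.quantile is used as a plain empirical quantile, whereas the formal $(1-2\alpha)$ guarantee is stated for finite-sample-corrected quantiles.

```python
import numpy as np

def jackknife_plus_interval(loo_preds_test: np.ndarray,
                            loo_residuals: np.ndarray,
                            alpha: float = 0.1):
    """Jackknife+ interval of Eq. (1), given
       loo_preds_test[i] = f_{theta_{-i}}(x_test)      (leave-one-out prediction at x_test)
       loo_residuals[i]  = |y_i - f_{theta_{-i}}(x_i)| (leave-one-out training residual).
       Returns (Q_alpha(P^-), Q_{1-alpha}(P^+))."""
    p_minus = loo_preds_test - loo_residuals   # the set P^-
    p_plus  = loo_preds_test + loo_residuals   # the set P^+
    return np.quantile(p_minus, alpha), np.quantile(p_plus, 1 - alpha)

# Toy usage with synthetic leave-one-out outputs.
rng = np.random.default_rng(0)
loo_preds = 2.0 + 0.1 * rng.standard_normal(100)
loo_resid = np.abs(0.3 * rng.standard_normal(100))
print(jackknife_plus_interval(loo_preds, loo_resid, alpha=0.1))
```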
2.2 Problem Definition
Existing works on deterministic uncertainty quantification for graph neural networks (GNNs) mainly rely on changing the training procedure of a vanilla GNN (i.e., a graph neural network without consideration of uncertainty) [40, 49]. As such, given a well-trained GNN, they require epoch(s) of re-training to quantify its uncertainty. Nevertheless, such re-training costs a lot of computational resources, especially when the model has already been deployed in an operational environment. Additionally, it remains a challenging problem to further comprehend the predictive results of a GNN from the perspective of uncertainty and to answer the following question: to what extent is the GNN model confident in its current prediction? Therefore, it is essential to investigate uncertainty quantification in a post-hoc manner, i.e., quantifying uncertainty without further (re-)training of the model.
Regarding post-hoc uncertainty quantification for IID (i.e., non-graph) data, Alaa and van der Schaar [2] propose a frequentist-based method inspired by jackknife resampling. It uses high-order influence functions to quantify the impact of a data point on the underlying neural network. Given that the parameters of a neural network are derived by learning from data points, high-order influence functions are capable of understanding how much a data point will affect the model parameters. Then, the change in model parameters can be used to infer the uncertainty of the corresponding data point on the neural network by a jackknife estimator [3]. Under mild assumptions (such as algorithmic stability and the IID/exchangeability of data), the naive jackknife estimator [13] and its variants [3] bear strong theoretical guarantees on coverage, i.e., the constructed confidence interval covers the true label with a high probability.
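As an illustration of this principle only: the sketch below uses the standard first-order influence-function approximation $\theta_{-i} \approx \hat{\theta} + \frac{1}{n} H_{\hat{\theta}}^{-1} \nabla_\theta \mathcal{L}(z_i, \hat{\theta})$ rather than the high-order estimator of [2], and the names loss_fn, params and data are hypothetical. It shows how leave-one-out parameters can be approximated without re-training.

```python
import torch

def loo_params_first_order(loss_fn, params, data, i, damping=1e-3):
    """First-order influence-function approximation of the leave-one-out
    parameters: theta_{-i} ~= theta + (1/n) * H^{-1} * grad L(z_i, theta),
    where H is the Hessian of the average training loss at theta.
    The explicit Hessian is for a small, flat parameter vector only;
    practical implementations use Hessian-vector products instead."""
    n = len(data)
    avg_loss = lambda p: sum(loss_fn(p, z) for z in data) / n
    H = torch.autograd.functional.hessian(avg_loss, params)        # (d, d)
    H = H + damping * torch.eye(H.shape[0])                        # damping keeps H invertible
    grad_i = torch.autograd.functional.jacobian(
        lambda p: loss_fn(p, data[i]), params)                     # (d,)
    return params + torch.linalg.solve(H, grad_i) / n
```

Such approximate leave-one-out parameters can then be plugged into the jackknife-style intervals described above, which is the spirit of the post-hoc procedure discussed here.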
Building upon jackknife resampling [34] and the general principle outlined in [2], we seek to bridge the gap between frequentist-based uncertainty quantification and graph neural networks. To
¹ We use $\gamma$ to represent the symbol before the leave-one-out error, i.e., $f_\theta(\mathbf{x}_{\text{test}}) - |y_{\text{test}} - f_{\theta_{-i}}(\mathbf{x}_{\text{test}})|$ when $\gamma = -$, or $f_\theta(\mathbf{x}_{\text{test}}) + |y_{\text{test}} - f_{\theta_{-i}}(\mathbf{x}_{\text{test}})|$ otherwise.