Maximum Common Subgraph Guided Graph Retrieval:
Late and Early Interaction Networks
Indradyumna Roy, Soumen Chakrabarti, and Abir De
IIT Bombay,
{indraroy15, soumen, abir}@cse.iitb.ac.in
Abstract
The graph retrieval problem is to search in a large corpus of graphs for ones that are most similar to a query graph.
A common consideration for scoring similarity is the maximum common subgraph (MCS) between the query and
corpus graphs, usually counting the number of common edges (i.e., MCES). In some applications, it is also desirable
that the common subgraph be connected, i.e., the maximum common connected subgraph (MCCS). Finding exact
MCES and MCCS is intractable, but may be unnecessary if ranking corpus graphs by relevance is the goal. We
design fast and trainable neural functions that approximate MCES and MCCS well. Late interaction methods compute
representations for the query and corpus graphs separately, and compare these representations using simple similarity
functions at the last stage, leading to highly scalable systems. Early interaction methods combine information from
both graphs right from the input stages, are usually considerably more accurate, but slower. We propose both late and
early interaction neural MCES and MCCS formulations. They are both based on a continuous relaxation of a node
alignment matrix between query and corpus nodes. For MCCS, we propose a novel differentiable ‘gossip’ network for
estimating the size of the largest connected common subgraph. Extensive experiments with seven data sets show that
our proposals are superior among late interaction models in terms of both accuracy and speed. Our early interaction
models provide accuracy competitive with the state of the art, at substantially greater speeds.
1 Introduction
Given a query graph, the graph retrieval problem is to search for relevant or similar response graphs from a corpus of graphs [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]. Depending on the application, the notion of relevance may involve graph edit distance (GED) [13, 14, 15], the size of the maximum common subgraph (MCS) [6, 7, 8, 9, 10], full graph or subgraph isomorphism [6, 11, 12], etc. In this work, we focus on two variations of MCS-based relevance measures: (i) maximum common edge subgraph (MCES) [16], which has applications in distributed computing [17, 16] and molecule search [18, 19, 20, 21], and (ii) maximum common connected subgraph (MCCS) [22], which has applications in keyword search over knowledge graphs [23, 24], software development [25, 26, 27], image analysis [28, 29, 30], etc.
In recent years, there has been an increasing interest in designing neural graph retrieval models [6, 7, 8, 9, 10, 11]. However, most of them learn black box relevance models which provide suboptimal performance in the context of MCS based retrieval (Section 4). Moreover, they do not provide intermediate matching evidence to justify their scores and therefore, they lack interpretability. In this context, Li et al. [6] proposed a graph matching network (GMN) [6] based on a cross-graph attention mechanism, which works extremely well in practice (Section 4). Nevertheless, it suffers from three key limitations, leaving considerable scope for the design of enhanced retrieval models. (i) Similar to other graph retrieval models, it uses a general-purpose scoring layer, which renders it suboptimal in the context of MCS based graph retrieval. (ii) As acknowledged by the authors, GMN is slow in both training and inference, due to the presence of the expensive cross-attention mechanism. (iii) MCS or any graph similarity function entails an injective mapping between nodes and edges across the graph pairs. In contrast, cross-attention induces potentially inconsistent and non-injective mappings, where a given query node can be mapped to multiple corpus nodes and vice versa.
1.1 Our contributions
We begin by writing down (the combinatorial forms of) MCES and MCCS objectives in specific ways that facilitate subsequent adaptation to neural optimization using both late and early interaction networks. Notably, these networks are trained end-to-end, using only distant supervision from the MCES or MCCS values of query-corpus graph pairs, without explicit annotation of the structural mappings that identify the underlying common subgraph.
Late interaction models.
We use a graph neural network (GNN) to first compute the node embeddings of the query
and corpus graphs independently of each other and then deploy an interaction network for computing the relevance
scores. This decoupling between embedding computation and interaction steps leads to efficient training and inference.
We introduce LMCES and LMCCS, two late interaction neural architectures for MCES and MCCS based graph
retrieval respectively. The interaction model is a differentiable graph alignment planner. It learns a Gumbel-Sinkhorn
(GS) network to provide an approximate alignment plan between the query and corpus graphs. In contrast to GMN [6],
it induces an approximately injective mapping between the nodes and edges of the query and corpus graphs. The MCES
objective is then computed as a differentiable network applied to this mapping. For MCCS, we further develop a novel
differentiable gossip network that computes the size of the largest connected component in the common subgraph
estimated from the above mapping. These neural gadgets may be of independent interest.
Early interaction model.
In the early interaction model, we perform the interaction step during the node embedding
computation phase, which makes the query and corpus embeddings dependent on each other. This improves predictive
power, at the cost of additional training and inference time. Here, we propose XMCS (cross-MCS), an early interaction
model that works well for both MCES and MCCS based graph retrieval. At each propagation layer of the GNN, we first
refine the alignment plan using the embeddings computed in the previous layer, then update the underlying coverage
objective using the refined alignment and finally use these signals to compute the node embeddings of the current layer.
Comprehensive evaluation.
We experiment with seven diverse datasets, which show that: (i) our late and early
interaction models outperform the corresponding state-of-the-art methods in terms of both accuracy and inference
time; (ii) in many cases, LMCES and LMCCS outperform the early interaction model of GMN [6]; and (iii) GMN's
accuracy can be significantly improved by substituting its final layer with our MCS-specific neural surrogate.
1.2 Related work
Combinatorial algorithms for MCS.
Both MCES and MCCS are NP-complete [16]. Several works designed heuristics for computing MCES for specific types of graphs [19, 18, 31]. Bahiense et al. [16] formulated MCES as an integer programming problem, provided a polyhedral analysis of the underlying formulation, and finally designed a branch and cut algorithm to solve it. Such polyhedral study for MCES was also performed by others [32, 33]. Combinatorial methods for different variants of MCCS have also been thoroughly studied. Some of them provide exact MCCS [34, 35, 36]. McCreesh et al. [34] proposed McSplit, a branch and bound algorithm for maximum common induced and connected subgraphs, which is efficient in terms of time and memory. Other works provide effective heuristics to find approximate MCCS [37, 38]. However, these methods are not differentiable and therefore not suitable for data-driven MCS estimation.
Learning models for MCS.
There are some recent efforts to design machine learning models for graph similarity and search [39, 8]. Among them, Bai et al. [39, GLSearch] compute MCS between two graphs using a reinforcement learning setup. In contrast, we consider a supervised learning setting for graph retrieval. Although Bai et al. [8, GraphSim] focus on the supervised learning scenario, their relevance scoring model performs poorly for MCS based retrieval.
Graph matching for computer vision.
Neural models for graph matching are used in computer vision applications such as computing image similarity, object detection, etc. However, these applications permit explicit supervision of the underlying node alignments [40, 41, 42, 43, 44, 45, 46, 47, 48, 49]. They adopt different types of losses, which include permutation loss [42, 43, 44, 45], Hungarian loss [47], and displacement loss [41]. In our problem, we only use distant supervision in the form of the size of the underlying MCS. Moreover, these works mostly consider graph matching problems, whereas we consider maximum common subgraph detection using distant supervision from the MCS score alone.
Graph representation learning.
Representation learning on graphs has been widely researched in the last decade [50, 51, 52, 53, 54, 55]. Among them, graph neural networks (GNNs) are the most popular node embedding models [50, 51, 52, 53, 56, 57]. Given a node $u$, a GNN collects information from the $K$-hop neighbors of $u$ and applies a symmetric aggregator on top of it to obtain the representation vector of the node. In the context of graph retrieval, the node embeddings are used in two ways. In the first approach, they are further aggregated into graph embeddings, which are then used to compute the similarity between query and corpus graphs by comparing the embeddings in the vector space [11, 6]. The second approach consists of SimGNN [7], GOTSim [9], GraphSim [8], GEN [6] and GMN [6], which compare the node embeddings and find suitable alignments between them. Here, GMN applies a cross-attention based mechanism on the node embeddings produced by a graph neural network [50]. Recently, Chen et al. [58] designed a structure-aware transformer architecture, which can represent a subgraph around a node more effectively than several other representation models.
Differentiable solvers for combinatorial algorithms.
Our work attempts to find a neural surrogate for the combinatorial challenge of maximizing objective scores over a permutation space. In effect, we are attempting to solve an Optimal Transport problem, where the central challenge is to present a neural gadget which is differentiable and backpropagable, thus enabling end-to-end training. Cuturi [59] utilized iterative row and column normalization, earlier proposed by Sinkhorn [60], to approximately solve the transportation problem subject to marginal constraints. In another approach, Vlastelica et al. [61] attempted to solve combinatorial problems exactly using available black-box solvers, by proposing to use the derivative of an affine surrogate of the piecewise constant function in the backward pass. Rolínek et al. [62] leverage this to perform deep graph matching based on explicit supervision of the ground truth node alignments. In another approach, Berthet et al. [63] perturb the inputs to the discrete solvers with random noise, so as to make them differentiable. Karalias and Loukas [64] design probabilistic loss functions for tackling the combinatorial objective of selecting a subset of nodes adhering to some given property. Finally, Kotary et al. [65] present a detailed survey of the existing neural approaches for solving constrained optimization problems on graphs.
2 Late interaction models for MCES and MCCS
In this section, we first write down the exact objectives for MCES and MCCS. These expressions are partly based upon
a pairing of nodes between query and corpus graphs, and partly on typical graph algorithms. They lead naturally to our
subsequent development of two late interaction models, LMCES and LMCCS. We begin with formal definitions of
MCES and MCCS.
Definition 1 (MCES and MCCS) Given query and corpus graphs $G_q = (V_q, E_q)$ and $G_c = (V_c, E_c)$.
(1) The maximum common edge subgraph $\mathrm{MCES}(G_q, G_c)$ is the common (not necessarily induced nor connected) subgraph between $G_q$ and $G_c$, having the maximum number of edges [16].
(2) The maximum common connected subgraph $\mathrm{MCCS}(G_q, G_c)$ is the common connected subgraph with the maximum number of nodes [22].
2.1 Combinatorial formulations for MCES and MCCS
As mentioned in Section 1.2, combinatorial algorithms for MCES and MCCS abound in the literature [35, 34, 16, 19, 18, 31]. However, it is difficult to design neural surrogates for these algorithms. Therefore, we come up with new optimization objectives for MCES and MCCS, which allow us to design neural surrogates by gradually replacing their different components with differentiable parameterized components.
Exact MCES.
Given a query graph $G_q = (V_q, E_q)$ and a corpus graph $G_c = (V_c, E_c)$, we pad the graph having fewer vertices (typically, $G_q$) with $||V_c| - |V_q||$ disconnected nodes. This ensures that the padded graphs have an equal number of nodes $N$. Let us denote the adjacency matrices of $G_q$ and $G_c$ (after padding) as $A_q \in \{0,1\}^{N \times N}$ and $A_c \in \{0,1\}^{N \times N}$. To find $\mathrm{MCES}(G_q, G_c)$ from the adjacency matrices $A_q$ and $A_c$, we first obtain the candidate common subgraph under some proposed node alignment, given by a permutation $P$ of $A_c$, which can be characterized by the adjacency matrix $\min(A_q, P A_c P^\top)$. This matrix shows the overlapping edges under the proposed node alignment. Subsequently, we choose the permutation which maximizes the total number of edges in this subgraph. Formally, we compute the MCES score by solving a coverage maximization problem, as follows:
$$\max_{P \in \mathcal{P}} \; \sum_{i,j} \min\left(A_q, \, P A_c P^\top\right)_{i,j} \tag{1}$$
where $\mathcal{P}$ is the set of permutation matrices of size $N \times N$ and the $\min$ operator is applied elementwise.
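To make the notation in Eq. (1) concrete, the following is a minimal brute-force sketch (plain NumPy, exponential in $N$, intended only for tiny sanity-check graphs; the helper names are ours, not part of the paper's implementation):

```python
import itertools
import numpy as np

def pad_adj(A, N):
    """Pad an adjacency matrix with disconnected nodes so it becomes N x N."""
    B = np.zeros((N, N), dtype=int)
    B[:A.shape[0], :A.shape[0]] = A
    return B

def mces_score_bruteforce(Aq, Ac):
    """Eq. (1): max over permutations P of sum_ij min(Aq, P Ac P^T)_ij."""
    N = max(Aq.shape[0], Ac.shape[0])
    Aq, Ac = pad_adj(Aq, N), pad_adj(Ac, N)
    best = 0
    for perm in itertools.permutations(range(N)):
        P = np.eye(N, dtype=int)[list(perm)]        # hard permutation matrix
        overlap = np.minimum(Aq, P @ Ac @ P.T)      # candidate common-subgraph adjacency
        best = max(best, int(overlap.sum()))        # each undirected edge is counted twice
    return best
```

Since both adjacency matrices are symmetric, the sum counts every common edge twice; this does not affect which permutation maximizes the objective.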
Exact MCCS.
MCES does not require the common subgraph to be connected, which may be desirable in some applications. For example, in keyword search and question answering over knowledge graphs (KGs) [23, 66, 24], one may wish the entity nodes forming the response to be connected to each other. In molecule search, one may require a connected functional group to be present in both query and corpus graphs [20]. In such situations, MCCS may be more appropriate.
Given a query graph $G_q$ and a corpus graph $G_c$ and their adjacency matrices $A_q$ and $A_c$ after padding, we first apply a row-column permutation on $A_c$ using the permutation matrix $P$ and obtain the candidate common subgraph with adjacency matrix $\min(A_q, P A_c P^\top)$. Then we apply Tarjan's algorithm [67] to return the size of the largest connected component, which we maximize w.r.t. $P$:
$$\max_{P \in \mathcal{P}} \; \mathrm{TARJANSCC}\left(\min\left(A_q, \, P A_c P^\top\right)\right). \tag{2}$$
Here, TARJANSCC takes the adjacency matrix as input and returns the size of the largest connected component of the corresponding graph.
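The connectivity step of Eq. (2) can be made concrete as follows: for an undirected graph, the size of the largest connected component can be obtained with a simple depth-first search, which is the role TARJANSCC plays above. The sketch below is ours; only the DFS idea, not the exact routine, comes from the text.

```python
import numpy as np

def largest_cc_size(B):
    """Number of nodes in the largest connected component of the undirected
    graph whose adjacency matrix is B (the role of TARJANSCC in Eq. (2))."""
    N = B.shape[0]
    seen = [False] * N
    best = 0
    for s in range(N):
        if seen[s]:
            continue
        stack, size = [s], 0        # iterative DFS over one component
        seen[s] = True
        while stack:
            u = stack.pop()
            size += 1
            for v in range(N):
                if B[u, v] and not seen[v]:
                    seen[v] = True
                    stack.append(v)
        best = max(best, size)
    return best
```

Maximizing `largest_cc_size(np.minimum(Aq, P @ Ac @ P.T))` over permutations `P`, as in the previous sketch, recovers Eq. (2).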
Bottleneck of approximating the optimization problems (1) and (2).
One way of avoiding the intractability of searching through $\mathcal{P}$ is to replace the hard permutation matrix with a differentiable soft surrogate via the Gumbel-Sinkhorn (GS) network [68, also see Section B]. However, such a relaxation is not adequate on its own.
1. Most elements of $\min(A_q, P A_c P^\top)$ are binary, which deprives the learner of gradient signals.
2. In practice, the nodes or edges may have associated (noisy) features, which play an important role in determining similarity between nodes or edges across query and corpus graphs. For example, in scene graph based image retrieval, "panther" may be deemed similar to "leopard". The objectives in Eqs. (1) and (2) do not capture such phenomena.
3. Tarjan's algorithm first applies DFS on a graph to find the connected components and then computes the size of the largest among them, in terms of the number of nodes. Therefore, even for a fixed $P$, it is not differentiable.
Next, we address the above bottlenecks by replacing the objectives (1) and (2) with two neural surrogates, which are summarized in Figure 1.
2.2 Design of LMCES
We design the neural surrogate of Eq. (1) by replacing the adjacency matrices with the corresponding continuous node embeddings computed by a GNN, and the hard permutation matrix with a soft surrogate (a doubly stochastic matrix) generated by the Gumbel-Sinkhorn network [68]. These node embeddings allow us to compute non-zero gradients and approximate similarity between nodes and their local neighborhoods in the continuous domain. Specifically, we compute this neural surrogate in the following two steps.
Step 1: Computing node embeddings.
We use a message passing graph neural network [50, 69] $\mathrm{GNN}_\theta$ with $R$ propagation layers and trainable parameters $\theta$, to compute the node embeddings $h_u(1), \ldots, h_u(R) \in \mathbb{R}^d$ for each node $u$ in the query and corpus graphs, to which $\mathrm{GNN}_\theta$ is applied separately. Finally, we build two matrices $H_q(r), H_c(r) \in \mathbb{R}^{N \times d}$ for $r \in [R]$ by stacking the node embedding vectors for the query and corpus graphs. Formally, we have
$$H_q(1), \ldots, H_q(R) = \mathrm{GNN}_\theta(G_q), \qquad H_c(1), \ldots, H_c(R) = \mathrm{GNN}_\theta(G_c) \tag{3}$$
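The particular message passing architecture is not fixed in this excerpt; as a point of reference, a minimal PyTorch sketch of a GNN that returns the per-layer embedding matrices of Eq. (3) (our own simplified variant, with hypothetical module names) might look like:

```python
import torch
import torch.nn as nn

class SimpleGNN(nn.Module):
    """Minimal message passing: h_u(r+1) = ReLU(W_self h_u(r) + W_nbr sum_{v in nbr(u)} h_v(r))."""
    def __init__(self, in_dim, hid_dim, R):
        super().__init__()
        dims = [in_dim] + [hid_dim] * R
        self.self_lin = nn.ModuleList(nn.Linear(dims[r], dims[r + 1]) for r in range(R))
        self.nbr_lin = nn.ModuleList(nn.Linear(dims[r], dims[r + 1]) for r in range(R))

    def forward(self, A, X):
        """A: (N, N) padded adjacency, X: (N, in_dim) node features.
        Returns [H(1), ..., H(R)], each of shape (N, hid_dim)."""
        H, layers = X, []
        for self_l, nbr_l in zip(self.self_lin, self.nbr_lin):
            H = torch.relu(self_l(H) + nbr_l(A @ H))   # aggregate neighbor messages
            layers.append(H)
        return layers
```

Applying the same network separately to the (padded) query and corpus graphs yields the matrices $H_q(r)$ and $H_c(r)$ used below.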
Step 2: Interaction between $H_q(r)$ and $H_c(r)$.
In principle, the embeddings $h_u(1), \ldots, h_u(R)$ of a node $u$ capture information about the neighborhood of $u$. Thus, we can view the sets of embedding matrices $\{H_q(r) \mid r \in [R]\}$ and $\{H_c(r) \mid r \in [R]\}$ as a reasonable representation of the query and corpus graphs, respectively. To compute a smooth surrogate of the adjacency matrix of the common subgraph, i.e., $\min(A_q, P A_c P^\top)$, we seek to find the corresponding alignment between $H_q(r)$ and $H_c(r)$ using soft-permutation (doubly stochastic) matrices $P(r)$ generated through a Gumbel-Sinkhorn network $\mathrm{GS}_\phi$. Here, we feed $H_q(r)$ and $H_c(r)$ into $\mathrm{GS}_\phi$ and obtain a doubly stochastic matrix $P(r)$:
$$P(r) = \mathrm{GS}_\phi\left(H_q(r), H_c(r)\right) \quad \forall \, r \in [R] \tag{4}$$
Finally, we replace the true relevance scoring function in Eq. (1) with the following smooth surrogate:
$$s(G_q, G_c) = \sum_{r \in [R]} w_r \sum_{i,j} \min\left(H_q(r), \, P(r) H_c(r)\right)_{i,j} \tag{5}$$
Here $\{w_r \geq 0 : r \in [R]\}$ are trainable parameters, balancing the quality of signals over all message rounds $r$. Note that the $R$ message rounds execute on the query and corpus graphs separately. The interaction between corresponding rounds happens at the very end.
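A hedged sketch of this interaction step follows. It uses plain Sinkhorn iterations as a stand-in for the full Gumbel-Sinkhorn network $\mathrm{GS}_\phi$ (Gumbel noise and the temperature schedule are omitted), and the dot-product similarity fed into the normalization is an assumption on our part; only the structure of Eqs. (4)-(5) is taken from the text.

```python
import torch

def sinkhorn(log_alpha, n_iters=20):
    """Alternating row/column normalization in log space; returns an
    approximately doubly stochastic matrix (soft permutation)."""
    for _ in range(n_iters):
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=1, keepdim=True)  # rows sum to 1
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=0, keepdim=True)  # columns sum to 1
    return log_alpha.exp()

def lmces_score(Hq_layers, Hc_layers, w, temperature=0.1):
    """Eq. (5): s(Gq, Gc) = sum_r w_r * sum_ij min(Hq(r), P(r) Hc(r))_ij,
    with P(r) of Eq. (4) approximated by Sinkhorn normalization of a similarity matrix."""
    score = 0.0
    for r, (Hq, Hc) in enumerate(zip(Hq_layers, Hc_layers)):
        P = sinkhorn(Hq @ Hc.T / temperature)                   # soft alignment P(r)
        score = score + w[r] * torch.minimum(Hq, P @ Hc).sum()  # elementwise min, then sum
    return score
```

Because the corpus embeddings can be computed once and stored, only the alignment and the coverage sum need to be evaluated per query-corpus pair, which is the decoupling noted above.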
2.3 Design of LMCCS
In the case of MCES, our key modification was to replace the adjacency matrices with node embeddings and design a differentiable network to generate a soft-permutation matrix. In the case of MCCS, we also have to replace the non-differentiable step of finding the size of the largest connected component of the common subgraph with a neural surrogate, for which we design a novel gossip protocol.
Differentiable gossip protocol to compute the largest connected component.
Given any graph $G = (V, E)$ with adjacency matrix $B$, we can find its largest connected component (LCC) by using a gossip protocol. At iteration $t = 0$, we start by assigning each node a message vector $x_u(0) \in \mathbb{R}^N$, which is the one-hot encoding of the node $u$, i.e., $x_u(0)[v] = 1$ for $v = u$ and $0$ otherwise. In iteration $t > 0$, we update the message vectors as $x_u(t+1) = \sum_{v \in \mathrm{nbr}(u) \cup \{u\}} x_v(t)$, where $\mathrm{nbr}(u)$ is the set of neighbors of $u$. Initially, we start with a sparse vector $x_u$ with exactly one non-zero entry. As we increase the number of iterations, $u$ gradually collects messages from the distant nodes which are reachable from $u$, which increases the number of non-zero entries of $x_u$. For a sufficiently large number of iterations $T$ (at least the diameter of $G$), we attain $x_u(T)[v] \neq 0$ whenever $u$ and $v$ lie in the same connected component, and $x_u(T)[v] = 0$ otherwise. Once this stage is reached, one can easily compute the number of nodes in the largest connected component of $G$ as $\max_u \|x_u(T)\|_0$, i.e., the maximum number of non-zero entries in a message vector. Algorithm 1 shows the gossip protocol, with the adjacency matrix $B$ as input and the size of the largest connected component as output.

Algorithm 1 GOSSIP($B$)
1: $X(0) = I$  # identity
2: for $t = 0, \ldots, T-1$ do
3:   $X(t+1) \leftarrow X(t)(B + I)$
4: Return $\max_{u \in V} \|X(T)[:, u]\|_0$
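A direct, non-neural transcription of Algorithm 1 (function name ours) that can be checked against the DFS-based component count from the earlier sketch:

```python
import numpy as np

def gossip_lcc_size(B, T=None):
    """Algorithm 1 (GOSSIP): size of the largest connected component of the
    undirected graph with adjacency matrix B, via repeated message accumulation."""
    N = B.shape[0]
    T = N if T is None else T                 # N iterations always exceed the diameter
    X = np.eye(N)
    for _ in range(T):
        X = X @ (B + np.eye(N))               # each node sums messages from itself and its neighbors
    # entries grow quickly with T; this is meant only for small sanity checks
    return int((X != 0).sum(axis=0).max())    # max number of non-zeros in any column X(T)[:, u]
```

Applied to `np.minimum(Aq, P @ Ac @ P.T)` for a candidate permutation `P`, it evaluates the quantity maximized in Eq. (2); the neural surrogate developed next introduces differentiable edge scores in place of the hard 0/1 adjacency.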
Exact MCCS computation using the gossip protocol.
Given the query and corpus graphs $G_q$ and $G_c$ with their adjacency matrices $A_q$ and $A_c$, we can rewrite Eq. (2) in the equivalent form
$$\max_{P \in \mathcal{P}} \; \mathrm{GOSSIP}\left(\min\left(A_q, \, P A_c P^\top\right)\right). \tag{6}$$
Recall that $\mathcal{P}$ is the set of permutation matrices of size $N \times N$.
Neural surrogate.
One can use a GS network to obtain a permutation matrix $P$, which in principle can support backpropagation through $\min(A_q, P A_c P^\top)$. However, as mentioned in Section 2.1 (items 1 and 2), $A$ is 0/1, and $P$ often saturates, losing gradient signal for training. In response, we will design a neural surrogate for the true MCCS size given in Eq. (2), using three steps.
Step 1: Computation of edge embeddings.
To tackle the above challenge, we introduce a parallel back-propagation path, in order to allow the GNNs to learn which edges are more important than others. To this end, we first use the GNNs to compute edge embedding vectors $m_e(r) \in \mathbb{R}^{d_E}$ for edges $e \in E$, in addition to node embeddings, at each propagation layer $r \in [R]$. Then, we gather the edge embeddings from the final layer $R$ into the matrices $M_q(R), M_c(R) \in \mathbb{R}^{|E| \times d_E}$ for the query and corpus graphs. Next, we feed each separately into a neural network $L_\alpha$ with trainable parameters $\alpha$, which predicts the importance of edges based on each edge embedding. Thus, we obtain two matrices $S_q \in \mathbb{R}^{N \times N}$ and $S_c \in \mathbb{R}^{N \times N}$, which are composed of the edge scores between the corresponding node pairs. Formally, we have:
$$H_q(R), M_q(R) = \mathrm{GNN}_\theta(G_q), \qquad H_c(R), M_c(R) = \mathrm{GNN}_\theta(G_c) \tag{7}$$
$$S_q = L_\alpha\left(M_q(R)\right), \qquad S_c = L_\alpha\left(M_c(R)\right) \tag{8}$$
In our implementation, $L_\alpha$ consists of one linear layer, a ReLU layer and another linear layer.
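A minimal sketch of $L_\alpha$ as described (linear, ReLU, linear), together with a scatter of the per-edge scores into the $N \times N$ matrix $S$ of Eq. (8); the edge-index representation and the module name are our assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class EdgeScorer(nn.Module):
    """L_alpha: maps each d_E-dimensional edge embedding to a scalar importance score."""
    def __init__(self, d_e, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_e, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, M, edge_index, N):
        """M: (|E|, d_E) final-layer edge embeddings; edge_index: (2, |E|) endpoint
        indices of each edge. Returns S: (N, N) matrix of edge scores."""
        scores = self.net(M).squeeze(-1)            # one score per edge
        S = torch.zeros(N, N, device=M.device)
        S[edge_index[0], edge_index[1]] = scores
        return torch.maximum(S, S.T)                # symmetrize for undirected graphs
```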
Step 2: Continuous approximation of MCCS.
In MCES, we approximated the MCES score in Eq. (1) directly, using the neural coverage function defined in Eq. (5) in terms of the node embeddings. Note that, here, we did not attempt to develop any continuous approximation of $\min(A_q, P A_c P^\top)$, the adjacency matrix of the candidate common subgraph.