
Finally, we replace the true relevance scoring function in Eq. (1) with the following smooth surrogate:
$$s(G_q, G_c) = \sum_{r \in [R]} w_r \sum_{i,j} \left[\min\left(H_q(r),\, P(r)\, H_c(r)\right)\right]_{i,j} \qquad (5)$$
Here $\{w_r \ge 0 : r \in [R]\}$ are trainable parameters, balancing the quality of signals over all message rounds $r$. Note that the $R$ message rounds execute on the query and corpus graphs separately; the interaction between corresponding rounds happens only at the very end.
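To make Eq. (5) concrete, here is a minimal PyTorch sketch of the surrogate score, assuming the per-round node embeddings $H_q(r), H_c(r)$ arrive as $N \times d$ tensors and the soft permutations $P(r)$ as $N \times N$ tensors; all names are illustrative rather than the authors' code:

import torch

def surrogate_score(Hq_list, Hc_list, P_list, w):
    """Eq. (5): sum_r w_r * sum_{i,j} [min(Hq(r), P(r) Hc(r))]_{i,j}.

    Hq_list, Hc_list : length-R lists of (N, d) node-embedding tensors
    P_list           : length-R list of (N, N) soft permutation matrices
    w                : (R,) tensor of trainable non-negative weights
    """
    score = 0.0
    for Hq, Hc, P, wr in zip(Hq_list, Hc_list, P_list, w):
        aligned = P @ Hc                            # align corpus nodes to query order
        score = score + wr * torch.minimum(Hq, aligned).sum()
    return score

Non-negativity of $w_r$ can be enforced, for instance, by parameterizing $w_r = \mathrm{softplus}(\tilde{w}_r)$.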
2.3 Design of LMCCS
In the case of MCES, our key modification was to replace the adjacency matrices with node embeddings and design a differentiable network to generate a soft permutation matrix. In the case of MCCS, we must also replace the non-differentiable step of finding the size of the largest connected component of the common subgraph with a neural surrogate, for which we design a novel gossip protocol.
Differentiable gossip protocol to compute the largest connected component.
Given any graph $G = (V, E)$ with adjacency matrix $B$, we can find its largest connected component (LCC) using a gossip protocol. At iteration $t = 0$, we assign each node $u$ a message vector $x_u(0) \in \mathbb{R}^N$, which is the one-hot encoding of the node $u$, i.e., $x_u(0)[v] = 1$ for $v = u$ and $0$ otherwise. In iteration $t > 0$, we update the message vectors as $x_u(t+1) = \sum_{v \in \mathrm{nbr}(u) \cup \{u\}} x_v(t)$, where $\mathrm{nbr}(u)$ is the set of neighbors of $u$. Initially, each $x_u$ is a sparse vector with exactly one non-zero entry. As the number of iterations increases, $u$ gradually collects messages from the distant nodes reachable from $u$, which increases the number of non-zero entries of $x_u$. For a sufficiently large number of iterations $T$ (the diameter of $G$), we attain $x_u(T)[v] \neq 0$ whenever $u$ and $v$ lie in the same connected component, and $x_u(T)[v] = 0$ otherwise. Once this stage is reached, one can easily compute the number of nodes in the largest connected component of $G$ as $\max_u \|x_u(T)\|_0$, i.e., the maximum number of non-zero entries in a message vector. Algorithm 1 shows the gossip protocol, with the adjacency matrix $B$ as input and the size of the largest connected component as output.

Algorithm 1 GOSSIP($B$)
1: $X(0) = I$   # identity matrix
2: for $t = 0, \ldots, T-1$ do
3:   $X(t+1) \leftarrow X(t)(B + I)$
4: Return $\max_{u \in V} \|X(T)[\bullet, u]\|_0$
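As a sanity check, Algorithm 1 translates directly into a few lines of NumPy. The sketch below is ours, not the authors' code; the function and variable names are illustrative:

import numpy as np

def gossip(B: np.ndarray, T: int) -> int:
    """Algorithm 1: size of the largest connected component of G.

    B : (N, N) 0/1 adjacency matrix.
    T : number of iterations; any T >= diameter(G) works, and
        T = N - 1 always suffices.
    """
    N = B.shape[0]
    X = np.eye(N)                  # X(0) = I: one-hot message per node
    for _ in range(T):             # X(t+1) = X(t) (B + I)
        X = X @ (B + np.eye(N))
    # Column u of X(T) is non-zero exactly at the nodes reachable from u,
    # so the LCC size is the maximum number of non-zeros over columns.
    return int((X != 0).sum(axis=0).max())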
Exact MCCS computation using the gossip protocol.
Given the query and corpus graphs $G_q$ and $G_c$ with their adjacency matrices $A_q$ and $A_c$, we can rewrite Eq. (2) in the equivalent form
$$\max_{P \in \mathcal{P}} \; \mathrm{GOSSIP}\left(\min\left(A_q, P A_c P^\top\right)\right). \qquad (6)$$
Recall that $\mathcal{P}$ is the set of permutation matrices of size $N \times N$.
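For intuition only, Eq. (6) can be evaluated by brute force on very small graphs by enumerating all $N!$ permutations and reusing the gossip routine above. This enumeration is exponential and is not part of the proposed method; it merely makes the objective concrete:

from itertools import permutations
import numpy as np

def exact_mccs(Aq: np.ndarray, Ac: np.ndarray) -> int:
    """Eq. (6): max over permutations P of GOSSIP(min(Aq, P Ac P^T))."""
    N = Aq.shape[0]
    best = 0
    for perm in permutations(range(N)):
        P = np.eye(N)[list(perm)]                  # hard permutation matrix
        common = np.minimum(Aq, P @ Ac @ P.T)      # candidate common subgraph
        best = max(best, gossip(common, T=N - 1))  # gossip() from the sketch above
    return best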
Neural surrogate.
One can use a GS network to obtain a permutation matrix $P$, which, in principle, can support backpropagation through $\min(A_q, P A_c P^\top)$. However, as mentioned in Section 2.1 (items 1 and 2), $A_\bullet$ is 0/1, and $P$ often saturates, losing the gradient signal needed for training. In response, we design a neural surrogate for the true MCCS size given in Eq. (2), using three steps.
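For context, a GS-style network typically produces the soft permutation by iterative Sinkhorn normalization of a matrix of node-alignment scores. The sketch below shows one common formulation; the temperature $\tau$, the iteration count, and all names are our assumptions rather than the paper's specification. It also illustrates the saturation problem: as $\tau \to 0$, the output approaches a hard 0/1 permutation and the gradient signal vanishes.

import torch

def sinkhorn(log_alpha: torch.Tensor, n_iters: int = 20, tau: float = 0.1) -> torch.Tensor:
    """Soft permutation via Sinkhorn normalization in log space.

    log_alpha : (N, N) unnormalized node-alignment scores.
    Returns a (nearly) doubly-stochastic matrix, approaching a hard
    permutation as tau -> 0.
    """
    log_P = log_alpha / tau
    for _ in range(n_iters):
        log_P = log_P - torch.logsumexp(log_P, dim=1, keepdim=True)  # normalize rows
        log_P = log_P - torch.logsumexp(log_P, dim=0, keepdim=True)  # normalize columns
    return log_P.exp()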
Step 1: Computation of edge embeddings.
To tackle the above challenge, we introduce a parallel backpropagation path, which allows the GNNs to learn which edges are more important than others. To this end, we first use the GNNs to compute edge embedding vectors $m_e(r) \in \mathbb{R}^{d_E}$ for edges $e \in E$, in addition to node embeddings, at each propagation layer $r \in [R]$. Then, we gather the edge embeddings from the final layer $R$ into the matrices $M_q(R), M_c(R) \in \mathbb{R}^{|E| \times d_E}$ for the query and corpus graphs, respectively. Next, we feed each separately into a neural network $L_\alpha$ with trainable parameters $\alpha$, which predicts the importance of each edge from its embedding. Thus, we obtain two matrices $S_q \in \mathbb{R}^{N \times N}$ and $S_c \in \mathbb{R}^{N \times N}$, which contain the edge scores at the corresponding node pairs. Formally, we have:
$$H_q(R), M_q(R) = \mathrm{GNN}_\theta(G_q), \qquad H_c(R), M_c(R) = \mathrm{GNN}_\theta(G_c) \qquad (7)$$
$$S_q = L_\alpha(M_q(R)), \qquad S_c = L_\alpha(M_c(R)) \qquad (8)$$
In our implementation, $L_\alpha$ consists of one linear layer, a ReLU layer, and another linear layer.
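A minimal PyTorch sketch of $L_\alpha$ and the scatter of edge scores into an $N \times N$ matrix follows; the hidden width, the edge-list format, and all names are our assumptions:

import torch
import torch.nn as nn

class EdgeScorer(nn.Module):
    """L_alpha: linear -> ReLU -> linear, mapping each edge embedding to a score."""

    def __init__(self, d_edge: int, hidden: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_edge, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, M: torch.Tensor, edges: torch.Tensor, N: int) -> torch.Tensor:
        """M: (|E|, d_edge) final-layer edge embeddings; edges: (|E|, 2) endpoints.
        Returns S: (N, N) holding each edge's score at its node pair."""
        scores = self.net(M).squeeze(-1)        # (|E|,)
        S = M.new_zeros(N, N)
        S[edges[:, 0], edges[:, 1]] = scores    # scatter into node-pair slots
        S[edges[:, 1], edges[:, 0]] = scores    # symmetrize for undirected edges
        return S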
Step 2: Continuous approximation of MCCS.
In MCES, we approximated the MCES score in Eq. (1) directly, using the neural coverage function defined in Eq. (5) in terms of the node embeddings. Note that, here, we did not attempt to develop any continuous approximation of $\min(A_q, P A_c P^\top)$, the adjacency matrix of the candidate common