
in Figure 1 with two connected components: because all nodes have the same degree, they have exactly the same rooted subtrees. Nevertheless, the nodes in the left component are clearly different from the nodes in the right component, because the left component is a 3-cycle and the right one is a 4-cycle. Such cases cannot be discriminated by GNNs or the 1-WL test. We refer to this type of GNN as a plain GNN or basic GNN.
Figure 1: An example where plain GNN fails to distinguish nodes in the graph.
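To make this failure concrete, the following minimal sketch (plain Python written for this exposition, not code from the paper) runs 1-WL color refinement on the Figure 1 graph, a 3-cycle plus a disjoint 4-cycle; since every node has degree 2, refinement stabilizes immediately and all seven nodes receive the same color.

```python
# 1-WL color refinement on a 3-cycle plus a disjoint 4-cycle.
# Every node has degree 2, so all nodes end up with the same color.

def wl_colors(adj, rounds=3):
    colors = {v: 0 for v in adj}  # uniform initial colors
    for _ in range(rounds):
        # New signature = own color plus the multiset of neighbor colors.
        sigs = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                for v in adj}
        relabel = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        colors = {v: relabel[sigs[v]] for v in adj}
    return colors

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
square = {3: [4, 6], 4: [3, 5], 5: [4, 6], 6: [3, 5]}
print(wl_colors({**triangle, **square}))  # every node maps to color 0
```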
On the other hand, even when a carefully designed GNN can differentiate all node substructures in a graph, Zhang et al. [54] show that the learned node representations still cannot effectively perform structural learning tasks involving multiple nodes, such as link prediction and subgraph prediction, because the nodes are unable to learn relative structural information. For example, in a 6-cycle every node receives the same representation, so a link predictor built on two node embeddings assigns the same score to node pairs at distance 2 and at distance 3. We will discuss this in more detail in Section 3.1.
These limitations motivate recent research that pushes the limits of GNN expressiveness. One branch uses higher-order GNNs that mimic higher-dimensional Weisfeiler-Lehman tests [23, 26]; they extend node-wise message passing to node-tuple-wise message passing and obtain more expressive representations. Another branch uses labeling methods: by extracting subgraphs around the nodes of interest, they design a set of subgraph-customized labels as additional node features. The labels can be used to distinguish between the nodes of interest and other nodes, and the resulting node representations can encode substructures that plain GNNs cannot, such as cycles.
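As a deliberately simplified instance of such a label, the sketch below tags every node with its BFS distance to a target node; the tag would then be appended as an extra node feature. The `distance_labels` helper and the toy graph are invented for illustration, and published methods such as DE and SEAL use richer labeling schemes.

```python
# Distance-based labeling sketch: each node gets its BFS hop distance
# to the target node as an additional feature.
from collections import deque

def distance_labels(adj, target):
    dist = {target: 0}
    queue = deque([target])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist  # unreachable nodes simply receive no label

adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
print(distance_labels(adj, 0))  # {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}
```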
However, the major drawback of these methods is that their efficiency is significantly compromised compared to plain GNNs. Higher-order GNNs update embeddings over tuples and incur at least cubic complexity [52, 23]. Labeling tricks usually require extracting subgraphs from a large graph and running GNNs on mini-batches of subgraphs; because the subgraphs overlap with each other, their aggregated size can be hundreds of times that of the original graph. Given the massive scale at which GNNs are applied nowadays, this increased graph computation cost makes labeling methods inapplicable to real-world problems with millions of nodes.
We observe that a great portion of labeling methods rely on geodesics, namely the shortest paths between nodes. For node representation, Distance-Encoding (DE) [25] labels the target nodes' surrounding nodes with their shortest-path distance to the target nodes and achieves expressiveness beyond the 1-WL test. Current state-of-the-art methods for link prediction, such as SEAL [50], leverage labeling tricks to capture the shortest-path distance along with the number of shortest paths between the two nodes of the link. Inspired by this, we propose our graph representation learning framework, Geodesic GNN (GDGNN), where we run the GNN only once and inject higher expressiveness into the model through shortest-path information via a geodesic pooling layer. Our framework obtains higher-than-1-WL power without multiple runs of plain GNNs, which largely improves the efficiency of more expressive methods.
Specifically, our GDGNN model learns a function that maps the learning target to a geodesic-augmented target representation (the target can be a node, an edge, a graph, etc.). The model consists of two units, a GNN unit and a geodesic pooling unit, which are connected for end-to-end training. We first apply the GNN to the graph to obtain embeddings of all nodes. Then, for each target, we find the task-specific geodesics and extract the corresponding GNN representations of the nodes on those geodesics. The task-specific geodesics come at three levels. For node-level tasks, we extract the geodesics between the target node and all of its neighbors. For edge-level tasks, we extract the geodesics between the two target nodes of the edge. For graph-level tasks, we obtain the geodesic-augmented node representation of each node as in node-level tasks and read out all node representations as the graph representation. Finally, the geodesic pooling unit combines the geodesic representation and the target representation to form the final geodesic-augmented target representation. Figure 2 summarizes the GDGNN framework.
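For intuition, here is a minimal edge-level sketch of this pipeline in PyTorch; the `gnn` callable, the dict-of-lists adjacency, the mean pooling, and the final concatenation are simplifying assumptions made for this illustration, not the exact GDGNN architecture.

```python
# Edge-level GDGNN pipeline sketch: run the GNN once, extract a geodesic,
# pool the embeddings along it, and augment the target representation.
from collections import deque
import torch

def shortest_path(adj, u, v):
    """Return one geodesic from u to v as a node list, via BFS parents."""
    parent = {u: None}
    queue = deque([u])
    while queue and v not in parent:
        w = queue.popleft()
        for n in adj[w]:
            if n not in parent:
                parent[n] = w
                queue.append(n)
    path = [v]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]

def gdgnn_link_repr(gnn, x, edge_index, adj, u, v):
    h = gnn(x, edge_index)               # 1) run the GNN only once
    path = shortest_path(adj, u, v)      # 2) find a geodesic for (u, v)
    geo = h[torch.tensor(path)].mean(0)  # 3) pool embeddings along it
    return torch.cat([h[u], h[v], geo])  # geodesic-augmented edge repr.
```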
All task-specific geodesics are formed by one or multiple pair-wise geodesics between two nodes. We propose two methods to extract the pair-wise geodesic information: horizontal and vertical. The horizontal geodesic is obtained by directly extracting one shortest path between the two nodes; the vertical geodesic is obtained by extracting all direct neighbors of the two nodes that lie on any of their shortest paths. The two focus on depth and breadth, respectively: the horizontal geodesic is good at capturing long-distance information not covered by the GNN, while the vertical geodesic is provably more powerful than a plain GNN. The sketch below contrasts the two extraction modes.
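A minimal sketch of the two extractions under the definitions above; the helpers are written for this illustration and assume an unweighted graph in which u and v are connected.

```python
# Horizontal geodesic: one shortest u-v path (depth).
# Vertical geodesic: all neighbors of u and v on any shortest u-v path (breadth).
from collections import deque

def bfs_dist(adj, root):
    dist = {root: 0}
    queue = deque([root])
    while queue:
        w = queue.popleft()
        for n in adj[w]:
            if n not in dist:
                dist[n] = dist[w] + 1
                queue.append(n)
    return dist

def horizontal_geodesic(adj, u, v):
    """One shortest u-v path, walked greedily along decreasing distance to v."""
    dv = bfs_dist(adj, v)
    path, w = [u], u
    while w != v:
        w = min(adj[w], key=lambda n: dv.get(n, float("inf")))
        path.append(w)
    return path

def vertical_geodesic(adj, u, v):
    """Direct neighbors of u and v that lie on any shortest u-v path."""
    du, dv = bfs_dist(adj, u), bfs_dist(adj, v)
    d = du[v]

    def on_sp(n):  # n is on a shortest u-v path iff distances sum to d
        return du.get(n, float("inf")) + dv.get(n, float("inf")) == d

    return sorted({n for n in adj[u] if on_sp(n)} |
                  {n for n in adj[v] if on_sp(n)})
```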
Moreover, by incorporating the information of the subgraph induced by