
prising when each node only updates its representation ac-
cording to its neighbours. Yet, beyond these observed edges,
there is a dark world that could provide diverse and useful
information to the representation updates and help to over-
come the over-smoothing problem at the same time, as illus-
trated in Fig. 1. The reasoning is this: two nodes with no
edge between them should have different representations,
so the absence of an edge can be understood as a negative
link. However, in contrast to the commonly used positive
links, these negative links are rarely exploited in existing
GCN variants.
Selecting appropriate negative samples in a graph is not
trivial. To our knowledge, only three studies have explored
procedures to this end. The first is Kim and Oh (Kim and
Oh 2021), who propose the natural and straightforward
approach of selecting some negative samples uniformly at
random from non-neighbouring nodes. However, a large
graph normally has the small-world property (Watts and
Strogatz 1998), which means the graph will tend to contain
some clusters. Moreover, nodes in the same cluster will tend
to have similar representations, while nodes in different
clusters will tend to have different representations, as illus-
trated in Fig. 1. Hence, under uniform random selection, the
large clusters will overwhelm the smaller ones, and the final
converged node representations will be short on information
contained in the small clusters. Of the other two methods,
one is based on Monte Carlo chains (Yang et al. 2020) and
the other on personalised PageRank (Ying et al. 2018), and
neither captures diverse information well. Further, the se-
lected negative samples should not be redundant: they
should not overlap, and each should hold distinct informa-
tion. Hence, by definition, a good negative sample should
contribute negative information to the given node, in contrast
to its positive samples, and include as much information as
possible to reflect the variety of the dark world.
In this paper, we propose a graph convolutional neu-
ral network boosted by diverse negative samples selected
through a determinantal point process (DPP). A DPP is a
special point process that defines a probability distribution
over the subsets of a ground set such that more diverse
subsets have higher probability (Hough
et al. 2009). Our idea is to find diverse negative samples
for a given node by firstly defining a DPP-based distribu-
tion on diverse subsets of all non-neighbouring nodes of this
node, and then outputting a diverse subset from this distri-
bution. The number of subsets is limited because the sam-
ples are mutually exclusive. However, when applying this
algorithm to large graphs, the computational cost can be as
high as O(N³), where N is the number of non-neighbouring
nodes. Therefore, we propose a method based on a depth-
first search (DFS) that collects diverse negative samples se-
quentially along the search path for a node. The motivation
is that a DFS path is able to go through all different clus-
ters in the graph and, therefore, the local samples collected
will form a good, diverse approximation of all the infor-
mation in the dark world. Further, the computational cost
is approximately O(pa · deg³), where pa ≪ N is the path
length (normally smaller than the diameter of the graph) and
deg ≪ N is the average degree of the graph. As an example,
consider a Cora graph (Sen et al. 2008) with 3,327 nodes,
pa = 19, and deg = 3.9. Thus, O(N³) = 3.6e10 ≫ 1,127 =
O(pa · deg³).

Symbol       Meaning
G            a graph
G_n          the node set of graph G
𝒴            a ground set
Y            a node subset
k            the number of negative samples of a given node in a graph
N(i)         the neighbours (positive samples) of node i
N̄(i)         the negative samples of node i
x_i^{(l)}    the representation of node i at layer l
deg(i)       the degree of node i
L            the L-ensemble of a DPP
λ            the eigenvalues of L
e_k^{|𝒴|}    the kth elementary symmetric polynomial on the eigenvalues λ_1, λ_2, ..., λ_{|𝒴|} of L
v            the eigenvectors of the L-ensemble
V            the set of eigenvectors v

Table 1: Key notations
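To make the DFS-based collection concrete, here is a minimal Python sketch. It is an illustration under assumptions (a toy adjacency list, samples taken in path order, no DPP re-weighting along the way), not the exact D2GCN procedure:

```python
def dfs_negative_samples(adj, start, k):
    """Collect up to k negative samples for `start` along one DFS path.

    A DFS path tends to pass through different clusters, so the
    non-neighbouring nodes met along it approximate the diversity of
    the whole dark world at far below O(N^3) cost.
    """
    neighbours = set(adj[start])
    visited, stack, samples = {start}, [start], []
    while stack and len(samples) < k:
        node = stack.pop()
        if node != start and node not in neighbours:
            samples.append(node)  # a non-neighbour found on the path
        for nxt in reversed(adj[node]):  # continue the depth-first walk
            if nxt not in visited:
                visited.add(nxt)
                stack.append(nxt)
    return samples

# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the bridge edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
negatives = dfs_negative_samples(adj, 0, 2)
```

In this toy graph, the collected negatives come from the cluster that node 0's own neighbourhood never sees, which is exactly the information that uniform random sampling tends to under-represent.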
In evaluating these ideas, we conducted empirical experi-
ments on different tasks, including node classification and
over-smoothing tests. The results show that our approach
produces superior results.
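For intuition on why a DPP favours diverse negative samples, the following minimal sketch (hypothetical 2-D embeddings and an L-ensemble built as a plain Gram matrix; not the kernel used in D2GCN) checks that the unnormalised DPP probability det(L_Y) is higher for a diverse pair of nodes than for a redundant one:

```python
import numpy as np

# Hypothetical node embeddings: nodes 0 and 1 are near-duplicates
# (same cluster); node 2 points in a different direction.
emb = np.array([
    [1.00, 0.00],
    [0.99, 0.14],  # similar to node 0
    [0.00, 1.00],  # diverse from node 0
])

# L-ensemble from pairwise similarities (Gram matrix of the embeddings).
L = emb @ emb.T

def dpp_score(subset):
    """Unnormalised DPP probability det(L_Y) of the node subset Y."""
    return np.linalg.det(L[np.ix_(subset, subset)])

similar_pair = dpp_score([0, 1])  # two redundant nodes: low score
diverse_pair = dpp_score([0, 2])  # two diverse nodes: high score
```

The determinant equals the squared volume spanned by the chosen embedding vectors, so near-parallel (redundant) vectors drive it toward zero; this is the pressure toward diversity that the selection procedure exploits.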
Thus, the contributions of this study include:
• A new negative sampling method based on a DPP that
selects diverse negative samples to better represent the
dark information;
• A DFS-based negative sampling method that sequentially
collects locally diverse negative samples along the DFS
path to greatly improve computational efficiency;
• The first approach, to our knowledge, to fuse negative
samples into the graph convolution, yielding a new GCN
boosted by diverse negative samples (D2GCN) that im-
proves the quality of node representations and alleviates
the over-smoothing problem.
Preliminaries
This section briefly introduces the basic concepts of graph
convolutional neural networks, message passing, and deter-
minantal point processes.
Graph Convolutional Neural Networks (GCNs)
GCNs introduce traditional convolution from classical neu-
ral networks to graph neural networks. Following a message-
passing mechanism, the representation of a node is updated
using its neighbours' representations through
x_i^{(l)} = \sum_{j \in N(i) \cup \{i\}} \frac{1}{\sqrt{\deg(i)} \cdot \sqrt{\deg(j)}} \Theta^{(l)} \cdot x_j^{(l-1)}    (1)

where x_i^{(l)} is the representation of node i at layer l, N(i)
is the set of neighbours of node i (i.e., its positive samples), deg(i) is