examples. Nevertheless, in the real world, graphs often contain multiple types of objects and multiple
types of relationships between them, which are called heterogeneous graphs, or heterogeneous
information networks (HINs) [27]. Due to the challenges caused by the heterogeneity, existing SSL
methods on homogeneous graphs cannot be straightforwardly applied to HINs. Very recently, several
works have made some efforts to conduct SSL on HINs [33, 23, 18, 15, 13, 14, 37, 10]. In comparison
with SSL methods on homogeneous graphs, the key difference of these works lies in their example
generation strategies, which are designed to capture the heterogeneous structural properties of HINs.
The strategies of generating high-quality positive/negative examples are critical to the performance of
existing methods [34, 4, 41, 36]. Unfortunately, whether for homogeneous graphs or heterogeneous
graphs, the example generation strategies are dataset-specific, and may not be applicable to all
scenarios. This is because real-world graphs are abstractions of things from various domains, e.g.,
social networks, citation networks, etc. They usually have significantly different structural properties
and semantics. Previous works have systematically studied this and found that different strategies
are good at capturing different structural semantics. For example, study [41] observed that edge
perturbation benefits social networks but hurts some biochemical networks, and study [36] observed
that negative examples benefit sparser graphs. Consequently, in practice, the example generation
strategies have to be empirically constructed and investigated through either trial-and-error or rules
of thumb. This significantly limits the practicality and general applicability of existing methods.
In this work, we focus on HINs, which are more challenging, and propose a novel SSL approach
named SHGP. Unlike existing methods, SHGP requires neither positive examples nor negative
examples, thus circumventing the above issues. Specifically, SHGP adopts any HGNN model that is
based on an attention-aggregation scheme as the base encoder, termed the Att-HGNN module.
The attention coefficients in Att-HGNN are also combined with the structural clustering
method LPA (label propagation algorithm) [21], forming the Att-LPA module. Through performing
structural clustering on HINs, Att-LPA is able to produce clustering labels, which are treated as
pseudo-labels. In turn, these pseudo-labels serve as guidance signals to help Att-HGNN learn better
embeddings as well as better attention coefficients. Thus, the two modules are able to exploit and
enhance each other, finally leading the model to learn discriminative and informative embeddings. In
summary, we have three main contributions as follows:
• We propose a novel SSL method on HINs, SHGP. It innovatively consists of the Att-LPA module
and the Att-HGNN module. The two modules can effectively enhance each other, facilitating the
model to learn effective embeddings.
• To the best of our knowledge, SHGP is the first attempt to perform SSL on HINs without any
positive or negative examples. Therefore, it can directly avoid the laborious investigation of
example generation strategies, improving the model’s generalization ability and flexibility.
• We transfer the object embeddings learned by SHGP to various downstream tasks. The experimental
results show that SHGP can outperform state-of-the-art baselines, even including some
semi-supervised baselines, demonstrating its superior effectiveness.
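The idea of propagating labels along attention-weighted edges to obtain pseudo-labels can be illustrated with a toy sketch. This is not the SHGP implementation. It assumes the attention coefficients are already given as a fixed dictionary (in SHGP they would come from Att-HGNN and change during training), it ignores object and relation types, and it uses a simple asynchronous weighted-vote label propagation. The function name `attention_lpa`, the toy graph, and the tie-breaking rule are hypothetical choices made purely for illustration.

```python
from collections import defaultdict


def attention_lpa(neighbors, attention, labels, max_iters=20):
    """Toy attention-weighted label propagation (illustrative assumption).

    neighbors: dict mapping node -> list of neighbor nodes
    attention: dict mapping (node, neighbor) -> non-negative weight
    labels:    dict mapping node -> initial cluster label
    """
    labels = dict(labels)  # copy so the caller's dict is not modified
    for _ in range(max_iters):
        changed = False
        # Asynchronous updates in a fixed node order keep the run deterministic.
        for node in sorted(neighbors):
            votes = defaultdict(float)
            for nbr in neighbors[node]:
                votes[labels[nbr]] += attention[(node, nbr)]
            if not votes:
                continue  # isolated node keeps its label
            # Highest weighted vote wins; ties go to the smallest label id.
            best = min(votes, key=lambda c: (-votes[c], c))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break  # labels are stable
    return labels


# Two triangles {0, 1, 2} and {3, 4, 5} joined by a weak bridge 2-3.
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
             3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
attention = {
    (0, 1): 0.5, (0, 2): 0.5,
    (1, 0): 0.5, (1, 2): 0.5,
    (2, 0): 0.45, (2, 1): 0.45, (2, 3): 0.1,
    (3, 2): 0.1, (3, 4): 0.45, (3, 5): 0.45,
    (4, 3): 0.5, (4, 5): 0.5,
    (5, 3): 0.5, (5, 4): 0.5,
}
initial_labels = {node: node for node in neighbors}  # every node starts alone
pseudo_labels = attention_lpa(neighbors, attention, initial_labels)
print(pseudo_labels)
```

Because the bridge between nodes 2 and 3 carries little attention, propagation settles on one cluster per triangle. In SHGP, cluster labels of this kind would then act as pseudo-labels that supervise the encoder, whose updated attention coefficients feed the next round of propagation.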
2 Related work
SSL on HINs. There are several existing methods [33, 23, 18, 15, 37, 13, 14, 10] that conduct
SSL on HINs. Determined by their contrastive loss functions, all these methods require high-quality
positive and negative examples to effectively learn embeddings. Thus, their effectiveness and
performance hinge on the specific strategies of generating positive examples and negative examples,
which limits their flexibility and generalization ability.
SSL on homogeneous graphs. Existing SSL methods on homogeneous graphs [29, 7, 41, 25, 20, 19, 43,
36, 35, 30] also need to generate sufficient positive and negative examples to effectively
perform graph contrastive learning. They only handle homogeneous graphs and cannot be easily
applied to HINs. In this work, we seek to perform SSL on HINs without any positive examples or
negative examples.
GNN+LPA methods. There exist several methods [31, 1, 24] that combine LPA [21] with GNNs.
However, they are all supervised learning methods, and only deal with homogeneous graphs. In this
work, we study SSL on HINs.