graphs from unlabeled text using an existing large-scale commonsense knowledge graph (ConceptNet, Speer et al. [34]) and a Transformer-based generative knowledge source fine-tuned for the task of predicting relations within a domain (COMET, Bosselut et al. [2]). Second, we present a methodology for determining when it is beneficial to inject external knowledge into a Transformer model for aspect extraction through the application of syntactic information. Third, we explore two alternative approaches for injecting knowledge into language models: via insertion of a pivot token for query enrichment and through a disentangled attention mechanism. Experimental results demonstrate how this methodology achieves state-of-the-art performance on cross-domain aspect extraction using benchmark datasets from three different domains of consumer reviews: restaurants, laptops and digital devices [29, 30, 39]. Finally, we contribute an improved version of the benchmark digital devices dataset to facilitate future work on aspect-based sentiment analysis.
2 RELATED WORK
2.1 Knowledge Graphs
A variety of knowledge graphs have been created in recent years to store large quantities of factual and commonsense knowledge about the world. ConceptNet is a widely-used and freely-available source of commonsense knowledge that was constructed from both expert sources and crowdsourcing. A variety of solutions that leverage ConceptNet have been developed for NLP tasks in recent years, including multi-hop generative QA [1], story completion [5], and machine reading comprehension [41].
The main challenge in using ConceptNet is the selection and quality assessment of paths queried from the graph to produce relevant subgraphs for downstream use. A variety of heuristic approaches have been proposed for this task, including setting a maximum path length [12], limiting the length of the path based on the number of returned nodes [3], and utilizing measures of similarity calculated over embeddings [10]. Auxiliary models that assess the naturalness of paths have also been proposed for predicting path quality [42].
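As an illustration, the length and embedding-similarity heuristics can be combined as in the following sketch; the path representation, the embed callable, and the thresholds are our own assumptions rather than implementations of the cited methods.

```python
import numpy as np

def filter_paths(paths, query_vec, embed, max_len=3, min_sim=0.4):
    """Keep short paths whose terminal node is embedding-similar to the query.

    `paths` holds lists of node labels; `embed` maps a label to a numpy
    vector (e.g., an averaged word embedding). Thresholds are illustrative.
    """
    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

    kept = []
    for path in paths:
        if len(path) - 1 > max_len:  # heuristic: cap the number of hops
            continue
        if cosine(embed(path[-1]), query_vec) < min_sim:  # heuristic: similarity to query
            continue
        kept.append(path)
    return kept
```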
2.2 Domain Adaptation
Developing models that can generalize well to unseen and out-of-domain examples is a fundamental challenge in robust solution design. A key objective of many previous domain adaptation approaches has been to learn domain-invariant latent features that can be used by a model for its final predictions. Prior to the widespread usage of Deep Neural Networks (DNNs) for domain adaptation tasks, various methods were proposed that attempted to learn the latent features by constructing a low-dimensional space where the distance between features from the source and target domain is minimized [23, 24].
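A common instantiation of such a distance is the maximum mean discrepancy between domain feature means; the following minimal sketch shows its linear form, with array names chosen for illustration.

```python
import numpy as np

def linear_mmd2(source_feats, target_feats):
    """Squared distance between the mean feature vectors of the two domains.

    source_feats, target_feats: arrays of shape (n_samples, n_features).
    Methods in this family seek a projection that makes this quantity small.
    """
    delta = source_feats.mean(axis=0) - target_feats.mean(axis=0)
    return float(delta @ delta)
```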
With the recent introduction of DNNs for domain adaptation tasks, there has been a shift towards monolithic approaches in which the domain-invariant feature transformation is learned simultaneously with the task-specific classifier as part of the training process. These methods incorporate mechanisms such as a Gradient Reversal Layer [9] and explicit partitioning of a DNN [4] to implicitly learn both domain-invariant and domain-specific features in an end-to-end manner.
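For concreteness, the sketch below gives a minimal PyTorch implementation of a Gradient Reversal Layer in the style of [9]; the function and variable names are illustrative rather than taken from the cited systems.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates (and scales) gradients in backward."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reversing the gradient pushes the feature extractor toward
        # representations the domain classifier cannot separate.
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

# Usage: domain_logits = domain_head(grad_reverse(features, lambd=0.1))
```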
Such approaches have been applied to various problems in NLP, including cross-domain sentiment analysis. Du et al. [8] and Gong et al. [11] introduce additional training tasks for BERT [6] in an effort to learn both domain-invariant and domain-specific feature representations for sentiment analysis tasks. The utilization of syntactic information has also been shown to be an effective way of introducing domain-invariant knowledge, which can help bridge the gap between domains [7, 16, 26, 37].
2.3 Knowledge Informed Architectures
An alternative paradigm for developing robust solutions is to augment models using external knowledge queried from a large non-parametric memory store, commonly known as a Knowledge Base (KB) or Knowledge Graph (KG). We refer to this class of models as knowledge informed architectures. Much of the existing work on knowledge informed architectures augments BERT [6] with external knowledge from sources such as WordNet [21] and ConceptNet. These approaches have led to a myriad of new BERT-like models such as KnowBERT [27], K-BERT [18], and E-BERT [28], which attempt to curate and inject knowledge from KBs in various ways. How knowledge is acquired and used in these models is highly task dependent.
Knowledge informed architectures have been shown to be effective at a variety of tasks, achieving superior performance in recent challenges such as Efficient Question-Answering [22] and Open-domain Question-Answering [17, 33], where external knowledge is used to enrich input queries with additional context that supplements the implicit knowledge stored in the model's parameters. To the best of our knowledge, no previous knowledge informed architectures have been developed for cross-domain aspect extraction.
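Schematically, such query enrichment amounts to concatenating retrieved facts onto the input; the separator and fact format below are assumptions for illustration, not the cited systems' interfaces.

```python
def enrich_query(question, retrieved_facts, sep=" [SEP] "):
    """Append retrieved KG facts so the encoder sees them as extra context."""
    return question + sep + " ".join(retrieved_facts)

enriched = enrich_query(
    "what aspects do reviewers mention about this laptop?",
    ["laptop HasA battery", "laptop HasA screen"],
)
# -> 'what aspects do reviewers mention about this laptop? [SEP] laptop HasA battery laptop HasA screen'
```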
3 METHODOLOGY
Our approach consists of a three-step process: (1) preparing a domain-specific KG for each of the target domains, (2) determining when the model can benefit from external information, and (3) injecting knowledge retrieved from the KG when applicable. We explore two alternative methods for the final knowledge injection step of this process: insertion of a pivot token into the original query, and knowledge injection into hidden state representations via a disentangled attention mechanism. We provide an illustration of our approach in Figure 1 and detail the methodology for each step of the process in the subsequent sections.
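As a preview of the first injection method, the sketch below shows one way a knowledge-derived pivot token could be spliced into a tokenized query; the tokenizer choice, pivot placement, and knowledge term are illustrative assumptions, with the actual procedure detailed in the subsequent sections.

```python
# Hypothetical illustration of pivot-token insertion; not the exact procedure.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def insert_pivot(tokens, position, pivot):
    """Splice a knowledge-derived pivot token in after the given position."""
    return tokens[: position + 1] + [pivot] + tokens[position + 1 :]

tokens = tokenizer.tokenize("the screen resolution is stunning")
print(insert_pivot(tokens, position=1, pivot="display"))
# ['the', 'screen', 'display', 'resolution', 'is', 'stunning']
```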
3.1 Domain-Specific KG Preparation
In order to ground the model in concepts related to the target domain, we create a domain-specific KG by first querying a subgraph from ConceptNet using a list of seed terms that are related to the domain. For each target domain $d$, the seed term list $S_d = \{s_1, s_2, \ldots, s_k\}$ is generated by applying TF-IDF to all of the unlabeled text in domain $d$ and identifying the top-$k$ scoring noun phrases. We use $k = 7$ seed terms in this work but note that the number of seed terms can be adjusted based on the desired size of the KG.
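A minimal sketch of this seed-term selection step, assuming spaCy for noun-phrase extraction and scikit-learn for TF-IDF scoring (the specific tools are not prescribed above, so treat this as one possible instantiation):

```python
import spacy
from sklearn.feature_extraction.text import TfidfVectorizer

nlp = spacy.load("en_core_web_sm")

def top_k_seed_terms(domain_texts, k=7):
    """Return the k top TF-IDF-scoring noun phrases from unlabeled domain text."""
    # Candidate noun phrases, lowercased and de-duplicated.
    candidates = set()
    for doc in nlp.pipe(domain_texts):
        candidates.update(chunk.text.lower() for chunk in doc.noun_chunks)

    # Score only the candidate phrases over the domain corpus.
    vectorizer = TfidfVectorizer(vocabulary=sorted(candidates), ngram_range=(1, 3))
    tfidf = vectorizer.fit_transform(domain_texts)
    scores = tfidf.max(axis=0).toarray().ravel()  # best score of each phrase in any document
    terms = vectorizer.get_feature_names_out()
    ranked = sorted(zip(terms, scores), key=lambda pair: pair[1], reverse=True)
    return [term for term, _ in ranked[:k]]
```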
For each seed term $s \in S_d$, we query ConceptNet for all English-language nodes connected by an edge to $s$ and add them to the domain-specific subgraph along with the seed term $s$. The subgraph is further expanded by iteratively querying additional nodes that