Cross-Domain Aspect Extraction using Transformers
Augmented with Knowledge Graphs
Phillip Howard
phillip.r.howard@intel.com
Intel Labs
Chandler, Arizona, USA
Arden Ma
Intel Labs
Santa Clara, California, USA
Vasudev Lal
Intel Labs
Hillsboro, Oregon, USA
Ana Paula Simoes
Intel Labs
Santa Clara, California, USA
Daniel Korat
Intel Labs
Petah Tikva, Israel
Oren Pereg
Intel Labs
Petah Tikva, Israel
Moshe Wasserblat
Intel Labs
Petah Tikva, Israel
Gadi Singer
Intel Labs
Santa Clara, California, USA
ABSTRACT
The extraction of aspect terms is a critical step in fine-grained sentiment analysis of text. Existing approaches for this task have yielded impressive results when the training and testing data are from the same domain. However, these methods show a drastic decrease in performance when applied to cross-domain settings where the domain of the testing data differs from that of the training data. To address this lack of extensibility and robustness, we propose a novel approach for automatically constructing domain-specific knowledge graphs that contain information relevant to the identification of aspect terms. We introduce a methodology for injecting information from these knowledge graphs into Transformer models, including two alternative mechanisms for knowledge insertion: via query enrichment and via manipulation of attention patterns. We demonstrate state-of-the-art performance on benchmark datasets for cross-domain aspect term extraction using our approach and investigate how the amount of external knowledge available to the Transformer impacts model performance.
CCS CONCEPTS
Computing methodologies → Natural language processing; Supervised learning; Neural networks.
KEYWORDS
Knowledge graphs, transformers, aspect extraction, knowledge
injection, aspect-based sentiment analysis
ACM Reference Format:
Phillip Howard, Arden Ma, Vasudev Lal, Ana Paula Simoes, Daniel Korat,
Oren Pereg, Moshe Wasserblat, and Gadi Singer. 2022. Cross-Domain Aspect
Extraction using Transformers Augmented with Knowledge Graphs. In
Proceedings of the 31st ACM International Conference on Information and
Knowledge Management (CIKM ’22), October 17–21, 2022, Atlanta, GA, USA.
ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3511808.3557275
CIKM ’22, October 17–21, 2022, Atlanta, GA, USA.
©2022 Association for Computing Machinery.
This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Proceedings of the 31st ACM International Conference on Information and Knowledge Management (CIKM '22), October 17–21, 2022, Atlanta, GA, USA, https://doi.org/10.1145/3511808.3557275.
1 INTRODUCTION
Sentiment analysis is a fundamental task in NLP which has been widely studied in a variety of different settings. While the majority of existing research has focused on sentence- and document-level sentiment extraction, there is considerable interest in fine-grained sentiment analysis that seeks to understand sentiment at a word or phrase level. For example, in the sentence "The appetizer was delicious", it may be of interest to understand the author's sentiment regarding a specific aspect (appetizer) in the form of an expressed opinion (delicious). This task is commonly referred to as Aspect-Based Sentiment Analysis (ABSA).
ABSA is often formulated as a sequence tagging problem, where the input to a model is a sequence of tokens $X = \{x_1, x_2, ..., x_n\}$. For each token $x_i \in X$, the objective is to correctly predict a label $y_i \in \{BA, IA, BO, IO, N\}$. The labels $BA$ and $IA$ denote the beginning and inside tokens of aspect phrases while $BO$ and $IO$ indicate the beginning and inside tokens of opinions. The class $N$ denotes tokens that are neither aspects nor opinions. The focus of our work is improving the identification of aspects within the context of the ABSA sequence tagging problem.
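For concreteness, here is a minimal sketch (our own illustration, not code from the paper) of how the running example sentence would be labeled under this tagging scheme:

```python
# A minimal illustration of the ABSA sequence tagging scheme described above.
# Labels: BA/IA = beginning/inside of an aspect phrase, BO/IO = beginning/inside
# of an opinion, N = neither. The sentence is the paper's running example.
tokens = ["The", "appetizer", "was", "delicious"]
labels = ["N",   "BA",        "N",   "BO"]

for token, label in zip(tokens, labels):
    print(f"{token:>10s}  {label}")
```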
Existing work on aspect term extraction has achieved promising results in single-domain settings where both the training and testing data arise from the same distribution. However, such methods typically perform much worse when the training (or source) domain differs from the testing (or target) domain. This cross-domain setting for aspect extraction poses a greater challenge because there is often very little overlap between aspects used in different domains. For example, aspects prevalent in consumer reviews about laptops (e.g., processor, hardware) are unrelated to common aspects in restaurant reviews (e.g., food, appetizer).
To address this challenging task, we introduce a novel method for enhancing pretrained Transformer models [36] with information from domain-specific knowledge graphs that are automatically constructed from semantic knowledge sources. We show how injecting information from these knowledge graphs into Transformer models improves domain transfer by providing contextual information about potential aspects in the target domain.
This work consists of four primary contributions. First, we introduce an approach for constructing domain-specific knowledge graphs from unlabeled text using an existing large-scale commonsense knowledge graph (ConceptNet, Speer et al. [34]) and a Transformer-based generative knowledge source fine-tuned for the task of predicting relations within a domain (COMET, Bosselut et al. [2]). Second, we present a methodology for determining when it is beneficial to inject external knowledge into a Transformer model for aspect extraction through the application of syntactic information. Third, we explore two alternative approaches for injecting knowledge into language models: via insertion of a pivot token for query enrichment and through a disentangled attention mechanism. Experimental results demonstrate how this methodology achieves state-of-the-art performance on cross-domain aspect extraction using benchmark datasets from three different domains of consumer reviews: restaurants, laptops and digital devices [29, 30, 39]. Finally, we contribute an improved version of the benchmark digital devices dataset to facilitate future work on aspect-based sentiment analysis.
2 RELATED WORK
2.1 Knowledge Graphs
A variety of knowledge graphs have been created in recent years to store large quantities of factual and commonsense knowledge about the world. ConceptNet is a widely used and freely available source of commonsense knowledge that was constructed from both expert sources and crowdsourcing. A variety of solutions that leverage ConceptNet have been developed for NLP tasks in recent years, including multi-hop generative QA [1], story completion [5], and machine reading comprehension [41].
The main challenge in using ConceptNet is the selection and quality assessment of paths queried from the graph to produce relevant subgraphs for downstream use. A variety of heuristic approaches have been proposed for this task, including setting a maximum path length [12], limiting the length of the path based on the number of returned nodes [3], and utilizing measures of similarity calculated over embeddings [10]. Auxiliary models that assess the naturalness of paths have also been proposed for predicting path quality [42].
2.2 Domain Adaptation
Developing models that can generalize well to unseen and out-of-domain examples is a fundamental challenge in robust solution design. A key objective of many previous domain adaptation approaches has been to learn domain-invariant latent features that can be used by a model for its final predictions. Prior to the widespread usage of Deep Neural Networks (DNNs) for domain adaptation tasks, various methods were proposed that attempted to learn the latent features by constructing a low-dimensional space where the distance between features from the source and target domain is minimized [23, 24].
With the recent introduction of DNNs for domain adaptation tasks, there has been a shift towards monolithic approaches in which the domain-invariant feature transformation is learned simultaneously with the task-specific classifier as part of the training process. These methods incorporate mechanisms such as a Gradient Reversal Layer [9] and explicit partitioning of a DNN [4] to implicitly learn both domain-invariant and domain-specific features in an end-to-end manner.
Such approaches have been applied to various problems in NLP, including cross-domain sentiment analysis. Du et al. [8] and Gong et al. [11] introduce additional training tasks for BERT [6] in an effort to learn both domain-invariant and domain-specific feature representations for sentiment analysis tasks. The utilization of syntactic information has also been shown to be an effective way of introducing domain-invariant knowledge, which can help bridge the gap between domains [7, 16, 26, 37].
2.3 Knowledge Informed Architectures
An alternative paradigm for developing robust solutions is to augment models using external knowledge queried from a large non-parametric memory store, commonly known as a Knowledge Base (KB) or Knowledge Graph (KG). We refer to this class of models as knowledge informed architectures. Much of the existing work on knowledge informed architectures augments BERT [6] with external knowledge from sources such as WordNet [21] and ConceptNet. These approaches have led to a myriad of new BERT-like models such as KnowBERT [27], K-BERT [18], and E-BERT [28] which attempt to curate and inject knowledge from KBs in various ways. How knowledge is acquired and used in these models is highly task dependent.
Knowledge informed architectures have been shown to be effective at a variety of tasks, achieving superior performance in recent challenges such as Efficient Question-Answering [22] and Open-domain Question-Answering [17, 33] where external knowledge is used to enrich input queries with additional context that supplements the implicit knowledge stored in the model's parameters. To the best of our knowledge, no previous knowledge informed architectures have been developed for cross-domain aspect extraction.
3 METHODOLOGY
Our approach consists of a three-step process: (1) preparing a domain-specific KG for each of the target domains, (2) determining when the model can benefit from external information, and (3) injecting knowledge retrieved from the KG when applicable. We explore two alternative methods for the final knowledge injection step of this process: insertion of a pivot token into the original query, and knowledge injection into hidden state representations via a disentangled attention mechanism. We provide an illustration of our approach in Figure 1 and detail the methodology for each step of the process in the subsequent sections.

[Figure 1: Illustration of our pivot token knowledge injection approach for aspect extraction]
3.1 Domain-Specic KG Preparation
In order to ground the model in concepts related to the target do-
main, we create a domain-specic KG by rst querying a subgraph
from ConceptNet using a list of seed terms that are related to the do-
main. For each target domain
𝑑
, the seed term list
𝑆𝑑={𝑠1, 𝑠2, ..., 𝑠𝑘}
is generated by applying TF-IDF to all of the unlabeled text in do-
main
𝑑
and identifying the top-
𝑘
scoring noun phrases. We use
𝑘=
7seed terms in this work but note that the number of seed
terms can be adjusted based on the desired size of the KG.
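As a rough sketch of this step (our own reconstruction, not the authors' code; the aggregation of TF-IDF scores across documents and the preprocessing details are assumptions), the seed terms can be obtained by scoring spaCy noun chunks with scikit-learn's TF-IDF implementation:

```python
import spacy
from sklearn.feature_extraction.text import TfidfVectorizer

nlp = spacy.load("en_core_web_sm")

def seed_terms(domain_docs, k=7):
    """Return the top-k noun phrases by TF-IDF score over a domain corpus."""
    # Restrict the candidate vocabulary to noun phrases found by spaCy.
    noun_phrases = {
        chunk.text.lower()
        for doc in nlp.pipe(domain_docs)
        for chunk in doc.noun_chunks
    }
    vectorizer = TfidfVectorizer(vocabulary=sorted(noun_phrases),
                                 ngram_range=(1, 3))
    tfidf = vectorizer.fit_transform(domain_docs)
    # Aggregate scores across documents (sum is one reasonable choice)
    # and take the k highest-scoring phrases.
    scores = tfidf.sum(axis=0).A1
    ranked = sorted(zip(vectorizer.get_feature_names_out(), scores),
                    key=lambda pair: -pair[1])
    return [term for term, _ in ranked[:k]]
```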
For each seed term $s \in S_d$, we query ConceptNet for all English-language nodes connected by an edge to $s$ and add them to the domain-specific subgraph along with the seed term $s$. The subgraph is further expanded by iteratively querying additional nodes that are connected by an edge to a node already present in the subgraph, up to a maximum distance of $\ell$ edges from $s$. In our experiments, we utilized a maximum edge distance of $\ell = 2$ for efficiency and based on the observation that querying beyond two edges from a given node in ConceptNet does not significantly increase the identification of domain-relevant concepts.
To increase the relevancy of the queried subgraph to the target domain, we prune nodes on paths in the graph that have a low relatedness score to the seed term $s$ from which they originated. While our approach is compatible with various embedding methods, we utilize pre-computed ConceptNet Numberbatch embeddings [34] that combine information from the graph structure of ConceptNet with other word embedding techniques such as Word2vec [20] and GloVe [25]. Let $\mathbf{e}_i$ denote the embedding vector of a given node $i$ in the subgraph. The relatedness score $r_{i,j}$ for a pair of nodes $i$ and $j$ in the graph is calculated as the cosine similarity between their embedding vectors:

$$r_{i,j} = \frac{\mathbf{e}_i \cdot \mathbf{e}_j}{\|\mathbf{e}_i\| \, \|\mathbf{e}_j\|} \qquad (1)$$
For a given path $P = \{n_s, n_1, ..., n_\ell\}$ connecting nodes $n_1, ..., n_\ell$ to the node $n_s$ corresponding to seed term $s$, we calculate its minimum path relatedness score, denoted $P_{\min}$, as the minimum of the pairwise relatedness scores between each node in the path and the seed term:

$$P_{\min} = \min_{i \in \{1, .., \ell\}} r_{s,i} \qquad (2)$$

Nodes terminating a path for which $P_{\min} < 0.2$ are discarded from the subgraph, where this threshold was chosen heuristically and can be tuned based on the application. This path filtering criterion helps disambiguate edges in ConceptNet for words that have multiple different meanings. Higher values of the threshold on $P_{\min}$ reduce the number of unrelated nodes in the subgraph at the cost of decreased coverage.
To further expand the coverage of the domain-specific KGs, we employ a generative commonsense model called COMET [2] to automatically augment the KG with additional terms that are related to those already present in the graph. Given a head $h$ and relation $r$, COMET is trained to predict the tail $t$ completing an $(h, r, t)$ triple. We chose to use COMET for augmenting our KGs due to the incompleteness of ConceptNet, which can vary significantly in coverage across domains as a result of its reliance on crowdsourcing for knowledge acquisition.

The original implementation of COMET consisted of GPT [31] fine-tuned to complete $(h, r, t)$ triples sampled from either ConceptNet or ATOMIC [32]. Motivated by the observation that the original COMET lacks coverage for certain concepts in our target domains, we improve the relevancy of its predictions by fine-tuning COMET on ConceptNet triples that are selectively chosen. For each target domain, we identify all nouns and noun phrases occurring in its text using spaCy [14] and then query ConceptNet for triples that contain one of these nouns. A domain-specific instance of COMET is then trained by fine-tuning GPT on the task of $(h, r, t)$ completion using only the sampled set of domain-relevant triples. For each seed term $s \in S_d$, we use our domain-tuned implementation of COMET to generate 100 completions for the triple $(s, \text{RelatedTo}, t)$ and add them to the domain-specific KG if they are not already present.
3.2 Determining when to Inject Knowledge
To determine when to inject knowledge, we identify tokens that are potential aspects by first using spaCy to extract POS and dependency relations. Motivated by the observation that aspects tend to be either individual nouns or noun phrases, we extract the candidate set of tokens by identifying noun forms in the input sequence.
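A minimal sketch of this candidate identification step (our own reconstruction using spaCy's POS tags; the authors' exact filtering rules over POS and dependency relations are not reproduced here):

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def aspect_candidates(sentence):
    """Return tokens whose POS tag marks them as noun forms."""
    doc = nlp(sentence)
    return [tok.text for tok in doc if tok.pos_ in ("NOUN", "PROPN")]

print(aspect_candidates("The appetizer was delicious"))  # ['appetizer']
```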