graphs from unlabeled text using an existing large-scale commonsense knowledge graph (ConceptNet, Speer et al. [34]) and a Transformer-based generative knowledge source fine-tuned for the task of predicting relations within a domain (COMET, Bosselut et al. [2]). Second, we present a methodology for determining when it is beneficial to inject external knowledge into a Transformer model for aspect extraction through the application of syntactic information. Third, we explore two alternative approaches for injecting knowledge into language models: via insertion of a pivot token for query enrichment and through a disentangled attention mechanism. Experimental results demonstrate how this methodology achieves state-of-the-art performance on cross-domain aspect extraction using benchmark datasets from three different domains of consumer reviews: restaurants, laptops and digital devices [29, 30, 39]. Finally, we contribute an improved version of the benchmark digital devices dataset to facilitate future work on aspect-based sentiment analysis.
2 RELATED WORK
2.1 Knowledge Graphs
A variety of knowledge graphs have been created in recent years to store large quantities of factual and commonsense knowledge about the world. ConceptNet is a widely-used and freely-available source of commonsense knowledge that was constructed from both expert sources and crowdsourcing. A variety of solutions that leverage ConceptNet have been developed for NLP tasks in recent years, including multi-hop generative QA [1], story completion [5], and machine reading comprehension [41].
The main challenge in using ConceptNet is the selection and quality assessment of paths queried from the graph to produce relevant subgraphs for downstream use. A variety of heuristic approaches have been proposed for this task, including setting a maximum path length [12], limiting the length of the path based on the number of returned nodes [3], and utilizing measures of similarity calculated over embeddings [10]. Auxiliary models that assess the naturalness of paths have also been proposed for predicting path quality [42].
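As an illustration, the length and embedding-similarity heuristics can be combined as in the following sketch; the path representation, the embed callable, and the thresholds are our own assumptions rather than implementations of the cited methods.

```python
import numpy as np

def filter_paths(paths, query_vec, embed, max_len=3, min_sim=0.4):
    """Keep short paths whose terminal node is embedding-similar to the query.

    `paths` holds lists of node labels; `embed` maps a label to a numpy
    vector (e.g., an averaged word embedding). Thresholds are illustrative.
    """
    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

    kept = []
    for path in paths:
        if len(path) - 1 > max_len:  # heuristic: cap the number of hops
            continue
        if cosine(embed(path[-1]), query_vec) < min_sim:  # heuristic: similarity to query
            continue
        kept.append(path)
    return kept
```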
2.2 Domain Adaptation
Developing models that can generalize well to unseen and out-of-domain examples is a fundamental challenge in robust solution design. A key objective of many previous domain adaptation approaches has been to learn domain-invariant latent features that can be used by a model for its final predictions. Prior to the widespread usage of Deep Neural Networks (DNNs) for domain adaptation tasks, various methods were proposed that attempted to learn the latent features by constructing a low-dimensional space where the distance between features from the source and target domain is minimized [23, 24].
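A common instantiation of such a distance is the maximum mean discrepancy between domain feature means; the following minimal sketch shows its linear form, with array names chosen for illustration.

```python
import numpy as np

def linear_mmd2(source_feats, target_feats):
    """Squared distance between the mean feature vectors of the two domains.

    source_feats, target_feats: arrays of shape (n_samples, n_features).
    Methods in this family seek a projection that makes this quantity small.
    """
    delta = source_feats.mean(axis=0) - target_feats.mean(axis=0)
    return float(delta @ delta)
```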
With the recent introduction of DNNs for domain adaptation tasks, there has been a shift towards monolithic approaches in which the domain-invariant feature transformation is learned simultaneously with the task-specific classifier as part of the training process. These methods incorporate mechanisms such as a Gradient Reversal Layer [9] and explicit partitioning of a DNN [4] to implicitly learn both domain-invariant and domain-specific features in an end-to-end manner.
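For concreteness, the sketch below gives a minimal PyTorch implementation of a Gradient Reversal Layer in the style of [9]; the function and variable names are illustrative rather than taken from the cited systems.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates (and scales) gradients in backward."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reversing the gradient pushes the feature extractor toward
        # representations the domain classifier cannot separate.
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

# Usage: domain_logits = domain_head(grad_reverse(features, lambd=0.1))
```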
Such approaches have been applied to various problems in NLP, including cross-domain sentiment analysis. Du et al. [8] and Gong et al. [11] introduce additional training tasks for BERT [6] in an effort to learn both domain-invariant and domain-specific feature representations for sentiment analysis tasks. The utilization of syntactic information has also been shown to be an effective way of introducing domain-invariant knowledge, which can help bridge the gap between domains [7, 16, 26, 37].
2.3 Knowledge Informed Architectures
An alternative paradigm for developing robust solutions is to augment models using external knowledge queried from a large non-parametric memory store, commonly known as a Knowledge Base (KB) or Knowledge Graph (KG). We refer to this class of models as knowledge informed architectures. Much of the existing work on knowledge informed architectures augments BERT [6] with external knowledge from sources such as WordNet [21] and ConceptNet. These approaches have led to a myriad of new BERT-like models such as KnowBERT [27], K-BERT [18], and E-BERT [28], which attempt to curate and inject knowledge from KBs in various ways. How knowledge is acquired and used in these models is highly task dependent.
Knowledge informed architectures have been shown to be effective at a variety of tasks, achieving superior performance in recent challenges such as Efficient Question-Answering [22] and Open-domain Question-Answering [17, 33], where external knowledge is used to enrich input queries with additional context that supplements the implicit knowledge stored in the model's parameters. To the best of our knowledge, no previous knowledge informed architectures have been developed for cross-domain aspect extraction.
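Schematically, such query enrichment amounts to concatenating retrieved facts onto the input; the separator and fact format below are assumptions for illustration, not the cited systems' interfaces.

```python
def enrich_query(question, retrieved_facts, sep=" [SEP] "):
    """Append retrieved KG facts so the encoder sees them as extra context."""
    return question + sep + " ".join(retrieved_facts)

enriched = enrich_query(
    "what aspects do reviewers mention about this laptop?",
    ["laptop HasA battery", "laptop HasA screen"],
)
# -> 'what aspects do reviewers mention about this laptop? [SEP] laptop HasA battery laptop HasA screen'
```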
3 METHODOLOGY
Our approach consists of a three-step process: (1) preparing a domain-specific KG for each of the target domains, (2) determining when the model can benefit from external information, and (3) injecting knowledge retrieved from the KG when applicable. We explore two alternative methods for the final knowledge injection step of this process: insertion of a pivot token into the original query, and knowledge injection into hidden state representations via a disentangled attention mechanism. We provide an illustration of our approach in Figure 1 and detail the methodology for each step of the process in the subsequent sections.
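As a preview of the first injection method, the sketch below shows one way a knowledge-derived pivot token could be spliced into a tokenized query; the tokenizer choice, pivot placement, and knowledge term are illustrative assumptions, with the actual procedure detailed in the subsequent sections.

```python
# Hypothetical illustration of pivot-token insertion; not the exact procedure.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def insert_pivot(tokens, position, pivot):
    """Splice a knowledge-derived pivot token in after the given position."""
    return tokens[: position + 1] + [pivot] + tokens[position + 1 :]

tokens = tokenizer.tokenize("the screen resolution is stunning")
print(insert_pivot(tokens, position=1, pivot="display"))
# ['the', 'screen', 'display', 'resolution', 'is', 'stunning']
```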
3.1 Domain-Specific KG Preparation
In order to ground the model in concepts related to the target domain, we create a domain-specific KG by first querying a subgraph from ConceptNet using a list of seed terms that are related to the domain. For each target domain $d$, the seed term list $S_d = \{s_1, s_2, \ldots, s_k\}$ is generated by applying TF-IDF to all of the unlabeled text in domain $d$ and identifying the top-$k$ scoring noun phrases. We use $k = 7$ seed terms in this work but note that the number of seed terms can be adjusted based on the desired size of the KG.
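A minimal sketch of this seed-term selection step, assuming spaCy for noun-phrase extraction and scikit-learn for TF-IDF scoring (the specific tools are not prescribed above, so treat this as one possible instantiation):

```python
import spacy
from sklearn.feature_extraction.text import TfidfVectorizer

nlp = spacy.load("en_core_web_sm")

def top_k_seed_terms(domain_texts, k=7):
    """Return the k top TF-IDF-scoring noun phrases from unlabeled domain text."""
    # Candidate noun phrases, lowercased and de-duplicated.
    candidates = set()
    for doc in nlp.pipe(domain_texts):
        candidates.update(chunk.text.lower() for chunk in doc.noun_chunks)

    # Score only the candidate phrases over the domain corpus.
    vectorizer = TfidfVectorizer(vocabulary=sorted(candidates), ngram_range=(1, 3))
    tfidf = vectorizer.fit_transform(domain_texts)
    scores = tfidf.max(axis=0).toarray().ravel()  # best score of each phrase in any document
    terms = vectorizer.get_feature_names_out()
    ranked = sorted(zip(terms, scores), key=lambda pair: pair[1], reverse=True)
    return [term for term, _ in ranked[:k]]
```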
For each seed term $s \in S_d$, we query ConceptNet for all English-language nodes connected by an edge to $s$ and add them to the domain-specific subgraph along with the seed term $s$. The subgraph is further expanded by iteratively querying additional nodes that