mask two types of words for generation: sentiment words and non-sentiment words. We increase the proportion of masked sentiment words so that the model focuses more on sentiment expressions.
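A minimal sketch of such biased masking, assuming a pre-built sentiment lexicon (here a hypothetical SENTIMENT_WORDS set) and illustrative masking probabilities rather than SentiWSP's exact ratios:

```python
import random

# Hypothetical sentiment lexicon; the lexicon actually used by SentiWSP may differ.
SENTIMENT_WORDS = {"great", "terrible", "love", "disappointing"}

def choose_mask_positions(tokens, p_sentiment=0.5, p_other=0.1):
    """Pick token positions to mask, favoring sentiment words.

    p_sentiment / p_other are illustrative: sentiment words are
    masked with a much higher probability than other words.
    """
    positions = []
    for i, tok in enumerate(tokens):
        p = p_sentiment if tok.lower() in SENTIMENT_WORDS else p_other
        if random.random() < p:
            positions.append(i)
    return positions

tokens = "the food was great but the service was terrible".split()
mask_at = set(choose_mask_positions(tokens))
masked = ["[MASK]" if i in mask_at else t for i, t in enumerate(tokens)]
```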
For sentence-level pre-training, we design a contrastive learning framework to improve the sentence embeddings encoded by the discriminator. The query for contrastive learning is constructed by masking sentiment expressions in a sentence. The positive example is the original sentence. The negative examples are selected first from in-batch samples and then from cross-batch similar samples using an asynchronously updated approximate nearest neighbor (ANN) index. In this way, the discriminator, which will be used as the encoder for downstream tasks, learns to distinguish sentences with different sentiment polarities even if they are superficially similar.
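The sketch below illustrates the shape of such a contrastive objective: the sentiment-masked sentence is the query, the original sentence is its positive, and both in-batch positives of other queries and ANN-retrieved cross-batch samples act as negatives. The function names and temperature are illustrative assumptions, not SentiWSP's exact implementation.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(query_emb, pos_emb, hard_neg_emb, temperature=0.05):
    """InfoNCE-style loss over in-batch and cross-batch negatives.

    query_emb:    [B, d]    embeddings of sentiment-masked sentences
    pos_emb:      [B, d]    embeddings of the original sentences
    hard_neg_emb: [B, K, d] ANN-retrieved cross-batch hard negatives
    For each query, the other positives in the batch serve as extra negatives.
    """
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(pos_emb, dim=-1)
    n = F.normalize(hard_neg_emb, dim=-1)

    in_batch = q @ p.t()                        # [B, B]; diagonal entries are the positives
    hard = torch.einsum("bd,bkd->bk", q, n)     # [B, K] similarities to hard negatives
    logits = torch.cat([in_batch, hard], dim=1) / temperature
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)
```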
Our main contributions are threefold: 1) SentiWSP strengthens word-level pre-training via masked sentiment word generation and detection, which is more sample-efficient and benefits various sentiment classification tasks; 2) SentiWSP combines word-level pre-training with sentence-level pre-training, which has been under-explored in previous studies. SentiWSP adopts contrastive learning in pre-training, where sentences are progressively contrasted with in-batch and cross-batch hard negatives, so that the model is empowered to encode detailed sentiment information of a sentence; 3) We conduct extensive experiments on sentence-level and aspect-level sentiment classification tasks, and show that SentiWSP achieves new state-of-the-art performance on multiple benchmark datasets.
2 Related Work
Pre-training and Representation Learning
Pre-trained models have shown great success across various NLP tasks (Devlin et al., 2019; Yang et al., 2019; Liu et al., 2019). Existing studies mostly use a Transformer-based (Vaswani et al., 2017) encoder to capture contextual features, along with masked language modeling (MLM) and/or next sentence prediction (Devlin et al., 2019) as the pre-training tasks. Yang et al. (2019) propose XLNet, which is pre-trained with a generalized autoregressive method that enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order. ELECTRA (Clark et al., 2020) is a generator-discriminator framework, where the generator performs masked token generation and the discriminator performs the replaced token detection pre-training task. It is more efficient than MLM because the discriminator models all input tokens rather than only the masked ones. Our work improves ELECTRA's performance on sentiment analysis tasks by targeting sentiment words during word-level masking and combining it with sentence-level pre-training.
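As a rough illustration of the generator-discriminator setup described above (a simplified sketch, not Clark et al.'s actual implementation), the generator fills the masked positions with sampled tokens and the discriminator is then labeled on every position, not just the masked ones:

```python
import torch

def make_rtd_targets(input_ids, masked_positions, generator_ids):
    """Build replaced-token-detection inputs and labels for the discriminator.

    input_ids:        [B, L] original token ids
    masked_positions: [B, L] bool mask of positions handed to the generator
    generator_ids:    [B, L] the generator's sampled predictions
    Returns the corrupted sequence and 0/1 labels (1 = token was replaced).
    """
    corrupted = torch.where(masked_positions, generator_ids, input_ids)
    replaced = (corrupted != input_ids).long()  # labels cover all positions
    return corrupted, replaced
```

The discriminator is trained with a binary loss over all positions, which is where the sample-efficiency gain over MLM comes from.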
In addition to the pre-training models that encode token representations, sentence-level and passage-level representation learning has undergone rapid development in recent years. A surge of work demonstrates that contrastive learning is an effective framework for sentence- and passage-level representation learning (Meng et al., 2021; Wei et al., 2021; Gao et al., 2021; Li et al., 2021a). The common idea of contrastive learning is to pull together an anchor and a “positive” sample in the embedding space, and push the anchor apart from “negative” samples. Recently, COCO-LM (Meng et al., 2021) creates positive samples by masking and cropping tokens from sentences. Gao et al. (2021) demonstrate that constructing positive pairs with only standard dropout as minimal data augmentation works surprisingly well on the Natural Language Inference (NLI) task. Karpukhin et al. (2020) investigate the impact of different negative sampling strategies for passage representation learning on the tasks of passage retrieval and question answering. ANCE (Xiong et al., 2021) adopts approximate nearest neighbor negative contrastive learning, a mechanism that selects hard negatives globally from the entire corpus using an asynchronously updated Approximate Nearest Neighbor (ANN) index. Inspired by COCO-LM (Meng et al., 2021) and ANCE (Xiong et al., 2021), we construct positive samples by masking a span of words from a sentence, and construct cross-batch hard negative samples to enhance the discriminator in sentence-level pre-training.
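A minimal sketch of how such an asynchronously refreshed ANN index can supply cross-batch hard negatives, using faiss as an illustrative backend; the refresh schedule, retrieval depth, and filtering shown here are placeholders, not ANCE's or SentiWSP's exact settings:

```python
import faiss
import numpy as np

def build_index(corpus_embeddings: np.ndarray):
    """Build a flat inner-product index over corpus embeddings ([N, d] float32).
    In practice the index is rebuilt asynchronously from a recent encoder
    checkpoint, so retrieved negatives lag the current model slightly.
    """
    index = faiss.IndexFlatIP(corpus_embeddings.shape[1])
    index.add(corpus_embeddings)
    return index

def retrieve_hard_negatives(index, query_embeddings, positive_ids, k=8):
    """Return ids of the top-k most similar corpus entries per query,
    skipping each query's own positive so it is never used as a negative."""
    _, ids = index.search(query_embeddings, k + 1)
    return [[j for j in row if j != pos][:k]
            for row, pos in zip(ids.tolist(), positive_ids)]
```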
Pre-trained Models for Sentiment Analysis
In the field of sentiment analysis, BERT-PT (Xu et al., 2019) conducts post-training on corpora from the same domain as the downstream tasks to benefit aspect-level sentiment classification. SKEP (Tian et al., 2020) constructs three sentiment knowledge prediction objectives in order to learn a unified sentiment representation for multiple sentiment analysis tasks. SENTIX (Zhou