Sentiment-Aware Word and Sentence Level Pre-training for Sentiment Analysis

Shuai Fan1, Chen Lin1*, Haonan Li2†, Zhenghao Lin1†, Jinsong Su1, Hang Zhang3, Yeyun Gong4, Jian Guo3, Nan Duan4
1School of Informatics, Xiamen University, China
2The University of Melbourne, Australia
3IDEA Research, China
4Microsoft Research Asia
*Corresponding author: chenlin@xmu.edu.cn
†Equal contribution
Abstract

Most existing pre-trained language representation models (PLMs) are sub-optimal for sentiment analysis tasks, as they capture sentiment information at the word level while under-considering sentence-level information. In this paper, we propose SentiWSP, a novel Sentiment-aware pre-trained language model with combined Word-level and Sentence-level Pre-training tasks. The word-level pre-training task detects replaced sentiment words, via a generator-discriminator framework, to enhance the PLM's knowledge about sentiment words. The sentence-level pre-training task further strengthens the discriminator via a contrastive learning framework, with similar sentences as negative samples, to encode the sentiment of a sentence. Extensive experimental results show that SentiWSP achieves new state-of-the-art performance on various sentence-level and aspect-level sentiment classification benchmarks. We have made our code and model publicly available at https://github.com/XMUDM/SentiWSP.
1 Introduction
Sentiment analysis plays a fundamental role in natural language processing (NLP) and powers a broad spectrum of important business applications such as marketing (HaCohen-Kerner, 2019) and campaign monitoring (Sandoval-Almazán and Valle-Cruz, 2020). Two typical sentiment analysis tasks are sentence-level sentiment classification (Xu et al., 2019; Yin et al., 2020; Tang et al., 2022) and aspect-level sentiment classification (Li et al., 2021b).
Recently, pre-trained language representation models (PLMs) such as ELMo (Peters et al., 2018), GPT (Radford et al., 2018, 2019), BERT (Devlin et al., 2019), RoBERTa (Liu et al., 2019) and XLNet (Yang et al., 2019) have brought impressive performance improvements in many NLP problems, including sentiment analysis. PLMs learn a robust encoder on a large unlabeled corpus through carefully designed pre-training tasks, such as masked token prediction or next sentence prediction.
Despite their progress, the application of general-purpose PLMs in sentiment analysis is limited, because they fail to distinguish the importance of different words to a specific task. For example, Kassner and Schütze (2020) show that general-purpose PLMs have difficulty dealing with contradictory sentiment words or negation expressions, which are critical in sentiment analysis. To address this problem, recent sentiment-aware PLMs introduce word-level sentiment information, such as token sentiments and emoticons (Zhou et al., 2020), aspect words (Tian et al., 2020), word-level linguistic knowledge (Ke et al., 2020), and implicit sentiment-knowledge information (Li et al., 2021b). These word-level pre-training tasks, e.g., sentiment word prediction and word polarity prediction, mainly learn from the masked words and are not efficient at capturing word-level information for all input words. Furthermore, the sentiment expressed in a sentence is more than a simple aggregation of word-level sentiments. However, general-purpose PLMs and existing sentiment-aware PLMs under-consider sentence-level sentiment information.
In this paper, we propose a novel sentiment-aware pre-trained language model called SentiWSP, which combines word-level and sentence-level pre-training. Inspired by ELECTRA (Clark et al., 2020), which pre-trains a masked language model with significantly less computation, we adopt a generator-discriminator framework in the word-level pre-training. The generator aims to replace masked words with plausible alternatives, and the discriminator aims to predict whether each word in the sentence is an original word or a substitution. To tailor this framework for sentiment analysis, we mask two types of words for generation: sentiment words and non-sentiment words. We increase the proportion of masked sentiment words so that the model focuses more on sentiment expressions.
For sentence-level pre-training, we design a contrastive learning framework to improve the embeddings encoded by the discriminator. The query for contrastive learning is constructed by masking sentiment expressions in a sentence. The positive example is the original sentence. The negative examples are selected first from in-batch samples and then from cross-batch similar samples using an asynchronously updated approximate nearest neighbor (ANN) index. In this way, the discriminator, which will be used as the encoder for downstream tasks, learns to distinguish different sentiment polarities even if the sentences are superficially similar.
Our main contributions are threefold: (1) SentiWSP strengthens word-level pre-training via masked sentiment word generation and detection, which is more sample-efficient and benefits various sentiment classification tasks; (2) SentiWSP combines word-level pre-training with sentence-level pre-training, which has been under-considered in previous studies. SentiWSP adopts contrastive learning in the pre-training, where sentences are progressively contrasted with in-batch and cross-batch hard negatives, so that the model is empowered to encode detailed sentiment information of a sentence; (3) We conduct extensive experiments on sentence-level and aspect-level sentiment classification tasks, and show that SentiWSP achieves new state-of-the-art performance on multiple benchmark datasets.
2 Related Work
Pre-training and Representation Learning
Pre-training models have shown great success across various NLP tasks (Devlin et al., 2019; Yang et al., 2019; Liu et al., 2019). Existing studies mostly use a Transformer-based (Vaswani et al., 2017) encoder to capture contextual features, along with masked language modeling (MLM) and/or next sentence prediction (Devlin et al., 2019) as the pre-training tasks. Yang et al. (2019) propose XLNet, which is pre-trained with a generalized autoregressive method that learns bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order. ELECTRA (Clark et al., 2020) is a generator-discriminator framework, where the generator performs masked token generation and the discriminator performs a replaced token detection pre-training task. It is more efficient than MLM because the discriminator models all input tokens rather than only the masked tokens. Our work improves ELECTRA's performance on sentiment analysis tasks by targeting sentiment words for masking in the word-level pre-training and by combining it with sentence-level pre-training.
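To make the generator-discriminator interplay concrete, the following is a minimal sketch of one replaced-token-detection step using off-the-shelf ELECTRA checkpoints from the transformers library. The checkpoint names, the argmax sampling, and the flat 15% masking rate are illustrative assumptions, not the SentiWSP setup (which additionally targets sentiment words; see Sec. 3.1).

```python
# Illustrative replaced-token-detection step with ELECTRA (not the SentiWSP code).
import torch
from transformers import ElectraTokenizerFast, ElectraForMaskedLM, ElectraForPreTraining

tok = ElectraTokenizerFast.from_pretrained("google/electra-small-generator")
generator = ElectraForMaskedLM.from_pretrained("google/electra-small-generator")
discriminator = ElectraForPreTraining.from_pretrained("google/electra-small-discriminator")

sentence = "a smart sassy exceptionally charming romantic comedy"
enc = tok(sentence, return_tensors="pt")
input_ids = enc["input_ids"]

# Choose ~15% of the non-special tokens to mask.
special = tok.get_special_tokens_mask(input_ids[0].tolist(), already_has_special_tokens=True)
candidates = [i for i, s in enumerate(special) if s == 0]
n_mask = max(1, int(0.15 * len(candidates)))
masked_pos = torch.tensor(candidates)[torch.randperm(len(candidates))[:n_mask]]

corrupted = input_ids.clone()
corrupted[0, masked_pos] = tok.mask_token_id

# The generator fills each mask with a plausible token (argmax for simplicity).
with torch.no_grad():
    gen_logits = generator(corrupted).logits
replaced = input_ids.clone()
replaced[0, masked_pos] = gen_logits.argmax(dim=-1)[0, masked_pos]

# The discriminator labels every token: 1 = replaced, 0 = original.
labels = (replaced != input_ids).long()
out = discriminator(replaced, attention_mask=enc["attention_mask"], labels=labels)
print("replaced-token-detection loss:", out.loss.item())
```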
In addition to the pre-training models that encode token representations, sentence-level and passage-level representation learning has undergone rapid development in recent years. A surge of work demonstrates that contrastive learning is an effective framework for sentence- and passage-level representation learning (Meng et al., 2021; Wei et al., 2021; Gao et al., 2021; Li et al., 2021a). The common idea of contrastive learning is to pull together an anchor and a "positive" sample in the embedding space, and push apart the anchor from "negative" samples. Recently, COCO-LM (Meng et al., 2021) creates positive samples by masking and cropping tokens from sentences. Gao et al. (2021) demonstrate that constructing positive pairs with only standard dropout as minimal data augmentation works surprisingly well on the Natural Language Inference (NLI) task. Karpukhin et al. (2020) investigate the impact of different negative sampling strategies for passage representation learning on the tasks of passage retrieval and question answering. ANCE (Xiong et al., 2021) adopts approximate nearest neighbor negative contrastive learning, a learning mechanism that selects hard negatives globally from the entire corpus using an asynchronously updated Approximate Nearest Neighbor (ANN) index. Inspired by COCO-LM (Meng et al., 2021) and ANCE (Xiong et al., 2021), we construct positive samples by masking a span of words from a sentence, and construct cross-batch hard negative samples to enhance the discriminator in the sentence-level pre-training.
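For reference, the sketch below illustrates the common contrastive objective with in-batch negatives extended by extra hard negatives, as used in this line of work; the temperature value, tensor shapes, and function name are illustrative assumptions rather than SentiWSP's exact configuration.

```python
# Minimal InfoNCE-style contrastive loss with in-batch and extra hard negatives
# (an illustrative sketch, not the SentiWSP implementation).
import torch
import torch.nn.functional as F

def contrastive_loss(query_emb, positive_emb, hard_negative_emb=None, temperature=0.05):
    """query_emb, positive_emb: (B, d); hard_negative_emb: (B, n_hard, d) or None."""
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(positive_emb, dim=-1)
    # Similarity of each query to every positive in the batch: diagonal entries
    # are the true pairs, off-diagonal ones act as in-batch negatives.
    logits = q @ p.t() / temperature                                   # (B, B)
    if hard_negative_emb is not None:
        h = F.normalize(hard_negative_emb, dim=-1)
        hard_logits = torch.einsum("bd,bnd->bn", q, h) / temperature   # (B, n_hard)
        logits = torch.cat([logits, hard_logits], dim=1)               # (B, B + n_hard)
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)

# Example with random embeddings standing in for encoder outputs:
B, d = 8, 768
loss = contrastive_loss(torch.randn(B, d), torch.randn(B, d), torch.randn(B, 4, d))
print(loss.item())
```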
Pre-trained Models for Sentiment Analysis
In the field of sentiment analysis, BERT-PT (Xu et al., 2019) conducts post-training on corpora from the same domain as the downstream tasks to benefit aspect-level sentiment classification. SKEP (Tian et al., 2020) constructs three sentiment knowledge prediction objectives in order to learn a unified sentiment representation for multiple sentiment analysis tasks. SENTIX (Zhou et al., 2020) investigates domain-invariant sentiment knowledge from large-scale review datasets, and utilizes it for cross-domain sentiment classification tasks without fine-tuning. SentiBERT (Yin et al., 2020) proposes a two-level attention mechanism on top of the BERT representation to capture phrase-level compositional semantics. SentiLARE (Ke et al., 2020) devises a new pre-training task called label-aware masked language modeling to construct knowledge-aware language representations. SCAPT (Li et al., 2021b) captures both implicit and explicit sentiment orientation from reviews by aligning the representations of implicit sentiment expressions to those with the same sentiment label.

[Figure 1: Framework overview of SentiWSP. One panel illustrates word-level pre-training: the generator fills masked words in "A smart [MASK] exceptionally [MASK] romantic [MASK]" and the discriminator labels each token of the resulting sentence as original (O) or replaced (R). The other panel illustrates sentence-level pre-training: a sentiment-masked query ("[CLS] A [MASK] sassy exceptionally [MASK] romantic comedy [SEP]") is contrasted with the original sentence as the positive and with ANN negatives retrieved by indexing and searching over asynchronously refreshed discriminator checkpoints.]
3 Sentiment-Aware Word-Level and Sentence-Level Pre-training

The overall framework of SentiWSP is depicted in Figure 1. SentiWSP consists of two pre-training phases, namely word-level pre-training (Sec. 3.1) and sentence-level pre-training (Sec. 3.2), before fine-tuning (Sec. 3.3) on a downstream sentiment analysis task.
In word-level pre-training, an input sentence flows through a word-masking step, followed by a generator that replaces the masked words and a discriminator that detects the replacements. The generator and discriminator are jointly trained in this stage. Then, the training of the discriminator continues in sentence-level pre-training. Each input sentence is masked at its sentiment words to construct a query, while the original sentence is treated as the positive sample. Their embeddings, encoded by the discriminator, are contrasted with two types of negative samples constructed in an in-batch warm-up training step and a cross-batch approximate nearest neighbor training step. Finally, the discriminator is fine-tuned on the downstream task.
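As an illustration of the cross-batch step, the sketch below mines hard negatives with an ANN index built over embeddings produced by a discriminator checkpoint. faiss is only one possible choice of ANN library, and the helper names, cosine similarity, and exclusion logic are assumptions for the example; in training, the index would be rebuilt asynchronously from newer checkpoints.

```python
# Illustrative cross-batch hard-negative mining with an ANN index (faiss),
# not the SentiWSP implementation.
import numpy as np
import faiss

def build_ann_index(corpus_embeddings: np.ndarray) -> faiss.IndexFlatIP:
    """Index corpus sentence embeddings from a (periodically refreshed)
    discriminator checkpoint, using inner-product similarity."""
    embs = np.ascontiguousarray(corpus_embeddings.astype("float32"))
    faiss.normalize_L2(embs)                  # cosine similarity via inner product
    index = faiss.IndexFlatIP(embs.shape[1])
    index.add(embs)
    return index

def mine_hard_negatives(index, query_embeddings, positive_ids, k=8):
    """For each query, return ids of its most similar corpus sentences,
    excluding the true positive, to serve as hard negatives."""
    q = np.ascontiguousarray(query_embeddings.astype("float32"))
    faiss.normalize_L2(q)
    _, nn_ids = index.search(q, k + 1)        # +1 in case the positive is retrieved
    hard = []
    for row, pos in zip(nn_ids, positive_ids):
        hard.append([i for i in row.tolist() if i != pos][:k])
    return hard

# Example with random embeddings standing in for discriminator encodings:
corpus = np.random.randn(1000, 768)
index = build_ann_index(corpus)
queries = np.random.randn(4, 768)
print(mine_hard_negatives(index, queries, positive_ids=[0, 1, 2, 3], k=4))
```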
Compared with previous studies, the discriminator in SentiWSP has three advantages. (1) Instead of random token replacement and detection, SentiWSP masks a large portion of sentiment words, so the discriminator pays more attention to word-level sentiments. (2) Instead of pure masked token prediction, SentiWSP incorporates context information from all input words via a replacement detection task. (3) SentiWSP combines sentence-level sentiments with word-level sentiments by progressively contrasting a sentence whose sentiment expressions are masked against superficially similar sentences.
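Since the discriminator is what ultimately serves as the downstream encoder, the minimal example below shows how such a discriminator checkpoint could be plugged into a sentence-level sentiment classifier; the checkpoint name, labels, and single gradient step are placeholders, not the released SentiWSP model or its fine-tuning recipe.

```python
# Illustrative fine-tuning of an ELECTRA-style discriminator for
# sentence-level sentiment classification (placeholder checkpoint and labels).
import torch
from transformers import ElectraTokenizerFast, ElectraForSequenceClassification

tok = ElectraTokenizerFast.from_pretrained("google/electra-small-discriminator")
model = ElectraForSequenceClassification.from_pretrained(
    "google/electra-small-discriminator", num_labels=2)

batch = tok(["a smart sassy exceptionally charming romantic comedy",
             "a dull and lifeless film"], padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])          # 1 = positive, 0 = negative
out = model(**batch, labels=labels)
out.loss.backward()                    # one fine-tuning step (optimizer omitted)
print(out.logits.softmax(dim=-1))
```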
3.1 Word-Level Pre-training

Word masking
Different from previous random word masking (Devlin et al., 2019; Clark et al., 2020), our goal is to corrupt the sentiment of the input sentence. In detail, we first randomly mask 15% of the words, the same as ELECTRA (Clark et al., 2020). Then, we use SentiWordNet (Baccianella et al., 2010) to mark the positions of sentiment words in a sentence, and mask sentiment words until a certain proportion p_w of them are hidden. We empirically find that a sentiment word masking proportion of p_w = 50% achieves the best results. In the example in Figure 1 (left), the sentiment words "sassy" and "charming" are masked while "smart" is not, and "comedy" is masked as a random non-sentiment word.
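A simplified sketch of this two-step masking procedure follows. The NLTK SentiWordNet lookup, the score threshold, and word-level (rather than subword) masking are illustrative choices; the paper only states that SentiWordNet marks sentiment-word positions and that p_w = 50% works best.

```python
# Illustrative two-step masking: 15% random masking, then sentiment-word
# masking up to a proportion p_w of the sentence's sentiment words.
import random
import nltk
from nltk.corpus import sentiwordnet as swn

nltk.download("sentiwordnet", quiet=True)
nltk.download("wordnet", quiet=True)

def is_sentiment_word(word: str, threshold: float = 0.5) -> bool:
    """Treat a word as a sentiment word if any SentiWordNet synset gives it
    a strong positive or negative score (threshold is an assumption)."""
    return any(s.pos_score() >= threshold or s.neg_score() >= threshold
               for s in swn.senti_synsets(word))

def mask_sentence(tokens, p_random=0.15, p_w=0.5, mask_token="[MASK]"):
    tokens = list(tokens)
    n = len(tokens)
    # Step 1: random masking of ~15% of the tokens.
    random_pos = set(random.sample(range(n), max(1, int(p_random * n))))
    # Step 2: keep masking sentiment words until p_w of them are hidden.
    senti_pos = [i for i, t in enumerate(tokens) if is_sentiment_word(t)]
    target = int(p_w * len(senti_pos))
    hidden = [i for i in senti_pos if i in random_pos]
    remaining = [i for i in senti_pos if i not in random_pos]
    random.shuffle(remaining)
    while len(hidden) < target and remaining:
        hidden.append(remaining.pop())
    for i in random_pos | set(hidden):
        tokens[i] = mask_token
    return tokens

print(mask_sentence("a smart sassy exceptionally charming romantic comedy".split()))
```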