mask two types of words for generation: sentiment words and non-sentiment words. We increase the proportion of masked sentiment words so that the model focuses more on sentiment expressions.
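A minimal sketch of such biased masking, assuming a pre-built sentiment lexicon (here a hypothetical SENTIMENT_WORDS set) and illustrative masking probabilities rather than SentiWSP's exact ratios:

```python
import random

# Hypothetical sentiment lexicon; the lexicon actually used by SentiWSP may differ.
SENTIMENT_WORDS = {"great", "terrible", "love", "disappointing"}

def choose_mask_positions(tokens, p_sentiment=0.5, p_other=0.1):
    """Pick token positions to mask, favoring sentiment words.

    p_sentiment / p_other are illustrative: sentiment words are
    masked with a much higher probability than other words.
    """
    positions = []
    for i, tok in enumerate(tokens):
        p = p_sentiment if tok.lower() in SENTIMENT_WORDS else p_other
        if random.random() < p:
            positions.append(i)
    return positions

tokens = "the food was great but the service was terrible".split()
mask_at = set(choose_mask_positions(tokens))
masked = ["[MASK]" if i in mask_at else t for i, t in enumerate(tokens)]
```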
For sentence-level pre-training, we design a contrastive learning framework to improve the sentence embeddings encoded by the discriminator. The query for contrastive learning is constructed by masking sentiment expressions in a sentence. The positive example is the original sentence. The negative examples are selected first from in-batch samples and then from cross-batch similar samples using an asynchronously updated approximate nearest neighbor (ANN) index. In this way, the discriminator, which will be used as the encoder for downstream tasks, learns to distinguish sentences with different sentiment polarities even if they are superficially similar.
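The sketch below illustrates the shape of such a contrastive objective: the sentiment-masked sentence is the query, the original sentence is its positive, and both in-batch positives of other queries and ANN-retrieved cross-batch samples act as negatives. The function names and temperature are illustrative assumptions, not SentiWSP's exact implementation.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(query_emb, pos_emb, hard_neg_emb, temperature=0.05):
    """InfoNCE-style loss over in-batch and cross-batch negatives.

    query_emb:    [B, d]    embeddings of sentiment-masked sentences
    pos_emb:      [B, d]    embeddings of the original sentences
    hard_neg_emb: [B, K, d] ANN-retrieved cross-batch hard negatives
    For each query, the other positives in the batch serve as extra negatives.
    """
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(pos_emb, dim=-1)
    n = F.normalize(hard_neg_emb, dim=-1)

    in_batch = q @ p.t()                        # [B, B]; diagonal entries are the positives
    hard = torch.einsum("bd,bkd->bk", q, n)     # [B, K] similarities to hard negatives
    logits = torch.cat([in_batch, hard], dim=1) / temperature
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)
```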
Our main contributions are threefold: 1) SentiWSP strengthens word-level pre-training via masked sentiment word generation and detection, which is more sample-efficient and benefits various sentiment classification tasks; 2) SentiWSP combines word-level pre-training with sentence-level pre-training, which has been under-explored in previous studies. SentiWSP adopts contrastive learning in pre-training, where sentences are progressively contrasted with in-batch and cross-batch hard negatives, so that the model is empowered to encode detailed sentiment information of a sentence; 3) We conduct extensive experiments on sentence-level and aspect-level sentiment classification tasks, and show that SentiWSP achieves new state-of-the-art performance on multiple benchmark datasets.
2 Related Work
Pre-training and Representation Learning
Pre-trained models have shown great success across various NLP tasks (Devlin et al., 2019; Yang et al., 2019; Liu et al., 2019). Existing studies mostly use a Transformer-based (Vaswani et al., 2017) encoder to capture contextual features, along with masked language modeling (MLM) and/or next sentence prediction (Devlin et al., 2019) as the pre-training tasks. Yang et al. (2019) propose XLNet, which is pre-trained with a generalized autoregressive method that enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order. ELECTRA (Clark et al., 2020) is a generator-discriminator framework, where the generator performs masked token generation and the discriminator performs the replaced token detection pre-training task. It is more efficient than MLM because the discriminator models all input tokens rather than only the masked ones. Our work improves ELECTRA's performance on sentiment analysis tasks by targeting sentiment words during word-level masking and combining it with sentence-level pre-training.
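As a rough illustration of the generator-discriminator setup described above (a simplified sketch, not Clark et al.'s actual implementation), the generator fills the masked positions with sampled tokens and the discriminator is then labeled on every position, not just the masked ones:

```python
import torch

def make_rtd_targets(input_ids, masked_positions, generator_ids):
    """Build replaced-token-detection inputs and labels for the discriminator.

    input_ids:        [B, L] original token ids
    masked_positions: [B, L] bool mask of positions handed to the generator
    generator_ids:    [B, L] the generator's sampled predictions
    Returns the corrupted sequence and 0/1 labels (1 = token was replaced).
    """
    corrupted = torch.where(masked_positions, generator_ids, input_ids)
    replaced = (corrupted != input_ids).long()  # labels cover all positions
    return corrupted, replaced
```

The discriminator is trained with a binary loss over all positions, which is where the sample-efficiency gain over MLM comes from.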
In addition to the pre-training models that encode token representations, sentence-level and passage-level representation learning has undergone rapid development in recent years. A surge of work demonstrates that contrastive learning is an effective framework for sentence- and passage-level representation learning (Meng et al., 2021; Wei et al., 2021; Gao et al., 2021; Li et al., 2021a). The common idea of contrastive learning is to pull together an anchor and a “positive” sample in the embedding space, and push the anchor apart from “negative” samples. Recently, COCO-LM (Meng et al., 2021) creates positive samples by masking and cropping tokens from sentences. Gao et al. (2021) demonstrate that constructing positive pairs with only standard dropout as minimal data augmentation works surprisingly well on the Natural Language Inference (NLI) task. Karpukhin et al. (2020) investigate the impact of different negative sampling strategies for passage representation learning on the tasks of passage retrieval and question answering. ANCE (Xiong et al., 2021) adopts approximate nearest neighbor negative contrastive learning, a mechanism that selects hard negatives globally from the entire corpus using an asynchronously updated Approximate Nearest Neighbor (ANN) index. Inspired by COCO-LM (Meng et al., 2021) and ANCE (Xiong et al., 2021), we construct positive samples by masking a span of words from a sentence, and construct cross-batch hard negative samples to enhance the discriminator in sentence-level pre-training.
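A minimal sketch of how such an asynchronously refreshed ANN index can supply cross-batch hard negatives, using faiss as an illustrative backend; the refresh schedule, retrieval depth, and filtering shown here are placeholders, not ANCE's or SentiWSP's exact settings:

```python
import faiss
import numpy as np

def build_index(corpus_embeddings: np.ndarray):
    """Build a flat inner-product index over corpus embeddings ([N, d] float32).
    In practice the index is rebuilt asynchronously from a recent encoder
    checkpoint, so retrieved negatives lag the current model slightly.
    """
    index = faiss.IndexFlatIP(corpus_embeddings.shape[1])
    index.add(corpus_embeddings)
    return index

def retrieve_hard_negatives(index, query_embeddings, positive_ids, k=8):
    """Return ids of the top-k most similar corpus entries per query,
    skipping each query's own positive so it is never used as a negative."""
    _, ids = index.search(query_embeddings, k + 1)
    return [[j for j in row if j != pos][:k]
            for row, pos in zip(ids.tolist(), positive_ids)]
```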
Pre-trained Models for Sentiment Analysis
In the field of sentiment analysis, BERT-PT (Xu et al., 2019) conducts post-training on corpora from the same domain as the downstream tasks to benefit aspect-level sentiment classification. SKEP (Tian et al., 2020) constructs three sentiment knowledge prediction objectives in order to learn a unified sentiment representation for multiple sentiment analysis tasks. SENTIX (Zhou