
Zero-shot stance detection based on cross-domain
feature enhancement by contrastive learning
Xuechen Zhao∗, Jiaying Zou∗, Zhong Zhang∗, Feng Xie∗, Bin Zhou∗†, Lei Tian∗
Abstract
Zero-shot stance detection is challenging because it requires
detecting, at inference time, the stance toward targets that
were never seen during training. The ability to learn
transferable target-invariant
features is critical for zero-shot stance detection. In this
work, we propose a stance detection approach that can effi-
ciently adapt to unseen targets, the core of which is to cap-
ture target-invariant syntactic expression patterns as trans-
ferable knowledge. Specifically, we first augment the data by
masking the topic words of sentences, and then feed the aug-
mented data to an unsupervised contrastive learning mod-
ule to capture transferable features. Then, to fit a specific
target, we encode the raw texts as target-specific features.
Finally, we adopt an attention mechanism, which combines
syntactic expression patterns with target-specific features to
obtain enhanced features for predicting the stance toward
previously unseen targets. Experiments demonstrate that our
model outperforms competitive baselines on four benchmark datasets.
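The first two steps described above (topic-word masking followed by contrastive learning) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `mask_topic_words` and `info_nce`, the `[MASK]` placeholder, the temperature value, and the assumption that topic words are supplied as a plain list are all hypothetical. The loss is the generic InfoNCE objective, a common choice for unsupervised contrastive learning, and may differ from the objective used in this work.

```python
import math
import re

MASK = "[MASK]"  # assumed placeholder token, not taken from the paper


def mask_topic_words(sentence, topic_words, mask=MASK):
    """Replace whole-word occurrences of topic words with a mask token.

    How the topic words are obtained is outside this sketch; they are
    assumed to be given.
    """
    if not topic_words:
        return sentence
    # Try longer phrases first so multi-word topics win over sub-words.
    ordered = sorted(topic_words, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in ordered) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(mask, sentence)


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss for one anchor embedding.

    The loss is small when the anchor is closer to its positive view
    than to every negative, and large otherwise.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    top = max(logits)
    log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_denominator - logits[0]


# Two made-up sentences with different targets share one pattern once masked.
print(mask_topic_words("Why should we ignore climate change?", ["climate change"]))
print(mask_topic_words("Why should we trust Trump?", ["trump"]))
```

After masking, both illustrative sentences reduce to a rhetorical question around a placeholder, which is the kind of target-independent expression pattern that a contrastive encoder could then be trained to represent.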
1 Introduction
The goal of stance detection is to automatically identify
the attitude or stance (e.g., Favor, Against, or Neutral)
expressed in a text towards a specific target or topic¹
[3, 23, 16]. Traditional target-specific stance detection
assumes that the training and testing data belong
to the same target [18]. However, due to the continuous
emergence of unseen targets, collecting data on all tar-
gets for training is infeasible in practice. Moreover, it is
expensive to obtain high-quality labels for a new target
[23]. Therefore, the study of zero-shot stance detection
for unseen targets goes beyond the target-specific task
and can help to predict stance more flexibly.
For the zero-shot stance detection task, some ex-
isting approaches try to improve the model’s predic-
tive ability for unseen targets by employing attention
mechanisms [1, 33] or fusing external knowledge [22].
∗ School of Computer, National University of Defense Technology,
Changsha, China. {zhaoxuechen, zoujiaying20, zhangzhong,
xiefeng, binzhou, leitian129}@nudt.edu.cn
† Corresponding author. Key Lab. of Software Engineering for
Complex Systems, Changsha, China.
1 In this paper, we use the terms target and topic
interchangeably.
However, directly transferring knowledge from a specific
target to an unseen target often yields limited predictive
performance, because the transferred knowledge is coupled
with target-specific features. Other approaches [2, 32] use
adversarial learning, guiding the model via discriminators
to learn target-invariant features, which may degrade
prediction performance when targets are unevenly distributed.
The approach in [19] captures target-invariant features by
identifying stance feature categories and applying supervised
contrastive learning, which gives the model better
generalization capability. However, the data must be tagged
with soft labels by pretext tasks, which increases model
complexity and introduces noise into the data.
Target-specific features are directly related to a specific
target, while target-invariant features are generic and
transferable, regardless of their targets. Consequently,
it is crucial to distinguish these two kinds of features when
predicting the stance of a text on unseen targets.
Both linguistics and psychology divide language into
two aspects of representation: 1) syntactic representation
and 2) semantic representation. The former reflects the
form of language, such as word morphology and sentence
structure, and is the external representation of language;
the latter comprises concepts and propositions, denoting the
meaning referred to by the form of language, and is the
abstract, internal representation of language [15]. Meanwhile,
Event-related Potentials (ERP) interaction theory [6] suggests
that text semantics is a fusion of syntactic and semantic
representations, where syntax and semantics interact to
jointly complete the process of sentence comprehension
and expression. As shown in Table 1, sentences with different
targets can use the same or similar syntactic expression
patterns. For example, although the targets of Example 1 and
Example 2 are distinct, both use a rhetorical question
expression pattern, so these syntactic expression patterns are
target-invariant. The target-invariant syntactic expression
patterns and the target-specific features jointly determine
the sentence’s meaning. Inspired by this, we
acquire syntactic expression patterns, which are natu-
rally target-invariant and have an important impact on
semantics. Furthermore, these syntactic expression pat-
arXiv:2210.03380v1 [cs.CL] 7 Oct 2022