model (Mikolov et al., 2013) to the sentence level and proposed Skip-Thoughts by using a sentence to predict its surrounding sentences in an unsupervised manner. InferSent (Conneau et al., 2017), on
the other hand, leveraged supervised learning to
train a general-purpose sentence encoder with BiL-
STM by taking advantage of natural language infer-
ence (NLI) datasets. Pre-trained language models
like BERT (Devlin et al., 2019) are widely used to
provide a single-vector representation for the given
sentence and demonstrate promising results across
a variety of NLP tasks. Inspired by InferSent,
Sentence-BERT (SBERT) (Reimers and Gurevych,
2019) produces general-purpose sentence embed-
dings by fine-tuning BERT on NLI datasets. How-
ever, as investigated by Li et al. (2020), sentence embeddings produced by pre-trained models suffer from anisotropy, which severely limits their expressiveness. They therefore proposed a post-processing step that maps sentence embeddings to an isotropic distribution, which largely alleviates the problem. Similarly, Su et al. (2021) proposed a whitening operation as a post-processing step to mitigate anisotropy. Gao et al. (2021), on the
other hand, proposed the SimCSE model by fine-tuning pre-trained sentence encoders on NLI datasets with a contrastive learning objective (Chen et al., 2020) using in-batch negatives (Henderson et al., 2017; Chen et al., 2017), improving performance while alleviating the anisotropy problem. Though
sentence encoders have achieved promising performance, the current way of utilising them for meaning comparison tasks has known drawbacks and could benefit from the fruitful developments in alignment components, which have been widely used to model sentence-pair relations.
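To make the post-processing remedies above concrete, the whitening operation can be sketched as follows. This is a minimal illustration in the spirit of Su et al. (2021) rather than their exact implementation; the function name, the optional dimensionality-reduction argument k, and the use of NumPy are our own choices.

import numpy as np

def whiten(embeddings, k=None):
    # Map sentence embeddings (an (n, d) array) towards an isotropic
    # distribution: centre them, then rotate and rescale with the inverse
    # square root of their covariance. k optionally keeps only the first
    # k whitened dimensions.
    mu = embeddings.mean(axis=0, keepdims=True)       # (1, d) mean vector
    cov = np.cov((embeddings - mu).T)                 # (d, d) covariance matrix
    u, s, _ = np.linalg.svd(cov)                      # SVD of the covariance
    w = u @ np.diag(1.0 / np.sqrt(s + 1e-9))          # whitening transformation
    if k is not None:
        w = w[:, :k]                                  # optional dimensionality reduction
    return (embeddings - mu) @ w                      # whitened embeddings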
2.2 Alignment in Sentence Pair Tasks
Researchers have been investigating sentence
meaning comparison for years. One widely used
method involves decomposing the sentence-level
comparison into comparisons at a lower level. MacCartney et al. (2008) aligned phrases based on their edit distance and applied the alignment to NLI tasks by averaging the alignment scores. Shan et al. (2009) decomposed the sentence-level similarity score into direct comparisons between events and content words based on WordNet (Miller, 1995). Sultan et al. (2014) proposed a complex alignment pipeline based on various linguistic features and predicted sentence-level semantic similarity from the proportion of aligned content words. The alignment between two syntactic trees is also used, along with other lexical and syntactic features, to determine whether two sentences are paraphrases with an SVM (Liang et al., 2016).
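Schematically, such a proportion-based score can be written as

    sim(S1, S2) = (n_a(S1) + n_a(S2)) / (n_c(S1) + n_c(S2)),

where n_a(.) counts the aligned content words in a sentence and n_c(.) counts all of its content words; this is our own rendering of the idea rather than the exact formulation of Sultan et al. (2014).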
Similar ideas have been combined with neural models to construct alignments based on the attention mechanism (Bahdanau et al., 2015); such models can be seen as learning soft alignments between words or phrases in two sentences. Pang et al. (2016)
proposed MatchPyramid, where a word-level alignment matrix is learned and convolutional networks are used to extract features for sentence-level classification. More fine-grained comparisons between words were introduced by PMWI (He and Lin, 2016) to better dissect meaning differences.
Wang et al. (2016) focused on both similar and dissimilar alignments by decomposing and composing lexical semantics over sentences. ESIM (Chen et al., 2017) further allowed richer interactions between tokens. These models have been further improved by incorporating context and structure information (Liu et al., 2019), as well as character-level information (Lan and Xu, 2018). More recently, pre-trained models have been exploited to provide contextualised representations for PMWI (Zhang et al., 2019a).
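Schematically, such an attention-based soft alignment can be derived from pairwise similarities between token representations. The following minimal sketch (our own illustration with dot-product scores and a softmax, not the exact formulation of any of the models above) assumes PyTorch:

import torch

def soft_alignment(x, y):
    # x: (m, d) token representations of the first sentence
    # y: (n, d) token representations of the second sentence
    # Returns an (m, n) matrix whose rows are attention distributions:
    # for each token in the first sentence, a soft alignment over the
    # tokens of the second sentence.
    scores = x @ y.t()                    # pairwise dot-product similarities
    return torch.softmax(scores, dim=-1)  # normalise over second-sentence tokens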
Instead of relying on soft alignments, other models treat phrase alignment as an auxiliary task for sentence-level semantic assessment (Arase and Tsujii, 2019, 2021), or embed the Hungarian algorithm into trainable end-to-end neural networks to obtain better-aligned parts (Xiao, 2020). Considering that pre-trained sentence encoders are often directly used to provide fixed embeddings for meaning comparison, in this work we propose to combine them with an alignment component at inference time so that they can be used with enhanced structure-awareness without re-training.
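Before turning to the details, the following minimal sketch illustrates what such an inference-time combination could look like under simplifying assumptions: spans are pre-segmented, a frozen encoder provides their embeddings, and each span is greedily aligned to its most similar counterpart by cosine similarity. The function and variable names are illustrative only; the actual components of our approach are described in Section 3.

import numpy as np

def span_alignment_score(spans_a, spans_b, encode):
    # spans_a, spans_b: lists of span strings extracted from the two sentences
    # encode: a frozen sentence encoder mapping a list of strings to an
    #         (n, d) array of embeddings (no re-training is involved)
    a = encode(spans_a)                                # (m, d) span embeddings
    b = encode(spans_b)                                # (n, d) span embeddings
    a = a / np.linalg.norm(a, axis=1, keepdims=True)   # unit-normalise rows
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    sim = a @ b.T                                      # (m, n) cosine similarities
    # Greedily align each span with its best match in the other sentence
    # and average the aligned similarities in both directions.
    return 0.5 * (sim.max(axis=1).mean() + sim.max(axis=0).mean())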
3 Our Approach
Instead of generating a single-vector representa-
tion for meaning comparison based on sentence
encoders, we propose to represent each sentence as a list of predicate-argument spans and to use sentence encoders to provide representations for these spans. The comparison between two sentences is then based on the alignment between their predicate-argument spans. As depicted in Figure 1, the approach can be considered a post-processing step and consists of the following main components: