its incorporation of slots in constructions that
may be filled by specific word types and its focus
on learning without an innate, universal grammar,
may be beneficial to understanding the learning
process of PLMs as their capabilities advance
further.
Many constructions present an interesting challenge for PLMs. In fact, recent work on challenge datasets (Ribeiro et al., 2020) has already started using what could be considered constructions, in an attempt to identify types of sentences that models struggle with and to point out potential directions for improvement. One of the central tenets of CxG is the relation between the form of a construction and its meaning; to put it in NLP terms, a model must learn to infer parts of a sentence's meaning from the patterns present in it, rather than from its individual words alone. We believe this to be an interesting challenge for future PLMs.
2.3 The English Comparative Correlative
The English comparative correlative (CC) is one of the most commonly studied constructions in linguistics, for several reasons. Firstly, it is a clear example of a linguistic phenomenon that is challenging to explain in the framework of generative grammar (Culicover and Jackendoff, 1999; Abeillé and Borsley, 2008), although there have been approaches following that school of thought (Den Dikken, 2005; Iwasaki and Radford, 2009). Secondly, it exhibits a range of interesting syntactic and semantic features, as detailed below. These reasons, we believe, also make the CC an ideal testbed for a first study that extends the current trend of probing for syntactic rules by developing methods for probing according to CxG.
The CC can take many different forms, some of
which are exemplified here:
(1) The more, the merrier.
(2) The longer the bake, the browner the colour.
(3) The more she practiced, the better she became.
Semantically, the CC consists of two clauses, where the second clause can be seen as the dependent variable for the independent variable specified in the first one (Goldberg, 2003). On the one hand, it can be read as a statement of a general cause-and-effect relationship, like a general conditional statement (e.g., (2) could be paraphrased as “If the bake is longer, the colour will be more brown”); on the other, it can be read as a temporal development in a comparative sentence (paraphrasing (3) as “She became better over time, and she practiced more over time”). Usage of the CC typically implies both readings at the same time. Syntactically, the CC is characterised in both clauses by an instance of “the” followed by an adverb or adjective in the comparative, formed either with “-er” for some adjectives and adverbs, with “more” for others, or with special forms like “better”. The comparative clauses that follow additionally allow the optional omission of future “will” and of “be”, as in (1). Crucially, “the” in this construction does not function as a determiner of noun phrases (Goldberg, 2003); rather, it has a function specific to the CC and has variously been called a “degree word” (Den Dikken, 2005) or “fixed material” (Hoffmann et al., 2019).
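The syntactic characterisation of the CC can be made concrete with a deliberately naive surface matcher. The sketch below is hypothetical and not part of the study: it approximates a comparative as “more”, “less”, “fewer”, one of a few irregular forms, or any word ending in “-er”, and it only checks that both halves of a sentence, split at the first comma, start with “the” followed by such a word. The names `looks_comparative` and `matches_cc_surface` are illustrative.

```python
import re

# Hypothetical heuristic for illustration only: a word "looks comparative"
# if it is one of a few irregular/analytic forms or ends in "-er".
# This over- and under-generates; it is not a part-of-speech tagger.
IRREGULAR_OR_ANALYTIC = {"more", "less", "fewer", "better", "worse", "further", "farther"}


def looks_comparative(word):
    w = word.lower()
    return w in IRREGULAR_OR_ANALYTIC or (len(w) > 3 and w.endswith("er"))


def matches_cc_surface(sentence):
    """Split at the first comma and require each half to start with
    "the" followed by a comparative-looking word."""
    halves = sentence.split(",", 1)
    if len(halves) != 2:
        return False  # no comma: not handled by this sketch
    for half in halves:
        tokens = re.findall(r"[A-Za-z]+", half)
        if len(tokens) < 2 or tokens[0].lower() != "the":
            return False
        if not looks_comparative(tokens[1]):
            return False
    return True


examples = [
    "The more, the merrier.",
    "The longer the bake, the browner the colour.",
    "The more she practiced, the better she became.",
    "The other day, the water was cold.",  # not a CC
    "The cat sat, the dog barked.",        # not a CC
]
for s in examples:
    print(matches_cc_surface(s), s)
```

The matcher accepts (1) to (3), but it also accepts the non-instance “The other day, the water was cold.”, because “other” and “water” end in “-er”, and it rejects CCs written without a comma. Matching the fixed parts on the surface is therefore not sufficient to identify the CC.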
3 Syntax
Our investigation of PLMs’ knowledge of the CC
is split into two parts. First, we probe for the PLMs’
knowledge of the syntactic aspects of the CC, to
determine if they recognise its structure. Then we
devise a test of their understanding of its semantic
aspects by investigating their ability to apply, in a
given context, information conveyed by a CC.
3.1 Probing Methods
As the first half of our analysis of PLMs’ knowledge of the CC, we investigate its syntactic aspects. Translated into probing questions, this means that we ask: can a PLM recognise an instance of the CC? Can it distinguish instances of the CC from similar-looking non-instances? Is it able to go beyond simple recognition of the CC’s fixed parts (“The COMP-ADJ/ADV, the ...”) and group all ways of completing the sentence that are instances of the CC separately from all those that are not? And, framing all of these questions in a syntactic probing framework: can we recover this distinguishing information from a PLM’s embeddings, using logistic regression as the probe?
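A probe of this kind can be sketched as binary logistic regression over sentence embeddings. The sketch below assumes that embeddings have already been extracted from a PLM as fixed-length vectors; the toy vectors, the helper names (`train_probe`, `accuracy`, `toy_embeddings`) and the hyperparameters are hypothetical, and gradient descent is written out by hand rather than taken from a library, so this is not the authors' implementation.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, b, x):
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_probe(X, y, lr=0.1, epochs=500, l2=0.0):
    """Binary logistic regression fitted by full-batch gradient descent.
    X: list of embedding vectors; y: labels (1 = CC instance, 0 = not)."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, b, xi)) - yi  # gradient of log-loss w.r.t. the logit
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(dim):
            w[j] -= lr * (grad_w[j] / n + l2 * w[j])
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, X, y):
    preds = [1 if sigmoid(score(w, b, xi)) >= 0.5 else 0 for xi in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


# Hypothetical toy "embeddings": two Gaussian clusters standing in for
# vectors of CC instances (label 1) and non-instances (label 0).
rng = random.Random(0)


def toy_embeddings(label, n, dim=4):
    centre = 1.0 if label == 1 else -1.0
    return [[(centre if j < 2 else 0.0) + rng.gauss(0.0, 0.2) for j in range(dim)]
            for _ in range(n)]


X_train = toy_embeddings(1, 20) + toy_embeddings(0, 20)
y_train = [1] * 20 + [0] * 20
X_test = toy_embeddings(1, 10) + toy_embeddings(0, 10)
y_test = [1] * 10 + [0] * 10

w, b = train_probe(X_train, y_train)
print("held-out accuracy:", accuracy(w, b, X_test, y_test))
```

With real data, the vectors would come from a PLM, the probe would be evaluated on held-out sentences as here, and a library classifier such as scikit-learn's `LogisticRegression` would normally replace the hand-written loop. Keeping the probe linear means that high accuracy indicates the distinction is linearly recoverable from the embeddings, not that the probe itself learned it.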
In recent years, minimal pairs have become the established way of testing a PLM’s syntactic knowledge (e.g., Warstadt et al., 2020; Demszky et al., 2021). For the CC, this would mean pairs of sentences that are indistinguishable except that one of them is an instance of the CC and the other is not, allowing us to perfectly separate a model’s knowledge of the CC from other confounding factors. While this is indeed possible for simpler syntactic phenomena such as verb-noun