2 Background
For the present task, we are provided with a list of sentences following a set of patterns, all of which have two slots for noun phrases. One such sentence might be: I don’t like beer, a special kind of drink. The pattern corresponding to this sentence would be: I don’t like [blank], a special kind of [blank].
Sentences are labeled according to whether the
taxonomic relation between the two nouns makes
sense. In sub-task 1, labels are binary; a sentence such as that shown above has a label of 1, while this sentence would have a label of 0: I like huskies, and dogs too. In sub-task 2, labels
are continuous, ranging from 1 to 7; these scores
are based on a seven-point Likert scale, judged by
humans via crowdsourcing. The same dataset is
presented in English, Italian and French. For sub-
task 1, the training and test sets consist of 5,838 and 14,556 sentences, respectively; for sub-task 2, the training and test sets consist of 524 and 1,009 sentences, respectively.
This dataset poses two challenges: (i) the test set is much larger than the training set, and (ii) the test set contains patterns and noun pairs not seen during training. The combination of these hampers the
ability of machine learning (ML) models trained
on the training set to generalize well to the test set.
Indeed, that is the aim of this task: to evaluate the
ability of language models to generalize to new
data when it comes to inferring taxonomies.
One way to conceptualize the PreTENS task is
to reformulate it as a taxonomy extraction task with
pattern classification and distributed word representations: for a given sentence, extract the noun pair and the pattern, then determine whether the taxonomic relation between the nouns matches the relations allowed by the pattern. This
formulation is motivated by previous work in taxon-
omy construction that relied on various approaches
ranging from pattern-based methods and syntactic
features to word embeddings (Huang et al.,2019;
Luu et al.,2016;Roller et al.,2018). As promising
as this approach sounds for PreTENS, it involves
manual labeling of the noun-pair taxonomic rela-
tions in the training set, as we are not allowed to
use resources such as WordNet (Fellbaum, 1998) or BabelNet (Navigli and Ponzetto, 2012).
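The extraction step of this reformulation can be sketched with simple template matching; the two patterns below come from the examples above, but the regexes, slot handling, and helper names are illustrative assumptions, not the task's actual pattern inventory:

```python
import re

# Illustrative slot-filling patterns (a real system would cover the full
# pattern inventory of the task); each maps a regex to its template.
PATTERNS = [
    (re.compile(r"^I don['’]t like (\w+), a special kind of (\w+)\.?$"),
     "I don’t like [blank], a special kind of [blank]."),
    (re.compile(r"^I like (\w+), and (\w+) too\.?$"),
     "I like [blank], and [blank] too."),
]

def extract(sentence):
    """Return (noun1, noun2, template) for the first matching pattern,
    or None if no known pattern matches the sentence."""
    for regex, template in PATTERNS:
        m = regex.match(sentence)
        if m:
            return m.group(1), m.group(2), template
    return None

print(extract("I don’t like beer, a special kind of drink."))
```

Once the noun pair and pattern are separated, the remaining work is deciding whether the pair's taxonomic relation is among those the pattern licenses, which is the part that would require the manual labeling mentioned above.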
A different approach is to tackle PreTENS as
a cross-over task between extraction of lexico-
semantic relations and commonsense validation.
There have been SemEval tasks to extract and iden-
tify taxonomic relationships between given terms
(SemEval-2016 task 13) (Bordea et al., 2016), and to validate sentences for commonsense (SemEval-2020 task 4, sub-task A) (Wang et al., 2020). The
aim of the commonsense validation task is to identify which of two natural-language statements with similar wordings makes sense.
In the SemEval-2016 task 13, approaches re-
lated to extracting hypernym-hyponym relations
to construct a taxonomy involved both pattern-
based methods and distributional methods. TAXI
relied on extracting Hearst-style lexico-syntactic
patterns by first crawling domain-specific corpora
based on the terminology of the target domain and
later using substring matching to extract candidate
hypernym-hyponym relations (Panchenko et al., 2016). Another team designed a semi-supervised
model based on the hypothesis that hypernyms may
be induced by adding a vector offset to the corresponding hyponym word embedding (Pocostales, 2016).
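The offset hypothesis can be sketched as follows; the toy 3-d embeddings, the training pairs, and the mean-offset estimate are invented for illustration and are not the actual model of Pocostales (2016):

```python
import math

# Toy 3-d embeddings, invented for illustration only.
emb = {
    "husky": (0.9, 0.1, 0.0),
    "beer":  (0.1, 0.8, 0.2),
    "dog":   (1.0, 0.2, 0.5),
    "drink": (0.2, 0.9, 0.7),
}

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

# Hypothesis: a single offset vector maps a hyponym embedding near its
# hypernym embedding. Estimate it as the mean offset over known
# (hyponym, hypernym) training pairs.
pairs = [("husky", "dog"), ("beer", "drink")]
offsets = [sub(emb[hyper], emb[hypo]) for hypo, hyper in pairs]
offset = tuple(sum(c) / len(offsets) for c in zip(*offsets))

def predict_hypernym(word):
    """Return the vocabulary word closest to emb[word] + offset."""
    target = add(emb[word], offset)
    return min((w for w in emb if w != word),
               key=lambda w: math.dist(emb[w], target))

print(predict_hypernym("husky"))  # → dog
```

The appeal of this formulation is that, once the offset is estimated, candidate hypernyms can be ranked by a single nearest-neighbor search in embedding space.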
Participants in the SemEval 2020 commonsense
validation task had an advantage over PreTENS
participants: they were allowed to integrate taxo-
nomic information from external resources such
as ConceptNet (Wang et al., 2020), which eased
the process of fine-tuning the language models on
the down-stream task. As an example, the CN-
HIT-IT.NLP team (Zhang et al., 2020) and ECNU-SenseMaker (Zhao et al., 2020) both used a variant of K-BERT (Liu et al., 2020a) with additional data; the former injects relevant triples from ConceptNet into the language model, while the latter also uses ConceptNet’s unstructured text to pre-train the language model. Other systems relied on ensembles of different language models such as RoBERTa and XLNet (Liu, 2020; Altiti et al., 2020).
In Section 3 we outline the architectures chosen to tackle the two sub-tasks of PreTENS. We
draw on previous work, as outlined above, and pro-
vide novel combinations of datasets and algorithms
to improve the performance of out-of-the-box lan-
guage models.
3 System Description
The systems we propose for both PreTENS sub-
tasks are based on language models. In sub-task 1
we use the ELECTRA (Efficiently Learning an
Encoder that Classifies Token Replacements Accurately) transformer (Clark et al., 2020), while in