NoisyAnnot@ Causal News Corpus 2022: Causality Detection using
Multiple Annotation Decisions
Quynh Anh Nguyen1,2, Arka Mitra2
1University of Milan 2ETH Zürich
quynguyen@ethz.ch, amitra@ethz.ch
Abstract
This paper describes the work submitted to the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2022). The work is associated with Subtask 1 of Shared Task 3, which aims to detect causality in a protest news corpus. The authors used different large language models with customized cross-entropy loss functions that exploit annotation information. The experiments showed that bert-base-uncased with refined cross-entropy outperformed the others, achieving an F1 score of 0.8501 on the Causal News Corpus dataset.
1 Introduction
A causal relationship in a sentence implies an underlying semantic dependency between the two main clauses. The clauses in such sentences are generally connected by markers, which can take different part-of-speech tags in the sentence. Moreover, the markers can be either explicit (e.g., "because", "due to") or implicit, so one cannot rely on regex- or dictionary-based systems. Thus, there is a need to investigate the context of the sentences. For the given task, we exploited different large language models that provide contextual representations of sentences to tackle causality detection.
Shared Task 3 in CASE 2022 (Tan et al., 2022a) aims at causality detection in a news corpus, which can be structured as a text classification problem with binary labels. Pre-trained transformer-based models (Vaswani et al., 2017) have shown success in tackling a wide range of NLP tasks, including text generation and text classification. The authors look into inter-annotator agreement and the number of annotators, and how these can be incorporated into the loss to improve the performance of the pre-trained models.
The main contributions of the paper are as follows:

1. Extensive experimentation with different large language models.

2. Incorporation of additional annotation information, i.e., inter-annotator agreement and the number of annotators, into the loss (a sketch of such a loss follows this list).
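To make the second contribution concrete, the following is a minimal sketch of a cross-entropy loss weighted by per-sample annotation information. The weighting scheme and all names here are illustrative assumptions rather than the exact formulation used in this paper; the actual loss functions are described in Section 4.

import torch
import torch.nn.functional as F

def agreement_weighted_ce(logits, labels, agreement, num_annotators):
    """Cross-entropy scaled by per-sample annotation information.

    logits:         (batch, 2) model outputs for the binary task
    labels:         (batch,)   gold labels in {0, 1}
    agreement:      (batch,)   fraction of annotators agreeing with the
                               gold label, in [0, 1]
    num_annotators: (batch,)   number of annotators per sample
    """
    per_sample = F.cross_entropy(logits, labels, reduction="none")
    # Illustrative weighting: trust samples with higher agreement and more
    # annotators; log1p damps the effect of the annotator count.
    weights = agreement * torch.log1p(num_annotators.float())
    return (weights * per_sample).mean()

Under such a scheme, a sentence on which all annotators agree contributes its full cross-entropy term, while a contested sentence is down-weighted.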
The remainder of the paper is organized as follows: Section 2 reviews the related work, Section 3 describes the dataset on which the work has been done, Section 4 discusses the methodology used in the paper, Section 5 discusses the results and provides an ablation of the various loss functions introduced, and finally, Section 6 concludes the paper and suggests future work.
2 Related Work
Multiple annotations on a single sample reduce the chance of incorrect labelling or of bias being incorporated into the dataset (Snow et al., 2008). Including multiple annotators, however, also leads to disagreement among the labels they provide. The final or gold annotation is then usually determined by majority voting (Sabou et al., 2014) or by using the label of an "expert" (Waseem and Hovy, 2016). There are also methodologies that do not use majority voting to select the "ground truth".
The Expectation-Maximization algorithm has been used to account for annotator error (Dawid and Skene, 1979); a minimal sketch of this approach is given at the end of this section. Entropy metrics have been developed to identify the performance of the annotators (Waterhouse, 2012; Hovy et al., 2013; Gordon et al., 2021). Multi-task learning has also been used to deal with disagreement in the labels (Fornaciari et al., 2021; Liu et al., 2019; Cohn and Specia, 2013; Davani et al., 2022). There are methods that include the annotation disagreement in the loss function for part-of-speech tagging (Plank et al., 2014; Prabhakaran et al., 2012) with SVM and perceptron models. The present work considers the inter-annotator agreement as well as the number of annotators in the loss function.
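For reference, the Dawid and Skene (1979) procedure mentioned above can be sketched as follows for binary labels. This is a minimal illustration rather than the method used in this paper; it assumes a complete annotation matrix (every annotator labels every item), and all names are illustrative.

import numpy as np

def dawid_skene(annotations, n_iter=50):
    """Minimal Dawid-Skene EM for binary labels.

    annotations: (n_items, n_annotators) array of 0/1 labels,
                 assumed complete (no missing annotations).
    Returns the posterior probability that each item's true label is 1.
    """
    eps = 1e-9
    # Initialize soft labels with the majority-vote proportions.
    q = annotations.mean(axis=1)  # P(true label = 1) per item

    for _ in range(n_iter):
        # M-step: per-annotator sensitivity and specificity.
        # sens[j] = P(annotator j says 1 | true label 1)
        # spec[j] = P(annotator j says 0 | true label 0)
        sens = (q[:, None] * annotations).sum(axis=0) / (q.sum() + eps)
        spec = ((1 - q)[:, None] * (1 - annotations)).sum(axis=0) / ((1 - q).sum() + eps)
        prior = q.mean()  # class prior P(true label = 1)

        # E-step: posterior over the true label given annotator reliabilities.
        log_p1 = np.log(prior + eps) + (
            annotations * np.log(sens + eps)
            + (1 - annotations) * np.log(1 - sens + eps)
        ).sum(axis=1)
        log_p0 = np.log(1 - prior + eps) + (
            annotations * np.log(1 - spec + eps)
            + (1 - annotations) * np.log(spec + eps)
        ).sum(axis=1)
        q = 1.0 / (1.0 + np.exp(log_p0 - log_p1))

    return q

Each iteration re-estimates per-annotator reliabilities from the current soft labels and then updates the posterior over true labels; majority voting is recovered as the initialization.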