Assessing Neural Referential Form Selectors on
a Realistic Multilingual Dataset
Guanyi Chen, Fahime Same, and Kees van Deemter
Department of Information and Computing Sciences, Utrecht University
Department of Linguistics, University of Cologne
g.chen@uu.nl, f.same@uni-koeln.de, c.j.vandeemter@uu.nl
Abstract
All previous work on Neural Referring Expression Generation (REG) uses WebNLG, an English dataset that has been shown to reflect a very limited range of referring expression (RE) use. To tackle this issue, we build a dataset based on the OntoNotes corpus that contains a broader range of RE use in both English and Chinese (a language that uses zero pronouns). We build neural Referential Form Selection (RFS) models accordingly, assess them on the dataset and conduct probing experiments. The experiments suggest that, compared to WebNLG, OntoNotes is better suited for assessing REG/RFS models. We compare English and Chinese RFS and confirm that in both languages BERT has the highest performance. Our results also suggest that, in line with linguistic theories, Chinese RFS depends more on discourse context than English RFS does.
1 Introduction
Referring Expression Generation (REG) In Context is a key task in the classic Natural Language Generation pipeline (Reiter and Dale, 2000; Gatt and Krahmer, 2018). Given a discourse whose referring expressions (REs) have yet to be realised, together with their intended referents, the aim is to develop an algorithm that generates all these REs.

Traditionally, REG In Context (hereafter REG) is a two-step process. In the first step, the Referential Form (RF) is determined, e.g. whether to use a proper name, a description, a demonstrative or a pronoun. This step is the focus of this work and will hereafter be called Referential Form Selection (RFS). In the second step, the content of the RE is determined. For example, to refer to Joe Biden, one needs to choose from options such as “the president” and “the 46th president of the US”.
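To make the two-step decomposition concrete, the following Python sketch shows one possible interface. The function names (select_form, select_content), the toy rule, and the toy lexicon are illustrative assumptions, not the models developed in this paper.

```python
from enum import Enum

class RF(Enum):
    """Possible referential forms (the English 4-way scheme)."""
    PROPER_NAME = "proper_name"
    DESCRIPTION = "description"
    DEMONSTRATIVE = "demonstrative"
    PRONOUN = "pronoun"

def select_form(referent: str, pre_context: str, post_context: str) -> RF:
    """Step 1 (RFS): choose a referential form for the referent in context.
    A real model is learned from data; this toy rule simply prefers a
    pronoun once the referent has already been mentioned."""
    return RF.PRONOUN if referent in pre_context else RF.PROPER_NAME

def select_content(referent: str, form: RF) -> str:
    """Step 2: choose the content (surface string) realising the chosen form."""
    toy_lexicon = {
        RF.PROPER_NAME: referent,
        RF.DESCRIPTION: "the president",
        RF.DEMONSTRATIVE: "that president",
        RF.PRONOUN: "he",
    }
    return toy_lexicon[form]

pre = "Joe Biden visited Utrecht."
form = select_form("Joe Biden", pre, "gave a speech.")
print(select_content("Joe Biden", form))  # -> "he"
```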
In recent years, many works on REG have started to use neural networks. For example, Castro Ferreira et al. (2018a); Cao and Cheung (2019); Cunha et al. (2020) have proposed to generate REs in an End2End manner, i.e., to tackle the selection of form and content simultaneously. Chen et al. (2021) used BERT (Devlin et al., 2019) to perform RFS. One commonality between these studies is that they were all tested on a benchmark dataset, namely WebNLG (Gardent et al., 2017; Castro Ferreira et al., 2018b).
However, Chen et al. (2021) and Same et al. (2022) found that WebNLG is not ideal for assessing REG/RFS algorithms because (1) it consists of rather formal texts that may not reflect everyday RE use; (2) its texts are very short and have a simple syntactic structure; and (3) most of its REs are first mentions. These limitations led to some unexpected results when they tested their RFS models on WebNLG. For example, advanced pre-trained models (i.e., BERT) performed worse than simpler models (i.e., a single-layer GRU (Cho et al., 2014)) without any pre-training. By probing¹ various RFS models, they found that though BERT encodes more linguistic information, which is crucial for RFS, it still performs worse than GRU. In this study, we are interested in how well each RFS model performs when tested on a dataset that addresses the above limitations – in what follows, we call this a “realistic” dataset, for short.
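Probing (see footnote 1) is typically operationalised as training a lightweight classifier on a model's frozen representations to predict some linguistic property. The following minimal sketch, using scikit-learn and placeholder arrays, illustrates the general idea; the representations and the probed property are hypothetical and not the exact setup of the cited work.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical frozen representations (e.g. from BERT or a GRU) and labels
# for a linguistic property we want to probe (e.g. "is first mention").
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 768))    # placeholder 768-d representations
y = rng.integers(0, 2, size=1000)   # placeholder binary property labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# A linear probe: if it predicts the property well, the representations
# are taken to encode that information.
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"probe accuracy: {probe.score(X_te, y_te):.2f}")
```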
Additionally, all the above studies were conducted on English only. It has been pointed out that speakers of East Asian languages (e.g. Chinese and Japanese) use REs differently from speakers of Western European languages (e.g. English and Dutch; Newnham (1971)). Theoretical linguists (Huang, 1984) have suggested that East Asian languages rely more heavily on context than Western European languages (see Chen (2022) for empirical testing and computational modelling). As a result, speakers of East Asian languages frequently use Zero Pronouns (ZPs), i.e. REs that contain no words and are resolved based merely on their context.² This poses two challenges for REG/RFS models: (1) they need to be better able to encode contextual information; and (2) they need to account for an additional RF (i.e. ZP). Therefore, we are curious to see how well each RFS model performs when tested on a language that has more RFs and relies more on context than English.

¹ Probing is an established method to analyse whether the latent representations of a model encode certain information.

² For example, consider a Chinese question corresponding to “Have you seen Bill?”. A Chinese speaker can reply with a sentence corresponding to just “saw”, in which the omitted subject and object are ZPs that refer to the speaker and to “Bill” respectively.
To answer the research questions above, we construct a “realistic” multilingual dataset of English and Chinese and try different model architectures, such as models with/without pre-trained word embeddings, and models incorporating BERT. We report the results and compare model behaviours on the English and Chinese subsets. The code used in this study is available at: https://github.com/a-quei/probe-neuralreg.
2 Referential Form Selection (RFS)
Using WebNLG, Castro Ferreira et al. (2018a) redefined the REG task in order to accommodate deep learning techniques. Subsequently, Chen et al. (2021) adapted the definition to fit the RFS task. The first step is to remove from each RE all information about the RF of that RE. Concretely, as shown in Table 1, Castro Ferreira et al. (2018a) first “delexicalised” each text in WebNLG by assigning a general entity tag to each entity and replacing all REs referring to that entity with that tag. In most cases, a tag is assigned to an entity by replacing whitespaces in its proper name with underscores, e.g. “Amatriciana sauce” becomes “Amatriciana_sauce”.

Text: Amatriciana sauce is made with Tomato. It is a traditional Italian sauce. Amatriciana is a sauce containing Tomato that comes from Italy.
Delexicalised Text: Amatriciana_sauce is made with Tomato. Amatriciana_sauce is a traditional Italy sauce. Amatriciana_sauce is a sauce containing Tomato that comes from Italy.

Table 1: An example from the WebNLG corpus. In the delexicalised text, every entity is highlighted.

For a target referent x^(r) (e.g. the second “Amatriciana_sauce” in Table 1), given the referent, its pre-context in the discourse x^(pre) (e.g. “Amatriciana_sauce is made with Tomato.”) and its post-context x^(post) (e.g. “is a traditional Italy sauce. Amatriciana_sauce is a sauce containing Tomato that comes from Italy.”), the RFS task is to decide the proper RF f̂ (e.g., pronoun).

EN  4-Way  Demonstrative, Description, Proper Name, Pronoun
    3-Way  Description, Proper Name, Pronoun
    2-Way  Non-pronominal, Pronominal
ZH  5-Way  Demonstrative, Description, Proper Name, Pronoun, ZP
    4-Way  Description, Proper Name, Pronoun, ZP
    3-Way  Non-pronominal, Pronoun, ZP
    2-Way  Overt Referring Expression, ZP

Table 2: Types of RF classification and possible classes. Demonstratives are grouped with descriptions in the 3-way EN and 4-way ZH classifications under the category Description. The category Non-pronominal contains proper names, descriptions, and demonstratives.
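To make the task input concrete, the sketch below assembles one RFS instance from the delexicalised example in Table 1. The data structure and field names are illustrative assumptions; the actual models and preprocessing are described later in the paper and in the released code.

```python
from dataclasses import dataclass

@dataclass
class RFSExample:
    """One RFS instance: a target referent with its context and gold RF."""
    pre_context: str   # x^(pre): text before the target mention
    referent: str      # x^(r): the (delexicalised) target referent
    post_context: str  # x^(post): text after the target mention
    gold_rf: str       # the referential form to be predicted

example = RFSExample(
    pre_context="Amatriciana_sauce is made with Tomato.",
    referent="Amatriciana_sauce",
    post_context=("is a traditional Italy sauce. Amatriciana_sauce is a "
                  "sauce containing Tomato that comes from Italy."),
    gold_rf="pronoun",  # the original RE here was "It"
)

# A classifier sees (pre_context, referent, post_context) and predicts gold_rf,
# e.g. one of the 4-way English or 5-way Chinese classes in Table 2.
print(example.gold_rf)
```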
3 Dataset Construction
To construct a realistic multilingual REG/RFS dataset, we used the Chinese and English portions of the OntoNotes dataset,³ whose contents come from six sources, namely broadcast news, newswires, broadcast conversations, telephone conversations, web blogs, and magazines. We call the resulting Chinese subset OntoNotes-ZH and the English subset OntoNotes-EN. In the following, we describe the construction process.

³ OntoNotes is licensed under the Linguistic Data Consortium: https://catalog.ldc.upenn.edu/LDC2013T19.
First, for each RE in OntoNotes, we used the 3 previous sentences as the pre-context and the 3 subsequent sentences as the post-context. Similar to Chen et al. (2021), we are interested in different RF classification tasks. For Chinese, for example, we not only have a 2-way classification task where models have to decide whether to use a ZP or an overt RE, but also a 5-way task where models have to choose from a more fine-grained list of possible RFs. Table 2 lists all categories in both OntoNotes-EN and OntoNotes-ZH. Using the constituency syntax tree of the sentence containing the target referent and the surface form of the target, we automatically annotated each RE with its RF category. For example, an RE is considered a demonstrative if it is annotated in the syntax tree as a noun phrase and its surface form contains a demonstrative determiner.
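As a rough illustration of this kind of rule-based annotation, the sketch below labels an RE from its constituent label and surface form. The categories follow Table 2, but the rules, word lists, and function name are simplified assumptions rather than the exact heuristics used to build the corpus.

```python
ENGLISH_PRONOUNS = {"he", "she", "it", "they", "him", "her", "them", "his", "its", "their"}
DEMONSTRATIVE_DETERMINERS = {"this", "that", "these", "those"}

def rf_category(constituent_label: str, surface_form: str, is_proper_name: bool) -> str:
    """Assign a referential-form category to one RE (English 4-way scheme).

    `constituent_label` is the label of the RE's node in the constituency
    tree (e.g. "NP"); `surface_form` is the RE's text; `is_proper_name`
    could come from NER or the head token's POS tag (an assumption here)."""
    tokens = surface_form.lower().split()
    if len(tokens) == 1 and tokens[0] in ENGLISH_PRONOUNS:
        return "pronoun"
    if constituent_label == "NP" and tokens and tokens[0] in DEMONSTRATIVE_DETERMINERS:
        return "demonstrative"
    if is_proper_name:
        return "proper name"
    return "description"

print(rf_category("NP", "that sauce", is_proper_name=False))        # demonstrative
print(rf_category("NP", "it", is_proper_name=False))                # pronoun
print(rf_category("NP", "Amatriciana sauce", is_proper_name=True))  # proper name
```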
Second, we excluded all coreferential chains consisting only of pronouns and ZPs. These pronominal chains consist mainly of first- and second-person referents, and we do not expect much variation in referential form in these cases. In other words, we only included the chains that have at least one overt non-pronominal RE.
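A minimal sketch of this filtering step, assuming each coreferential chain is a list of (surface form, RF category) pairs; the data structures are illustrative, not the corpus's actual format.

```python
# Each chain is a list of (surface_form, rf_category) mentions (toy data).
chains = [
    [("he", "pronoun"), ("him", "pronoun")],              # dropped: pronouns only
    [("Joe Biden", "proper name"), ("he", "pronoun")],    # kept: has an overt non-pronominal RE
    [("", "zp"), ("she", "pronoun")],                     # dropped: ZP + pronoun only
]

def has_overt_non_pronominal(chain):
    """Keep a chain only if at least one mention is an overt, non-pronominal RE."""
    return any(rf not in {"pronoun", "zp"} for _, rf in chain)

kept = [c for c in chains if has_overt_non_pronominal(c)]
print(len(kept))  # 1
```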
Third, we delexicalised the corpus following Castro Ferreira et al. (2018a). Additionally, since we used the Chinese BERT as one of our RFS models and it only accepts input shorter than 512 characters, we removed all samples in OntoNotes-ZH whose total length (calculated by removing all underscores introduced during delexicalisation and summing the length of the pre-context, post-context, and target referent) is longer than 512 characters. Experiments with models other than BERT on the original OntoNotes-ZH show that this does not bias the conclusions of this study (see Appendix A).
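A small sketch of this length check, following the calculation described above (strip the underscores added by delexicalisation, then sum the lengths of pre-context, target, and post-context). The 512-character limit matches the constraint stated for the Chinese BERT; the function names and sample format are assumptions.

```python
MAX_LEN = 512  # character limit stated for the Chinese BERT input

def sample_length(pre_context: str, referent: str, post_context: str) -> int:
    """Total character length after removing delexicalisation underscores."""
    return sum(len(part.replace("_", "")) for part in (pre_context, referent, post_context))

def keep_sample(pre_context: str, referent: str, post_context: str) -> bool:
    """Keep only samples whose total length does not exceed the limit."""
    return sample_length(pre_context, referent, post_context) <= MAX_LEN

print(keep_sample("Amatriciana_sauce is made with Tomato.", "Amatriciana_sauce", "is a sauce."))  # True
```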
Last, we split the whole dataset into a training set and a test set in accordance with the CoNLL 2012 Shared Task (Pradhan et al., 2012). Since ZPs in Chinese are only annotated in the training and development sets, following Chen and Ng (2016), Chen et al. (2018), and Yin et al. (2018), we used the development set as the test set and sampled 10% of the documents from the training set as the development data. Thus, we obtained OntoNotes-EN, where the training, development, and test sets contain 71667, 8149, and 7619 samples, respectively, and OntoNotes-ZH, where the training, development, and test sets contain 70428, 9217, and 11607 samples, respectively.
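For concreteness, the sketch below shows one way to carve a 10% document-level development split out of the training documents with a fixed random seed. The helper and document IDs are hypothetical; the overall train/test division itself follows the CoNLL-2012 document lists.

```python
import random

def dev_split(train_doc_ids, fraction=0.1, seed=42):
    """Sample a fraction of training documents (by document, not by RE) as dev data."""
    rng = random.Random(seed)
    dev_ids = set(rng.sample(sorted(train_doc_ids), k=max(1, int(len(train_doc_ids) * fraction))))
    new_train_ids = [d for d in train_doc_ids if d not in dev_ids]
    return new_train_ids, sorted(dev_ids)

train_ids, dev_ids = dev_split([f"doc{i:03d}" for i in range(50)])
print(len(train_ids), len(dev_ids))  # 45 5
```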
OntoNotes vs. WebNLG. Based on the nature of OntoNotes and the statistics in Table 3, we observe that: (1) the WebNLG data is all from DBPedia, while the OntoNotes data is multi-genre; (2) OntoNotes has a much smaller proportion of first mentions and proper names; and (3) the documents in OntoNotes are on average much longer than those in WebNLG.

                               WebNLG    O-EN    O-ZH
Percentage of First Mentions      85%     43%     43%
Percentage of Proper Names        71%     21%     15%
Average Number of Tokens        18.62  106.44  139.55

Table 3: Statistics of WebNLG and OntoNotes. O-EN and O-ZH stand for OntoNotes-EN and OntoNotes-ZH.
Another difference between WebNLG and OntoNotes lies in the ratio of seen to unseen entities in their test sets. Castro Ferreira et al. (2018b) divided the documents in the WebNLG test set into seen (where all the data come from the same domains as the training data) and unseen (where all the data come from different domains than the training data). Almost all referents from the seen test set appear in the training set (9580 out of 9644), while only a few referents from the unseen test set do (688 out of 9644).⁴ In OntoNotes, 38.44% and 41.45% of the referents in the test sets of OntoNotes-EN and OntoNotes-ZH, respectively, also appear in the training sets.
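The seen-referent ratios above are simple set overlaps; the sketch below shows the computation on toy referent sets, with made-up IDs rather than the corpus's actual entities.

```python
def seen_referent_ratio(train_referents, test_referents):
    """Fraction of distinct test referents that also occur in the training set."""
    train = set(train_referents)
    test = set(test_referents)
    return sum(1 for r in test if r in train) / len(test)

train = {"Joe_Biden", "Amatriciana_sauce", "Utrecht"}
test = {"Joe_Biden", "Cologne", "Amatriciana_sauce", "Bill"}
print(f"{seen_referent_ratio(train, test):.2%}")  # 50.00%
```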
Having said this, OntoNotes largely mitigates the problems of WebNLG discussed in §1. If OntoNotes is a “better” and more “representative” corpus for assessing REG/RFS models, we can expect more “expected” results: models with pre-training outperform those without, and models that learn more useful linguistic information outperform those that learn less. We will detail our expectations in §5.
4 Modelling RFS
We introduce how we represent entities and how
we adapt the RFS models of Chen et al. (2021).
4.1 Entity Representation
Unlike WebNLG, where 99.34% of the referents in the test set appear in the training set, the majority of the referents in the OntoNotes test sets do not appear in the training sets. This means that RFS models should be able to handle unseen referents, but mapping each entity to a general entity tag with underscores would prevent the models from doing so (Cao and Cheung, 2019; Cunha et al., 2020), because the entity tags of unseen entities are usually out-of-vocabulary (OOV) words. Additionally, when incorporating pre-trained word embeddings and language models, using entity tags prevents entity representations from benefiting from these pre-trained models (again because the entity tags of unseen entities are usually OOV words).
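The OOV problem can be seen with a toy word-level vocabulary: a fused entity tag is a single unknown token, whereas its individual words may well be in the vocabulary. The vocabulary and tag below are illustrative only.

```python
# Toy word-level vocabulary built from training data (illustrative).
vocab = {"<unk>": 0, "amatriciana": 1, "sauce": 2, "is": 3, "made": 4, "with": 5, "tomato": 6}

def lookup(token: str) -> int:
    """Map a token to its vocabulary id, falling back to <unk>."""
    return vocab.get(token.lower(), vocab["<unk>"])

entity_tag = "Amatriciana_sauce"                    # fused tag for an unseen entity
print(lookup(entity_tag))                           # 0 -> maps to <unk>, no usable embedding
print([lookup(w) for w in entity_tag.split("_")])   # [1, 2] -> the individual words are known
```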
Similar to Cunha et al. (2020), we replaced the underscores in general entity tags (e.g. “Amatriciana_sauce”) with whitespaces (henceforth, lexical tags, e.g. “Amatriciana sauce”). Arguably, there is a trade-off between using entity tags and using lexical tags. In contrast to lexical tags, the use of entity tags helps models identify mentions of the same entity in a discourse, which has been shown to be a crucial feature for RFS. However, using entity tags prevents models from dealing with unseen entities.
⁴ Chen et al. (2021) used only seen entities because the size of the underlying triples of the unseen test set differs from both the training set and the seen test set.