Parsing linearizations appreciate PoS tags - but some are fussy about errors

Alberto Muñoz-Ortiz1, Mark Anderson2, David Vilares1, Carlos Gómez-Rodríguez1
1Universidade da Coruña, CITIC, Spain
2PIN Caerdydd, Prifysgol Caerdydd, United Kingdom
alberto.munoz.ortiz@udc.es, andersonm8@caerdydd.ac.uk,
david.vilares@udc.es, carlos.gomez@udc.es
Abstract

PoS tags, once taken for granted as a useful resource for syntactic parsing, have become more situational with the popularization of deep learning. Recent work on the impact of PoS tags on graph- and transition-based parsers suggests that they are only useful when tagging accuracy is prohibitively high, or in low-resource scenarios. However, such an analysis is lacking for the emerging sequence labeling parsing paradigm, where it is especially relevant as some models explicitly use PoS tags for encoding and decoding. We undertake a study and uncover some trends. Among them, PoS tags are generally more useful for sequence labeling parsers than for other paradigms, but the impact of their accuracy is highly encoding-dependent, with the PoS-based head-selection encoding being best only when both tagging accuracy and resource availability are high.
1 Introduction
PoS tags have long been considered a useful feature for parsers, especially prior to the prevalence of neural networks (Voutilainen, 1998; Dalrymple, 2006; Alfared and Béchet, 2012). For neural parsers, it is less clear whether they are useful. Work has shown that when using word and character embeddings, PoS tags become much less useful (Ballesteros et al., 2015; de Lhoneux et al., 2017). However, Dozat et al. (2017) found using universal PoS (UPoS) tags to be somewhat helpful, although improvements are typically quite small (Smith et al., 2018). Similarly, for multi-task systems, small improvements have been observed for both UPoS and finer-grained tags (Zhang et al., 2020).
A limiting factor when using predicted PoS tags is the apparent need for very high accuracy from taggers (Anderson and Gómez-Rodríguez, 2020). This is particularly problematic in a low-resource setting, where using gold tags gives unreasonably high performance (Tiedemann, 2015) and high-accuracy taggers are difficult to obtain (Kann et al., 2020). However, some work has suggested that in a low-resource setting even low-accuracy taggers can be beneficial for parsing performance, especially when there are more PoS tag annotations than dependency tree annotations (Anderson et al., 2021).
These findings relate to transition-based (TB) and graph-based (GB) parsers, but recently several encodings have been proposed to frame dependency parsing as a sequence labeling task (Strzyz et al., 2019; Lacroix, 2019; Gómez-Rodríguez et al., 2020), providing an alternative to GB and TB models when efficiency is a priority (Anderson and Gómez-Rodríguez, 2021). Muñoz-Ortiz et al. (2021) found that the amount of data required by different encodings varied, and that some were impacted by predicted PoS tag use more than others.

Here, we evaluate the impact of PoS tagging accuracy on different encodings, as well as the interplay between this potential relation and the amount of available data (using low-, mid-, high-, and very-high-resource treebanks). We do this by artificially controlling the accuracy of PoS taggers, using the nature of errors generated by robust taggers.1
2 Sequence labeling parsing

In dependency parsing as sequence labeling, the goal is to assign a single label of the form (x_i, l_i) to every input token w_i of a sequence, where x_i encodes a subset of the arcs related to w_i and l_i is the dependency type. Below, we review the existing families of linearizations used in this work.
Head-selection (Spoustová and Spousta, 2010), where x_i encodes the head of w_i using an absolute index or a relative offset that can be based on some word property (usually PoS tags, which is also the property we use in this work due to its strong performance in previous work). So, for instance, if x_i = (+n, X), this would indicate that the head of w_i is the n-th word to the right of w_i with the word property X. Some desirable properties of this encoding family are a direct correspondence between words and arcs and the capacity to encode any non-projective tree. However, a major weakness is its dependence on the chosen property (in our case, PoS tags) to decode trees.

1 All source code available at https://www.grupolys.org/software/aacl2022/.

arXiv:2210.15219v1 [cs.CL] 27 Oct 2022
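To make the encoding and its fragility concrete, here is a minimal Python sketch of a PoS-based relative head-selection encoder/decoder. The function names, the (n, X) label format, and the ROOT/-1 conventions are our own illustrative choices, not the paper's released code:

```python
def encode_heads(heads, pos_tags):
    """PoS-based relative head-selection: the label (n, X) for word i means
    the head of word i is the n-th word with PoS tag X to the right (n > 0)
    or to the left (n < 0). `heads` uses 0 for root; positions are 1-based."""
    labels = []
    for i, h in enumerate(heads, start=1):
        if h == 0:
            labels.append((-1, "ROOT"))  # our convention for the root word
            continue
        x = pos_tags[h - 1]
        if h > i:   # head to the right: count matching tags in (i, h]
            n = sum(1 for j in range(i + 1, h + 1) if pos_tags[j - 1] == x)
        else:       # head to the left: count matching tags in [h, i)
            n = -sum(1 for j in range(h, i) if pos_tags[j - 1] == x)
        labels.append((n, x))
    return labels

def decode_heads(labels, pos_tags):
    """Inverse mapping. Decoding depends on the (possibly predicted) PoS
    tags: wrong tags can make a label undecodable (returned here as -1)."""
    heads = []
    for i, (n, x) in enumerate(labels, start=1):
        if x == "ROOT":
            heads.append(0)
            continue
        step = 1 if n > 0 else -1
        count, j = 0, i
        while True:
            j += step
            if not (1 <= j <= len(pos_tags)):
                heads.append(-1)  # no n-th word with tag X in that direction
                break
            if pos_tags[j - 1] == x:
                count += 1
                if count == abs(n):
                    heads.append(j)
                    break
    return heads
```

Decoding the same labels under corrupted tags can fail outright: once the tags the offsets index into are wrong, the head may be unrecoverable, which is the weakness discussed above.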
Bracketing-based, where x_i represents the dependency arcs using a string of brackets, with each arc represented by a bracket pair. Its main advantage is that it is independent of external features; however, regarding projectivity, it cannot represent arcs that cross in the same direction. To alleviate this, we use the encoding proposed by Strzyz et al. (2020), which adds a second independent plane of brackets (2pb), inspired by multiplanarity (Yli-Jyrä, 2003).
Transition-based (Gómez-Rodríguez et al., 2020), where, given a sequence of transitions generated by a left-to-right transition-based parser, the sequence is split into labels based on read transitions (e.g. SHIFT), such that each word receives a label x_i with a subset of transition actions. For this work, we consider mappings from a projective algorithm, arc-hybrid (ahtb; Kuhlmann et al., 2011), and a non-projective algorithm, Covington (ctb; Covington, 2001).
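A rough sketch of the splitting idea, under our own reading (the helper name is hypothetical, and the actual ahtb and ctb mappings differ in their transition inventories): each word's label collects its read action plus the following non-read actions, so concatenating the labels left to right recovers the original transition sequence.

```python
def transitions_to_labels(transitions, read_action="SHIFT"):
    """Split a left-to-right transition sequence into one label per word.
    Each label starts at a read action (e.g. SHIFT) and absorbs the
    non-read actions that follow it. Assumes the sequence starts with
    a read action, as arc-hybrid derivations do."""
    labels, current = [], None
    for t in transitions:
        if t == read_action:
            if current is not None:
                labels.append(current)
            current = [t]      # start the next word's label
        else:
            current.append(t)  # reduce actions attach to the current word
    if current is not None:
        labels.append(current)
    return labels
```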
2.1 Parser systems

We use a 2-layer bidirectional long short-term memory (biLSTM) network with a feed-forward network to predict the labels using softmaxes. We use hard-sharing multi-task learning to predict x_i and l_i.2 The inputs to the network are randomly initialized word embeddings, LSTM character embeddings, and optionally (see §4) PoS tag embeddings. The appendix specifies the hyperparameters. For a homogeneous comparison against work on the usefulness of PoS tags for transition- and graph-based models, and with a focus on efficiency, we do not use large language models.
3 Controlling PoS tag accuracy

We purposefully change the accuracy of the PoS tags in a treebank, effectively treating this accuracy as the independent variable in a controlled experiment and LAS as the dependent variable, i.e. LAS = f(Acc_PoS), where f is some function. Rather than randomly altering the gold label of PoS tags, we alter them based on the actual errors that PoS taggers make for a given treebank. This means PoS tags that are more likely to be incorrect for a given treebank will be more likely to be altered when changing the overall PoS accuracy of that treebank. We refer to this as the error rate for PoS tags. The incorrect label is also based on the most likely incorrect label for the PoS tag error for that treebank, based on the incorrect labeling from the tagger. We refer to this as the error type, e.g. NOUN→VERB.

2 We use a 2-task setup for all encodings, except 2pb, for which we use 3 tasks, as each plane is predicted independently.
We trained biLSTM taggers for each of the treebanks to get the error rates for each PoS tag type and the rate of each error type for each tag. Their generally high performances, even for the smaller treebanks, are shown in Table 5 in the Appendix. From the errors of these taggers, we first need the estimated probability that a given PoS tag t is tagged erroneously:

    p(error | t) = E_t / C_t    (1)

where E_t is the error count for tag t and C_t is the total count for tag t. Then we need the probability of applying an erroneous tag e to a ground-truth tag t:

    p(e | t, error) = E_te / E_t    (2)

where E_te is the error count when labeling t as e. This estimated probability remains fixed, whereas p(error | t) is adjusted to vary the overall accuracy.
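Both estimates can be derived directly from a tagger's confusion counts; a minimal sketch, with helper names of our own:

```python
from collections import Counter

def error_model(gold_tags, pred_tags):
    """Estimate p(error | t) = E_t / C_t and p(e | t, error) = E_te / E_t
    from a tagger's output on held-out data (Eqs. 1 and 2)."""
    C = Counter()    # C_t: total count of gold tag t
    E = Counter()    # E_t: errors made on gold tag t
    Ete = Counter()  # E_te: times gold tag t was mistagged as e
    for t, e in zip(gold_tags, pred_tags):
        C[t] += 1
        if e != t:
            E[t] += 1
            Ete[(t, e)] += 1
    p_error = {t: E[t] / C[t] for t in C}
    p_confusion = {(t, e): n / E[t] for (t, e), n in Ete.items()}
    return p_error, p_confusion
```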
We adjust these values by applying a weight, γ:

    γ = E_A / E    (3)

where E is the global error count and E_A is the adjusted global error count such that the resulting tagging error is A. p(error | t) is then adjusted:

    p(error | t) = γ E_t / C_t    (4)

It is possible that γ E_t > C_t. When this occurs for tag t, we cap γ E_t at C_t and then recalculate γ, removing the counts associated with this tag:

    γ = (E_A − C_t) / (E − C_t)    (5)

This is then done iteratively for each tag where γ E_t > C_t, until we obtain an error count for each tag such that the total error count reaches E_A.
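A sketch of this adjustment-with-capping loop, under our reading of Eqs. 3–5 (function and variable names are ours, not the released code):

```python
def adjusted_error_counts(Et, Ct, target_errors):
    """Scale per-tag error counts E_t by gamma = E_A / E so they sum to the
    target E_A, capping each tag's errors at its total count C_t and
    redistributing the excess over the remaining tags (Eqs. 3-5).
    Assumes target_errors does not exceed the total token count."""
    free = set(Et)       # tags whose scaled counts have not been capped yet
    capped = {}
    remaining = target_errors
    while True:
        total_free = sum(Et[t] for t in free)
        gamma = remaining / total_free
        overflow = [t for t in free if gamma * Et[t] > Ct[t]]
        if not overflow:
            break
        for t in overflow:
            capped[t] = Ct[t]      # cap at C_t ...
            remaining -= Ct[t]     # ... and remove its counts (Eq. 5)
            free.remove(t)
    return {**capped, **{t: gamma * Et[t] for t in free}}
```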
Treebank | Family | # Trees | # Tokens
LOW
Skolt Sami-Giellagas | Uralic (Sami) | 200 | 2 461
Guajajara-TuDeT | Tupian (Tupi-Guarani) | 284 | 2 052
Ligurian-GLT | IE (Romance) | 316 | 6 928
Bhojpuri-BHTB | IE (Indic) | 357 | 6 665
MID
Kiche-IU | Mayan | 1 435 | 10 013
Welsh-CCG | IE (Celtic) | 2 111 | 41 208
Armenian-ArmTDP | IE (Armenian) | 2 502 | 52 630
Vietnamese-VTB | Austro-Asiatic (Viet-Muong) | 3 000 | 43 754
HIGH
Basque-BDT | Basque | 8 993 | 121 443
Turkish-BOUN | Turkic (Southwestern) | 9 761 | 122 383
Bulgarian-BTB | IE (Slavic) | 11 138 | 146 159
Ancient Greek-Perseus | IE (Greek) | 13 919 | 202 989
V. HIGH
Norwegian-Bokmål | IE (Germanic) | 20 044 | 310 221
Korean-Kaist | Korean | 27 363 | 350 090
Persian-PerDT | IE (Iranian) | 29 107 | 501 776
Estonian-EDT | Uralic (Finnic) | 30 972 | 437 769
Table 1: Details of the treebanks used in this work.

These are all derived from and applied to the test set of the treebanks, as this is where we evaluate the impact of PoS tag errors. To further echo the erroneous nature of these taggers, when E_A ≤ E, only the subset of real errors is used when generating errors. When E_A > E, this subset of real errors is maintained and subtracted such that:

    p(error | t) = (γ − 1) E_t / (C_t − E_t)    (6)

and this is only applied to the tokens which were not erroneously tagged by the taggers.
For every eligible token, based on its tag t, an error is generated with probability p(error | t); if an error is to be generated, the erroneous tag is selected based on the distribution p(e | t, error).
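Combining the two distributions, this per-token corruption can be sketched as follows (a hypothetical helper; the paper's actual implementation may differ, e.g. in how it separates real from generated errors):

```python
import random

def corrupt_tags(tags, p_error, p_confusion, seed=0):
    """Replace each tag t with an erroneous tag e with probability
    p_error[t], drawing e from the confusion distribution p(e | t, error),
    given here as a dict keyed by (gold, erroneous) tag pairs."""
    rng = random.Random(seed)
    out = []
    for t in tags:
        if rng.random() < p_error.get(t, 0.0):
            candidates = {e: p for (g, e), p in p_confusion.items() if g == t}
            if candidates:
                es, ps = zip(*candidates.items())
                t = rng.choices(es, weights=ps, k=1)[0]
        out.append(t)
    return out
```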
This is also applied to the training and dev sets, as it seems better to use predicted tags when training (Anderson and Gómez-Rodríguez, 2020). There are differences in the distributions of PoS tags, and as the algorithm is based on the test data, it is at times not possible to obtain exactly E_A. We therefore allow a small variation of ±0.05 on E_A.
We then selected a set of PoS tag accuracies to test a range of values (75, 80, 85, 90, 95, 97.5, 100). We included the 97.5% accuracy to evaluate the findings of Anderson and Gómez-Rodríguez (2020), where they observed a severe increase in performance between high-scoring taggers and gold tags; otherwise, we use increments of 5%.
4 Experiments

We now present the experimental setup to determine how parsing scores evolve for the chosen linearizations when the tagging accuracy degrades. As evaluation metrics, we use Labeled (LAS) and Unlabeled Attachment Scores (UAS).

Figure 1: Average LAS across all treebanks against PoS tagging accuracies for different linearizations, compared to the no-tags baselines.
Data. Treebanks from Table 1 were selected using a number of criteria. We chose treebanks that are all from different language families and therefore exhibit a range of linguistic behaviors. We also selected treebanks such that we used 4 low-resource, 4 mid-resource, 4 high-resource, and 4 very-high-resource treebanks. Within each of those categories, we also selected treebanks with slightly different amounts of data, so as to obtain an incremental range of treebank sizes across the low, mid, high, and very high boundaries. Moreover, we ensured the quality of the treebanks by selecting treebanks that were either manually annotated in the UD framework or manually checked after automatic conversion. When a treebank did not contain a development set, we re-split the data by pooling the training and test data and splitting the full data such that 60% was allocated to the training set, 10% to the development set, and 30% to the test set.
Setup. We train and test parsers on sets of predicted tags, as explained in §3. We consider two baselines: (i) parsers trained without PoS tags3 (base-no-tags), and (ii) parsers trained with gold tags in a multi-task setup (base-mtl).
4.1 Results

Table 2 shows the average LAS scores across all treebank setups for all encodings and tagging accuracies, together with both baselines. To better interpret the results and tendencies, we also visualize the results in different figures.4

3 Forced setup for rph, as PoS tags are needed to decode.
4 UAS results are shown in Figures 3 and 4 in the Appendix.