Parsing linearizations appreciate PoS tags - but some are fussy about errors

Alberto Muñoz-Ortiz1, Mark Anderson2, David Vilares1, Carlos Gómez-Rodríguez1
1Universidade da Coruña, CITIC, Spain
2PIN Caerdydd, Prifysgol Caerdydd, United Kingdom
alberto.munoz.ortiz@udc.es, andersonm8@caerdydd.ac.uk,
david.vilares@udc.es, carlos.gomez@udc.es
Abstract

PoS tags, once taken for granted as a useful resource for syntactic parsing, have become more situational with the popularization of deep learning. Recent work on the impact of PoS tags on graph- and transition-based parsers suggests that they are only useful when tagging accuracy is prohibitively high, or in low-resource scenarios. However, such an analysis is lacking for the emerging sequence labeling parsing paradigm, where it is especially relevant as some models explicitly use PoS tags for encoding and decoding. We undertake a study and uncover some trends. Among them, PoS tags are generally more useful for sequence labeling parsers than for other paradigms, but the impact of their accuracy is highly encoding-dependent, with the PoS-based head-selection encoding being best only when both tagging accuracy and resource availability are high.
1 Introduction
PoS tags have long been considered a useful feature for parsers, especially prior to the prevalence of neural networks (Voutilainen, 1998; Dalrymple, 2006; Alfared and Béchet, 2012). For neural parsers, it is less clear whether they are useful. Work has shown that when using word and character embeddings, PoS tags become much less useful (Ballesteros et al., 2015; de Lhoneux et al., 2017). However, Dozat et al. (2017) found using universal PoS (UPoS) tags to be somewhat helpful, although improvements are typically quite small (Smith et al., 2018). Similarly, for multi-task systems, small improvements have been observed for both UPoS and finer-grained tags (Zhang et al., 2020).
A limiting factor when using predicted PoS tags is the apparent need for very high accuracy from taggers (Anderson and Gómez-Rodríguez, 2020). This is particularly problematic in a low-resource setting, where using gold tags gives unreasonably high performance (Tiedemann, 2015) and high-accuracy taggers are difficult to obtain (Kann et al., 2020). However, some work has suggested that in a low-resource setting even low-accuracy taggers can be beneficial for parsing performance, especially when there are more PoS tag annotations than dependency tree annotations (Anderson et al., 2021).
These findings relate to transition-based (TB) and graph-based (GB) parsers, but recently several encodings have been proposed to frame dependency parsing as a sequence labeling task (Strzyz et al., 2019; Lacroix, 2019; Gómez-Rodríguez et al., 2020), providing an alternative to GB and TB models when efficiency is a priority (Anderson and Gómez-Rodríguez, 2021). Muñoz-Ortiz et al. (2021) found that the amount of data required by different encodings varied, and that some were impacted by predicted PoS tag use more than others.

Here, we evaluate the impact of PoS tagging accuracy on different encodings, as well as the interplay between this potential relation and the amount of available data (using low-, mid-, high-, and very-high-resource treebanks). We do this by artificially controlling the accuracy of PoS taggers, using the nature of errors generated by robust taggers.1
2 Sequence labeling parsing

In dependency parsing as sequence labeling, the goal is to assign a single label of the form (x_i, l_i) to every input token w_i of a sequence, where x_i encodes a subset of the arcs related to w_i and l_i is the dependency type. Below, we review the existing families of linearizations used in this work.
Head-selection (Spoustová and Spousta, 2010), where x_i encodes the head of w_i using an absolute index or a relative offset that can be based on some word property (usually PoS tags, which is also the property we use in this work due to its strong performance in previous work). So, for instance, if x_i = (+n, X), this would indicate that the head of w_i is the n-th word to the right of w_i with the word property X. Some desirable properties of this encoding family are a direct correspondence between words and arcs and the capacity to encode any non-projective tree. However, a major weakness is its dependence on the chosen property (in our case, PoS tags) to decode trees.

1 All source code available at https://www.grupolys.org/software/aacl2022/.

arXiv:2210.15219v1 [cs.CL] 27 Oct 2022
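To make the encoding and its fragility concrete, here is a minimal Python sketch of a PoS-based relative head-selection encoder/decoder. The function names, the (n, X) label format, and the ROOT/-1 conventions are our own illustrative choices, not the paper's released code:

```python
def encode_heads(heads, pos_tags):
    """PoS-based relative head-selection: the label (n, X) for word i means
    the head of word i is the n-th word with PoS tag X to the right (n > 0)
    or to the left (n < 0). `heads` uses 0 for root; positions are 1-based."""
    labels = []
    for i, h in enumerate(heads, start=1):
        if h == 0:
            labels.append((-1, "ROOT"))  # our convention for the root word
            continue
        x = pos_tags[h - 1]
        if h > i:   # head to the right: count matching tags in (i, h]
            n = sum(1 for j in range(i + 1, h + 1) if pos_tags[j - 1] == x)
        else:       # head to the left: count matching tags in [h, i)
            n = -sum(1 for j in range(h, i) if pos_tags[j - 1] == x)
        labels.append((n, x))
    return labels

def decode_heads(labels, pos_tags):
    """Inverse mapping. Decoding depends on the (possibly predicted) PoS
    tags: wrong tags can make a label undecodable (returned here as -1)."""
    heads = []
    for i, (n, x) in enumerate(labels, start=1):
        if x == "ROOT":
            heads.append(0)
            continue
        step = 1 if n > 0 else -1
        count, j = 0, i
        while True:
            j += step
            if not (1 <= j <= len(pos_tags)):
                heads.append(-1)  # no n-th word with tag X in that direction
                break
            if pos_tags[j - 1] == x:
                count += 1
                if count == abs(n):
                    heads.append(j)
                    break
    return heads
```

Decoding the same labels under corrupted tags can fail outright: once the tags the offsets index into are wrong, the head may be unrecoverable, which is the weakness discussed above.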
Bracketing-based, where x_i represents the dependency arcs using a string of brackets, with each arc represented by a bracket pair. Its main advantage is that it is independent of external features; however, regarding projectivity, it cannot represent arcs that cross in the same direction. To alleviate this, we use the encoding proposed by Strzyz et al. (2020), which adds a second independent plane of brackets (2pb), inspired by multiplanarity (Yli-Jyrä, 2003).
Transition-based (Gómez-Rodríguez et al., 2020), where, given a sequence of transitions generated by a left-to-right transition-based parser, the sequence is split into labels based on read transitions (e.g. SHIFT), such that each word receives a label x_i with a subset of transition actions. For this work, we consider mappings from a projective algorithm, arc-hybrid (ahtb; Kuhlmann et al., 2011), and a non-projective algorithm, Covington (ctb; Covington, 2001).
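A rough sketch of the splitting idea, under our own reading (the helper name is hypothetical, and the actual ahtb and ctb mappings differ in their transition inventories): each word's label collects its read action plus the following non-read actions, so concatenating the labels left to right recovers the original transition sequence.

```python
def transitions_to_labels(transitions, read_action="SHIFT"):
    """Split a left-to-right transition sequence into one label per word.
    Each label starts at a read action (e.g. SHIFT) and absorbs the
    non-read actions that follow it. Assumes the sequence starts with
    a read action, as arc-hybrid derivations do."""
    labels, current = [], None
    for t in transitions:
        if t == read_action:
            if current is not None:
                labels.append(current)
            current = [t]      # start the next word's label
        else:
            current.append(t)  # reduce actions attach to the current word
    if current is not None:
        labels.append(current)
    return labels
```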
2.1 Parser systems

We use a 2-layer bidirectional long short-term memory (biLSTM) network with a feed-forward network to predict the labels using softmaxes. We use hard-sharing multi-task learning to predict x_i and l_i.2 The inputs to the network are randomly initialized word embeddings, LSTM character embeddings, and optionally (see §4) PoS tag embeddings. The appendix specifies the hyperparameters. For a homogeneous comparison against work on the usefulness of PoS tags for transition- and graph-based models, and with a focus on efficiency, we do not use large language models.
3 Controlling PoS tag accuracy

We purposefully change the accuracy of the PoS tags in a treebank, effectively treating this accuracy as the independent variable in a controlled experiment and LAS as the dependent variable, i.e. LAS = f(Acc_PoS), where f is some function. Rather than randomly altering the gold label of PoS tags, we alter them based on the actual errors that PoS taggers make for a given treebank. This means PoS tags that are more likely to be incorrect for a given treebank will be more likely to be altered when changing the overall PoS accuracy of that treebank. We refer to this as the error rate for PoS tags. The incorrect label is also based on the most likely incorrect label for the PoS tag error for that treebank, based on the incorrect labeling from the tagger. We refer to this as the error type, e.g. NOUN→VERB.

2 We use a 2-task setup for all encodings, except 2pb, for which we use 3 tasks, as each plane is predicted independently.
We trained biLSTM taggers for each of the treebanks to get the error rates for each PoS tag type and the rate of each error type for each tag. Their generally high performances, even for the smaller treebanks, are shown in Table 5 in the Appendix. From the errors of these taggers, we first need the estimated probability that a given PoS tag t is tagged erroneously:

    p(error | t) = E_t / C_t    (1)

where E_t is the error count for tag t and C_t is the total count for tag t. Then we need the probability of applying an erroneous tag e to a ground-truth tag t:

    p(e | t, error) = E_te / E_t    (2)

where E_te is the error count when labeling t as e. This estimated probability remains fixed, whereas p(error | t) is adjusted to vary the overall accuracy.
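Both estimates can be derived directly from a tagger's confusion counts; a minimal sketch, with helper names of our own:

```python
from collections import Counter

def error_model(gold_tags, pred_tags):
    """Estimate p(error | t) = E_t / C_t and p(e | t, error) = E_te / E_t
    from a tagger's output on held-out data (Eqs. 1 and 2)."""
    C = Counter()    # C_t: total count of gold tag t
    E = Counter()    # E_t: errors made on gold tag t
    Ete = Counter()  # E_te: times gold tag t was mistagged as e
    for t, e in zip(gold_tags, pred_tags):
        C[t] += 1
        if e != t:
            E[t] += 1
            Ete[(t, e)] += 1
    p_error = {t: E[t] / C[t] for t in C}
    p_confusion = {(t, e): n / E[t] for (t, e), n in Ete.items()}
    return p_error, p_confusion
```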
We adjust these values by applying a weight, γ:

    γ = E_A / E    (3)

where E is the global error count and E_A is the adjusted global error count such that the resulting tagging error is A. p(error | t) is then adjusted:

    p(error | t) = γ E_t / C_t    (4)

It is possible that γ E_t > C_t. When this occurs for tag t, we cap γ E_t at C_t and then recalculate γ, removing the counts associated with this tag:

    γ = (E_A − C_t) / (E − C_t)    (5)

This is then done iteratively for each tag where γ E_t > C_t, until we obtain an error count for each tag such that the total error count reaches E_A.
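A sketch of this adjustment-with-capping loop, under our reading of Eqs. 3–5 (function and variable names are ours, not the released code):

```python
def adjusted_error_counts(Et, Ct, target_errors):
    """Scale per-tag error counts E_t by gamma = E_A / E so they sum to the
    target E_A, capping each tag's errors at its total count C_t and
    redistributing the excess over the remaining tags (Eqs. 3-5).
    Assumes target_errors does not exceed the total token count."""
    free = set(Et)       # tags whose scaled counts have not been capped yet
    capped = {}
    remaining = target_errors
    while True:
        total_free = sum(Et[t] for t in free)
        gamma = remaining / total_free
        overflow = [t for t in free if gamma * Et[t] > Ct[t]]
        if not overflow:
            break
        for t in overflow:
            capped[t] = Ct[t]      # cap at C_t ...
            remaining -= Ct[t]     # ... and remove its counts (Eq. 5)
            free.remove(t)
    return {**capped, **{t: gamma * Et[t] for t in free}}
```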
Treebank | Family | # Trees | # Tokens
LOW
Skolt Sami-Giellagas | Uralic (Sami) | 200 | 2 461
Guajajara-TuDeT | Tupian (Tupi-Guarani) | 284 | 2 052
Ligurian-GLT | IE (Romance) | 316 | 6 928
Bhojpuri-BHTB | IE (Indic) | 357 | 6 665
MID
Kiche-IU | Mayan | 1 435 | 10 013
Welsh-CCG | IE (Celtic) | 2 111 | 41 208
Armenian-ArmTDP | IE (Armenian) | 2 502 | 52 630
Vietnamese-VTB | Austro-Asiatic (Viet-Muong) | 3 000 | 43 754
HIGH
Basque-BDT | Basque | 8 993 | 121 443
Turkish-BOUN | Turkic (Southwestern) | 9 761 | 122 383
Bulgarian-BTB | IE (Slavic) | 11 138 | 146 159
Ancient Greek-Perseus | IE (Greek) | 13 919 | 202 989
V. HIGH
Norwegian-Bokmål | IE (Germanic) | 20 044 | 310 221
Korean-Kaist | Korean | 27 363 | 350 090
Persian-PerDT | IE (Iranian) | 29 107 | 501 776
Estonian-EDT | Uralic (Finnic) | 30 972 | 437 769
Table 1: Details of the treebanks used in this work.

These are all derived from and applied to the test set of the treebanks, as this is where we evaluate the impact of PoS tag errors. To further echo the erroneous nature of these taggers, when E_A ≤ E, only the subset of real errors is used when generating errors. When E_A > E, this subset of real errors is maintained and subtracted such that:

    p(error | t) = (γ − 1) E_t / (C_t − E_t)    (6)

and this is only applied to the tokens which were not erroneously tagged by the taggers.
For every eligible token, based on its tag t, an error is generated with probability p(error | t); if an error is to be generated, the erroneous tag is selected based on the distribution p(e | t, error).
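Combining the two distributions, this per-token corruption can be sketched as follows (a hypothetical helper; the paper's actual implementation may differ, e.g. in how it separates real from generated errors):

```python
import random

def corrupt_tags(tags, p_error, p_confusion, seed=0):
    """Replace each tag t with an erroneous tag e with probability
    p_error[t], drawing e from the confusion distribution p(e | t, error),
    given here as a dict keyed by (gold, erroneous) tag pairs."""
    rng = random.Random(seed)
    out = []
    for t in tags:
        if rng.random() < p_error.get(t, 0.0):
            candidates = {e: p for (g, e), p in p_confusion.items() if g == t}
            if candidates:
                es, ps = zip(*candidates.items())
                t = rng.choices(es, weights=ps, k=1)[0]
        out.append(t)
    return out
```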
This is also applied to the training and dev sets, as it seems better to use predicted tags when training (Anderson and Gómez-Rodríguez, 2020). There are differences in the distributions of PoS tags, and as the algorithm is based on the test data, it is at times not possible to obtain exactly E_A. We therefore allow a small variation of ±0.05 on E_A.
We then selected a set of PoS tag accuracies to test a range of values (75, 80, 85, 90, 95, 97.5, 100). We included the 97.5% accuracy to evaluate the findings of Anderson and Gómez-Rodríguez (2020), where they observed a severe increase in performance between high-scoring taggers and gold tags; otherwise, we use increments of 5%.
4 Experiments

We now present the experimental setup to determine how parsing scores evolve for the chosen linearizations when the tagging accuracy degrades. As evaluation metrics, we use Labeled (LAS) and Unlabeled Attachment Scores (UAS).

Figure 1: Average LAS across all treebanks against PoS tagging accuracies for different linearizations, compared to the no-tags baselines.
Data. Treebanks from Table 1 were selected using a number of criteria. We chose treebanks that are all from different language families and therefore exhibit a range of linguistic behaviors. We also selected treebanks such that we used 4 low-resource, 4 mid-resource, 4 high-resource, and 4 very-high-resource treebanks. Within each of those categories, we also selected treebanks with slightly different amounts of data, so as to obtain an incremental range of treebank sizes across the low, mid, high, and very high boundaries. Moreover, we ensured the quality of the treebanks by selecting treebanks that were either manually annotated in the UD framework or manually checked after automatic conversion. When a treebank did not contain a development set, we re-split the data by pooling the training and test data and splitting the full data such that 60% was allocated to the training set, 10% to the development set, and 30% to the test set.
Setup. We train and test parsers on sets of predicted tags, as explained in §3. We consider two baselines: (i) parsers trained without PoS tags3 (base-no-tags), and (ii) parsers trained with gold tags in a multi-task setup (base-mtl).
4.1 Results

Table 2 shows the average LAS scores across all treebank setups for all encodings and tagging accuracies, together with both baselines. To better interpret the results and tendencies, we also visualize the results in different figures.4

3 Forced setup for rph, as PoS tags are needed to decode.
4 UAS results are shown in Figures 3 and 4 in the Appendix.