
Parsing linearizations appreciate PoS tags - but some are fussy about
errors
Alberto Muñoz-Ortiz1, Mark Anderson2, David Vilares1, Carlos Gómez-Rodríguez1
1Universidade da Coruña, CITIC, Spain
2PIN Caerdydd, Prifysgol Caerdydd, United Kingdom
alberto.munoz.ortiz@udc.es, andersonm8@caerdydd.ac.uk,
david.vilares@udc.es, carlos.gomez@udc.es
Abstract
PoS tags, once taken for granted as a useful resource for syntactic parsing, have become more situational with the popularization of deep learning. Recent work on the impact of PoS tags on graph- and transition-based parsers suggests that they are only useful when tagging accuracy is prohibitively high, or in low-resource scenarios. However, such an analysis is lacking for the emerging sequence labeling parsing paradigm, where it is especially relevant as some models explicitly use PoS tags for encoding and decoding. We undertake a study and uncover some trends. Among them, PoS tags are generally more useful for sequence labeling parsers than for other paradigms, but the impact of their accuracy is highly encoding-dependent, with the PoS-based head-selection encoding being best only when both tagging accuracy and resource availability are high.
1 Introduction
PoS tags have long been considered a useful feature for parsers, especially prior to the prevalence of neural networks (Voutilainen, 1998; Dalrymple, 2006; Alfared and Béchet, 2012). For neural parsers, it is less clear whether they are useful. Prior work has shown that when word and character embeddings are used, PoS tags become much less useful (Ballesteros et al., 2015; de Lhoneux et al., 2017). Dozat et al. (2017), however, found universal PoS (UPoS) tags to be somewhat helpful, although improvements are typically quite small (Smith et al., 2018). Similarly, for multi-task systems, small improvements have been observed for both UPoS and finer-grained tags (Zhang et al., 2020).
A limiting factor when using predicted PoS tags is the apparent need for very high accuracy from taggers (Anderson and Gómez-Rodríguez, 2020). This is particularly problematic in a low-resource setting, where using gold tags gives unreasonably high performance (Tiedemann, 2015) and high-accuracy taggers are difficult to obtain (Kann et al., 2020). However, some work has suggested that in a low-resource setting even low-accuracy taggers can be beneficial for parsing performance, especially when there are more PoS tag annotations than dependency tree annotations (Anderson et al., 2021).
These findings relate to transition-based (TB) and graph-based (GB) parsers, but recently several encodings have been proposed to frame dependency parsing as a sequence labeling task (Strzyz et al., 2019; Lacroix, 2019; Gómez-Rodríguez et al., 2020), providing an alternative to GB and TB models when efficiency is a priority (Anderson and Gómez-Rodríguez, 2021). Muñoz-Ortiz et al. (2021) found that the amount of data required varied across encodings and that some encodings were affected more than others by the use of predicted PoS tags.
Here, we evaluate the impact of PoS tagging accuracy on different encodings, and how this potential relation interacts with the amount of available data (using low-, mid-, high-, and very-high-resource treebanks). We do this by artificially controlling the accuracy of PoS taggers, using the nature of the errors generated by robust taggers.1
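To make the idea of controlled tagging accuracy concrete, the following is a minimal sketch, not the procedure used in this work. It assumes that an error profile can be estimated as a confusion count between gold tags and the output of a real tagger, and that gold tags are then corrupted by sampling from that profile until a target accuracy is reached. The function names `build_error_model` and `degrade_tags` and the toy tag sequences are hypothetical.

```python
import random
from collections import Counter, defaultdict


def build_error_model(gold_tags, predicted_tags):
    """Count, for each gold tag, which wrong tags a tagger produced for it."""
    errors = defaultdict(Counter)
    for gold, pred in zip(gold_tags, predicted_tags):
        if gold != pred:
            errors[gold][pred] += 1
    return errors


def degrade_tags(gold_tags, error_model, target_accuracy, seed=0):
    """Corrupt gold tags so that roughly target_accuracy of them stay correct.

    Assumption: wrong tags are sampled in proportion to the confusions
    observed in error_model, so errors resemble those of a real tagger.
    """
    rng = random.Random(seed)
    # Only tokens whose gold tag has an observed confusion can be corrupted.
    candidates = [i for i, tag in enumerate(gold_tags) if error_model.get(tag)]
    n_errors = min(round((1 - target_accuracy) * len(gold_tags)), len(candidates))
    noisy = list(gold_tags)
    for i in rng.sample(candidates, n_errors):
        confusions = error_model[gold_tags[i]]
        wrong_tags = list(confusions)
        weights = [confusions[tag] for tag in wrong_tags]
        noisy[i] = rng.choices(wrong_tags, weights=weights, k=1)[0]
    return noisy


# Toy data (hypothetical): gold tags and the output of an imperfect tagger.
gold = ["DET", "NOUN", "VERB", "DET", "NOUN", "ADJ", "NOUN", "VERB", "ADP", "NOUN"]
pred = ["DET", "VERB", "NOUN", "DET", "NOUN", "NOUN", "NOUN", "VERB", "ADP", "NOUN"]
model = build_error_model(gold, pred)
noisy = degrade_tags(gold, model, target_accuracy=0.8)
```

Because the error model only stores confusions where the predicted tag differs from the gold tag, every sampled replacement is a genuine error. The achieved accuracy can fall above the target when too few tokens have an observed confusion.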
2 Sequence labeling parsing
In dependency parsing as sequence labeling, the goal is to assign a single label of the form $(x_i, l_i)$ to every input token $w_i$ of a sequence, where $x_i$ encodes a subset of the arcs related to $w_i$ and $l_i$ is the dependency type. Below, we review the existing families of linearizations used in this work.
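As an illustration of the label format, the sketch below encodes a dependency tree as one label per token and decodes it back. It assumes the simplest possible choice for $x_i$, the signed positional distance from the token to its head, with a special `"ROOT"` value for the root token. The function names and the example sentence are hypothetical, and this is not the PoS-based variant studied in this work.

```python
def encode_relative_heads(heads, deprels):
    """Turn a dependency tree into one (x_i, l_i) label per token.

    heads[i] is the 1-based position of the head of token i + 1 (0 = root).
    Assumption: x_i is the signed distance from the token to its head.
    """
    labels = []
    for position, (head, deprel) in enumerate(zip(heads, deprels), start=1):
        x = "ROOT" if head == 0 else head - position
        labels.append((x, deprel))
    return labels


def decode_relative_heads(labels):
    """Recover the 1-based head positions from the per-token labels."""
    heads = []
    for position, (x, _) in enumerate(labels, start=1):
        heads.append(0 if x == "ROOT" else position + x)
    return heads


# "The dog barks": The <- dog (det), dog <- barks (nsubj), barks = root.
heads = [2, 3, 0]
deprels = ["det", "nsubj", "root"]
labels = encode_relative_heads(heads, deprels)
# labels == [(1, "det"), (1, "nsubj"), ("ROOT", "root")]
decoded = decode_relative_heads(labels)
```

Each token receives exactly one label, so any off-the-shelf sequence labeling model can in principle be trained to predict these labels. A real decoder would also need to handle ill-formed predictions such as offsets that point outside the sentence.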
Head-selection (Spoustová and Spousta, 2010), where $x_i$ encodes the head of $w_i$ using an absolute index or a relative offset, that can be based on some word property (usually PoS tags, which is also the property we use in this work due to its
1 All source code available at https://www.grupolys.org/software/aacl2022/.
arXiv:2210.15219v1 [cs.CL] 27 Oct 2022