
Autoregressive Structured Prediction with Language Models
Tianyu Liuζ   Yuchen Eleanor Jiangζ
Nicholas Monathγ   Ryan Cotterellζ   Mrinmaya Sachanζ
ζ ETH Zürich   γ Google Research
{tianyu.liu,yuchen.jiang}@inf.ethz.ch
nmonath@google.com {ryan.cotterell,mrinmaya.sachan}@inf.ethz.ch
Abstract
In recent years, NLP has moved towards the
application of language models to a more
diverse set of tasks. However, applying
language models to structured prediction, e.g.,
predicting parse trees, taggings, and coref-
erence chains, is not straightforward. Prior
work on language model-based structured
prediction typically flattens the target structure
into a string to easily fit it into the language
modeling framework. Such flattening limits
the accessibility of structural information and
can lead to inferior performance compared to
approaches that overtly model the structure.
In this work, we propose to construct a
conditional language model over sequences
of structure-building actions, rather than over
strings, in a way that makes it easier for the
model to pick up on intra-structure dependen-
cies. Our method sets the new state of the
art on named entity recognition, end-to-end
relation extraction, and coreference resolution.
https://github.com/lyutyuh/ASP
1 Introduction
Many common NLP tasks, e.g., named entity
recognition, relation extraction, and coreference
resolution are naturally taxonomized as structured
prediction, the supervised machine-learning task of
predicting a structure from a large¹ set. To general-
ize well to held-out data in a structured prediction
problem, the received wisdom has been that it
is necessary to correctly model complex depen-
dencies between different pieces of the structure.
However, a recent trend in structured prediction
for language has been to forgo explicitly modeling
such dependencies (Ma and Hovy, 2016; Lee et al.,
2017; He et al., 2017, inter alia), and, instead, to
apply an expressive black-box model, e.g., a neural
network, with the hope that the model picks up on
the dependencies without explicit instruction.
¹ Typically, large means exponential in the size of the input.
Framing structured prediction as conditional
language modeling is an increasingly common
black-box technique for building structured predic-
tors that has led to empirical success (Vinyals et al.,
2015; Raffel et al., 2020; Athiwaratkun et al., 2020;
De Cao et al., 2021; Paolini et al., 2021, inter alia).
The idea behind the framework is to encode the tar-
get structure as a string, flattening out the structure.
Then, one uses a conditional language model to
predict the flattened string encoding the structure.
For instance, Vinyals et al. (2015) flatten parse
trees into strings and predict the strings encoding
the flattened trees from the sentence with a machine
translation architecture. The hope is that the au-
toregressive nature of the language model allows it
to learn to model the intra-structure dependencies
and the necessary hard constraints that ensure the
model even produces well-formed structures. Ad-
ditionally, many modelers make use of pre-trained
language models (Lewis et al., 2020; Raffel et al.,
2020) to further improve performance.
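To make the string-flattening recipe concrete, the sketch below linearizes named-entity spans into a bracketed target string that a conditional language model could then be trained to emit token by token. The bracket notation and the helper flatten_ner are assumptions made for this illustration only; they are not the exact linearization used by Vinyals et al. (2015) or any other specific prior system.

    # Illustrative sketch only: flatten non-nested NER spans into a string
    # target for a seq2seq / conditional language model.  The bracket
    # encoding below is an assumption for this example, not a published one.

    def flatten_ner(tokens, entities):
        """Linearize non-nested entity spans into a flat target string.

        tokens:   list of words in the input sentence.
        entities: list of (start, end, label) spans, with exclusive end.
        """
        entities = sorted(entities)          # emit spans in sentence order
        pieces, i = [], 0
        for start, end, label in entities:
            pieces.extend(tokens[i:start])   # copy words before the span
            pieces.append("[ " + " ".join(tokens[start:end]) + " | " + label + " ]")
            i = end
        pieces.extend(tokens[i:])            # copy the remaining words
        return " ".join(pieces)

    tokens = ["Barack", "Obama", "visited", "Paris", "."]
    entities = [(0, 2, "PER"), (3, 4, "LOC")]
    print(flatten_ner(tokens, entities))
    # -> [ Barack Obama | PER ] visited [ Paris | LOC ] .

A conditional language model trained on such pairs must recover the span boundaries and labels purely from the surface form of the flattened string, which is exactly the indirection the next paragraph argues can hurt highly structured tasks.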
However, despite their empirical success, simply
hoping that a black-box approach correctly models
intricate intra-structure dependencies is often
insufficient for highly structured tasks (Paolini
et al., 2021, §1). Indeed, the act of flattening a
structured object into a string makes properly mod-
eling the intra-structure dependencies harder for
many tasks, e.g., those that involve nested spans or
long-distance dependencies. For instance, in coref-
erence resolution, a coreference link between two
mentions can stretch across thousands of words,
and a coreference chain can also contain over a
hundred mentions (Pradhan et al., 2012). Flatten-
ing such a large amount of structured information
into a string makes the task more difficult to model.
In this paper, we propose a simple framework
that augments a conditional language model with
explicit modeling of structure. Instead of modeling
strings that encode a flattened representation of
the target structure, we model a constrained set
of actions that build the target structure step by
step; see Fig. 1 for an example of our proposed