ASDOT: Any-Shot Data-to-Text Generation
with Pretrained Language Models
Jiannan Xiang1, Zhengzhong Liu1,3, Yucheng Zhou2, Eric P. Xing1,3,4, Zhiting Hu2
1Carnegie Mellon University, 2UC San Diego,
3Petuum Inc., 4Mohamed Bin Zayed University of Artificial Intelligence
{jiannanx,liu,epxing}@andrew.cmu.edu, {yuz172,zhh019}@ucsd.edu
Abstract

Data-to-text generation is challenging due to the great variety of the input data in terms of domains (e.g., finance vs. sports) or schemata (e.g., diverse predicates). Recent end-to-end neural methods thus require substantial training examples to learn to disambiguate and describe the data. Yet, real-world data-to-text problems often suffer from various data-scarce issues: one may have access to only a handful of or no training examples, and/or have to rely on examples in a different domain or schema. To fill this gap, we propose Any-Shot Data-to-Text (ASDOT), a new approach flexibly applicable to diverse settings by making efficient use of any given (or no) examples. ASDOT consists of two steps, data disambiguation and sentence fusion, both of which are amenable to being solved with off-the-shelf pretrained language models (LMs) with optional finetuning. In the data disambiguation stage, we employ the prompted GPT-3 model to understand possibly ambiguous triples from the input data and convert each into a short sentence with reduced ambiguity. The sentence fusion stage then uses an LM like T5 to fuse all the resulting sentences into a coherent paragraph as the final description. We evaluate extensively on various datasets in different scenarios, including the zero-/few-/full-shot settings, and generalization to unseen predicates and out-of-domain data. Experimental results show that ASDOT consistently achieves significant improvement over baselines, e.g., a 30.81 BLEU gain on the DART dataset under the zero-shot setting.¹

¹Code available at https://github.com/szxiangjn/any-shot-data2text
1 Introduction

Data-to-text generation (Kukich, 1983a; Reiter and Dale, 1997) aims at generating natural language text conditioned on structured data content such as tables and graphs. The task has a broad range of applications such as task-oriented dialog (Wen et al., 2015), weather forecasting (Goldberg et al., 1994; Sripada et al., 2003), sports news reporting (Wiseman et al., 2017), and biography generation (Lebret et al., 2016a; Wang et al., 2018).
The problem is challenging in practice due to the vast diversity of the input data in terms of the domains (e.g., finance vs. sports), schemata (e.g., the set of predicates, table structures), etc. The inherent ambiguity makes it particularly difficult to learn to understand and describe the data. For instance, in the triple <Fearless, time, 2008> from the music domain, the predicate word time means the release time of an album, while in <100 metres, time, 9.58> from sports it expresses the world record time. Recent approaches based on end-to-end neural models, e.g., by finetuning pretrained language models (LMs) (Puduppully et al., 2019a; Koncel-Kedziorski et al., 2019; Zhao et al., 2020), typically require massive training instances to resolve the ambiguity and are not applicable to many data-scarce scenarios.
In practice, a data-to-text problem of interest may have a varying number of training examples, ranging from a (small) set to only a few shots, or even no examples at all, and sometimes may rely on available examples out of the current domain to facilitate the generation. We refer to these diverse practical scenarios as the any-shot data-to-text problems. Recent work has studied data-to-text solutions when limited examples are available, but is often restricted to single specific settings. For instance, Chen et al. (2020b) and Su et al. (2021) focused on few-shot problems but fail to apply when no examples are accessible, while the zero-shot neural pipeline by Kasner and Dusek (2022) relies on human-crafted templates and thus could not handle out-of-domain data.

In this paper, we develop Any-Shot Data-to-Text (ASDOT), a new flexible approach that makes efficient use of any given (or no) examples and achieves stronger generation quality compared to
the prior specific methods. ASDOT draws inspiration from how humans describe data, namely by first disambiguating and understanding the data content, and then fusing and organizing the information together into text paragraphs. As a result, given input data (e.g., a table or graph), ASDOT consists of two intuitive steps, i.e., data disambiguation and sentence fusion. Importantly, each of the two steps is amenable to being solved with the appropriate off-the-shelf pretrained LMs with optional finetuning, enabling the unique flexibility of ASDOT in the presence of any-shot training examples. More specifically, in data disambiguation, which aims to understand each data entry (e.g., the triple <Fearless, time, 2008>), we use the prompted GPT-3 model (Brown et al., 2020), which has encoded rich commonsense and world knowledge, to convert the triple into a short sentence (Fearless was released in 2008) with greatly reduced ambiguity. The subsequent sentence fusion stage then uses another LM, such as T5 (Raffel et al., 2020), to combine all the resulting sentences into a coherent paragraph as the final description. The sentence fusion sub-task allows us to incorporate any available in-/out-of-domain training examples as well as existing large weakly-supervised corpora (Kasner and Dusek, 2022) to finetune the LM and boost the performance.
We evaluate the proposed approach in a wide range of practical any-shot scenarios, including (1) the zero-/few-/full-shot setting where we have access to a varying number of training examples, (2) the unseen-predicates setting where we describe data with new predicates never seen in the training examples, and (3) the out-of-domain setting where we are presented only with examples from other domains. Extensive experiments show that our approach consistently achieves significant gains over the diverse previous methods specifically designed for each of the different scenarios.
2 Related Work

Data-to-text (D2T) generation is a long-standing problem in natural language processing with broad applications in practice. Early research on this task focused on rule-based and pipeline approaches (Kukich, 1983b; Reiter and Dale, 1997), decomposing the task into text planning, sentence planning, and linguistic realisation. Recent work has developed various neural approaches. Lebret et al. (2016b) used a neural encoder-decoder for the task, followed by attention (Bahdanau et al., 2015), content selection (Puduppully et al., 2019a), entity modeling (Puduppully et al., 2019b), and style imitation (Lin et al., 2020) for further improved performance. Recent studies have also incorporated pretrained LMs (Kale and Rastogi, 2020b; Ribeiro et al., 2021; Clive et al., 2021). Although previous fully-supervised methods have achieved remarkable performance, most of them require a large amount of in-domain training examples, leading to limited applicability to the common low-data scenarios in practice.
There has been growing interest in zero-/few-shot data-to-text generation. Chen et al. (2020b) first formulated the few-shot setting and incorporated a pretrained model with a pointer generator as a solution. Chen et al. (2020a) developed a knowledge-grounded pretrained LM for both zero- and few-shot data-to-text generation. Gong et al. (2020) and Chen et al. (2020b) proposed to solve the few-shot task with content matching and prototype memory, respectively. There are also studies on combining templates and pretrained LMs for zero-/few-shot generation. For example, Kale and Rastogi (2020a) trained a neural model to rewrite templates for few-shot task-oriented dialogue. Heidari et al. (2021) applied the idea of template rewriting to build a practical few-shot data-to-text system. Most of the previous methods have each focused on a specific setting (e.g., either zero- or few-shot). In comparison, our work studies a wide spectrum of any-shot scenarios with a varying number of training examples from current or different domains. Of particular relevance to our work is the approach by Kasner and Dusek (2022), which performs zero-shot data-to-text generation by rephrasing given templates. However, that approach relies on human-written templates for data disambiguation and thus has limited applicability across wide domains. Besides, it involves several components (ordering, aggregation, compression) to fuse sentences, which restricts the use of any-shot examples for improvement. The approach was thus studied only in the zero-shot setting, whereas our work makes a comprehensive study of the diverse any-shot problems.
3 Any-Shot Data-to-Text Generation

We propose ASDOT for any-shot data-to-text generation. §3.1 describes the any-shot problems. We then provide an overview of our method (§3.2) and give details of each of the components (§3.3, §3.4). Figure 1 illustrates our method.
[Figure 1 here: an input graph of data triples (e.g., <Apollo 11, operator, NASA>; <Buzz Aldrin, birthPlace, Glen Ridge New Jersey>) is disambiguated by GPT-3 with prompts of the form "Table: Michael | birthPlace | USA / Text: Michael was born in the USA. ... Table: Apollo 11 | operator | NASA / Text:", producing short sentences that a pretrained LM (e.g., T5), optionally finetuned on a weakly-supervised corpus or any-shot examples, fuses into the final output paragraph.]

Figure 1: An overview of our method. Our approach consists of two core steps, i.e., data disambiguation (§3.3) and sentence fusion (§3.4). The approach first leverages a prompted GPT-3 to convert each data triple into short sentences with reduced ambiguity. The resulting sentences are then fused by a pretrained LM with optional finetuning using a public weakly-supervised corpus or available training examples.
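To make the prompt format in Figure 1 concrete, below is a minimal sketch of how such an in-context prompt could be assembled from a linearized triple. The demonstration pair is copied from the figure; the function and variable names are our own illustration, not the authors' released code.

```python
# Sketch: assemble a GPT-3 disambiguation prompt in the
# "Table: subject | predicate | object / Text: ..." format of Figure 1.
# The demonstration pair is taken from the figure; the helper names
# below are illustrative assumptions.

DEMONSTRATIONS = [
    ("Michael | birthPlace | USA", "Michael was born in the USA."),
    # ... further (table, text) demonstration pairs ...
]

def build_prompt(triple):
    """Linearize a (subject, predicate, object) triple and prepend the
    in-context demonstrations, leaving "Text:" open for the LM to fill."""
    subject, predicate, obj = triple
    lines = []
    for table, text in DEMONSTRATIONS:
        lines.extend([f"Table: {table}", f"Text: {text}"])
    lines.extend([f"Table: {subject} | {predicate} | {obj}", "Text:"])
    return "\n".join(lines)

print(build_prompt(("Apollo 11", "operator", "NASA")))
```

Appending the target triple with an open "Text:" slot lets the LM complete the short sentence by pattern-matching against the demonstrations.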
3.1 The Any-Shot Data-to-Text Problems

In the data-to-text generation task, we are given structured data (e.g., a table or graph) as input, which can be represented as a set of triples $\{x_1, x_2, \ldots, x_n\}$. Each triple $x_i = \langle s_i, p_i, o_i \rangle$, such as <Apollo 11, operator, NASA> in Figure 1, consists of a subject $s_i$, a predicate $p_i$, and an object $o_i$, where the predicate expresses a relation between the subject and the object. The goal of the task is to generate a paragraph consisting of a sequence of words $y = \{y_1, y_2, \ldots, y_m\}$ that describes the input data faithfully and fluently.
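As a minimal illustration of this setup (our own notation rather than code from the paper), a single instance could be represented as:

```python
# Illustrative representation of one data-to-text instance,
# using (subject, predicate, object) triples from Figure 1.
triples = [
    ("Apollo 11", "operator", "NASA"),
    ("Buzz Aldrin", "birthPlace", "Glen Ridge, New Jersey"),
]

# The goal is a word sequence y = (y_1, ..., y_m) describing all
# triples faithfully and fluently, e.g.:
reference = (
    "Buzz Aldrin was born in Glen Ridge, New Jersey. "
    "He was a crew member of Apollo 11, which was operated by NASA."
)
```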
Due to the vast diversity of content domains, data structures, predicate sets, etc., building a data-to-text solution often suffers from insufficient training examples for learning to understand/describe the target data. In practice, most often we are presented with a varying number of labeled examples, directly or remotely related to the target data. For instance, we may need to describe a table from a financial report on a new website, where we have no access to any labeled examples (i.e., zero-shot) or have access to only a few description examples (i.e., few-shot). Besides, the available examples may not even be in the financial domain (out-of-domain), or may use different table structures (different schemata) and different table headers (different predicates). We refer to data-to-text training in these various practical scenarios as the any-shot problem. It is highly desirable to develop a general approach that is widely applicable to the different settings.
3.2 Method Overview

Intuitively, a data-to-text generation process consists of two core steps, namely, (1) disambiguating and understanding the data triples, and (2) producing the text description. Previous neural approaches typically model the task in an end-to-end manner and require a large number of training examples to learn the data-to-text mapping. In contrast, we take advantage of the task structure by formulating the two stages explicitly and solving each with appropriate resources (e.g., pretrained LMs) that are readily available. Figure 1 offers an overview of the approach. Specifically, since each data triple is inherently ambiguous given the compact predicate words, rich commonsense and world knowledge is required to correctly understand the content. For instance, in <Apollo 11, operator, NASA>, a model would need knowledge to determine that NASA operates Apollo 11 rather than the other way around. Therefore, in the data disambiguation stage, we leverage a powerful LM (GPT-3 in our case) that contains massive implicit knowledge in its parameters, to convert each triple into a short sentence with reduced ambiguity (e.g., Apollo 11 is operated by NASA). Once we collect the set of short sentences, in the sentence fusion stage, we use another pretrained LM with optional finetuning to compose the sentences into a well-formed paragraph. This stage offers the flexibility to make use of any available training examples to boost performance.
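The following is a minimal sketch of how the two stages could be wired together, assuming a caller-supplied completion function for the prompted GPT-3-style model (reusing `build_prompt` from the earlier sketch) and an off-the-shelf T5 checkpoint loaded via Hugging Face transformers. The checkpoint and decoding settings are illustrative assumptions, not the paper's configuration; in the paper, the fusion LM may additionally be finetuned on weakly-supervised or any-shot examples.

```python
# Illustrative sketch of the two-stage ASDOT pipeline, not the
# authors' released implementation.
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Illustrative checkpoint choice; the paper's fusion LM is optionally finetuned.
tokenizer = T5Tokenizer.from_pretrained("t5-base")
fusion_model = T5ForConditionalGeneration.from_pretrained("t5-base")

def disambiguate(triple, complete):
    """Stage 1: query a prompted GPT-3-style model via the caller-supplied
    `complete` function (e.g., an API client) with the few-shot prompt for
    `triple`, returning one short sentence with reduced ambiguity."""
    return complete(build_prompt(triple)).strip()

def fuse(sentences):
    """Stage 2: fuse the short sentences into a coherent paragraph with T5."""
    inputs = tokenizer(" ".join(sentences), return_tensors="pt", truncation=True)
    output_ids = fusion_model.generate(**inputs, max_length=128, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Fusing the short sentences from Figure 1:
print(fuse([
    "Apollo 11 is operated by NASA.",
    "Buzz Aldrin was born in Glen Ridge, New Jersey.",
]))
```

Keeping fusion as a standalone sub-task is what allows any available in-/out-of-domain examples, or a weakly-supervised corpus, to be used for finetuning this second model independently of the disambiguation stage.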
3.3 Data Disambiguation

In this stage, the goal is to generate a short sentence to describe each data triple precisely. As above, a triple can be highly abstract and ambiguous as it