ASDOT: Any-Shot Data-to-Text Generation
with Pretrained Language Models
Jiannan Xiang1, Zhengzhong Liu1,3, Yucheng Zhou2, Eric P. Xing1,3,4, Zhiting Hu2
1Carnegie Mellon University, 2UC San Diego,
3Petuum Inc., 4Mohamed Bin Zayed University of Artificial Intelligence
{jiannanx,liu,epxing}@andrew.cmu.edu, {yuz172,zhh019}@ucsd.edu
Abstract

Data-to-text generation is challenging due to the great variety of the input data in terms of domains (e.g., finance vs. sports) or schemata (e.g., diverse predicates). Recent end-to-end neural methods thus require substantial training examples to learn to disambiguate and describe the data. Yet, real-world data-to-text problems often suffer from various data-scarce issues: one may have access to only a handful of or no training examples, and/or have to rely on examples in a different domain or schema. To fill this gap, we propose Any-Shot Data-to-Text (ASDOT), a new approach flexibly applicable to diverse settings by making efficient use of any given (or no) examples. ASDOT consists of two steps, data disambiguation and sentence fusion, both of which are amenable to being solved with off-the-shelf pretrained language models (LMs) with optional finetuning. In the data disambiguation stage, we employ the prompted GPT-3 model to understand possibly ambiguous triples from the input data and convert each into a short sentence with reduced ambiguity. The sentence fusion stage then uses an LM like T5 to fuse all the resulting sentences into a coherent paragraph as the final description. We evaluate extensively on various datasets in different scenarios, including the zero-/few-/full-shot settings, and generalization to unseen predicates and out-of-domain data. Experimental results show that ASDOT consistently achieves significant improvement over baselines, e.g., a 30.81 BLEU gain on the DART dataset under the zero-shot setting.¹

¹Code available at https://github.com/szxiangjn/any-shot-data2text
1 Introduction

Data-to-text generation (Kukich, 1983a; Reiter and Dale, 1997) aims at generating natural language text conditioned on structured data content such as tables and graphs. The task has a broad range of applications such as task-oriented dialog (Wen et al., 2015), weather forecasting (Goldberg et al., 1994; Sripada et al., 2003), sports news reporting (Wiseman et al., 2017), and biography generation (Lebret et al., 2016a; Wang et al., 2018).
The problem is challenging in practice due to the vast diversity of the input data in terms of the domains (e.g., finance vs. sports), schemata (e.g., the set of predicates, table structures), etc. The inherent ambiguity makes it particularly difficult to learn to understand and describe the data. For instance, in the triple <Fearless, time, 2008> from the music domain, the predicate word time means the release time of an album, while in <100 metres, time, 9.58> from sports it expresses the world record time. Recent approaches based on end-to-end neural models, e.g., by finetuning pretrained language models (LMs) (Puduppully et al., 2019a; Koncel-Kedziorski et al., 2019; Zhao et al., 2020), typically require massive training instances to resolve the ambiguity and are not applicable to many data-scarce scenarios.
In practice, a data-to-text problem of interest may have a varying number of training examples, ranging from a (small) set to only a few shots, or even no examples at all, and sometimes may rely on available examples out of the current domain to facilitate the generation. We refer to these diverse practical scenarios as the any-shot data-to-text problems. Recent work has studied data-to-text solutions when limited examples are available, but is often restricted to single specific settings. For instance, Chen et al. (2020b) and Su et al. (2021) focused on few-shot problems but fail to apply when no examples are accessible, while the zero-shot neural pipeline by Kasner and Dusek (2022) relies on human-crafted templates and thus could not handle out-of-domain data.

In this paper, we develop Any-Shot Data-to-Text (ASDOT), a new flexible approach that makes efficient use of any given (or no) examples and achieves stronger generation quality compared to
the prior specific methods. ASDOT draws inspiration from how humans describe data, namely by first disambiguating and understanding the data content, and then fusing and organizing the information together into text paragraphs. As a result, given input data (e.g., a table or graph), ASDOT consists of two intuitive steps, i.e., data disambiguation and sentence fusion. Importantly, each of the two steps is amenable to being solved with the appropriate off-the-shelf pretrained LMs with optional finetuning, enabling the unique flexibility of ASDOT in the presence of any-shot training examples. More specifically, in data disambiguation, which aims to understand each data entry (e.g., the triple <Fearless, time, 2008>), we use the prompted GPT-3 model (Brown et al., 2020), which has encoded rich commonsense and world knowledge, to convert the triple into a short sentence (Fearless was released in 2008) with greatly reduced ambiguity. The subsequent sentence fusion stage then uses another LM, such as T5 (Raffel et al., 2020), to combine all the resulting sentences into a coherent paragraph as the final description. The sentence fusion sub-task allows us to incorporate any available in-/out-of-domain training examples as well as existing large weakly-supervised corpora (Kasner and Dusek, 2022) to finetune the LM and boost the performance.
We evaluate the proposed approach in a wide range of practical any-shot scenarios, including (1) the zero-/few-/full-shot setting where we have access to a varying number of training examples, (2) the unseen-predicates setting where we describe data with new predicates never seen in the training examples, and (3) the out-of-domain setting where we are presented only with examples from other domains. Extensive experiments show that our approach consistently achieves significant gains over the diverse previous methods specifically designed for each of the different scenarios.
2 Related Work

Data-to-text (D2T) generation is a long-standing problem in natural language processing with broad applications in practice. Early research on this task focused on rule-based and pipeline approaches (Kukich, 1983b; Reiter and Dale, 1997), decomposing the task into text planning, sentence planning, and linguistic realisation. Recent work has developed various neural approaches. Lebret et al. (2016b) used a neural encoder-decoder for the task, followed by attention (Bahdanau et al., 2015), content selection (Puduppully et al., 2019a), entity modeling (Puduppully et al., 2019b), and style imitation (Lin et al., 2020) for further improved performance. Recent studies have also incorporated pretrained LMs (Kale and Rastogi, 2020b; Ribeiro et al., 2021; Clive et al., 2021). Although previous fully-supervised methods have achieved remarkable performance, most of them require a large amount of in-domain training examples, leading to limited applicability to the common low-data scenarios in practice.
There has been growing interest in zero-/few-shot data-to-text generation. Chen et al. (2020b) first formulated the few-shot setting and incorporated a pretrained model with a pointer generator as a solution. Chen et al. (2020a) developed a knowledge-grounded pretrained LM for both zero- and few-shot data-to-text generation. Gong et al. (2020) and Chen et al. (2020b) proposed to solve the few-shot task with content matching and prototype memory, respectively. There are also studies on combining templates and pretrained LMs for zero-/few-shot generation. For example, Kale and Rastogi (2020a) trained a neural model to rewrite templates for few-shot task-oriented dialogue. Heidari et al. (2021) applied the idea of template rewriting to build a practical few-shot data-to-text system. Most of the previous methods have each focused on a specific setting (e.g., either zero- or few-shot). In comparison, our work studies a wide spectrum of any-shot scenarios with a varying number of training examples from current or different domains. Of particular relevance to our work is the approach by Kasner and Dusek (2022), which performs zero-shot data-to-text generation by rephrasing given templates. However, that approach relies on human-written templates for data disambiguation and thus has limited applicability across wide domains. Besides, it involves several components (ordering, aggregation, compression) to fuse sentences, which restricts the use of any-shot examples for improvement. The approach was thus studied only in the zero-shot setting, whereas our work makes a comprehensive study of the diverse any-shot problems.
3 Any-Shot Data-to-Text Generation

We propose ASDOT for any-shot data-to-text generation. §3.1 describes the any-shot problems. We then provide an overview of our method (§3.2) and give details of each of the components (§3.3, §3.4). Figure 1 illustrates our method.
[Figure 1 here: an input graph of data triples (e.g., <Apollo 11, operator, NASA>; <Buzz Aldrin, birthPlace, Glen Ridge New Jersey>) is disambiguated by GPT-3 with prompts of the form "Table: Michael | birthPlace | USA / Text: Michael was born in the USA. ... Table: Apollo 11 | operator | NASA / Text:", producing short sentences that a pretrained LM (e.g., T5), optionally finetuned on a weakly-supervised corpus or any-shot examples, fuses into the final output paragraph.]

Figure 1: An overview of our method. Our approach consists of two core steps, i.e., data disambiguation (§3.3) and sentence fusion (§3.4). The approach first leverages a prompted GPT-3 to convert each data triple into short sentences with reduced ambiguity. The resulting sentences are then fused by a pretrained LM with optional finetuning using a public weakly-supervised corpus or available training examples.
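To make the prompt format in Figure 1 concrete, below is a minimal sketch of how such an in-context prompt could be assembled from a linearized triple. The demonstration pair is copied from the figure; the function and variable names are our own illustration, not the authors' released code.

```python
# Sketch: assemble a GPT-3 disambiguation prompt in the
# "Table: subject | predicate | object / Text: ..." format of Figure 1.
# The demonstration pair is taken from the figure; the helper names
# below are illustrative assumptions.

DEMONSTRATIONS = [
    ("Michael | birthPlace | USA", "Michael was born in the USA."),
    # ... further (table, text) demonstration pairs ...
]

def build_prompt(triple):
    """Linearize a (subject, predicate, object) triple and prepend the
    in-context demonstrations, leaving "Text:" open for the LM to fill."""
    subject, predicate, obj = triple
    lines = []
    for table, text in DEMONSTRATIONS:
        lines.extend([f"Table: {table}", f"Text: {text}"])
    lines.extend([f"Table: {subject} | {predicate} | {obj}", "Text:"])
    return "\n".join(lines)

print(build_prompt(("Apollo 11", "operator", "NASA")))
```

Appending the target triple with an open "Text:" slot lets the LM complete the short sentence by pattern-matching against the demonstrations.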
3.1 The Any-Shot Data-to-Text Problems

In the data-to-text generation task, we are given structured data (e.g., a table or graph) as input, which can be represented as a set of triples $\{x_1, x_2, \ldots, x_n\}$. Each triple $x_i = \langle s_i, p_i, o_i \rangle$, such as <Apollo 11, operator, NASA> in Figure 1, consists of a subject $s_i$, a predicate $p_i$, and an object $o_i$, where the predicate expresses a relation between the subject and the object. The goal of the task is to generate a paragraph consisting of a sequence of words $y = \{y_1, y_2, \ldots, y_m\}$ that describes the input data faithfully and fluently.
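As a minimal illustration of this setup (our own notation rather than code from the paper), a single instance could be represented as:

```python
# Illustrative representation of one data-to-text instance,
# using (subject, predicate, object) triples from Figure 1.
triples = [
    ("Apollo 11", "operator", "NASA"),
    ("Buzz Aldrin", "birthPlace", "Glen Ridge, New Jersey"),
]

# The goal is a word sequence y = (y_1, ..., y_m) describing all
# triples faithfully and fluently, e.g.:
reference = (
    "Buzz Aldrin was born in Glen Ridge, New Jersey. "
    "He was a crew member of Apollo 11, which was operated by NASA."
)
```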
Due to the vast diversity of content domains, data structures, predicate sets, etc., building a data-to-text solution often suffers from insufficient training examples for learning to understand/describe the target data. In practice, most often we are presented with a varying number of labeled examples, directly or remotely related to the target data. For instance, we may need to describe a table from a financial report on a new website, where we have no access to any labeled examples (i.e., zero-shot) or have access to only a few description examples (i.e., few-shot). Besides, the available examples may not even be in the financial domain (out-of-domain), or may use different table structures (different schemata) and different table headers (different predicates). We refer to data-to-text training in these various practical scenarios as the any-shot problem. It is highly desirable to develop a general approach that is widely applicable to the different settings.
3.2 Method Overview

Intuitively, a data-to-text generation process consists of two core steps, namely, (1) disambiguating and understanding the data triples, and (2) producing the text description. Previous neural approaches typically model the task in an end-to-end manner and require a large number of training examples to learn the data-to-text mapping. In contrast, we take advantage of the task structure by formulating the two stages explicitly and solving each with appropriate resources (e.g., pretrained LMs) that are readily available. Figure 1 offers an overview of the approach. Specifically, since each data triple is inherently ambiguous given the compact predicate words, rich commonsense and world knowledge is required to correctly understand the content. For instance, in <Apollo 11, operator, NASA>, a model would need knowledge to determine that NASA operates Apollo 11 rather than the other way around. Therefore, in the data disambiguation stage, we leverage a powerful LM (GPT-3 in our case) that contains massive implicit knowledge in its parameters, to convert each triple into a short sentence with reduced ambiguity (e.g., Apollo 11 is operated by NASA). Once we collect the set of short sentences, in the sentence fusion stage, we use another pretrained LM with optional finetuning to compose the sentences into a well-formed paragraph. This stage offers the flexibility to make use of any available training examples to boost performance.
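The following is a minimal sketch of how the two stages could be wired together, assuming a caller-supplied completion function for the prompted GPT-3-style model (reusing `build_prompt` from the earlier sketch) and an off-the-shelf T5 checkpoint loaded via Hugging Face transformers. The checkpoint and decoding settings are illustrative assumptions, not the paper's configuration; in the paper, the fusion LM may additionally be finetuned on weakly-supervised or any-shot examples.

```python
# Illustrative sketch of the two-stage ASDOT pipeline, not the
# authors' released implementation.
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Illustrative checkpoint choice; the paper's fusion LM is optionally finetuned.
tokenizer = T5Tokenizer.from_pretrained("t5-base")
fusion_model = T5ForConditionalGeneration.from_pretrained("t5-base")

def disambiguate(triple, complete):
    """Stage 1: query a prompted GPT-3-style model via the caller-supplied
    `complete` function (e.g., an API client) with the few-shot prompt for
    `triple`, returning one short sentence with reduced ambiguity."""
    return complete(build_prompt(triple)).strip()

def fuse(sentences):
    """Stage 2: fuse the short sentences into a coherent paragraph with T5."""
    inputs = tokenizer(" ".join(sentences), return_tensors="pt", truncation=True)
    output_ids = fusion_model.generate(**inputs, max_length=128, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Fusing the short sentences from Figure 1:
print(fuse([
    "Apollo 11 is operated by NASA.",
    "Buzz Aldrin was born in Glen Ridge, New Jersey.",
]))
```

Keeping fusion as a standalone sub-task is what allows any available in-/out-of-domain examples, or a weakly-supervised corpus, to be used for finetuning this second model independently of the disambiguation stage.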
3.3 Data Disambiguation

In this stage, the goal is to generate a short sentence to describe each data triple precisely. As above, a triple can be highly abstract and ambiguous as it