Towards Generalizable and Robust Text-to-SQL Parsing∗

Chang Gao1, Bowen Li2, Wenxuan Zhang2, Wai Lam1†, Binhua Li2, Fei Huang2, Luo Si2 and Yongbin Li2†

1The Chinese University of Hong Kong

2DAMO Academy, Alibaba Group

{gaochang,wlam}@se.cuhk.edu.hk, libowen.ne@gmail.com

{saike.zwx,binhua.lbh,shuide.lyb}@alibaba-inc.com

Abstract

Text-to-SQL parsing tackles the problem of mapping natural language questions to executable SQL queries. In practice, text-to-SQL parsers often encounter various challenging scenarios, requiring them to be generalizable and robust. While most existing work addresses a particular generalization or robustness challenge, we aim to study it in a more comprehensive manner. Specifically, we believe that text-to-SQL parsers should be (1) generalizable at three levels of generalization, namely i.i.d., zero-shot, and compositional, and (2) robust against input perturbations. To enhance these capabilities of the parser, we propose a novel TKK framework consisting of Task decomposition, Knowledge acquisition, and Knowledge composition to learn text-to-SQL parsing in stages. By dividing the learning process into multiple stages, our framework improves the parser's ability to acquire general SQL knowledge instead of capturing spurious patterns, making it more generalizable and robust. Experimental results under various generalization and robustness settings show that our framework is effective in all scenarios and achieves state-of-the-art performance on the Spider, SParC, and CoSQL datasets. Code can be found at https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/tkk.

1 Introduction

Text-to-SQL parsing aims to translate natural language questions to SQL queries that can be executed on databases to produce answers (Lin et al., 2020), which bridges the gap between expert programmers and ordinary users who are not proficient in writing SQL queries. Thus, it has drawn great attention in recent years (Zhong et al., 2017; Suhr et al., 2020; Scholak et al., 2021; Hui et al., 2022; Qin et al., 2022a,b).

∗ Work done when Chang Gao was an intern at Alibaba. The work described in this paper is substantially supported by a grant from the Research Grant Council of the Hong Kong Special Administrative Region, China (Project Code: 14204418).

† Corresponding authors.

Early work in this field (Zelle and Mooney, 1996; Yaghmazadeh et al., 2017; Iyer et al., 2017) mainly focuses on i.i.d. generalization. These studies use only a single database, and the exact same target SQL query may appear in both the training and test sets. However, it is difficult to collect sufficient training data to cover all the questions users may ask (Gu et al., 2021), and the predictions of test examples might be obtained by semantic matching instead of semantic parsing (Yu et al., 2018b), limiting the generalization ability of parsers. Subsequent work further focuses on generalizable text-to-SQL parsing in terms of two aspects: zero-shot generalization and compositional generalization. Zero-shot generalization requires the parser to generalize to unseen database schemas. Thanks to large-scale datasets such as Spider (Yu et al., 2018b), SParC (Yu et al., 2019b), and CoSQL (Yu et al., 2019a), zero-shot generalization has been the most popular setting for text-to-SQL parsing in recent years. Various methods involving designing graph-based encoders (Wang et al., 2020; Cao et al., 2021) and syntax tree decoders (Yu et al., 2018a; Rubin and Berant, 2021) have been developed to tackle this challenge. Compositional generalization is the desired ability to generalize to test examples consisting of novel combinations of components observed during training. Finegan-Dollak et al. (2018) explore compositional generalization in text-to-SQL parsing focusing on template-based query splits. Shaw et al. (2021) provide new splits of Spider considering length, query template, and query compound divergence to create challenging evaluations of compositional generalization.

Another challenge of conducting text-to-SQL parsing in practice is robustness. Existing text-to-SQL models have been found vulnerable to input perturbations (Deng et al., 2021; Gan et al., 2021a; Pi et al., 2022). For example, Gan et al. (2021a) replace schema-related words in natural language questions with manually selected synonyms and observe a dramatic performance drop. They propose two approaches, namely multi-annotation selection and adversarial training, to improve model robustness against synonym substitution.

arXiv:2210.12674v1 [cs.CL] 23 Oct 2022

[Figure 1: diagram with panels Task Decomposition, Knowledge Acquisition, and Knowledge Composition. The text-to-SQL parsing main task is decomposed into the SELECT, FROM, WHERE, GHOL, and SQL subtasks. Each model input is a task prompt of special tokens followed by "question ; {S} ; {C}", e.g. "[SELECT] What are the names of documents that have both one of the three most common types and one of three most common structures? ; {S} ; {C}" with output "[SELECT] document_name". The main-task prompt is "[SELECT] [FROM] [WHERE] [GROUP_BY] [HAVING] [ORDER_BY] [LIMIT] [SQL]", and its output corresponds to the SQL query "SELECT document_name FROM documents GROUP BY document_type_code ORDER BY count(*) desc LIMIT 3 INTERSECT SELECT document_name FROM documents GROUP BY document_structure_code ORDER BY count(*) desc LIMIT 3".]

Figure 1: Overview of our TKK framework. {S} and {C} denote the database schema and context, respectively.

Although specialized model architectures and training approaches have been proposed to address a particular generalization or robustness challenge, we believe that practical text-to-SQL parsers should be built with strong generalizability in terms of all three levels of generalization, namely i.i.d., zero-shot, and compositional, and robustness against input perturbations. To see how such capabilities can be obtained, note that humans often learn to write each clause, such as SELECT or WHERE, for a basic operation, before composing them to fulfill a more challenging goal, i.e., writing the entire SQL query. In contrast, most existing methods adopt a one-stage learning paradigm, i.e., learning to write each SQL clause and the dependency between different clauses simultaneously. This may lead the model to capture spurious patterns between the question, database schema, and SQL query instead of learning general SQL knowledge.

To this end, we propose a novel framework consisting of three learning stages, Task decomposition, Knowledge acquisition, and Knowledge composition (TKK), for text-to-SQL parsing, which mimics the human learning procedure to learn to handle the task in stages. Specifically, in the task decomposition stage, TKK decomposes the original task into several subtasks. Each subtask corresponds to mapping the natural language question to one or more clauses of the SQL query, as shown in the top portion of Figure 1. Afterwards, TKK features a prompt-based learning strategy to separately acquire the knowledge of subtasks and employ the learned knowledge to tackle the main task, i.e., generating the entire SQL query. In the knowledge acquisition stage, TKK trains the model with all the subtasks in a multi-task learning manner; in the knowledge composition stage, TKK fine-tunes the model with the main task to combine the acquired knowledge of subtasks and learn the dependency between them.

The advantages of our three-stage framework over previous one-stage learning methods are three-fold: (1) it reduces the difficulty of model learning by dividing the learning process into multiple easier-to-learn stages; (2) it explicitly forces the model to learn the alignment between the question, database schema, and each SQL clause as it needs to identify the intent expressed in the question based on the schema to generate a specific clause; (3) by explicitly constructing the training data for each subtask, it is easier for the model to learn the knowledge required to translate the question into each SQL clause. These advantages help the model to learn general SQL knowledge rather than some dataset-specific patterns, making it more generalizable and robust.

To verify the effectiveness of our framework, we conduct comprehensive evaluations on representative benchmarks covering all three levels of generalization and robustness scenarios with pretrained sequence-to-sequence models. Experimental results and analysis show that: (1) we achieve state-of-the-art performance on the Spider, SParC, and CoSQL datasets; (2) our method outperforms vanilla sequence-to-sequence models in all scenarios; (3) our framework significantly improves the model's ability to generate complex SQL queries; (4) our framework is also effective in the low-resource setting.

2 Background

Notations

We use the lowercase letter $q$ to denote a natural language question and denote its corresponding database schema, context, and SQL query as $s_q$, $c_q$, and $l_q$, respectively. We represent the set of training examples $(q, s_q, c_q, l_q)$ as $\mathcal{D}_{train}$ and the test set as $\mathcal{D}_{test}$. A perturbed test set $\mathcal{D}'_{test}$ could be constructed by perturbations to questions such as synonym substitution to form $(q', s_q, c_q, l_q)$. We denote $\mathcal{S}_{train}$ as the set of database schemas of $\mathcal{D}_{train}$, $\mathcal{L}_{train}$ as the set of SQL queries of $\mathcal{D}_{train}$, and $\mathcal{Q}_{test}$ as the set of questions of $\mathcal{D}_{test}$.

Problem Definition

Given $(q, s_q, c_q)$, where the database schema $s_q$ consists of tables and columns, and context $c_q$ is the interaction history consisting of previous questions and system clarification in the multi-turn setting or empty in the single-turn setting, the goal is to generate the correct SQL query $l_q$.

Generalization and Robustness

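Following Gu et al. (2021) and Wang et al. (2022b), we formalize three levels of generalization and robustness as follows:

Zero-shot generalization: $\forall q \in \mathcal{Q}_{test}$, $s_q \notin \mathcal{S}_{train}$.

Compositional generalization: $\forall q \in \mathcal{Q}_{test}$, $s_q \in \mathcal{S}_{train}$, $l_q \notin \mathcal{L}_{train}$.

I.I.D. generalization: $\forall q \in \mathcal{Q}_{test}$, $s_q \in \mathcal{S}_{train}$; $\mathcal{D}_{train}$ and $\mathcal{D}_{test}$ follow the same distribution.

Robustness: training with $\mathcal{D}_{train}$ but adopting $\mathcal{D}'_{test}$ instead of $\mathcal{D}_{test}$ for evaluation.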
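The four settings above can be read as set-membership checks over the training and test examples. The following is a minimal sketch under stated assumptions, not code from the paper: the `Example` class, the function names, and the toy data are hypothetical, schemas are compared by a plain identifier, and SQL queries are compared by exact string match. The "same distribution" condition of the i.i.d. setting cannot be verified from sets alone, so the function only reports a candidate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    question: str  # q
    schema: str    # s_q, represented here by a database identifier (assumption)
    context: str   # c_q, empty in the single-turn setting
    sql: str       # l_q


def generalization_level(train, test):
    """Return which set-based definition the (train, test) pair satisfies."""
    s_train = {ex.schema for ex in train}  # schemas seen in training
    l_train = {ex.sql for ex in train}     # SQL queries seen in training
    if all(ex.schema not in s_train for ex in test):
        return "zero-shot"
    if all(ex.schema in s_train for ex in test):
        if all(ex.sql not in l_train for ex in test):
            return "compositional"
        # Seen schemas and possibly seen queries: i.i.d. only if the two
        # sets also follow the same distribution, which is not checked here.
        return "i.i.d. candidate"
    return "mixed"


def perturb(test, synonyms):
    """Build a perturbed test set by word-level synonym substitution.

    Only the question changes; schema, context, and SQL stay the same.
    """
    perturbed = []
    for ex in test:
        words = [synonyms.get(w.lower(), w) for w in ex.question.split()]
        perturbed.append(Example(" ".join(words), ex.schema, ex.context, ex.sql))
    return perturbed


train = [Example("how many singers", "concert", "", "SELECT count(*) FROM singer")]
unseen_db = [Example("list all dogs", "pets", "", "SELECT name FROM dogs")]
unseen_sql = [Example("list singer names", "concert", "", "SELECT name FROM singer")]
print(generalization_level(train, unseen_db))   # zero-shot
print(generalization_level(train, unseen_sql))  # compositional
```

Robustness evaluation then amounts to scoring a parser trained on `train` against `perturb(test, synonyms)` instead of `test`; the synonym dictionary is something the evaluator has to supply.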

3 Our TKK Framework

TKK consists of three learning stages: task decomposition, knowledge acquisition, and knowledge composition. In this section, we first introduce each stage in detail. Then we describe the training and inference of TKK.

3.1 Three Stages of TKK

Task Decomposition

As shown in Figure 1, we decompose the text-to-SQL parsing task into five subtasks, namely SELECT, FROM, WHERE, GHOL, and SQL. Basically, a subtask aims to translate the natural language question to one or more clauses of the SQL query. For example, the GHOL subtask aims to generate the GROUP_BY, HAVING, ORDER_BY, and LIMIT clauses given the question and its corresponding database schema and context. For queries involving set operators such as INTERSECT, UNION, and EXCEPT to combine two SQL queries, we treat the first query as usual and the second query as the SQL clause of the first query. The SQL subtask aims to map the question to the SQL clause.
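The decomposition described above can be sketched as a function that turns one gold SQL query into one target string per subtask. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, the regular-expression splitting assumes flat queries without nested subqueries, the spacing between a special token and its clause body is a formatting choice, and emitting a bare `[SQL]` token when no set operator is present is an assumption made by analogy with the empty `[WHERE]` output visible in Figure 1.

```python
import re

CLAUSES = ["SELECT", "FROM", "WHERE", "GROUP BY", "HAVING", "ORDER BY", "LIMIT"]
SUBTASKS = {
    "SELECT": ["SELECT"],
    "FROM": ["FROM"],
    "WHERE": ["WHERE"],
    "GHOL": ["GROUP BY", "HAVING", "ORDER BY", "LIMIT"],
}
_CLAUSE_RE = re.compile(
    r"\b(SELECT|FROM|WHERE|GROUP\s+BY|HAVING|ORDER\s+BY|LIMIT)\b", re.IGNORECASE
)
_SET_OP_RE = re.compile(r"\b(INTERSECT|UNION|EXCEPT)\b", re.IGNORECASE)


def token(keyword):
    """Special token for a SQL keyword, e.g. 'GROUP BY' -> '[GROUP_BY]'."""
    return "[" + keyword.replace(" ", "_") + "]"


def split_clauses(query):
    """Map each clause keyword found in a flat query to its body."""
    parts = _CLAUSE_RE.split(query)  # [prefix, kw1, body1, kw2, body2, ...]
    bodies = {}
    for kw, body in zip(parts[1::2], parts[2::2]):
        bodies[" ".join(kw.upper().split())] = body.strip()
    return bodies


def render(bodies, keywords, keep_empty):
    """Serialize clauses as '[KEYWORD] body'; absent ones are kept as a bare
    token when keep_empty is True and skipped otherwise."""
    pieces = []
    for kw in keywords:
        if kw in bodies:
            pieces.append(token(kw) + " " + bodies[kw])
        elif keep_empty:
            pieces.append(token(kw))
    return " ".join(pieces)


def decompose(query):
    """Return one target string per subtask for a gold SQL query."""
    first, *rest = _SET_OP_RE.split(query, maxsplit=1)
    bodies = split_clauses(first)
    targets = {
        name: render(bodies, kws, keep_empty=True) for name, kws in SUBTASKS.items()
    }
    if rest:
        # The second query becomes the SQL clause of the first query.
        op, second = rest
        second_text = render(split_clauses(second), CLAUSES, keep_empty=False)
        targets["SQL"] = token(op.upper()) + " " + second_text
    else:
        targets["SQL"] = "[SQL]"  # assumption: format for an absent SQL clause
    return targets
```

Applied to the query in Figure 1, `decompose` yields `[SELECT] document_name`, `[FROM] documents`, an empty `[WHERE]`, a GHOL target holding the four grouped clauses, and an SQL target that starts with `[INTERSECT]` followed by the second query.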

There are two considerations behind constructing a subtask: (1) the number of classification examples; (2) the dependency between different clauses. First, according to the SQL syntax, every SQL query has the SELECT and FROM clauses. However, clauses such as GROUP_BY and ORDER_BY appear only in relatively complicated SQL queries. This implies that the number of these clauses is much smaller than that of the SELECT or FROM clause. Naively treating the generation of each clause as a subtask is problematic. If a specific clause does not exist, the generation task degenerates to a classification task because the model only needs to judge its existence. We refer to these examples as classification examples. Too many classification examples are harmful to model learning. Second, the GROUP_BY and HAVING clauses are usually bundled together, which is also the case for the ORDER_BY and LIMIT clauses. The ORDER_BY clause is often dependent on the GROUP_BY clause if they appear in a SQL query simultaneously. Based on the above observations, combining these clauses to construct a single subtask is more appropriate. We do not further decompose the SQL clause because doing so would create more subtasks, and most training examples of these subtasks would be classification examples.
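The effect of grouping on the share of classification examples can be illustrated with a small count. The sketch below is only an illustration under stated assumptions: the four queries are made up for this example and are not statistics from the paper, the helper names are hypothetical, and clause detection uses a simple keyword search that assumes flat queries.

```python
import re

GHOL = ("GROUP BY", "HAVING", "ORDER BY", "LIMIT")


def has_clause(query, keyword):
    """True if the keyword occurs in the query (flat queries assumed)."""
    pattern = r"\b" + keyword.replace(" ", r"\s+") + r"\b"
    return re.search(pattern, query, flags=re.IGNORECASE) is not None


def classification_fraction(queries, keywords):
    """Fraction of queries in which none of the given clauses appears,
    i.e. the share of classification examples for that subtask."""
    empty = sum(1 for q in queries if not any(has_clause(q, k) for k in keywords))
    return empty / len(queries)


# Hypothetical toy queries, for illustration only.
queries = [
    "SELECT name FROM singer",
    "SELECT name FROM singer WHERE age > 30",
    "SELECT country, count(*) FROM singer GROUP BY country",
    "SELECT name FROM singer ORDER BY age DESC LIMIT 1",
]

for kw in GHOL:
    # One subtask per clause: 0.75, 1.0, 0.75, 0.75 on this toy set.
    print(kw, classification_fraction(queries, (kw,)))

# One grouped GHOL subtask: 0.5 on this toy set.
print("GHOL", classification_fraction(queries, GHOL))
```

On this toy set, every per-clause subtask would consist mostly of classification examples, while the grouped GHOL subtask has a non-empty target for half of the queries, which matches the direction of the argument above.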

Knowledge Acquisition

In this stage, we train the sequence-to-sequence model with all subtasks using multi-task learning. We assign each SQL keyword a special token, which is also used to denote its corresponding clause. Then we construct a task prompt for each subtask based on the clauses it contains.
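A rough sketch of how such task prompts could be assembled, following the input layout visible in Figure 1 (special tokens, then "question ; {S} ; {C}"). The function names, the schema serialization string, and the handling of an empty context are assumptions for illustration and are not taken from the paper or its released code.

```python
# Clauses covered by each subtask, written with the token names from Figure 1.
SUBTASK_CLAUSES = {
    "SELECT": ["SELECT"],
    "FROM": ["FROM"],
    "WHERE": ["WHERE"],
    "GHOL": ["GROUP_BY", "HAVING", "ORDER_BY", "LIMIT"],
    "SQL": ["SQL"],
}
# The main task prompt lists the tokens of all clauses.
MAIN_TASK_CLAUSES = [
    "SELECT", "FROM", "WHERE", "GROUP_BY", "HAVING", "ORDER_BY", "LIMIT", "SQL",
]


def task_prompt(clauses):
    """Join the special tokens of the given clauses, e.g. '[GROUP_BY] [HAVING]'."""
    return " ".join("[" + c + "]" for c in clauses)


def build_input(clauses, question, schema, context=""):
    """Serialize one model input as 'prompt question ; schema ; context'."""
    text = task_prompt(clauses) + " " + question + " ; " + schema + " ; " + context
    return text.strip()  # drop the trailing space when the context is empty


def multitask_inputs(question, schema, context=""):
    """One input per subtask, as used for multi-task training."""
    return {
        name: build_input(clauses, question, schema, context)
        for name, clauses in SUBTASK_CLAUSES.items()
    }


# Hypothetical question and schema serialization.
inputs = multitask_inputs("How many singers are there?", "singer: name, age")
print(inputs["GHOL"])
print(build_input(MAIN_TASK_CLAUSES, "How many singers are there?", "singer: name, age"))
```

Keeping the subtask-to-clause mapping in one dictionary means the same serialization routine serves both the subtask inputs and the main-task input, which differ only in the list of special tokens placed in front of the question.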
