Towards Generalizable and Robust Text-to-SQL Parsing
Chang Gao1, Bowen Li2, Wenxuan Zhang2, Wai Lam1†, Binhua Li2,
Fei Huang2, Luo Si2, and Yongbin Li2†
1The Chinese University of Hong Kong
2DAMO Academy, Alibaba Group
{gaochang,wlam}@se.cuhk.edu.hk, libowen.ne@gmail.com
{saike.zwx,binhua.lbh,shuide.lyb}@alibaba-inc.com
Abstract
Text-to-SQL parsing tackles the problem of
mapping natural language questions to exe-
cutable SQL queries. In practice, text-to-SQL
parsers often encounter various challenging
scenarios, requiring them to be generalizable
and robust. While most existing work ad-
dresses a particular generalization or robust-
ness challenge, we aim to study it in a more
comprehensive manner. Specifically, we believe
that text-to-SQL parsers should be (1)
generalizable at three levels of generalization,
namely i.i.d., zero-shot, and compositional,
and (2) robust against input perturbations.
To enhance these capabilities of the
parser, we propose a novel TKK framework
consisting of Task decomposition, Knowledge
acquisition, and Knowledge composition to
learn text-to-SQL parsing in stages. By divid-
ing the learning process into multiple stages,
our framework improves the parser’s ability
to acquire general SQL knowledge instead of
capturing spurious patterns, making it more
generalizable and robust. Experimental re-
sults under various generalization and robust-
ness settings show that our framework is ef-
fective in all scenarios and achieves state-
of-the-art performance on the Spider, SParC,
and CoSQL datasets. Code can be found
at https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/tkk.
1 Introduction
Text-to-SQL parsing aims to translate natural lan-
guage questions to SQL queries that can be exe-
cuted on databases to produce answers (Lin et al.,
2020), which bridges the gap between expert pro-
grammers and ordinary users who are not proficient
in writing SQL queries. Thus, it has drawn great
attention in recent years (Zhong et al., 2017; Suhr
et al., 2020; Scholak et al., 2021; Hui et al., 2022;
Qin et al., 2022a,b).

Work done when Chang Gao was an intern at Alibaba.
The work described in this paper is substantially supported
by a grant from the Research Grant Council of the Hong
Kong Special Administrative Region, China (Project Code:
14204418).
Corresponding authors: Wai Lam and Yongbin Li.

Early work in this field (Zelle and Mooney, 1996;
Yaghmazadeh et al., 2017; Iyer et al., 2017) mainly
focuses on i.i.d. generalization. They only use
a single database, and the exact same target SQL
query may appear in both the training and test sets.
However, it is difficult to collect sufficient training
data to cover all the questions users may ask (Gu
et al., 2021), and the predictions of test examples
might be obtained by semantic matching instead
of semantic parsing (Yu et al., 2018b), limiting the
generalization ability of parsers. Subsequent work
further focuses on generalizable text-to-SQL pars-
ing in terms of two aspects: zero-shot generaliza-
tion and compositional generalization. Zero-shot
generalization requires the parser to generalize to
unseen database schemas. Thanks to large-scale
datasets such as Spider (Yu et al., 2018b), SParC
(Yu et al., 2019b), and CoSQL (Yu et al., 2019a),
zero-shot generalization has been the most popu-
lar setting for text-to-SQL parsing in recent years.
Various methods involving designing graph-based
encoders (Wang et al., 2020; Cao et al., 2021) and
syntax tree decoders (Yu et al., 2018a; Rubin and
Berant, 2021) have been developed to tackle this
challenge. Compositional generalization is the de-
sired ability to generalize to test examples consist-
ing of novel combinations of components observed
during training. Finegan-Dollak et al. (2018) ex-
plore compositional generalization in text-to-SQL
parsing focusing on template-based query splits.
Shaw et al. (2021) provide new splits of Spider
considering length, query template, and query com-
pound divergence to create challenging evaluations
of compositional generalization.

arXiv:2210.12674v1 [cs.CL] 23 Oct 2022

[Figure 1 (diagram). Running example question: "What are the names of
documents that have both one of the three most common types and one of
three most common structures?"
(1) Task Decomposition: text-to-SQL parsing is split into the SELECT,
FROM, WHERE, GHOL, and SQL subtasks.
(2) Knowledge Acquisition: each subtask input is a task prompt followed
by "question ; {S} ; {C}". Prompts and outputs for the example:
  [SELECT] -> [SELECT] document_name
  [FROM] -> [FROM] documents
  [WHERE] -> [WHERE]
  [GROUP_BY] [HAVING] [ORDER_BY] [LIMIT] -> [GROUP_BY] document_type_code
    [HAVING] [ORDER_BY] count(*) desc [LIMIT] 3
  [SQL] -> [INTERSECT] [SELECT] document_name [FROM] documents
    [GROUP_BY] document_structure_code [ORDER_BY] count(*) desc [LIMIT] 3
(3) Knowledge Composition: the main-task prompt
  [SELECT] [FROM] [WHERE] [GROUP_BY] [HAVING] [ORDER_BY] [LIMIT] [SQL]
followed by "question ; {S} ; {C}" yields
  [SELECT] document_name [FROM] documents [GROUP_BY] document_type_code
  [ORDER_BY] count(*) desc [LIMIT] 3 [INTERSECT] [SELECT] document_name
  [FROM] documents [GROUP_BY] document_structure_code [ORDER_BY] count(*)
  desc [LIMIT] 3
which corresponds to the SQL query
  SELECT document_name FROM documents GROUP BY document_type_code
  ORDER BY count(*) desc LIMIT 3 INTERSECT SELECT document_name FROM
  documents GROUP BY document_structure_code ORDER BY count(*) desc LIMIT 3]
Figure 1: Overview of our TKK framework. {S} and {C} denote the database schema and context, respectively.

Another challenge of conducting text-to-SQL
parsing in practice is robustness. Existing text-to-
SQL models have been found vulnerable to input
perturbations (Deng et al., 2021; Gan et al., 2021a;
Pi et al., 2022). For example, Gan et al. (2021a)
replace schema-related words in natural language
questions with manually selected synonyms and
observe a dramatic performance drop. They propose
two approaches, namely multi-annotation selection
and adversarial training, to improve model
robustness against synonym substitution.
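
To make the perturbation concrete, the following is a minimal sketch of synonym substitution on a question. It is not the procedure of Gan et al. (2021a): the function name `substitute_synonyms`, the hand-written synonym table, and the example question are all hypothetical and only illustrate how a perturbed question can be formed while the schema and the target SQL query stay unchanged.

```python
import re


def substitute_synonyms(question, synonyms):
    """Replace whole-word matches of the (lowercase) keys in `synonyms`.

    `synonyms` stands in for a manually selected synonym table; the
    matching is case-insensitive and ignores inflected forms.
    """
    if not synonyms:
        return question
    # Longer keys first so that multi-word keys win over their prefixes.
    keys = sorted(synonyms, key=len, reverse=True)
    pattern = r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b"
    return re.sub(
        pattern,
        lambda m: synonyms[m.group(0).lower()],
        question,
        flags=re.IGNORECASE,
    )


# Hypothetical question and synonym table, for illustration only.
original = "Show the name of each singer"
perturbed = substitute_synonyms(original, {"name": "title", "singer": "vocalist"})
print(perturbed)
```

A parser that relies on exact lexical overlap between the question and the schema can no longer match "title" to a `name` column, which is the kind of failure such perturbations are meant to expose.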
Although specialized model architectures and
training approaches have been proposed to address
a particular generalization or robustness challenge,
we believe that practical text-to-SQL parsers should
be built with strong generalizability in terms of all
three levels of generalization, namely i.i.d., zero-shot,
and compositional, and robustness against input
perturbations. To obtain such capabilities, we observe
that humans often learn to write each clause, such as
SELECT or WHERE, for a basic operation, before
composing them to fulfill a
more challenging goal, i.e., writing the entire SQL
query. In contrast, most existing methods adopt a
one-stage learning paradigm, i.e., learning to write
each SQL clause and the dependency between dif-
ferent clauses simultaneously. This may lead the
model to capture spurious patterns between the
question, database schema, and SQL query instead
of learning general SQL knowledge.
To this end, we propose a novel framework
consisting of three learning stages including
Task decomposition, Knowledge acquisition, and
Knowledge composition (TKK) for text-to-SQL
parsing, which mimics the human learning procedure
to learn to handle the task in stages. Specifically,
in the task decomposition stage, TKK decomposes
the original task into several subtasks.
Each subtask corresponds to mapping the natural
language question to one or more clauses of the
SQL query, as shown in the top portion of Figure 1.
Afterwards, TKK features a prompt-based learning
strategy to separately acquire the knowledge of sub-
tasks and employ the learned knowledge to tackle
the main task, i.e., generating the entire SQL query.
In the knowledge acquisition stage, TKK trains the
model with all the subtasks in a multi-task learning
manner; in the knowledge composition stage, TKK
fine-tunes the model with the main task to combine
the acquired knowledge of subtasks and learn the
dependency between them.
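
As a rough illustration of the two training stages, the sketch below builds model inputs in the format suggested by Figure 1 (task prompt, then question, schema, and context separated by " ; "). The names `SUBTASK_TOKENS`, `MAIN`, `build_input`, and `stage_examples`, the exact separator spacing, and the toy example are assumptions made for illustration and are not taken from the released code.

```python
# Special tokens per subtask, as read off Figure 1.
SUBTASK_TOKENS = {
    "SELECT": ["[SELECT]"],
    "FROM": ["[FROM]"],
    "WHERE": ["[WHERE]"],
    "GHOL": ["[GROUP_BY]", "[HAVING]", "[ORDER_BY]", "[LIMIT]"],
    "SQL": ["[SQL]"],
}
MAIN = "MAIN"  # hypothetical name for the full text-to-SQL task


def task_prompt(task):
    """Prompt of one subtask, or all subtask prompts joined for the main task."""
    if task == MAIN:
        tokens = [tok for toks in SUBTASK_TOKENS.values() for tok in toks]
    else:
        tokens = SUBTASK_TOKENS[task]
    return " ".join(tokens)


def build_input(task, question, schema, context=""):
    """Serialize as '<prompt> question ; schema ; context' (spacing assumed)."""
    return f"{task_prompt(task)} {question} ; {schema} ; {context}".strip()


def stage_examples(example, stage):
    """Return (input, target) pairs for one annotated example.

    "acquisition": one pair per subtask, to be mixed for multi-task learning.
    "composition": a single pair for the main task.
    """
    if stage == "acquisition":
        tasks = list(SUBTASK_TOKENS)
    elif stage == "composition":
        tasks = [MAIN]
    else:
        raise ValueError(f"unknown stage: {stage}")
    return [
        (
            build_input(t, example["question"], example["schema"],
                        example.get("context", "")),
            example["targets"][t],
        )
        for t in tasks
    ]


# Toy example (made up); targets follow the token format of Figure 1.
toy = {
    "question": "How many singers are there?",
    "schema": "singer : id , name",
    "targets": {
        "SELECT": "[SELECT] count(*)",
        "FROM": "[FROM] singer",
        "WHERE": "[WHERE]",
        "GHOL": "[GROUP_BY] [HAVING] [ORDER_BY] [LIMIT]",
        "SQL": "[SQL]",
        MAIN: "[SELECT] count(*) [FROM] singer",
    },
}
acquisition_pairs = stage_examples(toy, "acquisition")
composition_pairs = stage_examples(toy, "composition")
```

The same model would first be trained on `acquisition_pairs` pooled over all training examples and then fine-tuned on `composition_pairs`; only the prompt in front of the question changes between subtasks and the main task.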
The advantages of our three-stage framework
over previous one-stage learning methods are three-
fold: (1) it reduces the difficulty of model learn-
ing by dividing the learning process into multiple
easier-to-learn stages; (2) it explicitly forces the
model to learn the alignment between the ques-
tion, database schema, and each SQL clause as it
needs to identify the intent expressed in the ques-
tion based on the schema to generate a specific
clause; (3) by explicitly constructing the training
data for each subtask, it is easier for the model to
learn the knowledge required to translate the ques-
tion into each SQL clause. These advantages help
the model to learn general SQL knowledge rather
than some dataset-specific patterns, making it more
generalizable and robust.
To verify the effectiveness of our framework,
we conduct comprehensive evaluations on repre-
sentative benchmarks covering all three levels of
generalization and robustness scenarios with pre-
trained sequence-to-sequence models. Experimen-
tal results and analysis show that: (1) we achieve
state-of-the-art performance on the Spider, SParC,
and CoSQL datasets; (2) our method outperforms
vanilla sequence-to-sequence models in all scenar-
ios; (3) our framework significantly improves the
model’s ability to generate complex SQL queries;
(4) our framework is also effective in the low-
resource setting.
2 Background
Notations. We use the lowercase letter $q$ to
denote a natural language question and denote
its corresponding database schema, context, and
SQL query as $s_q$, $c_q$, and $l_q$, respectively. We
represent the set of training examples
$(q, s_q, c_q, l_q)$ as $\mathcal{D}_{\text{train}}$ and the test set as
$\mathcal{D}_{\text{test}}$. A perturbed test set $\mathcal{D}'_{\text{test}}$ could be
constructed by perturbations to questions such as
synonym substitution to form $(q', s_q, c_q, l_q)$. We
denote $\mathcal{S}_{\text{train}}$ as the set of database schemas of
$\mathcal{D}_{\text{train}}$, $\mathcal{L}_{\text{train}}$ as the set of SQL queries of
$\mathcal{D}_{\text{train}}$, and $\mathcal{Q}_{\text{test}}$ as the set of questions of
$\mathcal{D}_{\text{test}}$.
Problem Definition. Given $(q, s_q, c_q)$, where
the database schema $s_q$ consists of tables and
columns, and context $c_q$ is the interaction history
consisting of previous questions and system
clarification in the multi-turn setting or empty in
the single-turn setting, the goal is to generate the
correct SQL query $l_q$.
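Generalization and Robustness. Following
Gu et al. (2021) and Wang et al. (2022b), we
formalize three levels of generalization and
robustness as follows:
Zero-shot generalization: $q \in \mathcal{Q}_{\text{test}}$, $s_q \notin \mathcal{S}_{\text{train}}$.
Compositional generalization: $q \in \mathcal{Q}_{\text{test}}$, $s_q \in \mathcal{S}_{\text{train}}$, $l_q \notin \mathcal{L}_{\text{train}}$.
I.I.D. generalization: $q \in \mathcal{Q}_{\text{test}}$, $s_q \in \mathcal{S}_{\text{train}}$.
$\mathcal{D}_{\text{train}}$ and $\mathcal{D}_{\text{test}}$ follow the same distribution.
Robustness: training with $\mathcal{D}_{\text{train}}$ but adopting
$\mathcal{D}'_{\text{test}}$ instead of $\mathcal{D}_{\text{test}}$ for evaluation.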
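
The set conditions above can be read as simple membership checks. The sketch below applies them to a single test example; the `Example` class and `generalization_level` function are hypothetical names, schemas are compared by identifier, and SQL queries are compared by exact string match, which is a simplification. The definitions describe whole splits, and the same-distribution requirement of the i.i.d. setting cannot be verified by such a check, so the last branch only means "schema and query both seen in training".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    question: str  # q
    schema: str    # identifier of the database schema s_q
    sql: str       # target SQL query l_q


def generalization_level(test_example, train_examples):
    """Label a test example by the membership conditions on s_q and l_q."""
    train_schemas = {ex.schema for ex in train_examples}  # S_train
    train_queries = {ex.sql for ex in train_examples}     # L_train
    if test_example.schema not in train_schemas:
        return "zero-shot"
    if test_example.sql not in train_queries:
        return "compositional"
    return "i.i.d."


# Made-up data for illustration.
train = [Example("List singer names", "concert_singer", "SELECT name FROM singer")]
unseen_db = Example("List pet names", "pets", "SELECT name FROM pets")
new_query = Example("Count singers", "concert_singer", "SELECT count(*) FROM singer")
seen_both = Example("Give all singer names", "concert_singer", "SELECT name FROM singer")
```

Evaluating robustness needs no extra condition: the same trained parser is scored on perturbed questions $q'$ paired with the original $s_q$, $c_q$, and $l_q$.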
3 Our TKK Framework
TKK consists of three learning stages: task decom-
position, knowledge acquisition, and knowledge
composition. In this section, we first introduce
each stage in detail. Then we describe the training
and inference of TKK.
3.1 Three Stages of TKK
Task Decomposition. As shown in Figure 1, we
decompose the text-to-SQL parsing task into five
subtasks, namely SELECT, FROM, WHERE, GHOL,
and SQL. Basically, a subtask aims to translate the
natural language question to one or more clauses of
the SQL query. For example, the GHOL subtask
aims to generate the GROUP_BY, HAVING,
ORDER_BY, and LIMIT clauses given the question
and its corresponding database schema and
context. For queries involving set operators such as
INTERSECT, UNION, and EXCEPT to combine
two SQL queries, we treat the first query as usual
and the second query as the SQL clause of the first
query. The SQL subtask aims to map the
question to the SQL clause.
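
The decomposition can be sketched as a function that turns one SQL query into the five subtask targets. This is an illustrative sketch under strong assumptions, not the authors' preprocessing: it handles only flat queries (no nested subqueries, no keywords inside string literals), splits on the first set operator only, puts a space after each special token, and emits a bare `[SQL]` token when there is no second query. The last two choices are guesses that mirror how empty clauses appear in Figure 1.

```python
import re

CLAUSE_KEYS = ["SELECT", "FROM", "WHERE", "GROUP_BY", "HAVING", "ORDER_BY", "LIMIT"]
# Matches the clause keywords, allowing any whitespace in GROUP BY / ORDER BY.
KEYWORD_RE = re.compile(
    r"\b(SELECT|FROM|WHERE|GROUP\s+BY|HAVING|ORDER\s+BY|LIMIT)\b", re.IGNORECASE
)
SET_OP_RE = re.compile(r"\b(INTERSECT|UNION|EXCEPT)\b", re.IGNORECASE)


def split_set_op(sql):
    """Split at the first set operator: (first query, operator, second query)."""
    m = SET_OP_RE.search(sql)
    if m is None:
        return sql.strip(), None, None
    return sql[: m.start()].strip(), m.group(1).upper(), sql[m.end():].strip()


def split_clauses(sql):
    """Map clause name (e.g. GROUP_BY) to its body for a flat query."""
    parts = KEYWORD_RE.split(sql)  # ['', kw1, body1, kw2, body2, ...]
    return {
        re.sub(r"\s+", "_", kw.upper()): body.strip()
        for kw, body in zip(parts[1::2], parts[2::2])
    }


def tag(clauses, keys, keep_empty=True):
    """Render clauses as '[KEY] body'; empty clauses keep only their token."""
    out = []
    for key in keys:
        body = clauses.get(key, "")
        if body or keep_empty:
            out.append(f"[{key}] {body}".strip())
    return " ".join(out)


def subtask_targets(sql):
    first, op, second = split_set_op(sql)
    clauses = split_clauses(first)
    targets = {
        "SELECT": tag(clauses, ["SELECT"]),
        "FROM": tag(clauses, ["FROM"]),
        "WHERE": tag(clauses, ["WHERE"]),
        "GHOL": tag(clauses, ["GROUP_BY", "HAVING", "ORDER_BY", "LIMIT"]),
    }
    if op is None:
        targets["SQL"] = "[SQL]"  # assumed form of an empty SQL clause
    else:
        # The second query becomes the SQL clause of the first query.
        rest = tag(split_clauses(second), CLAUSE_KEYS, keep_empty=False)
        targets["SQL"] = f"[{op}] {rest}"
    return targets


figure_sql = (
    "SELECT document_name FROM documents GROUP BY document_type_code "
    "ORDER BY count(*) desc LIMIT 3 INTERSECT SELECT document_name FROM "
    "documents GROUP BY document_structure_code ORDER BY count(*) desc LIMIT 3"
)
figure_targets = subtask_targets(figure_sql)
```

On the query from Figure 1, the WHERE target is the bare token `[WHERE]`, which is exactly the kind of example that the next paragraph calls a classification example.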
There are two considerations behind constructing
a subtask: (1) the number of classification
examples; (2) the dependency between different
clauses. First, according to the SQL syntax, every
SQL query has the SELECT and FROM clauses.
However, clauses such as GROUP_BY and ORDER_BY
appear only in relatively complicated SQL queries.
This implies that the number of these clauses is much
smaller than that of the SELECT or FROM clause.
Naively treating the generation of each clause as
a subtask is problematic. If a specific clause
does not exist, the generation task degenerates
to a classification task because the model only
needs to judge its existence. We denote these
examples as classification examples. Too many
classification examples are harmful to model
learning. Second, the GROUP_BY and HAVING
clauses are usually bundled together, which is also
the case for the ORDER_BY and LIMIT clauses.
The ORDER_BY clause is often dependent on the
GROUP_BY clause if they appear in a SQL query
simultaneously. Based on the above observations,
combining these clauses to construct a single
subtask is more appropriate. We do not further
decompose the SQL clause because doing so would
create more subtasks, and most training examples of
these subtasks would be classification examples.
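
The effect of bundling on the share of classification examples can be checked with a small count. In the sketch below, `classification_ratio` is a hypothetical helper that reports the fraction of queries in which none of a subtask's clauses appears; the four queries are made up, and clause detection is a plain keyword search that would miscount nested subqueries.

```python
import re

CLAUSE_PATTERNS = {
    "SELECT": r"\bSELECT\b",
    "FROM": r"\bFROM\b",
    "WHERE": r"\bWHERE\b",
    "GROUP_BY": r"\bGROUP\s+BY\b",
    "HAVING": r"\bHAVING\b",
    "ORDER_BY": r"\bORDER\s+BY\b",
    "LIMIT": r"\bLIMIT\b",
}


def classification_ratio(queries, clauses):
    """Fraction of queries whose target for this subtask would be empty."""
    empty = sum(
        1
        for q in queries
        if not any(re.search(CLAUSE_PATTERNS[c], q, re.IGNORECASE) for c in clauses)
    )
    return empty / len(queries)


# Made-up queries for illustration.
queries = [
    "SELECT name FROM singer",
    "SELECT name FROM singer WHERE age > 30",
    "SELECT country, count(*) FROM singer GROUP BY country",
    "SELECT name FROM singer ORDER BY age DESC LIMIT 1",
]
per_clause = {c: classification_ratio(queries, [c]) for c in CLAUSE_PATTERNS}
ghol = classification_ratio(queries, ["GROUP_BY", "HAVING", "ORDER_BY", "LIMIT"])
```

On this toy set, a HAVING-only subtask would consist entirely of classification examples, whereas the bundled GHOL subtask is empty for only half of the queries; the bundled ratio can never exceed that of any single clause it contains.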
Knowledge Acquisition. In this stage, we
train the sequence-to-sequence model with all
subtasks using multi-task learning. We assign
each SQL keyword a special token, which is also
used to denote its corresponding clause. Then
we construct a task prompt for each subtask
based on the clauses it contains. For example,