
STAR: SQL Guided Pre-Training for Context-dependent
Text-to-SQL Parsing
Zefeng Cai1,2,∗, Xiangyu Li1,2,∗, Binyuan Hui3, Min Yang2,†, Bowen Li3,
Binhua Li3, Zheng Cao3, Weijie Li3, Fei Huang3, Luo Si3, Yongbin Li3,†
1University of Science and Technology of China
2Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
3DAMO Academy, Alibaba Group
{zf.cai, xy.li3, min.yang}@siat.ac.cn
{binyuan.hby, binhua.lbh, shuide.lyb}@alibaba-inc.com
∗Equal contribution. †Corresponding authors.
Abstract
In this paper, we propose STAR, a novel SQL-guided pre-training framework for context-dependent text-to-SQL parsing, which leverages contextual information to enrich natural language (NL) utterance and table schema representations for text-to-SQL conversations. Concretely, we propose two novel pre-training objectives that respectively explore the context-dependent interactions of NL utterances and SQL queries within each text-to-SQL conversation: (i) a schema state tracking (SST) objective that tracks and explores the schema states of context-dependent SQL queries by predicting and updating the value of each schema slot during the interaction; (ii) an utterance dependency tracking (UDT) objective that employs weighted contrastive learning to pull together the representations of semantically similar NL utterances and push apart those of semantically dissimilar NL utterances within each conversation. In addition, we construct a high-quality, large-scale context-dependent text-to-SQL conversation corpus to pre-train STAR. Extensive experiments show that STAR achieves new state-of-the-art performance on two downstream benchmarks (SParC and CoSQL), significantly outperforming previous pre-training methods and ranking first on the leaderboard. We believe the release of the constructed corpus, codebase, and pre-trained STAR checkpoints will push forward research in this area. For reproducibility, we release our code and data at https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/star.
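The weighted contrastive learning used by the UDT objective is only sketched above; its precise weighting scheme is defined later in the paper. As a rough illustration of the general idea, the following Python sketch shows one plausible weighted, InfoNCE-style loss over utterance representations; the function name, the cosine-similarity logits, and the per-pair weights are our assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def weighted_contrastive_loss(anchor, others, weights, temperature=0.1):
    # anchor:  (d,) representation of the current NL utterance.
    # others:  (n, d) representations of the other utterances in the
    #          conversation; index 0 is treated as the positive example.
    # weights: (n,) semantic-similarity weights (hypothetical; e.g.,
    #          derived from the overlap between the turns' SQL queries).
    sims = F.cosine_similarity(anchor.unsqueeze(0), others, dim=-1) / temperature
    # Scaling each pair's logit by its weight pulls highly similar
    # utterances together more strongly and pushes dissimilar ones apart.
    logits = (weights * sims).unsqueeze(0)
    target = torch.zeros(1, dtype=torch.long)  # the positive sits at index 0
    return F.cross_entropy(logits, target)

# Example: one anchor utterance against three other turns.
anchor = torch.randn(64)
others = torch.randn(3, 64)
weights = torch.tensor([1.0, 0.4, 0.1])
loss = weighted_contrastive_loss(anchor, others, weights)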
1 Introduction
Text-to-SQL parsing (Zhong et al., 2017; Yu et al., 2018; Wang et al., 2022; Qin et al., 2022b) aims to translate natural language (NL) questions into
[Figure 1 (content): a four-turn conversation over the Campuses/Degrees database, with each turn's SQL query and the resulting schema states, i.e., slot-value pairs over columns such as Campuses.county, Campuses.year, Degrees.campus, and Degrees.degrees; an ✗ between Turn 2 and Turn 3 marks the intent switch.
Turn 1: "Can you show me campuses in year 2000?" → SELECT campus FROM Campuses WHERE year = 2000
Turn 2: "Can you also show me county after year 2000?" → SELECT campus, county FROM Campuses WHERE year > 2000
Turn 3: "What are the degrees on the campuses list?" → SELECT degrees FROM Campuses JOIN Degrees
Turn 4: "Which one in the university conferred the least number in year 2000?" → SELECT degrees FROM Campuses JOIN Degrees WHERE Campuses.year = 2000 ORDER BY Degrees.degrees LIMIT 1]
Figure 1: An example of a cross-domain context-dependent text-to-SQL conversation. Here, each database schema refers to the table/column names of the databases, and each schema state refers to a slot-value pair, whose slot is a column/table name (e.g., Degrees.campus) and whose value is a SQL keyword (e.g., SELECT). "✗" indicates that the semantics/intent switches between the Turn 2 and Turn 3 utterances.
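To make the schema-state notion in Figure 1 concrete, the snippet below writes out the states of Turn 1 and Turn 2 as slot-to-keyword maps; the slots and SQL-keyword values follow the figure, while the dictionary encoding itself is merely illustrative.

# Schema states as slot -> SQL-keyword maps (slots and values from Figure 1;
# "NONE" marks a slot that the current turn's SQL query does not touch).
turn1_state = {
    "Campuses.campus": "SELECT",   # SELECT campus FROM Campuses WHERE year = 2000
    "Campuses.year":   "WHERE =",
    "Campuses.county": "NONE",
    "Degrees.degrees": "NONE",
}
# Turn 2 updates the state: county joins the SELECT clause and the year
# condition changes from "=" to ">".
turn2_state = {**turn1_state,
               "Campuses.county": "SELECT",
               "Campuses.year": "WHERE >"}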
executable SQL queries, enabling users who are unfamiliar with SQL to query databases in natural language. Pre-trained language models (PLMs) have proved powerful in enhancing text-to-SQL parsing and yield impressive performance, benefiting from the rich linguistic knowledge in large-scale corpora. However, as revealed in previous works (Yin et al., 2020; Yu et al., 2021a; Qin et al., 2022a), there is an intrinsic discrepancy between the distributions of tables and plain text, leading to sub-optimal performance of general PLMs such as BERT (Devlin et al., 2019), RoBERTa (Liu et al., 2019), and ELECTRA (Clark et al., 2020). Recently, some studies (Yu et al., 2021a,b; Shi et al., 2021; Deng et al., 2021; Liu et al., 2021a,b) alleviate the above limitation by designing tailored tabular language models (TaLMs) for text-to-SQL parsing, which simultaneously encode NL questions and tables.
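As a minimal sketch of what "simultaneously encode NL questions and tables" usually amounts to, the snippet below linearizes an utterance together with a database schema into a single input sequence; the [T]/[C] markers and the helper function are hypothetical conventions for illustration, not the serialization of any particular TaLM.

def linearize(utterance, schema):
    # Flatten an NL utterance and a database schema into one string that a
    # TaLM-style encoder can consume, marking tables and columns explicitly.
    parts = [utterance]
    for table, columns in schema.items():
        parts.append(f"[T] {table} " + " ".join(f"[C] {c}" for c in columns))
    return " [SEP] ".join(parts)

schema = {"Campuses": ["campus", "county", "year"],
          "Degrees": ["campus", "year", "degrees"]}
print(linearize("Can you show me campuses in year 2000?", schema))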
Despite the remarkable progress of previous TaLMs, they still suffer from technical challenges in the context-dependent setting. First, existing TaLMs merely explore contextual information to enrich utterance representations without consid-