
STAR: SQL Guided Pre-Training for Context-dependent
Text-to-SQL Parsing
Zefeng Cai1,2,∗, Xiangyu Li1,2,∗, Binyuan Hui3, Min Yang2,†, Bowen Li3,
Binhua Li3, Zheng Cao3, Weijie Li3, Fei Huang3, Luo Si3, Yongbin Li3,†
1University of Science and Technology of China
2Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
3DAMO Academy, Alibaba Group
{zf.cai, xy.li3, min.yang}@siat.ac.cn
{binyuan.hby, binhua.lbh, shuide.lyb}@alibaba-inc.com
∗Equal contribution. †Corresponding authors.
Abstract
In this paper, we propose STAR, a novel SQL-guided pre-training framework for context-dependent text-to-SQL parsing, which leverages contextual information to enrich natural language (NL) utterance and table schema representations for text-to-SQL conversations. Concretely, we propose two novel pre-training objectives that respectively explore the context-dependent interactions of NL utterances and SQL queries within each text-to-SQL conversation: (i) a schema state tracking (SST) objective that tracks and explores the schema states of context-dependent SQL queries by predicting and updating the value of each schema slot during the interaction; (ii) an utterance dependency tracking (UDT) objective that employs weighted contrastive learning to pull together the representations of semantically similar NL utterances and push apart those of semantically dissimilar NL utterances within each conversation. In addition, we construct a high-quality, large-scale context-dependent text-to-SQL conversation corpus to pre-train STAR. Extensive experiments show that STAR achieves new state-of-the-art performance on two downstream benchmarks (SParC and CoSQL), significantly outperforming previous pre-training methods and ranking first on the leaderboard. We believe the release of the constructed corpus, codebase, and pre-trained STAR checkpoints will push forward research in this area. For reproducibility, we release our code and data at https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/star.
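The weighted contrastive learning used by the UDT objective is only sketched above; its precise weighting scheme is defined later in the paper. As a rough illustration of the general idea, the following Python sketch shows one plausible weighted, InfoNCE-style loss over utterance representations; the function name, the cosine-similarity logits, and the per-pair weights are our assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def weighted_contrastive_loss(anchor, others, weights, temperature=0.1):
    # anchor:  (d,) representation of the current NL utterance.
    # others:  (n, d) representations of the other utterances in the
    #          conversation; index 0 is treated as the positive example.
    # weights: (n,) semantic-similarity weights (hypothetical; e.g.,
    #          derived from the overlap between the turns' SQL queries).
    sims = F.cosine_similarity(anchor.unsqueeze(0), others, dim=-1) / temperature
    # Scaling each pair's logit by its weight pulls highly similar
    # utterances together more strongly and pushes dissimilar ones apart.
    logits = (weights * sims).unsqueeze(0)
    target = torch.zeros(1, dtype=torch.long)  # the positive sits at index 0
    return F.cross_entropy(logits, target)

# Example: one anchor utterance against three other turns.
anchor = torch.randn(64)
others = torch.randn(3, 64)
weights = torch.tensor([1.0, 0.4, 0.1])
loss = weighted_contrastive_loss(anchor, others, weights)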
1 Introduction
Text-to-SQL parsing (Zhong et al., 2017; Yu et al., 2018; Wang et al., 2022; Qin et al., 2022b) aims to translate natural language (NL) questions into
[Figure 1 (content): a four-turn conversation over the Campuses/Degrees database, with each turn's SQL query and the resulting schema states, i.e., slot-value pairs over columns such as Campuses.county, Campuses.year, Degrees.campus, and Degrees.degrees; an ✗ between Turn 2 and Turn 3 marks the intent switch.
Turn 1: "Can you show me campuses in year 2000?" → SELECT campus FROM Campuses WHERE year = 2000
Turn 2: "Can you also show me county after year 2000?" → SELECT campus, county FROM Campuses WHERE year > 2000
Turn 3: "What are the degrees on the campuses list?" → SELECT degrees FROM Campuses JOIN Degrees
Turn 4: "Which one in the university conferred the least number in year 2000?" → SELECT degrees FROM Campuses JOIN Degrees WHERE Campuses.year = 2000 ORDER BY Degrees.degrees LIMIT 1]
Figure 1: An example of a cross-domain context-dependent text-to-SQL conversation. Here, each database schema refers to the table/column names of the databases, and each schema state refers to a slot-value pair, whose slot is a column/table name (e.g., Degrees.campus) and whose value is a SQL keyword (e.g., SELECT). "✗" indicates that the semantics/intent switches between the Turn 2 and Turn 3 utterances.
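To make the schema-state notion in Figure 1 concrete, the snippet below writes out the states of Turn 1 and Turn 2 as slot-to-keyword maps; the slots and SQL-keyword values follow the figure, while the dictionary encoding itself is merely illustrative.

# Schema states as slot -> SQL-keyword maps (slots and values from Figure 1;
# "NONE" marks a slot that the current turn's SQL query does not touch).
turn1_state = {
    "Campuses.campus": "SELECT",   # SELECT campus FROM Campuses WHERE year = 2000
    "Campuses.year":   "WHERE =",
    "Campuses.county": "NONE",
    "Degrees.degrees": "NONE",
}
# Turn 2 updates the state: county joins the SELECT clause and the year
# condition changes from "=" to ">".
turn2_state = {**turn1_state,
               "Campuses.county": "SELECT",
               "Campuses.year": "WHERE >"}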
executable SQL queries, enabling users who are unfamiliar with SQL to query databases in natural language. Pre-trained language models (PLMs) have proved powerful in enhancing text-to-SQL parsing and yield impressive performance, benefiting from the rich linguistic knowledge in large-scale corpora. However, as revealed in previous works (Yin et al., 2020; Yu et al., 2021a; Qin et al., 2022a), there is an intrinsic discrepancy between the distributions of tables and plain text, leading to sub-optimal performance of general PLMs such as BERT (Devlin et al., 2019), RoBERTa (Liu et al., 2019), and ELECTRA (Clark et al., 2020). Recently, some studies (Yu et al., 2021a,b; Shi et al., 2021; Deng et al., 2021; Liu et al., 2021a,b) alleviate the above limitation by designing tailored tabular language models (TaLMs) for text-to-SQL parsing, which simultaneously encode NL questions and tables.
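As a minimal sketch of what "simultaneously encode NL questions and tables" usually amounts to, the snippet below linearizes an utterance together with a database schema into a single input sequence; the [T]/[C] markers and the helper function are hypothetical conventions for illustration, not the serialization of any particular TaLM.

def linearize(utterance, schema):
    # Flatten an NL utterance and a database schema into one string that a
    # TaLM-style encoder can consume, marking tables and columns explicitly.
    parts = [utterance]
    for table, columns in schema.items():
        parts.append(f"[T] {table} " + " ".join(f"[C] {c}" for c in columns))
    return " [SEP] ".join(parts)

schema = {"Campuses": ["campus", "county", "year"],
          "Degrees": ["campus", "year", "degrees"]}
print(linearize("Can you show me campuses in year 2000?", schema))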
Despite the remarkable progress of previous TaLMs, they still suffer from technical challenges in the context-dependent setting. First, existing TaLMs merely explore contextual information to enrich utterance representations without consid-