Large Language Models are few(1)-shot Table Reasoners
Wenhu Chen
University of Waterloo, Vector Institute
wenhuchen@uwaterloo.ca
Abstract
Recent literature has shown that large language models (LLMs) are generally excellent few-shot reasoners on text reasoning tasks. However, the capability of LLMs on table reasoning tasks has yet to be explored. In this paper, we aim to understand how well LLMs can perform table-related tasks with few-shot in-context learning. Specifically, we evaluated LLMs on popular table QA and fact verification datasets like WikiTableQuestion, FetaQA, TabFact, and FEVEROUS, and found that LLMs are competent at complex reasoning over table structures, even though these models are not pre-trained on any table corpus. When combined with ‘chain of thoughts’ prompting, LLMs can achieve very strong performance with only a 1-shot demonstration, even on par with some SoTA models. We show that LLMs are even more competent at generating comprehensive long-form answers on FetaQA than a fine-tuned T5-large. We further manually studied the reasoning chains elicited from LLMs and found that these reasoning chains are highly consistent with the underlying semantic form. We believe that LLMs can serve as a simple yet generic baseline for future research. The code and data are released at https://github.com/wenhuchen/TableCoT.
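As a rough illustration of this prompting setup, the sketch below shows one plausible way to assemble a 1-shot chain-of-thought prompt over a linearized table. The demonstration table, question, reasoning chain, and helper names (linearize_table, build_prompt) are illustrative assumptions, not the exact prompt format used in our experiments or released code.

# Illustrative sketch of 1-shot chain-of-thought prompting for table QA.
# Everything below (table contents, question, reasoning chain) is made up.

def linearize_table(header, rows):
    """Flatten a table into plain text, one row per line."""
    lines = [" | ".join(header)]
    lines += [" | ".join(str(c) for c in row) for row in rows]
    return "\n".join(lines)

# Single demonstration: a small table, a question, and a worked-out
# reasoning chain ending in the answer.
DEMO_TABLE = linearize_table(
    ["Year", "Host City", "Gold Medals"],
    [["2008", "Beijing", "48"], ["2012", "London", "38"]],
)

DEMO = (
    "Read the table and answer the question.\n"
    f"Table:\n{DEMO_TABLE}\n"
    "Question: In which year were more gold medals won?\n"
    "Answer: The table lists 48 gold medals in 2008 and 38 in 2012. "
    "48 is larger than 38, so the answer is 2008.\n"
)

def build_prompt(table_text, question):
    """Prepend the single demonstration, then the test table and question."""
    return (
        DEMO
        + "\nRead the table and answer the question.\n"
        + f"Table:\n{table_text}\n"
        + f"Question: {question}\n"
        + "Answer:"
    )

if __name__ == "__main__":
    test_table = linearize_table(["Player", "Goals"], [["Messi", "91"], ["Ronaldo", "69"]])
    print(build_prompt(test_table, "Who scored more goals?"))

The completion generated after "Answer:" is expected to contain the reasoning chain followed by the final answer, mirroring the demonstration.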
1 Introduction
The problem of structured knowledge grounding has been studied extensively for many years. Tables, as one of the most popular (semi-)structured forms for storing world knowledge, receive significant attention from the natural language processing (NLP) community. Traditional approaches mostly rely on synthesizing executable languages like SQL or SPARQL to access the information inside the table. However, these symbolic languages normally make rigid assumptions about the table and cannot capture the semantics of text chunks inside the table. Such issues are even more pronounced with web tables due to their irregular forms. To fully understand web tables, both structured reasoning and textual reasoning are required. These challenges have attracted many researchers to the field.
Recently, a wide range of table-based tasks have been proposed, such as table question answering (Pasupat and Liang, 2015; Chen et al., 2020c; Zhu et al., 2021; Chen et al., 2021b; Talmor et al., 2020; Chen et al., 2020a; Nan et al., 2022), table fact verification (Chen et al., 2019; Aly et al., 2021), table-based generation (Chen et al., 2020b; Parikh et al., 2020; Nan et al., 2021), and table-grounded conversation (Budzianowski et al., 2018; Nakamura et al., 2022). These tasks come with different input-output formats and domains. Due to this heterogeneity, models achieving the best results normally need to be fully fine-tuned on the specific downstream dataset with 10K-100K examples to reach reasonable performance.
Recently, there have been efforts like UnifiedSKG (Xie et al., 2022) that aim to unify these heterogeneous table-based tasks under a generic text-to-text format. UnifiedSKG has shown that T5-3B (Raffel et al., 2020) with the text-to-text format can already achieve state-of-the-art performance on almost all table-based tasks without task-specific designs. However, the proposed text-to-text models still need to be fully fine-tuned on the downstream tasks. UnifiedSKG also identified that T0-style (Sanh et al., 2022) cross-task transfer achieves only near-random performance.
Wei et al. (2022); Wang et al. (2022); Zhou et al. (2022); Drozdov et al. (2022) have recently discovered that large language models (Brown et al., 2020; Chowdhery et al., 2022; Ouyang et al., 2022) can be used to solve complex mathematical and commonsense reasoning tasks with few-shot in-context learning. Inspired by this discovery, we aim to understand whether these LLMs can also solve complex table-based reasoning tasks. Though LLMs are not specifically designed to encode ta-