
Realistic Data Augmentation Framework for Enhancing Tabular
Reasoning
Dibyakanti Kumar1∗, Vivek Gupta2∗†, Soumya Sharma3, Shuo Zhang4
1IIT Guwahati; 2University of Utah; 3IIT Kharagpur; 4Bloomberg
dibyakan@iitg.ac.in; vgupta@cs.utah.edu; soumyasharma20@gmail.com; szhang611@bloomberg.net
Abstract
Existing approaches to constructing training data for Natural Language Inference (NLI) tasks, such as semi-structured table reasoning, rely either on crowdsourcing or on fully automatic methods. However, the former is expensive and time-consuming and thus limits scale, while the latter often produces naive examples that lack complex reasoning. This paper develops a realistic semi-automated framework for data augmentation for tabular inference. Instead of manually generating a hypothesis for each table, our methodology generates hypothesis templates that are transferable to similar tables. In addition, our framework entails the creation of rational counterfactual tables based on human-written logical constraints and premise paraphrasing. For our case study, we use INFOTABS (Gupta et al., 2020), an entity-centric tabular inference dataset. We observe that our framework can generate human-like tabular inference examples, which benefit training data augmentation, especially in scenarios with limited supervision.
1 Introduction
Natural Language Inference (NLI) is the Natural Language Processing task of determining whether a hypothesis is entailed by, contradicted by, or unrelated to a given premise (Dagan et al., 2013). The NLI task has been extended to tabular data, where tables serve as the premise instead of sentences; this is known as the tabular inference task. Two popular human-curated datasets for tabular reasoning, TABFACT (Chen et al., 2020b) and INFOTABS (Gupta et al., 2020), have driven recent research in this area.
However, human-generated datasets are limited in scale and thus insufficient for training large language models (e.g., Devlin et al., 2019; Liu et al., 2019a). Since curating these datasets requires expertise, substantial annotation time, and expense, they cannot be scaled. Furthermore, it has been shown that these datasets suffer from annotation bias and spurious correlations (e.g., Poliak et al., 2018; Gururangan et al., 2018; Geva et al., 2019).
∗Equal Contribution †Corresponding Author
In contrast, automatically generated data lacks diversity and involves only naive reasoning. Recently, large language generation models (e.g., Radford et al.; Lewis et al., 2020; Raffel et al., 2020) have also been proposed for data generation (e.g., Zhao et al., 2022; Ouyang et al., 2022; Mishra et al., 2022). Despite substantial improvements, these generation approaches still lack factuality, i.e., they suffer from hallucination, have poor fact coverage, and also suffer from token repetition (refer to Appendix §E for analysis). Moreover, Chen et al. (2020a) show that automatic tabular NLG frameworks cannot produce logical statements and provide only surface-level reasoning.
To address the above shortcomings, we propose a semi-automatic framework that exploits the patterns in tabular structure for hypothesis generation. Specifically, this framework generates hypothesis templates transferable to similar tables, since tables from the same category, e.g., two athlete tables in Wikipedia, share many common attributes. In Table 1, the premise table's key attributes such as "Born", "Died", and "Children" are also shared across other tables from the "Person" category. One can generate a template for tables in the Person category, such as <Person_Name> died before/after <Died:Year>. This template can be used to generate sentences such as hypotheses H1 and H1C in Table 1. Furthermore, humans can utilize cell types (e.g., Date, Boolean) when generating templates. Recently, it has been shown that training on counterfactual data enhances model robustness (Müller et al., 2021; Wang and Culotta, 2021; Rajagopal et al., 2022). Therefore, we also utilize the overlapping key patterns to create counterfactual tables. The complexity and diversity of the templates can be enforced via human annotators.
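The template-transfer and counterfactual-table ideas above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the placeholder syntax, table representation, and helper names (`instantiate`, `make_counterfactual`) are our own assumptions.

```python
# Sketch: instantiating hypothesis templates over key-value tables, and
# building a counterfactual table by swapping a cell from a donor table
# of the same category (which shares the same keys).
import re

def instantiate(template: str, table: dict) -> str:
    """Fill every <Key> placeholder in the template from the table's cells."""
    return re.sub(r"<([^>]+)>", lambda m: str(table[m.group(1)]), template)

def make_counterfactual(table: dict, key: str, donor_table: dict) -> dict:
    """Replace one cell with the value from a donor table sharing that key."""
    cf = dict(table)
    cf[key] = donor_table[key]
    return cf

# Two hypothetical "Person"-category tables sharing the keys Born/Died.
ada = {"Person_Name": "Ada Lovelace", "Born": "1815", "Died": "1852"}
donor = {"Person_Name": "Example Person", "Born": "1820", "Died": "1901"}

# One template transfers to every table that carries the "Died" key.
template = "<Person_Name> died in <Died>."
print(instantiate(template, ada))  # Ada Lovelace died in 1852.

# Counterfactual premise: same key structure, one swapped cell.
ada_cf = make_counterfactual(ada, "Died", donor)
print(instantiate(template, ada_cf))  # Ada Lovelace died in 1901.
```

A hypothesis generated from the original table (e.g., "Ada Lovelace died before 1900") flips its label against the counterfactual table, which is what makes such pairs useful for robustness training.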
arXiv:2210.12795v1 [cs.CL] 23 Oct 2022