
Realistic Data Augmentation Framework for Enhancing Tabular
Reasoning
Dibyakanti Kumar1∗, Vivek Gupta2∗†, Soumya Sharma3, Shuo Zhang4
1IIT Guwahati; 2University of Utah; 3IIT Kharagpur; 4Bloomberg
dibyakan@iitg.ac.in; vgupta@cs.utah.edu; soumyasharma20@gmail.com; szhang611@bloomberg.net
Abstract
Existing approaches to constructing training data for Natural Language Inference (NLI) tasks, such as semi-structured table reasoning, rely either on crowdsourcing or on fully automatic methods. However, the former is expensive and time-consuming and thus limits scale, while the latter often produces naive examples that lack complex reasoning. This paper develops a realistic semi-automated framework for data augmentation for tabular inference. Instead of manually generating a hypothesis for each table, our methodology generates hypothesis templates that are transferable to similar tables. In addition, our framework entails the creation of rational counterfactual tables based on human-written logical constraints and premise paraphrasing. For our case study, we use INFOTABS (Gupta et al., 2020), an entity-centric tabular inference dataset. We observe that our framework can generate human-like tabular inference examples, which benefit training data augmentation, especially in scenarios with limited supervision.
1 Introduction
Natural Language Inference (NLI) is the Natural Language Processing task of determining whether a hypothesis is entailed by, contradicted by, or unrelated to a given premise (Dagan et al., 2013). The NLI task has been extended to tabular data, where tables serve as the premise instead of sentences; this is known as the tabular inference task. Two popular human-curated datasets for tabular reasoning, TABFACT (Chen et al., 2020b) and INFOTABS (Gupta et al., 2020), have driven recent research in this area.
However, human-generated datasets are limited in scale and thus insufficient for training large language models (e.g., Devlin et al., 2019; Liu et al., 2019a). Since curating these datasets requires expertise, substantial annotation time, and expense, they cannot be scaled. Furthermore, it has been shown that these datasets suffer from annotation bias and spurious correlations (e.g., Poliak et al., 2018; Gururangan et al., 2018; Geva et al., 2019).
∗Equal Contribution †Corresponding Author
In contrast, automatically generated data lacks diversity and involves only naive reasoning. Recently, large language generation models (e.g., Radford et al.; Lewis et al., 2020; Raffel et al., 2020) have also been proposed for data generation (e.g., Zhao et al., 2022; Ouyang et al., 2022; Mishra et al., 2022). Despite substantial improvements, these generation approaches still lack factuality, i.e., they suffer from hallucination, have poor fact coverage, and also suffer from token repetition (refer to Appendix §E for analysis). Moreover, Chen et al. (2020a) show that automatic tabular NLG frameworks cannot produce logical statements and provide only surface-level reasoning.
To address the above shortcomings, we propose a semi-automatic framework that exploits the patterns in tabular structure for hypothesis generation. Specifically, this framework generates hypothesis templates transferable to similar tables, since tables from the same category, e.g., two athlete tables in Wikipedia, share many common attributes. In Table 1, the premise table's key attributes such as "Born", "Died", and "Children" are also shared across other tables from the "Person" category. One can generate a template for tables in the Person category, such as <Person_Name> died before/after <Died:Year>. This template can be used to generate sentences such as hypotheses H1 and H1C in Table 1. Furthermore, humans can utilize cell types (e.g., Date, Boolean) when generating templates. Recently, it has been shown that training on counterfactual data enhances model robustness (Müller et al., 2021; Wang and Culotta, 2021; Rajagopal et al., 2022). Therefore, we also utilize the overlapping key patterns to create counterfactual tables. The complexity and diversity of the templates can be enforced via human annotators.
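The template-transfer and counterfactual-table ideas above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the placeholder syntax, table representation, and helper names (`instantiate`, `make_counterfactual`) are our own assumptions.

```python
# Sketch: instantiating hypothesis templates over key-value tables, and
# building a counterfactual table by swapping a cell from a donor table
# of the same category (which shares the same keys).
import re

def instantiate(template: str, table: dict) -> str:
    """Fill every <Key> placeholder in the template from the table's cells."""
    return re.sub(r"<([^>]+)>", lambda m: str(table[m.group(1)]), template)

def make_counterfactual(table: dict, key: str, donor_table: dict) -> dict:
    """Replace one cell with the value from a donor table sharing that key."""
    cf = dict(table)
    cf[key] = donor_table[key]
    return cf

# Two hypothetical "Person"-category tables sharing the keys Born/Died.
ada = {"Person_Name": "Ada Lovelace", "Born": "1815", "Died": "1852"}
donor = {"Person_Name": "Example Person", "Born": "1820", "Died": "1901"}

# One template transfers to every table that carries the "Died" key.
template = "<Person_Name> died in <Died>."
print(instantiate(template, ada))  # Ada Lovelace died in 1852.

# Counterfactual premise: same key structure, one swapped cell.
ada_cf = make_counterfactual(ada, "Died", donor)
print(instantiate(template, ada_cf))  # Ada Lovelace died in 1901.
```

A hypothesis generated from the original table (e.g., "Ada Lovelace died before 1900") flips its label against the counterfactual table, which is what makes such pairs useful for robustness training.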
arXiv:2210.12795v1 [cs.CL] 23 Oct 2022