in Vaswani et al. [2017], Saxton et al. [2019] found that transformer-based models outperformed
other architectures when trained to generate the answer directly from the problem statement. Many
researchers have explored enhancing model performance by fine-tuning to produce intermediate
equations or programs [Shi et al., 2015, Upadhyay and Chang, 2015, Amini et al., 2019, Miao et al.,
2020, Drori et al., 2021]. Recent advances rely on large transformer-based language models [Brown
et al., 2020, Thoppilan et al., 2022, Chowdhery et al., 2022, Lewkowycz et al., 2022] and/or datasets
involving full step-by-step solutions in natural language [Ling et al., 2017, Hendrycks et al., 2021,
Welleck et al., 2021, Cobbe et al., 2021, Drori et al., 2021].
Interestingly, prompting large language models such as GPT-3 to generate chains of thought with
just a few examples at test time can enhance performance considerably [Wei et al., 2022], indicating
that the models may already have the ability to engage in a step-by-step reasoning process, in part
because such a process is exemplified in their training. Many recent works use multiple samples
from a model, either using a verifier trained on model-generated responses to re-rank candidate
sequences [Cobbe et al., 2021] or relying on a majority voting scheme [Wang et al., 2022]. The
strongest results overall to date [Lewkowycz et al., 2022] use a very large transformer-based language
model, fine-tuned on scientific and mathematical text, provided with a chain-of-thought prompt, and
assessed using majority voting. However, these models still only achieve modest scores on harder
problems, consistent with the view of Hendrycks et al. [2021] that simply scaling up the model size is
an intractable strategy for solving mathematics problems of higher difficulty, even with the added
benefit of chain-of-thought prompting, verifiers, or majority voting.
Common across these existing works is the use of human-generated solution sequences. In our
work, we introduce our GSM8K-R dataset to explicitly contrast performance on different types of
solution sequences and explore how explicit focus on generating a structured abstract relational
plan can improve learning, an analysis that would not be possible with existing datasets. We
also introduce the unit conversion (UC) task, a completely synthetic task domain to complement
our exploration of solving problems expressed in natural language. This parallels the approach of
Gontier et al. [2020], with a crucial difference. These authors investigated logical reasoning over
a fixed database of specific relational facts, training models to produce an inferable relation in response
to a probe question, and found only small advantages of a plan sequence compared to generating the
answer directly. In contrast, our UC task affords separating the abstract relational plan from the
specific numerical computations. This allows us to demonstrate a striking advantage from learning
to produce the abstract relational sequence rather than just the necessary numerical expressions.
4 Experiments
We use two tasks to explore the possible benefits of relational abstractions: a set of natural language
math problems from the Grade School Math 8K (GSM8K) dataset [Cobbe et al., 2021], and an
abstract unit conversion task (UC) in which the model must determine how the number of units of
one type corresponds to a specified number of units of another type. Both tasks contain quantities
and relations that can be represented by a graph, and involve formulating and solving a series of
numerical equations. However, the two tasks pose different challenges, allow different approaches to
model training, and afford different comparison conditions and analyses.
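To make the structure of the UC task concrete, the following sketch shows how conversion relations can be represented as a graph and how an answer corresponds to a chain of numerical equations along a path. This is our own illustration, not the task generator used in the paper; the unit names and conversion factors here are made up.

```python
# Illustrative sketch only: hypothetical units and factors, not the actual UC task data.
from collections import deque

# Each entry (src, dst): k encodes the relation "1 src = k dst".
CONVERSIONS = {
    ("gallop", "trot"): 4,
    ("trot", "step"): 6,
    ("step", "hop"): 3,
}

def build_graph(conversions):
    graph = {}
    for (src, dst), factor in conversions.items():
        graph.setdefault(src, []).append((dst, factor))
        graph.setdefault(dst, []).append((src, 1 / factor))
    return graph

def convert(quantity, src, dst, graph):
    """Breadth-first search over the relation graph, multiplying conversion
    factors along the path -- one numerical equation per traversed edge."""
    frontier = deque([(src, quantity)])
    seen = {src}
    while frontier:
        unit, value = frontier.popleft()
        if unit == dst:
            return value
        for nxt, factor in graph.get(unit, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, value * factor))
    raise ValueError(f"no conversion path from {src} to {dst}")

graph = build_graph(CONVERSIONS)
print(convert(2, "gallop", "hop"))  # 2 * 4 * 6 * 3 = 144
```

The relational plan here is the path through the graph (which conversions to apply, in what order); the numerical computation is the multiplication carried out along that path. The UC task lets these two components be separated cleanly.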
The GSM8K dataset consists of realistic word problems requiring a broad understanding of
mathematical concepts and their application at the grade-school level. The dataset includes
human-generated mixed expressions that usually step through the problems in a linear order corre-
sponding to the problem statement in a fairly small number of solution steps. Because these are word
problems, they challenge the model’s natural language understanding and general world knowledge
(such as the fact that a dozen consists of 12 items, or that the number of eggs increases when a chicken
lays an egg but decreases when eggs are used to bake cookies). We present our GSM8K-R dataset
by building on the GSM8K dataset, adding human annotations that extract the core components
of the reasoning process, namely the entities, quantities, and the arithmetic operations that define
the entities’ relations. In this setting we fine-tune pre-trained language models and compare our
proposed conditions to the natural-language-based comparison conditions provided with the dataset.
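As a purely hypothetical illustration of what such an annotation separates out, consider the sketch below; the field names and encoding are ours and are not the actual GSM8K-R schema, which is described later.

```python
# Purely illustrative: a hypothetical encoding of a relational annotation,
# NOT the actual GSM8K-R annotation format.
problem = ("A baker starts with 3 dozen eggs and uses 10 eggs for cookies. "
           "How many eggs are left?")

annotation = {
    "entities":   ["eggs_start", "eggs_used", "eggs_left"],
    "quantities": {"eggs_start": 3 * 12, "eggs_used": 10},
    "relations":  [("eggs_left", "=", "eggs_start", "-", "eggs_used")],
}

# Evaluating the relational plan yields the numerical answer.
q = annotation["quantities"]
eggs_left = q["eggs_start"] - q["eggs_used"]
print(eggs_left)  # 26
```

The point of such an annotation is that the entities and the operations relating them can be stated before any arithmetic is carried out, which is exactly the contrast our conditions are designed to probe.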
The unit conversion task avoids the natural language understanding and world knowledge issues