
SELF-CONSISTENT REASONING FOR SOLVING MATH WORD PROBLEMS
Jing Xiong1, Zhongwei Wan2,3, Xiping Hu1, Min Yang2,3, Chengming Li1
1Sun Yat-Sen University, China
2University of Chinese Academy of Sciences, China
3SIAT, Chinese Academy of Sciences, China
xiongj69@mail2.sysu.edu.cn, {huxiping, lichengming}@mail.sysu.edu.cn, {zw.wan1, min.yang}@siat.ac.cn
ABSTRACT
Math word problems (MWPs) constitute a task of automatically deriving a solution expression from a given math problem in text. Previous studies suffer from spurious correlations between the input text and the output expression. To mitigate this issue, we propose a self-consistent reasoning framework called SCR, which adopts a pruning strategy to correct the output distribution shift and thus implicitly fix spurious correlative samples. Specifically, we first obtain a sub-network by pruning a roberta2tree model, so that the gap between the output distributions of the original roberta2tree model and the pruned sub-network exposes spurious correlative samples. Then, we calibrate the output distribution shift by applying a symmetric Kullback-Leibler divergence to alleviate spurious correlations. In addition, SCR generates equivalent expressions, thereby capturing the logic of the original text rather than relying on shallow hints from it. Extensive experiments on two large-scale benchmarks demonstrate that our model substantially outperforms strong baseline methods.
Index Terms— Math word problems, spurious correlative samples, pruning, self-consistency
1. INTRODUCTION
Math word problems (MWPs) [1] constitute a challenging symbolic logical reasoning task based on natural language descriptions, and the task has recently drawn much attention from researchers studying the reasoning power of large language models [2,3,4,5,6,7]. MWPs require a model to automatically solve mathematical questions posed in natural language, which demands not only understanding the natural language but also the ability to reason logically. Table 1 shows several examples of MWPs.
At present, three paradigms of models have achieved excellent performance: seq2seq [8,9,10,11,12,13], seq2tree [14,15], and complex relation extraction [16]. However, all three paradigms suffer from spurious correlations [17,18,16]. Taking Table 1 as an example, some previous works may produce the same mathematical formula "a ÷ b × c" for Problem 1 and Problem 2, due to their similar semantic context, e.g., both calculate an amount of money. However, if the models do not account for these spurious correlations, they tend to generate a wrong solution expression for Problem 3, whose semantic information is very similar, with important words such as "money", "bank", and "account" that also appear in Problems 1 and 2. Specifically, models that learn spurious information among Problems 1 to 3 are more likely to generate the wrong expression "12500 ÷ 5% × 15%" instead of
Problem 1: Tom takes money from his bank account and has taken 240 dollars from his account over 3 days. If he takes the same amount of money every day, how much money will Tom take over the next 2 days?
Solution Expression: 240 ÷ 3 × 2    Solution: 160
Problem 2: Sherry has deposited 6000 dollars in the bank over the last 5 months. If she saves the same amount of money each month, how much will she add to the account in the next 3 months?
Solution Expression: 6000 ÷ 5 × 3    Solution: 3600
Problem 3: Uncle Jack spends 5% of his bank account to invest in the trust funds of States and 15% of the account on the shares of Apple Inc. The money he has spent on financial management is 12500 dollars. How much money is in Uncle Jack's account?
Solution Expression: 12500 ÷ (5% + 15%)    Solution: 62500
Wrong Solution Expression: 12500 ÷ 5% × 15%
Table 1. Typical math word problem examples illustrating spurious correlations.
"12500 ÷ (5% + 15%)" for Problem 3, which asks to calculate the money in the account.
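For concreteness, evaluating the two expressions from Table 1 shows how far the spurious expression is from the correct answer:
$$12500 \div (5\% + 15\%) = 12500 \div 0.2 = 62500,$$
$$12500 \div 5\% \times 15\% = 250000 \times 0.15 = 37500,$$
so the expression produced under the spurious correlation does not recover the correct account balance of 62500 dollars.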
Some recent models address this problem by using variational information bottlenecks [15]. Our article considers the problem from the perspective of memorization. Recent work has revealed that pruning can make a model forget hard-to-memorize samples [19]. In addition, [20] have shown that long-tailed samples are easily forgotten. These long-tailed samples tend to confuse the model, which then generates the final result based on shallow hints. A natural hypothesis is that some spurious correlative samples are harder for the model to learn well due to shortcuts, and that these samples can be adaptively exposed by pruning [20]. A key question in MWPs is how to implicitly correct the shortcuts between expressions and original texts once spurious correlative samples have been exposed through pruning. Work on the reasoning ability of large models has also revealed that encouraging the model to produce self-consistent outputs can effectively improve reasoning performance when the model produces multiple inferences [2,3,4]. However, that work uses voting to encourage self-consistency, which cannot adaptively correct the shortcuts between expressions and original texts online through the loss function.
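The exact formulation of SCR is presented later in the paper; purely as an illustration of the symmetric Kullback-Leibler calibration idea mentioned above, a loss term comparing the output distributions of the full roberta2tree model and its pruned sub-network might look like the following PyTorch sketch. The function and variable names here are hypothetical, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def symmetric_kl(logits_full: torch.Tensor, logits_pruned: torch.Tensor) -> torch.Tensor:
    """Symmetric KL divergence between the output distributions of the
    original model and its pruned sub-network (illustrative names only)."""
    log_p = F.log_softmax(logits_full, dim=-1)    # log-probs of the full model
    log_q = F.log_softmax(logits_pruned, dim=-1)  # log-probs of the pruned sub-network
    # KL(P || Q) + KL(Q || P); F.kl_div takes its first argument in log-space,
    # and with log_target=True the second argument is also in log-space.
    kl_pq = F.kl_div(log_q, log_p, reduction="batchmean", log_target=True)
    kl_qp = F.kl_div(log_p, log_q, reduction="batchmean", log_target=True)
    return kl_pq + kl_qp

# Hypothetical use in training: add the calibration term to the expression
# decoding loss with a weighting coefficient lambda_kl.
# loss = decode_loss + lambda_kl * symmetric_kl(logits_full, logits_pruned)
```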
In this paper, we propose a self-consistent reasoning framework (called SCR) to solve MWPs. We obtain a sub-network by