learn a rule for Zendo, the learner may generate the partial
hypothesis shown in Figure 1c. In this hypothesis, the first-
order variable
N
is marked as a numerical variable with the
predicate symbol @numerical but is not yet bound to any
particular value.
In the numerical search stage, the learner searches for val-
ues for the numerical variables using the training examples.
We encode the search for numerical values as a satisfiability
modulo theories (SMT) formula. For instance, to find values
for
N
in the hypothesis in Figure 1c, the learner executes
the partial hypothesis without its numerical literal geq(D,N)
against the examples to find possible substitutions for the vari-
able
D
, from which it builds a system of linear inequations.
These inequations constrain the search for the numerical vari-
able
N
with the values obtained for
D
from the examples.
Finally, the learner substitutes
N
in the partial program with
any solution found for the inequations.
To implement our idea, we build on the state-of-the-art
learning from failures (LFF) (Cropper and Morel 2021) ILP
approach. LFF is a constraint-driven ILP approach where the
goal is to accumulate constraints on the hypothesis space. A
LFF learner continually generates and tests hypotheses, from
which it infers constraints. We implement our numerical rea-
soning approach in NUMSYNTH, which, as it builds on the
LFF learner POPPER, supports predicate invention and learn-
ing recursive and optimal (in textual complexity) programs.
NUMSYNTH uses built-in numerical literals to additionally
support linear arithmetical reasoning over real numbers and
integers.
Novelty and Contributions
Compared to existing ap-
proaches, the main novelty of our approach is expressiv-
ity: NUMSYNTH can learn programs with numerical values
which require reasoning over multiple examples in linear
arithmetic fragments. In other words, our approach can learn
programs that existing approaches cannot. For instance, our
experiments show that our approach can learn programs of
the form shown in Figure 1b. In addition, our approach can
(i) efficiently search for numerical values in infinite domains
such as real numbers or integers, (ii) identify numerical val-
ues which may not appear in the background knowledge, and
(iii) learn programs with several chained numerical literals.
For instance, it can learn that the sum of two variables is
lower than some particular numerical value. As far as we
are aware, no existing approach can efficiently solve such
problems. Overall, we make the following contributions:
1.
We introduce an approach for numerical reasoning in infi-
nite domains. Our approach supports numerical reasoning
in linear arithmetic fragments.
2.
We implement our approach in NUMSYNTH, which can
learn programs with numerical values, perform predicate
invention, and learn recursive and optimal programs.
3.
We experimentally show on four domains (geometry, bi-
ology, game playing, and program synthesis) that our
approach can (i) learn programs requiring numerical rea-
soning, and (ii) outperform existing ILP systems in terms
of learning time and predictive accuracy.
2 Related Work
Program Synthesis.
Program synthesis approaches that
enumerate the search space (Raghothaman et al. 2019; Ellis
et al. 2018; Evans et al. 2021) can only learn from small
and finite domains and cannot learn from infinite domains.
Several program synthesis systems delegate the search for
programs to an SMT solver (Jha et al. 2010; Gulwani et al.
2011; Reynolds et al. 2015; Albarghouthi et al. 2017). By
contrast, we delegate the numerical search to an SMT solver.
Moreover, NUMSYNTH can learn programs with numerical
values from infinite domains. Sketch (Solar-Lezama 2009)
uses a SAT solver to search for suitable constants given a
partial program, where the constants can be numerical val-
ues. This approach is similar to our numerical search stage.
However, Sketch does not learn the structure of programs but
expects as input a skeleton of a solution: it requires a partial
program and its task is to fill in missing values with constants
symbols. By contrast, NUMSYNTH learns both the program
and numerical values.
ILP.
Many ILP approaches (Muggleton 1995; Srinivasan
2001) use bottom clause construction to search for programs.
However, these approaches can only identify numerical val-
ues that appear in the bottom clause of a single example.
They cannot reason about multiple examples jointly, which
is, for instance, necessary to learn inequations.
Constraint inductive logic programming (Sebag and Rou-
veirol 1996) uses constraint logic programming to learn pro-
grams with numerical values. This approach generalises a
single positive example given some negative examples and is
restricted to numerical reasoning in difference logic.
Anthony and Frisch (1997) propose an algorithm to learn
hypotheses with numerical literals. FORS (Karali
ˇ
c and
Bratko 1997) fits regression lines to subsets of the positive ex-
amples in a positive example only setting. In contrast to NUM-
SYNTH, these two approaches allow some error in numerical
values predicted by numerical literals. However, these two ap-
proaches follow top-down refinement with one specialisation
step at a time, which prevents them from learning hypotheses
with multiple chained numerical literals.
TILDE (Blockeel and De Raedt 1998) uses a discretiza-
tion procedure to find candidate numerical constants while
making the induction process more efficient (Blockeel and
De Raedt 1997). However, TILDE cannot learn recursive
programs and struggles to learn from small numbers of ex-
amples.
Many recent ILP systems enumerate every possible rule in
the search space (Corapi, Russo, and Lupu 2011; Kaminski,
Eiter, and Inoue 2018; Evans and Grefenstette 2018; Sch
¨
uller
and Benz 2018) or all constant symbols as unary predicate
symbols (Evans and Grefenstette 2018; Cropper and Morel
2021; Purgał, Cerna, and Kaliszyk 2022) and therefore cannot
handle infinite domains.
LFF.
Recent LFF systems represent constant symbols with
unary predicate symbols (Cropper and Morel 2021; Purgał,
Cerna, and Kaliszyk 2022), which prevents them from learn-
ing in infinite domains. MAGICPOPPER (Hocquette and Crop-
per 2022) can identify constant symbols from infinite do-
mains. Similar to ALEPH’s lazy evaluation approach and