
(1) instantiate xby head symbol t
with arity 2and zby gof arity 1
(going from level0to level1)
(2) instantiate x1, x2, z1by
constants c,c, and e, respectively
(going from level1to level2)
∀x z P(f(x,z) )
∀x1x2z1P(f(t(x1,x2),g(z1) ) )
P(f(t(c,c),g(e) ) )
t/2g/1
c/0c/0e/0
Figure 1: Term instantiation through incremental deepening. In the figure, there are two instantiation steps, one after the other.
2 Ground Solver
There are several ways how to combine instantiation of
clausal first-order problems with decidable and efficient
(un)satisfiability checking of their proposed ground in-
stances. The most direct approach (used in instantiation-
based ATPs such as iProver (Korovin 2008)) is to explic-
itly add axioms for equality, allowing their instantiation
as for any other axioms, and directly use SAT solvers for
the ground checking. An alternative approach is to avoid
explicit addition of the equality axioms, and instead use
combinations of SAT solvers with ground congruence clo-
sure (Detlefs, Nelson, and Saxe 2005; Nieuwenhuis and
Oliveras 2007).
We have explored both approaches and ultimately decided
to use the latter in this work. The main reason is that the
combinations of SAT solvers with ground congruence clo-
sure are today very efficiently implemented (Barrett et al.
2021), posing practically no issues even with thousands of
instances. Using the most direct approach would, on the
other hand, require a large number of additional instances of
the equality axioms to successfully solve the ground prob-
lems. In our preliminary measurements, the average ratio of
such necessary additional instances was over 40%, which
would exponentially decrease the chance of randomly pre-
dicting the right set of instances.
In more detail, we use as our ground solver an effi-
cient combination of a SAT solver with ground congru-
ence closure provided in the Vampire automated theorem
prover (Kov´
acs and Voronkov 2013). Here, the SAT solver
abstracts first-order logic atoms as propositional variables
and starts producing satisfying assignments (models) of this
abstraction, which are then checked against the properties
of equality (reflexivity, transitivity, congruence, symmetry).
This process terminates when a model is found that satisfies
the equality properties, or when the SAT solver runs out of
models to try. In that case, the original problem is unsatisfi-
able, which means that the ground instances were proposed
correctly by the ML system.
3 Dataset of Mathematical Problems
We construct a corpus of instantiations by repeatedly run-
ning a randomized grounding procedure (Section 3.1) on
113 332 first-order ATP problems made available to us
by the AI4REASON project.1They originate from the
Mizar Mathematical Library (MML) (Kaliszyk and Urban
2015) and are exported to first-order logic by the MPTP
system (Urban 2006). All these problems have an ATP
1https://github.com/ai4reason/ATP Proofs
proof (in general in a high time limit) found by either
the E/ENIGMA (Schulz 2013; Jakub˚
uv et al. 2020) or
Vampire/Deepire (Kov´
acs and Voronkov 2013; Suda 2021)
systems. Additionally, the problems’ premises have been
pseudo-minimized (Kaliszyk and Urban 2014) by iterated
Vampire runs. We use the pseudo-minimized versions be-
cause our focus here is on guiding instantiation rather than
premise selection. The problems come from 38 108 prob-
lem families, where each problem family corresponds to
one original Mizar theorem. Each of these theorems can
have multiple minimized ATP proofs using different sets of
premises. The problems range from easier to challenging
ones, across various mathematical fields such as topology,
set theory, logic, algebra and linear algebra, real, complex
and multivariate analysis, trigonometry, number and graph
theory, etc.
We first clausify the problems by E, and then repeatedly
run the randomized grounding procedure on all of them, fol-
lowed by the ground solver using a 30s time limit. The first
run solves 3897 of the problems, growing to 7790 for the
union of the first 9 runs, and to 11675 for the union of the
first 100 runs. The 113332 problems have on average 35.6
input clauses and the 3897 problems solved in the first run
have on average 12.8 input clauses. 6.0 instances are needed
on average to solve a problem.2
Note that each clause can in general be instantiated more
than once. Also, 3.9 (almost two thirds) of the instances con-
tain on average at least one skolem symbol.
This means that we strongly need a learning architec-
ture invariant under symbol renamings, rather than off-the-
shelf architectures (e.g., transformers) that depend on fixed
consistent naming. This motivates our use of a property-
invariant graph neural architecture (Section 4).
3.1 Randomized Grounding Procedure
Randomized grounding can be parameterized in various
ways. To develop the initial dataset here we use a sim-
ple multi-pass randomized grounding with settings that
roughly correspond to our ML-guided instantiation archi-
tecture (Section 4). These settings are as follows. We use
at most two passes (levels) of instantiation for every input
clause. In the first pass, for each variable we randomly se-
lect an arbitrary function symbol from the problem’s signa-
ture, and provide it (if non-constant) with fresh variables as
arguments. In the second pass, we ground all variables with
randomly selected constants from the problem’s signature.
The first pass is repeated 25 times for each clause in its in-
2Over 40 instances (max. 63) are used in some problems