A Solver-Free Framework for Scalable Learning in
Neural ILP Architectures
Yatin Nandwani
, Rishabh Ranjan
, Mausam & Parag Singla
Department of Computer Science, Indian Institute of Technology Delhi, INDIA
{yatin.nandwani, rishabh.ranjan.cs118, mausam, parags}@cse.iitd.ac.in
Abstract
There is a recent focus on designing architectures that have an Integer Linear
Programming (ILP) layer following a neural model (referred to as Neural ILP
in this paper). Neural ILP architectures are suitable for pure reasoning tasks
that require data-driven constraint learning or for tasks requiring both perception
(neural) and reasoning (ILP). A recent SOTA approach for end-to-end training of
Neural ILP explicitly defines gradients through the ILP black box [Paulus et al.,
2021] – this trains extremely slowly, owing to a call to the underlying ILP solver
for every training data point in a minibatch. In response, we present an alternative
training strategy that is solver-free, i.e., does not call the ILP solver at all at training
time. Neural ILP has a set of trainable hyperplanes (for cost and constraints in ILP),
together representing a polyhedron. Our key idea is that the training loss should
impose that the final polyhedron separates the positives (all constraints satisfied)
from the negatives (at least one violated constraint or a suboptimal cost value),
via a soft-margin formulation. While positive example(s) are provided as part
of the training data, we devise novel techniques for generating negative samples.
Our solution is flexible enough to handle equality as well as inequality constraints.
Experiments on several problems, both perceptual as well as symbolic, which
require learning the constraints of an ILP, show that our approach has superior
performance and scales much better compared to purely neural baselines and other
state-of-the-art models that require solver-based training. In particular, we are able
to obtain excellent performance on 9×9 symbolic and visual sudoku, to which the other Neural ILP model is not able to scale.2
1 Introduction
There has been a growing interest in the community which focuses on developing neural models
for solving combinatorial optimization problems. These problems often require complex reasoning
over discrete symbols. Many of these problems can be expressed in the form of an underlying Integer
Linear Program (ILP). Two different kinds of problem (input) settings have been considered in the
literature: (a) purely symbolic and (b) combination of perceptual and symbolic. Solving an n-Queens
problem given the partial assignment of queens on the board as input would be an example of the
former, and solving a sudoku puzzle given the image of a partially filled board as input would be an
example of the latter. While the first setting corresponds to a pure reasoning task, the second involves a
combination of perception and reasoning tasks which need to be solved in a joint fashion. Existing
literature proposes various approaches to handle one or both of these settings. One line of work proposes
purely neural models [Palm et al., 2018, Nandwani et al., 2021, Dong et al., 2019] for solving these
tasks, representing the underlying constraints and costs implicitly.

Equal contribution. Work done while at IIT Delhi. Current email: rishabhr@andrew.cmu.edu
2Code available at: https://github.com/dair-iitd/ilploss
36th Conference on Neural Information Processing Systems (NeurIPS 2022).
arXiv:2210.09082v2 [cs.LG] 13 Jan 2023

While standard CNNs are used to solve the perceptual task, neural models such as Graph Neural Networks (GNNs) take care of the
reasoning component. In an alternate view, one may want to solve these tasks by explicitly learning
the constraints and cost of the underlying ILP. While the perceptual component would still be handled by modules such as CNNs, reasoning is taken care of by an explicit representation in the
form of an ILP layer representing constraints and costs. Such an approach would have the potential
advantage of being more interpretable, and also being more accurate if the underlying constraints
could be learned in a precise manner. Some recent approaches take this route, including works by Paulus et al. [2021], Pogancic et al. [2020], and Berthet et al. [2020]. We refer to the latter set of approaches as Neural ILP architectures.3
Learning in Neural ILP architectures is complicated by the fact that there is a discrete optimization (in
the form of an ILP layer) at the end of the network, which is typically non-differentiable, making the
end-to-end learning of the system difficult. One possible way is to instead use an iterative ILP-solving algorithm, such as a cutting-plane method [Gomory, 2010], whose continuous relaxation in each iteration is differentiable due to the introduction of continuous variables [Ferber et al., 2020, Wilder et al., 2019]. Most of these works are concerned with learning only the
cost function and assume that the constraints are given. Recent work by Paulus et al. [2021] has
proposed an approach to directly pass the gradients through a black-box ILP solver. Specifically, they
rely on Euclidean distances between the constraint hyperplanes and the current solution obtained
by the ILP solver to produce gradients for backprop. Though this approach improves the quality of
results compared to earlier works involving continuous relaxations, scalability gets severely affected
since an ILP has to be solved at every learning iteration, making the training extremely slow.
We are interested in answering the following question: is there a way to train a neural ILP architecture
in an end-to-end manner, which does not require access to an underlying solver? Such an approach, if it exists, could presumably result in significantly faster training and hence better scalability. In
response, we propose a novel technique to back-propagate through the learnable constraints as well
as the learnable cost function of an unknown Integer Linear Program (ILP). During training, our
technique does not solve the ILP to compute the gradients. Instead, we cast the learning of ILP
constraints (and cost) as learning of a polyhedron, consisting of a set of hyperplanes, such that points
inside the polyhedron are treated as positive, and points outside as negative. While a positive point
needs to be classified as positive by each of the hyperplanes, a negative point needs to be classified as
negative only by at least one of the hyperplanes. Our formulation incorporates the learning of ILP
cost also as learning of one of the hyperplanes in the system. We formulate a novel margin-based
loss to learn these hyperplanes in a joint fashion. A covariance-based regularizer that minimizes the cosine similarity between all pairs of hyperplanes ensures that the learned constraints (hyperplanes) are
not redundant. Since the training data comes only with positive examples, i.e., solutions to respective
optimization problems, we develop several techniques for sampling negatives, each of which is central to the effective learning of hyperplanes in our formulation.
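To make the separation idea concrete, here is a minimal NumPy sketch of such a margin-based loss. This is a simplification, not our exact formulation: the hinge form, the margin value, and the min-over-hyperplanes aggregation for negatives are illustrative choices, and the covariance-based regularizer is omitted.

```python
import numpy as np

def polyhedron_margin_loss(A, b, pos, neg, margin=0.1):
    """Soft-margin loss for learning a polyhedron {z : Az + b >= 0}.

    A: (m, n) hyperplane normals; b: (m,) offsets;
    pos: (P, n) points that must satisfy every constraint;
    neg: (N, n) points that must violate at least one constraint.
    """
    # Signed slack of every point w.r.t. every hyperplane: s[i, j] = a_j^T z_i + b_j
    s_pos = pos @ A.T + b          # shape (P, m)
    s_neg = neg @ A.T + b          # shape (N, m)
    # A positive point pays a hinge penalty for EACH constraint whose slack
    # falls below the margin (it must be inside w.r.t. all hyperplanes).
    loss_pos = np.maximum(0.0, margin - s_pos).sum(axis=1).mean()
    # A negative point only needs ONE hyperplane to reject it, so we penalise
    # only its smallest slack (min over constraints).
    loss_neg = np.maximum(0.0, margin + s_neg.min(axis=1)).mean()
    return loss_pos + loss_neg
```

For the unit box {z : z ≥ 0, z ≤ 1} in 2D, a positive point well inside and a negative point well outside both incur zero loss, while a "negative" placed inside the box is correctly penalized.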
We present several experiments on problems which require learning of ILP constraints and cost, with
both symbolic as well as perceptual input. These include solving a symbolic sudoku as well as visual
sudoku in which we are given an image of a partially filled board [Wang et al., 2019] (perceptual);
ILPs with random constraints (symbolic), Knapsack from sentence description (perceptual) and key-
point matching (perceptual) from [Paulus et al., 2021]. Our closest competitor, CombOptNet [Paulus
et al., 2021], cannot solve even the smallest of the sudoku boards, sized 4×4, whereas we easily scale to 9×9, getting close to 100% accuracy. We are slightly better on key-point matching, and
obtain significantly better accuracy on random ILPs and knapsack, especially on large problem sizes.
We also outperform purely neural baselines (wherever applicable).
2 Related Work
When the constraints of an ILP are known, many works propose converting such constraints over
discrete output variables into equivalent constraints over their probabilities given by the model and
backpropagate over them [Nandwani et al., 2019, Li and Srikumar, 2019]. Some works use an ILP
solver with the known constraints during inference to improve the predictions [Ning et al., 2019,
Han et al., 2019]. When constraints have to be learned, which is the focus of this paper, one line of
work uses a neural model, such as a GNN, to replace a combinatorial solver altogether and implicitly encodes the constraints in the model's weights and learns them from data.

3Although these methods can also train a Neural-ILP-Neural architecture, studying this is beyond our scope.

Nandwani et al. [2022, 2021],
Palm et al. [2018] train a Recurrent Relational Network for learning to solve puzzles like sudoku,
graph coloring, Futoshiki, etc; Dong et al. [2019] propose Neural Logic Machines (NLMs) that learn
lifted first order rules and experiment with blocks world, reasoning on family trees etc; Ranjan et al.
[2022] use a siamese GNN architecture to learn the combinatorial problems of graph and subgraph
distance computation; Bajpai et al. [2018], Garg et al. [2019, 2020], Sharma et al. [2022] use GNNs
to train probabilistic planners for combinatorial domains with large, discrete state and action spaces;
Selsam et al. [2019], Amizadeh et al. [2019a,b] train a GNN for solving any CSP when constraints
are explicitly given in a standard form such as CNF, DNF or Boolean Circuits. This is clearly different
from us since we are interested in explicitly learning the constraints of the optimization problem.
Inverse Optimization [Chan et al., 2021] aims to learn the constraints (cost) of a linear program (LP)
from the observed optimal decisions. Recently, Tan et al. [2019, 2020] use the notion of Parameterized
Linear Programs (PLP) in which both the cost and the constraints of an LP are parameterized by
unknown weights. Gould et al. [2019] show how to differentiate through a continuous constrained optimization problem using the notion of a ‘declarative node’ that uses the implicit function theorem
[Mingari Scarpello and Ritelli, 2002]. Similar to this are other works [Amos and Kolter, 2017,
Agrawal et al., 2019] that define methods for differentiating through other continuous problems like
convex quadratic programs (QPs) or cone programs. While all these techniques are concerned with
learning the constraints (cost) for optimization problems defined over continuous variables, our focus
is on learning constraints (cost) for an ILP which involves optimization over discrete variables and
can be a significantly harder problem.
In the space of learning the cost for an ILP, several approaches have been proposed recently. Pogancic et al. [2020] give the gradient w.r.t. a linear cost function that is optimized over a given discrete space.
Rolínek et al. [2020a,b] exploit it for the task of Deep Graph Matching, and for differentiating through rank-based metrics such as Average Precision/Recall, respectively. Berthet et al. [2020] replace the black-box discrete solver with its smooth perturbed version during training and exploit
‘perturb-and-MAP’ [Papandreou and Yuille, 2011] to compute the gradients. On similar lines, Niepert et al. [2021] also exploit perturb-and-MAP but propose a different noise model for perturbation and extend to the case when the output of the solver may feed into a downstream neural model.
These methods assume the constraints to be given, and only learn the cost for an ILP.
Other methods use relaxations of specific algorithms for the task of learning the constraints and cost
for an ILP. Ferber et al. [2020] backprop through the KKT conditions of the LP relaxation created
iteratively while using the cutting-plane method [Gomory, 2010] for solving an MILP; Wilder et al.
[2019] add a quadratic penalty term to the continuous relaxation and use implicit function theorem to
backprop through the KKT conditions, as done in Amos and Kolter [2017] for smooth programs;
Mandi and Guns [2020] instead add a log-barrier term to get the LP relaxation and differentiate
through its homogeneous self-dual formulation linking it to the iterative steps in the interior point
method. Wang et al. [2019] use a low-rank SDP relaxation of MAXSAT and define how the
gradients flow during backpropagation. Presumably, these approaches are limited in their application
since they rely on specific algorithmic relaxations.
Instead of working with a specific algorithm, some recent works differentiate through a black-box
combinatorial solver for the task of learning constraints [Paulus et al., 2021]. The solver is called at
every learning step to compute the gradient. This becomes a bottleneck during training: since the constraints are being learned on the go, the problem could become ill-behaved, resulting in larger solver times and severely limiting the scalability of such approaches. In contrast, we would like to
perform this task in a solver-free manner. Pan et al. [2020] propose learning of constraints for the task
of structured prediction. They represent the linear constraints compactly using a specific two layer
Rectifier Network [Pan and Srikumar, 2016]. A significant limitation is that their method can only be
used for learning constraints, and not costs. Further, their approach does not do well experimentally
on our benchmark domains. Meng and Chang [2021] propose a non-learning-based approach for mining constraints in which, for each training example $(\mathbf{c}, \mathbf{y})$, a new constraint is added which essentially implies that no other target can have a better cost than the given target $\mathbf{y}$ for the corresponding cost $\mathbf{c}$. We are specifically interested in learning not only the constraints, but also the cost of an ILP from data.
Convex Polytope Machines (CPM) [Kantchelian et al., 2014] learn a nonlinear binary classifier in the form of a polytope from a dataset of positive and negative examples. In contrast, our goal is
to learn the cost as well as constraints for an ILP where both the constraints and the cost could be
parameterized by another neural network. Also, we do not have access to any negative samples.
3 Differentiable ILP Loss
3.1 Background and Task Description
We are interested in learning how to solve combinatorial optimization problems that can be expressed
as an Integer Linear Program (ILP) with equality as well as inequality constraints:
$$\arg\min_{\mathbf{z} \in \mathbb{Z}^n} \mathbf{c}^T\mathbf{z} \quad \text{subject to} \quad U\mathbf{z} = \mathbf{v};\; G\mathbf{z} + \mathbf{h} \geq \mathbf{0} \tag{1}$$

Here $\mathbf{c} \in \mathbb{R}^n$ represents an $n$-dimensional cost vector, the matrix $U \in \mathbb{R}^{m_1 \times n}$ and $\mathbf{v} \in \mathbb{R}^{m_1}$ represent $m_1$ linear equality constraints, and the matrix $G \in \mathbb{R}^{m_2 \times n}$ and $\mathbf{h} \in \mathbb{R}^{m_2}$ represent $m_2$ linear inequality constraints, together defining the feasible region.
Without loss of generality, one can replace an equality constraint $\mathbf{u}_i^T\mathbf{z} = v_i$ by two inequality constraints: $\mathbf{u}_i^T\mathbf{z} \geq v_i$ and $-\mathbf{u}_i^T\mathbf{z} \geq -v_i$. Here $\mathbf{u}_i$ and $v_i$ represent the $i^{th}$ row of the matrix $U$ and the $i^{th}$ element of the vector $\mathbf{v}$, respectively. Using this, one can reduce the ILP in eq. (1) to an equivalent form with just inequality constraints:

$$\arg\min_{\mathbf{z} \in \mathbb{Z}^n} \mathbf{c}^T\mathbf{z} \quad \text{subject to} \quad A\mathbf{z} + \mathbf{b} \geq \mathbf{0} \tag{2}$$

Here the matrix $A \in \mathbb{R}^{m \times n}$ is a row-wise concatenation of the matrices $G, U$, and $-U$. The vector $\mathbf{b} \in \mathbb{R}^m$ is a row-wise concatenation of the vectors $\mathbf{h}, -\mathbf{v}$, and $\mathbf{v}$, and $m = 2m_1 + m_2$ is the total number of inequality constraints. We represent the $i^{th}$ constraint $\mathbf{a}_i^T\mathbf{z} + b_i \geq 0$ as $[\mathbf{a}_i \,|\, b_i]$. The integer constraints $\mathbf{z} \in \mathbb{Z}^n$ make the problem NP-hard.
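The reduction above is purely mechanical; a minimal NumPy sketch (the function name is ours):

```python
import numpy as np

def to_inequality_form(U, v, G, h):
    """Rewrite {Uz = v, Gz + h >= 0} as {Az + b >= 0}.

    Each equality u_i^T z = v_i becomes u_i^T z - v_i >= 0 and -u_i^T z + v_i >= 0,
    so A stacks G, U, -U and b stacks h, -v, v, giving m = 2*m1 + m2 rows.
    """
    A = np.vstack([G, U, -U])
    b = np.concatenate([h, -v, v])
    return A, b
```

A point `z` is then feasible iff `(A @ z + b >= 0).all()`.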
Neural ILP Architecture: In a neural ILP architecture, the constraint matrix $A$, the vector $\mathbf{b}$, and the cost vector $\mathbf{c}$ are neural functions $f_A$, $f_b$, and $f_c$ (of input $\mathbf{x}$) parameterized by learnable $\Theta = (\theta_A, \theta_b, \theta_c)$, i.e., $A = f_A(\mathbf{x}; \theta_A)$; $\mathbf{b} = f_b(\mathbf{x}; \theta_b)$; $\mathbf{c} = f_c(\mathbf{x}; \theta_c)$. This results in a different ILP for each input $\mathbf{x}$. For an input $\mathbf{x}$, the solution given by a neural ILP model, $M_\Theta(\mathbf{x})$, is nothing but the optimum of the following parameterized ILP, where $A, \mathbf{b}, \mathbf{c}$ are replaced by the corresponding neural functions $f_A, f_b, f_c$ evaluated at input $\mathbf{x}$:

$$M_\Theta(\mathbf{x}) = \arg\min_{\mathbf{z}} \; (f_c(\mathbf{x}; \theta_c))^T \mathbf{z} \quad \text{subject to} \quad f_A(\mathbf{x}; \theta_A)\,\mathbf{z} + f_b(\mathbf{x}; \theta_b) \geq \mathbf{0}, \;\; \mathbf{z} \in \mathbb{Z}^n \tag{3}$$
Example of visual-sudoku: In $k \times k$ visual-sudoku, the input $\mathbf{x}$ is an image of a sudoku puzzle and $\mathbf{y} \in \{0,1\}^n$ is the corresponding solution, represented as an $n = k^3$ dimensional binary vector: the integer in each of the $k^2$ cells is represented by a $k$-dimensional one-hot binary vector. The function $f_c(\mathbf{x}; \theta_c)$ parameterizing the cost is nothing but a neural digit classifier that classifies the content of each of the $k^2$ cells into one of the $k$ classes. The neural functions $f_A$ and $f_b$ are independent of the input $\mathbf{x}$, as the constraints are the same for every $k \times k$ sudoku puzzle. Therefore, $f_A(\mathbf{x}; \theta_A) = \theta_A$ and $f_b(\mathbf{x}; \theta_b) = \theta_b$, where $\theta_A$ is just a learnable matrix of dimension $m \times n$ and $\theta_b$ is a learnable vector of dimension $m$. See Bartlett et al. [2008] for an ILP formulation of $k \times k$ sudoku with $4k^2$ equality constraints in an $n = k^3$ dimensional binary space $\{0,1\}^n$.
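The one-hot encoding above can be sketched as follows (the helper name and index layout are ours; any fixed cell-by-cell ordering works):

```python
import numpy as np

def encode_sudoku(board):
    """Encode a solved k x k sudoku (entries 1..k) as an n = k^3 binary vector.

    Cell (r, c) holding digit d sets index (r*k + c)*k + (d - 1) to 1,
    i.e. each of the k^2 cells contributes a k-dimensional one-hot block.
    """
    board = np.asarray(board)
    k = board.shape[0]
    y = np.zeros(k * k * k)
    for r in range(k):
        for c in range(k):
            y[(r * k + c) * k + (board[r, c] - 1)] = 1.0
    return y
```

Each of the $k^2$ cells contributes exactly one nonzero entry, so the encoding always has $k^2$ ones.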
Learning Task: Given a training dataset $\mathcal{D} = \{(\mathbf{x}_s, \mathbf{y}_s) \mid s \in \{1 \ldots S\}\}$ with $S$ training samples, the task is to learn the parameters $\Theta = (\theta_A, \theta_b, \theta_c)$ such that $\mathbf{y}_s = M_\Theta(\mathbf{x}_s)$ for each $s$. To do so, one needs to define the derivatives $\frac{\partial L}{\partial A}$, $\frac{\partial L}{\partial \mathbf{b}}$, and $\frac{\partial L}{\partial \mathbf{c}}$ w.r.t. $A$, $\mathbf{b}$, and $\mathbf{c}$, respectively, of an appropriate loss function $L$. Once such derivatives are defined, one can easily compute the derivatives w.r.t. $\theta_A$, $\theta_b$, and $\theta_c$ using the chain rule: $\frac{\partial L}{\partial \theta_A} = \frac{\partial L}{\partial A} \frac{\partial f_A(\mathbf{x}; \theta_A)}{\partial \theta_A}$; $\frac{\partial L}{\partial \theta_b} = \frac{\partial L}{\partial \mathbf{b}} \frac{\partial f_b(\mathbf{x}; \theta_b)}{\partial \theta_b}$; and $\frac{\partial L}{\partial \theta_c} = \frac{\partial L}{\partial \mathbf{c}} \frac{\partial f_c(\mathbf{x}; \theta_c)}{\partial \theta_c}$. Hence, in the formulation below, we only worry about computing gradients w.r.t. the constraint matrix $A$, the vector $\mathbf{b}$, and the cost vector $\mathbf{c}$.
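This chain rule is exactly what reverse-mode autodiff computes in practice; as a toy sanity check (our example, with the simple parameterization $f_b(x; \theta_b) = \theta_b \cdot x$ so that $\partial f_b / \partial \theta_b = x$, and a stand-in quadratic loss):

```python
import numpy as np

def loss_of_b(b):
    # Stand-in for any differentiable loss of the constraint offsets b.
    return (b ** 2).sum()

x = 3.0
theta_b = np.array([0.5, -1.0])
b = theta_b * x                        # f_b(x; theta_b) = theta_b * x

# Analytic chain rule: dL/dtheta = (dL/db) * (db/dtheta) = (2b) * x
grad_analytic = 2 * b * x

# Finite-difference check of dL/dtheta, one coordinate at a time
eps = 1e-6
grad_fd = np.array([
    (loss_of_b((theta_b + eps * np.eye(2)[i]) * x) - loss_of_b(theta_b * x)) / eps
    for i in range(2)
])
assert np.allclose(grad_analytic, grad_fd, atol=1e-3)
```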
The existing approaches, e.g., Paulus et al. [2021], explicitly need access to the current model prediction $M_\Theta(\mathbf{x})$. This requires solving the ILP in eq. (3), making the learning process extremely slow. In contrast, we present a ‘solver-free’ framework for computation of an appropriate loss and its derivatives w.r.t. $A$, $\mathbf{b}$, and $\mathbf{c}$. Our framework does not require solving any ILP while training, thereby making it extremely scalable as compared to existing approaches.
3.2 A Solver-free Framework
Conversion to a constraint satisfaction problem: As a first step, we convert the constraint optimization problem in eq. (2) to an equivalent constraint satisfaction problem by introducing an