encodes the constraints in the model's weights and learns them from data. Nandwani et al. [2022, 2021] and Palm et al. [2018] train a Recurrent Relational Network for learning to solve puzzles such as Sudoku, graph coloring, and Futoshiki; Dong et al. [2019] propose Neural Logic Machines (NLMs) that learn lifted first-order rules and experiment with blocks world, reasoning on family trees, etc.; Ranjan et al. [2022] use a Siamese GNN architecture to learn the combinatorial problems of graph and subgraph distance computation; Bajpai et al. [2018], Garg et al. [2019, 2020], and Sharma et al. [2022] use GNNs to train probabilistic planners for combinatorial domains with large, discrete state and action spaces; Selsam et al. [2019] and Amizadeh et al. [2019a,b] train a GNN to solve any CSP whose constraints are explicitly given in a standard form such as CNF, DNF, or Boolean circuits. All of these are clearly different from our work, since we are interested in explicitly learning the constraints of the optimization problem.
Inverse Optimization [Chan et al., 2021] aims to learn the constraints (cost) of a linear program (LP) from observed optimal decisions. Recently, Tan et al. [2019, 2020] use the notion of Parameterized Linear Programs (PLPs), in which both the cost and the constraints of an LP are parameterized by unknown weights. Gould et al. [2019] show how to differentiate through a continuous constrained optimization problem using the notion of a 'declarative node', which relies on the implicit function theorem [Mingari Scarpello and Ritelli, 2002]. Along similar lines, other works [Amos and Kolter, 2017, Agrawal et al., 2019] define methods for differentiating through other continuous problems such as convex quadratic programs (QPs) or cone programs. While all these techniques are concerned with learning the constraints (cost) of optimization problems defined over continuous variables, our focus is on learning the constraints (cost) of an ILP, which involves optimization over discrete variables and can be a significantly harder problem.
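To make the implicit-function-theorem machinery behind such declarative nodes concrete, below is a minimal sketch (our own illustration, not code from any of the cited papers): for an unconstrained inner problem y*(x) = argmin_y f(x, y), the theorem gives dy*/dx = -(d²f/dy²)⁻¹ (d²f/dydx), evaluated at the optimum via automatic differentiation. The toy objective f is an assumption chosen only to be smooth and strictly convex in y.

```python
import torch

# Toy smooth objective f(x, y); y*(x) = argmin_y f(x, y) acts as a
# "declarative node". f is strictly convex in y, so the argmin is unique.
A = torch.randn(3, 2, dtype=torch.double)

def f(x, y):
    r = y - A @ x
    return 0.5 * r.dot(r) + 0.25 * y.dot(y) ** 2

def solve_inner(x, steps=500, lr=0.1):
    # Inner minimization over y; any solver works, gradient descent for brevity.
    y = torch.zeros(3, dtype=torch.double)
    for _ in range(steps):
        y = y.detach().requires_grad_(True)
        g, = torch.autograd.grad(f(x, y), y)
        y = y - lr * g
    return y.detach()

def jacobian_via_ift(x, y_star):
    # Implicit function theorem at the optimum: dy*/dx = -H_yy^{-1} H_yx,
    # where H_yy = d^2 f / dy dy and H_yx = d^2 f / dy dx.
    x = x.detach().requires_grad_(True)
    y = y_star.detach().requires_grad_(True)
    g, = torch.autograd.grad(f(x, y), y, create_graph=True)
    H_yy = torch.stack([torch.autograd.grad(g[i], y, retain_graph=True)[0]
                        for i in range(g.numel())])
    H_yx = torch.stack([torch.autograd.grad(g[i], x, retain_graph=True)[0]
                        for i in range(g.numel())])
    return -torch.linalg.solve(H_yy, H_yx)  # shape: (dim y, dim x)

x = torch.randn(2, dtype=torch.double)
J = jacobian_via_ift(x, solve_inner(x))  # sensitivity of the argmin w.r.t. x
```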
In the space of learning the cost of an ILP, several approaches have been proposed recently. Pogancic et al. [2020] give the gradient w.r.t. a linear cost function that is optimized over a given discrete space. Rolínek et al. [2020a,b] exploit this for the task of deep graph matching, and for differentiating through rank-based metrics such as Average Precision/Recall, respectively. Berthet et al. [2020] replace the black-box discrete solver with a smooth, perturbed version during training and exploit 'perturb-and-MAP' [Papandreou and Yuille, 2011] to compute the gradients. Along similar lines, Niepert et al. [2021] also exploit perturb-and-MAP but propose a different noise model for the perturbation, and also extend to the case where the output of the solver feeds into a downstream neural model. These methods assume the constraints to be given, and learn only the cost of an ILP.
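For intuition on the perturb-and-MAP style estimators used in this line of work, here is a minimal numpy sketch (ours; the toy solver and all names are hypothetical): the solver output y*(c) is piecewise constant in the cost c, so its gradient is zero almost everywhere, but the Gaussian-smoothed solver E[y*(c + eps*Z)] is differentiable, and both it and its Jacobian can be estimated by Monte Carlo.

```python
import numpy as np

def solver(c):
    # Stand-in for a black-box combinatorial solver: argmax of c^T y over the
    # vertices of the unit hypercube (an elementwise 0/1 choice).
    return (c > 0).astype(float)

def perturbed_solution(c, eps=0.5, n_samples=1000, rng=None):
    # Monte-Carlo estimate of the smoothed solver y_eps(c) = E[y*(c + eps*Z)]
    # and of its Jacobian d y_eps / d c = E[y*(c + eps*Z) Z^T] / eps,
    # using the standard Gaussian smoothing identity.
    rng = rng if rng is not None else np.random.default_rng(0)
    Z = rng.standard_normal((n_samples, c.size))
    Y = np.stack([solver(c + eps * z) for z in Z])
    y_smooth = Y.mean(axis=0)
    jac = Y.T @ Z / (eps * n_samples)
    return y_smooth, jac

c = np.array([0.3, -0.2, 1.0])
y, J = perturbed_solution(c)
# During training, J^T @ upstream_grad replaces the zero/undefined gradient
# of the piecewise-constant solver output w.r.t. the cost c.
```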
Other methods use relaxations of specific algorithms for the task of learning the constraints and cost of an ILP. Ferber et al. [2020] backpropagate through the KKT conditions of the LP relaxation created iteratively by the cutting-plane method [Gomory, 2010] while solving an MILP; Wilder et al. [2019] add a quadratic penalty term to the continuous relaxation and use the implicit function theorem to backpropagate through the KKT conditions, as done in Amos and Kolter [2017] for smooth programs; Mandi and Guns [2020] instead add a log-barrier term to the LP relaxation and differentiate through its homogeneous self-dual formulation, linking it to the iterative steps of the interior point method. Wang et al. [2019] use a low-rank SDP relaxation of MAXSAT and define how the gradients flow during backpropagation. These approaches are arguably limited in their applicability, since they rely on specific algorithmic relaxations.
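As an illustration of the quadratic-penalty variant, here is a hedged sketch in the spirit of Wilder et al. [2019] (ours, simplified to equality constraints only): adding gamma*||x||^2 makes the relaxed problem strongly convex, so its KKT conditions become a linear system, and backpropagating through a differentiable linear solve recovers the implicit-function-theorem gradients.

```python
import torch

def penalized_lp_relaxation(c, A, b, gamma=0.1):
    # Relax the ILP to an LP over Ax = b (box/inequality constraints omitted
    # for brevity), add gamma*||x||^2 to make it strongly convex, and solve
    # the resulting KKT system:
    #   [2*gamma*I  A^T] [x ]   [-c]
    #   [    A       0 ] [nu] = [ b]
    # torch.linalg.solve is differentiable, so gradients w.r.t. c (or A, b)
    # flow through the argmin automatically.
    m, n = A.shape
    K = torch.zeros(n + m, n + m)
    K[:n, :n] = 2.0 * gamma * torch.eye(n)
    K[:n, n:] = A.T
    K[n:, :n] = A
    rhs = torch.cat([-c, b])
    return torch.linalg.solve(K, rhs)[:n]

A = torch.tensor([[1.0, 1.0, 1.0]])  # a single constraint: x1 + x2 + x3 = 1
b = torch.tensor([1.0])
c = torch.tensor([0.3, -0.2, 1.0], requires_grad=True)
x_star = penalized_lp_relaxation(c, A, b)
loss = ((x_star - torch.tensor([0.0, 1.0, 0.0])) ** 2).sum()
loss.backward()                      # d loss / d c, without any ILP solver
```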
Instead of working with a specific algorithm, some recent works differentiate through a black-box combinatorial solver for the task of learning constraints [Paulus et al., 2021]. The solver is called at every learning step to compute the gradient. This becomes a bottleneck during training: since the constraints are being learned on the go, the problem can become ill-behaved, resulting in larger solver times and severely limiting the scalability of such approaches. In contrast, we would like to perform this task in a solver-free manner. Pan et al. [2020] propose learning constraints for the task of structured prediction. They represent the linear constraints compactly using a specific two-layer Rectifier Network [Pan and Srikumar, 2016]. A significant limitation is that their method can only be used for learning constraints, and not costs. Further, their approach does not perform well experimentally
on our benchmark domains. Meng and Chang [2021] propose a non-learning-based approach for mining constraints: for each training example (c, y), a new constraint is added which essentially implies that no other target can have a better cost than the given target y for the corresponding cost c. We are specifically interested in learning not only the constraints, but also the cost of an ILP from data.
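Under our reading of this constraint-mining idea, a minimal sketch looks as follows (assuming a minimization objective; the data and all names are hypothetical):

```python
import numpy as np

def mine_constraints(dataset):
    # For each observed pair (c, y), record the constraint  c^T y' >= c^T y:
    # no feasible target y' may beat the observed optimum y under its own
    # cost vector c (minimization convention).
    return [(c, c @ y) for c, y in dataset]

def satisfies_mined(y_candidate, mined):
    # A candidate target is kept iff it violates none of the mined inequalities.
    return all(c @ y_candidate >= bound - 1e-9 for c, bound in mined)

# Hypothetical toy data: two cost vectors with their observed optimal targets.
dataset = [(np.array([1.0, 2.0]), np.array([1.0, 0.0])),
           (np.array([3.0, 1.0]), np.array([0.0, 1.0]))]
mined = mine_constraints(dataset)
print(satisfies_mined(np.array([1.0, 1.0]), mined))  # True: beats neither bound
```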
Convex Polytope Machines (CPM) [Kantchelian et al., 2014] learn a non-linear binary classifier in the form of a polytope from a dataset of positive and negative examples. In contrast, our goal is to learn the cost as well as the constraints of an ILP, where both the constraints and the cost could be parameterized by another neural network. Moreover, we do not have access to any negative samples.
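For intuition, a minimal sketch of such a polytope classifier follows (the inside/outside convention and all names are our own illustration, not necessarily CPM's exact formulation): a point is labeled by whether it violates any face of a learned intersection of halfspaces, which yields a non-linear decision boundary from purely linear pieces.

```python
import numpy as np

def polytope_predict(X, W, b):
    # One class is modeled as the interior of the polytope {x : Wx + b <= 0},
    # an intersection of halfspaces. A point is labeled positive iff it
    # violates at least one face, i.e. iff max_k (w_k . x + b_k) > 0.
    # The max over faces is what makes the decision boundary non-linear.
    scores = X @ W.T + b                     # (n_points, n_faces)
    return (scores.max(axis=1) > 0).astype(int)

# Toy polytope: the square [-1, 1]^2 written as four halfspaces.
W = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.array([-1.0, -1.0, -1.0, -1.0])
X = np.array([[0.0, 0.0],    # inside the polytope  -> 0
              [2.0, 0.0]])   # outside the polytope -> 1
print(polytope_predict(X, W, b))  # [0 1]
```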