ON REPRESENTING MIXED-INTEGER LINEAR PROGRAMS BY
GRAPH NEURAL NETWORKS
ZIANG CHEN, JIALIN LIU, XINSHANG WANG, JIANFENG LU, AND WOTAO YIN
Abstract. While Mixed-integer linear programming (MILP) is NP-hard in general, practical MILP has received a roughly 100-fold speedup in the past twenty years. Still, many classes of MILPs quickly become unsolvable as their sizes increase, motivating researchers to seek new acceleration techniques for MILPs. With deep learning, they have obtained strong empirical results, and many of these results were obtained by applying graph neural networks (GNNs) to make decisions in various stages of MILP solution processes. This work discovers a fundamental limitation: there exist feasible and infeasible MILPs that all GNNs will, however, treat equally, indicating GNNs' lacking power to express general MILPs. Then, we show that, by restricting the MILPs to unfoldable ones or by adding random features, there exist GNNs that can reliably predict MILP feasibility, optimal objective values, and optimal solutions up to prescribed precision. We conducted small-scale numerical experiments to validate our theoretical findings.
1. Introduction
Mixed-integer linear programming (MILP) is a class of optimization problems that minimize a linear objective function subject to linear constraints, where some or all variables must take integer values. MILP has a wide range of applications, such as transportation [39], control [34], and scheduling [16]. Branch and Bound (B&B) [27], an algorithm widely adopted in modern solvers that exactly solves general MILPs to global optimality, unfortunately has exponential time complexity in the worst case. To make MILP more practical, researchers have to analyze the features of each instance of interest based on their domain knowledge, and use such features to adaptively warm-start B&B or design the heuristics in B&B.
To automate this laborious process, researchers have turned their attention to machine learning (ML) techniques in recent years [7]. The literature has reported some encouraging findings that a properly chosen ML model is able to learn useful knowledge of MILP from data and generalize well to similar but unseen instances. For example, one can learn fast approximations of Strong Branching, an effective but time-consuming branching strategy usually used in B&B [2, 24, 28, 46]. One may also learn cutting strategies [9, 23, 42], node selection/pruning strategies [22, 45], or decomposition strategies [41] with ML models. The role of ML models
in those approaches can be summarized as approximating useful mappings or parameterizing key strategies in MILP solvers; these mappings/strategies usually take an MILP instance as input and output its key properties.

Date: May 29, 2023.
A major part of the work of Z. Chen was completed during his internship at Alibaba US DAMO Academy.
Corresponding author: Jialin Liu, jialin.liu@alibaba-inc.com.
The graph neural network (GNN), due to its nice properties such as permutation invariance, is considered a suitable model to represent such mappings/strategies for MILP. More specifically, permuting the variables or constraints of an MILP does not essentially change the problem itself; reliable ML models such as GNNs should satisfy this invariance, otherwise the model may overfit to the variable/constraint order in the training data. [17] proposed that an MILP can be encoded into a bipartite graph on which one can use a GNN to approximate Strong Branching. [13] proposed to represent an MILP with a tripartite graph. Since then, GNNs have been adopted to represent mappings/strategies for MILP, for example, approximating Strong Branching [20, 21, 31, 40], approximating optimal solutions [25, 31], parameterizing cutting strategies [32], and parameterizing branching strategies [33, 38].
However, theoretical foundations in this direction still remain unclear. A key problem is the ability of GNNs to approximate important mappings related to MILP. We ask the following questions:
(Q1) Is GNN able to predict whether an MILP is feasible?
(Q2) Is GNN able to approximate the optimal objective value of an MILP?
(Q3) Is GNN able to approximate an optimal solution of an MILP?
To answer questions (Q1 - Q3), one needs theories of the separation power and representation power of GNNs. The separation power of a GNN is measured by whether it can distinguish two non-isomorphic graphs. Given two graphs $G_1, G_2$, we say a mapping $F$ (e.g., a GNN) has strong separation power if $F(G_1) \neq F(G_2)$ as long as $G_1$ and $G_2$ are not isomorphic. In our setting, since MILPs are represented by graphs, the separation power actually indicates the ability of a GNN to distinguish two different MILP instances. The representation power of GNNs refers to how well they can approximate mappings with permutation-invariance properties. In our setting, we study whether GNNs can map an MILP to its feasibility, optimal objective value, and an optimal solution. The separation power and representation power of GNNs are closely related to the Weisfeiler-Lehman (WL) test [43], a classical algorithm for identifying whether two graphs are isomorphic or not. It has been shown that GNNs have the same separation power as the WL test [44], and, based on this result, GNNs can universally approximate continuous graph-input mappings with separation power no stronger than the WL test [5, 18].
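To make the WL test concrete, the following Python snippet is a minimal sketch of 1-dimensional WL color refinement on a labeled graph; the function name, the hash-based color encoding, and the two toy graphs are our own illustrative choices, not taken from [43, 44].

```python
from collections import Counter

def wl_color_histogram(adj, labels, num_rounds=3):
    """1-dimensional WL color refinement (illustrative sketch).

    adj: dict vertex -> list of neighbors; labels: dict vertex -> initial label.
    Two graphs with different color histograms are certainly non-isomorphic;
    equal histograms are inconclusive (the test may fail to separate them).
    """
    colors = dict(labels)
    for _ in range(num_rounds):
        # Refine: each vertex hashes its own color together with the
        # sorted multiset of its neighbors' colors.
        colors = {v: hash((colors[v], tuple(sorted(colors[u] for u in adj[v]))))
                  for v in adj}
    return Counter(colors.values())

# A 6-cycle and two disjoint triangles: both are 2-regular with 6 vertices,
# so the WL test cannot tell them apart, although they are not isomorphic.
cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
uniform = {v: 0 for v in range(6)}
print(wl_color_histogram(cycle6, uniform) == wl_color_histogram(two_triangles, uniform))  # True
```

The second comparison (a 6-cycle versus two disjoint triangles) is a standard example where the WL test, and hence any GNN with the same separation power, fails to distinguish two non-isomorphic graphs.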
Our contributions. With the above tools in hand, one still cannot directly answer questions (Q1 - Q3), since the relationship between characteristics of general MILPs and properties of graphs is not yet clear. Although there are some works studying the representation power of GNNs on graph-related optimization problems [30, 36] and linear programming [12], representing general MILPs with GNNs has, to the best of our knowledge, not been theoretically studied. Our contributions are listed below:
(Limitation of GNNs for MILP) We show with an example that GNNs do not have strong enough separation power to distinguish every pair of different MILP instances. There exist two MILPs such that one of them is feasible while the other one is not, but, unfortunately, all GNNs treat them equally, failing to detect the essential difference between them. In fact, there are infinitely many pairs of MILP instances that can puzzle GNNs in this way.
(Foldable and unfoldable MILPs) We provide a precise mathematical description of what type of MILPs makes GNNs fail. These hard MILP instances are named foldable MILPs. We prove that, for unfoldable MILPs, GNNs have strong enough separation and representation power to approximate the feasibility, the optimal objective value, and an optimal solution.
(MILP with random features) To handle those foldable MILPs, we propose to append random
features to the MILP-induced graphs. We prove that, with the random feature technique,
the answers to questions (Q1 - Q3) are affirmative.
The answers to (Q1 - Q3) serve as foundations for a more practical question: are GNNs able to predict branching strategies or primal heuristics for MILP? Although the answers to (Q1) and (Q2) do not directly answer that practical question, they illustrate the possibility that GNNs can capture some key information of an MILP instance and have the capacity to
suggest an adaptive branching strategy for each instance. To obtain a GNN-based strategy,
practitioners should consider more factors: feature spaces, action spaces, training methods,
generalization performance, etc., and some recent empirical studies show encouraging results
on learning such a strategy [17, 20, 21, 25, 31, 33, 38, 40]. The answer to (Q3) directly shows the
possibility of learning primal heuristics. With proper model design and training methods, one
could obtain competitive GNN-based primal heuristics [13, 31].
The rest of this paper is organized as follows. We state some preliminaries in Section 2.
The limitation of GNN for MILP is presented in Section 3 and we provide the descriptions
of foldable and unfoldable MILPs in Section 4. The random feature technique is introduced
in Section 5. Section 6 contains some numerical results and the whole paper is concluded in
Section 7.
2. Preliminaries
In this section, we introduce some preliminaries that will be used throughout this paper.
Our notations, definitions, and examples follow [12]. Consider a general MILP problem defined as
$$\min_{x \in \mathbb{R}^n} \; c^\top x, \quad \text{s.t.} \; Ax \circ b, \; l \le x \le u, \; x_j \in \mathbb{Z}, \; \forall\, j \in I, \tag{2.1}$$
where $A \in \mathbb{R}^{m \times n}$, $c \in \mathbb{R}^n$, $b \in \mathbb{R}^m$, $l \in (\mathbb{R} \cup \{-\infty\})^n$, $u \in (\mathbb{R} \cup \{+\infty\})^n$, and $\circ \in \{\le, =, \ge\}^m$. The index set $I \subseteq \{1, 2, \dots, n\}$ collects those indices $j$ for which $x_j$ is constrained to be an integer. The feasible set is defined as $X_{\mathrm{feasible}} := \{x \in \mathbb{R}^n \mid Ax \circ b,\; l \le x \le u,\; x_j \in \mathbb{Z},\; \forall\, j \in I\}$, and we say an MILP is infeasible if $X_{\mathrm{feasible}} = \emptyset$ and feasible otherwise. For feasible MILPs, $\inf\{c^\top x : x \in X_{\mathrm{feasible}}\}$ is named the optimal objective value. If there exists $x^* \in X_{\mathrm{feasible}}$ with $c^\top x^* \le c^\top x$ for all $x \in X_{\mathrm{feasible}}$, then we say that $x^*$ is an optimal solution. It is possible that the objective value is arbitrarily good, i.e., for any $R > 0$, $c^\top \hat{x} < -R$ holds for some $\hat{x} \in X_{\mathrm{feasible}}$. In this case, we say the MILP is unbounded or its optimal objective value is $-\infty$.
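As a concrete, hedged illustration of the data $(A, b, c, l, u, \circ, I)$ in (2.1), the snippet below builds and solves a toy instance with SciPy's mixed-integer solver `scipy.optimize.milp` (available in SciPy 1.9+); the particular instance, numbers, and relation choices are ours, not from the paper.

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Toy instance of (2.1): min c^T x  s.t.  A x "circ" b,  l <= x <= u,  x_2 in Z.
c = np.array([-1.0, -2.0])
A = np.array([[1.0,  1.0],
              [1.0, -1.0]])
# Each row's relation "circ_i" is encoded by lower/upper bounds on (A x)_i:
# "<= b_i" becomes (-inf, b_i), ">= b_i" becomes (b_i, +inf), "=" becomes (b_i, b_i).
constraints = LinearConstraint(A, lb=[-np.inf, -2.0], ub=[4.0, np.inf])
bounds = Bounds(lb=[0.0, 0.0], ub=[3.0, 3.0])   # the vectors l and u
integrality = np.array([0, 1])                   # tau_j = 1 exactly when j is in I

res = milp(c=c, constraints=constraints, integrality=integrality, bounds=bounds)
print(res.status, res.x, res.fun)  # status 0: optimal; expected x = (1, 3), value -7
```

The graph encoding introduced next stores exactly this data as vertex features and edge weights.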
2.1. MILP as a weighted bipartite graph with vertex features. Inspired by [17] and following [12], we represent an MILP by a weighted bipartite graph. The vertex set of such a graph is $V \cup W$, where $V = \{v_1, v_2, \dots, v_m\}$ with $v_i$ representing the $i$-th constraint, and $W = \{w_1, w_2, \dots, w_n\}$ with $w_j$ representing the $j$-th variable. To fully represent all information in (2.1), we associate each vertex with features: the vertex $v_i \in V$ is equipped with a feature vector $h^V_i = (b_i, \circ_i)$ chosen from $\mathcal{H}^V = \mathbb{R} \times \{\le, =, \ge\}$. The vertex $w_j \in W$ is equipped with a feature vector $h^W_j = (c_j, l_j, u_j, \tau_j)$, where $\tau_j = 1$ if $j \in I$ and $\tau_j = 0$ otherwise. The feature $h^W_j$ lies in the space $\mathcal{H}^W = \mathbb{R} \times (\mathbb{R} \cup \{-\infty\}) \times (\mathbb{R} \cup \{+\infty\}) \times \{0, 1\}$. The edge $E_{i,j} \in \mathbb{R}$ connects $v_i \in V$ and $w_j \in W$, and its value is defined by $E_{i,j} = A_{i,j}$. Note that there is no edge connecting vertices in the same vertex group ($V$ or $W$), and $E_{i,j} = 0$ if there is no connection between $v_i$ and $w_j$. Thus, the whole graph is denoted by $G = (V \cup W, E)$, and we denote by $\mathcal{G}_{m,n}$ the collection of all such weighted bipartite graphs whose two vertex groups have size $m$ and $n$, respectively. Finally, we define $\mathcal{H}^V_m := (\mathcal{H}^V)^m$ and $\mathcal{H}^W_n := (\mathcal{H}^W)^n$, and stack all the vertex features together as $H = (h^V_1, h^V_2, \dots, h^V_m, h^W_1, h^W_2, \dots, h^W_n) \in \mathcal{H}^V_m \times \mathcal{H}^W_n$. Then a weighted bipartite graph with vertex features $(G, H) \in \mathcal{G}_{m,n} \times \mathcal{H}^V_m \times \mathcal{H}^W_n$ contains all information in the MILP problem (2.1), and we name such a graph an MILP-induced graph or MILP-graph.¹ If $I = \emptyset$, the feature $\tau_j$ can be dropped and the graphs reduce to the LP-graphs in [12]. We provide an example of an MILP-graph in Figure 1.
Figure 1. An example of an MILP-graph. The MILP is $\min_{x \in \mathbb{R}^2} x_1 + x_2$, s.t. $x_1 + 3x_2 \circ_1 1$, $x_1 + x_2 \circ_2 1$, $x_1 \le 3$, $x_2 \le 5$, $x_2 \in \mathbb{Z}$, where $\circ_1, \circ_2 \in \{\le, =, \ge\}$ are the relations of the two constraints. The corresponding graph has constraint vertices $v_1, v_2$ with features $h^V_1 = (1, \circ_1)$ and $h^V_2 = (1, \circ_2)$, variable vertices $w_1, w_2$ with features $h^W_1 = (1, -\infty, 3, 0)$ and $h^W_2 = (1, -\infty, 5, 1)$, and edge weights $E_{1,1} = 1$, $E_{1,2} = 3$, $E_{2,1} = 1$, $E_{2,2} = 1$.
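The following short Python sketch assembles the graph data $(E, \{h^V_i\}, \{h^W_j\})$ from the raw MILP data of (2.1); the dictionary layout, the function name `milp_to_graph`, and the zero-based indexing of $I$ are our own illustrative choices, not a format prescribed by the paper.

```python
import numpy as np

def milp_to_graph(A, b, circ, c, l, u, I):
    """Encode the MILP data of (2.1) as an MILP-graph (illustrative sketch).

    Constraint vertex v_i carries h^V_i = (b_i, circ_i); variable vertex w_j
    carries h^W_j = (c_j, l_j, u_j, tau_j) with tau_j = 1 iff j is in I; the
    weighted biadjacency matrix is E = A (E_ij = 0 means "no edge").
    """
    m, n = A.shape
    h_V = [(b[i], circ[i]) for i in range(m)]
    h_W = [(c[j], l[j], u[j], 1.0 if j in I else 0.0) for j in range(n)]
    return {"E": A.copy(), "h_V": h_V, "h_W": h_W}

# The instance of Figure 1. The relation strings ">=" and "<=" below are
# placeholders standing in for circ_1 and circ_2, only so the example runs.
A = np.array([[1.0, 3.0], [1.0, 1.0]])
graph = milp_to_graph(A, b=[1.0, 1.0], circ=[">=", "<="],
                      c=[1.0, 1.0], l=[-np.inf, -np.inf], u=[3.0, 5.0], I={1})
print(graph["h_W"])  # [(1.0, -inf, 3.0, 0.0), (1.0, -inf, 5.0, 1.0)]
```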
2.2. Graph neural networks with message passing for MILP-graphs. To represent properties of the whole graph, one needs to build a GNN that maps $(G, H)$ to a real number: $\mathcal{G}_{m,n} \times \mathcal{H}^V_m \times \mathcal{H}^W_n \to \mathbb{R}$; to represent properties of each vertex in $W$ (or represent properties of each variable), one needs to build a GNN that maps $(G, H)$ to a vector: $\mathcal{G}_{m,n} \times \mathcal{H}^V_m \times \mathcal{H}^W_n \to \mathbb{R}^n$. The GNNs used in this paper can be constructed with the following three steps:
(1) Initial mapping at level $l = 0$. Let $p^0$ and $q^0$ be learnable embedding mappings; then the embedded features are $s^0_i = p^0(h^V_i)$ and $t^0_j = q^0(h^W_j)$ for $i = 1, 2, \dots, m$ and $j = 1, 2, \dots, n$.
(2) Message-passing layers for $l = 1, 2, \dots, L$. Given learnable mappings $f^l, g^l, p^l, q^l$, one can update the vertex features by the following formulas for all $i = 1, 2, \dots, m$ and $j = 1, 2, \dots, n$:
$$s^l_i = p^l\Big(s^{l-1}_i,\ \sum_{j=1}^{n} E_{i,j}\, f^l(t^{l-1}_j)\Big), \qquad t^l_j = q^l\Big(t^{l-1}_j,\ \sum_{i=1}^{m} E_{i,j}\, g^l(s^{l-1}_i)\Big).$$
(3a) Last layer for graph-level output. Define a learnable mapping $r^G$; the output is a scalar $y_G = r^G(\bar{s}^L, \bar{t}^L)$, where $\bar{s}^L = \sum_{i=1}^{m} s^L_i$ and $\bar{t}^L = \sum_{j=1}^{n} t^L_j$.
(3b) Last layer for node-level output. Define a learnable mapping $r^W$; the output is a vector $y \in \mathbb{R}^n$ with entries $y_j = r^W(\bar{s}^L, \bar{t}^L, t^L_j)$ for $j = 1, 2, \dots, n$.

¹ In [17] and other related empirical studies, the feature spaces $\mathcal{H}^V, \mathcal{H}^W$ are more complicated than those defined in this paper. We only keep the most basic features here for simplicity of analysis.
Throughout this paper, we require all learnable mappings to be continuous, following the same settings as in [5, 12]. In practice, one may parameterize those mappings with multi-layer perceptrons (MLPs). Define $\mathcal{F}_{\mathrm{GNN}}$ as the set of all GNN mappings obtained by connecting layers (1), (2), and (3a). $\mathcal{F}^W_{\mathrm{GNN}}$ is defined similarly by replacing (3a) with (3b).
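To make the tensor shapes and the role of each mapping concrete, below is a minimal NumPy sketch of one forward pass through steps (1)-(3b), under our own simplifying assumptions: the learnable mappings $p^l, q^l, f^l, g^l, r^G, r^W$ are replaced by small random affine maps with ReLU, and the symbolic feature entries (the relation $\circ_i$ and infinite bounds) are encoded as plain numbers. It sketches the computation pattern only, not the trained architectures used in the cited empirical work.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # embedding width (our choice for illustration)

def mlp(in_dim, out_dim):
    """A tiny random affine map + ReLU standing in for a learnable MLP."""
    W, bias = rng.normal(size=(in_dim, out_dim)), rng.normal(size=out_dim)
    return lambda z: np.maximum(z @ W + bias, 0.0)

def gnn_forward(E, h_V, h_W, num_layers=2, graph_level=True):
    """One pass through steps (1), (2), and (3a)/(3b) described above."""
    m, n = E.shape
    p0, q0 = mlp(h_V.shape[1], d), mlp(h_W.shape[1], d)
    s, t = p0(h_V), q0(h_W)                       # (1) initial embeddings
    for _ in range(num_layers):                   # (2) message-passing layers
        f, g = mlp(d, d), mlp(d, d)
        p, q = mlp(2 * d, d), mlp(2 * d, d)
        s_new = p(np.concatenate([s, E @ f(t)], axis=1))    # constraint-side update
        t_new = q(np.concatenate([t, E.T @ g(s)], axis=1))  # variable-side update
        s, t = s_new, t_new
    s_bar, t_bar = s.sum(axis=0), t.sum(axis=0)   # graph-level summaries
    if graph_level:                               # (3a) scalar output y_G
        r_G = mlp(2 * d, 1)
        return r_G(np.concatenate([s_bar, t_bar])[None, :])[0, 0]
    r_W = mlp(3 * d, 1)                           # (3b) one output per variable
    pooled = np.tile(np.concatenate([s_bar, t_bar]), (n, 1))
    return r_W(np.concatenate([pooled, t], axis=1))[:, 0]

# Numeric features of the Figure 1 graph; encoding ">=" as +1, "<=" as -1 and
# -inf as -1e6 is an assumption of this sketch, not part of the paper's setup.
E = np.array([[1.0, 3.0], [1.0, 1.0]])
h_V = np.array([[1.0, 1.0], [1.0, -1.0]])
h_W = np.array([[1.0, -1e6, 3.0, 0.0], [1.0, -1e6, 5.0, 1.0]])
print(gnn_forward(E, h_V, h_W))                   # a scalar prediction y_G
```

Calling `gnn_forward(E, h_V, h_W, graph_level=False)` instead returns a vector with one entry per variable, i.e., the node-level output of (3b).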
2.3. Mappings to represent MILP characteristics. Now we introduce the mappings that we aim to approximate by GNNs. With these definitions, we will revisit questions (Q1 - Q3) and describe them in a mathematically precise way.
Feasibility mapping. We first define the following mapping that indicates the feasibility of an MILP:
$$\Phi_{\mathrm{feas}} : \mathcal{G}_{m,n} \times \mathcal{H}^V_m \times \mathcal{H}^W_n \to \{0, 1\},$$
where $\Phi_{\mathrm{feas}}(G, H) = 1$ if $(G, H)$ corresponds to a feasible MILP and $\Phi_{\mathrm{feas}}(G, H) = 0$ otherwise.
Optimal objective value mapping. We then define the following mapping that maps an MILP to its optimal objective value:
$$\Phi_{\mathrm{obj}} : \mathcal{G}_{m,n} \times \mathcal{H}^V_m \times \mathcal{H}^W_n \to \mathbb{R} \cup \{\infty, -\infty\},$$
where $\Phi_{\mathrm{obj}}(G, H) = \infty$ implies infeasibility and $\Phi_{\mathrm{obj}}(G, H) = -\infty$ implies unboundedness. Note that the optimal objective value of an MILP may be an infimum that can never be achieved. An example is $\min_{x \in \mathbb{Z}^2} x_1 + \pi x_2$, s.t. $x_1 + \pi x_2 \ge \sqrt{2}$. The optimal objective value is $\sqrt{2}$, since for any $\epsilon > 0$ there exists a feasible $x$ with $x_1 + \pi x_2 < \sqrt{2} + \epsilon$; however, there is no $x \in \mathbb{Z}^2$ such that $x_1 + \pi x_2 = \sqrt{2}$. Thus, the preimage $\Phi_{\mathrm{obj}}^{-1}(\mathbb{R})$ cannot precisely describe all MILP instances admitting an optimal solution; it describes the MILP problems with a finite optimal objective value.
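A quick numerical illustration of this non-attainment (a sketch with our own search ranges; it demonstrates the behavior rather than proving it):

```python
import math

# For each integer x2, the smallest feasible integer x1 is ceil(sqrt(2) - pi*x2),
# so the best objective within |x2| <= bound can be scanned in O(bound) time.
target = math.sqrt(2)
for bound in (10, 100, 1000, 10000):
    best = min(math.ceil(target - math.pi * x2) + math.pi * x2
               for x2 in range(-bound, bound + 1))
    print(f"|x_2| <= {bound:5d}:  min objective = {best:.12f}   gap = {best - target:.3e}")
# The gap shrinks toward zero as the range grows but never reaches zero exactly.
```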
Optimal solution mapping. Giving a well-defined optimal solution mapping is much more complicated² since, as discussed above, the optimal objective value may never be achieved in some cases. To handle this issue, we only consider the case where every component of $l$ and $u$ is finite, which implies that an optimal solution exists as long as the MILP problem is feasible. More specifically, the vertex feature space is limited to $\widetilde{\mathcal{H}}^W_n = (\mathbb{R} \times \mathbb{R} \times \mathbb{R} \times \{0, 1\})^n \subset \mathcal{H}^W_n$, and we consider MILP problems taken from $\mathcal{D}_{\mathrm{solu}} = \big(\mathcal{G}_{m,n} \times \mathcal{H}^V_m \times \widetilde{\mathcal{H}}^W_n\big) \cap \Phi_{\mathrm{feas}}^{-1}(1)$. Note that $\Phi_{\mathrm{feas}}^{-1}(1)$ describes the set of all feasible MILPs. Consequently, any MILP instance in $\mathcal{D}_{\mathrm{solu}}$ admits at least one optimal solution. We can further define the following mapping, which maps an MILP to exactly one of its optimal solutions:
$$\Phi_{\mathrm{solu}} : \mathcal{D}_{\mathrm{solu}} \setminus \mathcal{D}_{\mathrm{foldable}} \to \mathbb{R}^n,$$

² If we remove the integer constraints ($I = \emptyset$) and let the MILP reduce to a linear program (LP), the solution mapping is easier to define. In this case, as long as the optimal objective value is finite, there must exist an optimal solution, and the optimal solution with the smallest 2-norm is unique [12]. Therefore, a mapping $\Phi_{\mathrm{solu}}$, which maps an LP to its optimal solution with the smallest 2-norm, is well defined on $\Phi_{\mathrm{obj}}^{-1}(\mathbb{R})$.