On the optimal pivot path of simplex method for linear programming based on reinforcement learning

Anqi Li, Tiande Guo, Congying Han*, Bonan Li, Haoran Li
aSchool of Mathematical Sciences, University of Chinese Academy of Sciences, Shijingshan
District, Beijing, China
Abstract
Under the existing pivot rules, the simplex method for linear programming
is not polynomial in the worst case. Therefore, finding the optimal pivots of the simplex
method is crucial. This study proposes an optimal rule that finds all shortest pivot
paths of the simplex method for linear programming problems based on Monte
Carlo tree search (MCTS). Specifically, we first propose the SimplexPseudoTree
to transform the simplex method into a tree-search mode while avoiding repeated
basis variables. Secondly, we propose four reinforcement learning (RL) models
with two action sets and two reward functions to make Monte Carlo tree search suitable
for the simplex method. Thirdly, we set a new action selection criterion to
ameliorate the inaccurate evaluation in the initial exploration. It is proved
that when the number of vertices in the feasible region is $C_n^m$, our method can
generate all the shortest pivot paths, whose length is polynomial in the number of
variables. In addition, we experimentally validate that the proposed schedule
can avoid unnecessary search and provide the optimal pivot path. Furthermore,
this method can provide the best pivot labels for all kinds of supervised learning
methods to solve linear programming problems.
Keywords: simplex method, linear programming, pivot rules, reinforcement
learning
*Corresponding author
Email addresses: lianqi20@mails.ucas.ac.cn (Anqi Li), tdguo@ucas.ac.cn (Tiande
Guo), hancy@ucas.ac.cn (Congying Han), libonan16@mails.ucas.ac.cn (Bonan Li),
lihaoran21@mails.ucas.ac.cn (Haoran Li)
Preprint submitted to Journal of LaTeX Templates, October 13, 2023
arXiv:2210.02945v2 [math.OC] 12 Oct 2023
2010 MSC: 00-01, 99-00
1. Introduction
The simplex method is a classical method for solving linear programming
(LP) problems. Although it is not a polynomial-time algorithm in the worst case, its worst case
rarely occurs, and its average performance is better than that of polynomial-time
algorithms such as the interior point method and the ellipsoid method, especially
for small- and medium-scale problems. Much research has focused on
making the simplex method a polynomial-time algorithm, but these efforts have not been
successful. The existing pivot rules can neither provide the optimal pivot paths
for the simplex method nor make it a polynomial-time algorithm. In addition,
the traditional design approach only yields pivot rules suited to
certain types of problems. There is no general way to find the least number
of pivot iterations for all types of linear programming problems. Our research goal is
to design a general optimal pivot rule, based on the inherent features of linear
programming extracted by reinforcement learning (RL), so that linear programs can be solved in
polynomial time. This study is the first step toward achieving this goal.
With the rise of machine learning (ML), ML-based technologies provide re-
searchers with new ideas for pivot rules. Based on the deep Q-network (DQN) [1,
2], DeepSimplex [3] provides a pivot rule that selects, between the Dantzig and
steepest-edge rules, the one most suitable for the current state. Another
study [4] provides an instance-based method in which the most suitable pivot
rule for the current instance is learned among five conventional pivot rules.
Both methods start from several given pivot rules and then learn
a pivot-rule scheduling scheme depending on the solution state or the input in-
stance. Therefore, the performance of these methods depends heavily on
the supervising pivot rules. Unfortunately, owing to the lack of optimal labels,
supervised pivot rules cannot extract the optimal pivot paths for the simplex
method.
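As background for the classical rules that these learned approaches switch between, the sketch below shows, in plain NumPy with hypothetical function and variable names, how the Dantzig and steepest-edge criteria pick an entering variable from the reduced costs. It is a simplified textbook version for illustration, not the implementation used in [3] or [4].

```python
import numpy as np

def entering_variable(c, A, basis, rule="dantzig"):
    """Pick an entering column for a minimization LP (illustrative only).

    c     : cost vector (length n, NumPy array)
    A     : constraint matrix (m x n)
    basis : list of indices of the current basic variables (length m)
    rule  : "dantzig" or "steepest_edge"
    Returns the index of the entering variable, or None at optimality.
    """
    B_inv = np.linalg.inv(A[:, basis])          # dense basis inverse, for clarity
    y = c[basis] @ B_inv                        # simplex multipliers
    reduced = c - y @ A                         # reduced costs of all columns
    nonbasic = [j for j in range(A.shape[1]) if j not in basis]

    candidates = [j for j in nonbasic if reduced[j] < -1e-9]
    if not candidates:
        return None                             # current basis is optimal

    if rule == "dantzig":
        # Dantzig: most negative reduced cost
        return min(candidates, key=lambda j: reduced[j])
    # steepest edge: most negative reduced cost per unit length of the edge B^{-1} A_j
    return min(candidates,
               key=lambda j: reduced[j] / np.linalg.norm(B_inv @ A[:, j]))
```

In a real implementation the basis inverse would be maintained through an LU factorization and updated after each pivot rather than recomputed from scratch.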
In addition, the difficulty in determining the optimal pivot path lies in the
need for information about several future pivot iterations. The current solution
state alone is insufficient for optimal future decisions. The most effective approach is to
appropriately assess the future situation before deciding, so as to guide the best pivot.
Fortunately, this idea is consistent with the Monte Carlo tree search (MCTS).
Specifically, MCTS explores trajectories in advance to evaluate and obtain
future information to guide decision-making, significantly reducing the invalid
search space and effectively guiding the best decision. Thus, the simplex
method can effectively use future information to guide the current optimal pivot
decision.
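For readers less familiar with MCTS, the following minimal sketch (our own illustrative Python, with assumed names such as Node, env.actions, env.step and env.rollout; not the paper's implementation) shows the generic select, expand, simulate and backpropagate loop that the MCTS rule later specializes to pivoting.

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}            # action -> child Node
        self.visits, self.value = 0, 0.0

def mcts(root, env, n_iter=1000, c_uct=1.4):
    """Generic Monte Carlo tree search; `env` supplies actions/step/rollout."""
    for _ in range(n_iter):
        node = root
        # 1. Selection: descend while the node is fully expanded
        while node.children and len(node.children) == len(env.actions(node.state)):
            node = max(node.children.values(),
                       key=lambda ch: ch.value / (ch.visits + 1e-9)
                       + c_uct * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1e-9)))
        # 2. Expansion: attach one untried action, if any
        untried = [a for a in env.actions(node.state) if a not in node.children]
        if untried:
            a = random.choice(untried)
            node.children[a] = Node(env.step(node.state, a), parent=node)
            node = node.children[a]
        # 3. Simulation: estimate the value of the new node by a rollout
        reward = env.rollout(node.state)
        # 4. Backpropagation: update statistics along the path to the root
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Recommend the most visited root action
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```

AlphaZero-style variants replace the random rollout with a learned value estimate; the MCTS rule of this paper instead tailors the tree construction and the selection criterion to the pivoting setting, as described below.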
Motivated by these observations, we propose to analyze and improve the
simplex method in pivoting with the Monte Carlo tree search, further pushing
forward the frontier of the simplex method for linear programming in a general
way. This study focuses on four core aspects: (1) transforming the simplex
method into a pseudo-tree structure, (2) constructing appropriate reinforcement
learning models, (3) providing the MCTS rule to find all shortest pivot paths,
and (4) providing a thorough theoretical analysis of the optimality and complexity of the MCTS
rule, as shown in Figure 1.
First, transforming the simplex method into a tree-search mode is the
prerequisite for applying the Monte Carlo tree search method. Owing to its con-
nectivity and acyclicity, the tree structure effectively prevents
cycles in exploration paths and thus ingeniously avoids
repeated basis variables during exploration. To construct an imitative tree
structure, the SimplexPseudoTree, we propose to regard current states as nodes and
the corresponding pivoting processes as edges. The original linear programming
instance is taken as the root node, and optimal solutions correspond to leaf
nodes. Furthermore, we remove edges between nodes in the same layer as well as
pivots that move from a deeper layer back to a shallower one. Under this construction, the
problem of finding the shortest path from the root node to the leaf node is
equivalent to that of finding the optimal pivot rule.
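To make the construction concrete, here is a minimal sketch of how such a pseudo-tree could be represented: each node stores a basis, the root holds the initial basis of the LP instance, and children are created only for pivots that move one layer deeper and lead to a basis not yet seen on the path, so cycles, same-layer moves, and backward pivots are excluded. The class and method names are our own illustrative choices, not taken from the paper.

```python
class PseudoTreeNode:
    """One node of a SimplexPseudoTree-like structure (illustrative sketch)."""

    def __init__(self, basis, depth=0, parent=None):
        self.basis = frozenset(basis)   # indices of the current basic variables
        self.depth = depth              # layer = number of pivots from the root
        self.parent = parent
        self.children = []

    def visited_on_path(self):
        """Bases already used on the root-to-node path (to forbid repetition)."""
        node, seen = self, set()
        while node is not None:
            seen.add(node.basis)
            node = node.parent
        return seen

    def expand(self, candidate_pivots):
        """Attach children for pivots (entering, leaving) that lead to new bases.

        `candidate_pivots` is an iterable of (entering, leaving) index pairs,
        e.g. produced by a ratio test.  Same-layer and backward moves never
        appear, because every child is placed one layer deeper than its parent.
        """
        seen = self.visited_on_path()
        for entering, leaving in candidate_pivots:
            new_basis = frozenset(self.basis - {leaving} | {entering})
            if new_basis not in seen:
                self.children.append(
                    PseudoTreeNode(new_basis, self.depth + 1, parent=self))
        return self.children
```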
Figure 1: Overview of the methodological framework in this paper. First, we create the SimplexPseudoTree in Section 3 to recast the simplex method in a form suitable for reinforcement learning methods. Next, four RL models are proposed in Section 4.1 based on the SimplexPseudoTree. Then we propose the MCTS rule to compute all the shortest pivot paths in Sections 4.2 and 4.3. Finally, we give a thorough theoretical analysis of the MCTS rule in Section 5.

Subsequently, we transform the problem of finding the optimal pivot path
into four reinforcement learning models. In particular, the following two novel
action spaces and two reward functions are introduced (a minimal encoding of these choices is sketched after the list).
• Action set: (1) non-basic variables whose reduced costs are less than zero, and (2) non-basic variables whose reduced costs are nonzero.
• Reward functions: (1) the negative of the number of pivot iterations, and (2) a linearly decaying weighted estimate of the average variation in the objective value caused by a single pivot.
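A minimal, hypothetical encoding of these two action sets and two reward functions might look as follows. This is not the paper's code: the linear-decay weights in reward (2) are one plausible reading of the description, and `reduced_costs` and `delta_obj_history` are assumed inputs.

```python
import numpy as np

def action_set(reduced_costs, variant=1, tol=1e-9):
    """Candidate entering variables under the two action-set definitions.

    `reduced_costs` is the vector of reduced costs of the non-basic variables.
    """
    if variant == 1:
        # (1) non-basic variables with strictly negative reduced cost
        return np.flatnonzero(reduced_costs < -tol)
    # (2) non-basic variables with nonzero reduced cost
    return np.flatnonzero(np.abs(reduced_costs) > tol)

def reward(variant, n_pivots=None, delta_obj_history=None):
    """The two reward definitions used to score a finished pivot path."""
    if variant == 1:
        # (1) negative of the number of pivot iterations: shorter paths score higher
        return -float(n_pivots)
    # (2) linearly decaying weighted average of per-pivot objective changes,
    #     giving earlier pivots larger weight (one plausible interpretation)
    deltas = np.asarray(delta_obj_history, dtype=float)
    weights = np.linspace(1.0, 1.0 / len(deltas), num=len(deltas))
    return float(np.average(deltas, weights=weights))

# The four RL models then correspond to the combinations
# (A1, R1), (A1, R2), (A2, R1), (A2, R2).
```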
Based on the proposed actions and rewards, we design four models of different
forms. Through a comprehensive comparison, we find that the model com-
bining action set (1) and reward function (2) achieves the highest efficiency
with the least computational cost. Furthermore, we construct a novel action
selection criterion for the simplex method to ameliorate inaccurate evaluation
in the initial exploration. Subsequently, we present the MCTS rule based on
the Monte Carlo tree search to determine the optimal pivot for current states.
In addition, the optimal pivot path that achieves the minimum number of pivot
iterations is not necessarily unique. However, current research has not provided a
way to find multiple optimal pivot paths. Unlike deterministic pivot rules, our
MCTS rule exhibits certain randomness in the exploration stage. Specifically,
the proposed exploration criterion can introduce a controllable scale factor based
on the upper confidence bound (UCB) method. Thus, additional randomness
is injected into the balance between estimated value and exploration maintained
by the UCB algorithm. Additionally, this randomness of the MCTS rule can
guide the selection of different actions to achieve the minimum pivot iterations
under the guidance of optimality.
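The precise criterion is given later in the paper; as a rough illustration only, a UCB1-style score in which a controllable scale factor `kappa` multiplies the exploration bonus could be written as below. The particular form of the bonus, the tie-breaking, and the name `kappa` are our assumptions, not the paper's formula.

```python
import math
import random

def select_action(stats, total_visits, kappa=1.0, rng=random):
    """UCB1-style selection with a controllable exploration scale factor.

    stats : dict mapping action -> (visit_count, mean_value_estimate)
    kappa : scale factor on the exploration bonus; varying it (e.g. drawing it
            at random per call) injects extra randomness into the trade-off
            between estimated value and exploration.
    """
    def score(action):
        n, q = stats[action]
        if n == 0:
            return float("inf")          # always try unvisited actions first
        bonus = kappa * math.sqrt(2.0 * math.log(total_visits) / n)
        return q + bonus

    best = max(score(a) for a in stats)
    # break ties (including several untried actions) uniformly at random
    return rng.choice([a for a in stats if score(a) == best])
```

Drawing `kappa` from a small range on each call is one simple way to obtain the controllable extra randomness described above, so that repeated executions can follow different shortest pivot paths.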
Consequently, we prove the optimality and completeness of the MCTS rule.
We also prove the polynomial complexity of the optimal pivot iterations
when the number of vertices in the feasible region is $C_n^m$. Concretely, the MCTS
rule can find all the shortest pivot paths according to the Wiener-Khinchin law of
large numbers. First, the MCTS rule can find the optimal pivot path when
the number of explorations approaches infinity. Then, the MCTS rule can find all the different
pivot paths when the number of executions approaches infinity. Additionally, from the perspec-
tive of combinatorial numbers, we prove that the minimum number of pivot iterations is
polynomial in the number of variables when the number of vertices in the feasible region is
$C_n^m$. We also verify the polynomial number of iterations from a geometric perspective.
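The formal statements and proofs appear in Section 5. As an informal illustration of why a polynomial bound is plausible in this extreme case, the following is our own sketch under the assumption that every choice of $m$ basic variables out of $n$ gives a feasible basis; it is not the paper's proof.

```latex
% Informal sketch (not the paper's proof).  Assume every choice of m basic
% variables out of n yields a vertex, so the feasible region has C_n^m vertices
% and every m-subset of columns is a feasible basis.  Two bases B_1, B_2 then
% differ in at most m indices, and exchanging the differing indices one at a
% time passes only through feasible bases, so the combinatorial distance obeys
\[
  d(B_1, B_2) \;\le\; \lvert B_1 \setminus B_2 \rvert \;\le\; m \;\le\; n,
\]
% i.e. a shortest pivot path between any two bases has length at most m,
% which is polynomial (indeed linear) in the problem dimensions.
```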
Given the above four aspects, we present a novel MCTS rule that provides
all the shortest pivot paths. Additionally, based on the proposed MCTS rule, we can label massive
numbers of instances at little cost for supervised pivot rules.
Comprehensive experiments on the NETLIB benchmark and on random instances
demonstrate the efficacy of the MCTS rule. It is worth noting that, compared
with the minimum number of pivot iterations achieved by other popular pivot rules, our
rule requires only 54.55% of that number on random instances and 49.06% on NETLIB.
Our main contributions are as follows:
• Construct the SimplexPseudoTree to ensure that MCTS can be applied
to the simplex method while avoiding duplicate bases.
• Propose the MCTS rule to determine all the optimal pivot sequences.