On the optimal pivot path of simplex method for linear programming based on reinforcement learning

Anqi Li, Tiande Guo, Congying Han*, Bonan Li, Haoran Li
aSchool of Mathematical Sciences, University of Chinese Academy of Sciences, Shijingshan
District, Beijing, China
Abstract
Under the existing pivot rules, the simplex method for linear programming
is not polynomial in the worst case. Therefore, finding the optimal pivots of the simplex
method is crucial. This study proposes an optimal rule that finds all shortest pivot
paths of the simplex method for linear programming problems based on Monte
Carlo tree search (MCTS). Specifically, we first propose the SimplexPseudoTree
to transform the simplex method into a tree-search mode while avoiding repeated
basis variables. Secondly, we propose four reinforcement learning (RL) models
with two action sets and two reward functions to make Monte Carlo tree search suitable
for the simplex method. Thirdly, we set a new action selection criterion to
ameliorate the inaccurate evaluation in the initial exploration. It is proved
that when the number of vertices in the feasible region is $C_n^m$, our method can
generate all the shortest pivot paths, whose length is polynomial in the number of
variables. In addition, we experimentally validate that the proposed schedule
can avoid unnecessary search and provide the optimal pivot path. Furthermore,
this method can provide the best pivot labels for all kinds of supervised learning
methods to solve linear programming problems.
Keywords: simplex method, linear programming, pivot rules, reinforcement
learning
*Corresponding author
Email addresses: lianqi20@mails.ucas.ac.cn (Anqi Li), tdguo@ucas.ac.cn (Tiande
Guo), hancy@ucas.ac.cn (Congying Han), libonan16@mails.ucas.ac.cn (Bonan Li),
lihaoran21@mails.ucas.ac.cn (Haoran Li)
Preprint submitted to Journal of LaTeX Templates, October 13, 2023
arXiv:2210.02945v2 [math.OC] 12 Oct 2023
2010 MSC: 00-01, 99-00
1. Introduction
The simplex method is a classical method for solving linear programming
(LP) problems. Although it is not a polynomial-time algorithm in the worst case, its worst case
rarely occurs, and its average performance is better than that of polynomial-time
algorithms such as the interior point method and the ellipsoid method, especially
for small- and medium-scale problems. Much research has focused on
making the simplex method a polynomial-time algorithm, but these efforts have not been
successful. The existing pivot rules can neither provide the optimal pivot paths
for the simplex method nor make it a polynomial-time algorithm. In addition,
the traditional design approach only yields pivot rules suited to
certain types of problems. There is no general way to find the least number
of pivot iterations for all types of linear programming problems. Our research goal is
to design a general optimal pivot rule, based on the inherent features of linear
programming extracted by reinforcement learning (RL), so that linear programs can be solved in
polynomial time. This study is the first step toward achieving this goal.
With the rise of machine learning (ML), ML-based technologies provide re-
searchers with new ideas for pivot rules. Based on the deep Q-network (DQN) [1,
2], DeepSimplex [3] provides a pivot rule that selects, between the Dantzig and
steepest-edge rules, the one most suitable for the current state. Another
study [4] provides an instance-based method in which the most suitable pivot
rule for the current instance is learned among five conventional pivot rules.
Both methods start from several given pivot rules and then learn
a pivot-rule scheduling scheme depending on the solution state or the input in-
stance. Therefore, the performance of these methods depends heavily on
the supervising pivot rules. Unfortunately, owing to the lack of optimal labels,
supervised pivot rules cannot extract the optimal pivot paths for the simplex
method.
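As background for the classical rules that these learned approaches switch between, the sketch below shows, in plain NumPy with hypothetical function and variable names, how the Dantzig and steepest-edge criteria pick an entering variable from the reduced costs. It is a simplified textbook version for illustration, not the implementation used in [3] or [4].

```python
import numpy as np

def entering_variable(c, A, basis, rule="dantzig"):
    """Pick an entering column for a minimization LP (illustrative only).

    c     : cost vector (length n, NumPy array)
    A     : constraint matrix (m x n)
    basis : list of indices of the current basic variables (length m)
    rule  : "dantzig" or "steepest_edge"
    Returns the index of the entering variable, or None at optimality.
    """
    B_inv = np.linalg.inv(A[:, basis])          # dense basis inverse, for clarity
    y = c[basis] @ B_inv                        # simplex multipliers
    reduced = c - y @ A                         # reduced costs of all columns
    nonbasic = [j for j in range(A.shape[1]) if j not in basis]

    candidates = [j for j in nonbasic if reduced[j] < -1e-9]
    if not candidates:
        return None                             # current basis is optimal

    if rule == "dantzig":
        # Dantzig: most negative reduced cost
        return min(candidates, key=lambda j: reduced[j])
    # steepest edge: most negative reduced cost per unit length of the edge B^{-1} A_j
    return min(candidates,
               key=lambda j: reduced[j] / np.linalg.norm(B_inv @ A[:, j]))
```

In a real implementation the basis inverse would be maintained through an LU factorization and updated after each pivot rather than recomputed from scratch.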
In addition, the difficulty in determining the optimal pivot path lies in the
need for information about several future pivot iterations. The current solution
state alone is insufficient for optimal future decisions. The most effective approach is to
appropriately assess the future situation before deciding, so as to guide the best pivot.
Fortunately, this idea is consistent with the Monte Carlo tree search (MCTS).
Specifically, MCTS explores trajectories in advance to evaluate and obtain
future information to guide decision-making, significantly reducing the invalid
search space and effectively guiding the best decision. Thus, the simplex
method can effectively use future information to guide the current optimal pivot
decision.
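For readers less familiar with MCTS, the following minimal sketch (our own illustrative Python, with assumed names such as Node, env.actions, env.step and env.rollout; not the paper's implementation) shows the generic select, expand, simulate and backpropagate loop that the MCTS rule later specializes to pivoting.

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}            # action -> child Node
        self.visits, self.value = 0, 0.0

def mcts(root, env, n_iter=1000, c_uct=1.4):
    """Generic Monte Carlo tree search; `env` supplies actions/step/rollout."""
    for _ in range(n_iter):
        node = root
        # 1. Selection: descend while the node is fully expanded
        while node.children and len(node.children) == len(env.actions(node.state)):
            node = max(node.children.values(),
                       key=lambda ch: ch.value / (ch.visits + 1e-9)
                       + c_uct * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1e-9)))
        # 2. Expansion: attach one untried action, if any
        untried = [a for a in env.actions(node.state) if a not in node.children]
        if untried:
            a = random.choice(untried)
            node.children[a] = Node(env.step(node.state, a), parent=node)
            node = node.children[a]
        # 3. Simulation: estimate the value of the new node by a rollout
        reward = env.rollout(node.state)
        # 4. Backpropagation: update statistics along the path to the root
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Recommend the most visited root action
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```

AlphaZero-style variants replace the random rollout with a learned value estimate; the MCTS rule of this paper instead tailors the tree construction and the selection criterion to the pivoting setting, as described below.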
Motivated by these observations, we propose to analyze and improve the
simplex method in pivoting with the Monte Carlo tree search, further pushing
forward the frontier of the simplex method for linear programming in a general
way. This study focuses on four core aspects: (1) transforming the simplex
method into a pseudo-tree structure, (2) constructing appropriate reinforcement
learning models, (3) providing the MCTS rule to find all shortest pivot paths,
and (4) providing a thorough theoretical analysis of the optimality and complexity of the MCTS
rule, as shown in Figure 1.
First, transforming the simplex method into a tree-search mode is the
prerequisite for applying the Monte Carlo tree search method. Owing to its con-
nectivity and acyclicity, the tree structure effectively prevents
cycles in exploration paths and thus ingeniously avoids
repeated basis variables during exploration. To construct an imitative tree
structure, the SimplexPseudoTree, we propose to regard current states as nodes and
the corresponding pivoting processes as edges. The original linear programming
instance is taken as the root node, and optimal solutions correspond to leaf
nodes. Furthermore, we remove edges between nodes in the same layer as well as
pivots that move from a deeper layer back to a shallower one. Under this construction, the
problem of finding the shortest path from the root node to the leaf node is
equivalent to that of finding the optimal pivot rule.
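To make the construction concrete, here is a minimal sketch of how such a pseudo-tree could be represented: each node stores a basis, the root holds the initial basis of the LP instance, and children are created only for pivots that move one layer deeper and lead to a basis not yet seen on the path, so cycles, same-layer moves, and backward pivots are excluded. The class and method names are our own illustrative choices, not taken from the paper.

```python
class PseudoTreeNode:
    """One node of a SimplexPseudoTree-like structure (illustrative sketch)."""

    def __init__(self, basis, depth=0, parent=None):
        self.basis = frozenset(basis)   # indices of the current basic variables
        self.depth = depth              # layer = number of pivots from the root
        self.parent = parent
        self.children = []

    def visited_on_path(self):
        """Bases already used on the root-to-node path (to forbid repetition)."""
        node, seen = self, set()
        while node is not None:
            seen.add(node.basis)
            node = node.parent
        return seen

    def expand(self, candidate_pivots):
        """Attach children for pivots (entering, leaving) that lead to new bases.

        `candidate_pivots` is an iterable of (entering, leaving) index pairs,
        e.g. produced by a ratio test.  Same-layer and backward moves never
        appear, because every child is placed one layer deeper than its parent.
        """
        seen = self.visited_on_path()
        for entering, leaving in candidate_pivots:
            new_basis = frozenset(self.basis - {leaving} | {entering})
            if new_basis not in seen:
                self.children.append(
                    PseudoTreeNode(new_basis, self.depth + 1, parent=self))
        return self.children
```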
Figure 1: Overview of the methodological framework in this paper. First, we create the SimplexPseudoTree in Section 3 to recast the simplex method in a form suitable for reinforcement learning methods. Next, four RL models are proposed in Section 4.1 based on the SimplexPseudoTree. Then we propose the MCTS rule to compute all the shortest pivot paths in Sections 4.2 and 4.3. Finally, we give a thorough theoretical analysis of the MCTS rule in Section 5.

Subsequently, we transform the problem of finding the optimal pivot path
into four reinforcement learning models. In particular, the following two novel
action spaces and two reward functions are introduced (a minimal encoding of these choices is sketched after the list).
• Action set: (1) non-basic variables whose reduced costs are less than zero, and (2) non-basic variables whose reduced costs are nonzero.
• Reward functions: (1) the negative of the number of pivot iterations, and (2) a linearly decaying weighted estimate of the average variation in the objective value caused by a single pivot.
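A minimal, hypothetical encoding of these two action sets and two reward functions might look as follows. This is not the paper's code: the linear-decay weights in reward (2) are one plausible reading of the description, and `reduced_costs` and `delta_obj_history` are assumed inputs.

```python
import numpy as np

def action_set(reduced_costs, variant=1, tol=1e-9):
    """Candidate entering variables under the two action-set definitions.

    `reduced_costs` is the vector of reduced costs of the non-basic variables.
    """
    if variant == 1:
        # (1) non-basic variables with strictly negative reduced cost
        return np.flatnonzero(reduced_costs < -tol)
    # (2) non-basic variables with nonzero reduced cost
    return np.flatnonzero(np.abs(reduced_costs) > tol)

def reward(variant, n_pivots=None, delta_obj_history=None):
    """The two reward definitions used to score a finished pivot path."""
    if variant == 1:
        # (1) negative of the number of pivot iterations: shorter paths score higher
        return -float(n_pivots)
    # (2) linearly decaying weighted average of per-pivot objective changes,
    #     giving earlier pivots larger weight (one plausible interpretation)
    deltas = np.asarray(delta_obj_history, dtype=float)
    weights = np.linspace(1.0, 1.0 / len(deltas), num=len(deltas))
    return float(np.average(deltas, weights=weights))

# The four RL models then correspond to the combinations
# (A1, R1), (A1, R2), (A2, R1), (A2, R2).
```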
Based on the proposed actions and rewards, we design four models of different
forms. Through a comprehensive comparison, we find that the model com-
bining action set (1) and reward function (2) achieves the highest efficiency
with the least computational cost. Furthermore, we construct a novel action
selection criterion for the simplex method to ameliorate inaccurate evaluation
in the initial exploration. Subsequently, we present the MCTS rule based on
the Monte Carlo tree search to determine the optimal pivot for current states.
In addition, the optimal pivot path that achieves the minimum number of pivot
iterations is not necessarily unique. However, current research has not provided a
way to find multiple optimal pivot paths. Unlike deterministic pivot rules, our
MCTS rule exhibits certain randomness in the exploration stage. Specifically,
the proposed exploration criterion can introduce a controllable scale factor based
on the upper confidence bound (UCB) method. Thus, additional randomness
is injected into the balance between estimated value and exploration maintained
by the UCB algorithm. Additionally, this randomness of the MCTS rule can
guide the selection of different actions to achieve the minimum pivot iterations
under the guidance of optimality.
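The precise criterion is given later in the paper; as a rough illustration only, a UCB1-style score in which a controllable scale factor `kappa` multiplies the exploration bonus could be written as below. The particular form of the bonus, the tie-breaking, and the name `kappa` are our assumptions, not the paper's formula.

```python
import math
import random

def select_action(stats, total_visits, kappa=1.0, rng=random):
    """UCB1-style selection with a controllable exploration scale factor.

    stats : dict mapping action -> (visit_count, mean_value_estimate)
    kappa : scale factor on the exploration bonus; varying it (e.g. drawing it
            at random per call) injects extra randomness into the trade-off
            between estimated value and exploration.
    """
    def score(action):
        n, q = stats[action]
        if n == 0:
            return float("inf")          # always try unvisited actions first
        bonus = kappa * math.sqrt(2.0 * math.log(total_visits) / n)
        return q + bonus

    best = max(score(a) for a in stats)
    # break ties (including several untried actions) uniformly at random
    return rng.choice([a for a in stats if score(a) == best])
```

Drawing `kappa` from a small range on each call is one simple way to obtain the controllable extra randomness described above, so that repeated executions can follow different shortest pivot paths.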
Consequently, we prove the optimality and completeness of the MCTS rule.
We also prove the polynomial complexity of the optimal pivot iterations
when the number of vertices in the feasible region is $C_n^m$. Concretely, the MCTS
rule can find all the shortest pivot paths according to the Wiener-Khinchin law of
large numbers. First, the MCTS rule can find the optimal pivot path when
the number of explorations approaches infinity. Then, the MCTS rule can find all the different
pivot paths when the number of executions approaches infinity. Additionally, from the perspec-
tive of combinatorial numbers, we prove that the minimum number of pivot iterations is
polynomial in the number of variables when the number of vertices in the feasible region is
$C_n^m$. We also verify the polynomial number of iterations from a geometric perspective.
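The formal statements and proofs appear in Section 5. As an informal illustration of why a polynomial bound is plausible in this extreme case, the following is our own sketch under the assumption that every choice of $m$ basic variables out of $n$ gives a feasible basis; it is not the paper's proof.

```latex
% Informal sketch (not the paper's proof).  Assume every choice of m basic
% variables out of n yields a vertex, so the feasible region has C_n^m vertices
% and every m-subset of columns is a feasible basis.  Two bases B_1, B_2 then
% differ in at most m indices, and exchanging the differing indices one at a
% time passes only through feasible bases, so the combinatorial distance obeys
\[
  d(B_1, B_2) \;\le\; \lvert B_1 \setminus B_2 \rvert \;\le\; m \;\le\; n,
\]
% i.e. a shortest pivot path between any two bases has length at most m,
% which is polynomial (indeed linear) in the problem dimensions.
```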
Given the above four aspects, we present a novel MCTS rule that provides
all the shortest pivot paths. Additionally, based on the proposed MCTS rule, we can label massive
numbers of instances at little cost for supervised pivot rules.
Comprehensive experiments on the NETLIB benchmark and on random instances
demonstrate the efficacy of the MCTS rule. It is worth noting that, compared
with the minimum number of pivot iterations achieved by other popular pivot rules, our
rule requires only 54.55% of that number on random instances and 49.06% on NETLIB.
Our main contributions are as follows:
• Construct the SimplexPseudoTree to ensure that MCTS can be applied
to the simplex method while avoiding duplicate bases.
• Propose the MCTS rule to determine all the optimal pivot sequences.