Third, since most bilevel optimization methods rely on separate steps to deal with $\theta$ and $z$, the optimization problem that defines $g$ can be built on the fly, adjusting the size of the decision vector $z$ as needed (e.g. scaling the number of possible transplants with the patient/donor pool).
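For instance, the following sketch (a toy illustration rather than our actual implementation; the function name `build_inner_problem` and the data fields are hypothetical, and cvxpy is just one possible modeling layer) shows how the inner problem defining $g$ can be instantiated anew at every decision step, with the decision vector $z$ sized to match the current pool:

```python
import numpy as np
import cvxpy as cp

def build_inner_problem(costs, required, y):
    """Build and solve the inner problem g(x, y) for one instance.

    The decision vector z is sized on the fly: one variable per candidate
    in `costs`, so pools of any size are handled by the same code.  Here
    the virtual parameters y enter as cost adjustments (one possible choice).
    """
    n = len(costs)                                        # instance-dependent size
    z = cp.Variable(n, nonneg=True)                       # decision vector z
    objective = cp.Minimize((np.asarray(costs) + y) @ z)  # modified cost f~
    constraints = [cp.sum(z) >= required, z <= 1]         # feasible set C~
    cp.Problem(objective, constraints).solve()
    return z.value

# The same routine handles pools of different sizes.
for pool_size in (5, 12):
    costs = np.random.rand(pool_size)
    y = np.zeros(pool_size)   # in our method, y would come from h(x, theta)
    print(build_inner_problem(costs, required=3, y=y))
```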
Finally, the reformulated policy features a vector (the virtual parameters $y$) that is absent in the original formulation. Choosing $y$ is a design decision, and the trickiest step in applying our method. A conservative approach consists of: 1) starting from a mathematical model of the original decision problem (e.g. routing, assignment, selection); 2) selecting a set of parameters from that model to be used as $y$.
While this approach is viable, it misses a key opportunity: as the notation and terminology for $\tilde{C}$ and $\tilde{f}$ imply, the feasible set and cost function of the inner optimization problem $g$ may differ from the original ones. For example, it is possible to introduce penalties or rewards, buffers on constraint satisfaction thresholds, or even to relax constraints or to remove cost terms. This flexibility provides a new design handle that can be used to “partition” the challenging aspects of the original problem between $h$ and $g$, depending on which of the two components is better suited to deal with them. While this is a powerful mechanism for tackling complex problems, it does require a degree of creativity and expertise to be used effectively. We provide a few examples of how to do so in Section 5.
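As a small, hypothetical illustration of this design handle (not one of the use cases from Section 5): a virtual parameter can act as a learned buffer on a constraint threshold, while a hard requirement of the original problem is relaxed into a penalized slack term, so that $h$ absorbs the uncertainty and $g$ remains a simple deterministic problem. All names below are made up for the example.

```python
import numpy as np
import cvxpy as cp

def inner_problem_with_buffer(costs, demand, y_buffer, penalty=10.0):
    """Modified inner problem illustrating the freedom in choosing C~ and f~:
    * `y_buffer` (a virtual parameter produced by h) tightens the coverage
      threshold, acting as a safety margin against uncertain demand;
    * the original hard constraint is relaxed via a penalized slack, i.e. a
      cost term takes the place of strict feasibility.
    """
    costs = np.asarray(costs, dtype=float)
    n = len(costs)
    z = cp.Variable(n, nonneg=True)
    slack = cp.Variable(nonneg=True)
    constraints = [
        cp.sum(z) + slack >= demand + y_buffer,  # buffered, relaxed threshold
        z <= 1,
    ]
    objective = cp.Minimize(costs @ z + penalty * slack)  # modified cost f~
    cp.Problem(objective, constraints).solve()
    return z.value
```

In this split, the learned component decides how conservative to be (through `y_buffer`), while the optimization component handles the constrained structure; deciding which aspects of the problem go to $h$ and which to $g$ is exactly the design choice discussed above.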
Sequential Formulation and RL
The training formulation from Equation (5) applies to single-stage problems, but it can be extended to sequential decision-making. Formally, we can view the problem as a Markov Decision Process $\langle X, Z, p_+, f, p_1, \gamma \rangle$, where $X$ is the set of possible (observable) states, $Z$ is the decision space, $p_+$ is the probability distribution for state transitions, $f$ is the cost function, $p_1$ is the probability distribution for the initial state, and $\gamma \in (0, 1]$ is a discount factor. At each step of the sequential process we will have access to a distinct observed state $x_k$ and we will output a distinct decision vector $z_k$; the value $p_+(x_{k+1}, x_k, z_k)$ denotes the probability of reaching state $x_{k+1}$ when applying decisions $z_k$ in state $x_k$; the cost function $f$ is the same as in Equations (2) and (5). Within this framework, the training problem becomes:
$$
\begin{aligned}
\mathop{\arg\min}_{\theta \in \Theta} \quad & \mathbb{E}_{\tau \sim p}\!\left[ \sum_{k=1}^{eoh} \gamma^k f(x_{k+1}, x_k, z_k) \right] \\
\text{with: } & z_k = g(x_k, y_k) = \mathop{\arg\min}_{z \in \tilde{C}(x_k, y_k)} \tilde{f}(x_k, y_k, z) \\
& y_k = h(x_k, \theta)
\end{aligned}
\tag{6}
$$
where $y_k$ is the ML model output for step $k$, and $\tau$ refers to a trajectory, i.e. to a sequence of states, ML outputs, and decisions $\{(x_k, y_k, z_k)\}_{k=1}^{eoh}$. The probability of a trajectory is given by $p(\tau) = p_1(x_1) \prod_{k=1}^{eoh} p_+(x_{k+1}, x_k, z_k)$. The term $eoh$ is the End Of Horizon, and it can be infinite if the sequence is not upper-bounded. In this case, the discount factor $\gamma$ should be strictly lower than 1, while $\gamma$ should be equal to 1 for decision problems over a finite horizon.
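To make the objective of Equation (6) concrete, the sketch below evaluates it on a single sampled trajectory. The callables `h`, `g`, `f` and `sample_transition` are placeholders for the ML model, the inner solver, the cost function and the transition distribution $p_+$; in practice the expectation would be approximated by averaging over many such rollouts.

```python
def rollout_cost(h, g, f, sample_transition, x1, theta, gamma=0.99, eoh=50):
    """Discounted cost of one trajectory, mirroring Equation (6):
    at each step the ML model h produces the virtual parameters y_k, the
    inner solver g returns the decisions z_k, the environment samples the
    next state, and the discounted cost terms are accumulated.
    """
    x, total = x1, 0.0
    for k in range(1, eoh + 1):
        y = h(x, theta)                    # y_k = h(x_k, theta)
        z = g(x, y)                        # z_k = argmin over C~(x_k, y_k)
        x_next = sample_transition(x, z)   # x_{k+1} ~ p+(., x_k, z_k)
        total += gamma**k * f(x_next, x, z)
        x = x_next
    return total
```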
Equation (6) can be interpreted as defining a Reinforcement Learning problem, where $h$ plays the role of a conventional RL policy, and $g$ can be viewed at training time as part of the environment. This is an important observation, since it implies that any RL approach can be used. Once an optimized parameter vector $\theta^*$ has been obtained, the reformulated policy is given by $g(x, h(x, \theta^*))$ as usual. The sequential formulation directly generalizes the single-step one, meaning that RL algorithms can also be used to tackle Equation (5); in fact, this is exactly what we do in our experimentation, since it simplifies the exposition.
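A minimal sketch of this interpretation using the Gymnasium interface (the callables `solve_g`, `sample_transition`, `cost` and `x1_sampler` are assumptions standing in for $g$, $p_+$, $f$ and $p_1$; the spaces are simplified): the inner solver is folded into the environment's step function, so the virtual parameters $y$ become the RL action and any off-the-shelf agent can be used to train $h$.

```python
import numpy as np
import gymnasium as gym

class InnerSolverEnv(gym.Env):
    """Environment that hides the inner solver g from the RL agent.

    The agent's action is the virtual parameter vector y; step() solves g,
    applies the resulting decisions, and returns the negated cost as reward.
    """

    def __init__(self, solve_g, sample_transition, cost, x1_sampler, x_dim, y_dim):
        self.solve_g, self.sample_transition = solve_g, sample_transition
        self.cost, self.x1_sampler = cost, x1_sampler
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(x_dim,), dtype=np.float32)
        self.action_space = gym.spaces.Box(-np.inf, np.inf, shape=(y_dim,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.x = self.x1_sampler()              # x_1 ~ p_1
        return self.x, {}

    def step(self, y):
        z = self.solve_g(self.x, y)             # z_k = g(x_k, y_k)
        x_next = self.sample_transition(self.x, z)
        reward = -self.cost(x_next, self.x, z)  # RL maximizes reward = -f
        self.x = x_next
        # termination/truncation handling is omitted in this sketch
        return self.x, reward, False, False, {}
```

Once $\theta^*$ has been trained with any standard agent on such an environment, deployment simply feeds the policy output $y$ to the same solver, recovering $g(x, h(x, \theta^*))$.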
4 Relation to Other Approaches
Decision Focused Learning (DFL)
Approaches in this class seek to train estimators for the parameters of optimization models, while explicitly accounting for the impact that estimation inaccuracy has on the decisions. The idea gained attention after the seminal work by [4], with several new methods proposed since then; these are well covered in a recent survey by [5].
While the original approach was limited to convex costs and constraints, subsequent works have tackled combinatorial problems, either in an approximate fashion via continuous relaxations [6], or in an exact fashion by assuming a fixed feasible space and linear costs [7, 8]. Outer and inner relaxations have also been used to improve scalability [9, 10].
DFL can be interpreted as solving (via subgradient optimization) a close variant of Equation (5), with some restrictions and one generalization. In particular, all DFL approaches lack virtual parameters: the ML model is instead expected to estimate part of the future state, i.e. $y$ is a prediction for a portion of $x_+$. For example, the estimator might look at current traffic to predict future travel times. Additionally, $\tilde{f}$ always corresponds to the original decision problem cost, and in all but a few cases the feasible space is fixed, i.e. $C(x, y) = C$. The connection between $y$ and $x_+$ implies that