UNIFY: A UNIFIED POLICY DESIGNING FRAMEWORK FOR SOLVING CONSTRAINED OPTIMIZATION PROBLEMS WITH MACHINE LEARNING
Mattia Silvestri, Allegra De Filippo, Michele Lombardi, Michela Milano
Dipartimento di Informatica - Scienza e Ingegneria
University of Bologna
{mattia.silvestri4,allegra.defilippo,michele.lombardi2,michela.milano}@unibo.it
ABSTRACT
The interplay between Machine Learning (ML) and Constrained Optimization (CO) has recently been
the subject of increasing interest, leading to a new and prolific research area covering (e.g.) Decision
Focused Learning and Constrained Reinforcement Learning. Such approaches strive to tackle complex
decision problems under uncertainty over multiple stages, involving both explicit (cost function,
constraints) and implicit knowledge (from data), and possibly subject to execution time restrictions.
While a good degree of success has been achieved, the existing methods still have limitations in
terms of both applicability and effectiveness. For problems in this class, we propose UNIFY, a unified
framework to design a solution policy for complex decision-making problems. Our approach relies
on a clever decomposition of the policy into two stages, namely an unconstrained ML model and a CO problem, to take advantage of the strengths of each approach while compensating for its weaknesses.
With a little design effort, UNIFY can generalize several existing approaches, thus extending their
applicability. We demonstrate the method's effectiveness on two practical problems, namely an Energy
Management System and the Set Multi-cover with stochastic coverage requirements. Finally, we
highlight some current challenges of our method and future research directions that can benefit from
the cross-fertilization of the two fields.
1 Introduction
Methods for combining Machine Learning (ML) and Constrained Optimization (CO) for decision support have attracted
considerable interest in recent years. This is motivated by the possibility of tackling complex decision-making problems subject to uncertainty (sometimes over multiple stages), and having a partially specified structure where knowledge is available both in explicit form (cost function, constraints) and implicit form (historical data or simulators).
As a practical example, an Energy Management System (EMS) needs to allocate minimum-cost power flows from different Distributed Energy Resources (DERs) [1]. Based on actual energy prices, and forecasts on the availability of
DERs and on consumption, the EMS decides which power generators should be used and whether the surplus should be
stored or sold to the market. Such a problem involves hard constraints (maintaining power balance, power flow limits),
a clear cost structure, elements of uncertainty that are partially known via historical data, and multiple decision stages
likely subject to execution time restrictions. In this type of use case, pure CO methods struggle with robustness and
scalability, while pure ML methods such as Reinforcement Learning (RL) have trouble dealing with hard constraints
and combinatorial decision spaces. Motivated by the opportunity to obtain improvements via a combination of ML
and CO, multiple lines of research have emerged, such as Decision Focused Learning, Constrained Reinforcement
Learning, or Algorithm Configuration. While existing methods have obtained a good measure of success, to the best of
the authors' knowledge no existing method can deal with all the challenges we have identified.
Ideally, one wishes to obtain a solution policy capable of providing feasible (and high-quality) solutions, handling
robustness, taking advantage of existing data, and with a reasonable computational load. In this paper, we argue this can
be achieved by introducing a unification framework for a family of existing ML & CO approaches, in particular: 1)
Decision Focused Learning, 2) Constrained Reinforcement Learning, 3) Algorithm Configuration and 4) Stochastic
Optimization algorithms. We assume access to problem knowledge in the form of both a declarative formulation (namely an objective function and a set of constraints) and historical data. The framework is then based on a two-step policy decomposition, into an unconstrained ML model and a CO problem, respectively. The interface between the two
components consists of a new set of “virtual” model parameters, which can serve as an additional (and potentially very
useful) design handle. Since the approach is decomposition-based, multiple learning and optimization methods can
be used for its implementation. In our presentation we use RL for the learning task, due to its ability to handle both
non-differentiable loss functions and sequential problems. We refer to our method as UNIFY.
The paper is structured as follows: we describe two motivating use cases in Section 2; in Section 3 we formalize the
approach, while in Section 4 we show its relation with other hybrid and traditional methods for decision support. We
present an empirical evaluation on the two use cases in Section 5, while concluding remarks are in Section 6.
2 Use Cases Description
We rely on two use cases to motivate and demonstrate our approach, i.e. an Energy Management System and a simplified
production scheduling problem, for which both implicit and explicit knowledge is available in the form of:
- historical data or simulators;
- an objective function;
- problem constraints.
Energy Management System
First, we consider a real-world EMS that requires allocating minimum-cost power flows from different DERs [1]. The problem has a high level of uncertainty, which stems from uncontrollable deviations
from the planned consumption load and the presence of Renewable Energy Sources. Based on actual energy prices, and
forecasts on the availability of DERs and on consumption, the EMS decides: 1) how much energy should be produced;
2) which generators should be used; 3) whether the surplus of energy should be stored or sold to the energy market. The
problem admits both a single-stage and a sequential formulation; in the former, a plan for 96 time units (each one 15
minutes long) must be prepared one day in advance; in the latter, routing decisions are made each time unit for the next
one, until the end of the horizon is reached.
In this case, historical costs, forecasts, and actual power generation and consumption represent the available data. The
objective function is the power flow cost and the constraints impose power balance and power flow limits. This problem
was tackled in [2] by introducing a virtual model parameter. In particular, it was shown how associating a (normally absent) cost with the storage equipment can improve the performance of a baseline optimization method. An effective, dedicated algorithm configuration approach was then developed.
A complete model is provided in the supplemental material.
Set Multi-cover
Second, we consider a simplified production planning problem, modeled as a Set Multi-cover Problem with stochastic coverage requirements [3]. Given a universe $N$ containing $n$ elements and a collection of sets over $N$, the Set Multi-cover Problem requires finding a minimum-size sub-collection of the sets such that coverage requirements for each element are satisfied. The sets may represent products that need to be manufactured together, while the coverage requirements represent product demands.
We consider a version of the problem where sets have non-uniform manufacturing costs, and where the demands are stochastic and unknown at production time. Unmet demands can then be satisfied by buying additional items, but at a higher cost. The requirements are generated according to a Poisson distribution, and we assume the existence of a linear relationship between an observable variable $o \in \mathbb{R}$ and the Poisson rate $\lambda_j$ of each product, i.e. $\lambda_j = a_j o \;\; \forall j \in N$.
In this case, the historical data is represented by a dataset $\{o_i, d_i\}_{i=1,\dots,m}$, where $d$ are the demands and $m$ is the dataset size. The objective is to minimize the cost of manufacturing and buying additional items; the constraints require products in the same set to be manufactured together.
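As a purely illustrative sketch (ours, not the authors' data generator), the assumed demand model and a synthetic dataset of this form can be produced as follows; the number of products, the coefficients `a_true`, and the range of the observable are hypothetical placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_demands(o, a, rng):
    """Sample stochastic coverage requirements for one instance:
    o is the observable, a[j] the linear coefficient of product j,
    so the Poisson rate of product j is lambda_j = a[j] * o."""
    return rng.poisson(a * o)

# Hypothetical historical dataset {(o_i, d_i)}_{i=1..m} with m = 100 instances
a_true = rng.uniform(0.5, 2.0, size=10)        # unknown coefficients, 10 products
observables = rng.uniform(1.0, 5.0, size=100)  # observable variable o
dataset = [(o, sample_demands(o, a_true, rng)) for o in observables]
```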
The full problem description is available in the supplementary material.
3 The UNIFY Framework
Our objective is to learn a policy that provides a high-quality solution to the problem itself. The term “policy” is a
reference to Reinforcement Learning, which we will eventually employ; for the sake of simplicity, however, we will first
formalize the approach for single-stage problems, where all decisions must be taken at once.
Formally, let $x \in X$ be a vector representing observable information and let $z$ be a vector representing the decisions, which must come from the feasible set $C(x)$. In the case of the EMS, $x$ refers to the production and consumption forecasts and $z$ are the power flow values. For the Set Multi-cover, the observable $x$ corresponds to $o$, whereas the decision variables $z$ are the numbers of sets to manufacture.
Typically, $z$ will have a fixed size and $C(x)$ will specify which vectors in the decision space are feasible. In general, however, even the size of $z$ may depend on the observable (e.g. decide which transplants $z$ to perform, given a pool $x$ of patients and donors). In all cases, we aim at defining a function $\pi(x, \theta)$, with parameter vector $\theta \in \Theta$, such that:
$$\pi : (x, \theta) \mapsto z \in C(x) \tag{1}$$
The $\pi$ function is our constrained policy. In both the use cases, $\pi$ should provide feasible decisions, i.e. power flow values or units of each set to be manufactured. We wish to choose $\theta$ to minimize a cost metric on a target distribution. The corresponding training problem can be formalized as:
$$\mathop{\arg\min}_{\theta \in \Theta} \; \mathbb{E}_{(x^+, x) \sim p}\left[ f(x^+, x, \pi(x, \theta)) \right] \tag{2}$$
where $f(x^+, x, z)$ is a function returning the cost of decisions $z$, when $x$ is observed and uncertainty unfolds with outcome $x^+$. We assume without loss of generality that $x^+ \in X$; the term $p$ refers to the training distribution (e.g. approximated via a simulator or a training set).
Given a solution $\theta^*$ to the training problem, we can then perform inference (i.e. solve the decision problem) by observing $x$ and evaluating $\pi(x, \theta^*)$. Unfortunately, Equation (2) is hard to solve, since $\pi$ will typically be trained on a large dataset and it is expected to always return feasible decisions from a space possibly lacking a fixed structure.
Our key insight is that Equation (2) can be made easier to handle by decomposing the policy into a traditional Machine Learning model and a traditional Constrained Optimization problem. Formally, the policy can be reformulated as:
$$\pi(x, \theta) = g(x, h(x, \theta)) \tag{3}$$
with:
$$g(x, y) \in \mathop{\arg\min}_{z \in \tilde{C}(x, y)} \tilde{f}(x, y, z) \tag{4}$$
where $h(x, \theta)$ is the ML model and $g(x, y)$ is a function defined via the constrained optimization problem. The term $y$, which corresponds to the output of the ML model, is referred to as the virtual parameter vector. Its value may have an impact both on the feasible set $\tilde{C}$ and on the cost $\tilde{f}$ of the optimization problem, which are referred to as the virtual feasible set and virtual cost. For example, in the EMS, $y$ may correspond to the (normally absent) cost associated with the storage system mentioned in Section 2, which as a side effect requires using a modified cost function $\tilde{f}$. It is also possible to use an existing set of parameters as $y$: for instance, in the Set Multi-cover use case, the ML model can be designed to predict the coverage requirements $d$, thus requiring no modification in the structure of the feasible set.
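To make the decomposition concrete, the sketch below (ours, not the authors' implementation) instantiates Equations (3) and (4) for the Set Multi-cover case, with $y$ playing the role of predicted coverage requirements. For simplicity, $g$ solves the LP relaxation of the covering problem, and the incidence matrix `A` and cost vector `costs` are assumed problem data:

```python
import numpy as np
import torch
from scipy.optimize import linprog

class VirtualParamNet(torch.nn.Module):
    """h(x, theta): unconstrained ML model mapping the observable x to y."""
    def __init__(self, x_dim, y_dim):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(x_dim, 64), torch.nn.ReLU(),
            torch.nn.Linear(64, y_dim), torch.nn.Softplus())  # requirements are non-negative

    def forward(self, x):
        return self.net(x)

def g(costs, A, y):
    """g(x, y): the CO component, here min c^T z s.t. A z >= y, z >= 0
    (an LP relaxation of the Set Multi-cover model, used only for illustration)."""
    res = linprog(c=costs, A_ub=-A, b_ub=-y, bounds=(0, None), method="highs")
    return res.x

def policy(x, h, costs, A):
    """The reformulated policy pi(x, theta) = g(x, h(x, theta)):
    feasibility is enforced entirely by the optimization component g."""
    y = h(torch.as_tensor(x, dtype=torch.float32)).detach().numpy()
    return g(costs, A, y)
```

In this sketch the ML model never sees the constraints: any infeasible tendency in $y$ is corrected by the solver, which is exactly the division of labor the decomposition is meant to achieve.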
Properties of the Approach
There are a few things to notice about Equations (3) and (4). First, in the reformulated policy it is comparatively easy to enforce feasibility and to deal with combinatorial decision spaces: since this is done outside of the ML model, any traditional constrained optimization method can be used (e.g. Mathematical Programming, Constraint Programming, or the Alternating Direction Method of Multipliers).
Second, the reformulated training problem is:
$$\begin{aligned}
\mathop{\arg\min}_{\theta \in \Theta} \;\; & \mathbb{E}_{(x^+, x) \sim p}\left[ f(x^+, x, z) \right] \\
\text{with} \;\; & z = g(x, y) = \mathop{\arg\min}_{z \in \tilde{C}(x, y)} \tilde{f}(x, y, z) \\
& y = h(x, \theta)
\end{aligned} \tag{5}$$
This is a bilevel problem, combining unconstrained optimization on $\theta$ and constrained optimization on $z$. Several techniques are available to tackle Equation (5). For example, if $h$ is differentiable, one may use the classical subgradient method to reach a local optimum: in the “forward pass” we evaluate $h$ and compute $z$, in the “backward pass” we fix the value of $z$ and differentiate $h$ w.r.t. the parameters $\theta$.
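The description above leaves open how the gradient of the outer cost is propagated through the solver. One simple, if computationally heavy, way to realize the forward/backward scheme is a zeroth-order (finite-difference) estimate of the gradient of the composite loss with respect to the ML output $y$, which is then backpropagated through $h$ only. This is our own illustrative sketch, not the authors' algorithm; `solver`, `cost_fn`, `h`, and `opt` are hypothetical placeholders:

```python
import torch

def decision_loss(x_plus, x, y, solver, cost_fn):
    """Composite loss: solve the inner CO problem g(x, y) for decisions z,
    then evaluate the true cost f(x+, x, z) on those decisions."""
    z = solver(x, y)                      # non-differentiable black box
    return cost_fn(x_plus, x, z)

def training_step(h, opt, x, x_plus, solver, cost_fn, eps=1e-2):
    """One forward/backward iteration: the gradient w.r.t. the virtual
    parameters y is estimated by coordinate-wise finite differences, and
    then pushed through the differentiable model h via the chain rule."""
    y = h(x)                              # forward pass through the ML model
    with torch.no_grad():
        base = decision_loss(x_plus, x, y, solver, cost_fn)
        grad_y = torch.zeros_like(y)
        for j in range(y.numel()):
            y_pert = y.clone()
            y_pert.view(-1)[j] += eps
            pert = decision_loss(x_plus, x, y_pert, solver, cost_fn)
            grad_y.view(-1)[j] = (pert - base) / eps
    opt.zero_grad()
    y.backward(grad_y)                    # backward pass: differentiate h only
    opt.step()
    return float(base)
```

More refined gradient estimators (such as those developed in the Decision Focused Learning literature discussed in Section 4) can replace the finite-difference step without changing the overall structure.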
Third, since most bilevel optimization methods rely on separate steps to deal with $\theta$ and $z$, the optimization problem that defines $g$ can be built on the fly, adjusting the size of the decision vector $z$ as needed (e.g. scaling the number of possible transplants with the patient/donor pool).
Finally, the reformulated policy features a vector (the virtual parameters $y$) that is absent in the original formulation. Choosing $y$ is a design decision, and the trickiest step for applying our method. A conservative approach consists in: 1) starting from a mathematical model of the original decision problem (e.g. routing, assignment, selection); 2) selecting a set of parameters from the model to be used as $y$.
While this approach is viable, it misses a key opportunity: in fact, as the notation and terminology for $\tilde{C}$ and $\tilde{f}$ imply, the feasible set and cost function of the inner optimization problem $g$ may differ from the original ones. For example, it is possible to introduce penalties or rewards, buffers on constraint satisfaction thresholds, or even to relax constraints or to remove cost terms. This element of flexibility provides a new design handle that can be used to “partition” the challenging aspects of the original problem into either $h$ or $g$, depending on which of the two components is best suited to deal with them. While this is a powerful new mechanism to tackle complex problems, it does require a degree of creativity and expertise to be effectively used. We provide a few examples of how to do it in Section 5.
Sequential Formulation and RL
The training formulation from Equation (5) applies to single-stage problems, but it can be extended to sequential decision-making. Formally, we can view the problem as a Markov Decision Process $\langle X, Z, p^+, f, p_1, \gamma \rangle$, where $X$ is the set of possible (observable) states and $Z$ is the decision space, $p^+$ is the probability distribution for state transitions, $f$ is the cost function, $p_1$ is the probability distribution for the initial state, and $\gamma \in (0, 1]$ is a discount factor. At each step of the sequential process we will have access to a distinct observed state $x_k$ and we will output a distinct decision vector $z_k$; the value $p^+(x_{k+1}, x_k, z_k)$ denotes the probability of reaching state $x_{k+1}$ when applying decisions $z_k$ on state $x_k$; the cost function $f$ is the same as in Equations (2) and (5). Within this framework, the training problem becomes:
$$\begin{aligned}
\mathop{\arg\min}_{\theta \in \Theta} \;\; & \mathbb{E}_{\tau \sim p}\left[ \sum_{k=1}^{\mathrm{eoh}} \gamma^k f(x_{k+1}, x_k, z_k) \right] \\
\text{with} \;\; & z_k = g(x_k, y_k) = \mathop{\arg\min}_{z \in \tilde{C}(x_k, y_k)} \tilde{f}(x_k, y_k, z) \\
& y_k = h(x_k, \theta)
\end{aligned} \tag{6}$$
where $y_k$ is the ML model output for step $k$, and $\tau$ refers to a trajectory, i.e. to a sequence of states, ML outputs, and decisions $\{(x_k, y_k, z_k)\}_{k=1}^{\mathrm{eoh}}$. The probability of a trajectory is given by $p(\tau) = p_1(x_1) \prod_{k=1}^{\mathrm{eoh}} p^+(x_{k+1}, x_k, z_k)$. The term $\mathrm{eoh}$ is the End Of Horizon, and it can be infinite if the sequence is not upper-bounded. In this case, the discount factor $\gamma$ should be strictly lower than 1, while $\gamma$ should be equal to 1 for decision problems over a finite horizon.
Equation (6) can be interpreted as defining a Reinforcement Learning problem, where $h$ plays the role of a conventional RL policy, and $g$ can be viewed at training time as part of the environment. This is an important observation, since it implies that any RL approach can be used. Once an optimized parameter vector $\theta^*$ has been obtained, the reformulated policy is given by $g(x, h(x, \theta^*))$ as usual. The sequential formulation directly generalizes the single-stage one, meaning that RL algorithms can be used to tackle Equation (5). In fact, this is exactly what we do in our experimentation, since it simplifies the exposition.
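As an illustration of this observation (a sketch under our own assumptions, not the authors' code), the CO component can be folded into a Gymnasium-style environment wrapper, so that any off-the-shelf RL algorithm only learns the unconstrained mapping from states to virtual parameters. Here `base_env` is assumed to accept raw decisions $z$ and return the stage cost $f$, while `solve_co_problem` and `y_dim` are hypothetical placeholders:

```python
import gymnasium as gym
import numpy as np

class UnifyEnvWrapper(gym.Env):
    """The RL agent outputs virtual parameters y; the wrapper maps them to
    feasible decisions z by solving g(x, y), then steps the base environment,
    so g is part of the environment at training time."""

    def __init__(self, base_env, solve_co_problem, y_dim):
        self.base_env = base_env
        self.solve = solve_co_problem        # g(x, y) -> z in C~(x, y)
        self.observation_space = base_env.observation_space
        self.action_space = gym.spaces.Box(low=-np.inf, high=np.inf,
                                           shape=(y_dim,), dtype=np.float32)
        self._x = None                       # last observed state x_k

    def reset(self, **kwargs):
        self._x, info = self.base_env.reset(**kwargs)
        return self._x, info

    def step(self, y):
        z = self.solve(self._x, y)           # feasibility enforced here
        self._x, cost, terminated, truncated, info = self.base_env.step(z)
        return self._x, -cost, terminated, truncated, info  # cost -> reward
```

Any standard algorithm can then be trained on the wrapped environment; at inference time the learned model $h$ and the solver $g$ are simply composed as $g(x, h(x, \theta^*))$.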
4 Relation to Other Approaches
Decision Focused Learning (DFL)
Approaches in this class seek to train estimators for parameters of optimization models, while explicitly accounting for the impact that estimation inaccuracy has on the decisions. The idea gained attention after the seminal work by [4], with several new methods being proposed – and well covered in a recent survey by [5].
While the original approach was limited to convex costs and constraints, subsequent works have tackled combinatorial problems, either in an approximate fashion via continuous relaxations [6], or in an exact fashion by assuming a fixed feasible space and linear costs [7, 8]. Outer and inner relaxations have also been used to improve scalability [9, 10].
DFL can be interpreted as solving (via subgradient optimization) a close variant of Equation (5), with some restrictions and one generalization. In particular, all DFL approaches lack virtual parameters: the ML model is instead expected to estimate part of the future state, i.e. $y$ is a prediction for a portion of $x^+$. For example, the estimator might look at current traffic to predict future travel times. Additionally, $\tilde{f}$ always corresponds to the original decision problem cost, and in all but a few cases the feasible space is fixed, i.e. $\tilde{C}(x, y) = C$. The connection between $y$ and $x^+$ implies that