No-Regret Learning in Two-Echelon Supply Chain with Unknown Demand Distribution Mengxiao Zhang

2025-05-02 0 0 1.68MB 31 页 10玖币
侵权投诉
No-Regret Learning in Two-Echelon Supply Chain
with Unknown Demand Distribution
Mengxiao Zhang
University of Southern California
mengxiao.zhang@usc.edu
Shi Chen
University of Washington
shichen@uw.edu
Haipeng Luo
University of Southern California
haipengl@usc.edu
Yingfei Wang
University of Washington
yingfei@uw.edu
October 24, 2023
Abstract
Supply chain management (SCM) has been recognized as an important discipline with applications to
many industries, where the two-echelon stochastic inventory model, involving one downstream retailer and
one upstream supplier, plays a fundamental role for developing firms’ SCM strategies. In this work, we
aim at designing online learning algorithms for this problem with an unknown demand distribution, which
brings distinct features as compared to classic online optimization problems. Specifically, we consider the
two-echelon supply chain model introduced in [Cachon and Zipkin,1999] under two different settings: the
centralized setting, where a planner decides both agents’ strategy simultaneously, and the decentralized
setting, where two agents decide their strategy independently and selfishly. We design algorithms that
achieve favorable guarantees for both regret and convergence to the optimal inventory decision in both
settings, and additionally for individual regret in the decentralized setting. Our algorithms are based
on Online Gradient Descent and Online Newton Step, together with several new ingredients specifically
designed for our problem. We also implement our algorithms and show their empirical effectiveness.
1 Introduction
A supply chain is two or more parties linked by a flow of goods, information, and funds, before a product can
be finally delivered to outside customers. When multiple decision makers are involved, behavior that is locally
rational can be inefficient from a global perspective. Supply chain management (SCM) research then focuses on
methods for improving system efficiencies, so as to “efficiently integrate suppliers, manufacturers, warehouses,
and stores
···
in order to minimize system-wide costs while satisfying service level requirements” [Simchi-Levi
et al.,1999]. In the vast body of SCM literature, the mathematical model of a two-echelon stochastic
inventory system with a known demand distribution plays a fundamental role for analyzing firms’ SCM
strategies and has been well studied over the past decades [Clark and Scarf,1960,Federgruen and Zipkin,
1984,Chen and Zheng,1994,Cachon and Zipkin,1999].
In the classic two-echelon stochastic inventory planning problem, two agents, Agent 1 (the retailer, referred
to as he) and Agent 2 (the supplier, referred to as she), will go through a process of
T
rounds. Following
the sequence of events in the SCM literature [Cachon and Zipkin,1999], Agent 1 first observes an external
demand
dt∼ D
and utilizes his available inventory (products in stock) to satisfy customers’ demand; as
a result, Agent 1 suffers either an inventory holding cost (for excess inventory) or a backorder cost (for
excess demand). Then, Agent 1 decides his desired inventory level for round
t
+ 1 and orders from Agent 2.
Next, Agent 2 handles the order from Agent 1, suffers inventory holding costs or backorder costs, decides
her base-stock level for round
t
+ 1, and places an order from an external source (assumed to have infinite
1
arXiv:2210.12663v2 [cs.LG] 20 Oct 2023
inventory). The two agents’ orders will arrive at the beginning of the next round. The optimal policy
with known demand distributions is known as the base-stock policy for both agents [Clark and Scarf,1960,
Federgruen and Zipkin,1984,Chen and Zheng,1994]. Specifically, a base-stock policy keeps a fixed base-stock
level
s
over all time periods, meaning that if the inventory level (on-hand inventory minus the backlogged
ordered) at the beginning of a period is below
s
, an order will be placed to bring the inventory level to
s
;
otherwise, no order is placed.
There are recently works extending the classic inventory control problem with known demand distribution
to the one with unknown distribution [Levi et al.,2007,Huh and Rusmevichientong,2009,Huh et al.,2011,
Levi et al.,2015,Zhang et al.,2018,Chen et al.,2020,2021,Chen and Chao,2020,Ding et al.,2021].
However, these works consider the single-agent case, instead of the two-echelon case. In this work, we aim at
extending the classic two-echelon stochastic inventory planning problem to an online setup with an unknown
demand distribution
D
. In addition, we consider the nonperishable setting in which any leftover inventory
will be carried over to the next round; as a result, the inventory level at the beginning of the next round can
not be lower than the inventory level at the end of current round. The performance is measured by i) regret,
the difference between their total loss and that of the best base-stock policy in hindsight; ii) last-iterate
convergence to the best base-stock policy for both agents.
It is important to note that Agent 2 only observes orders from Agent 1 and does not necessarily receive
the same demand information as Agent 1 does. In addition, in our problem formulation, Agent 1’s inventory
will be impacted by Agent 2’s shortages. Specifically, when Agent 2 does not have enough inventory to fill
Agent 1’s order, we assume that Agent 2 cannot expedite to meet the shortfall, and this shortfall will cause a
partial shipment to Agent 1, which implies that Agent 1 may not achieve his desired inventory level at the
beginning of each round. This model with a known demand distribution is first examined in [Cachon and
Zipkin,1999].
We consider two different decision-making settings: centralized and decentralized settings. The centralized
setting takes the perspective of a central planner who decides both agents’ desired inventory level at each
round in order to minimize the total loss of the entire supply chain. A more interesting and realistic setting
concerns a decentralized structure in which the two agents independently decide their own desired inventory
level at each round to minimize their own costs, which often results in poor performance of the supply chain
(i.e., the optimal base-stock level for each agent may not be the one that achieves minimal overall loss). To
achieve the optimal supply chain performance under the decentralized setting, as discussed in previous works
(i.e. [Cachon,2003]), some mechanism concerns contractual arrangement or corporate rules, such as rules
for sharing the holding costs and backorder costs, accounting methods, and/or operational constraints. A
contract transfers the loss between the two agents such that each agent’s objective is aligned with the supply
chain’s objective. However, as far as we know, this is only discussed under known demand distribution.
Thus, we extend the results to the online setting with an unknown demand distribution and design learning
algorithms to achieve the optimal supply chain performance.
1.1 Techniques and Results
Techniques. Our problem has three salient features that are different from the classic stochastic online
convex optimization problem. First, as will be shown in Section 3, the overall loss function is not convex
with respect to both agents’ inventory decisions, meaning that we can not directly apply online convex
optimization algorithms to this problem. Second, due to the multi-echelon nature of the supply chain, Agent
2’s input information is dependent on the information generated by Agent 1, which can be non-stochastic.
Third, in the nonperishable setting, each agent’s inventory level at the beginning of the next round can not
be lower than the inventory level at the end of the current round, which implies that the desired inventory
level may not be always achievable.
To address the first challenge, we introduce an augmented loss function upon which is convex and we are
able to perform online convex optimization algorithms. To address the second and the third challenge, our
algorithm for both agents has the low-switching property, which only updates the strategy
O
(
log T
)times.
This makes Agent 2’s input information almost the same as the realized demand at each round. For Agent 1,
as he can always observe the true realized demand at each round in both centralized and decentralized setting,
he makes his inventory decision based on the empirical demand distribution, which is updated
O
(
log T
)times
during the process. For Agent 2, in the centralized and the decentralized setting, our algorithm is a variant
2
Table 1: Summary of our results. “Centralized” and “Decentralized” represent the centralized and decentralized
settings, respectively. The definitions of
RegT
,
RegT,1
and
RegT,2
are introduced in Section 3. “Convergence
for Agent 1” and “Convergence for Agent 2” represent the convergence rate to Agent 1 and Agent 2’s optimal
inventory level, respectively.
Setting RegTRegT,1RegT,2Convergence for Agent 1 Convergence for Agent 2
Centralized e
O(T)N/A N/A e
O(1/T)e
O(T1/4)
Decentralized e
O(T)e
O(T3/4)e
O(log3T)e
O(1/T)e
O(T1/4)
of Online Gradient Descent (OGD) and Online Newton Step (ONS) [Hazan et al.,2007], respectively. Both
of the algorithms have the important low-switching property, which only updates the strategy
O
(
log T
)times
while at the same time achieving
e
O
(
T
)and
O
(
log2T
)regret respectively. We remark that our variant of
ONS algorithm achieves
O
(
log2T
)regret, even when the loss function is not strongly convex but satisfies a
certain property.
Our results. In the centralized setting, we design an algorithm which achieves
e
O
(
T
)regret and
last-iterate convergence to the optimal base-stock policy with rate
e
O
(1
/T
)for Agent 1and
e
O
(
T1/4
)for
Agent 2. In the decentralized setting, we design a novel contract mechanism and also learning algorithms
for both agents, which lead to convergence to both agents’ global optimal base-stock policy with the same
rate as the centralized setting. In addition, our algorithm guarantees that Agent 1has
e
O
(
T3/4
)individual
regret and Agent 2has
O
(
log3T
)individual regret. Moreover, the regret with respect to the overall loss
is bounded by
e
O
(
T
), which is the same as the one in the centralized setting. Table 1 shows a summary
of our results. We also implement our algorithms, and the empirical results validate the effectiveness of
our algorithms (see Appendix D). To the best of our knowledge, our work is the first one considering the
two-echelon stochastic inventory planning problem in the online setup with unknown demand distribution.
1.2 Related Works
There is a vast body of SCM literature on achieving the optimal supply chain performance in the decentralized
setting [Lariviere,1999,Tsay et al.,1999,Cachon,2003,Chen,2003] concerning coordination with contract
design and information sharing. In this body of literature, there is a line of works based on multi-echelon
decentralized inventory models, which are closely related to our study, including [Lee and Whang,1999,
Cachon and Zipkin,1999,Lee et al.,2000,Porteus,2000,Watson and Zheng,2005,Shang et al.,2009].
However, these works all assume that the demand distribution is known (at least to the downstream agent).
More recently, there has been growing interest in single-agent inventory control problems with unknown
demand distribution [Levi et al.,2007,Huh and Rusmevichientong,2009,Huh et al.,2011,Levi et al.,2015,
Zhang et al.,2018,Chen et al.,2020,2021,Chen and Chao,2020,Ding et al.,2021]. In particular, [Huh
and Rusmevichientong,2009] achieves
O
(
log T
)regret in the perishable setting and
e
O
(
T
)regret in the
nonperishable setting using online gradient descent method. [Ding et al.,2021] further extends the results to
the feature-based setting. The nonparametric approach of this line of works is fundamentally different from
the conventional inventory control models in which the inventory manager knows the demand distribution
(see, e.g., [Zipkin,2000,Snyder and Shen,2019] for comprehensive reviews of the conventional inventory
models); however, unlike the conventional inventory theory which has been extended from the single-echelon
problems to multi-echelon problems, little has been done for the multi-echelon problems under unknown
demand distributions, and we aim to fill in this gap.
The other relevant line of works is online convex optimization. [Zinkevich,2003] shows that OGD
algorithm achieves
O
(
T
)expected regret bound for general convex functions. If the loss functions are
exp-concave, [Hazan et al.,2007] shows that ONS achieves
O
(
log T
)expected regret bound. Both algorithms
change their decision at every round. On the other hand, Sherman and Koren [2021] proposes a lazy version
of OGD, which changes its decision only
O
(
log T
)times and still achieves
e
O
(
T
)(or
O
(
log2T
)) regret when
the loss functions are stochastically generated and convex (or strongly convex). In our problem, it turns
out to be crucial to apply an algorithm with a small number of switches, and our algorithm generalizes the
3
idea of [Sherman and Koren,2021] to the ONS algorithm to achieve
O
(
log T
)switches and
e
O
(1) regret for a
larger class of functions including strong convex functions.
2 Preliminary
Notations. For a positive integer
n
, denote [
n
]to be the set
{
1
,
2
, . . . , n}
. For conciseness, we hide polynomial
dependence on the problem-dependent constants in the
O
(
·
)notation and only show the dependence on
the horizon
T
.
e
O
(
·
)further hides the poly-logarithmic dependency on
T
. Define (
x
)
+max{x,
0
}
and
(x)max{−x, 0}.xdenotes the Euclidean norm of x.
Throughout this work, we make the following two mild assumptions on the demand distribution
D
. These
mild assumptions are also made in [Chen et al.,2020].
Assumption 1. The demand distribution Dis supported on [d, D]where D > d > 0.
Assumption 2. The image of the density function of D,ϕ(·), lies in [γ, Γ] where Γ> γ > 0.
Under the above demand assumptions, we consider the following model in the two-echelon inventory
planning problem, which is first considered in [Cachon and Zipkin,1999]. Our goal is to find the best
base-stock policy.
We first introduce the cost function under a fixed base-stock policy. In this model, we assume that Agent
2’s inventory shortage will cause delayed (by one round) shipment and shortfalls at Agent 1 while Agent
2’s orders will always be satisfied as we assume that the external source has infinite inventory. In addition,
for unfilled demand for Agent 1, there is a backorder cost shared by the two agents,
αp1
for Agent 1 and
(1
α
)
p1
for Agent 2, where
α
is the negotiated cost sharing parameters via contractual arrangements. The
inventory holding cost per unit for Agent 1 and Agent 2 is h1and h2respectively.
Now we are ready to define the loss function for Agent 1 and Agent 2 respectively. Specifically, Agent 1’s
loss function is formulated as follows. Define
G1(y) = h1Ex∼D[(yx)+] + αp1Ex∼D[(yx)],
which is Agent 1’s expected sum of the holding and backorder costs per round with unlimited supply under
base-stock level
y
. Since the actual supply to Agent 1 is limited by Agent 2’s available inventory, according
to [Cachon and Zipkin,1999], Agent 1’s expected sum of the holding and backorder costs per period is defined
as
H1(s1, s2)Φ(s2)G1(s1) + ZD
s2
G1(s1+s2x)ϕ(x)dx,
where Φ(
·
)is the cumulative density function of
D
. The first term is Agent 1’s costs when Agent 2 has
sufficient inventory to satisfy Agent 1’s order (i.e., Agent 1’s inventory level can be brought up to
s1
), while
the second term is the cost when Agent 2 does not have enough inventory to satisfy Agent 1’s order, meaning
that Agent 2’s shortfall is xs2and Agent 1’s inventory can only be brought up to s1+s2x.
For Agent 2, define
G2(y) = (1 α)p1Ex∼D (yx),
which is the expected backorder cost per period incurred by Agent 2 due to Agent 1’s shortages. Then, the
expected backorder cost incurred by Agent 2 is
Φ(s2)G2(s1) + ZD
s2
G2(s1+s2x)ϕ(x)dx.
The first term is the backorder cost incurred by Agent 2 due to Agent 1’s shortfalls when the Agent 1’s
inventory level is
s1
, while the second term is the backorder cost incurred by Agent 2 when Agent 1’s inventory
level is
s1
(
xs2
)
< s1
. As can be seen, Agent 2’s shortages (
xs2
)will cause insufficient supply to Agent
1, which, in turn, will be detrimental to Agent 2 herself when Agent 1 is out of stock due to the insufficient
4
supply. Therefore, Agent 2’s loss function is the sum of the expected backorder cost and the expected holding
cost, which is defined as
H2(s1, s2)h2Ex∼D[(s2x)+] + Φ(s2)G2(s1) + ZD
s2
G2(s1+s2x)ϕ(x)dx.
We also define the sum of both agents loss as
H
(
s1, s2
)
H1
(
s1, s2
) +
H2
(
s1, s2
)and
G
(
s
)
G1
(
s
) +
G2
(
s
).
Online Inventory Control In this work, we study this conventional model in an online learning setting
that proceeds in
T
rounds. Before the game starts, both Agent 1and Agent 2order an initial inventory level
s1,1and s1,2. Then, for each round t[T]:
at the start of round
t
, both agents’ orders arrive. The current inventory level for Agent 1and Agent 2
reaches to bst,1and bst,2;
external demand
dt
occurs at Agent 1’s level where
dt
is drawn from the unknown demand distribution
D
. In this step, Agent 1 suffers from some inventory holding cost or backorder cost. Define the inventory
level for Agent 1 after demand as est,1. This value can be negative as we assume backlogged orders;
Agent 1decides his desired inventory level at the next round
st+1,1
, which leads to a demand for Agent
2: ot= (st+1,1est,1)+;
Agent 2 receives the demand
ot
from Agent 1, and the inventory level after demand is
est,2
. Note that,
in general, Agent 2 only knows otinstead of the real demand dt. Agent 2 then suffers some inventory
holding cost or backorder cost;
Agent 2 decides her desired inventory level for the next round
st+1,2
and orders
o
t
= (
st+1,2est,2
)
+
from some external source.
We remark that the dynamic of the inventory for Agent 1 and Agent 2 are different. As we assume that
the external source has infinite inventory, Agent 2’s order can always be satisfied and we have the following
dynamic for est,2and bst+1,2:
est,2=bst,2ot,bst+1,2=est,2+o
t.(1)
However, as Agent 2 may have delayed shipment when she does not have enough inventory, Agent 1’s dynamic
is defined as follows. Define the delayed shipment of Agent 2 as (
ot1bst1,2
)
+
, which will arrive after
Agent 1 has served the demand dt. This means that
est,1=bst,1dt+ (ot1bst1,2)+,
bst+1,1=est,1+ min{bst,2, ot}.
The specific costs suffered by the two agents in each step, as well as their objectives will be discussed in
detail in Section 3.2.
3 Main Results
3.1 Centralized Setting
In this section, we start from considering the centralized setting of our model where there is a central planner
who decides both agents’ strategy simultaneously. Define the loss suffered by the learner at round
t
as follows:
e
Ht=h1(bst,1dt)++p1(bst,1dt)+h2(bst,2ot)+,
and the benchmark as the expected loss suffered by the best base-stock policy:
H
(
s
1, s
2
)where (
s
1, s
2
) =
argmins1,s2H
(
s1, s2
). The regret is defined as the difference between the sum of the learners’ total loss and
the loss of the best base-stock policy, which is formally written as follows:
E[RegT] = E"T
X
t=1 e
HtH(s
1, s
2)#.
5
摘要:

No-RegretLearninginTwo-EchelonSupplyChainwithUnknownDemandDistributionMengxiaoZhangUniversityofSouthernCaliforniamengxiao.zhang@usc.eduShiChenUniversityofWashingtonshichen@uw.eduHaipengLuoUniversityofSouthernCaliforniahaipengl@usc.eduYingfeiWangUniversityofWashingtonyingfei@uw.eduOctober24,2023Abstr...

展开>> 收起<<
No-Regret Learning in Two-Echelon Supply Chain with Unknown Demand Distribution Mengxiao Zhang.pdf

共31页,预览5页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:31 页 大小:1.68MB 格式:PDF 时间:2025-05-02

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 31
客服
关注