inventory). The two agents’ orders will arrive at the beginning of the next round. The optimal policy
with known demand distributions is known as the base-stock policy for both agents [Clark and Scarf, 1960, Federgruen and Zipkin, 1984, Chen and Zheng, 1994]. Specifically, a base-stock policy keeps a fixed base-stock level $s$ over all time periods, meaning that if the inventory level (on-hand inventory minus the backlogged orders) at the beginning of a period is below $s$, an order is placed to bring the inventory level up to $s$; otherwise, no order is placed.
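As a concrete illustration, the following is a minimal sketch of this order-up-to rule (the function and variable names, e.g. base_stock_order and inventory_level, are ours and purely illustrative):

    def base_stock_order(inventory_level: float, base_stock: float) -> float:
        """Order-up-to rule with base-stock level s (= base_stock).

        If the inventory level (on-hand minus backlog) is below s, order the
        shortfall so the level is brought back up to s; otherwise order nothing.
        """
        return max(base_stock - inventory_level, 0.0)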
There are recent works extending the classic inventory control problem with a known demand distribution to one with an unknown distribution [Levi et al., 2007, Huh and Rusmevichientong, 2009, Huh et al., 2011, Levi et al., 2015, Zhang et al., 2018, Chen et al., 2020, 2021, Chen and Chao, 2020, Ding et al., 2021].
However, these works consider the single-agent case rather than the two-echelon case. In this work, we aim to extend the classic two-echelon stochastic inventory planning problem to an online setup with an unknown demand distribution $D$. In addition, we consider the nonperishable setting in which any leftover inventory is carried over to the next round; as a result, the inventory level at the beginning of the next round cannot be lower than the inventory level at the end of the current round. The performance is measured by i) regret, the difference between the agents' total loss and that of the best base-stock policy in hindsight, and ii) last-iterate convergence to the best base-stock policy for both agents.
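For concreteness, one natural way to write the regret criterion in our notation (the formal definition may differ in details such as the exact loss and feasible set) is
\[
\mathrm{Regret}(T) \;=\; \sum_{t=1}^{T} \ell_t\!\left(s^1_t, s^2_t\right) \;-\; \min_{(s^1, s^2)} \sum_{t=1}^{T} \ell_t\!\left(s^1, s^2\right),
\]
where $\ell_t$ is the loss incurred at round $t$, $(s^1_t, s^2_t)$ are the two agents' inventory decisions at round $t$, and the minimum is taken over fixed base-stock levels held throughout the horizon.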
It is important to note that Agent 2 only observes orders from Agent 1 and does not necessarily receive
the same demand information as Agent 1 does. In addition, in our problem formulation, Agent 1’s inventory
will be impacted by Agent 2’s shortages. Specifically, when Agent 2 does not have enough inventory to fill
Agent 1’s order, we assume that Agent 2 cannot expedite to meet the shortfall, and this shortfall will cause a
partial shipment to Agent 1, which implies that Agent 1 may not achieve his desired inventory level at the
beginning of each round. This model with a known demand distribution was first examined in [Cachon and Zipkin, 1999].
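A minimal sketch of this partial-shipment rule is given below (the names ship_to_agent1 and agent2_on_hand are ours, purely illustrative):

    def ship_to_agent1(order_qty: float, agent2_on_hand: float) -> tuple[float, float]:
        """Agent 2 ships what it can; the unmet part is not expedited.

        Returns (shipped, shortfall). A positive shortfall means Agent 1 receives
        a partial shipment and may start the round below its desired inventory level.
        """
        shipped = min(order_qty, agent2_on_hand)
        shortfall = order_qty - shipped
        return shipped, shortfall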
We consider two decision-making settings: centralized and decentralized. The centralized
setting takes the perspective of a central planner who decides both agents’ desired inventory level at each
round in order to minimize the total loss of the entire supply chain. A more interesting and realistic setting
concerns a decentralized structure in which the two agents independently decide their own desired inventory
level at each round to minimize their own costs, which often results in poor performance of the supply chain
(i.e., the optimal base-stock level for each agent may not be the one that achieves minimal overall loss). To
achieve the optimal supply chain performance under the decentralized setting, as discussed in previous works (e.g., [Cachon, 2003]), some coordination mechanism is needed, such as contractual arrangements or corporate rules, for example rules for sharing the holding and backorder costs, accounting methods, and/or operational constraints. A contract transfers loss between the two agents so that each agent's objective is aligned with the supply chain's objective. However, as far as we know, such coordination has only been studied under a known demand distribution. Thus, we extend these results to the online setting with an unknown demand distribution and design learning algorithms that achieve the optimal supply chain performance.
1.1 Techniques and Results
Techniques. Our problem has three salient features that are different from the classic stochastic online
convex optimization problem. First, as will be shown in Section 3, the overall loss function is not convex
with respect to both agents' inventory decisions, meaning that we cannot directly apply online convex
optimization algorithms to this problem. Second, due to the multi-echelon nature of the supply chain, Agent
2’s input information is dependent on the information generated by Agent 1, which can be non-stochastic.
Third, in the nonperishable setting, each agent's inventory level at the beginning of the next round cannot be lower than the inventory level at the end of the current round, which implies that the desired inventory level may not always be achievable.
To address the first challenge, we introduce an augmented loss function that is convex and on which we can run online convex optimization algorithms. To address the second and third challenges, our algorithm for both agents has a low-switching property: it updates the strategy only $O(\log T)$ times.
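A standard way to obtain such an $O(\log T)$ switching budget is a doubling (epoch-based) schedule; the sketch below is our illustration of the idea and not necessarily the exact scheme used by our algorithm:

    def switching_rounds(horizon: int) -> list[int]:
        """Rounds at which the policy may be recomputed under a doubling schedule.

        The policy changes only at rounds 1, 2, 4, 8, ..., so the number of
        switches over a horizon of T rounds is O(log T).
        """
        rounds, t = [], 1
        while t <= horizon:
            rounds.append(t)
            t *= 2
        return rounds

    # Example: switching_rounds(1000) -> [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]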
This low-switching property makes Agent 2's input information almost the same as the realized demand at each round. For Agent 1, as he can always observe the true realized demand at each round in both the centralized and decentralized settings, he makes his inventory decision based on the empirical demand distribution, which is updated $O(\log T)$ times
during the process. For Agent 2, in both the centralized and the decentralized settings, our algorithm is a variant