inventory). The two agents’ orders will arrive at the beginning of the next round. The optimal policy
with known demand distributions is known as the base-stock policy for both agents [Clark and Scarf, 1960, Federgruen and Zipkin, 1984, Chen and Zheng, 1994]. Specifically, a base-stock policy keeps a fixed base-stock level $s$ over all time periods, meaning that if the inventory level (on-hand inventory minus the backlogged orders) at the beginning of a period is below $s$, an order is placed to bring the inventory level up to $s$; otherwise, no order is placed.
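As a concrete illustration, the following is a minimal sketch of this order-up-to rule (the function and variable names, e.g. base_stock_order and inventory_level, are ours and purely illustrative):

    def base_stock_order(inventory_level: float, base_stock: float) -> float:
        """Order-up-to rule with base-stock level s (= base_stock).

        If the inventory level (on-hand minus backlog) is below s, order the
        shortfall so the level is brought back up to s; otherwise order nothing.
        """
        return max(base_stock - inventory_level, 0.0)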
There are recent works extending the classic inventory control problem with a known demand distribution to one with an unknown distribution [Levi et al., 2007, Huh and Rusmevichientong, 2009, Huh et al., 2011, Levi et al., 2015, Zhang et al., 2018, Chen et al., 2020, 2021, Chen and Chao, 2020, Ding et al., 2021].
However, these works consider the single-agent case rather than the two-echelon case. In this work, we aim to extend the classic two-echelon stochastic inventory planning problem to an online setup with an unknown demand distribution $D$. In addition, we consider the nonperishable setting in which any leftover inventory is carried over to the next round; as a result, the inventory level at the beginning of the next round cannot be lower than the inventory level at the end of the current round. The performance is measured by i) regret, the difference between the agents' total loss and that of the best base-stock policy in hindsight, and ii) last-iterate convergence to the best base-stock policy for both agents.
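For concreteness, one natural way to write the regret criterion in our notation (the formal definition may differ in details such as the exact loss and feasible set) is
\[
\mathrm{Regret}(T) \;=\; \sum_{t=1}^{T} \ell_t\!\left(s^1_t, s^2_t\right) \;-\; \min_{(s^1, s^2)} \sum_{t=1}^{T} \ell_t\!\left(s^1, s^2\right),
\]
where $\ell_t$ is the loss incurred at round $t$, $(s^1_t, s^2_t)$ are the two agents' inventory decisions at round $t$, and the minimum is taken over fixed base-stock levels held throughout the horizon.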
It is important to note that Agent 2 only observes orders from Agent 1 and does not necessarily receive
the same demand information as Agent 1 does. In addition, in our problem formulation, Agent 1’s inventory
will be impacted by Agent 2’s shortages. Specifically, when Agent 2 does not have enough inventory to fill
Agent 1’s order, we assume that Agent 2 cannot expedite to meet the shortfall, and this shortfall will cause a
partial shipment to Agent 1, which implies that Agent 1 may not achieve his desired inventory level at the
beginning of each round. This model with a known demand distribution was first examined in [Cachon and Zipkin, 1999].
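A minimal sketch of this partial-shipment rule is given below (the names ship_to_agent1 and agent2_on_hand are ours, purely illustrative):

    def ship_to_agent1(order_qty: float, agent2_on_hand: float) -> tuple[float, float]:
        """Agent 2 ships what it can; the unmet part is not expedited.

        Returns (shipped, shortfall). A positive shortfall means Agent 1 receives
        a partial shipment and may start the round below its desired inventory level.
        """
        shipped = min(order_qty, agent2_on_hand)
        shortfall = order_qty - shipped
        return shipped, shortfall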
We consider two decision-making settings: centralized and decentralized. The centralized
setting takes the perspective of a central planner who decides both agents’ desired inventory level at each
round in order to minimize the total loss of the entire supply chain. A more interesting and realistic setting
concerns a decentralized structure in which the two agents independently decide their own desired inventory
level at each round to minimize their own costs, which often results in poor performance of the supply chain
(i.e., the optimal base-stock level for each agent may not be the one that achieves minimal overall loss). To
achieve the optimal supply chain performance under the decentralized setting, as discussed in previous works (e.g., [Cachon, 2003]), some coordination mechanism is needed, such as contractual arrangements or corporate rules, for example rules for sharing the holding and backorder costs, accounting methods, and/or operational constraints. A contract transfers loss between the two agents so that each agent's objective is aligned with the supply chain's objective. However, as far as we know, such coordination has only been studied under a known demand distribution. Thus, we extend these results to the online setting with an unknown demand distribution and design learning algorithms that achieve the optimal supply chain performance.
1.1 Techniques and Results
Techniques. Our problem has three salient features that are different from the classic stochastic online
convex optimization problem. First, as will be shown in Section 3, the overall loss function is not convex
with respect to both agents' inventory decisions, meaning that we cannot directly apply online convex
optimization algorithms to this problem. Second, due to the multi-echelon nature of the supply chain, Agent
2’s input information is dependent on the information generated by Agent 1, which can be non-stochastic.
Third, in the nonperishable setting, each agent's inventory level at the beginning of the next round cannot be lower than the inventory level at the end of the current round, which implies that the desired inventory level may not always be achievable.
To address the first challenge, we introduce an augmented loss function that is convex and on which we can run online convex optimization algorithms. To address the second and third challenges, our algorithm for both agents has a low-switching property: it updates the strategy only $O(\log T)$ times.
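A standard way to obtain such an $O(\log T)$ switching budget is a doubling (epoch-based) schedule; the sketch below is our illustration of the idea and not necessarily the exact scheme used by our algorithm:

    def switching_rounds(horizon: int) -> list[int]:
        """Rounds at which the policy may be recomputed under a doubling schedule.

        The policy changes only at rounds 1, 2, 4, 8, ..., so the number of
        switches over a horizon of T rounds is O(log T).
        """
        rounds, t = [], 1
        while t <= horizon:
            rounds.append(t)
            t *= 2
        return rounds

    # Example: switching_rounds(1000) -> [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]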
This low-switching property makes Agent 2's input information almost the same as the realized demand at each round. For Agent 1, as he can always observe the true realized demand at each round in both the centralized and decentralized settings, he makes his inventory decision based on the empirical demand distribution, which is updated $O(\log T)$ times
during the process. For Agent 2, in both the centralized and the decentralized settings, our algorithm is a variant