Online Resource Allocation with Samples Negin Golrezaei Sloan School of Management Massachusetts Institute of Technology golrezaeimit.edu

2025-05-02 0 0 4.41MB 75 页 10玖币
侵权投诉
Online Resource Allocation with Samples
Negin Golrezaei
Sloan School of Management, Massachusetts Institute of Technology, golrezaei@mit.edu
Patrick Jaillet
Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, jaillet@mit.edu
Zijie Zhou
Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA, 02139, zhou98@mit.edu
We study an online resource allocation problem under uncertainty about demand and about the reward of
each type of demand (agents) for the resource. Even though dealing with demand uncertainty in resource
allocation problems has been the topic of many papers in the literature, the challenge of not knowing
rewards has been barely explored. The lack of knowledge about agents’ rewards is inspired by the problem of
allocating units of a new resource (e.g., newly developed vaccines or drugs) with unknown effectiveness/value.
For such settings, we assume that we can test the market before the allocation period starts. During the
test period, we sample each agent in the market with probability p. We study how to optimally exploit
the sample information in our online resource allocation problem under adversarial arrival processes. We
present an asymptotically optimal algorithm that achieves 1 Θ(1/(pm)) competitive ratio, where mis the
number of available units of the resource. By characterizing an upper bound on the competitive ratio of any
randomized and deterministic algorithm, we show that our competitive ratio of 1 Θ(1/(pm)) is tight for
any p=ω(1/m). That asymptotic optimality is possible with sample information highlights the significant
advantage of running a test period for new resources. We demonstrate the efficacy of our proposed algorithm
using a dataset that contains the number of COVID-19 related hospitalized patients across different age
groups.
Key words : online resource allocation, sample information, new resources, competitive ratio, COVID-19
1. Introduction
In online resource allocation problems, the goal is to allocate a limited number of a given
resource to heterogeneous demand/agents that arrive over time. These problems are noto-
riously challenging mainly because of the demand uncertainty and scarcity of the resource.
Such problems get even more challenging for newly developed resources (e.g., new drugs,
products, and services). For such a resource, the effectiveness/value of the resource for
different types of agents may not be fully known. To overcome this additional challenge,
businesses, for example, offer free products in an exchange for honest feedback (produc-
treviewmom.com 2022), and pharmaceutical companies test potential treatments/drugs in
1
arXiv:2210.04774v1 [math.OC] 10 Oct 2022
2Golrezaei, Jaillet, and Zhou: Online Resource Allocation with Samples
human volunteers (Pfizer 2022). These practices raise the following key question: can and
to what extent such feedback improve the efficiency of online resource allocation?
To answer this question, we consider a decision-maker who aims to allocate her available
resource to two types of unit-demand agents with unknown (expected) rewards, where type
1 has a higher expected reward than type 2.1The total number of agents (i.e., the market
size), as well as, the number of agents of type 1 and 2, denoted by hand `respectively,
are chosen adversarially, and hence are unknown to the decision-maker. Before the alloca-
tion period starts, the decision-maker tests the market by, for example, making a public
announcement and offering resources for free. We assume that with probability p(0,1),
each of the h+`agents sees and reacts to the announcement,2and gets one unit of the
resource, where we assume that pis known to the decision-maker. (We will discuss this
assumption later in this section.) These agents then provide feedback on their realized
reward for the resource in return. That is, we assume that all the agents in the test period
will get a resource. As we will show in Section 7.1, this assumption can be relaxed by
limiting the number of available resources during the test period. The test procedure sup-
plies the decision-maker with some information about agents’ expected rewards, as well
as, the size of the market for each type of agent. We refer to this information as sample
information.
After the test period ends, the remaining agents arrive over time according to an
adversarially-chosen order. For each arriving agent, the decision-maker has to make an
irrevocable decision about accepting him and allocating him one unit of the resource or
rejecting him. The decision-maker who has munits of the resource when the allocation
period starts makes acceptance/rejection decisions while being uncertain about the num-
ber, type of future agents, and their expected rewards. The decision-maker is also uncertain
about which types of agents earn higher rewards upon receiving the resource. For such
a demanding setting, our goal is to design efficient resource allocation algorithms that
can optimally utilize the sample information under any possible arrival sequence. In other
words, we measure the performance of the algorithm in terms of its competitive ratio,
which is the expected ratio of the reward of the algorithm to the reward of the optimal
1In Section 7.4, we discuss the case of having more than two types of agents.
2In Section 7.3, we allow the probability pto be different for type 1 and 2 agents.
Golrezaei, Jaillet, and Zhou: Online Resource Allocation with Samples 3
clairvoyant solution that knows the arrival sequence and the expected reward of agents in
advance; see Section 2 for the model and the formal definition of the competitive ratio.
Before presenting our contributions, we make two remarks about our model. First, our
model bears resemblance to the proposed models in Correa et al. (2021), Kaplan et al.
(2022) for secretary and online bipartite matching problems, respectively. In Correa et al.
(2021), each of the secretaries is placed in a sample set with probability p, where the value
of the sampled secretaries will be disclosed to the decision-maker before the decision period
starts. In Kaplan et al. (2022) that studies an online bipartite matching problem, each
agent independently will be placed in a sample set with probability p. While at a high
level, these works seek to design algorithms that can take advantage of samples, the nature
of their considered problems is different from ours, preventing us to use their designed
algorithms for our setting. We discuss the details in Section 1.2.
Second, our model is a special case of the single-leg revenue management problem, which
has been widely studied in the literature; see, e.g., Littlewood (1972), Amaruchkul et al.
(2007), Ball and Queyranne (2009), Gallego et al. (2009), Ferreira et al. (2018), Jasin
(2015), Hwang et al. (2021), Golrezaei and Yao (2021). In all of these aforementioned
works, while the decision-maker may be uncertain about the demand (i.e., the number and
the order of the arrivals), the obtained reward of different types of demand upon receiving
the resource is fully known to the decision-maker. This is in sharp contrast with our model
in which these rewards are not known to the decision-maker, adding extra complexity
to our problem. (See also Section 1.2 for a discussion about previous works on revenue
management problems with limited demand information, but full knowledge of rewards.)
1.1. Our Contributions
In addition to our modeling contribution, our work makes the following contributions.
Impossibility results. To shed light on the value of sample information, in Section 3,
we consider alternative scenarios under which either the sample information is not available
or the sample information is not very informative due to the lack of some additional
knowledge (e.g., the sampling probability). In all of the considered scenarios, we show
that it is not possible to design asymptotically optimal algorithms whose competitive ratio
goes to one as the number of resources mincreases. While in the first scenario, no sample
information is not available (i.e., the sampling probability p= 0), in the second scenario,
the sample information is available but the sampling probability pis not known to the
4Golrezaei, Jaillet, and Zhou: Online Resource Allocation with Samples
decision-maker. The result for the second scenario justifies our assumption about knowing
the sampling probability; see (Correa et al. 2021) for a similar assumption. Nonetheless,
this assumption can be also justified by the fact that the outreach program is designed
by the decision-maker herself, and hence pcan be well estimated using historical data
for similar outreach programs. See also our case study in Section 6 where we examine
the robustness of our proposed algorithm (which we will present shortly) to the lack of
knowledge of the sampling probability.3
Asymptotically optimal protection level algorithm. In light of our impossibility
results in Section 3, we design a simple, yet effective protection level algorithm that opti-
mally utilizes the sample information; see Algorithm 1 in Section 4. The algorithm uses the
sample information to obtain an estimate of the expected reward of each type of agents. If
the estimated reward of type 1 is greater than that of type 2, the algorithm protects type
1 agents, otherwise type 2 agents will be protected. In each of these cases, the algorithm
uses the sample information to estimate the protection level for the type that has a higher
estimated average reward.
In Theorem 1, we present the competitive ratio of our proposed algorithm for any finite
value of m. In addition, in Proposition 2, we show that our algorithm is asymptotically
optimal as mgoes to infinity. More precisely, we show that for any p=ω(1/m)4(which
includes constant values for pthat does not scale with m), the asymptotic competitive ratio
of our algorithm is on the order of 1 Θ(1/(pm)).5As we show in Section 7.1, the same
asymptotic competitive ratio continues to hold even when there is a capacity constraint
during the test period. (There we show that when the number of resources during the test
period is in the order of ω(m), the asymptotic competitive ratio of (a slightly modified
version of) Algorithm 1 is in the order of 1 Θ(1/(pm)).)
3In Appendix K, using numerical studies and under adversarial arrival processes, we further test the robustness of
our proposed algorithm to the lack of knowledge of p. There we show that when the estimation error in pis 10%, the
competitive ratio decreases by at most 6%. Similar results are obtained when the estimation error in pis 20%.
4We scale pwith mbecause of the cost of testing the market. With a large value of m, if the market size turns out
to be large too (e.g., proportional to m), the decision-maker will incur a large cost during the test period, justifying
our choice of p=ω(1/m). Nonetheless, our results hold for constant p’s, and theoretically speaking, handling the
case of p=ω(1/m) is more challenging than that of constant p’s. This is because, under constant p’s, the sample
information is more informative, easing the analysis of the algorithm.
5For the case of p=O(1
m) where pgoes to zero very fast as mgrows, in Section J, we design another algorithm whose
competitive ratio is 1/2. We note that our upper bound in Section 5.2 confirms the challenges of such an extreme case
for the sampling probability. As we show there, when pgoes to zero very fast (i.e., p=O(1/m)), obtaining asymptotic
optimality is impossible.
Golrezaei, Jaillet, and Zhou: Online Resource Allocation with Samples 5
This result shows that the sample information can be extremely useful to improve the
performance of resource allocation algorithms. This is because in the absence of the sample
information, and even when the expected rewards of agents are fully known, as shown in
Ball and Queyranne (2009), the competitive ratio of any algorithm cannot exceed 1/(2
r2/r1) no matter how large mis, where r1> r2are the expected rewards for the the two
types of agents. Here, we show that we can break the barrier of 1/(2 r2/r1) by taking
advantage of the sample information in a very challenging setting, where the expected
reward of agents is not known to the decision-maker. This is mainly because the sample
information can be used to infer some knowledge about the number of agents of different
types in the online arrival sequence, gaining some knowledge about the adversarially-chosen
arrival sequence.
That being said, due to the adversarial nature of the demand process, it is not obvious
if the sample information can lead to asymptotic optimality when p=ω(1/m). Even
when the sampling probability pis constant, with a large number of resources, the sample
information may not be very illuminating. Consider the case where either the market size
hor `is small. In such cases, the number of agents in the sample set for at least one
of the types will be small and hence the decision-maker cannot correctly estimate the
expected rewards of the agents. This, in turn, can lead to the decision-maker protecting
the wrong (low-reward) type. Considering this, it is quite remarkable that our algorithm
obtains asymptotic optimality even when pshrinks as a function of m. As we will discuss
shortly, the competitive ratio of our algorithm is tight concerning both mand p.
We now comment on our technical contributions in characterizing the competitive ratio
of our algorithm. The proof of Theorem 1 is quite involved and is divided into three
main cases, where each case bounds the competitive ratio of the algorithm when the total
number of agents of type 1 and 2 falls into a certain region. The most challenging region
is the one in which the total number of type 1 agents (i.e., h) is large. In this case, the
algorithm may lose reward in three aspects: not protecting type 1 agents, over-protecting
type 1 agents, and under-protecting type 1 agents. Recall that in our setting, the decision-
maker does not even know which type has the higher reward, and hence our algorithm may
wrongfully protect type 2 agents. In addition, for large values of h, even when the right
type is protected, by over- and under-protecting the protected type, our algorithm can lose
some rewards. At a high level, we overcome these challenges, by showing that either (i)
摘要:

OnlineResourceAllocationwithSamplesNeginGolrezaeiSloanSchoolofManagement,MassachusettsInstituteofTechnology,golrezaei@mit.eduPatrickJailletDepartmentofElectricalEngineeringandComputerScience,MassachusettsInstituteofTechnology,jaillet@mit.eduZijieZhouOperationsResearchCenter,MassachusettsInstituteofT...

展开>> 收起<<
Online Resource Allocation with Samples Negin Golrezaei Sloan School of Management Massachusetts Institute of Technology golrezaeimit.edu.pdf

共75页,预览5页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:75 页 大小:4.41MB 格式:PDF 时间:2025-05-02

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 75
客服
关注