Online Resource Allocation with Samples Negin Golrezaei Sloan School of Management Massachusetts Institute of Technology golrezaeimit.edu

2025-05-02 0 0 4.41MB 75 页 10玖币

侵权投诉

Online Resource Allocation with Samples

Negin Golrezaei

Sloan School of Management, Massachusetts Institute of Technology, golrezaei@mit.edu

Patrick Jaillet

Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, jaillet@mit.edu

Zijie Zhou

Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA, 02139, zhou98@mit.edu

We study an online resource allocation problem under uncertainty about demand and about the reward of

each type of demand (agents) for the resource. Even though dealing with demand uncertainty in resource

allocation problems has been the topic of many papers in the literature, the challenge of not knowing

rewards has been barely explored. The lack of knowledge about agents’ rewards is inspired by the problem of

allocating units of a new resource (e.g., newly developed vaccines or drugs) with unknown eﬀectiveness/value.

For such settings, we assume that we can test the market before the allocation period starts. During the

test period, we sample each agent in the market with probability p. We study how to optimally exploit

the sample information in our online resource allocation problem under adversarial arrival processes. We

present an asymptotically optimal algorithm that achieves 1 −Θ(1/(p√m)) competitive ratio, where mis the

number of available units of the resource. By characterizing an upper bound on the competitive ratio of any

randomized and deterministic algorithm, we show that our competitive ratio of 1 −Θ(1/(p√m)) is tight for

any p=ω(1/√m). That asymptotic optimality is possible with sample information highlights the signiﬁcant

advantage of running a test period for new resources. We demonstrate the eﬃcacy of our proposed algorithm

using a dataset that contains the number of COVID-19 related hospitalized patients across diﬀerent age

groups.

Key words : online resource allocation, sample information, new resources, competitive ratio, COVID-19

1. Introduction

In online resource allocation problems, the goal is to allocate a limited number of a given

resource to heterogeneous demand/agents that arrive over time. These problems are noto-

riously challenging mainly because of the demand uncertainty and scarcity of the resource.

Such problems get even more challenging for newly developed resources (e.g., new drugs,

products, and services). For such a resource, the eﬀectiveness/value of the resource for

diﬀerent types of agents may not be fully known. To overcome this additional challenge,

businesses, for example, oﬀer free products in an exchange for honest feedback (produc-

treviewmom.com 2022), and pharmaceutical companies test potential treatments/drugs in

arXiv:2210.04774v1 [math.OC] 10 Oct 2022

2Golrezaei, Jaillet, and Zhou: Online Resource Allocation with Samples

human volunteers (Pﬁzer 2022). These practices raise the following key question: can and

to what extent such feedback improve the eﬃciency of online resource allocation?

To answer this question, we consider a decision-maker who aims to allocate her available

resource to two types of unit-demand agents with unknown (expected) rewards, where type

1 has a higher expected reward than type 2.1The total number of agents (i.e., the market

size), as well as, the number of agents of type 1 and 2, denoted by hand `respectively,

are chosen adversarially, and hence are unknown to the decision-maker. Before the alloca-

tion period starts, the decision-maker tests the market by, for example, making a public

announcement and oﬀering resources for free. We assume that with probability p∈(0,1),

each of the h+`agents sees and reacts to the announcement,2and gets one unit of the

resource, where we assume that pis known to the decision-maker. (We will discuss this

assumption later in this section.) These agents then provide feedback on their realized

reward for the resource in return. That is, we assume that all the agents in the test period

will get a resource. As we will show in Section 7.1, this assumption can be relaxed by

limiting the number of available resources during the test period. The test procedure sup-

plies the decision-maker with some information about agents’ expected rewards, as well

as, the size of the market for each type of agent. We refer to this information as sample

information.

After the test period ends, the remaining agents arrive over time according to an

adversarially-chosen order. For each arriving agent, the decision-maker has to make an

irrevocable decision about accepting him and allocating him one unit of the resource or

rejecting him. The decision-maker who has munits of the resource when the allocation

period starts makes acceptance/rejection decisions while being uncertain about the num-

ber, type of future agents, and their expected rewards. The decision-maker is also uncertain

about which types of agents earn higher rewards upon receiving the resource. For such

a demanding setting, our goal is to design eﬃcient resource allocation algorithms that

can optimally utilize the sample information under any possible arrival sequence. In other

words, we measure the performance of the algorithm in terms of its competitive ratio,

which is the expected ratio of the reward of the algorithm to the reward of the optimal

1In Section 7.4, we discuss the case of having more than two types of agents.

2In Section 7.3, we allow the probability pto be diﬀerent for type 1 and 2 agents.

Golrezaei, Jaillet, and Zhou: Online Resource Allocation with Samples 3

clairvoyant solution that knows the arrival sequence and the expected reward of agents in

advance; see Section 2 for the model and the formal deﬁnition of the competitive ratio.

Before presenting our contributions, we make two remarks about our model. First, our

model bears resemblance to the proposed models in Correa et al. (2021), Kaplan et al.

(2022) for secretary and online bipartite matching problems, respectively. In Correa et al.

(2021), each of the secretaries is placed in a sample set with probability p, where the value

of the sampled secretaries will be disclosed to the decision-maker before the decision period

starts. In Kaplan et al. (2022) that studies an online bipartite matching problem, each

agent independently will be placed in a sample set with probability p. While at a high

level, these works seek to design algorithms that can take advantage of samples, the nature

of their considered problems is diﬀerent from ours, preventing us to use their designed

algorithms for our setting. We discuss the details in Section 1.2.

Second, our model is a special case of the single-leg revenue management problem, which

has been widely studied in the literature; see, e.g., Littlewood (1972), Amaruchkul et al.

(2007), Ball and Queyranne (2009), Gallego et al. (2009), Ferreira et al. (2018), Jasin

(2015), Hwang et al. (2021), Golrezaei and Yao (2021). In all of these aforementioned

works, while the decision-maker may be uncertain about the demand (i.e., the number and

the order of the arrivals), the obtained reward of diﬀerent types of demand upon receiving

the resource is fully known to the decision-maker. This is in sharp contrast with our model

in which these rewards are not known to the decision-maker, adding extra complexity

to our problem. (See also Section 1.2 for a discussion about previous works on revenue

management problems with limited demand information, but full knowledge of rewards.)

1.1. Our Contributions

In addition to our modeling contribution, our work makes the following contributions.

Impossibility results. To shed light on the value of sample information, in Section 3,

we consider alternative scenarios under which either the sample information is not available

or the sample information is not very informative due to the lack of some additional

knowledge (e.g., the sampling probability). In all of the considered scenarios, we show

that it is not possible to design asymptotically optimal algorithms whose competitive ratio

goes to one as the number of resources mincreases. While in the ﬁrst scenario, no sample

information is not available (i.e., the sampling probability p= 0), in the second scenario,

the sample information is available but the sampling probability pis not known to the

4Golrezaei, Jaillet, and Zhou: Online Resource Allocation with Samples

decision-maker. The result for the second scenario justiﬁes our assumption about knowing

the sampling probability; see (Correa et al. 2021) for a similar assumption. Nonetheless,

this assumption can be also justiﬁed by the fact that the outreach program is designed

by the decision-maker herself, and hence pcan be well estimated using historical data

for similar outreach programs. See also our case study in Section 6 where we examine

the robustness of our proposed algorithm (which we will present shortly) to the lack of

knowledge of the sampling probability.3

Asymptotically optimal protection level algorithm. In light of our impossibility

results in Section 3, we design a simple, yet eﬀective protection level algorithm that opti-

mally utilizes the sample information; see Algorithm 1 in Section 4. The algorithm uses the

sample information to obtain an estimate of the expected reward of each type of agents. If

the estimated reward of type 1 is greater than that of type 2, the algorithm protects type

1 agents, otherwise type 2 agents will be protected. In each of these cases, the algorithm

uses the sample information to estimate the protection level for the type that has a higher

estimated average reward.

In Theorem 1, we present the competitive ratio of our proposed algorithm for any ﬁnite

value of m. In addition, in Proposition 2, we show that our algorithm is asymptotically

optimal as mgoes to inﬁnity. More precisely, we show that for any p=ω(1/√m)4(which

includes constant values for pthat does not scale with m), the asymptotic competitive ratio

of our algorithm is on the order of 1 −Θ(1/(p√m)).5As we show in Section 7.1, the same

asymptotic competitive ratio continues to hold even when there is a capacity constraint

during the test period. (There we show that when the number of resources during the test

period is in the order of ω(√m), the asymptotic competitive ratio of (a slightly modiﬁed

version of) Algorithm 1 is in the order of 1 −Θ(1/(p√m)).)

3In Appendix K, using numerical studies and under adversarial arrival processes, we further test the robustness of

our proposed algorithm to the lack of knowledge of p. There we show that when the estimation error in pis 10%, the

competitive ratio decreases by at most 6%. Similar results are obtained when the estimation error in pis 20%.

4We scale pwith mbecause of the cost of testing the market. With a large value of m, if the market size turns out

to be large too (e.g., proportional to m), the decision-maker will incur a large cost during the test period, justifying

our choice of p=ω(1/√m). Nonetheless, our results hold for constant p’s, and theoretically speaking, handling the

case of p=ω(1/√m) is more challenging than that of constant p’s. This is because, under constant p’s, the sample

information is more informative, easing the analysis of the algorithm.

5For the case of p=O(1

√m) where pgoes to zero very fast as mgrows, in Section J, we design another algorithm whose

competitive ratio is 1/2. We note that our upper bound in Section 5.2 conﬁrms the challenges of such an extreme case

for the sampling probability. As we show there, when pgoes to zero very fast (i.e., p=O(1/m)), obtaining asymptotic

optimality is impossible.

Golrezaei, Jaillet, and Zhou: Online Resource Allocation with Samples 5

This result shows that the sample information can be extremely useful to improve the

performance of resource allocation algorithms. This is because in the absence of the sample

information, and even when the expected rewards of agents are fully known, as shown in

Ball and Queyranne (2009), the competitive ratio of any algorithm cannot exceed 1/(2 −

r2/r1) no matter how large mis, where r1> r2are the expected rewards for the the two

types of agents. Here, we show that we can break the barrier of 1/(2 −r2/r1) by taking

advantage of the sample information in a very challenging setting, where the expected

reward of agents is not known to the decision-maker. This is mainly because the sample

information can be used to infer some knowledge about the number of agents of diﬀerent

types in the online arrival sequence, gaining some knowledge about the adversarially-chosen

arrival sequence.

That being said, due to the adversarial nature of the demand process, it is not obvious

if the sample information can lead to asymptotic optimality when p=ω(1/√m). Even

when the sampling probability pis constant, with a large number of resources, the sample

information may not be very illuminating. Consider the case where either the market size

hor `is small. In such cases, the number of agents in the sample set for at least one

of the types will be small and hence the decision-maker cannot correctly estimate the

expected rewards of the agents. This, in turn, can lead to the decision-maker protecting

the wrong (low-reward) type. Considering this, it is quite remarkable that our algorithm

obtains asymptotic optimality even when pshrinks as a function of m. As we will discuss

shortly, the competitive ratio of our algorithm is tight concerning both mand p.

We now comment on our technical contributions in characterizing the competitive ratio

of our algorithm. The proof of Theorem 1 is quite involved and is divided into three

main cases, where each case bounds the competitive ratio of the algorithm when the total

number of agents of type 1 and 2 falls into a certain region. The most challenging region

is the one in which the total number of type 1 agents (i.e., h) is large. In this case, the

algorithm may lose reward in three aspects: not protecting type 1 agents, over-protecting

type 1 agents, and under-protecting type 1 agents. Recall that in our setting, the decision-

maker does not even know which type has the higher reward, and hence our algorithm may

wrongfully protect type 2 agents. In addition, for large values of h, even when the right

type is protected, by over- and under-protecting the protected type, our algorithm can lose

some rewards. At a high level, we overcome these challenges, by showing that either (i)

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

10 玖币 0人已下载

立即下载

摘要：

OnlineResourceAllocationwithSamplesNeginGolrezaeiSloanSchoolofManagement,MassachusettsInstituteofTechnology,golrezaei@mit.eduPatrickJailletDepartmentofElectricalEngineeringandComputerScience,MassachusettsInstituteofTechnology,jaillet@mit.eduZijieZhouOperationsResearchCenter,MassachusettsInstituteofT...

展开>> 收起<<

Online Resource Allocation with Samples Negin Golrezaei Sloan School of Management Massachusetts Institute of Technology golrezaeimit.edu.pdf

共75页,预览5页

还剩页未读，继续阅读

声明：本站为文档C2C交易模式，即用户上传的文档直接被用户下载，本站只是中间服务平台，本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私，请立即通知玖贝云文库，我们立即给予删除！

Online Resource Allocation with Samples Negin Golrezaei Sloan School of Management Massachusetts Institute of Technology golrezaeimit.edu

相关推荐

开通VIP享超值会员特权

作者详情

相关内容

热门标签

举报选择: