Online Team Formation under Different Synergies
Matthew Eichhorn
Cornell University
Ithaca, NY 14850
meichhorn@cornell.edu
Siddhartha Banerjee
Cornell University
Ithaca, NY 14850
sbanerjee@cornell.edu
David Kempe
University of Southern California
Los Angeles, CA 90089
david.m.kempe@gmail.com
Abstract
Team formation is ubiquitous in many sectors: education, labor markets, sports,
etc. A team’s success depends on its members’ latent types, which are not directly
observable but can be (partially) inferred from past performances. From the viewpoint of
a principal trying to select teams, this leads to a natural exploration-exploitation trade-
off: retain successful teams that are discovered early, or reassign agents to learn more
about their types? We study a natural model for online team formation, where a principal
repeatedly partitions a group of agents into teams. Agents have binary latent types, each
team comprises two members, and a team’s performance is a symmetric function of its
members’ types. Over multiple rounds, the principal selects matchings over agents and
incurs regret equal to the deficit in the number of successful teams versus the optimal
matching for the given function. Our work provides a complete characterization of the
regret landscape for all symmetric functions of two binary inputs. In particular, we
develop team-selection policies that, despite being agnostic of model parameters, achieve
optimal or near-optimal regret against an adaptive adversary.
1 Introduction
An instructor teaching a large online course wants to pair up students for assignments. The
instructor knows that a team performs well as long as at least one of its members has some
past experience with coding, but unfortunately, there is no available information on the
students’ prior experience. However, the course staff can observe the performance of each
team on assignments, and so, over multiple assignments, would like to reshuffle teams to try
and quickly maximize the overall number of successful teams. How well can one do in such
a situation?
Team formation is ubiquitous across many domains: homework groups in large courses,
workers assigned to projects on online labor platforms, police officers paired up for patrols,
athletes assigned to teams, etc. Such teams must often be formed without prior information
on each individual’s latent skills or personality traits, albeit with knowledge of how these
latent traits affect team performance. The lack of information necessitates a natural trade-
off: a principal must decide whether to exploit successful teams located early or reassign
teammates to gain insight into the abilities of other individuals. The latter choice may
temporarily reduce the overall rate of success.
To study this problem, we consider a setting (described in detail in Section 2) where agents
have binary latent types, each team comprises two members, and the performance of each
team is given by the same synergy function, i.e., some given symmetric function of its mem-
bers’ types. Over multiple rounds, the principal selects matchings over agents, with the goal
of minimizing the cumulative regret, i.e., the difference between the number of successful
teams in a round versus the number of successful teams under an optimal matching. Our
main results concern the special case of symmetric Boolean synergy functions — in particular,
we study the functions EQ and XOR (in Section 3), OR (in Section 4) and AND (in Section 5).
While this may at first appear to be a limited class of synergy functions, in Section 2.3, we
argue that these four functions are in a sense the atomic primitives for this problem; our
results for these four settings are sufficient to handle arbitrary symmetric synergy functions.
The above model was first introduced by Johari et al. [12], who considered the case where
agent types are i.i.d. Bernoulli(p) (for known p) and provided asymptotically optimal regret
guarantees under AND (and preliminary results for OR). As with any bandit setting, it is
natural to ask whether one can go beyond a stochastic model to admit adversarial inputs.
In particular, the strongest adversary one can consider here is an adaptive adversary, which
observes the choice of teams in each round, and only then fixes the latent types of agents.
In most bandit settings, such an adversary is too strong to get any meaningful guarantees;
among other things, adaptivity precludes the use of randomization as an algorithmic tool,
and typically results in every policy being as bad as any other. Nevertheless, in this work,
we provide a near-complete characterization of the regret landscape for team formation under
an adaptive adversary. In particular, in a setting with n agents of which k have type '1', we
present algorithms that are agnostic of the parameter k, and yet when faced with an adaptive
adversary, achieve optimal regret for EQ and XOR, and near-optimal regret bounds under OR
and AND (and therefore, using our reduction in Section 2.3, achieve near-optimal regret for
any symmetric function).
While our results are specific to the particulars of the model, they exhibit several noteworthy
features. First, despite the adversary being fully adaptive, our regret bounds differ only by a
small constant factor from prior results for AND under i.i.d. Bernoulli types [12]; such a small
gap between stochastic and adversarial bandit models is uncommon and surprising. Next,
our bounds under different synergy functions highlight the critical role of these functions
in determining the regret landscape. Additionally, our algorithms expose a sharp contrast
between learning and regret minimization in our setting: while the rate of learning increases
with more exploration, minimizing regret benefits from maximal exploitation. Finally, to deal
with adaptive adversaries in our model, we use techniques from extremal graph theory that
are atypical in regret minimization; we hope that these ideas prove useful in other complex
bandit settings.
1.1 Related Work
Regret minimization in team formation, although reminiscent of combinatorial bandits/semi-
bandits [4, 5, 6, 8, 16], poses fundamentally new challenges arising from different synergy
functions. In particular, a crucial aspect of bandit models is that rewards and/or feedback
are linear functions of individual arms’ latent types. Some models allow rewards/feedback to
be given by a non-linear link function of the sum of arm rewards [7, 10], but typically require
the link function to be well-approximated by a linear function [17]. In contrast, our team
synergy functions are non-linear, and moreover, are not well-approximated by any non-linear
function of the sums of the agents’ types.
One way to go beyond semi-bandit models and incorporate pairwise interactions is by assum-
ing that the resulting reward matrix is low-rank [13, 20, 24]. The critical property here is that
under perfect feedback, one can learn all agent types via a few ‘orthogonal’ explorations; this
is true in our setting under the XOR function (Section 3), but not for other Boolean functions.
Another approach for handling complex rewards/feedback is via a Bayesian heuristic such as
Thompson sampling or information-directed sampling [9, 14, 19, 23]. While such approaches
achieve near-optimal regret in many settings, the challenge in our setting is in updating priors
over agents’ types given team scores. We hope that the new approaches we introduce could,
in the future, be combined with low-rank decomposition and sampling approaches to handle
more complex scenarios such as shifting types and corrupted feedback.
In addition to the bandit literature, there is a parallel stream on learning for team formation.
Rajkumar et al. [18] consider the problem of learning to partition workers into teams, where
team compatibility depends on individual types. Kleinberg and Raghu [15] consider the use of
individual scores to estimate team scores and use these to approximately determine the best
team from a pool of agents. Singla et al. [21] present algorithms for learning individual types
to form a single team under an online budgeted learning setting. These works concentrate
on pure learning. In contrast, our focus is on minimizing regret. Finally, there is a line of
work on strategic behavior in teams, studying how to incentivize workers to exert effort [2, 3],
and how to use signaling to influence team formation [11]. While our work eschews strategic
considerations, it suggests extensions that combine learning by the principal with strategic
actions by agents.
2 Model
2.1 Agents, Types, and Teams
We consider n agents who must be paired by a principal into teams of two over a number of rounds; throughout, we assume that n is even. Each agent has an unknown latent type $\theta_i \in \{0,1\}$. These types can represent any dichotomous attribute: "left-brain" vs. "right-brain" (Section 3), "low-skill" vs. "high-skill" (Sections 4 and 5), etc. We let k denote the number of agents with type 1, and assume that k is fixed a priori but unknown.
In each round t, the principal selects a matching $M_t$, with each edge $(i, j) \in M_t$ representing a team. We use the terms "edge" and "team" interchangeably. The success of a team $(i, j) \in M_t$ is $f(\theta_i, \theta_j)$, where $f \colon \{0,1\}^2 \to \mathbb{R}$ is some known symmetric function of the agents' types. In Sections 3-5, we restrict our focus to Boolean functions, interpreting $f(\theta_i, \theta_j) = 1$ as a success and $f(\theta_i, \theta_j) = 0$ as a failure. The algorithm observes the success of each team, and may use this to select the matchings in subsequent rounds; however, the algorithm cannot directly observe agents' types. For any matching M, we define its score as $S(M) := \sum_{(i,j) \in M} f(\theta_i, \theta_j)$ — in the special case of Boolean functions, this is the number of successful teams.
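To make the notation concrete, here is a minimal sketch (our own illustration; none of the identifiers below come from the paper) of the four Boolean synergy functions and the score S(M) of a matching, with latent types encoded as 0/1 integers.

```python
# Four Boolean synergy functions on a pair of 0/1 latent types, and the score
# S(M) of a matching, i.e., the number of successful teams.

SYNERGY = {
    "EQ":  lambda a, b: int(a == b),
    "XOR": lambda a, b: a ^ b,
    "OR":  lambda a, b: a | b,
    "AND": lambda a, b: a & b,
}

def score(matching, types, f):
    """S(M) = sum of f(theta_i, theta_j) over teams (i, j) in M."""
    return sum(f(types[i], types[j]) for i, j in matching)

# Example: n = 4 agents with latent types theta = (1, 0, 1, 0).
theta = [1, 0, 1, 0]
M = [(0, 2), (1, 3)]                    # pair the two 1-agents and the two 0-agents
print(score(M, theta, SYNERGY["EQ"]))   # 2: both teams are uniform
print(score(M, theta, SYNERGY["AND"]))  # 1: only the (1,1) team succeeds
```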
A convenient way to view the Boolean setting is as constructing an edge-labeled exploration
graph $G(V, E_1, E_2, \ldots)$, where nodes in V are agents, and the edge set $E_t := \bigcup_{t' \le t} M_{t'}$ represents all pairings played up to round t. Upon being played for the first time, an edge is assigned a label in $\{0,1\}$ corresponding to the success value of its team. Known 0-agents and
known 1-agents are those whose types can be inferred from the edge labels. The remaining
agents are unknown. The unresolved subgraph is the induced subgraph on the unknown
agents.
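As an illustration of the known/unknown distinction, here is a brute-force sketch (our own, exponential in n and meant only for small examples): an agent is known exactly when every type assignment consistent with the observed edge labels agrees on its type. Note that k is deliberately not used, since the principal does not know it.

```python
from itertools import product

def known_agents(n, observed, f):
    """Agents whose type is pinned down by the exploration graph alone.
    `observed` maps each played edge (i, j) to its 0/1 label, and `f` is the
    synergy function. Assumes at least one consistent assignment exists
    (always true when the labels were generated by some true type vector)."""
    consistent = [theta for theta in product((0, 1), repeat=n)
                  if all(f(theta[i], theta[j]) == lab
                         for (i, j), lab in observed.items())]
    return {i: consistent[0][i] for i in range(n)
            if all(theta[i] == consistent[0][i] for theta in consistent)}
```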
2.2 Adversarial Types and Regret
The principal makes decisions facing an adaptive adversary, who knows k (unlike the principal,
who only knows n) and, in each round, is free to assign agent types after seeing the matching
chosen by the principal, as long as (1) the assignment is consistent with prior observations
(i.e., with the exploration graph), and (2) the number of 1-agents is k. Note that this is
the strongest notion of an adversary we can consider in this setting; in particular, since the
adversary is fully adaptive and knows the matching before making decisions, randomizing does
not help, and so it is without loss of generality to consider only deterministic algorithms.
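For intuition, here is a sketch of one such adversary (our own construction; the model only requires consistency, so this greedy choice is just an example and not necessarily the strongest strategy over a whole horizon): after seeing the matching, enumerate all type assignments with exactly k ones that agree with every observed label, and pick one minimizing the current round's score.

```python
from itertools import combinations

def consistent_assignments(n, k, observed, f):
    """All 0/1 type vectors with exactly k ones that agree with every observed
    edge label; `observed` maps played edges (i, j) to their 0/1 labels."""
    for ones in combinations(range(n), k):
        theta = [0] * n
        for i in ones:
            theta[i] = 1
        if all(f(theta[i], theta[j]) == lab for (i, j), lab in observed.items()):
            yield theta

def adversary_response(n, k, observed, f, matching):
    """Greedy adaptive adversary: pick a consistent assignment minimizing the
    score of the matching just announced (at least one exists by construction)."""
    return min(consistent_assignments(n, k, observed, f),
               key=lambda theta: sum(f(theta[i], theta[j]) for i, j in matching))
```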
We evaluate the performance of algorithms in terms of additive regret against such an ad-
versary. Formally, let $M^*$ be any matching maximizing $S(M)$ — note that for any Boolean team success function, $S(M^*)$ is a fixed function of n and k. In round t, an algorithm incurs regret $r_t := S(M^*) - S(M_t)$, and its total regret is the sum of its per-round regret over an a priori infinite time horizon. Note, however, that after a finite number of rounds, a naïve algorithm that enumerates all matchings can determine, and henceforth play, $M^*$; thus, the optimal regret is always finite. Moreover, the "effective" horizon (i.e., the time until the algorithm learns $M^*$) of our algorithms is small.
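For reference, the optimal scores $S(M^*)$ under the four Boolean primitives follow from simple counting. The EQ value is stated in Section 3.1; the other three are our own worked values and should be checked against the corresponding sections.

```python
def optimal_score(f_name, n, k):
    """S(M*) with n agents, k of type 1, for each Boolean primitive."""
    if f_name == "EQ":   # pair like with like; one mixed team remains iff k is odd
        return n // 2 - (k % 2)
    if f_name == "XOR":  # each success pairs a 1-agent with a 0-agent
        return min(k, n - k)
    if f_name == "OR":   # spread the 1-agents over distinct teams
        return min(k, n // 2)
    if f_name == "AND":  # pair the 1-agents with each other
        return k // 2
    raise ValueError(f_name)

def round_regret(f_name, n, k, achieved):
    """Per-round regret r_t = S(M*) - S(M_t), where `achieved` is S(M_t)."""
    return optimal_score(f_name, n, k) - achieved
```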
2.3 Symmetric Synergy Functions and Atomic Primitives
In subsequent sections, we consider the problem of minimizing regret under four Boolean
synergy functions $f \colon \{0,1\}^2 \to \{0,1\}$: EQ, XOR, OR, and AND. Interestingly, the algorithms for these four settings suffice to handle any symmetric synergy function $f \colon \{0,1\}^2 \to \mathbb{R}$. We argue this below for synergy functions that take at most two values; we handle the case of synergy functions f taking three different values at the end of Section 3.1.
Lemma 1. Fix some $\ell < u$, let $f \colon \{0,1\}^2 \to \{\ell, u\}$ be any symmetric synergy function, and let $r_f(n, k)$ denote the optimal regret with n agents, of which k have type 1. Then, $r_f(n, k) = (u - \ell) \cdot r_g(n, k)$ for one of $g \in \{\mathrm{EQ}, \mathrm{XOR}, \mathrm{AND}, \mathrm{OR}\}$.
Proof. First, note that without loss of generality, we may assume that $f(0,0) \le f(1,1)$.
Otherwise, we can swap the labels of the agent types without altering the problem. Note
that this immediately allows us to reduce team formation under the Boolean NAND and NOR functions to the same problem under OR and AND, respectively. Next, note that if
f(0,0) = f(1,0) = f(1,1), then the problem is trivial, as all matchings have the same score.
Otherwise, we may apply the affine transformation $f \mapsto \frac{1}{u - \ell} \cdot f - \frac{\ell}{u - \ell}$ to the output to recover a Boolean function:
When f(0,1) < f(0,0) = f(1,1), we recover the EQ function.
When f(0,0) = f(1,1) < f(0,1), we recover the XOR function.
When f(0,0) = f(0,1) < f(1,1), we recover the AND function.
When f(0,0) < f(0,1) = f(1,1), we recover the OR function.
The structure of the problem remains unchanged since total regret is linear in the number
of each type of team played over the course of the algorithm. The regret simply scales by a
factor of $u - \ell$.
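A compact sketch of this reduction (our own code; the tuple-to-name lookup simply enumerates the four cases listed above) takes the three values f(0,0), f(0,1), f(1,1), applies the label swap and the affine rescaling, and reports which primitive results together with the regret scale u - l.

```python
def reduce_to_primitive(f00, f01, f11):
    """Return (primitive_name, scale, relabel): regret scales by `scale`, and
    `relabel` records whether the agent-type labels 0/1 were swapped."""
    relabel = f00 > f11          # WLOG f(0,0) <= f(1,1); otherwise swap type labels
    if relabel:
        f00, f11 = f11, f00
    lo, hi = min(f00, f01, f11), max(f00, f01, f11)
    if lo == hi:
        return ("TRIVIAL", 0, relabel)   # constant function: every matching is optimal
    scale = hi - lo                       # regret scales by u - l
    g = tuple(int(v == hi) for v in (f00, f01, f11))   # affine map onto {0, 1}
    return ({(1, 0, 1): "EQ", (0, 1, 0): "XOR",
             (0, 0, 1): "AND", (0, 1, 1): "OR"}[g], scale, relabel)

# Example: f(0,0) = 2, f(0,1) = 2, f(1,1) = 5 normalizes to AND with scale 3.
print(reduce_to_primitive(2, 2, 5))      # ('AND', 3, False)
```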
3 Uniform and Diverse Teams
We first focus on forming teams that promote uniformity (captured by the Boolean EQ
function) or diversity (captured by the XOR function). In addition, we also show that the
algorithm for EQ minimizes regret under any general symmetric synergy function taking three
different values.
3.1 Uniformity (EQ)
We first consider the equality (or EQ) synergy function, $f_{\mathrm{EQ}}(\theta_i, \theta_j) = \mathbb{1}\{\theta_i = \theta_j\}$. Here, an optimal matching $M^*$ includes as few (0,1)-teams as possible, and thus $S(M^*) = \frac{n}{2} - (k \bmod 2)$. If k (and thus n - k) is even, then all agents can be paired in successful teams; else, any optimal matching must include one unsuccessful team with different types. For this setting, Theorem 1 shows that a simple policy (Algorithm 1) achieves optimal regret for all parameters n and k.
Algorithm 1 Form Uniform Teams
Round 1: Play an arbitrary matching.
Round 2: Swap unsuccessful teams in pairs as {(a, b), (c, d)} → {(a, c), (b, d)}. Repeat remaining teams (including one unsuccessful team when k is odd).
Round 3: If {(a, b), (c, d)} and {(a, c), (b, d)} are both unsuccessful, play {(a, d), (b, c)}. Repeat remaining teams.
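Below is a hedged implementation sketch of Algorithm 1 (ours, not the authors' code). It assumes a hypothetical environment callback `play(matching)` that returns the per-team 0/1 successes in the same order as the matching; the matching fixed from round 3 onward is then repeated for the rest of the horizon.

```python
def form_uniform_teams(n, play, horizon):
    # Round 1: play an arbitrary perfect matching.
    matching = [(i, i + 1) for i in range(0, n, 2)]
    results = play(matching)

    # Round 2: swap unsuccessful teams in pairs {(a,b),(c,d)} -> {(a,c),(b,d)};
    # repeat every other team (plus one leftover unsuccessful team if their count is odd).
    failed = [t for t, ok in zip(matching, results) if not ok]
    repeated = [t for t, ok in zip(matching, results) if ok]
    if len(failed) % 2 == 1:
        repeated.append(failed.pop())
    groups = [(failed[i], failed[i + 1]) for i in range(0, len(failed), 2)]
    matching = repeated + [team for (a, b), (c, d) in groups
                           for team in ((a, c), (b, d))]
    ok = dict(zip(matching, play(matching)))

    # Round 3 onward: within a group, (a,c) and (b,d) succeed or fail together;
    # if they failed, the remaining re-pairing {(a,d),(b,c)} is forced to succeed.
    matching = repeated[:]
    for (a, b), (c, d) in groups:
        matching += [(a, c), (b, d)] if ok[(a, c)] or ok[(b, d)] else [(a, d), (b, c)]
    for _ in range(horizon - 2):
        play(matching)   # this matching is optimal from round 3 onward (Theorem 1)
```

The grouping works because each unsuccessful EQ team contains exactly one 1-agent, so within a group of four the two round-2 re-pairings succeed or fail together, and when they fail the third re-pairing in round 3 is forced to succeed.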
Theorem 1. Define $r_{\mathrm{EQ}}(n, k) := 2 \cdot \bigl(\min(k, n - k) - (k \bmod 2)\bigr)$. Then,
1. Algorithm 1 learns an optimal matching by round 3, and incurs regret at most $r_{\mathrm{EQ}}(n, k)$.
2. Any algorithm incurs regret at least $r_{\mathrm{EQ}}(n, k)$ in the worst case.
Proof. For the upper bound on the regret, note that every unsuccessful team includes a 0-
agent and a 1-agent. Thus, there is a re-pairing of any two unsuccessful teams that gives rise
to two successful teams. If the re-pairing in round 2 is unsuccessful, the only other re-pairing,
selected in round 3, must be successful. There will be k mod 2 unsuccessful teams in round 3 and in every subsequent round, matching the optimal matching, so no further regret accrues from round 3 onward.