Online Team Formation under Different Synergies
Matthew Eichhorn
Cornell University
Ithaca, NY 14850
meichhorn@cornell.edu
Siddhartha Banerjee
Cornell University
Ithaca, NY 14850
sbanerjee@cornell.edu
David Kempe
University of Southern California
Los Angeles, CA 90089
david.m.kempe@gmail.com
Abstract
Team formation is ubiquitous in many sectors: education, labor markets, sports,
etc. A team’s success depends on its members’ latent types, which are not directly
observable but can be (partially) inferred from past performances. From the viewpoint of
a principal trying to select teams, this leads to a natural exploration-exploitation trade-
off: retain successful teams that are discovered early, or reassign agents to learn more
about their types? We study a natural model for online team formation, where a principal
repeatedly partitions a group of agents into teams. Agents have binary latent types, each
team comprises two members, and a team’s performance is a symmetric function of its
members’ types. Over multiple rounds, the principal selects matchings over agents and
incurs regret equal to the deficit in the number of successful teams versus the optimal
matching for the given function. Our work provides a complete characterization of the
regret landscape for all symmetric functions of two binary inputs. In particular, we
develop team-selection policies that, despite being agnostic of model parameters, achieve
optimal or near-optimal regret against an adaptive adversary.
1 Introduction
An instructor teaching a large online course wants to pair up students for assignments. The
instructor knows that a team performs well as long as at least one of its members has some
past experience with coding, but unfortunately, there is no available information on the
students’ prior experience. However, the course staff can observe the performance of each
team on assignments, and so, over multiple assignments, would like to reshuffle teams to try
and quickly maximize the overall number of successful teams. How well can one do in such
a situation?
Team formation is ubiquitous across many domains: homework groups in large courses,
workers assigned to projects on online labor platforms, police officers paired up for patrols,
athletes assigned to teams, etc. Such teams must often be formed without prior information
on each individual’s latent skills or personality traits, albeit with knowledge of how these
latent traits affect team performance. The lack of information necessitates a natural trade-
off: a principal must decide whether to exploit successful teams located early or reassign
teammates to gain insight into the abilities of other individuals. The latter choice may
temporarily reduce the overall rate of success.
To study this problem, we consider a setting (described in detail in Section 2) where agents
have binary latent types, each team comprises two members, and the performance of each
team is given by the same synergy function, i.e., some given symmetric function of its mem-
bers’ types. Over multiple rounds, the principal selects matchings over agents, with the goal
of minimizing the cumulative regret, i.e., the difference between the number of successful
teams in a round versus the number of successful teams under an optimal matching. Our
main results concern the special case of symmetric Boolean synergy functions — in particular,
we study the functions EQ and XOR (in Section 3), OR (in Section 4) and AND (in Section 5).
While this may at first appear to be a limited class of synergy functions, in Section 2.3, we
argue that these four functions are in a sense the atomic primitives for this problem; our
results for these four settings are sufficient to handle arbitrary symmetric synergy functions.
The above model was first introduced by Johari et al. [12], who considered the case where
agent types are i.i.d. Bernoulli(p) (for known p) and provided asymptotically optimal regret
guarantees under AND (and preliminary results for OR). As with any bandit setting, it is
natural to ask whether one can go beyond a stochastic model to admit adversarial inputs.
In particular, the strongest adversary one can consider here is an adaptive adversary, which
observes the choice of teams in each round, and only then fixes the latent types of agents.
In most bandit settings, such an adversary is too strong to get any meaningful guarantees;
among other things, adaptivity precludes the use of randomization as an algorithmic tool,
and typically results in every policy being as bad as any other. Nevertheless, in this work,
we provide a near-complete characterization of the regret landscape for team formation under
an adaptive adversary. In particular, in a setting with n agents of which k have type '1', we
present algorithms that are agnostic of the parameter k, and yet when faced with an adaptive
adversary, achieve optimal regret for EQ and XOR, and near-optimal regret bounds under OR
and AND (and therefore, using our reduction in Section 2.3, achieve near-optimal regret for
any symmetric function).
While our results are specific to the particulars of the model, they exhibit several noteworthy
features. First, despite the adversary being fully adaptive, our regret bounds differ only by a
small constant factor from prior results for AND under i.i.d. Bernoulli types [12]; such a small
gap between stochastic and adversarial bandit models is uncommon and surprising. Next,
our bounds under different synergy functions highlight the critical role of these functions
in determining the regret landscape. Additionally, our algorithms expose a sharp contrast
between learning and regret minimization in our setting: while the rate of learning increases
with more exploration, minimizing regret benefits from maximal exploitation. Finally, to deal
with adaptive adversaries in our model, we use techniques from extremal graph theory that
are atypical in regret minimization; we hope that these ideas prove useful in other complex
bandit settings.
1.1 Related Work
Regret minimization in team formation, although reminiscent of combinatorial bandits/semi-
bandits [4, 5, 6, 8, 16], poses fundamentally new challenges arising from different synergy
functions. In particular, a crucial aspect of bandit models is that rewards and/or feedback
are linear functions of individual arms’ latent types. Some models allow rewards/feedback to
be given by a non-linear link function of the sum of arm rewards [7, 10], but typically require
the link function to be well-approximated by a linear function [17]. In contrast, our team
synergy functions are non-linear, and moreover, are not well-approximated by any non-linear
function of the sums of the agents’ types.
One way to go beyond semi-bandit models and incorporate pairwise interactions is by assum-
ing that the resulting reward matrix is low-rank [13, 20, 24]. The critical property here is that
under perfect feedback, one can learn all agent types via a few ‘orthogonal’ explorations; this
is true in our setting under the XOR function (Section 3), but not for other Boolean functions.
Another approach for handling complex rewards/feedback is via a Bayesian heuristic such as
Thompson sampling or information-directed sampling [9, 14, 19, 23]. While such approaches
achieve near-optimal regret in many settings, the challenge in our setting is in updating priors
over agents’ types given team scores. We hope that the new approaches we introduce could,
in the future, be combined with low-rank decomposition and sampling approaches to handle
more complex scenarios such as shifting types and corrupted feedback.
In addition to the bandit literature, there is a parallel stream on learning for team formation.
Rajkumar et al. [18] consider the problem of learning to partition workers into teams, where
team compatibility depends on individual types. Kleinberg and Raghu [15] consider the use of
individual scores to estimate team scores and use these to approximately determine the best
team from a pool of agents. Singla et al. [21] present algorithms for learning individual types
to form a single team under an online budgeted learning setting. These works concentrate
on pure learning. In contrast, our focus is on minimizing regret. Finally, there is a line of
work on strategic behavior in teams, studying how to incentivize workers to exert effort [2, 3],
and how to use signaling to influence team formation [11]. While our work eschews strategic
considerations, it suggests extensions that combine learning by the principal with strategic
actions by agents.
2 Model
2.1 Agents, Types, and Teams
We consider n agents who must be paired by a principal into teams of two over a number of rounds; throughout, we assume that n is even. Each agent has an unknown latent type $\theta_i \in \{0,1\}$. These types can represent any dichotomous attribute: "left-brain" vs. "right-brain" (Section 3), "low-skill" vs. "high-skill" (Sections 4 and 5), etc. We let k denote the number of agents with type 1, and assume that k is fixed a priori but unknown.
In each round t, the principal selects a matching $M_t$, with each edge $(i, j) \in M_t$ representing a team. We use the terms "edge" and "team" interchangeably. The success of a team $(i, j) \in M_t$ is $f(\theta_i, \theta_j)$, where $f \colon \{0,1\}^2 \to \mathbb{R}$ is some known symmetric function of the agents' types. In Sections 3-5, we restrict our focus to Boolean functions, interpreting $f(\theta_i, \theta_j) = 1$ as a success and $f(\theta_i, \theta_j) = 0$ as a failure. The algorithm observes the success of each team, and may use this to select the matchings in subsequent rounds; however, the algorithm cannot directly observe agents' types. For any matching M, we define its score as $S(M) := \sum_{(i,j) \in M} f(\theta_i, \theta_j)$ — in the special case of Boolean functions, this is the number of successful teams.
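To make the notation concrete, here is a minimal sketch (our own illustration; none of the identifiers below come from the paper) of the four Boolean synergy functions and the score S(M) of a matching, with latent types encoded as 0/1 integers.

```python
# Four Boolean synergy functions on a pair of 0/1 latent types, and the score
# S(M) of a matching, i.e., the number of successful teams.

SYNERGY = {
    "EQ":  lambda a, b: int(a == b),
    "XOR": lambda a, b: a ^ b,
    "OR":  lambda a, b: a | b,
    "AND": lambda a, b: a & b,
}

def score(matching, types, f):
    """S(M) = sum of f(theta_i, theta_j) over teams (i, j) in M."""
    return sum(f(types[i], types[j]) for i, j in matching)

# Example: n = 4 agents with latent types theta = (1, 0, 1, 0).
theta = [1, 0, 1, 0]
M = [(0, 2), (1, 3)]                    # pair the two 1-agents and the two 0-agents
print(score(M, theta, SYNERGY["EQ"]))   # 2: both teams are uniform
print(score(M, theta, SYNERGY["AND"]))  # 1: only the (1,1) team succeeds
```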
A convenient way to view the Boolean setting is as constructing an edge-labeled exploration
graph $G(V, E_1, E_2, \ldots)$, where nodes in V are agents, and the edge set $E_t := \bigcup_{t' \le t} M_{t'}$ represents all pairings played up to round t. Upon being played for the first time, an edge is assigned a label in $\{0,1\}$ corresponding to the success value of its team. Known 0-agents and
known 1-agents are those whose types can be inferred from the edge labels. The remaining
agents are unknown. The unresolved subgraph is the induced subgraph on the unknown
agents.
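As an illustration of the known/unknown distinction, here is a brute-force sketch (our own, exponential in n and meant only for small examples): an agent is known exactly when every type assignment consistent with the observed edge labels agrees on its type. Note that k is deliberately not used, since the principal does not know it.

```python
from itertools import product

def known_agents(n, observed, f):
    """Agents whose type is pinned down by the exploration graph alone.
    `observed` maps each played edge (i, j) to its 0/1 label, and `f` is the
    synergy function. Assumes at least one consistent assignment exists
    (always true when the labels were generated by some true type vector)."""
    consistent = [theta for theta in product((0, 1), repeat=n)
                  if all(f(theta[i], theta[j]) == lab
                         for (i, j), lab in observed.items())]
    return {i: consistent[0][i] for i in range(n)
            if all(theta[i] == consistent[0][i] for theta in consistent)}
```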
2.2 Adversarial Types and Regret
The principal makes decisions facing an adaptive adversary, who knows k (unlike the principal,
who only knows n) and, in each round, is free to assign agent types after seeing the matching
chosen by the principal, as long as (1) the assignment is consistent with prior observations
(i.e., with the exploration graph), and (2) the number of 1-agents is k. Note that this is
the strongest notion of an adversary we can consider in this setting; in particular, since the
adversary is fully adaptive and knows the matching before making decisions, randomizing does
not help, and so it is without loss of generality to consider only deterministic algorithms.
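For intuition, here is a sketch of one such adversary (our own construction; the model only requires consistency, so this greedy choice is just an example and not necessarily the strongest strategy over a whole horizon): after seeing the matching, enumerate all type assignments with exactly k ones that agree with every observed label, and pick one minimizing the current round's score.

```python
from itertools import combinations

def consistent_assignments(n, k, observed, f):
    """All 0/1 type vectors with exactly k ones that agree with every observed
    edge label; `observed` maps played edges (i, j) to their 0/1 labels."""
    for ones in combinations(range(n), k):
        theta = [0] * n
        for i in ones:
            theta[i] = 1
        if all(f(theta[i], theta[j]) == lab for (i, j), lab in observed.items()):
            yield theta

def adversary_response(n, k, observed, f, matching):
    """Greedy adaptive adversary: pick a consistent assignment minimizing the
    score of the matching just announced (at least one exists by construction)."""
    return min(consistent_assignments(n, k, observed, f),
               key=lambda theta: sum(f(theta[i], theta[j]) for i, j in matching))
```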
We evaluate the performance of algorithms in terms of additive regret against such an ad-
versary. Formally, let $M^*$ be any matching maximizing $S(M)$ — note that for any Boolean team success function, $S(M^*)$ is a fixed function of n and k. In round t, an algorithm incurs regret $r_t := S(M^*) - S(M_t)$, and its total regret is the sum of its per-round regret over an a priori infinite time horizon. Note, however, that after a finite number of rounds, a naïve algorithm that enumerates all matchings can determine, and henceforth play, $M^*$; thus, the optimal regret is always finite. Moreover, the "effective" horizon (i.e., the time until the algorithm learns $M^*$) of our algorithms is small.
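For reference, the optimal scores $S(M^*)$ under the four Boolean primitives follow from simple counting. The EQ value is stated in Section 3.1; the other three are our own worked values and should be checked against the corresponding sections.

```python
def optimal_score(f_name, n, k):
    """S(M*) with n agents, k of type 1, for each Boolean primitive."""
    if f_name == "EQ":   # pair like with like; one mixed team remains iff k is odd
        return n // 2 - (k % 2)
    if f_name == "XOR":  # each success pairs a 1-agent with a 0-agent
        return min(k, n - k)
    if f_name == "OR":   # spread the 1-agents over distinct teams
        return min(k, n // 2)
    if f_name == "AND":  # pair the 1-agents with each other
        return k // 2
    raise ValueError(f_name)

def round_regret(f_name, n, k, achieved):
    """Per-round regret r_t = S(M*) - S(M_t), where `achieved` is S(M_t)."""
    return optimal_score(f_name, n, k) - achieved
```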
2.3 Symmetric Synergy Functions and Atomic Primitives
In subsequent sections, we consider the problem of minimizing regret under four Boolean
synergy functions $f \colon \{0,1\}^2 \to \{0,1\}$: EQ, XOR, OR, and AND. Interestingly, the algorithms for these four settings suffice to handle any symmetric synergy function $f \colon \{0,1\}^2 \to \mathbb{R}$. We argue this below for synergy functions that take at most two values; we handle the case of synergy functions f taking three different values at the end of Section 3.1.
Lemma 1. Fix some $\ell < u$, let $f \colon \{0,1\}^2 \to \{\ell, u\}$ be any symmetric synergy function, and let $r_f(n, k)$ denote the optimal regret with n agents, of which k have type 1. Then, $r_f(n, k) = (u - \ell) \cdot r_g(n, k)$ for one of $g \in \{\mathrm{EQ}, \mathrm{XOR}, \mathrm{AND}, \mathrm{OR}\}$.
Proof. First, note that without loss of generality, we may assume that $f(0,0) \le f(1,1)$.
Otherwise, we can swap the labels of the agent types without altering the problem. Note
that this immediately allows us to reduce team formation under the Boolean NAND and NOR functions to the same problem under OR and AND, respectively. Next, note that if
f(0,0) = f(1,0) = f(1,1), then the problem is trivial, as all matchings have the same score.
Otherwise, we may apply the affine transformation $f \mapsto \frac{1}{u - \ell} \cdot f - \frac{\ell}{u - \ell}$ to the output to recover a Boolean function:
When f(0,1) < f(0,0) = f(1,1), we recover the EQ function.
When f(0,0) = f(1,1) < f(0,1), we recover the XOR function.
When f(0,0) = f(0,1) < f(1,1), we recover the AND function.
When f(0,0) < f(0,1) = f(1,1), we recover the OR function.
The structure of the problem remains unchanged since total regret is linear in the number
of each type of team played over the course of the algorithm. The regret simply scales by a
factor of $u - \ell$.
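A compact sketch of this reduction (our own code; the tuple-to-name lookup simply enumerates the four cases listed above) takes the three values f(0,0), f(0,1), f(1,1), applies the label swap and the affine rescaling, and reports which primitive results together with the regret scale u - l.

```python
def reduce_to_primitive(f00, f01, f11):
    """Return (primitive_name, scale, relabel): regret scales by `scale`, and
    `relabel` records whether the agent-type labels 0/1 were swapped."""
    relabel = f00 > f11          # WLOG f(0,0) <= f(1,1); otherwise swap type labels
    if relabel:
        f00, f11 = f11, f00
    lo, hi = min(f00, f01, f11), max(f00, f01, f11)
    if lo == hi:
        return ("TRIVIAL", 0, relabel)   # constant function: every matching is optimal
    scale = hi - lo                       # regret scales by u - l
    g = tuple(int(v == hi) for v in (f00, f01, f11))   # affine map onto {0, 1}
    return ({(1, 0, 1): "EQ", (0, 1, 0): "XOR",
             (0, 0, 1): "AND", (0, 1, 1): "OR"}[g], scale, relabel)

# Example: f(0,0) = 2, f(0,1) = 2, f(1,1) = 5 normalizes to AND with scale 3.
print(reduce_to_primitive(2, 2, 5))      # ('AND', 3, False)
```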
3 Uniform and Diverse Teams
We first focus on forming teams that promote uniformity (captured by the Boolean EQ
function) or diversity (captured by the XOR function). In addition, we also show that the
algorithm for EQ minimizes regret under any general symmetric synergy function taking three
different values.
3.1 Uniformity (EQ)
We first consider the equality (or EQ) synergy function, $f_{\mathrm{EQ}}(\theta_i, \theta_j) = \mathbb{1}\{\theta_i = \theta_j\}$. Here, an optimal matching $M^*$ includes as few (0,1)-teams as possible, and thus $S(M^*) = \frac{n}{2} - (k \bmod 2)$. If k (and thus n - k) is even, then all agents can be paired in successful teams; else, any optimal matching must include one unsuccessful team with different types. For this setting, Theorem 1 shows that a simple policy (Algorithm 1) achieves optimal regret for all parameters n and k.
Algorithm 1 Form Uniform Teams
Round 1: Play an arbitrary matching.
Round 2: Swap unsuccessful teams in pairs as {(a, b), (c, d)} → {(a, c), (b, d)}. Repeat remaining teams (including one unsuccessful team when k is odd).
Round 3: If {(a, b), (c, d)} and {(a, c), (b, d)} are both unsuccessful, play {(a, d), (b, c)}. Repeat remaining teams.
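Below is a hedged implementation sketch of Algorithm 1 (ours, not the authors' code). It assumes a hypothetical environment callback `play(matching)` that returns the per-team 0/1 successes in the same order as the matching; the matching fixed from round 3 onward is then repeated for the rest of the horizon.

```python
def form_uniform_teams(n, play, horizon):
    # Round 1: play an arbitrary perfect matching.
    matching = [(i, i + 1) for i in range(0, n, 2)]
    results = play(matching)

    # Round 2: swap unsuccessful teams in pairs {(a,b),(c,d)} -> {(a,c),(b,d)};
    # repeat every other team (plus one leftover unsuccessful team if their count is odd).
    failed = [t for t, ok in zip(matching, results) if not ok]
    repeated = [t for t, ok in zip(matching, results) if ok]
    if len(failed) % 2 == 1:
        repeated.append(failed.pop())
    groups = [(failed[i], failed[i + 1]) for i in range(0, len(failed), 2)]
    matching = repeated + [team for (a, b), (c, d) in groups
                           for team in ((a, c), (b, d))]
    ok = dict(zip(matching, play(matching)))

    # Round 3 onward: within a group, (a,c) and (b,d) succeed or fail together;
    # if they failed, the remaining re-pairing {(a,d),(b,c)} is forced to succeed.
    matching = repeated[:]
    for (a, b), (c, d) in groups:
        matching += [(a, c), (b, d)] if ok[(a, c)] or ok[(b, d)] else [(a, d), (b, c)]
    for _ in range(horizon - 2):
        play(matching)   # this matching is optimal from round 3 onward (Theorem 1)
```

The grouping works because each unsuccessful EQ team contains exactly one 1-agent, so within a group of four the two round-2 re-pairings succeed or fail together, and when they fail the third re-pairing in round 3 is forced to succeed.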
Theorem 1. Define $r_{\mathrm{EQ}}(n, k) := 2 \cdot \bigl(\min(k, n - k) - (k \bmod 2)\bigr)$. Then,
1. Algorithm 1 learns an optimal matching by round 3, and incurs regret at most $r_{\mathrm{EQ}}(n, k)$.
2. Any algorithm incurs regret at least $r_{\mathrm{EQ}}(n, k)$ in the worst case.
Proof. For the upper bound on the regret, note that every unsuccessful team includes a 0-
agent and a 1-agent. Thus, there is a re-pairing of any two unsuccessful teams that gives rise
to two successful teams. If the re-pairing in round 2 is unsuccessful, the only other re-pairing,
selected in round 3, must be successful. There will be k mod 2 unsuccessful teams in round 3 and in every subsequent round, matching the optimal matching, so no further regret accrues from round 3 onward.