Hypergraph patterns and collaboration structure

2025-04-22 0 0 2.31MB 25 页 10玖币
侵权投诉
Hypergraph patterns and collaboration structure
Jonas L. Juul , Austin R. Benson,and Jon Kleinberg
Abstract. Humans collaborate in different contexts such as in creative or scientific projects, in workplaces and
in sports. Depending on the project and external circumstances, a newly formed collaboration may
include people that have collaborated before in the past, and people with no collaboration history.
Such existing relationships between team members have been reported to influence the performance
of teams. However, it is not clear how existing relationships between team members should be
quantified, and whether some relationships are more likely to occur in new collaborations than others.
Here we introduce a new family of structural patterns, m-patterns, which formalize relationships
between collaborators and we study the prevalence of such structures in data and a simple random-
hypergraph null model. We analyze the frequency with which different collaboration structures
appear in our null model and show how such frequencies depend on size and hyperedge density in
the hypergraphs. Comparing the null model to data of human and non-human collaborations, we
find that some collaboration structures are vastly under- and overrepresented in empirical datasets.
Finally, we find that structures of scientific collaborations on COVID-19 papers in some cases are
statistically significantly different from those of non-COVID-19 papers. Examining citation counts
for 4 different scientific fields, we also find indications that repeat collaborations are more successful
for 2-author scientific publications and less successful for 3-author scientific publications as compared
to other collaboration structures.
Key words. Hypergraphs, team performance, collaboration structure, COVID-19, random graphs
AMS subject classifications. 05C65, 05C80, 91D30, 91F99
1. Introduction. When a new team forms, who are likely to be members of this team?
Who are unlikely to join forces? Are some team constellations better suited for solving some
tasks than others? How do external circumstances such as tight deadlines or empty schedules
affect how and which teams form?
The questions above arise in all of the different settings where team formation and per-
formance are important. Indeed, in online collaboration over the Web [34], creative undertak-
ings [20], technology and science [39] and school [37], group size and the structure of social
ties in the group have been reported to be of importance for the performance of teams. Al-
though this diversity of settings already make the questions rich, they become even richer
when one considers the plethora of external circumstances that can influence team formation
in each of the settings. Take the COVID-19 pandemic; when researchers needed to quickly
mobilize, analyze the spread of the disease, and its impact on society, did they work primar-
ily in tightly-knit groups with a history of collaboration? Or did the interdisciplinary and
high-stakes nature of the research questions make scholars work in diverse and untried teams?
Both of the hypotheses above are reasonable and demand serious consideration. But how
does one formalize the notion of a tightly-knit or a novel team structure? The essential thing
to quantify in these concepts is the relationship between the members of the newly formed
team. What were these people doing before they joined forces? Did subsets of the team work
Center for Applied Mathematics, Cornell University, Ithaca, New York 14853, USA (jjuul@cornell.edu)
Department of Computer Science, Cornell University, Ithaca, New York 14853, USA.
1
arXiv:2210.02163v1 [cs.SI] 5 Oct 2022
2 J. L. JUUL, A. R. BENSON, AND J. KLEINBERG
together before, and did others not?
Examples from popular culture richly illustrate the relevance of examining the existing
relationships between team members in successful undertakings. For example, the American
rock band Audioslave rose to popularity after being formed by Soundgarten singer Chris Cor-
nell and 3 former members of Rage Against the Machine: Tom Morello, Tim Commerford, and
Brad Wilk. In studio sessions, it is also common for groups of musicians to perform together
repeatedly; the horn section of the legendary R&B-band Tower of Power have appeared to-
gether on a large number of other artists’ recordings. In technology, the company Bumble was
founded by three Tinder departees (Whitney Wolfe Herd, Chris Gulczynski and Sarah Mick)
and Badoo-CEO and acquaintance of Wolfe Herd’s, Andrey Andreev. In movies, Samuel L.
Jackson stars in several Quentin Tarantino movies, and Charlotte Gainsbourg plays leading
roles in 3 of director Lars Von Trier’s recent works.
To formally study the formation of teams and existing relationships between team mem-
bers, it is useful to use the language of hypergraphs. In the hypergraph framework, people are
represented by nodes, and connections – called hyperedges – can connect groups of nodes of
any size that have worked together in the past. The focus on hypergraphs as representations
of networked systems, has gained considerable traction in recent years [6,4,3,7], following
two decades of intense study of graphs with only dyadic interactions [33,21,19].
Many of the questions being pursued in this recent work on hypergraphs are generalizations
of concepts from the well-known world of dyadic interactions. These include questions regard-
ing hypergraph modularity [24,27,13,41,40,5,11,1], higher-order assortativity [38,28],
simplicial closure [4], hypergraph motifs and other structural patterns [30,26], construction of
synthetic hypergraphs with certain characteristics [15,16,9,43,25,12,18], and how to infer
higher-order network structure from data [2,42]. The introduction of higher-order connections
also makes it possible to ask completely new questions about the structure of the networked
system. For example, a recent paper examined how hyperedges overlap in empirical hyper-
graphs [29]. Such a question would be trivial in the world of dyadic interactions, as dyadic
interactions can only overlap in their two endpoints. In hypergraphs, however, the question
is meaningful since different hyperedges could contain identical subsets of the network nodes.
In this paper, we introduce a new family of structural patterns in hypergraphs, designed
to capture the prior associations of the nodes making up a given hyperedge. We call these
m-patterns, and they represent the existing relationship between groups of mnodes. These
relationships are exactly the above-mentioned quantity of interest when studying the formation
of teams of size m.
Formally, m-patterns are subhypergraphs of size m. The subhypergraph consists of the m
nodes under consideration, all hyperedges connecting subsets of the m-nodes, and fractions
of hyperedges that connect subsets of the m-nodes to hypergraph nodes other than the m
under consideration. The inclusion of fractions of hyperedges causes m-patterns to quantify
structure between the level of nodes and hyperedges. This makes m-patterns different from
motifs and a new kind of microstructure that exists in hypergraphs, but not in graphs with
dyadic interactions only.
After having introduced m-patterns, we argue that the prevalence of different m-patterns
are expected to depend on hypergraph characteristics such as hyperedge density. To under-
stand this dependency, we examine how prevalence of m-patterns change with parameters in
HYPERGRAPH PATTERNS AND COLLABORATION STRUCTURE 3
aG(N, p)-like model. We proceed to compare these null-model results to m-pattern preva-
lence in a wide range of datasets on human collaborations, drug networks, email networks and
online tagging data. We then examine whether collaboration structure can be influenced by
external circumstances such as tight schedules. We do this by comparing collaboration struc-
ture in scientific preprints and early preprints of COVID-19 papers. Finally, we investigate
whether future citations of academic publications correlate with collaboration team structure;
specifically, we compare citation counts for repeat collaborations and first-time collaborations
without first-time authors.
2. m-patterns in random hypergraphs. Let us now proceed to studying past relationships
between nodes in hyperedge formation. Our first step will be to study a simple model of
random hypergraphs. Later, we will move from such synethetic hypergraphs and analyze
node relationships in empirical hypergraphs. Before we can make any of these analyses,
however, we must introduce the mathematical structures that we will use to understand node
relationships in hyperedge formation.
2.1. A structural pattern to summarize past relationships. To define the topic of this
paper, m-patterns, we will need some other concepts. The first of these is the notion of an
induced subhypergraph [10].
Definition 2.1. Induced subhypergraph. The induced subhypergraph of a hypergraph H=
(V,E)on mnodes, VI, is a hypergraph HI= (VI,EI). For each e∈ E that contains at least
one node from VI,EIcontains a hyperedge e0linking all nodes that are both in eand VI.
It is clear that an induced subhypergraph completely summarizes all existing relationships
between its constituting nodes. The final sentence of Definition 2.1 means that HIcontains
fractions of the hyperedges of H. This makes the induced subhypergraph an interesting object
for hypergraphs. For graphs, fractions of edges are simply vertices, and so the graph equivalent
of this definition would just be a subgraph on mchosen nodes. If we do not need the entire
relationship history between nodes, but are content with summarizing the largest subsets of
nodes that have collaborated in the past, the following definition is useful.
Definition 2.2. Maximal induced subhypergraph. The maximal induced subhypergraph
HI= (VI,EI)of a hypergraph H= (V,E)on mnodes, VI, is the corresponding induced
subhypergraph made simple by removing all hyperedges from EIthat are entirely contained in
other hyperedges in EI.
The key difference between an induced subhypergraph and a maximal induced subhyper-
graph is that the latter is simple. A simple hypergraph is defined as follows [8].
Definition 2.3. Simple hypergraph. A hypergraph H= (V,E)is simple if none of its
hyperedges are entirely contained in another, @si, sj∈ E :sisj.
Notice that simple hypergraphs are different from simple graphs in that simple hypergraphs
can contain self-looping hyperedges. We note that the hypergraphs we consider in this paper
generally are not simple. Simple hypergraphs play a different role in this story. Because
simple hypergraphs cannot have parallel edges there exists only a finite number of different
such hypergraphs of size m. This is a nice feature if we are interested in quantifying typical
relationship structures among people that choose to form teams. This is exactly what we
4 J. L. JUUL, A. R. BENSON, AND J. KLEINBERG
Rage Against the Machine
T. MorelloC. Cornell
T. Commerford
B. Wilk
Bumble
S. MickA. Andreev
C. GulczynskiW. W. Herd
Regndans - Danseorkestret
S. Kupha
R. Elliot
G.Adams
E. Castillo
L. Thornburg J. Andersen
M. Mørch
K. Hornum P. Andersen
O. Arnfred
A. Jacoby R. Kærså
Figure 1. Illustration of the relationship between individual members of Audioslave, Bumble founders and
musicians on recording of Regndans by Danseorkestret.
are interested in, so we refer to these finitely many relationship structures on mnodes as
m-patterns.
Definition 2.4. m-pattern. A simple hypergraph with mvertices is an m-pattern.
With the concept of an m-pattern in hand, we are now ready to look for instances of
m-patterns in larger hypergraphs.
Definition 2.5. Instance of an m-pattern. An instance of an m-pattern Xin the hy-
pergraph H= (V,E)is a maximal induced subhypergraph X0on mnodes which is isomorphic
to X.
Figure 1illustrates what such m-patterns from maximally induced subhypergraphs might
look like. The figure shows three collaborations. Some people in these collaborations have
worked together previously – perhaps in larger groups. Such larger collaborations become
k-node hyperedges in the m-patterns that the collaboration structure form.
With the definition of m-patterns, and their instances in hypergraphs, we now have a
formal way of talking about existing relationships between hypergraph vertices. In particular,
when a new team of mindividuals appears, we consider the team members’ past history of
interactions to be the m-pattern consisting of all maximal subsets that have worked together
before.
Definition 2.6. Instance of a labelled m-pattern. An instance of a labelled m-pattern
Xwith assigned vertex labels 1,2. . . m in the hypergraph H= (V,E)is a maximal induced
subhypergraph X0on mnodes with assigned vertex labels 1,2, . . . , m which is isomorphic to X
and where corresponding vertices have the same assigned labels as in X.
In Appendix Section A, we illustrate connections between some of the concepts introduced
in this section.
We are now ready to examine what m-patterns among nodes precede hyperedge formation
in hypergraphs. In the following subsection, we will do so in a class of synthetic random
hypergraphs.
HYPERGRAPH PATTERNS AND COLLABORATION STRUCTURE 5
2.2. G(m)(N, p)model of random hypergraphs. From Definition 2.4 and 2.5, it is clear
that the structure of the underlying hypergraph Hgreatly influences what m-patterns that
can exist among sets of mnodes, and what multiplicity these m-patterns might have in the
hypergraph. If the hypergraph is very sparse, most sets of mnodes have never collaborated
before. For sparse hypergraphs, this presents us with the following question. When a newly
formed team consists of mnodes with no past collaborations, is this because people tend
to team up with strangers, or because the underlying hypergraph is sparse? If we want to
understand whether some existing relationship structures are more likely to give rise to future
team formations, we must know what to expect by chance alone. Studying m-patterns in a
null-model of hypergraphs can help us gain intuition about what m-patterns we should expect
to dominate at different hyperedge densities.
We choose to study m-patterns in a hypergraph generalization of the widely-studied
random-graph family known as Erd˝os–R´enyi graphs – or G(N, p). G(N, p) is known to create
unrealistically simple graph structures. Nonetheless, the dyadic G(n, p) model has been a
major driver in the development of the study of networks: it is the simplest random-graph
model, analytically tractable, and its phenomena are correspondingly clear to articulate. We
study a G(N, p)-type model for the same reasons.
In the classic G(N, p) model, an N-vertex random graph is created by inserting each
possible edge with probability p[33]. Various hypergraph generalizations of the G(N, p)
model have been studied in the past [31,32,14,22,23]. We choose to study a version where
a hypergraph with Nnodes and m-vertex hyperedges is created by inserting each possible
hyperedge connecting mnodes with probability p. Since the parameters N,pand mdefine
this hypergraph family, G(m)(N, p) is a natural name to summarize the family. The dyadic
Erd˝os–R´enyi graphs, normally known as G(N, p), would be G(2)(N, p) in this notation.
With the G(m)(N, p) model in hand, we set out to examine how often a new hyperedge
would join mnodes with m-pattern Xby chance, given that the hyperedge is forming in a
hypergraph created using the G(m)(N, p) model with parameters N,pand m. To quantify
this, we create a large number of G(m)(N, p) hypergraphs and count the average fraction
of sets of mnodes that form each pattern Xacross these many random hypergraphs for
choices of hypergraph size N, set size mand as a function of hyperedge probability p. In
Figure 2, we show results obtained for two such simulations. In Figure 2A, the constructed
hypergraphs have size N= 50 and contain hyperedges joining m= 3 nodes. In Figure 2B,
the hypergraphs have size N= 100 and hyperedges join m= 4 nodes. The first thing to
notice about these figures is that, when increasing pfrom 0, all but two m-patterns increase
in prevalence, experience peak prevalence, and finally become less common again. The two
patterns that do not take such journeys are: 1) the pattern in which noone collaborated with
anyone before; and 2) the repeat collaboration. The occurrences of the no-past-collaboration
pattern monotonically declines with p, whereas the repeat-collaboration pattern increases
monotonically with p. These “exceptions” are easily understood: As pincreases, more nodes
become part of m-node hyperedges. A higher pmeans that fewer nodes avoid collaborations
altogether, whereas m-node collaborations (what we also call repeat collaborations) increase
linearly with p.
Having noticed regularities in the general shape of prevalence curves in Fig. 2A, another
interesting observation is that not all patterns get to be the most common for any pin Fig. 2B.
摘要:

HypergraphpatternsandcollaborationstructureJonasL.Juul,AustinR.Bensony,andJonKleinbergyAbstract.Humanscollaborateindi erentcontextssuchasincreativeorscienti cprojects,inworkplacesandinsports.Dependingontheprojectandexternalcircumstances,anewlyformedcollaborationmayincludepeoplethathavecollaboratedb...

展开>> 收起<<
Hypergraph patterns and collaboration structure.pdf

共25页,预览5页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:25 页 大小:2.31MB 格式:PDF 时间:2025-04-22

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 25
客服
关注