Hypergraph patterns and collaboration structure

2025-04-22 0 0 2.31MB 25 页 10玖币

侵权投诉

Jonas L. Juul ∗, Austin R. Benson†,and Jon Kleinberg†

Abstract. Humans collaborate in diﬀerent contexts such as in creative or scientiﬁc projects, in workplaces and

in sports. Depending on the project and external circumstances, a newly formed collaboration may

include people that have collaborated before in the past, and people with no collaboration history.

Such existing relationships between team members have been reported to inﬂuence the performance

of teams. However, it is not clear how existing relationships between team members should be

quantiﬁed, and whether some relationships are more likely to occur in new collaborations than others.

Here we introduce a new family of structural patterns, m-patterns, which formalize relationships

between collaborators and we study the prevalence of such structures in data and a simple random-

hypergraph null model. We analyze the frequency with which diﬀerent collaboration structures

appear in our null model and show how such frequencies depend on size and hyperedge density in

the hypergraphs. Comparing the null model to data of human and non-human collaborations, we

ﬁnd that some collaboration structures are vastly under- and overrepresented in empirical datasets.

Finally, we ﬁnd that structures of scientiﬁc collaborations on COVID-19 papers in some cases are

statistically signiﬁcantly diﬀerent from those of non-COVID-19 papers. Examining citation counts

for 4 diﬀerent scientiﬁc ﬁelds, we also ﬁnd indications that repeat collaborations are more successful

for 2-author scientiﬁc publications and less successful for 3-author scientiﬁc publications as compared

to other collaboration structures.

Key words. Hypergraphs, team performance, collaboration structure, COVID-19, random graphs

AMS subject classiﬁcations. 05C65, 05C80, 91D30, 91F99

1. Introduction. When a new team forms, who are likely to be members of this team?

Who are unlikely to join forces? Are some team constellations better suited for solving some

tasks than others? How do external circumstances such as tight deadlines or empty schedules

aﬀect how and which teams form?

The questions above arise in all of the diﬀerent settings where team formation and per-

formance are important. Indeed, in online collaboration over the Web [34], creative undertak-

ings [20], technology and science [39] and school [37], group size and the structure of social

ties in the group have been reported to be of importance for the performance of teams. Al-

though this diversity of settings already make the questions rich, they become even richer

when one considers the plethora of external circumstances that can inﬂuence team formation

in each of the settings. Take the COVID-19 pandemic; when researchers needed to quickly

mobilize, analyze the spread of the disease, and its impact on society, did they work primar-

ily in tightly-knit groups with a history of collaboration? Or did the interdisciplinary and

high-stakes nature of the research questions make scholars work in diverse and untried teams?

Both of the hypotheses above are reasonable and demand serious consideration. But how

does one formalize the notion of a tightly-knit or a novel team structure? The essential thing

to quantify in these concepts is the relationship between the members of the newly formed

team. What were these people doing before they joined forces? Did subsets of the team work

∗Center for Applied Mathematics, Cornell University, Ithaca, New York 14853, USA (jjuul@cornell.edu)

†Department of Computer Science, Cornell University, Ithaca, New York 14853, USA.

arXiv:2210.02163v1 [cs.SI] 5 Oct 2022

2 J. L. JUUL, A. R. BENSON, AND J. KLEINBERG

together before, and did others not?

Examples from popular culture richly illustrate the relevance of examining the existing

relationships between team members in successful undertakings. For example, the American

rock band Audioslave rose to popularity after being formed by Soundgarten singer Chris Cor-

nell and 3 former members of Rage Against the Machine: Tom Morello, Tim Commerford, and

Brad Wilk. In studio sessions, it is also common for groups of musicians to perform together

repeatedly; the horn section of the legendary R&B-band Tower of Power have appeared to-

gether on a large number of other artists’ recordings. In technology, the company Bumble was

founded by three Tinder departees (Whitney Wolfe Herd, Chris Gulczynski and Sarah Mick)

and Badoo-CEO and acquaintance of Wolfe Herd’s, Andrey Andreev. In movies, Samuel L.

Jackson stars in several Quentin Tarantino movies, and Charlotte Gainsbourg plays leading

roles in 3 of director Lars Von Trier’s recent works.

To formally study the formation of teams and existing relationships between team mem-

bers, it is useful to use the language of hypergraphs. In the hypergraph framework, people are

represented by nodes, and connections – called hyperedges – can connect groups of nodes of

any size that have worked together in the past. The focus on hypergraphs as representations

of networked systems, has gained considerable traction in recent years [6,4,3,7], following

two decades of intense study of graphs with only dyadic interactions [33,21,19].

Many of the questions being pursued in this recent work on hypergraphs are generalizations

of concepts from the well-known world of dyadic interactions. These include questions regard-

ing hypergraph modularity [24,27,13,41,40,5,11,1], higher-order assortativity [38,28],

simplicial closure [4], hypergraph motifs and other structural patterns [30,26], construction of

synthetic hypergraphs with certain characteristics [15,16,9,43,25,12,18], and how to infer

higher-order network structure from data [2,42]. The introduction of higher-order connections

also makes it possible to ask completely new questions about the structure of the networked

system. For example, a recent paper examined how hyperedges overlap in empirical hyper-

graphs [29]. Such a question would be trivial in the world of dyadic interactions, as dyadic

interactions can only overlap in their two endpoints. In hypergraphs, however, the question

is meaningful since diﬀerent hyperedges could contain identical subsets of the network nodes.

In this paper, we introduce a new family of structural patterns in hypergraphs, designed

to capture the prior associations of the nodes making up a given hyperedge. We call these

m-patterns, and they represent the existing relationship between groups of mnodes. These

relationships are exactly the above-mentioned quantity of interest when studying the formation

of teams of size m.

Formally, m-patterns are subhypergraphs of size m. The subhypergraph consists of the m

nodes under consideration, all hyperedges connecting subsets of the m-nodes, and fractions

of hyperedges that connect subsets of the m-nodes to hypergraph nodes other than the m

under consideration. The inclusion of fractions of hyperedges causes m-patterns to quantify

structure between the level of nodes and hyperedges. This makes m-patterns diﬀerent from

motifs and a new kind of microstructure that exists in hypergraphs, but not in graphs with

dyadic interactions only.

After having introduced m-patterns, we argue that the prevalence of diﬀerent m-patterns

are expected to depend on hypergraph characteristics such as hyperedge density. To under-

stand this dependency, we examine how prevalence of m-patterns change with parameters in

HYPERGRAPH PATTERNS AND COLLABORATION STRUCTURE 3

aG(N, p)-like model. We proceed to compare these null-model results to m-pattern preva-

lence in a wide range of datasets on human collaborations, drug networks, email networks and

online tagging data. We then examine whether collaboration structure can be inﬂuenced by

external circumstances such as tight schedules. We do this by comparing collaboration struc-

ture in scientiﬁc preprints and early preprints of COVID-19 papers. Finally, we investigate

whether future citations of academic publications correlate with collaboration team structure;

speciﬁcally, we compare citation counts for repeat collaborations and ﬁrst-time collaborations

without ﬁrst-time authors.

2. m-patterns in random hypergraphs. Let us now proceed to studying past relationships

between nodes in hyperedge formation. Our ﬁrst step will be to study a simple model of

random hypergraphs. Later, we will move from such synethetic hypergraphs and analyze

node relationships in empirical hypergraphs. Before we can make any of these analyses,

however, we must introduce the mathematical structures that we will use to understand node

relationships in hyperedge formation.

2.1. A structural pattern to summarize past relationships. To deﬁne the topic of this

paper, m-patterns, we will need some other concepts. The ﬁrst of these is the notion of an

induced subhypergraph [10].

Deﬁnition 2.1. Induced subhypergraph. The induced subhypergraph of a hypergraph H=

(V,E)on mnodes, VI, is a hypergraph HI= (VI,EI). For each e∈ E that contains at least

one node from VI,EIcontains a hyperedge e0linking all nodes that are both in eand VI.

It is clear that an induced subhypergraph completely summarizes all existing relationships

between its constituting nodes. The ﬁnal sentence of Deﬁnition 2.1 means that HIcontains

fractions of the hyperedges of H. This makes the induced subhypergraph an interesting object

for hypergraphs. For graphs, fractions of edges are simply vertices, and so the graph equivalent

of this deﬁnition would just be a subgraph on mchosen nodes. If we do not need the entire

relationship history between nodes, but are content with summarizing the largest subsets of

nodes that have collaborated in the past, the following deﬁnition is useful.

Deﬁnition 2.2. Maximal induced subhypergraph. The maximal induced subhypergraph

HI= (VI,EI)of a hypergraph H= (V,E)on mnodes, VI, is the corresponding induced

subhypergraph made simple by removing all hyperedges from EIthat are entirely contained in

other hyperedges in EI.

The key diﬀerence between an induced subhypergraph and a maximal induced subhyper-

graph is that the latter is simple. A simple hypergraph is deﬁned as follows [8].

Deﬁnition 2.3. Simple hypergraph. A hypergraph H= (V,E)is simple if none of its

hyperedges are entirely contained in another, @si, sj∈ E :si⊆sj.

Notice that simple hypergraphs are diﬀerent from simple graphs in that simple hypergraphs

can contain self-looping hyperedges. We note that the hypergraphs we consider in this paper

generally are not simple. Simple hypergraphs play a diﬀerent role in this story. Because

simple hypergraphs cannot have parallel edges there exists only a ﬁnite number of diﬀerent

such hypergraphs of size m. This is a nice feature if we are interested in quantifying typical

relationship structures among people that choose to form teams. This is exactly what we

4 J. L. JUUL, A. R. BENSON, AND J. KLEINBERG

Rage Against the Machine

T. MorelloC. Cornell

T. Commerford

B. Wilk

Bumble

S. MickA. Andreev

C. GulczynskiW. W. Herd

Regndans - Danseorkestret

S. Kupha

R. Elliot

G.Adams

E. Castillo

L. Thornburg J. Andersen

M. Mørch

K. Hornum P. Andersen

O. Arnfred

A. Jacoby R. Kærså

Figure 1. Illustration of the relationship between individual members of Audioslave, Bumble founders and

musicians on recording of Regndans by Danseorkestret.

are interested in, so we refer to these ﬁnitely many relationship structures on mnodes as

m-patterns.

Deﬁnition 2.4. m-pattern. A simple hypergraph with mvertices is an m-pattern.

With the concept of an m-pattern in hand, we are now ready to look for instances of

m-patterns in larger hypergraphs.

Deﬁnition 2.5. Instance of an m-pattern. An instance of an m-pattern Xin the hy-

pergraph H= (V,E)is a maximal induced subhypergraph X0on mnodes which is isomorphic

to X.

Figure 1illustrates what such m-patterns from maximally induced subhypergraphs might

look like. The ﬁgure shows three collaborations. Some people in these collaborations have

worked together previously – perhaps in larger groups. Such larger collaborations become

k-node hyperedges in the m-patterns that the collaboration structure form.

With the deﬁnition of m-patterns, and their instances in hypergraphs, we now have a

formal way of talking about existing relationships between hypergraph vertices. In particular,

when a new team of mindividuals appears, we consider the team members’ past history of

interactions to be the m-pattern consisting of all maximal subsets that have worked together

before.

Deﬁnition 2.6. Instance of a labelled m-pattern. An instance of a labelled m-pattern

Xwith assigned vertex labels 1,2. . . m in the hypergraph H= (V,E)is a maximal induced

subhypergraph X0on mnodes with assigned vertex labels 1,2, . . . , m which is isomorphic to X

and where corresponding vertices have the same assigned labels as in X.

In Appendix Section A, we illustrate connections between some of the concepts introduced

in this section.

We are now ready to examine what m-patterns among nodes precede hyperedge formation

in hypergraphs. In the following subsection, we will do so in a class of synthetic random

hypergraphs.

HYPERGRAPH PATTERNS AND COLLABORATION STRUCTURE 5

2.2. G(m)(N, p)model of random hypergraphs. From Deﬁnition 2.4 and 2.5, it is clear

that the structure of the underlying hypergraph Hgreatly inﬂuences what m-patterns that

can exist among sets of mnodes, and what multiplicity these m-patterns might have in the

hypergraph. If the hypergraph is very sparse, most sets of mnodes have never collaborated

before. For sparse hypergraphs, this presents us with the following question. When a newly

formed team consists of mnodes with no past collaborations, is this because people tend

to team up with strangers, or because the underlying hypergraph is sparse? If we want to

understand whether some existing relationship structures are more likely to give rise to future

team formations, we must know what to expect by chance alone. Studying m-patterns in a

null-model of hypergraphs can help us gain intuition about what m-patterns we should expect

to dominate at diﬀerent hyperedge densities.

We choose to study m-patterns in a hypergraph generalization of the widely-studied

random-graph family known as Erd˝os–R´enyi graphs – or G(N, p). G(N, p) is known to create

unrealistically simple graph structures. Nonetheless, the dyadic G(n, p) model has been a

major driver in the development of the study of networks: it is the simplest random-graph

model, analytically tractable, and its phenomena are correspondingly clear to articulate. We

study a G(N, p)-type model for the same reasons.

In the classic G(N, p) model, an N-vertex random graph is created by inserting each

possible edge with probability p[33]. Various hypergraph generalizations of the G(N, p)

model have been studied in the past [31,32,14,22,23]. We choose to study a version where

a hypergraph with Nnodes and m-vertex hyperedges is created by inserting each possible

hyperedge connecting mnodes with probability p. Since the parameters N,pand mdeﬁne

this hypergraph family, G(m)(N, p) is a natural name to summarize the family. The dyadic

Erd˝os–R´enyi graphs, normally known as G(N, p), would be G(2)(N, p) in this notation.

With the G(m)(N, p) model in hand, we set out to examine how often a new hyperedge

would join mnodes with m-pattern Xby chance, given that the hyperedge is forming in a

hypergraph created using the G(m)(N, p) model with parameters N,pand m. To quantify

this, we create a large number of G(m)(N, p) hypergraphs and count the average fraction

of sets of mnodes that form each pattern Xacross these many random hypergraphs for

choices of hypergraph size N, set size mand as a function of hyperedge probability p. In

Figure 2, we show results obtained for two such simulations. In Figure 2A, the constructed

hypergraphs have size N= 50 and contain hyperedges joining m= 3 nodes. In Figure 2B,

the hypergraphs have size N= 100 and hyperedges join m= 4 nodes. The ﬁrst thing to

notice about these ﬁgures is that, when increasing pfrom 0, all but two m-patterns increase

in prevalence, experience peak prevalence, and ﬁnally become less common again. The two

patterns that do not take such journeys are: 1) the pattern in which noone collaborated with

anyone before; and 2) the repeat collaboration. The occurrences of the no-past-collaboration

pattern monotonically declines with p, whereas the repeat-collaboration pattern increases

monotonically with p. These “exceptions” are easily understood: As pincreases, more nodes

become part of m-node hyperedges. A higher pmeans that fewer nodes avoid collaborations

altogether, whereas m-node collaborations (what we also call repeat collaborations) increase

linearly with p.

Having noticed regularities in the general shape of prevalence curves in Fig. 2A, another

interesting observation is that not all patterns get to be the most common for any pin Fig. 2B.

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

10 玖币 0人已下载

立即下载

摘要：

HypergraphpatternsandcollaborationstructureJonasL.Juul,AustinR.Bensony,andJonKleinbergyAbstract.Humanscollaborateindierentcontextssuchasincreativeorscienticprojects,inworkplacesandinsports.Dependingontheprojectandexternalcircumstances,anewlyformedcollaborationmayincludepeoplethathavecollaboratedb...

展开>> 收起<<

Hypergraph patterns and collaboration structure.pdf

共25页,预览5页

还剩页未读，继续阅读

声明：本站为文档C2C交易模式，即用户上传的文档直接被用户下载，本站只是中间服务平台，本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私，请立即通知玖贝云文库，我们立即给予删除！

Hypergraph patterns and collaboration structure

相关推荐

开通VIP享超值会员特权

作者详情

相关内容

热门标签

举报选择: