

Community as a Vague Operator: Epistemological Questions for a

Critical Heuristics of Community Detection Algorithms

Dominik J. Schindler∗ Matthew Fuller†

May 25, 2023

Abstract

In this article, we aim to analyse the nature and epistemic consequences of what ﬁgures in

network science as patterns of nodes and edges called ‘communities’. Tracing these patterns

as multi-faceted and ambivalent, we propose to describe the concept of community as a ‘vague

operator’, a variant of Susan Leigh Star’s notion of the boundary object, and propose that

the ability to construct diﬀerent modes of description that are both vague in some registers

and hyper-precise in others, is core both to digital politics and the analysis of ‘communities’.

Engaging with these formations in terms drawn from mathematics and software studies enables

a wider mapping of their formation. Disentangling diﬀerent lineages in network science then

allows us to contextualise the founding account of ‘community’ popularised by Michelle Girvan

and Mark Newman in 2002. After studying one particular community detection algorithm, the

widely-used ‘Louvain algorithm’, we comment on controversies arising with some of its more

ambiguous applications. We argue that ‘community’ can act as a real abstraction with the

power to reshape social relations such as producing echo chambers in social networking sites.

To rework the epistemological terms of community detection and propose a reconsideration of

vague operators, we draw on debates and propositions within the literature of network science

to imagine a ‘critical heuristics’ that embraces partiality, epistemic humbleness, reﬂexivity

and artiﬁciality.

Keywords. community detection; vague operator; boundary object; critical heuristics; network

science; social network analysis; Louvain algorithm; software studies

1 Introduction

Network science emerges as a term in the late nineteen-nineties and consists of a series of ‘content

agnostic’ ways to analyse structures of various kinds as networks or graphs.1 It can be understood

as a revival of the much older social network analysis through the influence of physics.2 The kinds of

things network scientists work on range from the structure of proteins, to relations between social

media posts, to chains of inﬂuence in academic research. Tools and approaches from network

science are also often drawn into other ﬁelds, to show connections amongst entities as diverse

as members of the ruling class or of criminal trading networks—as developed for instance in the

meticulous work of artist Mark Lombardi3—or to construct a taxonomic characterisation of the

∗Department of Mathematics, Imperial College London, UK; dominik.schindler19@imperial.ac.uk

†Department of Media, Communications and Cultural Studies, Goldsmiths, University of London, UK;

m.fuller@gold.ac.uk

1. M. E. J. Newman et al., eds., The Structure and Dynamics of Networks, Princeton Studies in Complexity

(Princeton: Princeton University Press, 2006).

2. Linton Freeman, “Going the Wrong Way on a One-Way Street: Centrality in Physics and Biology,” Journal

of Social Structure - JoSS, January 1, 2008,

3. Robert Carleton Hobbs, Mark Lombardi: Global Networks, in collab. with Independent Curators International

(New York: Independent Curators International, 2004).

arXiv:2210.02753v2 [cs.SI] 24 May 2023

intestinal microbiota involved in gout.4 Work in the field and in the applications of its tools seems

to suggest the possibility of ﬁnding shared ‘hidden laws’ amongst often very diﬀerent kinds of

formations.

By the present day, the working vernacular of network visualisations has become a familiar

part of contemporary culture. For instance, Figure 1 and Figure 2 below typify such images. They

are composed of two types of entity, edges or connecting lines and vertices or dots where two or

more lines meet. But what is meant by these patterns of dots and lines? In network science,

the notion of ‘community’ was coined to grapple with these patterns5 and ‘community detection

algorithms’ such as the ‘Louvain algorithm’ are used today to discriminate such patterns in large

networks with millions of nodes and edges.6 In particular, community detection algorithms can be

interpreted as methods for unsupervised machine learning that are supposed to find patterns in

data without a given ground truth.7 To delve into these patterns requires asking questions of their

meaning: what do they stand in for, what do they signify, and what do they create? Further, what

are the ways in which these arrangements of dots and lines, and the calculations that produce

them, have potential cultural and political eﬀects? To address this means recognising these

patterns as a visual articulation of mathematical relationships. In order to hold these two aspects

together, recognising their mutual inherence and diﬀerentiation, their particular and conjoint

epistemic dimensions need to be addressed. One of the ways to do this is by understanding the

way in which the notion of community provides in itself something of a conceptual vertex between

diﬀerent modes of analysis and understanding.

Since social media have incorporated the form of the graph, without, oddly enough, giving

users actual sight of it, social networks have become part of the everyday furniture of social

relations, given for instance in the brute facticity of artifacts like following-to-follower ratios

on Twitter, the commonplace of ‘virality’8 and the social role of the influencer, a social function

that is in some ways predicated upon the operation of graphs. Such graphs play numerous roles.

We move from a society understood, from some disciplinary or technical perspectives, to

be composed of individuals in networks that can be analysed by means of reserved or neutral

observation to a society of analysis whose givens are networks in which power operations are

implemented. In this set-up it should be of scant surprise that the word community appears

as capable of interpreting many kinds of phenomena at the exact point in time when, if it has

not entirely vanished, community, in its hitherto understood senses—in the social—seems often

to have been mechanised, and often by the very means that redescribe it in more generalisable

terms. In this condition, it is perhaps rather wince-inducing to riﬂe through the techniques of

network analysis to try, not only to understand them, but to evaluate the conditions in which

they might be worked. Nevertheless, there is something fascinating here, and one of the ways of

understanding the way these techniques not only address but compose the present is by delving

into them.

In this article, we aim to analyse the nature of what ﬁgures in network science as a community,

trace the historical lineages of community detection algorithms and examine a speciﬁc case study

of an algorithm for community detection and the notion of community it addresses. We introduce

the notion of the ‘vague operator’, a speciﬁc kind of boundary object, to describe the various kinds

of interplay between the hyper-precise and the vague that are embodied in the conjuncture of

community and community detection algorithms. We then look into the broader standing of

4. Zhuang Guo et al., “Intestinal Microbiota Distinguish Gout Patients from Healthy Humans,” Scientiﬁc Reports

6, no. 1 (2016): 20602.

5. M. Girvan and M. E. J. Newman, “Community Structure in Social and Biological Networks,” Proceedings of

the National Academy of Sciences 99, no. 12 (June 11, 2002): 7821–7826.

6. Vincent D. Blondel et al., “Fast Unfolding of Communities in Large Networks,” Journal of Statistical

Mechanics: Theory and Experiment 2008, no. 10 (October 2008).

7. Trevor Hastie et al., The Elements of Statistical Learning, Springer Series in Statistics (New York: Springer

New York, 2009).

8. Tony D. Sampson, Virality: Contagion Theory in the Age of Networks (Minneapolis: University of Minnesota

Press, 2012).

heuristics in relation to algorithmic practices and suggest a ‘critical’ heuristics attuned to the

epistemic politics of ‘vague operators’.

2 Community / Detection

2.1 Lineages of Community Detection Algorithms

Mathematical practices are interwoven with their historical and technological gestation, but are

rarely reducible to them. Computation in turn has changed mathematical ideas and modes

of calculation in multiple ways.9 The uptake of graph theory for network science purposes

coincides with the increased availability of network datasets during the 1990s development of

computer networks and the internet10—which in some ways become both its metaphor and locus

of veridiction, the space where it became true as something natively artiﬁcial. To say this is not

to claim that mathematics is simply on the receiving end of history, nor of technical histories.

Mathematics, as a means of thinking that has a great capacity for abstraction, also contains some

possibility of thinking outside of historical constraints, of over-leaping them, and in this way may

also act as one of their determinants.

Whilst we can take the above considerations into account, the focus of our paper lies on the

mathematical practices that have shaped the central concept of community in network science.

A genealogy of community detection needs to disentangle diﬀerent lineages that have roots in

other techniques (not named after community) and run in parallel across disciplines, mostly

the social sciences and statistical physics. We can only approximate these lineages due to the

enormous number of publications involved and so present one narrative only, one that is influenced

by discussions with diﬀerent practitioners in network science. A certain amount of reticence is

therefore present in this account as we map an initial development in the social sciences and a

subsequent, and initially separate, one developed in statistical physics.

In sociology, social network analysis has a twentieth century history, admirably given by

Katja Mayer in a 2009 article that traces its links to search engine technologies.11 Mayer argues

that social network analysis or sociometry developed alongside related techniques such as

citation analysis, formulated as means for measuring authority and participation in academic

publishing, techniques that soon became extended as a measure for centrality, opportunities for

‘self-realisation’, cultural significance and optimisation amongst other factors. This phenomenon

is also perceptively described by Bernhard Rieder in his account of the genealogy of PageRank.12

Aside from this thread of work, the development of methods for what is today called ‘community

detection’ has a longer tradition under different names such as ‘network partitioning’ or

‘clustering’.13 One important predecessor from social network analysis is the mathematically simpler

concept of a graph ‘clique’,14 defined as a set of nodes in which every pair of nodes is connected in

the graph. This concept was used by Duncan Luce and Albert Perry in 1949 to algorithmically

obtain group structures from experimental data about human interactions, arguing “that a set

of more than two people form a clique if they are all mutual friends of one another”.15 Although

9. It has for instance introduced pathways to certain kinds of mathematical objects whose development only

took oﬀ with suﬃcient capacity of calculation. An example would be the development of a renewed interest in

what came to be called fractals, (re)emerging with the PCs of the 1980s. Benoît B. Mandelbrot, The Fractalist:

Memoir of a Scientific Maverick, First vintage books edition (New York: Vintage Books, 2013).

10. Newman et al., The Structure and Dynamics of Networks.

11. Katja Mayer, “On the Sociometry of Search Engines: A Historical Review of Methods,” in Deep Search. The

Politics of Search beyond Google, ed. Konrad Becker and Felix Stalder (Edison, NJ: Transaction, December 9, 2009),

54–72.

12. Bernhard Rieder, “What Is in PageRank? A Historical and Conceptual Investigation of a Recursive Status

Index,” Computational Culture, no. 2 (September 28, 2012).

13. Santo Fortunato, “Community Detection in Graphs,” Physics Reports 486, nos. 3-5 (February 2010): 75–174.

14. Stanley Wasserman and Katherine Faust, Social Network Analysis: Methods and Applications, 8 (Cambridge

; New York: Cambridge University Press, 1994).

15. R. Duncan Luce and Albert D. Perry, “A Method of Matrix Analysis of Group Structure,” Psychometrika

their matrix-based approach was less prone to errors than a cumbersome manual investigation of

the data, the mathematical deﬁnition of a clique is often too restrictive in applications. Hence,

later concepts in the diﬀerent lineages of ‘community’ can often be understood as weaker or looser

versions of cliques that allow for sparser relations within groups.
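
To make the clique definition above concrete, the following is a minimal sketch in Python. It is not a reconstruction of Luce and Perry's matrix method; it only tests the defining property that every pair of nodes in a candidate set is connected. The friendship data and the function name `is_clique` are hypothetical and chosen for illustration.

```python
from itertools import combinations


def is_clique(adjacency, nodes):
    """Return True if every pair of distinct nodes in `nodes` is connected."""
    return all(v in adjacency[u] for u, v in combinations(nodes, 2))


# Hypothetical friendship data: each person maps to the set of their mutual friends.
friends = {
    "ann": {"bob", "cat", "dan"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob"},
    "dan": {"ann"},
}

print(is_clique(friends, ["ann", "bob", "cat"]))  # all three pairs are connected
print(is_clique(friends, ["ann", "bob", "dan"]))  # bob and dan are not connected
```

A single missing tie, here between bob and dan, is enough to disqualify a group, which illustrates why the definition is often too restrictive for empirical data.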

In a review of community detection algorithms, Fortunato traces the origins of community

detection back to a 1955 paper in sociometry by Robert Weiss and Eugene Jacobson, who proposed

a method to deduce working groups from a matrix of work relationships in a complex government

agency.16 Their method of ﬁnding groups by reorganizing the matrix representation of a graph

(see Section 2.3 for a deﬁnition of the ‘adjacency matrix’ of a graph) corresponding to a sociogram

was first introduced by Elaine Forsyth and Leo Katz in 1946, who in turn built on the famous

sociometric approach to groups introduced by Jacob Moreno in the 1930s.17
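
The general idea of making groups visible by reorganising a matrix can be sketched as follows. This is only an illustration under stated assumptions: it does not reproduce the procedures of Weiss and Jacobson or of Forsyth and Katz, the grouping is supplied by hand rather than computed, and the edge list and the helper `adjacency_matrix` are hypothetical.

```python
def adjacency_matrix(edges, order):
    """Build a symmetric 0/1 matrix whose rows and columns follow `order`."""
    index = {node: i for i, node in enumerate(order)}
    matrix = [[0] * len(order) for _ in order]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1
    return matrix


# Hypothetical work relationships between six people.
edges = [("a", "c"), ("a", "e"), ("c", "e"),
         ("b", "d"), ("b", "f"), ("d", "f"),
         ("e", "b")]

alphabetical = ["a", "b", "c", "d", "e", "f"]
grouped = ["a", "c", "e", "b", "d", "f"]  # assumed grouping, placed side by side

for label, order in (("alphabetical", alphabetical), ("grouped", grouped)):
    print(label)
    for row in adjacency_matrix(edges, order):
        print(row)
```

In the grouped ordering the ones gather in two blocks along the diagonal, with a single entry linking the blocks; in the alphabetical ordering the same relationships appear scattered across the matrix.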

We can also trace origins of community detection in psychology and anthropology. In a 1956

paper in psychology, Dorwin Cartwright and Frank Harary used graph theory to introduce the

concept of structural balance to describe “configurations of many different sorts, such as

communication networks, power systems, sociometric structures, systems of orientations, or perhaps

neural networks”.18 The image of the later broad applicability of the techniques concerned can be

glimpsed here. Harary, who was a mathematician at the University of Michigan, was interested in

the translation of social science concepts into graph theory and later also worked on applications

in anthropology, where he developed clustering methods for signed graphs to study homophily.19

Yet another thread of the lineage is formed by the use of what are called ‘stochastic block

models’ that ﬁnd their origins in the social science literature from the 1970s. For a review of

this very wide ﬁeld see an overview by Lee and Wilkinson.20 In general, stochastic block models

provide notions of ‘structural equivalence’ in graphs where the ‘role’ of a node is determined by

its link structure. Deterministic models were ﬁrst introduced by a group of sociologists around

Ronald Breiger in 197521 and stochastic models by Paul Holland et al. in 1983.22
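
A deliberately strict reading of ‘structural equivalence’ can be sketched in a few lines of Python. The assumption made here, which is a simplification and not the definition used in any of the cited papers, is that two nodes play the same ‘role’ when they have exactly the same set of neighbours. Stochastic block models relax this idea by working with probabilities of links between groups. The toy network and the function name `equivalence_classes` are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(adjacency):
    """Group nodes that have exactly the same set of neighbours."""
    classes = defaultdict(list)
    for node, neighbours in adjacency.items():
        # Nodes with identical neighbour sets share a key and land in one class.
        classes[frozenset(neighbours)].append(node)
    return sorted(sorted(group) for group in classes.values())


# Hypothetical toy network: two managers, each linked to the same three workers.
ties = {
    "m1": {"w1", "w2", "w3"},
    "m2": {"w1", "w2", "w3"},
    "w1": {"m1", "m2"},
    "w2": {"m1", "m2"},
    "w3": {"m1", "m2"},
}

roles = equivalence_classes(ties)
print(roles)
```

Note that the two resulting groups contain no internal links at all: nodes are grouped by the similarity of their link structure, not by the density of connections among them.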

A common feature of the techniques developed in the social sciences described above is their

shared goal of determining structurally similar nodes in graphs to identify individuals in social

networks playing similar roles. However, we want to emphasise that social scientists from the

diﬀerent lineages described above did not use the term ‘community’. Other terms like ‘cohesive

subgroups’23 or ‘balance and clustering phenomena’24 were used instead, each meaning diﬀerent

things. Moreover, a limiting factor for the development of community detection algorithms in

the social sciences was the absence of computational power in the early years of social network

analysis, where algorithms had to be performed manually in a cumbersome process.

As social network forms become signiﬁcant in how people understand society, Mayer argues

14, no. 2 (June 1, 1949): p. 97 f.

16. Fortunato, “Community Detection in Graphs”; Robert S. Weiss and Eugene Jacobson, “A Method for the

Analysis of the Structure of Complex Organizations,” American Sociological Review 20, no. 6 (1955): 661–668,

JSTOR: 2088670.

17. Elaine Forsyth and Leo Katz, “A Matrix Approach to the Analysis of Sociometric Data: Preliminary Report,”

Sociometry 9, no. 4 (1946): 340–347, JSTOR: 2785498; Jacob Levy Moreno, Who Shall Survive? A New Approach

to the Problem of Human Interrelations. (Washington: Nervous and Mental Disease Pub. Co., 1934).

18. Dorwin Cartwright and Frank Harary, “Structural Balance: A Generalization of Heider’s Theory,” Psychological

Review (US) 63, no. 5 (1956): 277–293.

19. Per Hage and Frank Harary, Structural Models in Anthropology, 1st ed. (Cambridge University Press,

February 24, 1984).

20. Clement Lee and Darren J. Wilkinson, “A Review of Stochastic Block Models and Extensions for Graph

Clustering,” Applied Network Science 4, no. 1 (2019): 1–50.

21. Ronald L Breiger et al., “An Algorithm for Clustering Relational Data with Applications to Social Network

Analysis and Comparison with Multidimensional Scaling,” Journal of Mathematical Psychology 12, no. 3 (August 1,

1975): 328–383.

22. Paul W. Holland et al., “Stochastic Blockmodels: First Steps,” Social Networks 5, no. 2 (June 1, 1983): 109–

137.

23. Wasserman and Faust, Social Network Analysis.

24. Hage and Harary, Structural Models in Anthropology.

that they eﬀectively become “behavioural instructions”.25 It is these “instructions”—before the

advent of their machining in social media—that also provide the grounds for another current

of work that sets out approaches in which the idea of the network or a set of contacts has

become something that is more self-consciously to be used or manipulated in order to achieve

certain political ends or social beneﬁts. Work such as Manfred Kochen and Ithiel de Sola Pool’s

“Contacts and Influence”, a manuscript circulating from the early 1950s and published in 1978,26

Stanley Milgram’s 1967 direct experimental work,27 and Mark Granovetter’s 1973 article “The

Strength of Weak Ties”28 exemplify this tendency.

The notion of “weak ties” addressed by such researchers was embraced in mathematical terms

by Watts and Strogatz in 1998.29 One of the interesting aspects of such work is the idiomatic

kind of movement from the very speciﬁc to the general that it stages. This work is predicated

on a particular kind of social connection, a friendship, knowledge of or acquaintance with an

other, a social link, the passing of information from one entity to another, as the key, indeed

sole, unit of analysis. It is predicated on a wager that from this base unit, if precisely logged,

something larger can be agglomerated. Whereas other approaches to understanding the social in

mathematical terms have often worked on the basis of surveying or assembling a population as a

statistics-yielding mass, to be probed by averages and the deviations that yield them, this work

starts ‘from the bottom up’ in a certain way by narrowly ﬁxating on the choreography of what

each different method takes to be a link. It is in this movement from the specific to the general that

its enduring attraction also lies, and from which, it wagers, something like a community can be measured.

As far as we have been able to trace, the physicists Michelle Girvan and Mark Newman were

ﬁrst to use the term ‘community’ to describe a computational object in network science. In a

highly inﬂuential paper from 2002, Girvan and Newman, who were both working at the Santa

Fe Institute in New Mexico at that time, coined the term ‘community’ in this context and also

presented what one might call the ‘founding articulation’ of community detection:

“Consider for a moment the case of social networks—networks of friendships or other

acquaintances between individuals. It is a matter of common experience that such

networks seem to have communities in them: subsets of vertices within which vertex-

vertex connections are dense, but between which connections are less dense. [...]

Communities in a social network might represent real social groupings, perhaps by

interest or background”.30

In this description of communities, Girvan and Newman appeal to the experience of other network

scientists who have noticed similar patterns of dense subgraphs in social interaction networks

before, to suggest that a metaphorical or “commonsense” framing of community can be translated

into network science.31 While ‘community’ refers to the groups of nodes, the problem of ﬁnding

communities in networks is called ‘community detection’.32 Interestingly, both terms were ﬁrst

introduced by physicists and not social scientists, but have become hegemonic since then.33
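
The contrast at the heart of Girvan and Newman's description, dense connections within groups and sparser connections between them, can be checked for a given division of a network with a short Python sketch. This is not a community detection method, and it is neither the procedure proposed by Girvan and Newman nor the Louvain algorithm: the division into groups is assumed in advance. The toy edge list and the function name `edge_density` are hypothetical.

```python
from itertools import combinations


def edge_density(edges, group_a, group_b=None):
    """Fraction of possible connections that are actually present.

    With one group: density among its members.
    With two disjoint groups: density of the connections running between them.
    """
    edge_set = {frozenset(edge) for edge in edges}
    if group_b is None:
        pairs = [frozenset(pair) for pair in combinations(group_a, 2)]
    else:
        pairs = [frozenset((u, v)) for u in group_a for v in group_b]
    present = sum(1 for pair in pairs if pair in edge_set)
    return present / len(pairs)


# Hypothetical network: two triangles joined by a single edge.
edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
left, right = [1, 2, 3], [4, 5, 6]

print(edge_density(edges, left))         # density within the first group
print(edge_density(edges, right))        # density within the second group
print(edge_density(edges, left, right))  # density between the two groups
```

In this toy case every possible tie within each group is present, while only one of the nine possible ties between the groups exists. What counts as ‘dense’ or ‘less dense’ in less clear-cut cases is left open by the verbal description.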

25. Mayer, “On the Sociometry of Search Engines,” p. 54.

26. Ithiel de Sola Pool and Manfred Kochen, “Contacts and Inﬂuence,” Social Networks 1, no. 1 (January 1,

1978): 5–51.

27. S. Milgram, “The Small World Problem,” Psychology Today 2 (1967): 60–67.

28. Mark S. Granovetter, “The Strength of Weak Ties,” American Journal of Sociology 78, no. 6 (May 1973):

1360–1380.

29. Duncan J. Watts and Steven H. Strogatz, “Collective Dynamics of ‘Small-World’ Networks,” Nature 393, no.

6684 (1998): 440–442.

30. Girvan and Newman, “Community Structure in Social and Biological Networks,” p. 7821, our emphasis.

31. The term ‘community’ was also coined as an alternative to ‘cluster’, a popular notion to describe groups of

points in computer science, because the ‘clustering coeﬃcient’ was already an established concept with a diﬀerent

meaning in network science.

32. M. E. J. Newman, Networks, Second edition (Oxford, United Kingdom ; New York, NY, United States of

America: Oxford University Press, 2018).

33. The 2002 article by Girvan and Newman has become very inﬂuential in the ﬁeld with 13,876 citations [as

of May 2023] according to Semantic Scholar. Waleed Ammar et al., “Construction of the Literature Graph in
