Community as a Vague Operator: Epistemological Questions for a
Critical Heuristics of Community Detection Algorithms
Dominik J. Schindler, Matthew Fuller
May 25, 2023
Abstract
In this article, we aim to analyse the nature and epistemic consequences of what figures in
network science as patterns of nodes and edges called ‘communities’. Tracing these patterns
as multi-faceted and ambivalent, we propose to describe the concept of community as a ‘vague
operator’, a variant of Susan Leigh Star’s notion of the boundary object, and propose that
the ability to construct different modes of description that are both vague in some registers
and hyper-precise in others is core both to digital politics and to the analysis of ‘communities’.
Engaging with these formations in terms drawn from mathematics and software studies enables
a wider mapping of their formation. Disentangling different lineages in network science then
allows us to contextualise the founding account of ‘community’ popularised by Michelle Girvan
and Mark Newman in 2002. After studying one particular community detection algorithm, the
widely-used ‘Louvain algorithm’, we comment on controversies arising with some of the more
ambiguous applications of such algorithms. We argue that ‘community’ can act as a real abstraction with the
power to reshape social relations, for instance by producing echo chambers in social networking sites.
To rework the epistemological terms of community detection and propose a reconsideration of
vague operators, we draw on debates and propositions within the literature of network science
to imagine a ‘critical heuristics’ that embraces partiality, epistemic humbleness, reflexivity
and artificiality.
Keywords. community detection; vague operator; boundary object; critical heuristics; network
science; social network analysis; Louvain algorithm; software studies
1 Introduction
Network science emerges as a term in the late nineteen-nineties and consists of a series of ‘content
agnostic’ ways to analyse structures of various kinds as networks or graphs.1 It can be understood
as a revival of the much older social network analysis through the influence of physics.2 The kinds of
things network scientists work on range from the structure of proteins, to relations between social
media posts, to chains of influence in academic research. Tools and approaches from network
science are also often drawn into other fields, to show connections amongst entities as diverse
as members of the ruling class or of criminal trading networks—as developed for instance in the
meticulous work of artist Mark Lombardi3—or to construct a taxonomic characterisation of the
Department of Mathematics, Imperial College London, UK; dominik.schindler19@imperial.ac.uk
Department of Media, Communications and Cultural Studies, Goldsmiths, University of London, UK;
m.fuller@gold.ac.uk
1. M. E. J. Newman et al., eds., The Structure and Dynamics of Networks, Princeton Studies in Complexity
(Princeton: Princeton University Press, 2006).
2. Linton Freeman, “Going the Wrong Way on a One-Way Street: Centrality in Physics and Biology,” Journal
of Social Structure - JoSS, January 1, 2008,
3. Robert Carleton Hobbs, Mark Lombardi: Global Networks, in collab. with Independent Curators International
(New York: Independent Curators International, 2004).
arXiv:2210.02753v2 [cs.SI] 24 May 2023
intestinal microbiota involved in gout.4 Work in the field and in the applications of its tools seems
to suggest the possibility of finding shared ‘hidden laws’ amongst often very different kinds of
formations.
By the present day, the working vernacular of network visualisations has become a familiar
part of contemporary culture. For instance, Figure 1 and Figure 2 below typify such images. They
are composed of two types of entity, edges or connecting lines and vertices or dots where two or
more lines meet. But what is meant by these patterns of dots and lines? In network science,
the notion of ‘community’ was coined to grapple with these patterns5 and ‘community detection
algorithms’ such as the ‘Louvain algorithm’ are used today to discriminate such patterns in large
networks with millions of nodes and edges.6 In particular, community detection algorithms can be
interpreted as methods for unsupervised machine learning that are supposed to find patterns in
data without a given ground truth.7 To delve into these patterns requires asking questions of their
meaning: what do they stand in for, what do they signify, and what do they create? Further, what
are the ways in which these arrangements of dots and lines, and the calculations that produce
them, have potential cultural and political effects? To address this means recognising these
patterns as a visual articulation of mathematical relationships. In order to hold these two aspects
together, recognising their mutual inherence and differentiation, their particular and conjoint
epistemic dimensions need to be addressed. One of the ways to do this is by understanding the
way in which the notion of community provides in itself something of a conceptual vertex between
different modes of analysis and understanding.
Since social media have incorporated the form of the graph, without, oddly enough, giving
users actual sight of it, social networks have become part of the everyday furniture of social
relations, given for instance in the brute facticity of artifacts like the following-to-follower ratios
on Twitter, the commonplace of ‘virality’8 and the social role of the influencer, a social function
that is in some ways predicated upon the operation of graphs. Such graphs play numerous roles.
We move from a society understood, from some disciplinary or technical perspectives, to
be composed of individuals in networks that can be analysed by means of reserved or neutral
observation to a society of analysis whose givens are networks in which power operations are
implemented. In this set-up it should be of scant surprise that the word community appears
as capable of interpreting many kinds of phenomena at the exact point in time when, if it has
not entirely vanished, community, in its hitherto understood senses—in the social—seems often
to have been mechanised, and often by the very means that redescribe it in more generalisable
terms. In this condition, it is perhaps rather wince-inducing to rifle through the techniques of
network analysis to try, not only to understand them, but to evaluate the conditions in which
they might be worked. Nevertheless, there is something fascinating here, and one of the ways of
understanding the way these techniques not only address but compose the present is by delving
into them.
In this article, we aim to analyse the nature of what figures in network science as a community,
trace the historical lineages of community detection algorithms and examine a specific case study
of an algorithm for community detection and the notion of community it addresses. We introduce
the notion of the ‘vague operator’, a specific kind of boundary object, to describe the various kinds
of interplay between the hyper-precise and the vague that are embodied in the conjuncture of
community and community detection algorithms. We then look into the broader standing of
4. Zhuang Guo et al., “Intestinal Microbiota Distinguish Gout Patients from Healthy Humans,” Scientific Reports
6, no. 1 (2016): 20602.
5. M. Girvan and M. E. J. Newman, “Community Structure in Social and Biological Networks,” Proceedings of
the National Academy of Sciences 99, no. 12 (June 11, 2002): 7821–7826.
6. Vincent D. Blondel et al., “Fast Unfolding of Communities in Large Networks,” Journal of Statistical
Mechanics: Theory and Experiment 2008, no. 10 (October 2008).
7. Trevor Hastie et al., The Elements of Statistical Learning, Springer Series in Statistics (New York: Springer
New York, 2009).
8. Tony D. Sampson, Virality: Contagion Theory in the Age of Networks (Minneapolis: University of Minnesota
Press, 2012).
heuristics in relation to algorithmic practices and suggest a ‘critical’ heuristics attuned to the
epistemic politics of ‘vague operators’.
2 Community / Detection
2.1 Lineages of Community Detection Algorithms
Mathematical practices are interwoven with their historical and technological gestation, but are
rarely reducible to them. Computation in turn has changed mathematical ideas and modes
of calculation in multiple ways.9 The uptake of graph theory for network science purposes
coincides with the increased availability of network datasets during the 1990s development of
computer networks and the internet10—which in some ways become both its metaphor and locus
of veridiction, the space where it became true as something natively artificial. To say this is not
to claim that mathematics is simply on the receiving end of history, nor of technical histories.
Mathematics, as a means of thinking that has great capacity of abstraction also contains some
possibility of thinking outside of historical constraints, of over-leaping them, and in this way may
also act as one of their determinants.
Whilst we can take the above considerations into account, the focus of our paper lies on the
mathematical practices that have shaped the central concept of community in network science.
A genealogy of community detection needs to disentangle different lineages that have roots in
other techniques (not named after community) and run in parallel across disciplines, mostly
the social sciences and statistical physics. We can only approximate these lineages due to the
enormous amount of publications involved and so present one narrative only, one that is influenced
by discussions with different practitioners in network science. A certain amount of reticence is
therefore present in this account as we map an initial development in the social sciences and a
subsequent, and initially separate, one developed in statistical physics.
In sociology, social network analysis has a twentieth century history, admirably given by
Katja Mayer in a 2009 article that traces its links to search engine technologies.11 Mayer argues
that social network analysis or sociometry developed alongside related techniques such as
citation analysis, formulated as means for measuring authority and participation in academic
publishing, techniques that soon became extended as a measure for centrality, opportunities for
‘self-realisation’, cultural significance and optimisation amongst other factors. This phenomenon
is also perceptively described by Bernhard Rieder in his account of the genealogy of PageRank.12
Aside from this thread of work, the development of methods for what is today called ‘community
detection’ has a longer tradition under different names such as ‘network partitioning’ or
‘clustering’.13 One important predecessor from social network analysis is the mathematically simpler
concept of a graph ‘clique’,14 defined as a set of nodes of which each pair of nodes is connected in
the graph. This concept was used by Duncan Luce and Albert Perry in 1949 to algorithmically
obtain group structures from experimental data about human interactions, arguing “that a set
of more than two people form a clique if they are all mutual friends of one another”.15 Although
9. It has for instance introduced pathways to certain kinds of mathematical objects whose development only
took off with sufficient capacity of calculation. An example would be the development of a renewed interest in
what came to be called fractals, (re)emerging with the PCs of the 1980s. Benoît B. Mandelbrot, The Fractalist:
Memoir of a Scientific Maverick, First vintage books edition (New York: Vintage Books, 2013).
10. Newman et al., The Structure and Dynamics of Networks.
11. Katja Mayer, “On the Sociometry of Search Engines: A Historical Review of Methods,” in Deep Search. The
Politics of Search beyond Google, ed. Konrad Becker and Felix Stalder (Edison, NJ: Transaction, December 9, 2009),
54–72.
12. Bernhard Rieder, “What Is in PageRank? A Historical and Conceptual Investigation of a Recursive Status
Index,” Computational Culture, no. 2 (September 28, 2012).
13. Santo Fortunato, “Community Detection in Graphs,” Physics Reports 486, nos. 3-5 (February 2010): 75–174.
14. Stanley Wasserman and Katherine Faust, Social Network Analysis: Methods and Applications, 8 (Cambridge
; New York: Cambridge University Press, 1994).
15. R. Duncan Luce and Albert D. Perry, “A Method of Matrix Analysis of Group Structure,” Psychometrika
their matrix-based approach was less prone to errors than a cumbersome manual investigation of
the data, the mathematical definition of a clique is often too restrictive in applications. Hence,
later concepts in the different lineages of ‘community’ can often be understood as weaker or looser
versions of cliques that allow for sparser relations within groups.
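To make the restrictiveness of the clique definition concrete, the following minimal Python sketch tests whether a set of nodes forms a clique and lists the pairs that prevent it from doing so. The friendship data and the function names `is_clique` and `missing_pairs` are hypothetical illustrations of ours; the sketch is not a reconstruction of Luce and Perry’s matrix method.

```python
from itertools import combinations


def missing_pairs(nodes, edges):
    """Return the pairs of nodes that are NOT connected by an edge."""
    # Store edges as unordered pairs so that (a, b) and (b, a) are the same tie.
    edge_set = {frozenset(edge) for edge in edges}
    return [pair for pair in combinations(nodes, 2)
            if frozenset(pair) not in edge_set]


def is_clique(nodes, edges):
    """A node set is a clique if no pair of its members lacks an edge."""
    return not missing_pairs(nodes, edges)


# Hypothetical friendship ties (illustrative data, not from the literature).
friendships = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"), ("cat", "dan")]

print(is_clique(["ann", "bob", "cat"], friendships))
print(is_clique(["ann", "bob", "cat", "dan"], friendships))
print(missing_pairs(["ann", "bob", "cat", "dan"], friendships))
```

In this toy example the group of four fails the test only because one member is tied to a single other person, which is the sense in which the definition can be too strict for empirical data.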
In a review of community detection algorithms, Fortunato traces the origins of community
detection back to a 1955 paper in sociometry by Robert Weiss and Eugene Jacobson, who proposed
a method to deduce working groups from a matrix of work relationships in a complex government
agency.16 Their method of finding groups by reorganizing the matrix representation of a graph
(see Section 2.3 for a definition of the ‘adjacency matrix’ of a graph) corresponding to a sociogram
was first introduced by Elaine Forsyth and Leo Katz in 1946 who in turn developed the famous
sociometric approach to groups introduced by Jacob Moreno in the 1930s.17
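The idea of finding groups by reorganising a matrix can be illustrated with a small Python sketch. It builds the 0/1 matrix of a graph twice, once in alphabetical order and once in an order that places members of the same group next to each other, so that the groups appear as dense blocks along the diagonal. The data, the function names and the hand-picked ordering are all assumptions made for illustration; the sketch does not reproduce the procedure of Weiss and Jacobson or of Forsyth and Katz, which had to find a suitable ordering in the first place.

```python
def adjacency_matrix(order, edges):
    """Build a symmetric 0/1 matrix with rows and columns in the given order."""
    position = {node: i for i, node in enumerate(order)}
    matrix = [[0] * len(order) for _ in order]
    for a, b in edges:
        matrix[position[a]][position[b]] = 1
        matrix[position[b]][position[a]] = 1
    return matrix


def render(order, matrix):
    """Format the matrix as text, one labelled row per node."""
    header = "  " + " ".join(order)
    rows = [f"{node} " + " ".join(str(x) for x in row)
            for node, row in zip(order, matrix)]
    return "\n".join([header] + rows)


# Hypothetical work relationships between six colleagues (illustrative only).
work_ties = [("a", "c"), ("a", "e"), ("c", "e"),
             ("b", "d"), ("b", "f"), ("d", "f"),
             ("e", "b")]

alphabetical = ["a", "b", "c", "d", "e", "f"]
regrouped = ["a", "c", "e", "b", "d", "f"]  # ordering chosen by hand (assumption)

print(render(alphabetical, adjacency_matrix(alphabetical, work_ties)))
print()
print(render(regrouped, adjacency_matrix(regrouped, work_ties)))
```

Both printouts describe the same graph; only the second ordering makes the two working groups and the single tie between them visible at a glance.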
We can also trace origins of community detection in psychology and anthropology. In a 1956
paper in psychology, Dorwin Cartwright and Frank Harary used graph theory to introduce the
concept of structural balance to describe “configurations of many different sorts, such as
communication networks, power systems, sociometric structures, systems of orientations, or perhaps
neural networks”.18 The image of the later broad applicability of the techniques concerned can be
glimpsed here. Harary, who was a mathematician at the University of Michigan, was interested in
the translation of social science concepts into graph theory and later also worked on applications
in anthropology, where he developed clustering methods for signed graphs to study homophily.19
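A short Python sketch may help to indicate what a balance check on a signed graph can look like. It relies on one standard way of stating the condition, taken here as an assumption rather than as a summary of the 1956 paper: a signed graph is balanced when its nodes can be divided into two camps such that positive ties run only within a camp and negative ties only between camps (one camp may be empty). The function name `is_balanced` and the example ties are hypothetical.

```python
from collections import deque


def is_balanced(signed_edges):
    """Check balance by trying to sort all nodes into two opposing camps."""
    neighbours = {}
    for a, b, sign in signed_edges:
        neighbours.setdefault(a, []).append((b, sign))
        neighbours.setdefault(b, []).append((a, sign))

    camp = {}  # node -> 0 or 1
    for start in neighbours:
        if start in camp:
            continue
        camp[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for other, sign in neighbours[node]:
                # A positive tie keeps both ends in one camp; a negative tie
                # puts them in opposite camps.
                expected = camp[node] if sign > 0 else 1 - camp[node]
                if other not in camp:
                    camp[other] = expected
                    queue.append(other)
                elif camp[other] != expected:
                    return False  # contradictory placement: not balanced
    return True


# Hypothetical triads: +1 stands for a positive tie, -1 for a negative one.
two_allies_one_rival = [("a", "b", +1), ("a", "c", -1), ("b", "c", -1)]
three_mutual_rivals = [("a", "b", -1), ("a", "c", -1), ("b", "c", -1)]

print(is_balanced(two_allies_one_rival))
print(is_balanced(three_mutual_rivals))
```

The two-camp formulation also suggests why balance was later read as a clustering notion: a balanced graph comes with a division of its nodes into groups.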
Yet another thread of the lineage is formed by the use of what are called ‘stochastic block
models’ that find their origins in the social science literature from the 1970s. For a review of
this very wide field see an overview by Lee and Wilkinson.20 In general, stochastic block models
provide notions of ‘structural equivalence’ in graphs where the ‘role’ of a node is determined by
its link structure. Deterministic models were first introduced by a group of sociologists around
Ronald Breiger in 197521 and stochastic models by Paul Holland et al. in 1983.22
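The generative idea behind a stochastic block model can be sketched in a few lines of Python. Each node is assigned to a block, and the probability of a tie between two nodes depends only on the blocks they belong to, so that nodes in the same block are interchangeable in their expected link structure. The block assignment, the probabilities and the function name `sample_block_model` are assumptions chosen for illustration. The sketch only draws a graph from given blocks; it does not address the harder inverse problem of inferring blocks from an observed graph.

```python
import random
from itertools import combinations


def sample_block_model(block_of, link_prob, seed=0):
    """Draw one random graph in which tie probabilities depend only on blocks."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    edges = []
    for u, v in combinations(sorted(block_of), 2):
        # The chance of a tie is looked up from the two nodes' blocks alone.
        p = link_prob[block_of[u]][block_of[v]]
        if rng.random() < p:
            edges.append((u, v))
    return edges


# Hypothetical assignment of six nodes to two blocks (illustrative only).
block_of = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

# Assumed probabilities: ties are likely within a block, rare between blocks.
link_prob = {"A": {"A": 0.9, "B": 0.1},
             "B": {"A": 0.1, "B": 0.9}}

print(sample_block_model(block_of, link_prob, seed=1))
```

Setting the within-block probabilities to 1 and the between-block probabilities to 0 removes the randomness and yields two disjoint cliques, which is one way to see the deterministic case as a limit of the stochastic one.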
A common feature of the techniques developed in the social sciences described above is their
shared goal of determining structurally similar nodes in graphs to identify individuals in social
networks playing similar roles. However, we want to emphasise that social scientists from the
different lineages described above did not use the term ‘community’. Other terms like ‘cohesive
subgroups’23 or ‘balance and clustering phenomena’24 were used instead, each meaning different
things. Moreover, a limiting factor for the development of community detection algorithms in
the social sciences was the absence of computational power in the early years of social network
analysis, where algorithms had to be performed manually in a cumbersome process.
As social network forms become significant in how people understand society, Mayer argues
14, no. 2 (June 1, 1949): p. 97 f.
16. Fortunato, “Community Detection in Graphs”; Robert S. Weiss and Eugene Jacobson, “A Method for the
Analysis of the Structure of Complex Organizations,” American Sociological Review 20, no. 6 (1955): 661–668,
JSTOR: 2088670.
17. Elaine Forsyth and Leo Katz, “A Matrix Approach to the Analysis of Sociometric Data: Preliminary Report,”
Sociometry 9, no. 4 (1946): 340–347, JSTOR: 2785498; Jacob Levy Moreno, Who Shall Survive? A New Approach
to the Problem of Human Interrelations. (Washington: Nervous and Mental Disease Pub. Co., 1934).
18. Dorwin Cartwright and Frank Harary, “Structural Balance: A Generalization of Heider’s Theory,” Psychological
Review (US) 63, no. 5 (1956): 277–293.
19. Per Hage and Frank Harary, Structural Models in Anthropology, 1st ed. (Cambridge University Press,
February 24, 1984).
20. Clement Lee and Darren J. Wilkinson, “A Review of Stochastic Block Models and Extensions for Graph
Clustering,” Applied Network Science 4, no. 1 (2019): 1–50.
21. Ronald L Breiger et al., “An Algorithm for Clustering Relational Data with Applications to Social Network
Analysis and Comparison with Multidimensional Scaling,” Journal of Mathematical Psychology 12, no. 3 (August 1,
1975): 328–383.
22. Paul W. Holland et al., “Stochastic Blockmodels: First Steps,” Social Networks 5, no. 2 (June 1, 1983): 109–
137.
23. Wasserman and Faust, Social Network Analysis.
24. Hage and Harary, Structural Models in Anthropology.
that they effectively become “behavioural instructions”.25 It is these “instructions”—before the
advent of their machining in social media—that also provide the grounds for another current
of work that sets out approaches in which the idea of the network or a set of contacts has
become something that is more self-consciously to be used or manipulated in order to achieve
certain political ends or social benefits. Work such as Manfred Kochen and Ithiel de Sola Pool’s
“Contacts and Influence”, a manuscript circulating from the early 1950s and published in 1978,26
Stanley Milgram’s 1967 direct experimental work,27 and Mark Granovetter’s 1973 article “The
Strength of Weak Ties”28 exemplifies this tendency.
The notion of “weak ties” addressed by such researchers was embraced in mathematical terms
by Watts and Strogatz in 1998.29 One of the interesting aspects of such work is the idiomatic
kind of movement from the very specific to the general that it stages. This work is predicated
on a particular kind of social connection, a friendship, knowledge of or acquaintance with an
other, a social link, the passing of information from one entity to another, as the key, indeed
sole, unit of analysis. It is predicated on a wager that from this base unit, if precisely logged,
something larger can be agglomerated. Whereas other approaches to understanding the social in
mathematical terms have often worked on the basis of surveying or assembling a population as a
statistics-yielding mass, to be probed by averages and the deviations that yield them, this work
starts ‘from the bottom up’ in a certain way by narrowly fixating on the choreography of what
each different method takes to be a link. It is in this movement from the specific to the general that
its enduring attraction also lies, and, it wagers, that something like a community can be measured.
As far as we have been able to trace, the physicists Michelle Girvan and Mark Newman were
first to use the term ‘community’ to describe a computational object in network science. In a
highly influential paper from 2002, Girvan and Newman, who were both working at the Santa
Fe Institute in New Mexico at that time, coined the term ‘community’ in this context and also
presented what one might call the ‘founding articulation’ of community detection:
“Consider for a moment the case of social networks—networks of friendships or other
acquaintances between individuals. It is a matter of common experience that such
networks seem to have communities in them: subsets of vertices within which vertex-
vertex connections are dense, but between which connections are less dense. [...]
Communities in a social network might represent real social groupings, perhaps by
interest or background”.30
In this description of communities, Girvan and Newman call to the experience of other network
scientists who have noticed similar patterns of dense subgraphs in social interaction networks
before, to suggest that a metaphorical or “commonsense” framing of community can be translated
into network science.31 While ‘community’ refers to the groups of nodes, the problem of finding
communities in networks is called ‘community detection’.32 Interestingly, both terms were first
introduced by physicists and not social scientists, but have become hegemonic since then.33
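The verbal description quoted above, dense connections within subsets and less dense connections between them, can be turned into a simple numerical check. The Python sketch below compares, for a given grouping of nodes, the share of possible ties that are present within groups with the share present between groups. It is not a community detection algorithm and not the method proposed by Girvan and Newman: the grouping is supplied by hand, and the data and the function name `densities` are hypothetical.

```python
from itertools import combinations


def densities(groups, edges):
    """Return (within-group density, between-group density) for a grouping."""
    group_of = {node: g for g, members in enumerate(groups) for node in members}
    edge_set = {frozenset(edge) for edge in edges}
    linked = {True: 0, False: 0}    # ties found, keyed by "same group?"
    possible = {True: 0, False: 0}  # node pairs available, keyed likewise
    for u, v in combinations(sorted(group_of), 2):
        same = group_of[u] == group_of[v]
        possible[same] += 1
        linked[same] += frozenset((u, v)) in edge_set
    within = linked[True] / possible[True] if possible[True] else 0.0
    between = linked[False] / possible[False] if possible[False] else 0.0
    return within, between


# Hypothetical network of six nodes (illustrative only).
ties = [("a", "c"), ("a", "e"), ("c", "e"),
        ("b", "d"), ("b", "f"), ("d", "f"),
        ("e", "b")]

plausible = [["a", "c", "e"], ["b", "d", "f"]]
arbitrary = [["a", "b", "c"], ["d", "e", "f"]]

print(densities(plausible, ties))
print(densities(arbitrary, ties))
```

For the first grouping the within-group density exceeds the between-group density, while for the second the relation is reversed. How large the gap has to be before one speaks of a ‘community’ is left open by the verbal definition, which is part of what makes the notion vague.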
25. Mayer, “On the Sociometry of Search Engines,” p. 54.
26. Ithiel de Sola Pool and Manfred Kochen, “Contacts and Influence,” Social Networks 1, no. 1 (January 1,
1978): 5–51.
27. S. Milgram, “The Small World Problem,” Psychology Today 2 (1967): 60–67.
28. Mark S. Granovetter, “The Strength of Weak Ties,” American Journal of Sociology 78, no. 6 (May 1973):
1360–1380.
29. Duncan J. Watts and Steven H. Strogatz, “Collective Dynamics of ‘Small-World’ Networks,” Nature 393, no.
6684 (1998): 440–442.
30. Girvan and Newman, “Community Structure in Social and Biological Networks,” p. 7821, our emphasis.
31. The term ‘community’ was also coined as an alternative to ‘cluster’, a popular notion to describe groups of
points in computer science, because the ‘clustering coefficient’ was already an established concept with a different
meaning in network science.
32. M. E. J. Newman, Networks, Second edition (Oxford, United Kingdom ; New York, NY, United States of
America: Oxford University Press, 2018).
33. The 2002 article by Girvan and Newman has become very influential in the field with 13,876 citations [as
of May 2023] according to Semantic Scholar. Waleed Ammar et al., “Construction of the Literature Graph in