THEME AND TOPIC: HOW QUALITATIVE RESEARCH AND TOPIC

MODELING CAN BE BROUGHT TOGETHER

Marco Gillies

Department of Computing

Goldsmiths, University of London, UK

m.gillies@gold.ac.uk

Dhiraj Murthy

School of Journalism and Media

University of Texas at Austin, USA

dhiraj.murthy@austin.utexas.edu

Harry Brenton

BespokeVR

London, UK

harry@bespokeVR.com

Rapheal Olaniyan

Department of Computing

Goldsmiths, University of London, UK

rolan001@gold.ac.uk

ABSTRACT

Qualitative research is an approach to understanding social phenomena based around human

interpretation of data, particularly text. Probabilistic topic modelling is a machine learning approach

that is also based around the analysis of text and is often used to understand social

phenomena. Both of these approaches aim to extract important themes or topics in a textual corpus

and therefore we may see them as analogous to each other. However, there are also considerable

differences in how the two approaches function. One is a highly human interpretive process, the other

is automated and statistical. In this paper we use this analogy as the basis for our Theme and Topic

system, a tool for qualitative researchers to conduct textual research that integrates topic modelling

into an accessible interface. This is an example of a more general approach to the design of interactive

machine learning systems in which existing human professional processes can be used as the model

for processes involving machine learning. This has the particular benefit of providing a familiar

approach to existing professionals, which may make machine learning seem less alien and easier to

learn. Our design approach has two elements. We first investigate the steps professionals go through

when performing tasks and design a workflow for Theme and Topic that integrates machine learning.

We then design interfaces for topic modelling in which familiar concepts from qualitative research

are mapped onto machine learning concepts. This makes the machine learning concepts more

familiar and easier to learn for qualitative researchers.

Keywords Topic Modeling · Qualitative Research · Conceptual Models · Social Media · HCI · Mixed Methods

1 Introduction

This paper investigates an analogy between two seemingly different research methods: qualitative research and

probabilistic topic modeling. Qualitative research involves detailed reading of texts resulting in a rich human interpretation. It

can give highly nuanced and insightful analyses but is very time consuming and generally cannot be scaled up to the

volumes involved with “Big Data”. Topic modeling on the other hand is an automated method that is very well suited

to big data, but which lacks the nuance of human interpretation. However, we argue that they both share a common

goal: discovering a number of underlying themes within data. This paper investigates this analogy and asks whether it is

possible to use it to bring the two methods closer together. We assess this possibility critically; the paper raises as

many questions as it answers and is intended as a starting point for future research.

This analogy presents an example of a more general approach to designing interactive systems that are based on machine

learning by taking an analogy between a machine learning approach and an existing human task. Following in this vein,

we seek to design software that is more accessible to existing professionals than a traditional machine learning system,

arXiv:2210.00707v1 [cs.HC] 3 Oct 2022

Theme and Topic

which can be daunting at first. In particular, by working around existing concepts and workflows we may present the

elements of a machine learning system to users in a way that is readily interpretable using their prior knowledge.

This paper presents an example of this design approach via the design of a research system that integrates machine

learning with workflows drawn from qualitative research. One pitfall of this approach is that inevitably the machine

learning algorithm and the human task will differ in potentially subtle ways that could result in confusion or be

misleading. An important challenge of this approach is therefore to be aware of these potential differences: just as we

highlight similarities in the design of an interactive interface, we must particularly highlight cases where the actions of

the machine learning algorithm may differ from what would be a standard human interpretation. The example presented

in this paper highlights both these factors of similarities and differences.

2 Qualitative Research

Qualitative Research is a name given to a wide range of research techniques in the social sciences and related disciplines

(including HCI[

]) based on a detailed, human reading of textual or similar data with the aim of developing themes or

theories that take a qualitative form. Qualitative methods are often contrasted with quantitative research methods based

on statistical analysis[

] (which generally take the form of hypothesis testing, and so differ from the machine learning

techniques often used in big data analysis). Qualitative research encompasses a very wide range of diverse methods and

methodologies [

]. In this paper we will focus on two of the most popular and broadly applicable: Grounded

Theory[3, 7, 8] and Thematic Analysis[4].

Grounded Theory ([

]) is a methodology for developing fully formed, novel theories that are “grounded” in a

close analysis of qualitative data. It has been used extensively in HCI[

]. Grounded Theory is usually an emergent

process. Specifically, data collection and analysis can involve several passes, new variables, and emergent research

questions[

]. Many diverse approaches to grounded theory have emerged over the years, for example the divergent

approaches of the two founders Glaser[

] and Strauss[

] or the constructivist approach of Charmaz[

]. These all differ

as much in their epistemological foundations as they do in their practical methods. In this paper we will attempt to

focus on the commonalities between approaches. Thematic Analysis[

] on the other hand aims not for a full theory

but an understanding of “themes” that emerge from the data, which are generally at a lower level of analysis than a

full theory. Thematic analysis may variously be thought of as a stage in full grounded theory that precedes full theory

development, a form of grounded theory “lite” that does not go all the way to theory development, or, as Braun and

Clarke[

] argue, a method in its own right, with a different set of aims. We begin our discussion with Thematic Analysis for two reasons. First,

it identifies a number of methods that are common across many qualitative research methodologies. Second, we

believe the analogy with Topic Modeling is closer as the themes of thematic analysis are at a similar conceptual level to

the topics of topic modeling. Developing full qualitative theories is well beyond the scope of current machine learning.

We will then discuss important differences with full grounded theory.

Grounded theory and thematic analysis both begin with a period of familiarization with the data in which researchers

begin by reading the data as a whole to get an overall sense of what is being said before doing detailed analysis. This is

followed by a process of coding which involves a close reading of the data. The researcher selects important passages

in the data and applies “codes” to them. Codes are single words or short phrases that summarize and identify the

topic of the text. Codes should be sufficiently general that they can apply to multiple parts of the text and so can

bring together different passages that are about the same thing. Once an initial close coding has been performed, the

researcher goes back through the codes in an attempt to find higher level themes, by combining codes and looking at

their relationships. This stage includes a number of variants and different terminologies: Strauss and Corbin[

] refer to

finding “concepts”, Braun and Clarke[

] to searching for “themes” and Charmaz[

] to “focused coding” (in the rest of

this paper we will refer to “themes” on the understanding that these can stand for a number of concepts with diverse

philosophical foundations). Once a number of themes have been discovered, they are reviewed and refined by going

back to the original data and comparing to see how well they match the data.
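
The coding and theme-building steps described above can be pictured as a simple data structure. The following is a minimal sketch only, not part of any established qualitative analysis tool: the passages, codes, themes and function names are all hypothetical, invented for illustration. It shows how codes bring together passages about the same thing, and how a theme built from several codes can be traced back to its supporting passages for review.

```python
from collections import defaultdict

# Hypothetical coded passages: (passage text, codes applied by the researcher).
coded_passages = [
    ("I never have time to cook after work.", ["time pressure", "cooking"]),
    ("Takeaways are just easier on weekdays.", ["convenience", "time pressure"]),
    ("Cooking with my kids is the best part of Sunday.", ["cooking", "family"]),
]

# Hypothetical themes, each formed by combining related codes.
themes = {
    "constraints on everyday eating": {"time pressure", "convenience"},
    "food as family practice": {"cooking", "family"},
}


def passages_by_code(passages):
    """Index passages by code, so passages about the same thing are grouped."""
    index = defaultdict(list)
    for text, codes in passages:
        for code in codes:
            index[code].append(text)
    return index


def passages_for_theme(theme, themes, code_index):
    """Collect the distinct passages that support a theme, for review."""
    seen = []
    for code in sorted(themes[theme]):
        for text in code_index[code]:
            if text not in seen:
                seen.append(text)
    return seen


code_index = passages_by_code(coded_passages)
for theme in themes:
    print(theme, "->", passages_for_theme(theme, themes, code_index))
```

In practice the choice of codes and their grouping into themes is an interpretive act by the researcher; the structure above only records the outcome of that interpretation.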

In thematic analysis the aim is to produce a number of refined themes. However, grounded theory aims to go deeper

and develop theories that relate and explain the themes. This can use a number of further approaches such as axial[

]

or theoretical coding[

], but the most important method is Constant Comparison: passages of data are compared

to other passages, codes are compared to data, codes to other codes and themes to both codes and data. The aim of

this comparison is to understand relationships between themes and their relationship to data in order to deepen the

researchers’ understanding of the phenomena being studied. This is a detailed process of interacting with the data,

reading, theorizing and comparing to refine the theory. Another important part of the Grounded Theory process is

theoretical sampling: data is not collected prior to analysis but as part of an iterative research process in which initial

analysis informs the questions and approaches used in later data collection. Later data is collected specifically to better

understand the themes discovered in earlier analysis.


Qualitative research can provide a very rich, nuanced and human understanding of complex phenomena [

]. It can

highlight subtle and particular themes that can be lost in statistical analysis and can be open to surprising phenomena in

a way that hypothesis testing cannot. However, it also has problems. It is an extremely labor intensive process requiring

an expert to do very close reading(s) of the data. The time taken limits the scope of what is possible with qualitative

research, making it unfeasible for even medium sized data sets, let alone the big data setting where even an overview

reading of the entire dataset is not possible [12].

3 Topic Modeling

The problem of data size means that big data analysis is done by machine-based approaches. One of the most popular

is Probabilistic Topic Modeling[

], a set of machine learning approaches that analyze large corpora of textual data,

consisting of many individual documents, to extract the underlying topics or themes within the data. Topic modeling

has been applied to a number of domains such as the analysis of academic literature[

], social media “big data”[

], news stories[15] or transcripts of crisis counseling sessions[16].

One of the most popular methods for topic modelling is Latent Dirichlet Allocation (LDA)[

]. That being said,

much newer, deep approaches such as BERTopic [

] are quickly growing in popularity, particularly in social media

applications. Unlike many earlier machine learning methods, LDA allows individual documents to contain multiple

topics, and not be about just one thing. A full description of the algorithm is found in Blei et al.[

]; here we will

highlight a number of important features. In LDA, topics are represented as probability distributions over words;

for example, the word "education" may be much more likely to appear in one topic than another. Documents are

represented as mixtures of topics, with the probability of a word in a document given by the probability of that word in a

topic multiplied by the probability of the topic within the document, summed over all topics. LDA models are learned using an Expectation

Maximization algorithm, which alternates a phase of determining the probabilities of words within a topic (essentially a

process of counting words) based on the assignment of documents to topics with a phase of reassigning the documents

to topics based on their word probabilities.
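
To make the mixture representation concrete, the following minimal sketch computes the probability of a word in a document as the sum, over topics, of the word's probability in the topic multiplied by the topic's probability in the document. The topic names, vocabulary and all probability values are invented for illustration, and the sketch covers only the representation, not the learning algorithm.

```python
# Hypothetical topics: each is a probability distribution over a tiny vocabulary.
topic_word = {
    "schooling": {"education": 0.5, "teacher": 0.4, "goal": 0.1},
    "sport": {"education": 0.05, "teacher": 0.15, "goal": 0.8},
}

# Hypothetical document, represented as a mixture of the two topics.
doc_topic = {"schooling": 0.7, "sport": 0.3}


def word_probability(word, doc_topic, topic_word):
    """P(word | document) = sum over topics of P(word | topic) * P(topic | document)."""
    return sum(
        p_topic * topic_word[topic].get(word, 0.0)
        for topic, p_topic in doc_topic.items()
    )


for word in ["education", "teacher", "goal"]:
    print(word, round(word_probability(word, doc_topic, topic_word), 3))
```

Because the document is mostly about the hypothetical "schooling" topic, "education" ends up more probable in this document than "goal", even though "goal" dominates the "sport" topic.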

As mentioned in the introduction, there is an interesting analogy between the aims of qualitative research and topic

modeling. In his survey paper, Blei[

] describes topic models as: “statistical methods that analyze the words of the

original texts to discover the themes that run through them, how those themes are connected to each other, and how they

change over time.” Not only is the word “themes” directly analogous to the terminology of Braun and Clarke[

] but the focus on themes,

their relationships and variations is very close to qualitative research ideas of constant comparison between themes and

data.

However, there are many differences. The automated nature of topic models makes them applicable to very big data,

but in many ways it also means that they are impoverished relative to the rich human interpretation involved in qualitative

research. For example, LDA essentially consists of counting words and misses much of the contextual understanding that

human reading gives. It does not even take account of the order of words in sentences; the only contextual information

used is the co-occurrence of words in documents.
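
The loss of word order can be seen in a small sketch of the bag-of-words representation that word-count models such as LDA operate on. The two example sentences and the helper name are made up for illustration; a human reader interprets the sentences very differently, but their word counts are identical.

```python
from collections import Counter


def bag_of_words(text):
    """Reduce a text to word counts, discarding word order entirely."""
    return Counter(text.lower().split())


first = "the dog bit the man"
second = "the man bit the dog"

# Both sentences contain exactly the same words the same number of times.
print(bag_of_words(first))
print(bag_of_words(first) == bag_of_words(second))  # True
```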

4 Interactive Machine Learning

If machine learning methods like topic models can handle very large data sets but ignore the complexities available to

human interpretation within qualitative research, is it possible to bring the two methods closer together to bring the benefits

of both (after all, isn’t the aim of HCI to bring humans and computers closer together)? Recently the challenge of reconciling

human and machine learning processes has been studied in Interactive Machine Learning[

] and Human-Centred

Machine Learning[21].

Machine Learning is traditionally viewed as a batch process in which a large, pre-existing dataset is fed into the

algorithm, which processes it and returns an answer. The role of the human in the process is simply to collect the data,

ideally in as passive a way as possible to ensure that the data is independent and identically distributed. Interactive

machine learning on the other hand sees the role of the human as much more active. They actively select data items to

be most representative and appropriate to the task. The selection of data is not done prior to running the algorithm, but

in a tight interactive loop with machine learning. The human selects a small amount of initial data which the computer

uses to generate an initial model. The human then adds more data specifically to refine the model and correct errors in it.

This new data is not arbitrary or randomly selected, but is specifically chosen, through human judgment, as the best data

to correct misconceptions in the learned model (this may seem similar to active learning, but is fundamentally different

because it is a human that judges the usefulness of a data item, not a computer). In fact, the human may not know ahead

of time exactly what they want the computer to learn; their concept of the problem also develops via interaction with

the learned model and finding new data.
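
The interactive loop described above can be sketched in a few lines. This is an illustration under strong assumptions, not an implementation of any particular interactive machine learning system: the word-count "model", the example texts and labels, and in particular the `simulated_human` function are all hypothetical. In a real system the selection step would be a person inspecting the model and deciding which new data would best correct it.

```python
from collections import Counter


def train(examples):
    """Count words per label: a deliberately simple stand-in for a learned model."""
    counts = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(text.lower().split())
    return counts


def predict(model, text):
    """Pick the label whose training words overlap most with the text."""
    words = text.lower().split()
    return max(sorted(model), key=lambda label: sum(model[label][w] for w in words))


def simulated_human(model, pool):
    """Stand-in for human judgment: return the first item the model gets wrong."""
    for text, label in pool:
        if predict(model, text) != label:
            return text, label
    return None


# The human starts with a small amount of initial data...
examples = [("great food", "positive"), ("slow service", "negative")]
# ...and has further items available, added only when they correct an error.
pool = [("great service", "positive"), ("slow food", "negative"), ("cold food", "negative")]

model = train(examples)
while (choice := simulated_human(model, pool)) is not None:
    examples.append(choice)   # the human adds data chosen to fix the model
    pool.remove(choice)
    model = train(examples)   # the model is retrained inside the loop
    print("added:", choice)
```

Note that only the items that expose an error are added; data the model already handles correctly stays in the pool, which mirrors the idea that new data is chosen for its usefulness rather than sampled at random.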
