THEME AND TOPIC: HOW QUALITATIVE RESEARCH AND TOPIC
MODELING CAN BE BROUGHT TOGETHER
Marco Gillies
Department of Computing
Goldsmiths, University of London, UK
m.gillies@gold.ac.uk
Dhiraj Murthy
School of Journalism and Media
University of Texas at Austin, USA
dhiraj.murthy@austin.utexas.edu
Harry Brenton
BespokeVR
London, UK
harry@bespokeVR.com
Rapheal Olaniyan
Department of Computing
Goldsmiths, University of London, UK
rolan001@gold.ac.uk
ABSTRACT
Qualitative research is an approach to understanding social phenomena based around human
interpretation of data, particularly text. Probabilistic topic modelling is a machine learning approach
that is also based around the analysis of text and is often used to understand social
phenomena. Both of these approaches aim to extract important themes or topics in a textual corpus
and therefore we may see them as analogous to each other. However, there are also considerable
differences in how the two approaches function. One is a highly human interpretive process, the other
is automated and statistical. In this paper we use this analogy as the basis for our Theme and Topic
system, a tool for qualitative researchers to conduct textual research that integrates topic modelling
into an accessible interface. This is an example of a more general approach to the design of interactive
machine learning systems in which existing human professional processes can be used as the model
for processes involving machine learning. This has the particular benefit of providing a familiar
approach to existing professionals, which may make machine learning seem less alien and easier to
learn. Our design approach has two elements. We first investigate the steps professionals go through
when performing tasks and design a workflow for Theme and Topic that integrates machine learning.
We then design interfaces for topic modelling in which familiar concepts from qualitative research
are mapped onto machine learning concepts. This makes the machine learning concepts more
familiar and easier to learn for qualitative researchers.
Keywords Topic Modeling · Qualitative Research · Conceptual Models · Social Media · HCI · Mixed Methods
1 Introduction
This paper investigates an analogy between two seemingly different research methods: qualitative research and
probabilistic topic modeling. Qualitative research involves detailed reading of texts resulting in a rich human interpretation. It
can give highly nuanced and insightful analyses but is very time consuming and generally cannot be scaled up to the
volumes involved with “Big Data”. Topic modeling, on the other hand, is an automated method that is very well suited
to big data, but which lacks the nuance of human interpretation. However, we argue that they both share a common
goal: discovering a number of underlying themes within data. This paper investigates this analogy and asks whether it is
possible to use it to bring the two methods closer together. We assess the possibility critically; the paper raises as
many questions as it answers and is intended as a starting point for future research.
This analogy presents an example of a more general approach to designing interactive systems that are based on machine
learning by taking an analogy between a machine learning approach and an existing human task. Following in this vein,
we seek to design software that is more accessible to existing professionals than a traditional machine learning system,
which can be daunting at first. In particular, by working around existing concepts and workflows we may present the
elements of a machine learning system to users in a way that is readily interpretable using their prior knowledge.
arXiv:2210.00707v1 [cs.HC] 3 Oct 2022
This paper presents an example of this design approach via the design of a research system that integrates machine
learning with workflows drawn from qualitative research. One pitfall of this approach is that the machine
learning algorithm and the human task will inevitably differ in potentially subtle ways that could cause confusion or be
misleading. An important challenge is therefore to be aware of these potential differences: just as we
highlight similarities in the design of an interactive interface, we must also highlight cases where the actions of
the machine learning algorithm may differ from what would be a standard human interpretation. The example presented
in this paper highlights both the similarities and the differences.
2 Qualitative Research
Qualitative Research is a name given to a wide range of research techniques in the social sciences and related disciplines
(including HCI[1]) based on a detailed, human reading of textual or similar data with the aim of developing themes or
theories that take a qualitative form. Qualitative methods are often contrasted with quantitative research methods based
on statistical analysis[2] (which generally take the form of hypothesis testing, and so differ from the machine learning
techniques often used in big data analysis). Qualitative research encompasses a very wide range of diverse methods and
methodologies[3, 4, 5, 6]. In this paper we will focus on two of the most popular and broadly applicable: Grounded
Theory[3, 7, 8] and Thematic Analysis[4].
Grounded Theory[3, 7, 8] is a methodology for developing fully formed, novel theories that are “grounded” in a
close analysis of qualitative data. It has been used extensively in HCI[1]. Grounded Theory is usually an emergent
process. Specifically, data collection and analysis can involve several passes, new variables, and emergent research
questions[9]. Many diverse approaches to grounded theory have emerged over the years, for example the divergent
approaches of the two founders Glaser[10] and Strauss[7] or the constructivist approach of Charmaz[8]. These all differ
as much in their epistemological foundations as they do in their practical methods. In this paper we will attempt to
focus on the commonalities between approaches. Thematic Analysis[4], on the other hand, aims not for a full theory
but for an understanding of “themes” that emerge from the data, which are generally at a lower level of analysis than a
full theory. Thematic analysis may variously be thought of as a stage in full grounded theory that precedes full theory
development, a form of grounded theory “lite” that does not go all the way to theory development, or, as Braun and
Clarke[4] argue, a method in its own right, with a different set of aims. We begin our discussion with Thematic Analysis
for two reasons. First, it identifies a number of methods that are common across many qualitative research methodologies.
Second, we believe the analogy with Topic Modeling is closer, as the themes of thematic analysis are at a similar
conceptual level to the topics of topic modeling. Developing full qualitative theories is well beyond the scope of current
machine learning. We will then discuss important differences with full grounded theory.
Grounded theory and thematic analysis both begin with a period of familiarization with the data, in which researchers
read the data as a whole to get an overall sense of what is being said before doing detailed analysis. This is
followed by a process of coding, which involves a close reading of the data. The researcher selects important passages
in the data and applies “codes” to them. Codes are single words or short phrases that summarize and identify the
topic of the text. Codes should be sufficiently general that they can apply to multiple parts of the text and so can
bring together different passages that are about the same thing. Once an initial close coding has been performed, the
researcher goes back through the codes in an attempt to find higher level themes, by combining codes and looking at
their relationships. This stage includes a number of variants and different terminologies: Strauss and Corbin[7] refer to
finding “concepts”, Braun and Clarke[4] to searching for “themes” and Charmaz[8] to “focused coding” (in the rest of
this paper we will refer to “themes” on the understanding that these can stand for a number of concepts with diverse
philosophical foundations). Once a number of themes have been discovered, they are reviewed and refined by going
back to the original data and checking how well they match it.
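The coding workflow described above can be pictured as a simple data structure. The following is a minimal sketch only: the passages, codes and theme names are invented for illustration, and the helper `passages_for_theme` is a hypothetical name, not part of any system described in this paper.

```python
from collections import defaultdict

# Hypothetical passages and codes, invented purely for illustration.
coded_passages = [
    ("I never know who can see my posts", "privacy"),
    ("I changed my settings after a stranger messaged me", "privacy"),
    ("I mostly use it to keep up with my family", "staying in touch"),
    ("Group chats are how we plan everything", "staying in touch"),
    ("I worry about what employers might find", "self-presentation"),
]

# Coding: one code can bring together several passages about the same thing.
passages_by_code = defaultdict(list)
for passage, code in coded_passages:
    passages_by_code[code].append(passage)

# Searching for themes: the researcher combines related codes (a human judgment,
# hard-coded here as an assumption).
themes = {
    "managing visibility": ["privacy", "self-presentation"],
    "maintaining relationships": ["staying in touch"],
}


def passages_for_theme(theme):
    """Reviewing: go back from a theme to the original passages behind it."""
    return [p for code in themes[theme] for p in passages_by_code[code]]


for theme in themes:
    print(theme, "->", len(passages_for_theme(theme)), "passages")
```

The point of the sketch is only that themes sit one level above codes, and that every theme can be traced back to the passages that support it.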
In thematic analysis the aim is to produce a number of refined themes. However, grounded theory aims to go deeper
and develop theories that relate and explain the themes. This can use a number of further approaches such as axial[7]
or theoretical coding[8], but the most important method is Constant Comparison: passages of data are compared
to other passages, codes are compared to data, codes to other codes and themes to both codes and data. The aim of
this comparison is to understand relationships between themes and their relationship to data in order to deepen the
researchers’ understanding of the phenomena being studied. This is a detailed process of interacting with the data,
reading, theorizing and comparing to refine the theory. Another important part of the Grounded Theory process is
theoretical sampling: data is not collected prior to analysis but as part of an iterative research process in which initial
analysis informs the questions and approaches used in later data collection. Later data is collected specifically to better
understand the themes discovered in earlier analysis.
Qualitative research can provide a very rich, nuanced and human understanding of complex phenomena[11]. It can
highlight subtle and particular themes that can be lost in statistical analysis and can be open to surprising phenomena in
a way that hypothesis testing cannot. However, it also has problems. It is an extremely labor intensive process requiring
an expert to do very close reading(s) of the data. The time taken limits the scope of what is possible with qualitative
research, making it unfeasible for even medium-sized data sets, let alone the big data setting where even an overview
reading of the entire dataset is not possible[12].
3 Topic Modelling
The problem of data size means that big data analysis is done by machine-based approaches. One of the most popular
is Probabilistic Topic Modeling[13], a set of machine learning approaches that analyze large corpora of textual data,
consisting of many individual documents, to extract the underlying topics or themes within the data. Topic modeling
has been applied to a number of domains such as the analysis of academic literature[13], social media “big data”[14],
news stories[15] or transcripts of crisis counseling sessions[16].
One of the most popular methods for topic modelling is Latent Dirichlet Allocation (LDA)[17]. That being said,
much newer, deep approaches such as BERTopic[18] are quickly growing in popularity, particularly in social media
applications. Unlike many earlier machine learning methods, LDA allows individual documents to contain multiple
topics, rather than being about just one thing. A full description of the algorithm is found in Blei et al.[17]; here we will
highlight a number of important features. In LDA, topics are represented as probability distributions on words:
for example, the word "education" may be much more likely to appear in one topic than another. Documents are
represented as mixtures of topics, with the probability of a word being determined by the probability of that word in a
topic multiplied by the probability of the topic within the document (summed over the topics). LDA models are learned
using an Expectation Maximization algorithm, which alternates a phase of determining the probabilities of words within
a topic (essentially a process of counting words) based on the assignment of documents to topics with a phase of
reassigning the documents to topics based on their word probabilities.
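As a rough illustration of the mixture representation described above, the sketch below computes the probability of a word in a document as the sum, over topics, of the word's probability in the topic times the topic's weight in the document. The two topics, their word probabilities and the document's topic mix are invented numbers, and `word_probability` is a hypothetical helper; this is not the learning algorithm itself, only the model's basic arithmetic.

```python
# Hypothetical topics: each is a probability distribution over a tiny vocabulary.
topics = {
    "schooling": {"education": 0.5, "teacher": 0.4, "budget": 0.1},
    "finance": {"education": 0.1, "teacher": 0.1, "budget": 0.8},
}

# Hypothetical document: a mixture of topics whose weights sum to 1.
doc_topic_mix = {"schooling": 0.75, "finance": 0.25}


def word_probability(word, doc_topic_mix, topics):
    """p(word | doc) = sum over topics of p(word | topic) * p(topic | doc)."""
    return sum(
        weight * topics[topic].get(word, 0.0)
        for topic, weight in doc_topic_mix.items()
    )


for word in ("education", "teacher", "budget"):
    print(word, round(word_probability(word, doc_topic_mix, topics), 3))
```

With these assumed numbers, "education" is more probable in this document than "budget", because the document is weighted mostly towards the topic in which "education" is likely.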
As mentioned in the introduction, there is an interesting analogy between the aims of qualitative research and topic
modeling. In his survey paper, Blei[13] describes topic models as: “statistical methods that analyze the words of the
original texts to discover the themes that run through them, how those themes are connected to each other, and how they
change over time.” Not only is the word “themes” directly analogous to the terminology of Braun and Clarke[4], but the
focus on themes, their relationships and variations is very close to qualitative research ideas of constant comparison
between themes and data.
However, there are many differences. The automated nature of topic models makes them applicable to very big data,
but in many ways it also means that they are impoverished relative to the rich human interpretation involved in qualitative
research. For example, LDA essentially consists of counting words and misses much of the contextual understanding that
human reading gives. It does not even take account of the order of words in sentences; the only contextual information
used is the co-occurrence of words in documents.
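A small sketch makes the point about word order concrete. Under a bag-of-words representation (assumed here to be simple lowercase whitespace tokenization, with `bag_of_words` a hypothetical helper), two sentences with opposite meanings can become indistinguishable:

```python
from collections import Counter


def bag_of_words(text):
    """Count word occurrences, discarding the order in which words appear."""
    return Counter(text.lower().split())


sentence_a = "the dog bites the man"
sentence_b = "the man bites the dog"

# Different meanings for a human reader, identical word counts for the model.
print(bag_of_words(sentence_a) == bag_of_words(sentence_b))
```

A human reader immediately sees who bit whom; a model that only sees word counts cannot.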
4 Interactive Machine Learning
If machine learning methods like topic models can handle very large data sets but ignore the complexities available to
human interpretation within qualitative research, is it possible to bring the two methods closer together, to the benefit
of both (after all, isn’t the aim of HCI to bring humans and computers closer)? Recently the challenge of reconciling
human and machine learning processes has been studied in Interactive Machine Learning[19, 20] and Human-Centred
Machine Learning[21].
Machine Learning is traditionally viewed as a batch process in which a large, pre-existing dataset is fed into the
algorithm, which processes it and returns an answer. The role of the human in the process is simply to collect the data,
ideally in as passive a way as possible to ensure that the data is independent and identically distributed. Interactive
machine learning, on the other hand, sees the role of the human as much more active. They actively select data items to
be most representative and appropriate to the task. The selection of data is not done prior to running the algorithm, but
in a tight interactive loop with machine learning. The human selects a small amount of initial data which the computer
uses to generate an initial model. The human then adds more data specifically to refine the model and correct errors in it.
This new data is not arbitrary or randomly selected, but is specifically chosen, through human judgment, as the best data
to correct misconceptions in the learned model (this may seem similar to active learning, but is fundamentally different
because it is a human that judges the usefulness of a data item, not a computer). In fact, the human may not know ahead
of time exactly what they want the computer to learn; their concept of the problem also develops via interaction with
the learned model and finding new data.
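The loop described above can be sketched in a few lines. This is a toy illustration under strong assumptions: the data are invented one-dimensional values, the model is a simple nearest-centroid classifier chosen only for brevity, and the function `human_pick_correction` merely simulates the human judgment that, in real interactive machine learning, a person would supply.

```python
def train(examples):
    """Fit a nearest-centroid model: one mean value per label."""
    sums, counts = {}, {}
    for x, label in examples:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(model, x):
    """Return the label whose centroid is closest to x."""
    return min(model, key=lambda label: abs(model[label] - x))


# Hypothetical pool of items, with the labels the human has in mind.
pool = [(1.0, "short"), (2.0, "short"), (4.0, "short"),
        (5.0, "long"), (6.0, "long"), (9.0, "long"), (10.0, "long")]


def human_pick_correction(model, pool, seen):
    """Stand-in for human judgment: choose an item the model gets wrong."""
    for x, label in pool:
        if (x, label) not in seen and predict(model, x) != label:
            return (x, label)
    return None


# The human starts with a small amount of initial data.
training = [(1.0, "short"), (10.0, "long")]
model = train(training)

# Tight loop: inspect the model, add one corrective example, retrain.
rounds = 0
while True:
    correction = human_pick_correction(model, pool, training)
    if correction is None:
        break
    training.append(correction)
    model = train(training)
    rounds += 1

print("corrective rounds:", rounds)
```

Unlike active learning, the choice of which example to add is made outside the algorithm; here it is simulated, but the structure of the loop is the same.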