Theme and Topic
which can be daunting at first. In particular, by working around existing concepts and workflows, we may present the
elements of a machine learning system to users in a way that is readily interpretable using their prior knowledge.
This paper presents an example of this design approach via the design of a research system that integrates machine
learning with workflows drawn from qualitative research. One pitfall of this approach is that the machine
learning algorithm and the human task will inevitably differ in potentially subtle ways that could cause confusion or be
misleading. An important challenge is therefore to be aware of these potential differences: just as we
highlight similarities in the design of an interactive interface, we must particularly highlight cases where the actions of
the machine learning algorithm may differ from what would be a standard human interpretation. The example presented
in this paper highlights both these similarities and these differences.
2 Qualitative Research
Qualitative Research is a name given to a wide range of research techniques in the social sciences and related disciplines
(including HCI[1]) based on a detailed, human reading of textual or similar data with the aim of developing themes or
theories that take a qualitative form. Qualitative methods are often contrasted with quantitative research methods based
on statistical analysis[2] (which generally take the form of hypothesis testing, and so differ from the machine learning
techniques often used in big data analysis). Qualitative research encompasses a very wide range of diverse methods and
methodologies[3, 4, 5, 6]. In this paper we will focus on two of the most popular and broadly applicable: Grounded
Theory[3, 7, 8] and Thematic Analysis[4].
Grounded Theory[3, 7, 8] is a methodology for developing fully formed, novel theories that are “grounded” in a
close analysis of qualitative data. It has been used extensively in HCI[1]. Grounded Theory is usually an emergent
process. Specifically, data collection and analysis can involve several passes, new variables, and emergent research
questions[9]. Many diverse approaches to grounded theory have emerged over the years, for example the divergent
approaches of the two founders Glaser[10] and Strauss[7] or the constructivist approach of Charmaz[8]. These all differ
as much in their epistemological foundations as they do in their practical methods. In this paper we will attempt to
focus on the commonalities between approaches. Thematic Analysis[4], on the other hand, aims not for a full theory
but for an understanding of “themes” that emerge from the data, which are generally at a lower level of analysis than a
full theory. Thematic analysis may variously be thought of as a stage in full grounded theory that precedes full theory
development, a form of grounded theory “lite” that does not go all the way to theory development, or, as Braun and
Clarke[4] argue, a method in its own right, with a different set of aims. We begin our discussion with Thematic Analysis
for two reasons. First, it identifies a number of methods that are common across many qualitative research methodologies. Second, we
believe the analogy with Topic Modeling is closer, as the themes of thematic analysis are at a similar conceptual level to
the topics of topic modeling. Developing full qualitative theories is well beyond the scope of current machine learning.
We will then discuss important differences with full grounded theory.
Grounded theory and thematic analysis both begin with a period of familiarization with the data, in which researchers
read the data as a whole to get an overall sense of what is being said before doing detailed analysis. This is
followed by a process of coding, which involves a close reading of the data. The researcher selects important passages
in the data and applies “codes” to them. Codes are single words or short phrases that summarize and identify the
topic of the text. Codes should be sufficiently general that they can apply to multiple parts of the text and so can
bring together different passages that are about the same thing. Once an initial close coding has been performed, the
researcher goes back through the codes in an attempt to find higher level themes, by combining codes and looking at
their relationships. This stage includes a number of variants and different terminologies: Strauss and Corbin[7] refer to
finding “concepts”, Braun and Clarke[4] to searching for “themes” and Charmaz[8] to “focused coding” (in the rest of
this paper we will refer to “themes” on the understanding that these can stand for a number of concepts with diverse
philosophical foundations). Once a number of themes have been discovered, they are reviewed and refined by going
back to the original data to see how well they match it.
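The relationship between passages, codes and themes described above can be illustrated with a small data structure. This is only a minimal sketch and not part of the research system presented in this paper: the `Codebook` class, its method names, and the example passages are all hypothetical and invented purely for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Codebook:
    """Hypothetical container for coded passages and themes (illustration only)."""

    # code label -> passages the researcher tagged with that code
    passages_by_code: dict = field(default_factory=lambda: defaultdict(list))
    # theme name -> set of code labels grouped under that theme
    codes_by_theme: dict = field(default_factory=lambda: defaultdict(set))

    def apply_code(self, passage: str, code: str) -> None:
        """Coding: attach a short, general label to a selected passage."""
        self.passages_by_code[code].append(passage)

    def group_into_theme(self, theme: str, codes: list) -> None:
        """Theme search: combine related codes under a higher level theme."""
        self.codes_by_theme[theme].update(codes)

    def passages_for_theme(self, theme: str) -> list:
        """Review: gather every passage under a theme to check it against the data."""
        return [
            passage
            for code in sorted(self.codes_by_theme[theme])
            for passage in self.passages_by_code[code]
        ]


# Invented example passages (not real study data).
book = Codebook()
book.apply_code("I never know who to ask for help.", "isolation")
book.apply_code("My manager checks in every week.", "support")
book.apply_code("Working from home gets lonely.", "isolation")
book.group_into_theme("workplace relationships", ["isolation", "support"])

for passage in book.passages_for_theme("workplace relationships"):
    print(passage)
```

The same code ("isolation") brings together two different passages, and the theme in turn brings together two codes, so that reviewing a theme means rereading all of the original passages beneath it. The interpretive judgments themselves (which passages matter, which label fits, which codes belong together) remain with the researcher and are not captured by such a structure.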
In thematic analysis the aim is to produce a number of refined themes. However, grounded theory aims to go deeper
and develop theories that relate and explain the themes. This can use a number of further approaches such as axial[7]
or theoretical coding[8], but the most important method is Constant Comparison: passages of data are compared
to other passages, codes are compared to data, codes to other codes and themes to both codes and data. The aim of
this comparison is to understand relationships between themes and their relationship to data in order to deepen the
researchers’ understanding of the phenomena being studied. This is a detailed process of interacting with the data,
reading, theorizing and comparing to refine the theory. Another important part of the Grounded Theory process is
theoretical sampling: data is not collected prior to analysis but as part of an iterative research process in which initial
analysis informs the questions and approaches used in later data collection. Later data is collected specifically to better
understand the themes discovered in earlier analysis.