
DEPTWEET: A Typology for Social Media Texts to Detect Depression Severities
the severity of the mental condition of depressed individuals,
(b) constructing a dataset named DEPTWEET¹ containing
around 40,191 tweets with corresponding crowdsourced
labels and confidence scores. The labeling typology of the
dataset assigns each tweet one of four severity labels:
(1) Non-depressed, (2) Mildly Depressed, (3) Moderately
Depressed, and (4) Severely Depressed. There is also
an associated confidence score (between 0.5 and 1) for each
label.
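To make the label-plus-confidence structure concrete, the sketch below filters a toy sample by confidence score. The column names ("text", "label", "confidence") and the 0.75 threshold are illustrative assumptions, not the dataset's documented schema.

```python
import csv
import io

# Toy rows in the spirit of DEPTWEET's released CSV; the column names
# are assumptions for illustration only.
sample = io.StringIO(
    "text,label,confidence\n"
    "feeling fine today,non-depressed,0.93\n"
    "can't get out of bed again,moderate,0.61\n"
    "everything is pointless,severe,0.88\n"
)

# Keep only rows whose crowdsourced label confidence is at least 0.75,
# trading dataset size for higher label quality.
rows = [r for r in csv.DictReader(sample) if float(r["confidence"]) >= 0.75]
```

A confidence threshold like this is a common way for downstream users to discard low-agreement annotations before training.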
The procedure used to assess the severity of depres-
sion in this study was based on a well-established clinical
assessment method known as the Diagnostic and Statisti-
cal Manual of Mental Disorders, Fifth Edition (DSM-5)
(Arbanas, 2015), and it was carried out under the supervi-
sion of two expert clinical psychologists. The DEPTWEET
dataset contributes further high-quality data on depression
severity levels (none, mild, moderate, or severe), adding to
existing datasets on these and related attributes (Ahmed
et al., 2021b; Mukhiya et al., 2020), and, to the best of our
knowledge, provides the first dataset of this scale on
depression severities. The approach utilized in this study can be
adopted to generate high-quality mental health data from
various platforms in future investigations. Moreover, given
that the data was collected in the latter half of 2021, topic
modeling on this dataset can provide useful insight into the
impact of the COVID-19 pandemic on individuals’ mental
health.
The remaining sections of the paper are structured as
follows: Sections 2 and 3 outline the motivation and back-
ground of the DEPTWEET dataset. The data collection,
quality control mechanisms, and the summary statistics of
the data are described in Section 4. The baseline classifica-
tion model for this dataset and evaluation metrics are pre-
sented in Section 5. Section 6 discusses the classification re-
sults, potential sources of bias in the data, and the necessary
aspects to consider while conducting additional research in
this domain. Finally, Section 7 concludes the current study
and discusses future directions.
2. Related Work
Computational linguistics techniques can hardly serve
as a complete substitute for in-person mental illness
diagnosis, but their successful application to identifying
the progress and severity of depression in individuals
undergoing online therapy may provide clinicians with
additional insights, allowing them to apply interventions
more effectively and efficiently. Studies analyzing web data, espe-
cially social media platforms, have piqued the interest of the
research community due to their scope and deep entangle-
ment in contemporary culture (Fuchs, 2015). Coppersmith
et al. (2014) made a prominent contribution in this domain
by developing a procedure of extracting mental health data
from social media. In their study, tweets were crawled from
the profiles of users who publicly stated on their Twitter
feed that they had been diagnosed with various mental illnesses.

¹The DEPTWEET dataset is available at
https://github.com/mohsinulkabir14/DEPTWEET
They mixed control samples from the general population
(people who are not depressed) with the tweets of the self-
reported diagnosed group. Additionally, they conducted a
LIWC (Linguistic Inquiry and Word Count) analysis to measure
deviations of each disorder group from the control group.
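A LIWC-style analysis of this kind boils down to counting what fraction of a text's tokens fall into each psycholinguistic category. The sketch below uses a tiny hand-made lexicon; the real LIWC dictionaries are proprietary and far larger, so both the lexicon and the category names here are illustrative assumptions.

```python
from collections import Counter

# Toy category lexicon in the spirit of LIWC (the real dictionaries
# are proprietary); maps a word to its psycholinguistic category.
LEXICON = {
    "sad": "negemo", "alone": "negemo", "tired": "negemo",
    "happy": "posemo", "great": "posemo",
    "i": "self", "me": "self", "my": "self",
}

def category_rates(text):
    """Return each category's share of total tokens, LIWC-style."""
    tokens = text.lower().split()
    counts = Counter(LEXICON[t] for t in tokens if t in LEXICON)
    return {cat: n / len(tokens) for cat, n in counts.items()}

rates = category_rates("i feel so sad and tired and alone")
```

Group-level deviations, as in Coppersmith et al. (2014), can then be measured by comparing the mean per-category rates of a disorder group against those of the control group.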
They focused on the analysis of four mental illnesses: Post-
Traumatic Stress Disorder (PTSD), Depression, Bipolar Dis-
order, and Seasonal Affective Disorder (SAD), and proposed
this novel method to gather data for a range of mental
illnesses quickly and cheaply. Numerous studies later fol-
lowed this approach to detect relevant mental health data for
various mental illnesses. For example, the Computational
Linguistics and Clinical Psychology (CLPsych) 2015 shared
task (Coppersmith et al., 2015) collected self-reported data
on depression and PTSD. The organizers further annotated the data
with human annotators to remove jokes, quotes, etc., from
the collected data. The shared task participants addressed
three binary classification tasks: depression vs. control,
PTSD vs. control, and depression vs. PTSD.
These datasets were used in a variety of studies to discover
patterns in the language use of users suffering from various
mental illnesses (Pedersen, 2015; Coppersmith et al., 2016;
Amir et al., 2017). In particular, Resnik et al. (2015) applied
several topic modeling techniques (supervised Latent Dirichlet
Allocation (LDA), supervised anchor topic modeling, etc.)
to differentiate the language usage of depressed and non-
depressed individuals using the datasets of Coppersmith
et al. (2014) and the CLPsych shared task (2015).
Following a similar approach, Chen et al. (2018) col-
lected tweets from self-reported depressed users and investi-
gated the potential of non-temporal and temporal measures
of emotions over time to identify depression symptoms from
their tweets by detecting eight basic emotions (e.g. anger,
fear, etc.). Additionally, classifiers were built to label Twitter
users as either depressed or non-depressed (control), using
strength scores calculated from the intensity of each emotion
and a time-series analysis of each user. Among other
social media platforms, Tian et al. (2016) explored sleep complaints
on Sina Weibo (a Chinese microblogging website) to dis-
cover users’ diurnal activity patterns and gain insight into the
mental health of insomniacs. Twitter data on mental health
has also been collected by targeting specific Twitter
campaigns. For instance, Jamil et al. (2017) prepared a
dataset from the users who participated in the #BellLetsTalk
2015 campaign that was inaugurated to promote awareness
about mental health issues. They collected public tweets
from 25,362 Canadian users and built a user-level classifier
to detect at-risk users and a tweet-level classifier to predict
symptoms of depression in tweets. In this campaign, only
5% of the collected tweets talked about depression, while
the remaining 95% were non-depressed. While these methods can extract
large volumes of data for a low cost, they do not ensure a
sufficient sample of interest and have inevitably resulted in a
low number of positive samples (mental-health related data).
Kabir et al.: Preprint submitted to Elsevier Page 2 of 17