
in the midst of the COVID-19 pandemic. Negative sentiments towards China and Chinese people,
as evidenced by derogatory labels such as “Chinese virus,” along with implicit biases against
Asians, have increased during the pandemic [6, 38, 45]. Federal law enforcement agencies in
the U.S. have warned of the surge in anti-Asian hate crimes during this period [23]. Various
advocacy efforts, including hashtag campaigns such as “#racismisvirus” and “#stopAsianhate,”
have also emerged to counter such anti-Asian sentiments and hate crimes.
As a result, the majority of recent studies on anti-Asian hate have utilized datasets
pertaining to the influence of the COVID-19 pandemic, focusing on the evidence and
consequences of Sinophobia [34, 35, 37]. While the pandemic has undoubtedly served as an
important backdrop for recent Asian hate research, existing literature has failed to fully
acknowledge the problem of anti-Asian sentiments as an enduring social issue that transcends
being merely a byproduct of the pandemic. Furthermore, it does not adequately acknowledge
that the problem of anti-Asian hate affects a wide range of ethnic groups within Asian
populations, extending beyond the Chinese community.
The purpose of this study is to fill this void by examining online anti-Asian hate using a
disaggregated-data approach. In particular, this study broadens the observation period to
cover an extended time frame that encompasses the pre-pandemic, peak pandemic, and post-peak
pandemic phases, and conducts comparative analyses using disaggregated data based on both
temporal and sub-ethnic breakdowns. This disaggregated approach enables the identification
of nuanced distinctions in the animosity directed toward different ethnic groups within
Asian populations. Moreover, it facilitates a deeper understanding of the intricate
inter-ethnic dynamics within pan-Asian communities.1
1 https://www.pewresearch.org/race-ethnicity/2022/08/02/what-it-means-to-be-asian-in-america/
The study aims to contribute to the literature by (1) creating
a longitudinal multi-ethnic Asian hate dataset, (2) investigating
temporal trends of anti-Asian messages on X (formerly known as
Twitter), and (3) introducing techniques that enable comparisons of
anti-Asian topics across multiple ethnic communities within pan-
Asian populations. The empirical results presented in this paper
address the following research questions.
(1) RQ1: (a) Are there changes in the magnitude of anti-Asian messages over time? (b) How do
the trends over time vary across different ethnic groups?
(2) RQ2: (a) How semantically distant are anti-Asian messages when comparing those aimed at
Asians in a general sense to those directed at specific sub-ethnic groups? (b) How do the
semantic distances change over time?
(3) RQ3: (a) How are the topics of anti-Asian messages distributed among messages targeting
Asians in a general sense, those targeting major ethnic groups such as Chinese and Indian,
and those directed at smaller ethnic groups? (b) What are the prevalent topics of these
anti-Asian messages?
We collect 12 months of social conversations on X (formerly known as Twitter) that contain
diverse sub-ethnic group representations within Asian communities. Using this dataset, we
disaggregate anti-Asian toxic messages based on temporal and ethnic breakdowns and conduct
a series of comparative analyses of toxic messages targeting various ethnic groups.
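For illustration, a minimal sketch of this disaggregation step is given below; the DataFrame
columns (`timestamp`, `target_group`, `toxicity`) and the toxicity threshold are hypothetical
placeholders rather than the exact pipeline used in this paper.

```python
# A minimal sketch, not the paper's pipeline: count toxic messages per
# (month, targeted ethnic group) cell. Column names are hypothetical.
import pandas as pd

def disaggregate(df: pd.DataFrame, toxicity_threshold: float = 0.7) -> pd.DataFrame:
    """Temporal x ethnic breakdown of messages above a toxicity threshold."""
    df = df.copy()
    df["month"] = pd.to_datetime(df["timestamp"]).dt.to_period("M")
    toxic = df[df["toxicity"] >= toxicity_threshold]        # keep toxic messages only
    return (
        toxic.groupby(["month", "target_group"])
             .agg(n_messages=("toxicity", "size"),          # message count per cell
                  mean_toxicity=("toxicity", "mean"))       # average toxicity per cell
             .reset_index()
    )

# Example usage (tweets_df is a hypothetical DataFrame of collected posts):
# counts = disaggregate(tweets_df)
# trend = counts.pivot(index="month", columns="target_group", values="n_messages")
```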
Findings from temporal persistence analysis, 𝑛-gram-based correspondence analysis, and topic
modeling reveal several key insights. First, there is a substantial increase in the number of
anti-Asian messages (especially anti-Chinese) in response to the declaration of the pandemic,
but the average toxicity score was not much affected by the pandemic. Second, the results
align with previous research focused on online hatred towards the Chinese ethnicity: toxic
messages broadly referring to ‘Asians’ had more semantic similarities with those targeting
the Chinese ethnicity than with messages aimed at other specific groups within the Asian
community, and the volume of messages targeting other Asian sub-ethnic groups was relatively
low. Third, 𝑛-gram-based analysis shows that toxic messages attacking minority ethnic groups
display semantic features orthogonal to those of majority-ethnicity-attacking (e.g., Chinese,
Indian) or generic-Asian-attacking messages. In contrast, when minority ethnic groups are
analyzed collectively using topic modeling, generic-Asian-attacking messages demonstrate
narrative patterns more similar to the collective set of minority Asian ethnic groups than
to a single large group such as Chinese or Indian.
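To make the 𝑛-gram-based correspondence analysis concrete, the following minimal sketch
builds a groups-by-𝑛-grams frequency table with scikit-learn and computes row principal
coordinates via the standard SVD of standardized residuals; the corpora, group labels, and
𝑛-gram range are placeholders, not the study's data.

```python
# Illustrative sketch of n-gram-based correspondence analysis; the corpora,
# group labels, and n-gram range below are placeholders, not the study's data.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

def correspondence_analysis(freq: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Row (group) principal coordinates of a groups-by-n-grams contingency table."""
    P = freq / freq.sum()                                   # correspondence matrix
    r, c = P.sum(axis=1), P.sum(axis=0)                     # row and column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))      # standardized residuals
    U, sv, _ = np.linalg.svd(S, full_matrices=False)        # SVD gives principal axes
    return (U[:, :n_components] * sv[:n_components]) / np.sqrt(r)[:, None]

# One "document" per targeted group: all toxic messages aimed at that group.
group_corpora = {
    "asian":   ["example message about asians", "another asian-targeted message"],
    "chinese": ["example message about chinese"],
    "indian":  ["example message about indians"],
}
vec = CountVectorizer(ngram_range=(1, 2))                   # unigrams and bigrams
X = vec.fit_transform([" ".join(m) for m in group_corpora.values()]).toarray()
coords = correspondence_analysis(X.astype(float))           # 2-D coordinates per group
```

Groups whose coordinates lie far apart along the principal axes use largely non-overlapping
𝑛-gram vocabularies, which is the sense in which minority-targeting messages appear
"orthogonal" to majority-targeting ones.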
In essence, this study underscores the importance of recognizing and addressing the diversity
of anti-Asian hate speech. Online anti-Asian hate speech is complex and nuanced, encompassing
various ethnic backgrounds and the intricate web of biases that exist both within and beyond
the Asian community. In this sense, a multifaceted and disaggregated data approach is
necessary to understand and combat the hateful discourse. The methodological approaches we
develop in this paper may be useful to researchers and policymakers striving to better
comprehend and confront these pressing challenges, fostering a more inclusive and equitable
digital landscape for all. Importantly, while the primary focus of this study is on Asians,
“panethnicity” is a form of identification observed globally, encompassing communities like
Latino, Yoruba, or Roma [28]. Therefore, disaggregated data practices have universal
applicability in addressing social issues relevant to panethnic communities.
2 RELATED WORK AND PROBLEM STATEMENT
2.1 Online Hate/Toxic Speech Research
Hate and toxic speech involve abusive and aggressive language that attacks a person or group
based on attributes such as race, religion, ethnic origin, national origin, sex, disability,
sexual orientation, or gender identity [4, 11, 20, 33]. Much effort in this research domain
has been devoted to message discovery solutions based on natural language techniques and
models that detect and classify hate speech more efficiently [29, 30, 36, 44]. In particular,
deep learning has emerged as a powerful technique that learns hidden data representations and
achieves better performance in detecting online hate speech [20, 32]. As a computational aid,
state-of-the-art deep learning models such as BERT2 and RoBERTa [22], a robustly optimized
variant of BERT, have been extensively employed [10, 30].
2 Bidirectional Encoder Representations from Transformers [7]
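For illustration only, the minimal sketch below scores messages with a RoBERTa-based hate
speech classifier through the Hugging Face transformers pipeline; the checkpoint identifier
is a hypothetical placeholder rather than any model used in the cited work.

```python
# A hedged sketch, not any cited system: scoring messages with a RoBERTa-based
# hate speech classifier via the Hugging Face `transformers` pipeline.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="your-org/roberta-hate-speech-checkpoint",  # hypothetical model id
)

messages = ["example message one", "example message two"]
for msg, pred in zip(messages, classifier(messages)):
    # Each prediction is a dict with a predicted label and a confidence score.
    print(msg, "->", pred["label"], round(pred["score"], 3))
```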