Not All Asians are the Same A Disaggregated Approach to Identifying Anti-Asian Racism in Social Media

Not All Asians are the Same: A Disaggregated Approach to
Identifying Anti-Asian Racism in Social Media
Fan Wu
Arizona State University
Tempe, Arizona, USA
Sanyam Lakhanpal
Arizona State University
Tempe, Arizona, USA
Qian Li
Arizona State University
Tempe, Arizona, USA
Kookjin Lee
Arizona State University
Tempe, Arizona, USA
Kookjin.Lee@asu.edu
Doowon Kim
University of Tennessee, Knoxville
Knoxville, Tennessee, USA
Heewon Chae
Arizona State University
Tempe, Arizona, USA
Hazel K. Kwon
Arizona State University
Tempe, Arizona, USA
ABSTRACT
Recent policy initiatives have acknowledged the importance of dis-
aggregating data pertaining to diverse Asian ethnic communities
to gain a more comprehensive understanding of their current sta-
tus and to improve their overall well-being. However, research on
anti-Asian racism has thus far fallen short of properly incorporat-
ing data disaggregation practices. Our study addresses this gap by
collecting 12-month-long data from X (formerly known as Twit-
ter) that contain diverse sub-ethnic group representations within
Asian communities. In this dataset, we break down anti-Asian toxic
messages based on both temporal and ethnic factors and conduct a
series of comparative analyses of toxic messages targeting different
ethnic groups. Using temporal persistence analysis, 𝑛-gram-based
correspondence analysis, and topic modeling, this study provides
compelling evidence that anti-Asian messages comprise various
distinctive narratives. Certain messages targeting sub-ethnic Asian
groups entail different topics that distinguish them from those targeting
Asians in a generic manner or those aimed at major ethnic
groups, such as Chinese and Indian. By introducing several tech-
niques that facilitate comparisons of online anti-Asian hate towards
diverse ethnic communities, this study highlights the importance
of taking a nuanced and disaggregated approach for understanding
racial hatred to formulate effective mitigation strategies.
CCS CONCEPTS
General and reference → General conference proceedings;
Social and professional topics → Race and ethnicity;
Networks → Social media networks.
Corresponding author: Kookjin Lee (Kookjin.Lee@asu.edu)
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,
to post on servers or to redistribute to lists, requires prior specific permission and/or a
fee. Request permissions from permissions@acm.org.
WWW ’24, MAY 13 – 17, 2024, Singapore
©2024 Association for Computing Machinery.
ACM ISBN 978-1-4503-XXXX-X/18/06. . . $15.00
https://doi.org/XXXXXXX.XXXXXXX
KEYWORDS
Anti-Asian sentiment, Racism against Asian, Panethnicity, Disaggre-
gated Asian American data, Topic modeling, Social media mining
ACM Reference Format:
Fan Wu, Sanyam Lakhanpal, Qian Li, Kookjin Lee, Doowon Kim, Hee-
won Chae, and Hazel K. Kwon. 2024. Not All Asians are the Same: A
Disaggregated Approach to Identifying Anti-Asian Racism in Social Me-
dia. In Proceedings of (WWW ’24). ACM, New York, NY, USA, 15 pages.
https://doi.org/XXXXXXX.XXXXXXX
1 INTRODUCTION
In 2023, the U.S. government released its inaugural report of the
White House Initiative on Asian Americans, Native Hawaiians, and
Pacific Islanders (WHIAANHPI) [39], which aims to develop strategies
to enhance justice, equity, and the overall well-being of this
population (collectively referred to as Asians hereafter). One of the key
priorities of this initiative is to “make disaggregated data collection
and reporting the norm” across the federal agencies (WHIAANHPI
[39, p. 22]). Given the diverse range of ethnic groups within the
Asian American population, the use of disaggregated data practices
is imperative for attaining a thorough understanding of these
distinct Asian communities and for relevant policy-making [43]. For
example, when information is reported in an aggregated manner,
the average cancer rate for Asian women is lower than that for
white women. However, when examining segmented records, it
becomes evident that Laotian women have cancer rates more than
nine times higher than those for white women (WHIAANHPI
[39, p. 22]). This difference highlights the critical need for
disaggregated data, as it reveals the significant disparities within the
Asian American population, enabling policymakers to develop targeted and
effective interventions for specific communities like Laotian women.
Indeed, the importance of collecting and reporting disaggregated
data extends beyond Asian Americans and should be applied to all
“panethnic” communities worldwide [28].
Addressing anti-Asian hate can also benefit from disaggregated
data practices. Research on anti-Asian hate has attracted significant
attention, especially in response to the surge in Sinophobia, a fear
or dislike of China or its people, and hate crimes targeting Asians
in the midst of the COVID-19 pandemic. Negative sentiments towards
China and Chinese, as evidenced by derogatory labels such
as “Chinese virus,” along with implicit biases against Asians, have
increased during the pandemic [6, 38, 45]. Federal law enforcement
agencies in the U.S. have warned of the surge in anti-Asian hate crimes
during this period [23]. Various advocacy efforts, including hashtag
campaigns such as “#racismisvirus” and “#stopAsianhate,” have also
emerged to counter such anti-Asian sentiments and hate crimes.
arXiv:2210.11640v2 [cs.SI] 12 Feb 2024
As a result, the majority of recent studies on anti-Asian hate
have utilized datasets pertaining to the influence of the COVID-19
pandemic, focusing on the evidence and consequences of Sinophobia
[34, 35, 37]. While the pandemic has undoubtedly served as an
important backdrop for recent Asian hate research, existing literature
has failed to fully acknowledge the problem of anti-Asian
sentiments as an enduring social issue that transcends being merely
a byproduct of the pandemic. Furthermore, it does not adequately
acknowledge that the problem of anti-Asian hate affects a wide
range of ethnic groups within Asian populations, extending beyond
the Chinese community.
The purpose of this study is to fill this void by examining online
anti-Asian hate using a disaggregated-data approach. In particular,
this study broadens the observation period to cover an extended
time frame that encompasses the pre-pandemic, peak pandemic, and
post-peak pandemic phases, and conducts comparative analyses
using disaggregated data based on both temporal and sub-ethnic
breakdowns. This disaggregated approach enables the identification
of nuanced distinctions in the animosity directed toward different
ethnic groups within Asian populations. Moreover, it facilitates a
deeper understanding of the intricate inter-ethnic dynamics within
pan-Asian communities.1
The study aims to contribute to the literature by (1) creating
a longitudinal multi-ethnic Asian hate dataset, (2) investigating
temporal trends of anti-Asian messages on X (formerly known as
Twitter), and (3) introducing techniques that enable comparisons of
anti-Asian topics across multiple ethnic communities within pan-
Asian populations. The empirical results presented in this paper
address the following research questions.
(1) RQ1: (a) Are there changes in the magnitude of anti-Asian
messages over time? (b) How do the trends over time vary
across different ethnic groups?
(2) RQ2: (a) How semantically distant are anti-Asian messages
when comparing those aimed at Asians in a general sense to
those directed at specific sub-ethnic groups? (b) How do the
semantic distances change over time?
(3) RQ3: (a) What are the prevalent topics of anti-Asian
messages? (b) How are these topics distributed among messages
targeting Asians in a general sense, those targeting major
ethnic groups like Chinese and Indian, and those directed at
smaller ethnic groups?
We collect 12 months of social conversations on X (formerly
known as Twitter) that contain diverse sub-ethnic group representations
within Asian communities. Using this dataset, we disaggregate
anti-Asian toxic messages based on temporal and ethnic
breakdowns and conduct a series of comparative analyses of toxic
messages targeting various ethnic groups.
1 https://www.pewresearch.org/race-ethnicity/2022/08/02/what-it-means-to-be-asian-in-america/
Findings from temporal persistence analysis, 𝑛-gram-based
correspondence analysis, and topic modeling reveal several key insights.
First, there is a substantial increase in the number of anti-Asian
messages (especially anti-Chinese) in response to the declaration of the
pandemic, but the average toxicity score has not been much affected by
the pandemic. Second, results align with previous research focused
on online hatred towards the Chinese ethnicity, highlighting that
toxic messages broadly referring to ‘Asians’ had more semantic
similarities with those targeting the Chinese ethnicity than with
messages aimed at other specific groups within the Asian community,
and that the volume of messages targeting other sub-Asian ethnic
groups was relatively low. Third, 𝑛-gram-based analysis shows that
toxic messages that attack minority ethnic groups display orthogonal
semantic features compared to majority-ethnicity-attacking
(e.g., Chinese, Indian) or generic-Asian-attacking messages. In contrast,
when analyzing minority ethnic groups collectively using
topic modeling, generic-Asian-attacking messages demonstrate
more similar narrative patterns to the collective set of minority
Asian ethnic groups than to a single large group such as Chinese
or Indian.
In essence, this study underscores the importance of recognizing
and addressing the diversity of anti-Asian hate speech. Online anti-
Asian hate speech is complex and nuanced, encompassing various
ethnic backgrounds and the intricate web of biases that exist both
within and beyond the Asian community. In this sense, a multifac-
eted and disaggregated data approach is necessary to understand
and combat the hateful discourse. The methodological approaches
we develop in this paper may be useful to researchers and policy-
makers striving to better comprehend and confront these pressing
challenges, fostering a more inclusive and equitable digital land-
scape for all. Importantly, while the primary focus of this study is
on Asians, “panethnicity” is a form of identification observed globally,
encompassing communities like Latino, Yoruba, or Roma [28].
Therefore, disaggregated data practices have universal applicability
in addressing social issues relevant to panethnic communities.
2 RELATED WORK AND PROBLEM STATEMENT
2.1 Online Hate/Toxic Speech Research
Hate and toxic speech involves abusive and aggressive language
that attacks a person or group based on attributes such as race,
religion, ethnic origin, national origin, sex, disability, sexual
orientation, or gender identity [4, 11, 20, 33]. Much effort in this research
domain has gone into message discovery solutions based on natural
language techniques and models that detect and classify hate
speech more efficiently [29, 30, 36, 44]. In particular, deep learning
has emerged as a powerful technique that learns hidden data
representations and achieves better performance in detecting online
hate speech [20, 32]. As computational aids, state-of-the-art
deep learning models such as BERT2, a fine-tuned BERT model, and
RoBERTa [22] have been extensively employed [10, 30].
2 Bidirectional Encoder Representations from Transformers [7]
2.2 Online Anti-Asian Hate Speech Research
Anti-Asian hate speech has recently received attention in response
to the outbreak of COVID-19, during which racism and hateful
messages against Asians have become rampant [12, 17, 20, 46]. Online
anti-Asian hate speech research has evolved into four types:
COVID-specific hate speech, general anti-Asian sentiments, anti-Chinese
political sentiments, and counter-hate movements such
as “#racismisvirus” and “#stopAsianhate” [21]. Like previous studies
on racist hate speech, anti-Asian speech research has focused
on detecting and classifying anti-Asian toxic content [20, 21, 42].
Most of these studies have centered specifically on the COVID-19
pandemic. For example, one study introduced a new classifier that
identifies and categorizes online anti-Asian tweets during COVID-19
into four classes: hostility against East Asia, criticism of East
Asia, meta-discussions of East Asian prejudice, and a neutral class
[42]. Several studies have focused on the trends and features of
anti-Asian sentiment during COVID-19 [12, 19, 27] and found that
antipathy against Chinese had spillover effects on Asians in general.
One study uses a large-scale web-based media database to compare
global sentiments toward Asians across 20 countries before and
after the pandemic, finding that even though anti-Asian sentiments
are deep-seated and predicated on structural undercurrents of culture,
the pandemic has indirectly and inadvertently exacerbated
those anti-Asian sentiments [27].
2.3 Filling the Void: Considering Temporal and Ethnic Heterogeneity in Asian Hate Speech
While existing research has developed various statistical/machine
learning (ML) techniques (e.g., hate speech detection) to identify
patterns in anti-Asian sentiments of online speech, the vast majority
of this research has been situated in a specific empirical context, that
is, the COVID-19 pandemic, resulting in a rather skewed research
trend. Although COVID-19 has resurfaced concerns about anti-Asian
hate, anti-Asian racism has been an enduring problem of
inter-ethnic relations. Furthermore, empirical datasets related to
COVID-19 often feature a disproportionately large number of messages
concerning China and Chinese, leading to an assessment of
anti-Asian sentiments that is centered around Chinese-related content
[34, 35, 37]. Even many studies that examine a generically defined
‘Asians’ have (misleadingly) alluded to Asians as being a
homogeneous unity, dismissing the essence of “panethnicity” [28]:
that Asian is a concept that bridges very diverse sub-ethnic groups.
While those statistical/ML methods have gained traction as a
pragmatic solution to mitigate the discursive “pollution” in digital
information commons [26], critics point out that such models often
miss contextual nuances, such as bias in different demographic and
psychographic subgroups [13]. Some researchers have called for a
more proactive mitigation strategy beyond automated detection. For
example, one study suggested that a polarized-opinion sentiment
analyzer system can be used as a plug-in by Twitter to detect and
stop hate speech on its platform [41]. This study recognizes this void
in the existing literature: the predominant focus on the context of
COVID-19 and the neglect of the importance of disaggregating
online hateful messages directed at Asians.
3 DATASETS
3.1 Data Collection
We collect 2.6 million messages from X (Twitter at the time of the
data collection) using its APIs for academic access. The search
period is set from August 2019 to July 2020 to include tweets
from pre-COVID-19 and post-COVID-19 peak periods. We use
search keywords that are related to Asia and 21 sub-ethnic categories
based on the U.S. Census Bureau breakdown.3 We purposely
choose generic keywords to avoid collecting tweets that are
only specific to an event (e.g., COVID-19). A complete list of the
chosen search keywords is shown in Appendix B.1. With the specified
period and keywords, the initial data set includes 10 million
tweets, of which 96.3% contain one of eight major keywords:
‘China’ (+‘Chinese’) (31.5%), ‘India’ (+‘Indian’) (19%), ‘Japan’
(+‘Japanese’) (16.7%), ‘Korea’ (+‘Korean’) (11.2%), ‘Asia’ (+‘Asian’)
(10.8%), ‘Pakistan’ (+‘Pakistanis’) (3.1%), ‘Vietnam’ (+‘Vietnamese’)
(2.3%), and ‘Indonesia’ (+‘Indonesian’) (1.7%). Other search keywords
result in less than 12% of the collected tweets.
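As a rough illustration of how keyword shares like those above might be tallied, the following Python sketch assigns each tweet to the keyword groups it mentions and reports the fraction of tweets per group. The `KEYWORD_GROUPS` table, the whole-word regular-expression matching, and the function names are assumptions made for illustration only; the actual query and counting procedure are not specified in this section, and the complete keyword list is the one in Appendix B.1.

```python
import re
from collections import Counter

# Hypothetical table: only the eight major keyword groups named in the text.
KEYWORD_GROUPS = {
    "China": ["china", "chinese"],
    "India": ["india", "indian"],
    "Japan": ["japan", "japanese"],
    "Korea": ["korea", "korean"],
    "Asia": ["asia", "asian"],
    "Pakistan": ["pakistan", "pakistanis"],
    "Vietnam": ["vietnam", "vietnamese"],
    "Indonesia": ["indonesia", "indonesian"],
}

# One case-insensitive, whole-word pattern per group (an assumed matching rule).
PATTERNS = {
    group: re.compile(r"\b(?:" + "|".join(words) + r")\b", re.IGNORECASE)
    for group, words in KEYWORD_GROUPS.items()
}


def matched_groups(text):
    """Return the keyword groups mentioned in a tweet, in table order."""
    return [group for group, pattern in PATTERNS.items() if pattern.search(text)]


def keyword_shares(tweets):
    """Fraction of tweets mentioning each keyword group."""
    counts = Counter()
    for text in tweets:
        counts.update(matched_groups(text))
    total = len(tweets)
    if total == 0:
        return {}
    return {group: counts[group] / total for group in KEYWORD_GROUPS}
```

Because a single tweet can mention several groups, the shares computed this way need not sum to one; whole-word matching also keeps a word such as "Caucasian" from being counted under ‘Asia’.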
3.2 Preprocessing
3.2.1 Perspective API. Among the Perspective API’s emotional
attributes, we refer to the ‘toxicity’ score for an initial examination of
our data. The score lies in [0, 1], with the highest
score, 1, being the most toxic. Toxicity is defined as “a rude,
disrespectful, or unreasonable comment that is likely to make you leave
a discussion”. Toxicity is known to yield the most reliable score
and has been widely used in previous studies [14, 16]. However,
relying solely on the toxicity score could both admit false positives
(toxic tweets that are not anti-Asian) and miss anti-Asian tweets
(false negatives), because anti-Asian sentiment
is not always expressed in a toxic manner (see Table 4 in the Appendix
for an example). Accordingly, in addition to the toxicity score, we
introduce a manually annotated label, which indicates whether a
tweet contains anti-Asian sentiment. We describe this label in detail in
the following.
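A minimal sketch of obtaining a toxicity score is given below. It assumes the publicly documented Perspective `comments:analyze` REST endpoint and its `TOXICITY` attribute; the helper names, the `languages` setting, and the use of `urllib` are illustrative choices, not a description of the pipeline used in this study. Calling `score_toxicity` requires a valid API key and network access.

```python
import json
import urllib.request

# Endpoint as given in Perspective's public documentation (assumed unchanged).
ANALYZE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def build_request(text):
    """Build the JSON payload asking for the TOXICITY attribute only."""
    return {
        "comment": {"text": text},
        "languages": ["en"],  # assumption: English-language tweets
        "requestedAttributes": {"TOXICITY": {}},
    }


def extract_toxicity(response):
    """Pull the summary toxicity score (a value in [0, 1]) out of a response."""
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]


def score_toxicity(text, api_key):
    """Send one tweet to the API and return its toxicity score."""
    body = json.dumps(build_request(text)).encode("utf-8")
    request = urllib.request.Request(
        f"{ANALYZE_URL}?key={api_key}",
        data=body,  # supplying data makes this a POST request
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return extract_toxicity(json.load(reply))
```

Keeping the payload construction and response parsing separate from the network call makes both easy to check offline, for example against a stored response.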
3.2.2 Manual coding. Although the Perspective API provides
scores that reliably reflect the likelihood of a tweet being toxic,
it is difficult to tell whether the toxic expression
is directed at Asians or at the specific ethnic groups we are
interested in. Likewise, anti-Asian tweets that
have a low toxicity score could be overlooked. To address this issue, we
manually annotate subsampled tweets to obtain more target-indicative
information. For subsampling, we first divide the collected tweets into weekly
batches and sort them based on the corresponding toxicity scores.
From each weekly batch, we randomly sample 20 tweets from each of ten
groups defined by toxicity score (200 tweets per week), resulting in
10,400 tweets in total:
Group 1: 20 tweets with scores in [0, 0.1],
...
Group 10: 20 tweets with scores in [0.9, 1.0].
Human annotators then manually label each sampled tweet on the following question:
[Anti-Asian] Does this tweet contain “anti-Asian” sentiment? (True/False).
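The weekly, toxicity-stratified subsampling used to select tweets for annotation can be sketched as follows. The sketch assumes each tweet is a dictionary with hypothetical `week` and `toxicity` fields. Two details are choices made here rather than stated in the text: a score that falls on a group boundary is placed in the upper group (up to floating-point rounding), and a group holding fewer than 20 tweets contributes all of them.

```python
import random
from collections import defaultdict


def toxicity_group(score, n_groups=10):
    """Map a score in [0, 1] to a group index 1..n_groups (1.0 goes to the last)."""
    return min(int(score * n_groups), n_groups - 1) + 1


def stratified_weekly_sample(tweets, per_group=20, n_groups=10, seed=0):
    """Sample up to `per_group` tweets from every (week, toxicity group) cell."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    cells = defaultdict(list)
    for tweet in tweets:
        key = (tweet["week"], toxicity_group(tweet["toxicity"], n_groups))
        cells[key].append(tweet)
    sample = []
    for key in sorted(cells):
        pool = cells[key]
        sample.extend(rng.sample(pool, min(per_group, len(pool))))
    return sample
```

With 52 weekly batches, ten groups, and at least 20 tweets in every cell, this yields 52 × 10 × 20 = 10,400 tweets, matching the total stated above.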
3 https://www.census.gov/library/stories/2022/05/aanhpi-population-diverse-geographically-dispersed.html