The Ethical Risks of Analyzing Crisis Events on Social Media with Machine Learning

Angelie Kraft1,2,*, Ricardo Usbeck1,2
1 Universität Hamburg, Department of Informatics, Vogt-Kölln-Straße 30, 22527 Hamburg
2 Hamburger Informatik Technologie-Center e.V. (HITeC), Vogt-Kölln-Straße 30, 22527 Hamburg
Abstract
Social media platforms provide a continuous stream of real-time news regarding crisis events on a global
scale. Several machine learning methods utilize the crowd-sourced data for the automated detection of
crises and the characterization of their precursors and aftermaths. Early detection and localization of
crisis-related events can help save lives and economies. Yet, the applied automation methods introduce
ethical risks worthy of investigation, especially given their high-stakes societal context. This work
identifies and critically examines ethical risk factors of social media analyses of crisis events focusing on
machine learning methods. We aim to sensitize researchers and practitioners to the ethical pitfalls and
promote fairer and more reliable designs.
Keywords
crisis informatics, machine learning, artificial intelligence, social media, ethics, risks
1. Introduction
Social media platforms are a bottom-up, community-driven means for real-time information
exchange during crisis events [1]. They are an important tool in keeping citizens and authorities
up-to-date in urgent situations [2, 3]. The shared information can help to establish precautionary
measures, organize humanitarian aid, or keep track of missing people. Algorithmic approaches
are used to efficiently filter, condense, and extract large amounts of social media posts [4, 5].
Respective systems nowadays largely rely on deep learning (DL) methods for natural language
processing (NLP) [6], computer vision (CV) [7], or multimodal techniques [8].
The COVID-19 pandemic is a contemporary example where privacy and personal liberties
were sacrificed for the quick development of new technologies [9]. Although crisis events ask for
fast responses, the innovation process must not happen at the cost of ethical considerations. In
this paper, we identify the main ethical risks when analyzing social media content via machine
learning (ML) to detect and characterize crises. To scrutinize ethical aspects of technology, we
take on a sociotechnical view [10]: We consider algorithms, their input and output data, as well
as the social system within which these are embedded. At the heart of this assessment is the
potential long-term impact on people’s well-being, values, expectations, and fair treatment, and
ultimately on whom a computer system serves and whom it harms. We elaborate on each of the
risks to sensitize practitioners and researchers developing and deploying respective systems.

D2R2’22: International Workshop on Data-driven Resilience Research 2022, July 07, 2022, Leipzig
*Corresponding author.
Email: angelie.kraft@uni-hamburg.de (A. Kraft); ricardo.usbeck@uni-hamburg.de (R. Usbeck)
Web: https://krangelie.github.io/ (A. Kraft);
https://www.inf.uni-hamburg.de/en/inst/ab/sems/people/ricardo-usbeck.html (R. Usbeck)
ORCID: 0000-0002-2980-952X (A. Kraft); 0000-0002-0191-7211 (R. Usbeck)
©2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
arXiv:2210.03352v1 [cs.LG] 7 Oct 2022
2. Related Work
For several years now, ML methods have been used for the analysis of social media posts
regarding various types of natural disasters, like floods, hurricanes, earthquakes, fires, and
droughts around the globe [5]. Systems have been developed to facilitate early warnings and to
support disaster responses or damage assessments [4]. NLP methods can help to distinguish
informative from uninformative texts posted on social media, classify the type of crisis event
the text belongs to [6, 11], or the type of crisis-related content that is discussed (e.g., warnings,
utilities, needs, affected people [4]). The same can be done based on photos through CV
approaches [8]. The semantic content of posts can be further leveraged with spatial and/or
temporal information to facilitate crisis mapping. For the Chennai flood in 2015, Anbalagan
and Valliyammai [2] built a crisis mapping system that classified related tweets regarding their
content type (e.g., requests for help, sympathy, warnings, weather information, infrastructure
damages, etc.). This information was combined with the geographic coordinates derived from
textually mentioned locations via geoparsing. Tools like this, which can identify and locate
crisis-related events, can help emergency responders navigate complex information streams.
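To make the two steps of such a crisis mapping pipeline concrete (content-type classification followed by geoparsing), here is a minimal Python sketch. It is an illustration only and not the system of Anbalagan and Valliyammai [2]: the keyword lists stand in for a trained classifier, the tiny gazetteer with approximate coordinates stands in for a real geoparser, and all names (CONTENT_KEYWORDS, GAZETTEER, map_post) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# HYPOTHETICAL keyword lists; a real system would use a trained classifier.
CONTENT_KEYWORDS = {
    "request_for_help": ("help", "rescue", "stranded", "need"),
    "warning": ("warning", "alert", "avoid"),
    "infrastructure_damage": ("damaged", "collapsed", "power cut"),
}

# HYPOTHETICAL toy gazetteer: place name -> approximate (latitude, longitude).
GAZETTEER = {
    "chennai": (13.08, 80.27),
    "velachery": (12.98, 80.22),
}


@dataclass
class MappedPost:
    text: str
    content_type: str
    location: Optional[str]
    coordinates: Optional[Tuple[float, float]]


def classify_content(text: str) -> str:
    """Return the first content type whose keywords occur in the text."""
    lowered = text.lower()
    for label, keywords in CONTENT_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return label
    return "other"


def geoparse(text: str):
    """Return the first gazetteer place mentioned in the text, if any."""
    lowered = text.lower()
    for place, coordinates in GAZETTEER.items():
        if place in lowered:
            return place, coordinates
    return None, None


def map_post(text: str) -> MappedPost:
    """Combine content type and derived coordinates into one record."""
    place, coordinates = geoparse(text)
    return MappedPost(text, classify_content(text), place, coordinates)


if __name__ == "__main__":
    print(map_post("Family stranded on a rooftop in Velachery, please send a boat"))
```

Even this toy version hints at the risks discussed later: substring matching misreads context, and posts that name no listed place, or that use a different language or spelling, silently drop off the map.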
In 2015, Crawford and Finn [12] outlined different classes of limitations of using social media
data in crisis informatics. Ontological limitations: Social media activities spike around more
sensational instances, although crisis onsets are oftentimes followed up by long-term effects.
So, the time frame of a virtual discourse is not representative of the actual crisis timeline.
Further, applications for humanitarian aid have in the past demonstrated a risk of reifying
power imbalances: “Although crowdsourcing projects can allow the voices of those closest
to a disaster to be heard, some projects most strongly enhance the agency of international
humanitarians” (p. 495, [12]). Epistemological limitations: The interpretability of social
media data is limited by the role that platforms play in shaping the data. Recommendation
systems determine what users get to see and share. Moreover, a platform can be seen as a
cultural context, with its trends and communicative patterns. Contents may exaggerate real
events and be charged with opinion and emotion. Finally, distinguishing between human- and
bot-generated messages is not always feasible. Ethical issues: The main point here is the issue
of privacy. Personal statements of users are gathered at a time in which they are especially
vulnerable. Their posts oftentimes include sensitive information about location or well-being
and the needs of themselves or others. Crawford and Finn [12] claim that consent must not be
sacrificed for “the greater good”.
The privacy issue was also listed as one ethical risk factor by Alexander [13], alongside
the loss of discretion caused by a tendency for sharing intimate details. Moreover, the author
pointed out that especially wealthy and technologically literate individuals benefit from
digital means of disaster management. This adds to the previously mentioned reification of
power imbalances. Finally, the spread of rumors and misinformation through users, as well as
ideology-driven governance of platforms affect the reliability of details and can cause an overall
misrepresentation of crises and their causes.
Regarding the use of artificial intelligence (AI) in crisis informatics, Tzachor et al. [9] highlight
issues of the disparate impact of algorithmic outputs, as well as the lack of transparency and
trustworthiness of AI models. The authors demand a principle of ethics with urgency [9],
which entails (1) “ethics by design” to consider ethical risks throughout the development
process and foresee broader societal impacts, (2) validated robustness of AI systems, and (3)
building public trust through independent oversight and transparency.
3. Ethical Risks
The presented work consolidates previous ethical risk assessments of crisis informatics with
social media data (Section 2) with an emphasis on ML methods. We expand on previous works
by examining recent technological advancements and newer insights on their potential risks.
For a better overview, the following sections are sorted by data- and algorithm-related concerns.
Please note that there is a conceptual overlap between some of the issues mentioned: e.g.,
limited representativeness of data is problematic because algorithms capture and reproduce
biases [14]. However, awareness of the problem layers allows for an in-depth understanding
and faceted scrutiny of future software.
3.1. Limited Representativeness
To understand who communicates and receives information on social media, it is necessary
to take a disaggregated look at user demographics. In 2020, there were more than 3.6 billion
social media users worldwide.¹ Facebook ranks first amongst the most popular platforms, with
2.9 billion users as of January 2022.² Even though Twitter did not make the top ten list with
only 426 million users, it is still the most researched social media platform [4]. The reason for
this might be its easily accessible API for researchers, allowing them to analyze its full stream
of posts. By a wide margin, the majority of Twitter users come from the United States or Japan
(India ranked third with less than half of the number of users in Japan, as of January 2022).³ In
April 2021, 38.5% of all Twitter users ranged between ages 25 and 34, and 21% were between 35
and 49 years old.⁴ These numbers indicate that most research done on Twitter corpora is based
on the perceptions of a non-representative sample of people. Here, perception relates to
both the reality witnessed by individuals due to spatio-temporal factors, and also to belief and
ideology, especially in the context of crisis [15].
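One simple way to quantify such a skew is to compare each group's share in the platform sample with its share in a reference population. The Python sketch below does this for the two age shares reported above. It is a hedged illustration: the function name representation_ratios is our own, and the reference shares are HYPOTHETICAL placeholders, not census figures, so the resulting ratios only demonstrate the calculation.

```python
from typing import Dict


def representation_ratios(
    sample: Dict[str, float], reference: Dict[str, float]
) -> Dict[str, float]:
    """Divide each group's sample share by its reference share.

    A ratio above 1 means the group is over-represented in the sample,
    a ratio below 1 means it is under-represented.
    """
    ratios = {}
    for group, sample_share in sample.items():
        reference_share = reference.get(group)
        if reference_share is None or reference_share <= 0:
            continue  # no usable baseline for this group
        ratios[group] = sample_share / reference_share
    return ratios


# Shares of Twitter users by age group as reported in the text (April 2021).
twitter_age_shares = {"25-34": 0.385, "35-49": 0.21}

# HYPOTHETICAL placeholder baseline; replace with real population statistics.
reference_age_shares = {"25-34": 0.20, "35-49": 0.20}

# Combined share of users aged 25 to 49 according to the reported figures.
combined = sum(twitter_age_shares.values())

ratios = representation_ratios(twitter_age_shares, reference_age_shares)

if __name__ == "__main__":
    print(f"Share of users aged 25-49: {combined:.1%}")
    for group, ratio in ratios.items():
        print(f"{group}: representation ratio {ratio:.2f}")
```

The reported figures alone already show that roughly six in ten users fall between ages 25 and 49. Whether that counts as over-representation depends entirely on the baseline chosen, which is itself a design decision worth documenting.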
Social media platforms use recommendation systems to display content that echoes users’
interests and opinions. The filter bubble hypothesis states that this mechanism leads to
isolated echo chambers and polarization of social networks [16]. Regarding the attention
dynamics on social media, some voices recently argued that the Twitter community paid more
attention to the 2022 Ukraine crisis than to other wars and genocides happening in the meantime.⁵
¹ https://www.statista.com/statistics/278414/number-of-worldwide-social-network-users/
² https://www.statista.com/statistics/272014/global-social-networks-ranked-by-number-of-users/
³ https://www.statista.com/statistics/242606/number-of-active-twitter-users-in-selected-countries/
⁴ https://www.statista.com/statistics/283119/age-distribution-of-global-twitter-users
⁵ https://www.npr.org/sections/goatsandsoda/2022/03/04/1084230259