The Ethical Risks of Analyzing Crisis Events on Social Media with Machine Learning


Angelie Kraft1,2,*, Ricardo Usbeck1,2

1 Universität Hamburg, Department of Informatics, Vogt-Kölln-Straße 30, 22527 Hamburg

2 Hamburger Informatik Technologie-Center e.V. (HITeC), Vogt-Kölln-Straße 30, 22527 Hamburg

Abstract

Social media platforms provide a continuous stream of real-time news regarding crisis events on a global scale. Several machine learning methods utilize the crowd-sourced data for the automated detection of crises and the characterization of their precursors and aftermaths. Early detection and localization of crisis-related events can help save lives and economies. Yet, the applied automation methods introduce ethical risks worthy of investigation, especially given their high-stakes societal context. This work identifies and critically examines ethical risk factors of social media analyses of crisis events, focusing on machine learning methods. We aim to sensitize researchers and practitioners to the ethical pitfalls and promote fairer and more reliable designs.

Keywords

crisis informatics, machine learning, artificial intelligence, social media, ethics, risks

1. Introduction

Social media platforms are a bottom-up, community-driven means for real-time information exchange during crisis events [ ]. They are an important tool in keeping citizens and authorities up-to-date in urgent situations [ ]. The shared information can help to establish precautionary measures, organize humanitarian aid, or keep track of missing people. Algorithmic approaches are used to efficiently filter, condense, and extract large amounts of social media posts [ ]. Respective systems nowadays largely rely on deep learning (DL) methods for natural language processing (NLP) [6], computer vision (CV) [7], or multimodal techniques [8].

The COVID-19 pandemic is a contemporary example where privacy and personal liberties were sacrificed for the quick development of new technologies [ ]. Although crisis events ask for fast responses, the innovation process must not happen at the cost of ethical considerations. In this paper, we identify the main ethical risks when analyzing social media content via machine learning (ML) to detect and characterize crises. To scrutinize ethical aspects of technology, we take on a sociotechnical view [ ]: We consider algorithms, their input and output data, as well as the social system within which these are embedded. At the heart of this assessment is the potential long-term impact on people’s well-being, values, expectations, and fair treatment, and ultimately on whom a computer system serves and whom it harms. We elaborate on each of the risks to sensitize practitioners and researchers developing and deploying respective systems.

D2R2’22: International Workshop on Data-driven Resilience Research 2022, July 07, 2022, Leipzig

*Corresponding author.

Email: angelie.kraft@uni-hamburg.de (A. Kraft); ricardo.usbeck@uni-hamburg.de (R. Usbeck)

Web: https://krangelie.github.io/ (A. Kraft); https://www.inf.uni-hamburg.de/en/inst/ab/sems/people/ricardo-usbeck.html (R. Usbeck)

ORCID: 0000-0002-2980-952X (A. Kraft); 0000-0002-0191-7211 (R. Usbeck)

CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

arXiv:2210.03352v1 [cs.LG] 7 Oct 2022

2. Related Work

For several years now, ML methods have been used for the analysis of social media posts regarding various types of natural disasters, like floods, hurricanes, earthquakes, fires, and droughts around the globe [ ]. Systems have been developed to facilitate early warnings and to support disaster responses or damage assessments [ ]. NLP methods can help to distinguish informative from uninformative texts posted on social media, classify the type of crisis event the text belongs to [ ], or the type of crisis-related content that is discussed (e.g., warnings, utilities, needs, affected people [ ]). The same can be done based on photos through CV approaches [ ]. The semantic content of posts can be further leveraged with spatial and/or temporal information to facilitate crisis mapping. For the Chennai flood in 2015, Anbalagan and Valliyammai [ ] built a crisis mapping system that classified related tweets regarding their content type (e.g., requests for help, sympathy, warnings, weather information, infrastructure damages). This information was combined with the geographic coordinates derived from textually mentioned locations via geoparsing. Tools like this, which can identify and locate a crisis-related event, can help emergency responders navigate complex information streams.
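To make the two stages of such a crisis mapping pipeline concrete, the following is a minimal sketch, not the system of Anbalagan and Valliyammai. It assumes a keyword lookup as a stand-in for a trained content-type classifier and a tiny hand-written gazetteer as a stand-in for a geoparser. All names (`KEYWORDS`, `GAZETTEER`, `classify`, `geoparse`, `build_crisis_map`), the label set, and the rounded coordinates are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical content types and trigger words (stand-in for an ML classifier).
KEYWORDS = {
    "request_for_help": ("help", "rescue", "stranded"),
    "warning": ("warning", "alert", "avoid"),
    "infrastructure_damage": ("bridge", "road", "power cut"),
}

# Hypothetical mini-gazetteer; coordinates are approximate and illustrative.
GAZETTEER = {
    "chennai": (13.08, 80.27),
}


@dataclass
class MapPoint:
    label: str
    lat: float
    lon: float
    text: str


def classify(text: str) -> str:
    """Return the first content type whose trigger word occurs in the text."""
    lowered = text.lower()
    for label, words in KEYWORDS.items():
        if any(word in lowered for word in words):
            return label
    return "other"


def geoparse(text: str) -> Optional[Tuple[float, float]]:
    """Return coordinates of the first gazetteer place mentioned, if any."""
    lowered = text.lower()
    for place, coords in GAZETTEER.items():
        if place in lowered:
            return coords
    return None


def build_crisis_map(posts: List[str]) -> List[MapPoint]:
    """Combine content type and location; posts without a location are skipped."""
    points = []
    for post in posts:
        coords = geoparse(post)
        if coords is None:
            continue  # no recognizable place name, so the post never reaches the map
        points.append(MapPoint(classify(post), coords[0], coords[1], post))
    return points
```

In this sketch, any post without a gazetteer match is silently dropped. This is one concrete place where coverage gaps can enter a mapping pipeline of this kind.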

In 2015, Crawford and Finn [ ] outlined different classes of limitations of using social media data in crisis informatics.

Ontological limitations: Social media activities spike around more sensational instances, although crisis onsets are oftentimes followed by long-term effects. So, the time frame of a virtual discourse is not representative of the actual crisis timeline. Further, applications for humanitarian aid have in the past demonstrated a risk of reifying power imbalances: “Although crowdsourcing projects can allow the voices of those closest to a disaster to be heard, some projects most strongly enhance the agency of international humanitarians” (p. 495, [ ]).

Epistemological limitations: The interpretability of social media data is limited by the role that platforms play in shaping the data. Recommendation systems determine what users get to see and share. Moreover, a platform can be seen as a cultural context, with its trends and communicative patterns. Contents may exaggerate real events and be charged with opinion and emotion. Finally, distinguishing between human- and bot-generated messages is not always feasible.

Ethical issues: The main point here is the issue of privacy. Personal statements of users are gathered at a time in which they are especially vulnerable. Their posts oftentimes include sensitive information about location or well-being and the needs of themselves or others. Crawford and Finn [ ] claim that consent must not be sacrificed for “the greater good”.

The privacy issue was also listed as one ethical risk factor by Alexander [ ], alongside the loss of discretion caused by a tendency for sharing intimate details. Moreover, the author pointed out that especially wealthy and technologically literate individuals benefit from digital means of disaster management. This adds to the previously mentioned reification of power imbalances. Finally, the spread of rumors and misinformation through users, as well as ideology-driven governance of platforms, affect the reliability of details and can cause an overall misrepresentation of crises and their causes.

Regarding the use of artificial intelligence (AI) in crisis informatics, Tzachor et al. [ ] highlight issues of the disparate impact of algorithmic outputs, as well as the lack of transparency and trustworthiness of AI models. The authors demand a principle of ethics with urgency [ ] which entails (1) “ethics by design” to consider ethical risks throughout the development process and foresee broader societal impacts, (2) validated robustness of AI systems, and (3) building public trust through independent oversight and transparency.

3. Ethical Risks

The presented work consolidates previous ethical risk assessments of crisis informatics with social media data (Section 2) with an emphasis on ML methods. We expand on previous works by examining recent technological advancements and newer insights on their potential risks. For a better overview, the following sections are sorted by data- and algorithm-related concerns. Please note that there is a conceptual overlap between some of the issues mentioned: e.g., limited representativeness of data is problematic because algorithms capture and reproduce biases [ ]. However, awareness of the problem layers allows for an in-depth understanding and faceted scrutiny of future software.

3.1. Limited Representativeness

To understand who communicates and receives information on social media, it is necessary to take a disaggregated look at user demographics. In 2020, there were more than 3.6 billion social media users worldwide. Facebook ranks first amongst the most popular platforms, with 2.9 billion users as of January 2022. Even though Twitter did not make the top ten list with only 426 million users, it is still the most researched social media platform [ ]. The reason for this might be its easily accessible API for researchers, allowing them to analyze its full stream of posts. By a wide margin, the majority of Twitter users come from the United States or Japan (India ranked third with fewer than half as many users as Japan, as of January 2022). In April 2021, 38.5% of all Twitter users ranged between ages 25 and 34, and 21% were between 35 and 49 years old. These numbers indicate that most research done on Twitter corpora is based on the perceptions of a non-representative sample of people. Here, perception relates to both the reality witnessed by individuals due to spatio-temporal factors, and also to belief and ideology, especially in the context of crisis [15].

Social media platforms use recommendation systems to display content that echoes users’ interests and opinions. The filter bubble hypothesis states that this mechanism leads to isolated echo chambers and polarization of social networks [ ]. Regarding the attention dynamics on social media, some voices recently argued that the Twitter community paid more attention to the 2022 Ukraine crisis than to other wars and genocides happening in the meantime.

1 https://www.statista.com/statistics/278414/number-of-worldwide-social-network-users/

2 https://www.statista.com/statistics/272014/global-social-networks-ranked-by-number-of-users/

3 https://www.statista.com/statistics/242606/number-of-active-twitter-users-in-selected-countries/

4 https://www.statista.com/statistics/283119/age-distribution-of-global-twitter-users

5 https://www.npr.org/sections/goatsandsoda/2022/03/04/1084230259
