ment (Welch et al., 2022b).
Efstathiadis et al. (2021) examined the classifica-
tion of verdicts at both the post and comment levels,
finding that posts were more difficult to classify.
Botzer et al. (2022) also constructed a classifier
to predict the verdict given the text from a com-
ment and used it to study the behavior of users in
different subreddits. De Candia (2021) found that
the subreddits where a user has previously posted
can help predict how they will assign judgements.
The author manually categorized posts into five
categories: family, friendships, work, society, and
romantic relationships. They found that posts about
society, defined as “any situation concerning pol-
itics, racism or gender questions,” were the most
controversial. Several works have also examined how the demographic factors or framing of posts affect the judgements received (Zhou et al., 2021; De Candia, 2021; Botzer et al., 2022).
3.2 Personalization
Many different approaches and tasks have used some form of personalization. These methods draw on demographic factors (Hovy, 2015), personality traits (Lynn et al., 2017), extra-linguistic information such as context or community factors (Bamman and Smith, 2015), or previously written text. Personalization resembles annotator modeling in that the most common approach appears to be the use of author IDs, which have been applied, for instance, to sentiment analysis (Mireshghallah et al., 2021), sarcasm detection (Kolchinski and Potts, 2018), and query auto-completion (Jaech and Ostendorf, 2018).
King and Cook (2020) evaluated methods of per-
sonalized language modeling, including priming,
interpolation, and fine-tuning of n-gram and neu-
ral language models. Wu et al. (2020) modeled
users by predicting their behaviors online. Sim-
ilarly, one’s use of language can be viewed as a
behavior. Welch et al. (2020b) modeled users by
learning separate embedding matrices for each user
in a shared embedding space. Welch et al. (2022a)
explored how to model users based on their simi-
larity to others. They used the perplexity of person-
alized models and the predictions of an authorship
attribution classifier to generate user representa-
tions. In social media in particular, a community
graph structure can be used to model relationships
between users and their linguistic patterns (Yang
and Eisenstein, 2017).
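As a concrete illustration of the ID-based personalization common to these works, the following minimal sketch conditions a classifier on a learned per-user embedding; the encoder, dimensions, and label set are illustrative assumptions rather than the architecture of any cited system.

```python
import torch
import torch.nn as nn

class UserConditionedClassifier(nn.Module):
    """Condition a text classifier on a learned per-user embedding.

    Illustrative sketch only: the encoder, dimensions, and label set are
    assumptions, not the architecture of any specific cited system.
    """

    def __init__(self, num_users, text_dim=768, user_dim=64, num_labels=2):
        super().__init__()
        # One learned vector per author ID, i.e. the ID itself is the feature.
        self.user_embedding = nn.Embedding(num_users, user_dim)
        self.classifier = nn.Linear(text_dim + user_dim, num_labels)

    def forward(self, text_repr, user_ids):
        # text_repr: (batch, text_dim) pooled output of any text encoder
        # user_ids:  (batch,) integer author IDs
        user_repr = self.user_embedding(user_ids)
        return self.classifier(torch.cat([text_repr, user_repr], dim=-1))
```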
3.3 Annotator Disagreement
There has been a shift toward viewing annotator disagreement as positive rather than negative (Aroyo and Welty, 2015). Disagreement between annotators is often resolved through majority voting (Nowak and Rüger, 2010); in some cases labels are averaged (Sabou et al., 2014), or disagreements are resolved through adjudication (Waseem and Hovy, 2016). Majority voting, the most common approach, takes away the voice of underrepresented groups among the annotators, for instance older crowd workers (Díaz et al., 2019), and aggregation in general obscures the causes of lower model performance and removes the perspectives of certain sociodemographic groups (Prabhakaran et al., 2021). On the other hand, Geva et al. (2019) use annotator identifiers as features during training to improve model performance. They note that annotator bias is a factor that needs additional thought when creating a dataset.
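To make the aggregation schemes described above concrete, and to show how they discard minority votes, here is a minimal Python sketch; the verdict labels are only examples.

```python
from collections import Counter

def majority_vote(labels):
    """Resolve disagreement by majority vote; ties are broken arbitrarily."""
    return Counter(labels).most_common(1)[0][0]

def average_label(scores):
    """Resolve disagreement over numeric ratings by averaging instead."""
    return sum(scores) / len(scores)

# Three annotators disagree; aggregation keeps only the majority view.
print(majority_vote(["YTA", "NTA", "NTA"]))  # -> "NTA"
print(average_label([1, 0, 0]))              # -> 0.333...
```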
Fornaciari et al. (2021) predict soft labels for each annotator to model disagreement, which mitigates overfitting and improves performance on aggregated labels across tasks, including less subjective tasks like part-of-speech tagging. Davani et al. (2021) developed a multi-task model that predicts every annotator's judgement, finding that this achieves performance similar to or better than models trained on majority-vote labels. They note that a model that predicts multiple labels can also be used to measure uncertainty. They experiment with two datasets that have fewer than a hundred annotators each, which allows them to model all annotators, though they note that training their model on corpora with thousands of annotators, like ours, is not computationally viable.
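The general setup described here, one output head per annotator over a shared text representation, can be sketched as follows. This is an assumption-laden illustration rather than the specific architecture of Davani et al. (2021); it also makes clear why the number of heads becomes impractical with thousands of annotators.

```python
import torch
import torch.nn as nn

class PerAnnotatorHeads(nn.Module):
    """One classification head per annotator over a shared text representation.

    Illustrative sketch with assumed dimensions and a fixed annotator set;
    the number of heads is what makes thousands of annotators impractical.
    """

    def __init__(self, num_annotators, text_dim=768, num_labels=2):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Linear(text_dim, num_labels) for _ in range(num_annotators)]
        )

    def forward(self, text_repr):
        # text_repr: (batch, text_dim) -> logits: (batch, num_annotators, num_labels)
        return torch.stack([head(text_repr) for head in self.heads], dim=1)

    def disagreement(self, text_repr):
        # Variance of predicted label probabilities across annotator heads,
        # usable as a per-item proxy for uncertainty/disagreement.
        probs = self.forward(text_repr).softmax(dim=-1)
        return probs.var(dim=1).mean(dim=-1)
```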
Most work models annotators using their IDs only. Basile et al. (2021a) have called for extra information about annotators to be taken into account. Some annotation tasks have collected demographic information about annotators (e.g., Sap et al., 2021) or used annotator confidence as extra information (Cabitza et al., 2020).
4 Dataset
We use the dataset of Welch et al. (2022b), who
collected data from Reddit, an online platform with
many separate, focused communities called subred-
dits. The data is from the AITA subreddit, where
members describe a social situation they are in-
volved in, and ask members of the community for