Unifying Data Perspectivism and Personalization:
An Application to Social Norms
Joan Plepi† and Béla Neuendorf† and Lucie Flek†‡ and Charles Welch†‡
Conversational AI and Social Analytics (CAISA) Lab
†Department of Mathematics and Computer Science, University of Marburg
‡The Hessian Center for Artificial Intelligence (Hessian.AI)
{plepi,neuendob,lucie.flek,welchc}@uni-marburg.de
Abstract
Instead of using a single ground truth for language processing tasks, several recent studies have examined how to represent and predict the labels of the set of annotators. However, often little or no information about annotators is known, or the set of annotators is small. In this work, we examine a corpus of social media posts about conflict from a set of 13k annotators and 210k judgements of social norms. We provide a novel experimental setup that applies personalization methods to the modeling of annotators and compare their effectiveness for predicting the perception of social norms. We further provide an analysis of performance across subsets of social situations that vary by the closeness of the relationship between parties in conflict, and assess where personalization helps the most.
1 Introduction
Obtaining a single ground truth is not possible or necessary for subjective natural language classification tasks (Ovesdotter Alm, 2011). Each annotator is a person with their own feelings, thoughts, experiences, and perspectives (Basile et al., 2021b). In fact, researchers have been calling for the release of data without an aggregated ground truth, and for evaluation that takes individual perspectives into account (Flek, 2020).
The idea that each annotator has their own view of subjective tasks, and even of those previously thought to be objective, was introduced by Basile (2020) as data perspectivism. Growing interest in this viewpoint led to the 1st Workshop on Perspectivist Approaches to NLP in 2022. Work has examined how to model annotators for subjective tasks and to predict each annotator's label (Davani et al., 2021; Fornaciari et al., 2021). Modeling annotator perspectives requires the release of corpora that include annotator-level labels rather than aggregated "ground truth" labels. Bender and Friedman (2018) further recommend releasing data statements that describe characteristics including who is represented in the data and the demographics of annotators. Such information is beneficial for raising awareness of the biases in our data. While some corpora contain this information, like those for humor, emotion recognition, and hateful or offensive language, they contain few annotators and no additional information about them (Meaney et al., 2021; Kennedy et al., 2018; Demszky et al., 2020).
An additional complication for subjective tasks is the fact that different people will interpret text in different ways. What is deemed toxic or offensive depends on who you ask (Sap et al., 2021; Leonardelli et al., 2021). There are notable differences in perceived and intended sarcasm (Oprea and Magdy, 2019; Plepi and Flek, 2021). How one perceives the receptiveness of their own text differs from how others see it (Yeomans et al., 2020). For such tasks, predicting the label given by third-party annotators, without knowing much about them, is not very useful. Modeling annotators with personalization methods requires a corpus with many self-reported labels from many annotators, and additional contextual information about them.
In this work, we use English textual data in the form of posts about social norms from the website Reddit, specifically the subreddit /r/amitheasshole (AITA). As shown in Figure 1, users of this online community post descriptions of situations, often involving interpersonal conflict, and ask other users to judge whether the user acted wrongly in the situation or not. The judgements from these users constitute our labels, and their authors are the set of annotators (and we refer to them as such for the remainder of the paper), which allows us to explore methods to model annotators at a larger scale. We explore methods of personalization to model these annotators and examine how the effectiveness of our approach varies with the social relation between the poster and others in the described situation. We further provide an analysis of how personalization affects demographic groups and how performance varies across individuals.

Our contributions include (1) a discussion of the relation between data perspectivism and personalization, (2) a novel problem setting involving a recently collected dataset with unique properties allowing us to explore these concepts for annotator modeling, and (3) a novel comparison of contemporary personalization methods in this setting.
2 Formulation
We formalize our task in terms of the textual data points, their authors, annotators, and the annotations they provide. A poster, $u$, makes a post, $p$, which is then commented on by an annotator with ID $a$, who provides a comment, $c_{a,p}$, and a label, or verdict, $v_{a,p}$. Since we are modeling annotators, $u$ is not important to us, except that $u \neq a$ within the same post. Each post $p$ has many comments $c_p$; though this is not strictly necessary for our purposes, it does help reveal the subjectivity of the task. Importantly, each annotator, $a$, has many comments, $c_a$. In our case, the comment $c_{a,p_i}$ written by annotator $a$ on the $i$-th post $p_i$ is linked to a single verdict $v_{a,p_i}$, though one could gather these from separate sources, and doing so may be necessary for other corpora. The subjective nature of the task and its evaluation comes from the assumption that annotators provide different verdicts for a post.
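To make this data model concrete, it can be sketched in a few lines of Python. The class and field names below are our own illustrative choices, not the schema of the released dataset:

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    annotator_id: str  # a: ID of the commenting user
    comment: str       # c_{a,p}: text of the comment
    verdict: str       # v_{a,p}: the assigned label, e.g. "YTA" or "NTA"

@dataclass
class Post:
    poster_id: str  # u: the original poster (not modeled, but u != a)
    text: str       # p: the described situation
    annotations: list[Annotation] = field(default_factory=list)

def annotator_history(posts: list[Post], annotator_id: str) -> list[str]:
    """Collect c_a, the comments written by a single annotator across posts."""
    return [ann.comment
            for post in posts
            for ann in post.annotations
            if ann.annotator_id == annotator_id]
```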
Work on annotator modeling attempts to estimate the probability of a verdict given the post and annotator, $p(v_{a,p} \mid a, p)$. This is in contrast to predicting what an individual's language means, $p(v_{a,p} \mid a, p, c_{a,p})$, which we refer to as a personalized classification task. Importantly, we make this distinction because personalization has historically focused on predicting a label assigned to an individual's text in a particular context (e.g., the sentiment of a review), whereas work on modeling annotators focuses on the label an individual would assign in that context.
There is often no information about annotators, or only an ID is known. A few works on annotator modeling include extra information about the annotator, $T$. This information can be defined in various ways (see §3.3). In this work, we use a collection of other texts from the annotator (see §6). To the best of our knowledge, our formulation of $T$ is novel in that it allows the application of previously developed methods for personalization to the task of annotator modeling. Importantly, we are predicting how the annotator will label the post, $p(v_{a,p} \mid a, p, T)$, not how to interpret their text. For other work that has attempted to interpret verdicts, $p(v_{a,p} \mid a, c_{a,p})$, refer to §3.1.

Figure 1: Example of a post on Reddit and two comments. The post has the situation, which comes from the post title and the full text of the post (truncated here). Usernames appear next to the icons of the poster and commenters. Each comment has a verdict, which is the label they assign (YTA or NTA).
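For reference, the prediction targets discussed in this section can be gathered into one display:

    annotator modeling: $p(v_{a,p} \mid a, p)$
    our setting, with annotator context $T$: $p(v_{a,p} \mid a, p, T)$
    personalized classification: $p(v_{a,p} \mid a, p, c_{a,p})$
    verdict interpretation (§3.1): $p(v_{a,p} \mid a, c_{a,p})$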
3 Related Work
Throughout this work, "users" refers to social media users. Within AITA, we refer to the poster as the user who originally made the post, and to annotators as those who commented on the post. Both posters and annotators are authors of their respective posts and comments.
3.1 Social Norms
Lourie et al. (2021) looked at the AITA subreddit to model judgements of social norms. They looked at how to predict the distribution of judgements for a given situation, which indicates how controversial a situation may be. Forbes et al. (2020) expanded on this study by using their data to extract a corpus of rules-of-thumb for social norms. We examine a new dataset, created from the posts in their data but including the set of comments, which comprises the annotators, their labels, and the accompanying comment text (Welch et al., 2022b).
Efstathiadis et al. (2021) examined the classification of verdicts at both the post and comment levels, finding that posts were more difficult to classify. Botzer et al. (2022) also constructed a classifier to predict the verdict given the text of a comment and used it to study the behavior of users in different subreddits. De Candia (2021) found that the subreddits where a user has previously posted can help predict how they will assign judgements. The author manually categorized posts into five categories: family, friendships, work, society, and romantic relationships. They found that posts about society, defined as "any situation concerning politics, racism or gender questions," were the most controversial. Several works have also looked at how the demographic factors or framing of posts affect the received judgements (Zhou et al., 2021; De Candia, 2021; Botzer et al., 2022).
3.2 Personalization
Many different approaches and tasks have used some form of personalization. These methods use demographic factors (Hovy, 2015), personality traits (Lynn et al., 2017), extra-linguistic information such as context or community factors (Bamman and Smith, 2015), or previously written text. A similarity between personalization and annotator modeling is that the most common approach appears to be using author IDs. These have been used, for instance, in sentiment analysis (Mireshghallah et al., 2021), sarcasm detection (Kolchinski and Potts, 2018), and query auto-completion (Jaech and Ostendorf, 2018).

King and Cook (2020) evaluated methods of personalized language modeling, including priming, interpolation, and fine-tuning of n-gram and neural language models. Wu et al. (2020) modeled users by predicting their behaviors online; similarly, one's use of language can be viewed as a behavior. Welch et al. (2020b) modeled users by learning separate embedding matrices for each user in a shared embedding space. Welch et al. (2022a) explored how to model users based on their similarity to others, using the perplexity of personalized models and the predictions of an authorship attribution classifier to generate user representations. In social media in particular, a community graph structure can be used to model relationships between users and their linguistic patterns (Yang and Eisenstein, 2017).
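As a minimal sketch of the author-ID approach shared by personalization and annotator modeling, an ID embedding can be concatenated with a text representation before classification. This is our own illustration under assumed dimensions, not the architecture of any cited paper:

```python
import torch
import torch.nn as nn

class AnnotatorIdClassifier(nn.Module):
    """Toy verdict classifier conditioned on an annotator-ID embedding."""
    def __init__(self, num_annotators: int, text_dim: int,
                 id_dim: int = 64, num_labels: int = 2):
        super().__init__()
        self.id_embeddings = nn.Embedding(num_annotators, id_dim)
        self.classifier = nn.Linear(text_dim + id_dim, num_labels)

    def forward(self, text_repr: torch.Tensor,
                annotator_ids: torch.Tensor) -> torch.Tensor:
        # text_repr: (batch, text_dim), e.g. a [CLS] vector from any encoder
        id_repr = self.id_embeddings(annotator_ids)        # (batch, id_dim)
        combined = torch.cat([text_repr, id_repr], dim=-1)
        return self.classifier(combined)                   # verdict logits
```

In this setup the embedding table is the only per-user state, which makes ID-based conditioning cheap, but it carries no information about annotators unseen during training.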
3.3 Annotator Disagreement
There has been a shift toward viewing annotator disagreement as positive rather than negative (Aroyo and Welty, 2015). Disagreement between annotators is often resolved through majority voting (Nowak and Rüger, 2010). In some cases, label averaging can be used (Sabou et al., 2014), or disagreements can be resolved through adjudication (Waseem and Hovy, 2016). Majority voting, which is most often used, takes away the voice of underrepresented groups in a set of annotators, for instance, older crowd workers (Díaz et al., 2019), and aggregation in general obscures the causes of lower model performance and removes the perspectives of certain sociodemographic groups (Prabhakaran et al., 2021). On the other hand, Geva et al. (2019) use annotator identifiers as features to improve model performance during training. They note that annotator bias is a factor that needs additional thought when creating a dataset.
Fornaciari et al. (2021) predict soft labels for each annotator to model disagreement, which mitigates overfitting and improves performance on aggregated labels across tasks, including less subjective tasks like part-of-speech tagging. Davani et al. (2021) developed a multi-task model to predict all annotators' judgements, finding that this achieves performance similar to or better than models trained on majority-vote labels. They note that a model that predicts multiple labels can also be used to measure uncertainty. They experiment with two datasets, each with fewer than a hundred annotators. This allows them to model all annotators, though they note that training their model on corpora with thousands of annotators, like ours, is not computationally viable.
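To illustrate why that approach stops scaling, here is a minimal sketch (our own, loosely following the description above rather than any published implementation) of a multi-task model with one classification head per annotator:

```python
import torch
import torch.nn as nn

class MultiAnnotatorHeads(nn.Module):
    """Shared text representation feeding one verdict head per annotator."""
    def __init__(self, text_dim: int, num_annotators: int, num_labels: int = 2):
        super().__init__()
        # Parameters grow linearly with the number of annotators, which is
        # why this design breaks down at thousands of annotators.
        self.heads = nn.ModuleList(
            [nn.Linear(text_dim, num_labels) for _ in range(num_annotators)]
        )

    def forward(self, text_repr: torch.Tensor) -> torch.Tensor:
        # text_repr: (batch, text_dim) -> (batch, num_annotators, num_labels)
        return torch.stack([head(text_repr) for head in self.heads], dim=1)
```

Training would sum a loss only over the heads for which an annotation exists; with roughly 13k annotators, as in our corpus, the head parameters alone dominate the model, consistent with the scaling concern noted above.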
Most work models annotators using only their ID. Basile et al. (2021a) have called for extra information about annotators to be taken into account. Some annotation tasks have collected demographic information about annotators (e.g., Sap et al., 2021), or used the confidence of annotators as extra information (Cabitza et al., 2020).
4 Dataset
We use the dataset of Welch et al. (2022b), who collected data from Reddit, an online platform with many separate, focused communities called subreddits. The data is from the AITA subreddit, where members describe a social situation they are involved in and ask members of the community for judgements.