ModSandbox: Facilitating Online Community Moderation Through Error
Prediction and Improvement of Automated Rules
JEAN Y. SONG*, DGIST, Republic of Korea
SANGWOOK LEE*, KAIST, Republic of Korea
JISOO LEE, Beeble Inc., Republic of Korea
MINA KIM, Kakao Corp., Republic of Korea
JUHO KIM, KAIST, Republic of Korea
Fig. 1. ModSandbox supports online community moderators with error prediction and improvement of their automated rules. Moderators currently monitor their community and update the rules to catch the target posts (posts they want to filter) based on their previous experience with false positives and negatives. ModSandbox provides features to help predict possible false positives and false negatives using existing posts (Sandbox Environment and FP/FN Recommendation), and to improve automated rules (FP/FN Collection and Automated Rule Analysis).
Despite the common use of rule-based tools for online content moderation, human moderators still spend a lot of time monitoring them to ensure that they work as intended. Based on surveys and interviews with Reddit moderators who use AutoModerator, we identified the main challenges in reducing false positives and false negatives of automated rules: not being able to estimate the actual effect of a rule in advance and having difficulty figuring out how the rules should be updated. To address these issues, we built ModSandbox, a novel virtual sandbox system that detects possible false positives and false negatives of a rule to be improved and visualizes which part of the rule is causing issues. We conducted a user study with online content moderators, finding that ModSandbox can support quickly finding possible false positives and false negatives of automated rules and guide moderators to update those to reduce future errors.
CCS Concepts: • Human-centered computing → Human computer interaction (HCI).
Additional Key Words and Phrases: sociotechnical systems; moderation; automated moderation bots; online communities; virtual
sandbox; human-AI collaboration
*Both authors contributed equally to this research.
Authors’ addresses: Jean Y. Song, jeansong@dgist.ac.kr, DGIST, Daegu, Republic of Korea; Sangwook Lee, sangwooklee@kaist.ac.kr, KAIST, Daejeon, Republic of Korea; Jisoo Lee, Beeble Inc., Seoul, Republic of Korea, jisoo.lee@beeble.ai; Mina Kim, Kakao Corp., Jeju, Republic of Korea, iamhappy537@gmail.com; Juho Kim, KAIST, Daejeon, Republic of Korea, juhokim@kaist.ac.kr.
arXiv:2210.09569v1 [cs.HC] 18 Oct 2022
1 INTRODUCTION
Communities on social platforms such as Reddit, Discord, and Twitch have groups of users who volunteer to moderate their communities, called online moderators [23, 30]. They respond to behavior from community members that violates rules and work to improve the overall interaction experience between community members [18, 40]. In contrast to large social platform companies like Facebook and Twitter, which apply machine learning algorithms to regulate user-generated content at scale [17], moderators of small online communities prefer rule-based tools like keyword filters because they can customize the programmed conditions to suit their community’s needs, and the tools’ behavior is more predictable and interpretable. This helps the moderators apply community-specific norms and transparently explain the tool’s malfunctions to members when they happen [20, 27].
These rule-based automated tools monitor posts as they are uploaded in real time and remove, hide, tag, or comment on posts based on their programmed conditions [20]. For example, Reddit moderators use AutoModerator, a site-wide rule-based moderation tool that can automate tedious moderation tasks [20, 27]. According to the Reddit Transparency Report 2021, AutoModerator was responsible for 58.9% of all content removed on Reddit [34]. Discord moderators use various third-party moderation bots with rule-based moderation functions, such as word filters and user ban lists [25, 26], which more than 18.1M Discord servers use [44].
However, a moderation tool often may not work as intended, missing posts that the moderators wanted to catch (false negatives) or catching posts that the moderators did not want to catch (false positives). These errors adversely affect the community and require additional work from the moderators. For example, if the automated tool does not immediately remove hateful speech, it can increase the level of emotional stress in the community [37]. On the other hand, if it removes an innocent post, it can cause a backlash from the author because the removal is seen as censorship [14, 19]. This may decrease user engagement and cause users to leave the community [1, 11, 24]. To resolve false negatives and false positives, moderators remove or approve posts manually [40], or update their automated rules after the fact to prevent future problems [20]. We believe that a better solution would be to predict possible false negatives and false positives beforehand so that moderators can minimize the errors of automated rules before their deployment.
To understand the challenges with configuring automated rules, we conducted surveys and a round of interviews with volunteer moderators on Reddit who actively use AutoModerator. From in-depth interviews with five Reddit moderators, we found four main challenges moderators encounter during a typical moderation process: 1) there is no way to estimate the actual effects of a rule in advance, 2) it is hard to detect false positives of a rule after its deployment, 3) it is hard to figure out how the rule should be updated to reduce false positives and false negatives, and 4) it is hard to understand which part of the rule is causing a problem.
Based on the identied challenges, we built ModSandbox, a sandbox system where moderators can test their
automated rules by using existing community posts before the actual deployment of rules. ModSandbox has four
main features corresponding to the identied challenges: 1) a sandbox to enable prompt conguration evaluation
without aecting the actual posts and comments, 2) a recommendation of possible false positive and false negative
posts to enable faster discovery, , 3) a temporary repository feature to allow users to collect actual false positive or false
negative posts to identify the common patterns in them, and 4) a visualization to analyze how the rule aects the posts.
ModSandbox uses a machine learning-based sentence encoder to calculate the possibility of false positive and false
negative for each post imported into the system.
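To make this concrete, the sketch below shows one plausible way such a score could be computed with an off-the-shelf sentence encoder: imported posts are ranked by their maximum cosine similarity to a handful of example posts the moderator wants to catch. The library, model name, and scoring scheme here are illustrative assumptions, not the exact components used in ModSandbox.

# Illustrative sketch only; the encoder and scoring used by ModSandbox may differ.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed pretrained sentence encoder

def rank_by_similarity(example_posts, community_posts, top_k=10):
    # Encode the moderator's example target posts and the imported community posts.
    example_emb = model.encode(example_posts, convert_to_tensor=True)
    post_emb = model.encode(community_posts, convert_to_tensor=True)
    # Use each post's highest similarity to any example as a rough likelihood score.
    scores = util.cos_sim(post_emb, example_emb).max(dim=1).values
    ranked = sorted(zip(community_posts, scores.tolist()), key=lambda p: p[1], reverse=True)
    return ranked[:top_k]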
We conducted a user study with 10 active online moderators to assess whether and how ModSandbox helps the configuration process of an automated moderation tool. ModSandbox was able to sort posts in a sandbox so that participants could easily detect false positives and false negatives. Also, with ModSandbox, participants wrote more sophisticated rules that can filter target posts more precisely. We observed that participants tried to improve their automated rules through structured and iterative processes using ModSandbox features. Finally, we compared the perceived usefulness scores of ModSandbox and its features according to the type of task and the user’s condition to highlight their strengths and weaknesses.
We conclude our work by discussing how the proposed system design can be improved, how it can potentially facilitate distributed governance for online communities, and how it can reduce cognitive labor in setting up automated moderation tools.
2 RELATED WORK
We focus our review on automated content moderation on social platforms and designing systems for online content
moderation. In addition, we provide background information on Reddit’s AutoModerator, which we use for our user
study evaluation.
2.1 Automated Content Moderation on Social Platforms
There are two levels of content moderation on social platforms: community-level moderation led by users and platform-level moderation led by platform companies [39]. Social platforms such as Meta and Twitter employ paid workers to find and remove content that violates site policies [36]. They focus on policing harmful behaviors such as spreading fake news and hate speech [3, 28], sharing unhealthy tags [7], posting violent or sexual content, and using slurs and swear words. Recently, many platforms have adopted machine learning-based systems to automatically manage their content at scale [17]. For example, Facebook uses algorithms to automatically suspend accounts that do not use real names. However, Facebook had to update its algorithm regarding the real name policy because its definition of a real name did not include Native Americans who have last names such as “Lone Hill” or “Brown Eyes” [45]. We note that one downside of machine learning-based content moderation is that it lacks context, which risks excluding or disadvantaging minority groups and small communities.
Other social platforms, such as Reddit and Discord, allow voluntary moderators to manage their communities themselves [30]. Typically, these moderators are elected among community members who understand the community norms or are invited by other moderators [40]. Unlike paid workers on large platforms, who do not have the authority to decide or change the policy of the platform, voluntary moderators are deeply involved in establishing, determining, and executing their community rules [40]. Although voluntary moderation increases the degree of freedom that moderators have in applying rules to their particular community, it requires moderators to spend a lot of their time and effort on moderation tasks. As voluntary moderators cannot spend most of their time monitoring their communities, many adopt moderation tools provided by the platform [20], third-party companies [2], and platform users [26]. These tools are mostly rule-based, which allows moderators to directly control how they operate and, if necessary, to transparently communicate with community members about the cause of moderation errors, i.e., false positives and false negatives caused by automated rules [20].
Even though these rule-based moderation tools are more straightforward and flexible than machine learning-based tools, they often do not work as the moderators intended, which requires human moderators to constantly update the configurations to reflect their intention [8]. For example, users can use abbreviations [42], intentional misspellings, and lexical variations [7] of a banned word to avoid being automatically filtered. The moderators then have to update their filter by adding these variations [20]. While such false negatives are dealt with by updating the rules, false positives are more burdensome because they require moderators to manually reverse each erroneous action [40]. In this study, we explore the effectiveness of a moderation support system that lets its users predict the false positives and false negatives of a rule-based automated content moderation tool applied to the content of their own community, so that moderators can improve their rules to prevent future errors.
2.2 Designing a System for Content Moderation
In the context of online content moderation, many studies have introduced machine learning-based classifiers to detect harmful comments and malicious users in online spaces. Such classifiers include the detection of cyberbullying [15], profanities and insults [43], pornographic content [41], hate speech [13], and abusive behaviors [10, 32]. As machine learning techniques evolve, recent studies have made classifiers multimodal and community-specific so that they can reflect each community’s preferences and culture. For example, Chancellor et al. [6] developed a multimodal classification model to detect images and text that promote eating disorders, which do not fall into the traditional categories of harmful content. Furthermore, Chandrasekharan et al. [8] used macro norms and community-specific norms to make the classifier more suitable for each community. While previous studies have focused on supervised learning to classify behaviors generally considered harmful, our study adopts an embedding model pretrained on a large language corpus to find comments from only a few examples that represent the individual moderator’s intention. Our system combines the filtering results and the posts’ semantic similarities to those examples to find possible false positives and false negatives.
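One plausible reading of this combination, sketched below in Python with hypothetical names and a fixed threshold of our own choosing (not the system’s actual implementation), is that a post the rule catches but that is semantically far from the moderator’s target examples is flagged as a possible false positive, while a post the rule misses but that closely resembles the examples is flagged as a possible false negative.

def flag_candidates(posts, rule_matches, similarity_to_targets, threshold=0.5):
    # posts: list of post ids
    # rule_matches: dict mapping post id -> True if the configured rule catches the post
    # similarity_to_targets: dict mapping post id -> max cosine similarity to the
    #   moderator's example target posts (e.g., computed with a sentence encoder)
    possible_fp, possible_fn = [], []
    for post_id in posts:
        caught = rule_matches[post_id]
        close_to_target = similarity_to_targets[post_id] >= threshold
        if caught and not close_to_target:
            possible_fp.append(post_id)   # filtered, but does not resemble target posts
        elif not caught and close_to_target:
            possible_fn.append(post_id)   # missed, but resembles target posts
    return possible_fp, possible_fn

Separating the rule’s verdict from the semantic score in this way lets the same few moderator-provided examples surface errors on both sides of the rule.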
Although there have been studies on algorithmic support for online content moderation, few studies have proposed visualizing the actual content of a community, such as its comments, to help configure automated rules and support the moderation process. CommentIQ [33] is an interactive visualization tool for online news comment moderators that helps them find high-quality comments for readers. Users can filter the comments by criteria, location, and time by brushing and linking on distribution visualizations. The system also allows users to reflect their preferences for high-quality comments in the sorting order by setting weights for predefined criteria. Recently, FilterBuddy [21] introduced a tool for YouTube creators to help moderate comments on their videos. Users can customize word filters to hide or remove comments containing specific words in existing filter lists. The system uses existing comments to show what and how many comments would be filtered, to help evaluate the performance of a filter. In this work, we focus on designing a system that helps community moderators configure a rule-based automated tool that supports combinations of word filters to find posts that violate community rules. Our system shows the expected results of the configured tool using existing posts from a real community and visualizes the relationship between the posts and the configuration to help users analyze each filter.
2.3 Background: Reddit AutoModerator
Reddit AutoModerator is a rule-based automated moderation tool developed by one of the Reddit moderators, Chad Birch, in 2013 [20]. By configuring AutoModerator using YAML, Reddit moderators can create their own automated rules suited to each subreddit’s preferences and culture. In 2015, Reddit officially integrated AutoModerator into the platform as one of the default moderation tools. According to Reddit transparency reports [34], AutoModerator removed about 103.6M pieces of content in 2021, which is 20.9% more than in 2020 and 58.9% of all content removed by moderators.
AutoModerator works on all the posts and comments in a subreddit according to the automated rules that a human moderator last saved in their AutoModerator configuration.
No. | Age | Moderation Period | Gender | AutoModerator Knowledge | Configures AutoModerator?
P1 | 35-44 | 6 months | M | Yes, I’m not an expert but I know enough to use in my own sub | Yes, occasionally
P2 | 35-44 | 4 years | M | Yes, I’m an expert | Yes, most of the time
P3 | 35-44 | 3 years | M | Yes, I’m not an expert but I know enough to use in my own sub | Yes, most of the time
P4 | 18-24 | 2 years | M | Yes, I’m an expert | Yes, most of the time
P5 | 18-24 | 5 years | M | Well, I think I know a little bit | Yes, most of the time
Table 1. Background Information of Interview Participants
In other words, once moderators change their rules, AutoModerator applies the change to newly uploaded content, not to content posted earlier. Most moderators write multiple automated rules to detect profanity, slurs, and sets of posts that violate specific rules of an individual subreddit. Each rule has one or more checks and actions. A check consists of a field that AutoModerator reviews and a list of keywords, phrases, and regular expressions. The tool verifies whether the fields, such as title and body, include any of the listed words or phrases or match any of the listed regular expressions. Checks can also verify content length, the number of user reports, account age, reputation score, and other features of a Reddit post. A human moderator can combine multiple checks to fine-tune the scope of a rule. A rule also includes actions that specify the moderation action to perform on the posts identified by its checks.
3 INTERVIEW: CHALLENGES ENCOUNTERED DURING CONFIGURATION PROCESS
To reect the current practices and challenges of conguring AutoModerator into the design of our system, we conducted
semi-structured interviews with ve Reddit moderators (Table 1) who have experience conguring AutoModerator. To
recruit AutoModerator users among Reddit moderators, we sent online survey links to moderators selected from a
list of popular subreddits
1
through the internal Reddit mailing system. A total of 50 moderators answered the online
survey that asked for knowledge of how to congure AutoModerator and whether they congured AutoModerator
themselves. Then, we sent interview recruitment emails to survey respondents who responded that they have congured
AutoModerator by themselves occasionally or most of the time, and left their email addresses for a further in-depth
interview.
Each interview session lasted 40-70 minutes over an online conference call, and each participant was paid a $30 Amazon gift card for their participation. To extract the challenges of the configuration process from the interview transcriptions, four authors and one assistant participated in an iterative coding process through multiple pairing sessions. The authors were randomly paired for each session and coded an interview transcription, and we immediately resolved any disagreement through discussion. After coding all five transcriptions, the authors gathered for four consecutive two-hour meetings to interpret and find patterns in the codes and discussed the derived themes until a consensus was reached on the final codebook.
According to the interviews, the process of updating an automated moderation tool can be divided into two steps: an error identification step and a rule update step to avoid similar errors in the future. In the following, we describe the four challenges (C1-C4) that online community moderators face in these steps.
C1. No way to estimate the actual effects of a rule in advance. When moderators want to discern errors from the AutoModerator rules they configured, they cannot estimate the actual effects of their rules in advance. Participants said that
¹https://www.reddit.com/r/ListOfSubreddits/wiki/listofsubreddits/