lexical variation [7] of a banned word to avoid automatically being filtered. The moderators then have to update their filter by adding these variations [20]. While such false negatives are dealt with by updating the rules, false positives are more burdensome because they require moderators to manually reverse each issue [40]. In this study, we explore the effectiveness of a moderation support system that allows its users to predict the false positives and false negatives of a rule-based automated content moderation tool applied to the content of their own community, so that moderators can improve their rules to prevent future false positives and false negatives.
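As a toy illustration of this evasion-and-update cycle (our own sketch, not an artifact of the paper), a literal word filter misses a lexical variation until the moderator adds it to the rule:

```python
# Toy illustration (ours, not the paper's): a literal word filter misses a
# lexical variation until the moderator adds it to the rule.
banned_words = {"spam"}

def is_filtered(comment: str) -> bool:
    tokens = comment.lower().split()
    return any(word in tokens for word in banned_words)

print(is_filtered("this is spam"))  # True: caught by the rule
print(is_filtered("this is sp4m"))  # False: the variation evades the filter
banned_words.add("sp4m")            # moderator updates the filter
print(is_filtered("this is sp4m"))  # True after the update
```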
2.2 Designing a System for Content Moderation
In the context of online content moderation, many studies have introduced machine learning-based classifiers to detect harmful comments and malicious users in the online space. Such classifiers target cyberbullying [15], profanities and insults [43], pornographic content [41], hate speech [13], and abusive behaviors [10, 32]. As machine learning techniques evolve, recent studies have made classifiers multimodal and community-specific, so that the classifiers can reflect each community's preferences and culture. For example, Chancellor et al. [6] developed a multimodal classification model to detect images and text that promote eating disorders, which do not fall into the traditional categories of harmful
content. Furthermore, Chandrasekharan et al. [8] trained classifiers on both macro norms and community-specific norms to make the classifier
more suitable for each community. While previous studies have focused on supervised learning to classify behaviors generally considered harmful, our study adopts an embedding model pretrained on a large language corpus to find relevant comments from only a few examples that represent the individual moderator's intention. Our system combines the filtering results with the comments' semantic similarities to these examples to surface possible false positives and false negatives.
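As a concrete sketch of this combination (our own illustration; the model name, threshold, and helper structure are assumptions, not the paper's implementation), the following Python snippet cross-references a rule-based filter's decisions with embedding similarity to moderator-provided examples:

```python
# A minimal sketch (ours, not the paper's implementation) of surfacing
# possible false positives and false negatives by cross-referencing a
# rule-based filter's decisions with embedding similarity to a handful of
# moderator-provided example violations. Model and threshold are assumptions.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed pretrained model

def flag_possible_errors(comments, filter_decisions, violation_examples,
                         threshold=0.5):
    """comments: list of comment strings.
    filter_decisions: list of bools; True if the rule filtered the comment.
    violation_examples: a few comments exemplifying what should be caught.
    """
    comment_emb = model.encode(comments, convert_to_tensor=True)
    example_emb = model.encode(violation_examples, convert_to_tensor=True)
    # For each comment, its highest similarity to any example violation.
    sims = util.cos_sim(comment_emb, example_emb).max(dim=1).values

    possible_fps, possible_fns = [], []
    for comment, filtered, sim in zip(comments, filter_decisions, sims):
        if filtered and sim < threshold:
            # Filtered, yet unlike every example: candidate false positive.
            possible_fps.append(comment)
        elif not filtered and sim >= threshold:
            # Unfiltered, yet close to an example: candidate false negative.
            possible_fns.append(comment)
    return possible_fps, possible_fns
```

Comments the rule removed but that sit far from every example surface as possible false positives; unfiltered comments close to an example surface as possible false negatives.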
Although there have been studies on algorithmic support for online content moderation, few have proposed visualizing the actual content of a community, such as its comments, to help configure automated rules and support the moderation process. CommentIQ [33] is an interactive visualization tool for online news comment moderators that helps find high-quality comments for readers. Users can filter comments by criteria, location, and time through brushing and linking on distribution visualizations. The system also lets users reflect their preferences for high-quality comments in the sorting order by setting weights for predefined criteria. Recently, FilterBuddy [21] introduced a tool that helps YouTube creators moderate the comments on their videos. Users can customize word filters to hide or remove comments containing specific words from existing filter lists, and the system uses existing comments to show what and how many comments would be filtered, helping users evaluate a filter's performance. In this work, we focus on designing a system that helps community moderators configure a rule-based automated tool supporting combinations of word filters to find posts that violate community rules. Our system shows the expected results of the configured tool on existing posts from a real community and visualizes the relationship between the posts and the configuration to help users analyze each filter.
2.3 Background: Reddit AutoModerator
Reddit AutoModerator is a rule-based automated moderation tool developed by one of the Reddit moderators, Chad Birch, in 2013 [20]. By configuring AutoModerator in YAML, Reddit moderators can create their own automated rules suited to each subreddit's preferences and culture.
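To illustrate the configuration format, the snippet below is a minimal rule of our own, written against the documented AutoModerator YAML syntax; the banned words and the action are placeholders, and real subreddit configurations are typically far more elaborate.

```yaml
# Hypothetical rule: remove any comment containing either placeholder word.
type: comment
body (includes-word): ["badword1", "badword2"]
action: remove
action_reason: "Matched banned-word filter"
```

A subreddit's AutoModerator page can hold many such rules, separated by `---` lines.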
In 2015, Reddit officially integrated AutoModerator into the platform as part of the default moderation tools. According to Reddit transparency reports [34], AutoModerator removed about 103.6M pieces of content in 2021, which is 20.9% more than in 2020 and accounts for 58.9% of all content removed by moderators.
AutoModerator acts on all the posts and comments in a subreddit according to the automated rules that a human moderator last saved. In other words, once moderators change their rules, AutoModerator