
ers, these may not directly lead to physical harm as in the samples in SAFETEXT. The research in text generation highlights the difficulty of creating models that can generate safe and truthful text. With our new dataset, we hope to better analyze the commonsense physical safety subset of these issues.
Commonsense Reasoning
Commonsense reasoning tasks have focused on various domains, such as physical commonsense reasoning (Bisk et al., 2020), visual commonsense reasoning (Zellers et al., 2019a), and social commonsense reasoning (Sap et al., 2019). These are framed as tasks such as knowledge base completion (Li et al., 2016), question answering (Talmor et al., 2019), and natural language inference (Zellers et al., 2019b). Current commonsense reasoning tasks typically focus on generic everyday knowledge. In addition, many contain samples where the incorrect answers are easily distinguished by the general population. Samples that focus on safety knowledge are missing from current commonsense benchmarks. However, it is crucial to evaluate models' safety reasoning abilities, as models should be able to recognize when text will lead to physical harm. Within SAFETEXT, the scenarios relate to common occurrences as well as some rarer cases, and each scenario is paired with both safe and unsafe advice that contextually follows it. Our unsafe samples can also be difficult to distinguish from safe ones, depending on a person's knowledge and experience, making the task both difficult and important to study.
While SAFETEXT focuses on safety, several of the previous datasets focus on morality. As a result, the labels assigned in SAFETEXT may differ from those in other datasets, since safety and morality are judged along different and partly subjective dimensions. In addition, text relating to commonsense physical safety has not been closely studied in isolation. This may be due to the difficulty of creating a dataset of such text: because the physical harm element is often subtle and not tied to specific keywords, it is challenging to collect samples from outside resources spanning different domains. In the next section, we discuss how we create a dataset of this type of text, and in the following sections, we analyze existing NLP models for their inclusion of this harm.
3 Data Collection
To create the SAFETEXT dataset, we collect human-written posts from Reddit and apply five stages of filtering and rewriting. These steps are outlined in Figure 1 and described in the following paragraphs. Screenshots and payment information relating to our data collection process can be seen in the Appendix.
Phase 1: Post Retrieval
We begin our data collection by crawling human-written posts from two subreddits: DeathProTips (https://www.reddit.com/r/DeathProTips) and ShittyLifeProTips (https://www.reddit.com/r/ShittyLifeProTips). We select these two subreddits because they focus on giving unethical and unsafe advice to readers regarding various situations and contain posts in the scenario/advice format. Though the subreddits are satirical versions of other subreddits intended to give genuine advice (e.g., LifeProTips), we find that some of the advice is only subtly satirical and instead requires commonsense reasoning to recognize as unsafe, making them a useful resource for creating our dataset. We retrieve posts between 1/31/2015 and 1/31/2022. To ensure the quality and relevancy of the posts, we only retrieve those with a score of at least 5 (as upvoted/downvoted by Reddit users), indicating that the posts follow the subreddit's theme. Our post retrieval yields ∼17,000 posts, such as "don't want to pay for a haircut? just join the army for a free one." and "trying to catch your dog that got out/off its leash? shoot him!".
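The paper does not name the crawling tool it uses; as a minimal illustrative sketch, the retrieval step could be approximated with the Pushshift API via the psaw package, filtering posts by subreddit, date range, and score (note that Pushshift stores ingest-time scores, which may lag the live Reddit scores):

```python
from datetime import datetime, timezone
from psaw import PushshiftAPI  # Pushshift wrapper; an assumed choice, not named in the paper

api = PushshiftAPI()

# Retrieval window used in the paper: 1/31/2015 to 1/31/2022.
after = int(datetime(2015, 1, 31, tzinfo=timezone.utc).timestamp())
before = int(datetime(2022, 1, 31, tzinfo=timezone.utc).timestamp())

posts = []
for subreddit in ("DeathProTips", "ShittyLifeProTips"):
    for submission in api.search_submissions(
        subreddit=subreddit,
        after=after,
        before=before,
        filter=["title", "selftext", "score"],
    ):
        # Keep only posts with a score of at least 5, as described in the paper.
        if getattr(submission, "score", 0) >= 5:
            posts.append(
                {
                    "subreddit": subreddit,
                    "title": submission.title,
                    "selftext": getattr(submission, "selftext", ""),
                    "score": submission.score,
                }
            )

print(f"Retrieved {len(posts)} candidate posts")
```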
Phase 2: Physical Harm Filtering
While posts leading to mental harm may eventually incite physical harm as well, we are specifically interested in the subset of unsafe text that will cause direct physical harm if the actions it describes are followed. As such, we utilize Amazon Mechanical Turk to filter our set of retrieved posts. Specifically, we ask workers to select whether the given text may lead to or cause physical harm and assign five workers to each HIT. We additionally specify that text leading to mental harm (e.g., hate speech and cyberbullying) should not be selected as leading to physical harm, in order to prevent these types of samples from appearing in our dataset. An example of text leading to physical harm is "to test if your fire alarms work, set your house on fire!", while text that should not be categorized as leading to physical harm is "if someone is making food or is cleaning, wait til they are almost done, then ask if they need help so you seem helpful".
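The aggregation rule applied over the five worker judgments is not specified in this paragraph; one simple possibility is a majority vote over the binary annotations, sketched below with hypothetical file and field names:

```python
import csv
from collections import defaultdict

# Hypothetical MTurk results file: one row per (worker, post) judgment with a
# binary "leads to physical harm" answer. File and field names are illustrative.
votes = defaultdict(list)
with open("mturk_physical_harm_results.csv", newline="") as f:
    for row in csv.DictReader(f):
        votes[row["post_id"]].append(row["physical_harm"] == "yes")

# Flag a post as physically harmful if a majority of its five annotators
# selected it; with an odd number of workers there are no ties.
physical_harm_posts = {
    post_id for post_id, labels in votes.items() if sum(labels) > len(labels) / 2
}

print(f"{len(physical_harm_posts)} of {len(votes)} posts flagged as physical harm")
```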
To aid in quality assurance, we include two additional posts in each HIT that have been annotated