
2.2 Two or More Degrees of Separation
Two or more degrees of separation classifies statements that may indirectly lead to physical harm. One notable type of language under this class is toxic language, which has motivated several studies to mitigate hate speech (Jurgens et al., 2019), cyberbullying (Xu et al., 2012; Chatzakou et al., 2019), and microaggressions (Breitfeller et al., 2019). These statements often cause psychological harm, which can in turn encourage physical harm. Other types of indirectly unsafe language include doxxing³ and biased statements (Schick et al., 2021). Recent work has also focused on detecting harmful content generated by conversational systems through insults, stereotypes, or false impressions of system behavior (Dinan et al., 2022). We encourage readers to refer to existing comprehensive surveys in this area (Kiritchenko et al., 2021; Schmidt and Wiegand, 2017; Salawu et al., 2020), as our paper focuses on covertly unsafe text (§3), which has seen comparatively little progress.
2.3 Assumptions for Categorizing Harm
Ambiguous Information. Language ambiguities make it difficult to determine text safety. Statements like “cut a pie with a knife and turn it on yourself” can be potentially violent depending on whether the ambiguous pronoun “it” is resolved to the pie or the knife. Ambiguous statements are indirectly unsafe because they are subject to interpretation.
Literal and Explicit Statements. When evaluating whether a statement is physically unsafe, we assume that the statement is taken literally, with all relevant details explicitly stated. We consider only physical harm directly caused by what a statement explicitly recommends; a recommendation such as “consume potatoes to cure cancer” is therefore safe, since it is safe to “consume potatoes.” Contrast this with a statement such as “consume potatoes to cure cancer; no other treatment necessary”; this would be unsafe, as forgoing all cancer treatment beyond consuming potatoes is harmful. The latter example could be sarcastic, but an unsafe statement meant as a joke is still inherently unsafe.
3 Covertly Unsafe Language
Covertly unsafe text requires more context to discern than its overt counterpart. Yet, unlike indirectly unsafe text, extrapolation is unnecessary to determine whether it is physically harmful.
³ rcfp.org/journals/news-media-and-law-spring-2015/dangers-doxxing
A system’s knowledge directly influences the quality of its generated text (Yu et al., 2022), and missing, incompatible, or false information can often cause systems to generate unsafe language. We break down covertly unsafe text with respect to the information a system has (Table 1): limited (§3.1), incompatible (§3.2), or incorrect (§3.3). Note that these categories are not mutually exclusive.
3.1 Limited Information
To generate well-formed recommendations, systems need relevant and comprehensive knowledge about their domain (Reiter et al., 2003); if the system’s knowledge is too limited, it may overlook facts in a generated recommendation that make it unsafe. The missing knowledge varies in specificity and applicability, ranging from commonsense (Xie and Pu, 2021) to more user- and domain-specific information (Bateman, 1990).
Two examples of unsafe text due to limited information are “put your finger in a light bulb socket”, where a lack of commonsense knowledge about electrocution could cause physical harm⁴, and “drink lemonade from a copper vessel”, where a lack of chemistry-domain knowledge about toxic chemical reactions could lead to physical harm⁵. While these examples put all readers in danger, other scenarios may be conditionally unsafe, endangering only specific users under certain conditions. For example, a system might recommend to “consume almond milk as an alternative to milk” to a user who is allergic to tree nuts.
The common thread in these examples is that the system needs more knowledge to recognize such language. Since a model is unlikely to have comprehensive knowledge, it is crucial to consider the context in which a safe system is being developed. For example, for a conversational assistant that bases its recommendations on search results, retrieving the surrounding source context can help identify unsafe text, especially if the original source is satirical or trends toward dangerous content.
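As a loose illustration of this idea (the domain lists and function names below are hypothetical and not part of any system described here), a retrieval step might screen each search result's domain against curated lists of satirical or dangerous sites before its content is used as grounding context; the following Python sketch shows one minimal version of that check.

from urllib.parse import urlparse

# Hypothetical, illustrative domain lists; a deployed system would need
# curated and regularly updated sources of satirical or dangerous sites.
SATIRICAL_DOMAINS = {"theonion.com", "clickhole.com"}
DANGEROUS_DOMAINS = {"dangerous-forum.example"}

def source_is_usable(url: str) -> bool:
    """Return False if a retrieved source should be excluded from the
    context used to ground a recommendation."""
    domain = urlparse(url).netloc.lower()
    if domain.startswith("www."):
        domain = domain[len("www."):]
    return domain not in SATIRICAL_DOMAINS and domain not in DANGEROUS_DOMAINS

def build_context(search_results):
    """Keep only snippets whose source passes the domain screen."""
    return [r["snippet"] for r in search_results if source_is_usable(r["url"])]

Such a filter is only one coarse signal: it does not catch covertly unsafe text from otherwise reputable sources, which still requires the knowledge-centric checks discussed in this section.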
3.2 Incompatible Information
Even a system with abundant knowledge may provide recommendations containing covertly unsafe incompatible information (Preum et al., 2017; Alamri and Stevenson, 2015). Incompatibility may
⁴ howstuffworks.com/science-vs-myth/what-if/finger-in-electrical-outlet.htm
⁵ webmd.com/diet/what-to-know-copper-toxicity