Exploration of the Usage of Color Terms by Color-blind Participants in Online Discussion Platforms
Ella Rabinovich¹, Boaz Carmeli¹,²
¹IBM Research
²Technion – Israel Institute of Technology
ella.rabinovich1@ibm.com boazc@il.ibm.com
Abstract
Prominent questions about the role of sensory vs. linguistic input in the way we acquire and use language have been extensively studied in the psycholinguistic literature. However, the relative effect of various factors in a person's overall experience on their linguistic system remains unclear. We study this question by taking a step toward a better understanding of the conceptual perception of colors by color-blind individuals, as reflected in their spontaneous linguistic productions. Using a novel and carefully curated dataset, we show that red-green color-blind speakers use the "red" and "green" color terms in less predictable contexts, and in linguistic environments evoking mental imagery to a lower extent, when compared to their normal-sighted counterparts. These findings shed new light on the role of sensory experience in shaping our linguistic system.
1 Introduction
Colors play an exceptionally prominent role in our lives. Simple and vivid, yet difficult to describe or reduce to linguistic terms, our experience of color has long raised intriguing questions for philosophers, scientists, and psycholinguists concerning grounded color semantics, i.e., quantifying the associations between words and perceptual representations (Chuang et al., 2008; Heer and Stone, 2012). McMahan and Stone (2015) have shown that the subjective quality of color experience varies between individuals. A body of work in color semantics has indicated that color lexicalization and usage patterns can be significantly affected by extra-linguistic factors, such as culture, physical environment (Athanasopoulos, 2011; Josserand et al., 2021), and the native language of a speaker (Sarantakis, 2014; Matusevych et al., 2018).
How do colors appear to color-blind individuals? Does the imperfect perceptual experience of red and green in people with red-green visual deficiency (deuteranopia) shape their color-related linguistic production? Embodied cognition theory posits that the entirety of our sensory experience, i.e., the activities through which our five senses help us develop a better understanding of word semantics, shapes our conceptual knowledge (Barsalou et al., 2008; Foglia and Wilson, 2013). For example, reading the word "cat" is likely to elicit the sensory experiences we have had with cats, such as how they sound and how they look. Embodied cognition theory thus assumes that all our sensory experiences contribute to our conceptual knowledge and processing, which, in turn, is reflected in our language.
∗The authors contributed equally to this work.
Related Work Prior work on the effect of color blindness on language production is relatively sparse. Landau and Gleitman (2009) studied the language of blind children, focusing, among other things, on the achievements of three blind children in the areas of syntax and word learning. The authors found general developmental patterns similar to those of their sighted age-mates. The representation of colors in blind and color-blind individuals was studied in a controlled color-similarity experiment with 37 participants, including 15 red-green color-blind participants (Shepard and Cooper, 1992). The participants were asked to rank the degree of similarity between colors when presented with name-only, color-only, and name+color stimuli. While significant differences in the similarity judgments were found in the color-only setting, when color-deficient participants were presented with names along with the colors, their rankings became closer to those of normal-sighted people. This suggests that linguistic exposure plays a considerable role in shaping our perception of color representation.
Multiple works have studied the language of visually impaired and blind children at various stages of language development, providing evidence of difficulties in precisely those areas of language acquisition where visual information supplies input about the world, and thereby stimulating hypotheses about the relevant aspects of the linguistic system (Andersen et al., 1984; Pérez-Pereira and Conti-Ramsden, 2013).
arXiv:2210.11905v2 [cs.CL] 30 Oct 2022
The puzzling question of the role of sensory vs. linguistic input in shaping our color perception therefore remains open. In this work, we take a step toward a better understanding of the conceptual perception of the colors red and green by red-green color-blind individuals, as mirrored in their spontaneous linguistic production.
To the best of our knowledge, we perform the first large-scale computational study of the usage of the "red" and "green" color terms in a (self-reported) population with deutan and protan visual impairments. Using a novel dataset of linguistic productions by color-blind (CB) individuals, we show that they use the "red" and "green" color terms in less predictable contexts, and in linguistic environments evoking mental imagery to a lower extent, when compared to normal-sighted (NS) authors.
The contribution of this study is, therefore, twofold. First, we release a large, diverse, and carefully curated dataset of linguistic productions by red-green CB authors, accompanied by a corpus of utterances by NS individuals, aligned on various linguistic properties. Second, we show preliminary evidence of subtle, yet reliably detected, divergences in the usage of "red" and "green" by CB speakers compared to their NS counterparts. We make the dataset and our code available to facilitate future research in this field.1
2 Datasets
We collected the datasets used in this work from Reddit, an online community-driven platform consisting of numerous forums for news aggregation, content rating, and discussion. As of 2021, it had over 430 million monthly active users, making it the sixth most popular social site in the US. Content entries are organized by areas of interest called subreddits, ranging from major forums that receive extensive attention to smaller ones that foster discussion of niche areas.
2.1 Collection of Posts by CB Users
Multiple subreddits allow their contributors to specify a flair, a metadata attribute adding context to the specific subreddit, such as country of origin, political association, occupation, age, etc. We collected the set of color-blind Reddit authors from r/colorblind, considering only those self-reported as having one of the red-green color blindness types we study in this work: deuteranopia, deuteranomaly, protanopia, and protanomaly. This procedure resulted in 2,523 authors in total. Using the collected list of user IDs, we were further able to retrieve their entire digital footprint on Reddit, spanning the years 2005 through 2021.
1 Code is available at https://github.com/IBM/colorblind-language; to comply with Reddit's terms of use, we provide a full pipeline for reproducing the dataset (extraction and filtering), rather than the data itself.
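The flair-based author selection can be sketched as follows. This is a minimal illustration, not the released pipeline: it assumes the flairs have already been scraped into hypothetical (author_id, flair_text) pairs, and the function name and the case-insensitive substring match are assumptions made for the example.

```python
# Hypothetical sketch of selecting red-green CB authors by flair.
# The four red-green types studied in this work.
RED_GREEN_TYPES = {"deuteranopia", "deuteranomaly", "protanopia", "protanomaly"}


def select_cb_authors(flair_records):
    """Return the set of author IDs whose flair names a red-green CB type.

    flair_records: iterable of (author_id, flair_text) pairs; this record
    format is an assumption made for illustration.
    """
    authors = set()
    for author_id, flair_text in flair_records:
        if not flair_text:  # authors without a flair are skipped
            continue
        flair = flair_text.lower()
        # Case-insensitive substring match against the four type names.
        if any(cb_type in flair for cb_type in RED_GREEN_TYPES):
            authors.add(author_id)
    return authors
```

For instance, given the records ("u1", "Deuteranomaly"), ("u2", "Tritanopia"), ("u3", None), and ("u4", "Strong protanopia"), only u1 and u4 are selected. Abbreviated flairs (e.g., "deutan") would be missed by this simple match.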
Manual inspection of utterances produced by the color-blind Reddit users reveals that CB authors occasionally discuss various aspects of their impairment, as in "this game's color-scheme is not a good fit for colorblind, I cannot tell red from green". Aiming to analyze deficiency-agnostic linguistic productions, we apply strict filters to user utterances, excluding (1) sentences originating from a manually collected list of subreddits potentially related to color blindness, and (2) sentences containing words possibly indicative of the CB impairment, such as "color", "colorblind", and "vision", along with their inflections and spelling variants (e.g., "colour"). These filters prevent potential biases stemming from deficiency-related discussions. The full list of excluded subreddits can be found in Appendix A.1.
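The two exclusion filters can be illustrated with a short sketch. It assumes each sentence arrives with the name of its subreddit; the one-element subreddit set is a hypothetical placeholder for the list in Appendix A.1, and the regular expression is only an illustrative approximation of the word list described above.

```python
import re

# Hypothetical placeholder; the actual list is given in Appendix A.1.
EXCLUDED_SUBREDDITS = {"colorblind"}

# Illustrative pattern for impairment-indicative words, covering
# inflections and the British spelling ("colour", "colourblind", ...).
IMPAIRMENT_PATTERN = re.compile(r"\b(colou?r\w*|vision\w*)\b", re.IGNORECASE)


def keep_sentence(sentence, subreddit):
    """Return True if a sentence passes both exclusion filters."""
    # Filter (1): drop sentences from excluded subreddits.
    if subreddit.lower() in EXCLUDED_SUBREDDITS:
        return False
    # Filter (2): drop sentences mentioning impairment-related words.
    return IMPAIRMENT_PATTERN.search(sentence) is None
```

With this sketch, "I painted the door red." posted in a DIY subreddit is kept, while "My colour vision is poor." is dropped regardless of where it was posted.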
2.2 Collection of Posts by NS Users
The comparative nature of our analysis requires a collection of utterances produced by normal-sighted Reddit authors. Given the relatively low prevalence of the CB deficiency in the population (about 8% of men; Wong, 2011), we sampled a large set of posts and comments from the general population of Reddit authors, excluding the (self-reported) set of CB users. We believe that this approach largely captures the language of NS authors, owing to their large numbers and extensive diversity.
Usage patterns of color terms in linguistic productions can be affected by several dimensions: demographic factors (gender, age), language modality (spoken vs. written), linguistic register (formal vs. informal), topical preferences, etc. Multiple works have shown that there are detectable differences between the language of male and female speakers, and that topical tendencies shape both the frequency and the contextual environment of word usage. Therefore, we strove to create a control set of NS productions aligned with CB language across these dimensions. While perfect alignment is impractical, we controlled for two major dimensions, gender and topic, when sampling linguistic productions by NS authors.
Balancing Posts by Author Gender Color blindness affects approximately 1 in 12 men (8%) and 1 in 200 women (0.5%) worldwide (Wong, 2011). Because the most common causes of color blindness are genetic, passed along the X chromosome, people with XY chromosomes (most men) need only one defective X chromosome to have the deficiency (Wong, 2011). Roughly speaking, the phenomenon is 16 times more frequent in men than in women. The imbalanced 2:1 ratio of male (M) to female (F) Reddit authors2 imposes an additional prior on the ratio of men vs. women in the color-blind population of Reddit, raising the estimated frequency of color-blind male authors to 32 times that of female authors in our data.3
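The 32:1 estimate can be read as an application of Bayes' rule. The short derivation below assumes that the prevalence figures quoted above (Wong, 2011) hold for Reddit authors and that the platform-wide 2:1 M:F prior applies to CB authors as well (cf. footnote 3):

```latex
\frac{P(\mathrm{M} \mid \mathrm{CB})}{P(\mathrm{F} \mid \mathrm{CB})}
  = \frac{P(\mathrm{CB} \mid \mathrm{M})}{P(\mathrm{CB} \mid \mathrm{F})}
    \cdot \frac{P(\mathrm{M})}{P(\mathrm{F})}
  \approx \frac{8\%}{0.5\%} \cdot \frac{2}{1}
  = 16 \cdot 2 = 32
```

The common normalizer P(CB) cancels in the ratio, so only the per-gender prevalence ratio and the gender prior remain.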
A large body of research has shown that the language of female authors differs from that of their male counterparts, exhibiting both topical and stylistic divergences (Lakoff, 1973; Holmes, 1984; Labov, 1990), to the extent that texts written by the two genders can be automatically distinguished (Koppel et al., 2002; Argamon et al., 2003; Rabinovich et al., 2017). Gender-linked differences in human color lexicon, preferences, and perception have also been reported in the literature (Arthur et al., 2007; Eckert and McConnell-Ginet, 2013), suggesting that gender may affect both the frequency and the contextual linguistic environments of color terms. A valid control set of authors should, therefore, maintain the same M:F author ratio as the CB set, i.e., 32:1.
Recently, Rabinovich et al. (2020) released a large dataset of posts and comments collected from the Reddit discussion platform, where each sentence is annotated with the (self-reported) binary gender of its author. We exploit this dataset by using utterances by 13,630 male users and by 425 (randomly downsampled) female users, preserving the 32:1 M:F author ratio and resulting in a total of 14,055 authors4 and over 45M posts.
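The female-author downsampling step can be sketched as below. The function name, the fixed seed, and the use of floor division are assumptions; floor division happens to reproduce the reported count, since 13,630 // 32 = 425 (about 32.07:1).

```python
import random


def downsample_to_ratio(male_authors, female_authors, ratio, seed=0):
    """Keep all male authors and a random subset of female authors so that
    the M:F author ratio is at least `ratio` to 1 (hypothetical helper).
    """
    # Floor division keeps the achieved ratio at or above the target.
    n_female = int(len(male_authors) // ratio)
    if n_female > len(female_authors):
        raise ValueError("not enough female authors for the requested ratio")
    rng = random.Random(seed)  # fixed seed for reproducibility
    return list(male_authors), rng.sample(list(female_authors), n_female)
```

For example, with 13,630 male author IDs, a placeholder pool of 5,000 female author IDs (the pool size here is made up), and `ratio=32`, the function keeps 425 female authors, for 14,055 authors in total.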
2 According to statistics in shorturl.at/doH02.
3 The collection of color-blind authors does not contain gender markers; therefore, applying the general Reddit prior to our set of CB authors is a plausible choice.
4 Authors with a self-reported gender who also indicated their color blindness type were excluded from this set.
Balancing Posts by Topical Threads Usage patterns of words, and of color terms in particular, are likely to be affected by their contextual environment. For example, the usage of color terms in a topical thread (subreddit) related to interior design is likely to differ from that in threads on gaming, health, or world news. Aiming at a similar topical distribution in both the CB and NS sets, we balance the distribution of sentences over subreddits across the two populations by (1) splitting the data at the sentence level, (2) using the CB subreddit distribution as the anchor, and (3) performing stratified sampling of the NS data to maintain the same relative topical ratios.
Specifically, let $R = (r_1, r_2, r_3, \dots, r_n)$ be the relative ratios of the number of sentences spanning the $n$ subreddits in the CB dataset, where $\sum_{i=1}^{n} r_i = 1$; the set of NS sentences is then randomly downsampled in a manner that preserves the topical distribution $R$. Although the absolute number of sentences differs significantly between the two datasets, the relative ratio of each topical thread is roughly preserved.
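A minimal sketch of this stratified downsampling, assuming both datasets are lists of hypothetical (subreddit, sentence) pairs. How the total NS sample size is chosen is not specified here; the sketch assumes the largest size that every subreddit stratum can still supply, and it drops NS subreddits absent from the CB data.

```python
import random
from collections import defaultdict


def stratified_downsample(cb_sentences, ns_sentences, seed=0):
    """Downsample NS sentences so that their per-subreddit shares follow R.

    cb_sentences, ns_sentences: lists of (subreddit, sentence) pairs.
    Returns a list of (subreddit, sentence) pairs drawn from the NS data.
    """
    # R: relative ratio r_i of CB sentences in each subreddit (sums to 1).
    cb_counts = defaultdict(int)
    for subreddit, _ in cb_sentences:
        cb_counts[subreddit] += 1
    total_cb = sum(cb_counts.values())
    ratios = {sub: c / total_cb for sub, c in cb_counts.items()}

    # Pool NS sentences by subreddit (the strata).
    ns_by_sub = defaultdict(list)
    for subreddit, sentence in ns_sentences:
        ns_by_sub[subreddit].append(sentence)

    # Assumed choice of sample size: the largest total N such that every
    # stratum holds at least r_i * N sentences. A CB subreddit with no NS
    # sentences forces N = 0.
    n_total = min(len(ns_by_sub[sub]) / r for sub, r in ratios.items())

    rng = random.Random(seed)
    sample = []
    for sub, r in sorted(ratios.items()):
        pool = ns_by_sub[sub]
        k = min(len(pool), int(r * n_total))  # sentences to draw from stratum
        sample.extend((sub, s) for s in rng.sample(pool, k))
    return sample
```

For instance, if 75% of CB sentences come from subreddit "a" and 25% from "b", and the NS pool holds 30 sentences in each, the sketch draws 30 from "a" and 10 from "b", matching R exactly.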
2.3 Color Terms used in this Study
We address our research questions by performing a contrastive analysis of the usage patterns of "red" and "green", as well as of eight additional color terms whose total count exceeds 1,000 in our CB dataset: "black", "white", "blue", "brown", "gr[ae]y", "yellow", "pink", and "purple".5 This resulted in a total of over 80K and 380K sentences, each including at least one of the ten color terms, for the CB and NS populations, respectively. Differences (if they exist) are anticipated to be linked to the CB deficiency, and therefore to be evident in the usage of the "red" and "green" terms, but not of the others.
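Matching and counting the ten color terms could look roughly like the sketch below. Whole-word, case-insensitive matching of base forms only (no plurals or derived forms) is an assumption made for illustration; the "gr[ae]y" pattern is taken from the list above.

```python
import re
from collections import Counter

# The ten color terms; "gr[ae]y" matches both "gray" and "grey".
COLOR_TERMS = ["red", "green", "black", "white", "blue", "brown",
               "gr[ae]y", "yellow", "pink", "purple"]
# Whole-word, case-insensitive match, so "Reddit" does not count as "red".
COLOR_RE = re.compile(r"\b(" + "|".join(COLOR_TERMS) + r")\b", re.IGNORECASE)


def color_term_counts(sentences):
    """Count occurrences of each color term, merging gray and grey."""
    counts = Counter()
    for sentence in sentences:
        for match in COLOR_RE.finditer(sentence):
            term = match.group(1).lower()
            counts["gray" if term in ("gray", "grey") else term] += 1
    return counts


def sentences_with_color(sentences):
    """Keep sentences that mention at least one of the ten color terms."""
    return [s for s in sentences if COLOR_RE.search(s)]


def frequent_terms(counts, min_count=1000):
    """Color terms whose total count exceeds min_count."""
    return {term for term, c in counts.items() if c > min_count}
```

Applied to "The red car and the green tree.", "A grey cat and a gray dog.", and "Reddit is fun.", the sketch counts one "red", one "green", and two "gray", and keeps only the first two sentences.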
2.4 Fixed Expressions and Named Entities
Color terms are often used in fixed linguistic expressions: groups of words used together to express a particular idea or concept that is more specific than the literal combination of the individual words. Among such expressions are "black music", "red army", "green energy", etc. Neither the production nor the comprehension of such expressions is likely to evoke a visual image of color in one's mind; hence, processing these terms does not rely on the ability to visually distinguish between colors. Therefore, we excluded expressions of a salient non-compositional nature from this work.
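One possible way to exclude such expressions, sketched under assumptions: the three-item list below is a hypothetical stand-in built from the examples above, not the study's actual list, and masking the expression (rather than dropping the whole sentence) is an illustrative choice.

```python
import re

# Hypothetical list built from the examples in the text.
FIXED_EXPRESSIONS = ["black music", "red army", "green energy"]
FIXED_RE = re.compile(
    r"\b(" + "|".join(re.escape(e) for e in FIXED_EXPRESSIONS) + r")\b",
    re.IGNORECASE)


def mask_fixed_expressions(sentence, placeholder="[FIXED]"):
    """Replace non-compositional color expressions with a placeholder so
    that their color terms are not counted as literal color usage."""
    return FIXED_RE.sub(placeholder, sentence)
```

For example, "The Red Army marched past a red house." becomes "The [FIXED] marched past a red house.", leaving only the literal use of "red" for analysis.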
A subset of expressions exceeding the 0.5% relative frequency among the full set of <color-term NOUN> adjective phrases considered in this work
5 With the exception of "purple", which has 943 occurrences.
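The 0.5% relative-frequency criterion for color-term phrases can be sketched as follows, assuming the phrases have already been extracted as hypothetical (color_term, noun) pairs, e.g., with a part-of-speech tagger (extraction not shown).

```python
from collections import Counter


def frequent_color_phrases(phrases, threshold=0.005):
    """Return color-term + noun phrases whose relative frequency exceeds
    the threshold (0.005, i.e., 0.5%), mapped to that relative frequency.

    phrases: iterable of (color_term, noun) pairs (assumed input format).
    """
    counts = Counter(phrases)
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items() if c / total > threshold}
```

With 1,000 extracted phrases of which 10 are ("red", "army") and 1 is ("red", "car"), the first (1%) passes the threshold and the second (0.1%) does not.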