Tracing Semantic Variation in Slang
Zhewei Sun1, Yang Xu1,2
1Department of Computer Science, University of Toronto, Toronto, Canada
2Cognitive Science Program, University of Toronto, Toronto, Canada
{zheweisun, yangxu}@cs.toronto.edu
Abstract
The meaning of a slang term can vary in different communities. However, slang semantic variation is not well understood and remains under-explored in the natural language processing of slang. One existing view argues that slang semantic variation is driven by culture-dependent communicative needs. An alternative view focuses on slang’s social functions, suggesting that the desire to foster semantic distinction may have led to the historical emergence of community-specific slang senses. We explore these theories using computational models and test them against historical slang dictionary entries, with a focus on characterizing regularity in the geographical variation of slang usages attested in the US and the UK over the past two centuries. We show that our models are able to predict the regional identity of emerging slang word meanings from historical slang records. We offer empirical evidence that both communicative need and semantic distinction play a role in the variation of slang meaning, yet their relative importance fluctuates over the course of history. Our work offers an opportunity for incorporating historical cultural elements into the natural language processing of slang.
1 Introduction
Slang is a type of informal language commonly used in both day-to-day conversations and online written text. The pervasiveness of slang has generated increasing interest in the natural language processing (NLP) community, with systems proposed for automatic detection (Dhuliawala et al., 2016; Pei et al., 2019), generation (Sun et al., 2019, 2021), and interpretation (Ni and Wang, 2017; Sun et al., 2022) of slang. However, these existing approaches do not account for the semantic variation of slang among different groups of users, a defining characteristic of slang which distinguishes it from conventional language (Andersson and Trudgill, 1992; Mattiello, 2005; Eble, 2012). Figure 1 shows a tally of Wiktionary entries confirming that semantic variation is much more prevalent in slang compared to conventional language. Here we develop a principled computational approach to investigate regularity in the semantic variation of slang.

Figure 1: Distribution of regional identities among sense entries found in the English Wiktionary. See Appendix A for the detailed experimental setup.

[Figure 2 content: dated senses of the slang word beast, in two regional groups]
1954: A fast car. 1982: Subway #2 of NYC. 1997: Excellent.
1837: An unpleasant person. 1877: A sexual offender. 1898: A bicycle.
Emerging sense, 2011: An outstanding example. “You’re a beast, man. You nailed that sucker.”

Figure 2: Illustration of semantic variation in the slang word beast, with senses recorded in American and British English respectively. We develop slang semantic variation models to trace the regional identity of a new emerging slang sense given its historical meanings and usages from different regions.
We define semantic variation in slang as how a slang term might take on different meanings in different communities. For example, Figure 2 shows how the commonly used slang word beast has divergent meanings in different regions (or more specifically, two different countries in this case). Whereas it is often used to express positive things or sentiment in the US, the same slang word has been used to express more negative senses in the UK.

arXiv:2210.08635v2 [cs.CL] 9 Nov 2022
Recent work has quantified semantic variation in non-standard language of online communities using word and sense embedding models and discovered that community characteristics (e.g., community size, network density) are relevant factors in predicting the strength of this variation (Del Tredici and Fernández, 2017; Lucy and Bamman, 2021). However, it is not clear how slang senses vary among different communities and what might be the driving forces behind this variation.
As an initial step to model semantic variation in slang, we focus on regional semantic variation between the US and the UK by considering a regional inference task illustrated in Figure 2: Given an emerging slang sense (e.g., “An outstanding example”) for a slang word (e.g., beast), infer which region (e.g., US vs. UK) it might have originated from based on its historical meanings and usages. Our premise is that a model capturing the basic principles of slang semantic variation should be able to trace or infer the regional identities of emerging slang meanings over time.
2 Theoretical Hypotheses
We consider two theoretical hypotheses for characterizing regularity in slang semantic variation: communicative need and semantic distinction.
Communicative need. Prior work has suggested that slang may be driven by culture-dependent communicative need (Sornig, 1981). We refer to communicative need as how frequently a meaning needs to be communicated or expressed. Following recent work (e.g., Kemp and Regier, 2012; Ryskina et al., 2020), we estimate communicative need based on usage frequencies from Google Ngram¹ over the past two centuries.² In the context of slang semantic variation, certain things might be more frequently talked about in one region (or country) over another. As such, we might expect these differential needs to drive meaning differentiation in slang terms. For example, a US-specific slang sense for beast describes the subway line #2 of the New York City transit network, most likely due to the specific need for communicating that information in the US (as opposed to the UK).

¹ https://books.google.com/ngrams
² We acknowledge that experiment-based methods for estimating need exist (see Karjus et al., 2021), but these alternative methods are difficult to operationalize at scale and in the naturalistic settings required for our analysis.
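As a rough illustration of a frequency-based estimate of communicative need, the sketch below scores a meaning by the average log relative frequency of a few words associated with it in each region, then compares regions. The toy counts, the function names `need_score` and `compare_need`, the +1 offset, and the use of individual words as a proxy for a meaning are all assumptions made for illustration; none of them is taken from the paper's actual model.

```python
import math

# Toy regional word counts standing in for Google Ngram frequencies.
# These numbers are invented for illustration only.
TOY_COUNTS = {
    "US": {"subway": 900, "car": 5000, "bicycle": 700, "offender": 300},
    "UK": {"subway": 150, "car": 4200, "bicycle": 1100, "offender": 450},
}
TOY_TOTALS = {"US": 1_000_000, "UK": 1_000_000}


def need_score(words, region, counts=TOY_COUNTS, totals=TOY_TOTALS):
    """Mean log relative frequency of `words` in one region.

    A +1 offset on the counts avoids log(0) for unseen words.
    """
    if not words:
        raise ValueError("need at least one word")
    region_counts = counts[region]
    total = totals[region]
    logs = [math.log((region_counts.get(w, 0) + 1) / (total + 1)) for w in words]
    return sum(logs) / len(logs)


def compare_need(words, regions=("US", "UK")):
    """Return the region with the highest need score, plus all scores."""
    scores = {r: need_score(words, r) for r in regions}
    return max(scores, key=scores.get), scores


region, scores = compare_need(["subway"])
print(region, scores)
```

With these invented counts, a meaning tied to "subway" scores higher in the US than in the UK, which mirrors the beast example only in spirit.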
Semantic distinction. We also consider an alternative hypothesis termed semantic distinction, motivated by the social functions of slang (cf. Labov, 1972; Hovy, 1990): slang is language that is used to show and reinforce group identity (Eble, 2012). Under this view, slang senses may develop independently in each region and form a semantically cohesive set of meanings that reflect the cultural identity of a region. As a result, emerging slang senses are more likely to be in close semantic proximity with historical slang senses from the same region.³ For example, the slang beast has formed a cluster of senses in the US that describes something virtuous while senses in the UK often describe criminals. An emerging sense such as “An outstanding example” would be considered more likely to originate from the US due to its similarity with the historical US senses of beast. Here we operationalize semantic distinction by models of semantic chaining from work on historical word meaning extension (Ramiro et al., 2018; Habibi et al., 2020), where each region develops a distinct chain of related regional senses over history.
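To make the chaining idea concrete, here is a minimal sketch of an exemplar-style score: an emerging sense is attributed to the region whose historical senses are, on average, closest to it in an embedding space. The 2-d vectors, the Gaussian kernel, the bandwidth `h`, and the function names are hypothetical choices for illustration, not the paper's implementation.

```python
import math


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(u, v))


def exemplar_score(query, exemplars, h=1.0):
    """Mean Gaussian-kernel similarity between `query` and a region's past senses."""
    if not exemplars:
        return 0.0
    return sum(math.exp(-sq_dist(query, e) / h) for e in exemplars) / len(exemplars)


def predict_region(query, history, h=1.0):
    """Normalize exemplar scores over regions; return (best_region, probabilities)."""
    scores = {region: exemplar_score(query, senses, h) for region, senses in history.items()}
    total = sum(scores.values())
    if total == 0.0:
        # No evidence either way: fall back to a uniform distribution.
        probs = {r: 1.0 / len(scores) for r in scores}
    else:
        probs = {r: s / total for r, s in scores.items()}
    return max(probs, key=probs.get), probs


# Invented 2-d "sense embeddings" for historical senses of one slang word.
history = {
    "US": [(0.9, 0.8), (1.0, 0.6)],
    "UK": [(-0.8, -0.7), (-0.6, -0.9)],
}
emerging = (0.8, 0.9)  # stands in for an emerging sense to be traced
print(predict_region(emerging, history))
```

Averaging over all stored senses is what makes this an exemplar-style model; a prototype-style variant would instead compare the query to a single mean vector per region.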
We evaluate these theories using slang sense entries from Green’s Dictionary of Slang (GDoS, Green, 2010) over the past two centuries. Analysis of GDoS entries is appropriate because 1) a more diverse set of topics is covered compared to domain-specific slang found in online communities (e.g., Reddit), and 2) the region and time metadata associated with individual sense entries support a diachronic analysis of slang semantic variation. To preview our results, we show that both communicative need and semantic distinction are relevant factors in predicting slang semantic variation, with an exemplar-based chaining model offering the most robust results overall. Meanwhile, the relative importance of the two factors is time-dependent and fluctuates over different periods of history.
3 Related Work
3.1 Variation in online language
Previous work in computational social science on online social media has explored lexical variation (Eisenstein et al., 2014; Nguyen et al., 2016) by studying the differences in word choice among different online communities. It has also been shown that linguistic and social variables can predict the popularity and dissemination of linguistic innovations in online language (Stewart and Eisenstein, 2018; Del Tredici and Fernández, 2018). Yang and Eisenstein (2017) modeled sentiment variation of words found in tweets, where users with close ties are assumed to give similar sentiment labels for the same word. Del Tredici and Fernández (2017) adapted Bamman et al. (2014)’s distributive embedding model to train community-specific word embeddings for a small set of Reddit communities and quantified semantic variation by comparing cosine similarities between community-specific embeddings for the same word.

³ It is worth noting that communicative need and semantic distinction may not be completely orthogonal. In fact, differences in communicative need may drive semantic distinction. However, we consider these hypotheses as alternative ones because they are motivated by different functions.
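The cosine comparison between community-specific embeddings can be sketched in a few lines. The vectors and community names below are invented, and reading `1 - similarity` as a variation score is an assumption made for illustration rather than the measure used in the cited work.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


# Invented embeddings of the same word in two hypothetical communities.
embeddings = {
    "community_a": [0.2, 0.9, 0.1],
    "community_b": [0.8, 0.1, 0.3],
}
similarity = cosine_similarity(embeddings["community_a"], embeddings["community_b"])
variation = 1.0 - similarity  # lower similarity is read as stronger variation
print(round(similarity, 3), round(variation, 3))
```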
Lucy and Bamman (2021) extended the previous study to quantify semantic variation of online language in 474 Reddit communities. They compared PMI-based sense specificity of clustered BERT (Devlin et al., 2019) embeddings generated using different contextual instances of a word’s usage, along with an alternative strategy that uses BERT to predict word substitutions from the same usage instances (Amrami and Goldberg, 2019). Lucy and Bamman (2021) also proposed a regression-based model of semantic variation with community-based features (e.g., community size, network density) as well as topical features derived from Reddit’s subreddit hierarchy. While they find these features to be informative in predicting the strength of semantic variation, they do not explicitly model how slang senses vary. Instead of predicting the strength of semantic variation, our work takes a more direct approach by modeling how slang senses vary and studying the driving forces underlying such variation. We also extend our analysis to study attested slang usages over the past two centuries instead of focusing on contemporary internet slang.
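For readers unfamiliar with PMI-based specificity, the sketch below computes plain pointwise mutual information between a sense cluster and a community from co-occurrence counts. The counts, labels, and the function name `sense_pmi` are invented, and the exact measure used in the cited work may differ (for example, by normalization), so this should be read only as an illustration of the general idea.

```python
import math


def sense_pmi(counts, community, sense):
    """PMI between a sense cluster and a community.

    `counts` maps (community, sense) pairs to the number of usages of that
    sense observed in that community. PMI = log[p(c, s) / (p(c) p(s))].
    """
    total = sum(counts.values())
    joint = counts.get((community, sense), 0)
    if total == 0 or joint == 0:
        return float("-inf")
    community_total = sum(n for (c, _), n in counts.items() if c == community)
    sense_total = sum(n for (_, s), n in counts.items() if s == sense)
    return math.log((joint * total) / (community_total * sense_total))


# Invented usage counts for two sense clusters of one word.
toy_counts = {
    ("community_a", "sense_1"): 80,
    ("community_a", "sense_2"): 20,
    ("community_b", "sense_1"): 5,
    ("community_b", "sense_2"): 95,
}
print(sense_pmi(toy_counts, "community_a", "sense_1"))
print(sense_pmi(toy_counts, "community_b", "sense_1"))
```

A positive value means the sense is over-represented in that community relative to its overall frequency; a negative value means it is under-represented.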
Also related to our work is Keidar et al. (2022), who performed a causal analysis of slang semantic change using tweets from 2010 to 2020. Slang’s usage frequencies were found to change more drastically than those of conventional language, while the semantic change for stable senses progresses much more slowly. In our study, we make a complementary observation in which slang senses from the 19th century are still relevant for predicting semantic variation in contemporary slang.
3.2 NLP for slang
Recent work in natural language processing has shown increasing interest in the automatic processing of novel slang, moving beyond retrieval-based methods (Dhuliawala et al., 2016; Wu et al., 2018; Gupta et al., 2019) that do not generalize to emerging slang usages absent in training. In particular, end-to-end deep neural networks have been proposed for slang detection (Pei et al., 2019), slang interpretation (Ni and Wang, 2017), as well as the formation of slang words (Kulkarni and Wang, 2018). Sun et al. (2021) proposed a model of slang semantics based on Siamese networks (Baldi and Chauvin, 1993; Bromley et al., 1994) to learn joint representations for both conventional and slang senses. The resulting sense representations can then be used with a semantic chaining model (Ramiro et al., 2018) to generate novel slang usages (Sun et al., 2019, 2021) or better interpret slang usages in text (Sun et al., 2022). In those cases, each candidate word w is considered a class and its conventional senses are taken as class attributes of w. We apply chaining models in different ways from those in Sun et al. (2021). Instead of treating each word as a class, we group senses by region and consider only slang senses.
Previous NLP approaches to slang have often assumed that slang expressions are homogeneous across different groups of users. Here, we relax this assumption by explicitly modeling the factors that contribute to slang semantic variation. For example, a slang interpreter could benefit from a semantic variation model in cases where the region has been pre-determined, so that an interpreter would prefer the meaning “excellent” when interpreting the slang beast if the slang is known to be used in the US. We hope that our work will contribute to more sophisticated approaches toward the modeling of informal language for these downstream tasks and real-world applications.
4 Data
4.1 Green’s Dictionary of Slang (GDoS)
We collect slang lexical entries from Green’s Dictionary of Slang (GDoS, Green, 2010),⁴ a historical English slang dictionary covering more than two centuries of slang usage. Each word entry (e.g., “beast”) in GDoS is associated with one or more sense entries. A sense entry contains a definition sentence (e.g., “An outstanding example.”) and a series of references. Each reference contains a region tag (e.g., US or UK), a date tag (e.g., 2011), and a sentence indicating the origin of the reference.

⁴ https://greensdictofslang.com/
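The entry structure described above (word entry, sense entries, references with region and date tags) could be represented roughly as follows. The class and field names are hypothetical, the placeholder source strings are not real GDoS citations, and treating a sense's earliest reference as its point of emergence is an assumption made for this sketch rather than a documented step of the paper's pipeline.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Reference:
    region: str  # region tag, e.g. "US" or "UK"
    year: int    # date tag, e.g. 2011
    source: str  # sentence indicating the origin of the reference


@dataclass
class SenseEntry:
    definition: str
    references: List[Reference] = field(default_factory=list)

    def first_attested(self) -> Reference:
        """Earliest reference (assumed here to mark the sense's emergence)."""
        if not self.references:
            raise ValueError("sense entry has no references")
        return min(self.references, key=lambda r: r.year)


@dataclass
class WordEntry:
    word: str
    senses: List[SenseEntry] = field(default_factory=list)

    def history_before(self, year: int) -> List[SenseEntry]:
        """Senses whose earliest reference is strictly before `year`."""
        return [s for s in self.senses
                if s.references and s.first_attested().year < year]


# Illustrative entry; region tags follow the beast example in the text.
beast = WordEntry("beast", [
    SenseEntry("A sexual offender.", [Reference("UK", 1877, "(placeholder)")]),
    SenseEntry("Excellent.", [Reference("US", 1997, "(placeholder)")]),
    SenseEntry("An outstanding example.", [Reference("US", 2011, "(placeholder)")]),
])
print([s.definition for s in beast.history_before(2011)])
```

Under this layout, a regional inference instance pairs one emerging sense with `history_before` its first attested year, which keeps later senses from leaking into the prediction.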