
Tracing Semantic Variation in Slang

Zhewei Sun¹, Yang Xu¹,²

¹Department of Computer Science, University of Toronto, Toronto, Canada

²Cognitive Science Program, University of Toronto, Toronto, Canada

{zheweisun, yangxu}@cs.toronto.edu

Abstract

The meaning of a slang term can vary in different communities. However, slang semantic variation is not well understood and remains under-explored in the natural language processing of slang. One existing view argues that slang semantic variation is driven by culture-dependent communicative needs. An alternative view focuses on slang’s social functions, suggesting that the desire to foster semantic distinction may have led to the historical emergence of community-specific slang senses. We explore these theories using computational models and test them against historical slang dictionary entries, with a focus on characterizing regularity in the geographical variation of slang usages attested in the US and the UK over the past two centuries. We show that our models are able to predict the regional identity of emerging slang word meanings from historical slang records. We offer empirical evidence that both communicative need and semantic distinction play a role in the variation of slang meaning, yet their relative importance fluctuates over the course of history. Our work offers an opportunity for incorporating historical cultural elements into the natural language processing of slang.

1 Introduction

Slang is a type of informal language commonly used in both day-to-day conversations and online written text. The pervasiveness of slang has generated increasing interest in the natural language processing (NLP) community, with systems proposed for automatic detection (Dhuliawala et al., 2016; Pei et al., 2019), generation (Sun et al., 2019, 2021), and interpretation (Ni and Wang, 2017; Sun et al., 2022) of slang. However, these existing approaches do not account for the semantic variation of slang among different groups of users, a defining characteristic of slang which distinguishes it from conventional language (Andersson and Trudgill, 1992; Mattiello, 2005; Eble, 2012). Figure 1 shows a tally of Wiktionary entries confirming that semantic variation is much more prevalent in slang compared to conventional language. Here we develop a principled computational approach to investigate regularity in the semantic variation of slang.

Figure 1: Distribution of regional identities among sense entries found in the English Wiktionary. See Appendix A for the detailed experimental setup.

[Figure 2 panel text, senses of beast: 1954: A fast car. 1982: Subway #2 of NYC. 1997: Excellent. 1837: An unpleasant person. 1877: A sexual offender. 1898: A bicycle. 2011: An outstanding example. “You’re a beast, man. You nailed that sucker.”]

Figure 2: Illustration of semantic variation in the slang word beast, with senses recorded in American and British English respectively. We develop slang semantic variation models to trace the regional identity of a new emerging slang sense given its historical meanings and usages from different regions.

We define semantic variation in slang as how a slang term might take on different meanings in different communities. For example, Figure 2 shows how the commonly used slang word beast has divergent meanings in different regions (or more specifically, two different countries in this case). Whereas it is often used to express positive things or sentiment in the US, the same slang word has been used to express more negative senses in the UK.

arXiv:2210.08635v2 [cs.CL] 9 Nov 2022

Recent work has quantified semantic variation in non-standard language of online communities using word and sense embedding models and discovered that community characteristics (e.g., community size, network density) are relevant factors in predicting the strength of this variation (Del Tredici and Fernández, 2017; Lucy and Bamman, 2021). However, it is not clear how slang senses vary among different communities and what might be the driving forces behind this variation.

As an initial step to model semantic variation in slang, we focus on regional semantic variation between the US and the UK by considering a regional inference task illustrated in Figure 2: Given an emerging slang sense (e.g., “An outstanding example”) for a slang word (e.g., beast), infer which region (e.g., US vs. UK) it might have originated from based on its historical meanings and usages. Our premise is that a model capturing the basic principles of slang semantic variation should be able to trace or infer the regional identities of emerging slang meanings over time.

2 Theoretical Hypotheses

We consider two theoretical hypotheses for characterizing regularity in slang semantic variation: communicative need and semantic distinction.

Communicative need. Prior work has suggested that slang may be driven by culture-dependent communicative need (Sornig, 1981). We refer to communicative need as how frequently a meaning needs to be communicated or expressed. Following recent work (e.g., Kemp and Regier 2012; Ryskina et al. 2020), we estimate communicative need based on usage frequencies from Google Ngram¹ over the past two centuries.² In the context of slang semantic variation, certain things might be more frequently talked about in one region (or country) than in another. As such, we might expect these differential needs to drive meaning differentiation in slang terms. For example, a US-specific slang sense for beast describes the subway line #2 of the New York City transit network, most likely due to the specific need for communicating that information in the US (as opposed to the UK).

¹ https://books.google.com/ngrams

² We acknowledge that experiment-based methods for estimating need exist (see Karjus et al., 2021), but these alternative methods are difficult to operationalize at scale and in the naturalistic settings required for our analysis.
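As a rough illustration of the frequency-based view of communicative need, the sketch below turns per-region usage counts into a smoothed distribution over regions and picks the region with the larger share. The function names, the add-one smoothing, and the toy counts are all assumptions made for illustration; the text above only states that need is estimated from Google Ngram usage frequencies, not how senses are mapped to counts.

```python
from typing import Dict


def need_probabilities(counts: Dict[str, float], smoothing: float = 1.0) -> Dict[str, float]:
    """Convert raw per-region usage counts into a smoothed distribution over regions.

    `counts` maps a region label to how often the content of a sense is
    mentioned in that region's corpus (hypothetical numbers in this sketch).
    """
    total = sum(counts.values()) + smoothing * len(counts)
    return {region: (c + smoothing) / total for region, c in counts.items()}


def predict_region_by_need(counts: Dict[str, float]) -> str:
    """Pick the region in which the meaning is talked about most often."""
    probs = need_probabilities(counts)
    return max(probs, key=probs.get)


# Hypothetical counts for the content of the sense "Subway #2 of NYC".
toy_counts = {"US": 900.0, "UK": 100.0}
predicted = predict_region_by_need(toy_counts)
```

Under these made-up counts the sense is attributed to the US, mirroring the intuition behind the subway example. A real estimate would need period-specific counts, since need is tracked over two centuries.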

Semantic distinction. We also consider an alternative hypothesis termed semantic distinction, motivated by the social functions of slang (cf. Labov, 1972; Hovy, 1990), that is, language that is used to show and reinforce group identity (Eble, 2012). Under this view, slang senses may develop independently in each region and form a semantically cohesive set of meanings that reflect the cultural identity of a region. As a result, emerging slang senses are more likely to be in close semantic proximity with historical slang senses from the same region.³ For example, the slang beast has formed a cluster of senses in the US that describes something virtuous, while senses in the UK often describe criminals. An emerging sense such as “An outstanding example” would be considered more likely to originate from the US due to its similarity with the historical US senses of beast. Here we operationalize semantic distinction by models of semantic chaining from work on historical word meaning extension (Ramiro et al., 2018; Habibi et al., 2020), where each region develops a distinct chain of related regional senses over history.
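The chaining idea can be sketched in a few lines: score an emerging sense against each region's historical senses and choose the region whose senses are closest on average. This is a simplified stand-in, not the chaining models of Ramiro et al. (2018) or Habibi et al. (2020); the use of mean cosine similarity, the function names, and the two-dimensional toy vectors are assumptions for illustration only.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def exemplar_score(query: Sequence[float], exemplars: List[Sequence[float]]) -> float:
    """Mean similarity between the emerging sense and every historical sense."""
    return sum(cosine(query, e) for e in exemplars) / len(exemplars)


def predict_region_by_chaining(
    query: Sequence[float], history: Dict[str, List[Sequence[float]]]
) -> Tuple[str, Dict[str, float]]:
    """Return the best-scoring region and the per-region scores."""
    scores = {region: exemplar_score(query, senses) for region, senses in history.items()}
    return max(scores, key=scores.get), scores


# Toy 2-D "sense embeddings" (hypothetical values, not real model output).
history = {
    "US": [[1.0, 0.1], [0.9, 0.2]],
    "UK": [[0.1, 1.0], [0.2, 0.9]],
}
query = [0.8, 0.3]  # stands in for an emerging sense such as "An outstanding example"
region, scores = predict_region_by_chaining(query, history)
```

With these toy vectors the emerging sense lands closer to the US cluster. In practice the vectors would come from a sentence or sense encoder applied to dictionary definitions, and only senses attested before the emerging sense would be used as exemplars.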

We evaluate these theories using slang sense entries from Green’s Dictionary of Slang (GDoS, Green, 2010) over the past two centuries. Analysis on GDoS entries is appropriate because 1) a more diverse set of topics is covered compared to domain-specific slang found in online communities (e.g., Reddit), and 2) the region and time metadata associated with individual sense entries support a diachronic analysis on slang semantic variation. To preview our results, we show that both communicative need and semantic distinction are relevant factors in predicting slang semantic variation, with an exemplar-based chaining model offering the most robust results overall. Meanwhile, the relative importance of the two factors is time-dependent and fluctuates over different periods of history.

3 Related Work

3.1 Variation in online language

Previous work in computational social science on online social media has explored lexical variation (Eisenstein et al., 2014; Nguyen et al., 2016) by studying the differences in word choice among different online communities. It has also been shown that linguistic and social variables can predict the popularity and dissemination of linguistic innovations in online language (Stewart and Eisenstein, 2018; Del Tredici and Fernández, 2018). Yang and Eisenstein (2017) modeled sentiment variation of words found in tweets, where users with close ties are assumed to give similar sentiment labels for the same word. Del Tredici and Fernández (2017) adapted Bamman et al. (2014)’s distributive embedding model to train community-specific word embeddings for a small set of Reddit communities and quantified semantic variation by comparing cosine similarities between community-specific embeddings for the same word.

³ It is worth noting that communicative need and semantic distinction may not be completely orthogonal. In fact, differences in communicative need may drive semantic distinction. However, we consider these hypotheses as alternative ones because they are motivated by different functions.

Lucy and Bamman (2021) extended the previous study to quantify semantic variation of online language in 474 Reddit communities. They compared PMI-based sense specificity of clustered BERT (Devlin et al., 2019) embeddings generated using different contextual instances of a word’s usage, along with an alternative strategy that uses BERT to predict word substitutions from the same usage instances (Amrami and Goldberg, 2019). Lucy and Bamman (2021) also proposed a regression-based model of semantic variation with community-based features (e.g., community size, network density) as well as topical features derived from Reddit’s subreddit hierarchy. While they find these features to be informative in predicting the strength of semantic variation, they do not explicitly model how slang senses vary. Instead of predicting the strength of semantic variation, our work takes a more direct approach by modeling how slang senses vary and studying the driving forces underlying such variation. We also extend our analysis to study attested slang usages over the past two centuries instead of focusing on contemporary internet slang.
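To make the notion of PMI-based sense specificity concrete, the sketch below computes plain pointwise mutual information between communities and sense clusters from a list of labeled usages. It is an assumption-laden illustration: the exact measure used by Lucy and Bamman (2021) may differ (for example in normalization), and the community names and cluster labels are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def pmi_specificity(observations: List[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """Compute PMI(community, sense) = log P(c, s) / (P(c) * P(s)).

    `observations` holds one (community, sense_cluster) pair per usage of a word.
    Only pairs that occur at least once receive a score.
    """
    n = len(observations)
    joint = Counter(observations)
    by_community = Counter(c for c, _ in observations)
    by_sense = Counter(s for _, s in observations)
    return {
        (c, s): math.log((k / n) / ((by_community[c] / n) * (by_sense[s] / n)))
        for (c, s), k in joint.items()
    }


# Hypothetical usages of one word in two communities, already assigned to sense clusters.
usages = [("r/a", "s1")] * 3 + [("r/a", "s2")] * 1 + [("r/b", "s2")] * 4
pmi = pmi_specificity(usages)
```

A positive score means a sense cluster is over-represented in a community relative to chance, and a negative score means it is under-represented. This measures how strongly a sense is tied to a community, which is the "strength" of variation that the paragraph above contrasts with modeling how senses vary.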

Also related to our work is Keidar et al. (2022), who performed a causal analysis of slang semantic change using tweets from 2010 to 2020. Slang’s usage frequencies were found to change more drastically than those of conventional language, while the semantic change for stable senses progresses much more slowly. In our study, we make a complementary observation in which slang senses from the 19th century are still relevant for predicting semantic variation in contemporary slang.

3.2 NLP for slang

Recent work in natural language processing has shown increasing interest in the automatic processing of novel slang, moving beyond retrieval-based methods (Dhuliawala et al., 2016; Wu et al., 2018; Gupta et al., 2019) that do not generalize to emerging slang usages absent in training. In particular, end-to-end deep neural networks have been proposed for slang detection (Pei et al., 2019), slang interpretation (Ni and Wang, 2017), as well as the formation of slang words (Kulkarni and Wang, 2018). Sun et al. (2021) proposed a model of slang semantics based on Siamese networks (Baldi and Chauvin, 1993; Bromley et al., 1994) to learn joint representations for both conventional and slang senses. The resulting sense representations can then be used with a semantic chaining model (Ramiro et al., 2018) to generate novel slang usages (Sun et al., 2019, 2021) or better interpret slang usages in text (Sun et al., 2022). In those cases, each candidate word is considered a class and its conventional senses are taken as the attributes of that class. We apply chaining models differently from Sun et al. (2021): instead of treating each word as a class, we group senses by region and consider only slang senses.

Previous NLP approaches to slang have often assumed that slang expressions are homogeneous across different groups of users. Here, we relax this assumption by explicitly modeling the factors that contribute to slang semantic variation. For example, a slang interpreter could benefit from a semantic variation model in cases where the region has been pre-determined, so that an interpreter would prefer the meaning “excellent” when interpreting the slang beast if the slang is known to be used in the US. We hope that our work will contribute to more sophisticated approaches toward the modeling of informal language for these downstream tasks and real-world applications.

4 Data

4.1 Green’s Dictionary of Slang (GDoS)

We collect slang lexical entries from Green’s Dictionary of Slang (GDoS, Green, 2010)⁴, a historical English slang dictionary covering more than two centuries of slang usage. Each word entry (e.g., “beast”) in GDoS is associated with one or more sense entries. A sense entry contains a definition sentence (e.g., “An outstanding example.”) and a series of references. Each reference contains a region tag (e.g., US or UK), a date tag (e.g., 2011), and a sentence indicating the origin of the reference.

⁴ https://greensdictofslang.com/
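The entry structure described above (word entry, sense entries, references with region and date tags) could be represented roughly as follows. The class and field names are hypothetical and do not reflect GDoS's actual data format, and treating the earliest reference as a sense's first attestation is an assumption of this sketch rather than something stated in the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Reference:
    region: str  # region tag, e.g. "US" or "UK"
    year: int    # date tag, e.g. 2011
    source: str  # sentence indicating the origin of the reference


@dataclass
class SenseEntry:
    definition: str
    references: List[Reference] = field(default_factory=list)

    def earliest(self) -> Optional[Reference]:
        """Return the oldest reference, or None if the sense has no references."""
        return min(self.references, key=lambda r: r.year, default=None)


@dataclass
class WordEntry:
    word: str
    senses: List[SenseEntry] = field(default_factory=list)


# Toy entry; the references and their sources are placeholders, not real GDoS data.
beast = WordEntry(
    word="beast",
    senses=[
        SenseEntry(
            definition="An outstanding example.",
            references=[
                Reference(region="US", year=2015, source="(placeholder source B)"),
                Reference(region="US", year=2011, source="(placeholder source A)"),
            ],
        )
    ],
)
first = beast.senses[0].earliest()
```

Keeping references as a list per sense makes it straightforward to derive a region label and a first-attestation year for each sense, which is the metadata a diachronic, region-based analysis needs.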
