On the Curious Case of ℓ2 norm of Sense Embeddings
Yi Zhou
University of Liverpool
y.zhou71@liverpool.ac.uk
Danushka Bollegala
University of Liverpool, Amazon
danushka@liverpool.ac.uk
Abstract
We show that the ℓ2 norm of a static sense embedding encodes information related to the frequency of that sense in the training corpus used to learn the sense embeddings. This finding can be seen as an extension of a previously known relationship for word embeddings to sense embeddings. Our experimental results show that, in spite of its simplicity, the ℓ2 norm of sense embeddings is a surprisingly effective feature for several word sense related tasks such as (a) most frequent sense prediction, (b) Word-in-Context (WiC), and (c) Word Sense Disambiguation (WSD). In particular, by simply including the ℓ2 norm of a sense embedding as a feature in a classifier, we show that we can improve WiC and WSD methods that use static sense embeddings.
1 Introduction
Background: Given a text corpus, static word embedding learning methods (Pennington et al., 2014; Mikolov et al., 2013a, etc.) learn a single vector (aka embedding) to represent the meaning of a word in the corpus. In contrast, static sense embedding learning methods (Loureiro and Jorge, 2019a; Scarlini et al., 2020b, etc.) learn multiple embeddings for each word, corresponding to the different senses of that word.
Arora et al. (2016) proposed a random walk model on the word co-occurrence graph and showed that if word embeddings are uniformly distributed over the unit sphere, the log-frequency of a word in a corpus is proportional to the squared ℓ2 norm of the static word embedding learnt from that corpus.
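Stated schematically in our own notation (the symbols below are ours, not reproduced from Arora et al. (2016)), for a word $w$ with static embedding $\mathbf{v}_w$ this relationship takes the form
\[
\log p(w) \;\propto\; \lVert \mathbf{v}_w \rVert_2^2 ,
\]
where $p(w)$ is the frequency of $w$ in the corpus; in their model the constant of proportionality involves the embedding dimensionality and a corpus-level normalisation term.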
Hashimoto et al. (2016) showed that under a simple metric random walk over words, where the probability of transitioning from one word to another depends only on the squared Euclidean distance between their embeddings, the log-frequency of word co-occurrences between two words converges to the negative squared Euclidean distance measured between the corresponding word embeddings.
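Again in our own notation, this result can be summarised as
\[
\log p(w, w') \;\approx\; -\lVert \mathbf{v}_w - \mathbf{v}_{w'} \rVert_2^2 + \mathrm{const.},
\]
where $p(w, w')$ is the co-occurrence frequency of words $w$ and $w'$, with the approximation becoming exact in the limit considered by Hashimoto et al. (2016).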
Mu and Viswanath (2018) later showed that word embeddings are distributed in a narrow cone and hence do not satisfy the uniformity assumption made by Arora et al. (2016); however, the result still holds even for such anisotropic embeddings.
On the other hand, Arora et al. (2018) showed that a word embedding can be represented as a linearly-weighted combination of the embeddings of its senses.
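In symbols (ours, not the original paper's), for an ambiguous word $w$ with $k$ senses,
\[
\mathbf{v}_w \;\approx\; \sum_{i=1}^{k} \alpha_i \, \mathbf{s}_{w,i},
\]
where $\mathbf{s}_{w,i}$ is the embedding of the $i$-th sense of $w$ and the $\alpha_i$ are nonnegative weights, which in their analysis are related to the relative frequencies of the senses.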
However, to the best of our knowledge, it remains unknown thus far what the relationship is between sense embeddings and the frequency of a sense; this is the central question that we study in this paper.
Contributions: First, by extending the prior results for word embeddings to sense embeddings, we show that the squared ℓ2 norm of a static sense embedding is proportional to the log-frequency of the corresponding sense in the training corpus.
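In our notation, writing $\mathbf{s}$ for the embedding of a sense $s$ and $f(s)$ for the frequency of that sense in the training corpus, the claimed relationship is
\[
\lVert \mathbf{s} \rVert_2^2 \;\propto\; \log f(s).
\]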
This finding has important practical implications. For example, it is known that assigning every occurrence of an ambiguous word in a corpus to the most frequent sense of that word (popularly known as the Most Frequent Sense (MFS) baseline) is a surprisingly strong baseline for WSD (McCarthy et al., 2004, 2007). Therefore, the theoretical relationship which we prove implies that we should be able to use the ℓ2 norm to predict the MFS of a word.
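To make this concrete, the following is a minimal sketch (our illustration, not the authors' code) of how the MFS of a word could be predicted from pretrained static sense embeddings; the sense IDs and vectors are hypothetical toy values.

```python
import numpy as np

def predict_mfs(sense_embeddings: dict) -> str:
    """Predict the most frequent sense as the one whose static sense
    embedding has the largest squared l2 norm."""
    return max(sense_embeddings,
               key=lambda s: float(np.dot(sense_embeddings[s],
                                          sense_embeddings[s])))

# Toy example with two hypothetical senses of "bank":
senses = {
    "bank_river":   np.array([0.9, 1.2, -0.3]),  # made-up vector
    "bank_finance": np.array([1.8, -2.1, 0.7]),  # made-up vector
}
print(predict_mfs(senses))  # -> "bank_finance"
```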
Second, we conduct a series of experiments to empirically validate the above-mentioned relationship. We find that the relationship holds for different types of static sense embeddings learnt using methods such as GloVe (Pennington et al., 2014) and skip-gram with negative sampling (SGNS; Mikolov et al., 2013b) on SemCor (Miller et al., 1993).
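One simple way to test such a relationship (our sketch of a plausible protocol, not necessarily the paper's exact one) is to measure the correlation between log sense frequency and squared embedding norm:

```python
import numpy as np
from scipy.stats import pearsonr

def norm_frequency_correlation(freqs: dict, embeddings: dict):
    """Pearson correlation between log sense frequency and the squared
    l2 norm of the corresponding sense embedding.

    freqs:      sense ID -> frequency in the training corpus (> 0)
    embeddings: sense ID -> sense embedding (np.ndarray)
    """
    senses = sorted(freqs)
    log_freq = np.log([freqs[s] for s in senses])
    sq_norm = np.array([float(np.sum(embeddings[s] ** 2)) for s in senses])
    return pearsonr(log_freq, sq_norm)  # (correlation, p-value)
```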
Third, motivated by our finding that the ℓ2 norm of pretrained static sense embeddings encodes sense-frequency related information, we use the ℓ2 norm of sense embeddings as a feature for several sense-related tasks.
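For instance, for a task such as WiC, the norms can simply be appended to whatever features a classifier already uses; the following sketch (our illustration, with hypothetical inputs) shows one way to do so:

```python
import numpy as np

def wic_features(s1: np.ndarray, s2: np.ndarray) -> np.ndarray:
    """Feature vector for a pair of sense embeddings, augmented with
    their squared l2 norms as two extra dimensions."""
    return np.concatenate([
        s1, s2,                                          # the embeddings
        [float(np.dot(s1, s2))],                         # inner product
        [float(np.dot(s1, s1)), float(np.dot(s2, s2))],  # squared l2 norms
    ])
```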