Word Sense Induction with Hierarchical Clustering and Mutual
Information Maximization
Hadi Abdine1, Moussa Kamal Eddine1, Michalis Vazirgiannis1,2, Davide Buscaldi3
1École Polytechnique, 2AUEB, 3Université Sorbonne Paris Nord
Abstract
Word sense induction (WSI) is a difficult problem in natural language processing that involves the unsupervised automatic detection of a word's senses (i.e. meanings). Recent work achieves significant results on the WSI task by pre-training a language model that can exclusively disambiguate word senses, whereas others employ previously pre-trained language models in conjunction with additional strategies to induce senses. In this paper, we propose a novel unsupervised method based on hierarchical clustering and invariant information clustering (IIC). IIC is used to train a small model to optimize the mutual information between two vector representations of a target word occurring in a pair of synthetic paraphrases. This model is later used in inference mode to extract a higher-quality vector representation to be used in the hierarchical clustering. We evaluate our method on two WSI tasks and in two distinct clustering configurations (fixed and dynamic number of clusters). We empirically demonstrate that, in certain cases, our approach outperforms prior WSI state-of-the-art methods, while in others, it achieves a competitive performance.
1 Introduction
The automatic identification of a word's senses is an open problem in natural language processing, known as "word sense induction" (WSI). The task is closely related to word sense disambiguation (WSD), which relies on a predefined sense inventory (i.e. WordNet (Fellbaum, 1998; Wallace, 2007; Feinerer and Hornik, 2020)) and aims to resolve a word's ambiguity in context. In WSI, given a target word, we focus on clustering a collection of sentences that use this word according to its senses. For example, Figure 1 shows the clusters obtained by applying RoBERTa-LARGE (Liu et al., 2019) to 3,000 sentences containing the word bank, collected from Wikipedia. We can see five different clusters, whose centroids are the 2D PCA projections of the average contextual word vectors of the word bank. The clusters are obtained using agglomerative clustering with cosine affinity and average linkage. Word senses are more beneficial than simple word forms for a variety of tasks, including information retrieval, machine translation and others (Pantel and Lin, 2002). Word senses are typically represented as a fixed list of definitions from a manually constructed lexical database. However, lexical databases miss important domain-specific senses. For example, these databases often lack explicit semantic or contextual links between concepts and definitions (Agirre et al., 2009). Hand-crafted lexical databases also frequently fail to convey the precise meaning of a target word in a specific context (Véronis, 2004). To address these issues, WSI intends to learn the various meanings of a given word in an unsupervised manner.
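To make the setup behind Figure 1 concrete, the following is a minimal sketch (not the paper's exact code) of extracting the contextual vector of a target word with a pretrained RoBERTa model and grouping the vectors with agglomerative clustering using cosine distance and average linkage. The toy sentences, the use of the last hidden layer, and the number of clusters are illustrative assumptions.

```python
# Sketch: contextual target-word vectors + agglomerative clustering (cosine, average linkage).
import torch
from transformers import RobertaTokenizerFast, RobertaModel
from sklearn.cluster import AgglomerativeClustering

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-large")
model = RobertaModel.from_pretrained("roberta-large").eval()

def target_vector(sentence: str, target: str) -> torch.Tensor:
    """Average the last-layer hidden states of the sub-word tokens covering `target`."""
    enc = tokenizer(sentence, return_tensors="pt", return_offsets_mapping=True)
    offsets = enc.pop("offset_mapping")[0]          # character spans per token
    start = sentence.lower().index(target.lower())
    end = start + len(target)
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]
    mask = [(s < end and e > start and e > s) for s, e in offsets.tolist()]
    return hidden[torch.tensor(mask)].mean(dim=0)

sentences = [
    "She sat on the bank of the river.",
    "He deposited the cheque at the bank.",
    "The bank raised its interest rates.",
]
X = torch.stack([target_vector(s, "bank") for s in sentences]).numpy()

# Fixed number of clusters for illustration; older scikit-learn versions use
# `affinity="cosine"` instead of `metric="cosine"`.
clustering = AgglomerativeClustering(n_clusters=2, metric="cosine", linkage="average")
print(clustering.fit_predict(X))
```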
This paper includes the following contributions:
1) We propose a new unsupervised method using contextual word embeddings (i.e. RoBERTa, BERT and DeBERTa (He et al., 2021)) that are updated with more sense-related information by maximizing the mutual information between two instances of the same cluster. To achieve this, we generate a randomly perturbed replicate of the given sentence while preserving its meaning, and thus extract two representations of the same target word in two similar contexts (a sketch of this objective follows this list). This method presents competitive results on WSI tasks.
2) We apply, for the first time, a method to compute a dynamic number of senses for each word, relying on a recent word polysemy score function (Xypolopoulos et al., 2020).
3) We study the sense information per hidden layer for four different pretrained language models and report, for all models, the layers with the best performance on sense-related tasks.
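The following sketch illustrates contribution 1: an invariant information clustering (IIC) style objective that maximizes the mutual information between the soft cluster assignments of a target word's vector in a sentence and in its meaning-preserving perturbation. The head architecture, hidden size and number of clusters are assumptions for illustration, not the paper's reported configuration.

```python
# Sketch of an IIC-style mutual information objective over paired target-word vectors.
import torch
import torch.nn as nn

class ClusterHead(nn.Module):
    """Small model mapping a contextual word vector to soft cluster assignments."""
    def __init__(self, dim: int = 1024, n_clusters: int = 7):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(),
                                 nn.Linear(256, n_clusters), nn.Softmax(dim=-1))
    def forward(self, x):
        return self.net(x)

def iic_loss(p: torch.Tensor, p_prime: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Negative mutual information between paired cluster assignments.

    p, p_prime: (batch, n_clusters) soft assignments of the target word in a
    sentence and in its meaning-preserving perturbation.
    """
    joint = p.T @ p_prime / p.size(0)          # empirical joint distribution
    joint = (joint + joint.T) / 2              # symmetrize
    joint = joint.clamp(min=eps)
    marg_i = joint.sum(dim=1, keepdim=True)    # marginal over rows
    marg_j = joint.sum(dim=0, keepdim=True)    # marginal over columns
    mi = (joint * (joint.log() - marg_i.log() - marg_j.log())).sum()
    return -mi                                 # minimize the negative MI

# Usage: v and v_prime stand in for target-word vectors from the two paraphrases.
head = ClusterHead()
v, v_prime = torch.randn(32, 1024), torch.randn(32, 1024)
loss = iic_loss(head(v), head(v_prime))
loss.backward()
```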
Figure 1: An illustration of the different sense-based clusters of the word bank, with the most frequent words used in the corresponding contexts. These clusters are obtained using agglomerative clustering on a set of RoBERTa vectors of the word bank extracted from 3000 sentences collected from Wikipedia. The centre of each cluster is the 2D PCA vector of the average 'bank' vectors of the cluster. The size of each point is proportional to the frequency of the corresponding word in the contexts of its sense-based cluster.
2 Related Work
Previous works on WSI use generative statistical models to solve this task. Mainly, they approach it as a topic modeling problem using Latent Dirichlet Allocation (LDA) (Lau et al., 2012; Chang et al., 2014; Goyal and Hovy, 2014; Wang et al., 2015; Komninos and Manandhar, 2016). AutoSense (Amplayo et al., 2018), one of the best-performing recent LDA methods, is based on two principles: first, senses are represented as a distribution over topics; second, the model generates a pair composed of the target word and its neighboring word, thus separating the topic distributions into fine-grained senses based on lexical semantics. AutoSense discards garbage senses by removing topic distributions that do not belong to any instance, and adds new ones according to the generated (target, neighbor) pairs, so the model does not require the number of senses to be fixed in advance. While most WSI methods fix the number of clusters for all words, in our work we explore two setups for the number of clusters, fixed and dynamic. Other works (Song et al., 2016; Corrêa and Amancio, 2018) use the static word embedding Word2Vec (Mikolov et al., 2013) to obtain representations of polysemous words before applying the clustering method.
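As a point of reference for the LDA-based line of work described above, the following is a minimal sketch of the topics-as-senses view: each context of the target word is treated as a document and the learned topics are read as candidate senses. This is a plain LDA illustration, not AutoSense; the contexts, vectorizer settings and topic count are illustrative assumptions.

```python
# Sketch: LDA over target-word contexts, with topics interpreted as candidate senses.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

contexts = [
    "sat on the bank of the river watching the water",
    "deposited money at the bank before the branch closed",
    "the bank raised interest rates on savings accounts",
    "fishing from the muddy bank after the flood",
]
counts = CountVectorizer(stop_words="english").fit_transform(contexts)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
print(lda.transform(counts).argmax(axis=1))   # most probable sense/topic per context
```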
After the emergence of contextual word embeddings, pretrained language models such as ELMo (Peters et al., 2018) (based on BiLSTMs) and BERT (Devlin et al., 2019) (based on the Transformer (Vaswani et al., 2017)) are used with additional techniques to induce the senses of a target word. Amrami and Goldberg (2018) and Amrami and Goldberg (2019) use ELMo and BERT-LARGE, respectively, to predict probable substitutes for the target words. Next, each instance is given k representatives, where each one contains multiple possible substitutes drawn randomly from the word distribution predicted by the language model. Each representative is encoded as a TF-IDF vector. The representatives are then clustered using agglomerative clustering, where the number of clusters is fixed to 7. Finally, each instance is assigned to one or multiple clusters according to the corresponding cluster of each of its representatives.
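A hedged sketch of the substitute-based pipeline summarized above: a masked language model proposes substitutes for the (masked) target word, each instance receives several representatives sampled from that distribution, the representatives are TF-IDF vectorized and clustered into 7 groups, and each instance inherits the clusters of its representatives. The model choice, masking of the target, sampling sizes and toy sentences are simplifying assumptions rather than the original recipe.

```python
# Sketch: substitute-based representatives + TF-IDF + agglomerative clustering.
import torch
from transformers import BertTokenizer, BertForMaskedLM
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering

tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")
mlm = BertForMaskedLM.from_pretrained("bert-large-uncased").eval()

def substitute_distribution(sentence: str) -> torch.Tensor:
    """Probability distribution over the vocabulary at the [MASK] position."""
    enc = tokenizer(sentence, return_tensors="pt")
    mask_pos = (enc.input_ids[0] == tokenizer.mask_token_id).nonzero().item()
    with torch.no_grad():
        logits = mlm(**enc).logits[0, mask_pos]
    return torch.softmax(logits, dim=-1)

sentences = [
    "she sat on the [MASK] of the river",
    "he deposited the cheque at the [MASK]",
    "the [MASK] raised its interest rates",
]
k, n_subs = 5, 4   # representatives per instance, substitutes per representative
reps, owner = [], []
for i, sent in enumerate(sentences):
    dist = substitute_distribution(sent)
    top_p, top_ids = dist.topk(100)            # restrict sampling to likely substitutes
    for _ in range(k):
        picks = top_ids[torch.multinomial(top_p, n_subs, replacement=False)]
        reps.append(" ".join(tokenizer.convert_ids_to_tokens(picks.tolist())))
        owner.append(i)

vectors = TfidfVectorizer().fit_transform(reps)
labels = AgglomerativeClustering(n_clusters=7, metric="cosine",
                                 linkage="average").fit_predict(vectors.toarray())
# Each instance inherits the clusters of its representatives (possibly several).
for i in range(len(sentences)):
    print(i, {int(l) for l, o in zip(labels, owner) if o == i})
```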
Instead of using the word substitutes approach, our work uses the contextual word embeddings extracted from pre-