Learning Disentangled Representations for Natural Language Definitions
Danilo S. Carvalho1, Giangiacomo Mercatali1, Yingji Zhang1, Andre Freitas1,2
1 Department of Computer Science, University of Manchester, United Kingdom
2 Idiap Research Institute, Switzerland
<firstname.lastname>@[postgrad.]manchester.ac.uk
Abstract
Disentangling the encodings of neural models is a fundamental aspect for improving interpretability, semantic control and downstream task performance in Natural Language Processing. Currently, most disentanglement methods are unsupervised or rely on synthetic datasets with known generative factors. We argue that recurrent syntactic and semantic regularities in textual data can be used to provide the models with both structural biases and generative factors. We leverage the semantic structures present in a representative and semantically dense category of sentence types, definitional sentences, for training a Variational Autoencoder to learn disentangled representations. Our experimental results show that the proposed model outperforms unsupervised baselines on several qualitative and quantitative benchmarks for disentanglement, and it also improves the results in the downstream task of definition modeling.
1 Introduction
Learning disentangled representations is a fundamental step towards enhancing the interpretability of the encodings in deep generative models, as well as improving their downstream performance and generalization ability. Disentangled representations aim to encode the fundamental structure of the data in a more explicit manner, where independent latent variables are embedded for each generative factor (Bengio et al., 2013).
Previous work in machine learning proposed to learn disentangled representations by modifying the ELBO objective of the Variational Autoencoder (VAE) (Kingma and Welling, 2014), within an unsupervised framework (Higgins et al., 2017; Kim and Mnih, 2018; Chen et al., 2018). On the other hand, a more recent line of work claims the benefits of supervision in disentanglement (Locatello et al., 2019) and advocates the importance of designing frameworks able to exploit structures in the data for introducing inductive biases. In parallel, disentanglement approaches for NLP have tackled text style transfer, evaluating the results with extrinsic metrics such as style transfer accuracy (Hu et al., 2017; John et al., 2019; Cheng et al., 2020).
[Figure 1: Left: Supervision mechanism with definition semantic roles (DSR) encoded in the latent space: word embeddings (w) and role embeddings (r) are encoded into z and reconstructed by the decoder; the dotted arrow represents the conditional VAE version. Right: Evaluation framework: qualitative (latent traversals, interpolation, t-SNE), quantitative (disentanglement metrics), and the downstream task of definition modeling.]
While style transfer approaches investigate the ability to disentangle and control syntactic factors such as tense and gender, the aspect of understanding and disentangling the semantic structure in language is under-explored, although recent attempts at separating syntactic and semantic latent spaces show promising results (Chen et al., 2019; Bao et al., 2019). Furthermore, evaluating disentanglement is challenging, because it requires knowledge of the generative factors, leading most approaches to train on synthetic datasets (Higgins et al., 2017; Zhang et al., 2021).
In this work, we argue that recurrent semantic structures at the sentence level can be leveraged both as inductive biases for enhancing disentanglement (RQ1) and as meaningful generative factors that can be employed to evaluate the degree of disentanglement (RQ2). We also investigate whether organizing the generative factors in groups may facilitate learning and disentanglement (RQ3). As a result, this work focuses on natural language definitions, a textual resource characterised by a principled structure in terms of semantic roles, as demonstrated by previous work on extracting structural and semantic patterns from this kind of data (Silva et al., 2016, 2018).
Seeking to address the highlighted issues and answer the research questions, we make the following contributions, also depicted in Figure 1.
1) We design a supervised framework for enhancing disentanglement in language representations by conditioning on the information provided by the semantic role labels (SRL) in natural language definitions. We present two mechanisms for injecting SRL biases into latent variables: firstly, reconstructing both words and corresponding SRL in a VAE; secondly, employing SRL information as input variables for a Conditional VAE (Zhao et al., 2017).
2) We propose a framework for evaluating the disentanglement properties of the encodings on non-synthetic textual datasets. Our evaluation framework employs semantic role label groupings as generative factors, enabling the measurement of several contemporary quantitative metrics. The results show that the proposed bias injection mechanisms are able to increase the degree of disentanglement (separability) of the representations.
3) We demonstrate that models trained with our disentanglement framework are able to outperform contemporary baselines in the downstream task of definition modeling (Noraset et al., 2017).
2 Disentangling framework
In this section, we first describe the framework designed for improving disentanglement in natural language definitions with semantic role labels. Secondly, we present three models, shown in Figure 2, based on the Variational Autoencoder (VAE) (Bowman et al., 2016) architecture, for achieving disentanglement.
2.1 Disentangling definitions
Definition semantic roles. Our framework is based on natural language definitions, which are a particular type of linguistic expression, characterised by high abstraction and specific phrasal properties. Previous work in NLP for dictionary definitions (Silva et al., 2018) has shown that there are categories that can be consistently found in most definitions. In fact, Silva et al. (2018) define precise Semantic Role Labels (SRL) for phrases representing definitions, under the name of Definition Semantic Roles (DSR).
The example from Silva et al. (2018) classifies the semantic roles within "english poets who lived in the lake district" as follows: "poets" as the noun category (Supertype), "english" as a quality of the term (Differentia Quality), "who lived" as an event that the subject is involved with (Differentia Event), and "in the lake district" as the location of the action (Event Location). The full set of DSRs proposed by Silva et al. (2018) is reported in Table 9 in Appendix A.
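As an illustration, the annotation above can be pictured as role-labeled spans. This is a hypothetical data layout for exposition, not the format used by Silva et al. (2018):

```python
# Hypothetical span-level representation of the running DSR example.
definition = "english poets who lived in the lake district"
dsr_annotation = {
    "Supertype": "poets",
    "Differentia Quality": "english",
    "Differentia Event": "who lived",
    "Event Location": "in the lake district",
}
```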
Disentangling using SRL. Our goal is to enhance disentanglement in natural language by injecting categorical structures into latent variables. This goal is well aligned with the findings of Locatello et al. (2019), who claim that a higher degree of disentanglement may benefit from supervision and inductive biases. Our hypothesis is that we may leverage such semantic information for learning representations with a higher degree of disentanglement. While in the context of this work we use dictionary definitions as a target empirical setting, we conjecture that these conclusions can be extended to broader definitional sentence types. The core intuition behind the approach is that the supervision signal should increase the likelihood of points clustering in regions corresponding, or related, to the discrete supervision labels, given the network architecture formulation.
2.2 Definition VAEs
Unsupervised VAE. The first training framework that we consider is the traditional variational autoencoder (VAE) for sentences (Bowman et al., 2016), which operates in an unsupervised fashion, as in Figure 2a. The unsupervised VAE employs a multivariate Gaussian prior distribution $p(z)$ and generates a sentence $x$ with a decoder network $p_\theta(x|z)$. The joint distribution for the decoder is defined as $p(z)p_\theta(x|z)$, which, for a sequence of tokens $x$ of length $T$, results in $p_\theta(x|z) = \prod_{i=1}^{T} p_\theta(x_i | x_{<i}, z)$. The VAE objective consists in maximizing the expected log-likelihood $\mathbb{E}_{p(x)}[\log p_\theta(x)]$. Due to the computational intractability of this expectation, the variational distribution $q_\phi$ is employed to approximate $p_\theta(z|x)$.
As a result, an evidence lower bound $\mathcal{L}_{\text{VAE}}$ (ELBO), where $\mathbb{E}_{p(x)}[\log p_\theta(x)] \geq \mathcal{L}_{\text{VAE}}$, is derived as follows:

$\mathcal{L}_{\text{Tokens}} = \mathbb{E}_{q_\phi(z|x)}[\log p_\theta(x|z)] - \mathrm{KL}(q_\phi(z|x)\,\|\,p(z))$

[Figure 2: Proposed architectures for learning disentangled representations in definitions. (a) Unsupervised VAE (token reconstruction loss); (b) Supervised VAE (joint token and role loss); (c) CVAE (token reconstruction loss with role conditioning).]
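To make the objective concrete, the following is a minimal PyTorch sketch of a sentence VAE trained with this ELBO. All names (SentenceVAE, elbo_loss) and dimensions are illustrative assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceVAE(nn.Module):
    """Minimal sentence VAE sketch: LSTM encoder/decoder, Gaussian latent."""
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256, z_dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.to_mu = nn.Linear(hid_dim, z_dim)
        self.to_logvar = nn.Linear(hid_dim, z_dim)
        self.z_to_hid = nn.Linear(z_dim, hid_dim)
        self.decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, tokens):
        # Encode: q_phi(z|x) parameterized as a diagonal Gaussian.
        _, (h, _) = self.encoder(self.emb(tokens))
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        # Re-parameterization trick: z = mu + sigma * eps, eps ~ N(0, I).
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        # Decode autoregressively with teacher forcing, conditioning on z
        # via the initial hidden state (in practice the decoder input would
        # be shifted right by one token; omitted here for brevity).
        h0 = torch.tanh(self.z_to_hid(z)).unsqueeze(0)
        dec_out, _ = self.decoder(self.emb(tokens), (h0, torch.zeros_like(h0)))
        return self.out(dec_out), mu, logvar

def elbo_loss(logits, targets, mu, logvar):
    # Reconstruction term: E_q[log p_theta(x|z)] as token cross-entropy.
    rec = F.cross_entropy(logits.transpose(1, 2), targets)
    # KL(q_phi(z|x) || p(z)) in closed form against a standard Gaussian prior
    # (per-token / per-dimension averaging here is an illustrative choice).
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl
```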
DSR supervised VAE. The aim of this model is to inject the categorical structure of the definition semantic roles (DSR) into the latent variables, by factorizing them into the VAE auto-encoding objective function. In order to achieve this goal, we introduce the variable $r$ for semantic roles, and train the "DSR VAE", where both sentence and semantic roles are auto-encoded. The variable $r$ here operates just as $x$, with the corresponding label values. As a result, two separate losses are produced and added together for the final loss, as shown in Figure 2b. The ELBO for semantic roles is defined as follows:

$\mathcal{L}_{\text{Roles}} = \mathbb{E}_{q_\phi(z|r)}[\log p_\theta(r|z)] - \mathrm{KL}(q_\phi(z|r)\,\|\,p(z))$

The final loss is given by $\mathcal{L}_{\text{Tokens}} + \mathcal{L}_{\text{Roles}}$.
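A sketch of the corresponding joint loss, reusing elbo_loss from above. Passing separate posterior parameters for $q_\phi(z|x)$ and $q_\phi(z|r)$ mirrors the two stated ELBO terms; how the two posteriors share a single latent space is an implementation choice this sketch leaves open:

```python
def dsr_vae_loss(tok_logits, tokens, mu_x, logvar_x,
                 role_logits, roles, mu_r, logvar_r):
    # Final loss: L_Tokens + L_Roles, one ELBO per reconstructed sequence.
    return (elbo_loss(tok_logits, tokens, mu_x, logvar_x)
            + elbo_loss(role_logits, roles, mu_r, logvar_r))
```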
Conditional VAE with SRL. For explicitly leveraging the definition semantic roles, we propose a supervision mechanism based on the Conditional VAE (CVAE) (Zhao et al., 2017), shown in Figure 2c. Similar to the previously described model, we instantiate a VAE framework, where $x$ is the variable for the tokens, and $r$ for the roles. We perform auto-encoding for both roles and tokens, and additionally, we condition the decoder network on the roles. The CVAE is trained to maximize the conditional log-likelihood of $x$ given $r$, which involves an intractable marginalization over the latent variable $z$. The ELBO is defined as:

$\mathcal{L}_{\text{CVAE}} = \mathbb{E}_{q_\phi(z|x,r)}[\log p_\theta(x|z,r)] - \mathrm{KL}(q_\phi(z|x,r)\,\|\,p(z|r))$
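One plausible way to realize the conditioning (an assumption about the mechanics, not the paper's exact architecture) is to concatenate a role embedding to each decoder input, so that $p_\theta(x|z,r)$ observes $r$ directly. A sketch reusing the imports from the VAE example above:

```python
class RoleConditionedDecoder(nn.Module):
    """Illustrative CVAE-style decoder: conditions every step on role labels."""
    def __init__(self, vocab_size, n_roles, emb_dim=128, hid_dim=256, z_dim=32):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, emb_dim)
        self.role_emb = nn.Embedding(n_roles, emb_dim)
        self.z_to_hid = nn.Linear(z_dim, hid_dim)
        self.rnn = nn.LSTM(2 * emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, tokens, roles, z):
        # Concatenate per-token role embeddings with token embeddings,
        # so the decoder sees r at every generation step.
        inp = torch.cat([self.tok_emb(tokens), self.role_emb(roles)], dim=-1)
        h0 = torch.tanh(self.z_to_hid(z)).unsqueeze(0)
        out, _ = self.rnn(inp, (h0, torch.zeros_like(h0)))
        return self.out(out)
```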
Training. We consider an LSTM-based VAE and a Transformer-based VAE (Optimus (Li et al., 2020)) as baselines. The training process follows the variational autoencoding methodology (Kingma and Welling, 2014). First, tokenization is performed on the sentences and the roles. The encoder network feeds both first into embedding layers, then into LSTM / Transformer layers. Subsequently, two vectors $\mu$ and $\sigma$ are computed with two linear layers, and the latent vector $z$ is sampled with the re-parameterization trick. Finally, the decoder network is built with LSTM / Transformer layers and another embedding layer, which returns outputs of the same dimension as the input.
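In this setup, tokens and role labels form parallel sequences, as in this illustrative encoding of the running example (the tag spellings are assumptions):

```python
# One training instance: per-token DSR tags aligned with the tokens.
tokens = ["english", "poets", "who", "lived", "in", "the", "lake", "district"]
roles  = ["DIFF_QUALITY", "SUPERTYPE", "DIFF_EVENT", "DIFF_EVENT",
          "EVENT_LOCATION", "EVENT_LOCATION", "EVENT_LOCATION", "EVENT_LOCATION"]
```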
3 Evaluation framework
We first present the evaluation framework for measuring disentanglement, then describe and justify the generative factor setup used in the experiments.
3.1 DSR as generative factors
While early approaches for disentanglement in NLP have been proposed in the context of style transfer applications (John et al., 2019; Cheng et al., 2020) and are assessed purely in terms of style transfer accuracy, evaluating the intrinsic properties of the latent encodings is fundamental for disentanglement, as noted in several machine learning approaches (Higgins et al., 2017; Kim and Mnih, 2018). Recently, Zhang et al. (2021) proposed a framework for computing several popular quantitative disentanglement metrics (Higgins et al., 2017; Kim and Mnih, 2018), testing it on synthetic datasets; its limitation is that it works only with synthetic datasets.
In this work, we propose a method where semantic role labels, such as the ones provided in Silva et al. (2018), are used as generative factors for evaluating the degree of disentanglement in the encodings. The framework, illustrated in Figure 3, considers multiple generative factors, where each factor is composed of a number of semantic roles (for example, the factor "location" includes origin-location and event-location). In this way, the dataset can be seen as the result of sampling multiple generative factors, which is the same principle used when creating synthetic datasets for evaluating disentanglement.
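The grouping step can be pictured as a small mapping from DSR labels to factors. The groupings beyond the "location" example from the text are illustrative assumptions:

```python
# DSR labels grouped into generative factors for disentanglement evaluation.
FACTOR_GROUPS = {
    "location": ["ORIGIN_LOCATION", "EVENT_LOCATION"],  # example from the text
    "event": ["DIFF_EVENT", "EVENT_TIME"],              # assumed grouping
}

def factor_values(role_annotations, groups=FACTOR_GROUPS):
    # Map a definition's DSR labels to one value per generative factor,
    # so standard metrics can treat them like known synthetic factors.
    return {factor: any(r in role_annotations for r in roles)
            for factor, roles in groups.items()}

# factor_values({"SUPERTYPE", "DIFF_QUALITY", "EVENT_LOCATION"})
# -> {"location": True, "event": False}
```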