groups may facilitate learning and disentanglement (RQ3). As a result, this work focuses on natural language definitions, a textual resource characterised by a principled structure in terms of semantic roles, as demonstrated by previous work on extracting structural and semantic patterns from this kind of data (Silva et al., 2016, 2018).
Seeking to address the highlighted issues and answer the research questions, we make the following contributions, also depicted in Figure 1.
1) We design a supervised framework for enhancing disentanglement in language representations by conditioning on the information provided by semantic role labels (SRL) in natural language definitions. We present two mechanisms for injecting SRL biases into latent variables: first, reconstructing both words and their corresponding SRLs in a VAE; second, employing SRL information as input variables to a Conditional VAE (Zhao et al., 2017) (a rough sketch of both mechanisms follows this list).
2) We propose a framework for evaluating the disentanglement properties of the encodings on non-synthetic textual datasets. Our evaluation framework employs semantic role label groupings as generative factors, enabling the measurement of several contemporary quantitative metrics. The results show that the proposed bias injection mechanisms are able to increase the degree of disentanglement (separability) of the representations.
3) We demonstrate that models trained with our disentanglement framework outperform contemporary baselines on the downstream task of definition modeling (Noraset et al., 2017).
2 Disentangling framework
In this section, we first describe the framework designed for improving disentanglement in natural language definitions with semantic role labels. We then present three models, shown in Figure 2, based on the Variational Autoencoder (VAE) architecture (Bowman et al., 2016) for achieving disentanglement.
2.1 Disentangling definitions
Definition semantic roles
Our framework is
based on natural language definitions, which are a particular type of linguistic expression characterised by high abstraction and specific phrasal properties. Previous work in NLP for dictionary definitions (Silva et al., 2018) has shown that there are categories that can be consistently found in most definitions. In fact, Silva et al. (2018) define precise Semantic Role Labels (SRL) for phrases representing definitions, under the name of Definition Semantic Roles (DSR).
The example from Silva et al. (2018) classifies the semantic roles within "english poets who lived in the lake district" as follows: "poets" is the noun category (Supertype), "english" is a quality of the term (Differentia Quality), "who lived" is an event the subject is involved in (Differentia Event), and "in the lake district" is the location of the event (Event Location). The full set of DSRs proposed by Silva et al. (2018) is reported in Table 9 in Appendix A.
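For illustration, such an annotation can be viewed as a sequence of (span, role) pairs; the sketch below simply mirrors the example above and is a hypothetical data structure, not the actual annotation format of the dataset:

```python
# Hypothetical DSR annotation for the example definition; role names
# follow Silva et al. (2018), the data structure itself is illustrative.
dsr_annotation = [
    ("english", "DIFFERENTIA_QUALITY"),          # quality of the term
    ("poets", "SUPERTYPE"),                      # noun category
    ("who lived", "DIFFERENTIA_EVENT"),          # event involving the subject
    ("in the lake district", "EVENT_LOCATION"),  # location of the event
]
```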
Disentangling using SRL
Our goal is to enhance disentanglement in natural language representations by injecting categorical structure into latent variables. This goal is well aligned with the findings of Locatello et al. (2019), who claim that a higher degree of disentanglement may benefit from supervision and inductive biases. Our hypothesis is that we can leverage such semantic information to learn representations with a higher degree of disentanglement. While in this work we use dictionary definitions as the target empirical setting, we conjecture that these conclusions extend to broader definitional sentence types. The core intuition behind the approach is that, given the network architecture formulation, the supervision signal should increase the likelihood that points cluster in regions of the latent space corresponding, or related, to the discrete supervision labels.
2.2 Definition VAEs
Unsupervised VAE
The first training framework that we consider is the traditional variational autoencoder (VAE) for sentences (Bowman et al., 2016), which operates in an unsupervised fashion, as in Figure 2a. The unsupervised VAE employs a multivariate Gaussian prior distribution $p(z)$ and generates a sentence $x$ with a decoder network $p_\theta(x|z)$. The joint distribution for the decoder is defined as $p(z)\,p_\theta(x|z)$, which, for a sequence of tokens $x$ of length $T$, results in $p_\theta(x|z) = \prod_{i=1}^{T} p_\theta(x_i \mid x_{<i}, z)$. The VAE objective consists in maximizing the expected log-likelihood $\mathbb{E}_{p(x)}[\log p_\theta(x)]$. Since this expectation is computationally intractable, a variational distribution $q_\phi(z|x)$ is employed to approximate the true posterior $p_\theta(z|x)$.
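For concreteness, below is a minimal sketch of this formulation in PyTorch; the GRU modules, dimensions, and the way $z$ initialises the decoder are illustrative assumptions, not the exact architecture used in this paper:

```python
import torch
import torch.nn as nn

class SentenceVAE(nn.Module):
    """Minimal sketch of an unsupervised sentence VAE (Bowman et al., 2016)."""

    def __init__(self, vocab_size, emb_dim=256, hid_dim=512, latent_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.to_mu = nn.Linear(hid_dim, latent_dim)      # mean of q_phi(z|x)
        self.to_logvar = nn.Linear(hid_dim, latent_dim)  # log-var of q_phi(z|x)
        self.z_to_h = nn.Linear(latent_dim, hid_dim)     # map z to decoder state
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, x):
        # q_phi(z|x): encode the token ids x into a Gaussian posterior.
        _, h = self.encoder(self.embed(x))
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        # Reparameterisation: z = mu + sigma * eps, with eps ~ N(0, I).
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        # p_theta(x|z) = prod_i p_theta(x_i | x_{<i}, z): teacher-forced
        # autoregressive decoding, with z initialising the hidden state.
        h0 = torch.tanh(self.z_to_h(z)).unsqueeze(0)
        dec_out, _ = self.decoder(self.embed(x[:, :-1]), h0)
        return self.out(dec_out), mu, logvar  # next-token logits + posterior
```

The logits parameterise each factor $p_\theta(x_i \mid x_{<i}, z)$, while mu and logvar define the approximate posterior whose divergence from the prior appears in the bound introduced next.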
As a result, an evidence lower bound $\mathcal{L}_{\mathrm{VAE}}$