
Dynamic Latent Separation for Deep Learning
Yi-Lin Tuan 1, Zih-Yun Chiu 2, William Yang Wang 1
Abstract
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data that involves multiple sub-components in a flexible and interpretable fashion. Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications. The key idea is to dynamically distance data samples in the latent space and thus enhance the output diversity. Our dynamic latent separation method, inspired by atomic physics, relies on the jointly learned structures of each data sample, which also reveal the importance of each sub-component for distinguishing data samples. This approach, atom modeling, requires no supervision of the latent space and allows us to learn extra partially interpretable representations besides the original goal of a model. We empirically demonstrate that the algorithm also enhances the performance of small to larger-scale models in various classification and generation problems.
1. Introduction
Deep neural networks with multiple hidden layers are trained to be expressive models that learn complicated relationships between their inputs and outputs (Srivastava et al., 2014). Among various data types, data samples that consist of many sub-units, such as images and texts, can require models to be more expressive in order to capture nuanced differences among those sub-units. This demand has led to the development of large-scale and complex model architectures (Vaswani et al., 2017), which cause drawbacks such as compromised model interpretability (Ribeiro et al., 2016; Bastani et al., 2017; Rudin, 2019; Jain & Wallace, 2019).
Various algorithms improve model expressiveness without advancing the model architecture itself. For instance,
1 Department of Computer Science, University of California Santa Barbara. 2 Department of Electrical and Computer Engineering, University of California San Diego. Correspondence to: Yi-Lin Tuan <ytuan@cs.ucsb.edu>.
Work in progress.
contrastive learning improves classification expressiveness (Dosovitskiy et al., 2014; Chen et al., 2020) by pushing apart latent features from different classes. Vector quantization tackles the expressiveness of autoencoders (Van Den Oord et al., 2017) by learning discrete representations with a preset codebook. As a separate effort, post-hoc methods or models designed to follow the self-explaining protocol (Alvarez-Melis & Jaakkola, 2018) reveal some underlying reasons for model behaviors. While these methods show promising results in their bundled applications, their transferability and usefulness to other applications are not yet certain. Meanwhile, generalizable training algorithms that simultaneously improve expressiveness and uncover partial explanations remain underexplored.
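To make the contrastive push-apart idea concrete, the following is a toy sketch (not the loss used by any of the cited works): a hinge penalty that repels features of differently labeled samples whenever they fall within a margin. The function name and margin parameter are illustrative assumptions.

```python
import numpy as np

def pairwise_class_repulsion(features, labels, margin=1.0):
    """Toy contrastive-style penalty: pairs of features with different
    labels that are closer than `margin` incur a hinge cost, so
    minimizing the penalty pushes the classes apart in latent space."""
    loss, n = 0.0, len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] != labels[j]:
                d = np.linalg.norm(features[i] - features[j])
                loss += max(0.0, margin - d)
    return loss
```

For example, two coincident features with different labels contribute the full margin, while a differently labeled pair farther than the margin contributes nothing.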
We present a novel algorithm that simultaneously improves model expressiveness, provides an interpretation of sub-component importance, and generalizes to multiple applications. Our method, atom modeling, first maps the latent representation of each sub-component in a data sample to a learnable token importance and then dynamically distances data samples based on token importance, using a loss function inspired by the Coulomb force (Coulomb, 1785). After training, the token importance reveals which sub-components in a data sample contribute to its semantic meaning and are key to distinguishing it from other data samples. The dynamic separation between data samples encourages a model to predict diverse outputs, thus boosting expressiveness.
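The distancing step can be sketched as follows. This is a minimal, hypothetical reading of a Coulomb-inspired repulsion, since the exact loss is not reproduced in this excerpt: each sample's "charge" stands in for its aggregated token importance, and every pair of samples incurs a penalty that grows as their latent distance shrinks, echoing Coulomb's law q_i q_j / r.

```python
import numpy as np

def coulomb_repulsion_loss(latents, charges, eps=1e-8):
    """Hypothetical Coulomb-inspired repulsion loss.

    latents: (n, d) array of per-sample latent vectors (e.g. pooled
             sub-component representations).
    charges: (n,) array of per-sample "charge", standing in for the
             aggregated learned token-importance of each sample.
    Returns the sum over pairs of q_i * q_j / ||z_i - z_j||; minimizing
    it pushes high-importance samples apart in the latent space.
    """
    n = latents.shape[0]
    loss = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dist = np.linalg.norm(latents[i] - latents[j]) + eps
            loss += charges[i] * charges[j] / dist
    return loss
```

Under this sketch, nearby samples with large charges dominate the loss, so the gradient separates exactly the pairs whose important sub-components make them most confusable.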
This method can be viewed as connecting sub-component importance and inter-sample relationships to elevate the impact of local details. A similar observation can be found in atomic physics, where the equilibrium distance between atoms, the fundamental particles that form all matter in nature, depends on the structure of the sub-atomic particles in each atom (Brown, 2009; Halliday et al., 2013). In addition, applying atom modeling in a neural network amounts to regularizing the representation space to preserve each data sample's uniqueness. Finally, atom modeling promotes expressiveness using a loss function with no latent supervision, enabling it to be flexibly applied to different applications.
We demonstrate the utility of the atom modeling objective functions by training or finetuning convolutional neural networks, generative adversarial networks, and transformers on Gaussian mixtures, natural texts (CoLA, Poem), and nat-
arXiv:2210.03728v3 [cs.LG] 11 Feb 2024