
Preprint
FROM POINTS TO FUNCTIONS:
INFINITE-DIMENSIONAL REPRESENTATIONS IN
DIFFUSION MODELS
Sarthak Mittal†1,2, Guillaume Lajoie1,2, Stefan Bauer5, Arash Mehrjou3,4
1Mila, 2Université de Montréal, 3MPI-IS, 4ETH Zurich, 5KTH Stockholm
ABSTRACT
Diffusion-based generative models learn to iteratively transfer unstructured noise
to a complex target distribution as opposed to Generative Adversarial Networks
(GANs) or the decoder of Variational Autoencoders (VAEs) which produce sam-
ples from the target distribution in a single step. Thus, in diffusion models ev-
ery sample is naturally connected to a random trajectory which is a solution to a
learned stochastic differential equation (SDE). Generative models are only con-
cerned with the final state of this trajectory that delivers samples from the desired
distribution. Abstreiter et al. (2021) showed that these stochastic trajectories can
be seen as continuous filters that wash out information along the way. Conse-
quently, it is reasonable to ask if there is an intermediate time step at which the
preserved information is optimal for a given downstream task. In this work, we
show that a combination of information content from different time steps gives a
strictly better representation for the downstream task. We introduce attention-
and recurrence-based modules that “learn to mix” the information content of various
time steps such that the resulting representation leads to superior performance on
downstream tasks.1
1 INTRODUCTION
Much of the progress in machine learning hinges on learning good representations of the data,
whether in a supervised or unsupervised fashion. In the absence of label information, learning a
good representation is often guided by reconstruction of the input, as is the case with autoen-
coders and generative models such as variational autoencoders (Vincent et al., 2010; Kingma & Welling,
2013; Rezende et al., 2014), or by some notion of invariance to certain transformations, as in con-
trastive learning and similar approaches (Chen et al., 2020b;d; Grill et al., 2020). In this work, we
analyze a novel approach to representation learning, introduced in Abstreiter et al. (2021), which
uses a denoising objective in diffusion-based models to obtain unbounded representations.
Diffusion-based models (Sohl-Dickstein et al., 2015; Song et al., 2020; 2021; Sajjadi et al., 2018;
Niu et al., 2020; Cai et al., 2020; Chen et al., 2020a; Saremi et al., 2018; Dhariwal & Nichol,
2021; Luhman & Luhman, 2021; Ho et al., 2021; Mehrjou et al., 2017) are generative models that
leverage step-wise perturbations of samples from the data distribution (e.g., CIFAR-10), modeled
via a Stochastic Differential Equation (SDE), until convergence to an unstructured distribution (e.g.,
N(0, I)) called, in this context, the prior distribution. In contrast to this diffusion process, a “score
model” is learned to approximate the reverse process, which iteratively converges to the data distribu-
tion starting from the prior distribution. Beyond the generative modelling capacity of score-based
models, we instead use the additionally encoded representations to perform inference tasks, such as
classification.
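The forward perturbation described above admits a simple closed-form sketch in the discrete variance-preserving setting. The snippet below is illustrative only: the function name, the linear β schedule, and the toy data are our assumptions, not the paper's exact setup.

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) for a discrete variance-preserving diffusion.

    Uses the closed form x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps,
    with alpha_bar_t = prod_{s<=t} (1 - beta_s); larger t washes out more of x_0.
    """
    alpha_bar = np.cumprod(1.0 - betas)[t]
    eps = rng.standard_normal(np.shape(x0))
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)        # illustrative linear noise schedule
x0 = 3.0 * rng.standard_normal(2000) + 2.0   # toy "data", far from N(0, 1)

x_early = forward_diffuse(x0, 10, betas, rng)   # still dominated by the data
x_late = forward_diffuse(x0, 999, betas, rng)   # essentially the N(0, I) prior
```

A score model would then be trained to predict the noise (equivalently, the score) from (x_t, t); running the learned reverse-time process from the prior yields samples from the data distribution.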
In this work, we revisit the formulation of Abstreiter et al. (2021); Preechakul et al. (2022),
which augments such diffusion-based systems with an encoder that learns representations
usable for downstream tasks. In particular, we look at the infinite-dimensional
representation learning methodology of Abstreiter et al. (2021) and perform a deeper dive into
†Corresponding author: sarthmit@gmail.com
1Open-sourced implementation is available at https://github.com/sarthmit/traj_drl
arXiv:2210.13774v1 [cs.LG] 25 Oct 2022