Efficient Data Mosaicing with Simulation-based Inference
Andrew Gambardella1, Youngjun Choi1, Doyo Choi1 and Jinjoon Lee1
1Graduate School of Culture Technology, KAIST
{atgambardella, youngjun.choi, doyochoi, jinjoon.lee}@kaist.ac.kr
Abstract
We introduce an efficient algorithm for general data
mosaicing, based on the simulation-based inference
paradigm. Our algorithm takes as input a target datum,
source data, and partitions of the target and source data
into fragments. It learns distributions over averages of
fragments of the source data such that samples from those
distributions approximate fragments of the target datum.
We pair a model that can be trivially parallelized with
the latest advances in efficient simulation-based
inference in order to find approximate posteriors fast
enough for use in practical applications. We demonstrate
that our technique is effective in both audio and image
mosaicing problems.
1 Introduction
Among post-structuralist texts, the 1980 book “A Thousand
Plateaus: Capitalism and Schizophrenia” by Deleuze and
Guattari stands out as a seminal experimental work of phi-
losophy, dealing with a wide range of topics originating from
the natural world. On the topic of language, the authors state
the following:
...relatively few linguists have analyzed the neces-
sarily social character of enunciation...The social
character of enunciation is intrinsically founded
only if one succeeds in demonstrating how enun-
ciation in itself implies collective assemblages. It
then becomes clear that the statement is individu-
ated, and enunciation subjectified, only to the ex-
tent that an impersonal collective assemblage re-
quires it and determines it to be so...every state-
ment of a collective assemblage of enunciation be-
longs to indirect discourse...Direct discourse is a
detached fragment of a mass and is born of the
dismemberment of the collective assemblage; but
the collective assemblage is always like the mur-
mur from which I take my proper name, the con-
stellation of voices, concordant or not, from which
I draw my voice [Deleuze and Guattari, 1980].
Whereas philosophers such as Deleuze describe the world
in natural language, artists evoke the world itself through me-
dia. The most natural tool with which a contemporary me-
dia artist could evoke the ideas espoused in this quote would
be a mosaic, directly constructing a collective assemblage of
“voices” which materialize into a so-called novel “subjec-
tified enunciation” directly from the aggregate of detached
fragments of “direct discourse.” In more common parlance,
this means finding, from a set of source data, fragments of
the data which could be rearranged and overlapped so as to
approximate an entirely different target datum. Such a
mosaic would serve as a metaphor for how linguistic meaning
is created in a social manner: every enunciation of
natural language that can be understood must be inherited
from other enunciations which others had heard and
understood previously. This idea has resonated throughout
Western philosophy for centuries, notably having been used
to humorous effect by Humpty Dumpty in Lewis Carroll’s
1871 novel “Through the Looking-Glass” [Carroll, 1871] and
expanded upon in Wittgenstein’s “Philosophical
Investigations” [Wittgenstein, 1953] and Davidson’s “A Nice
Derangement of Epitaphs” [Davidson, 1986].
The algorithmic art community has developed a number of
tools and approaches to mosaicing in many different
modalities. As a further contribution to this field, in this
paper we propose a generalized and data-agnostic approach
that treats data mosaicing as a Bayesian inference problem.
Recent advances in probabilistic programming
[van de Meent et al., 2018] and simulation-based inference
[Cranmer et al., 2020] allow statisticians to write
a stochastic model for mosaicing naturally in a programming
language, then use inference techniques to condition on a
target datum and discover a posterior over traces (i.e., runs
of the model) such that, in the aggregate, samples from the
posterior produce output which closely approximates the
target datum. This process requires only the model
specification and the verification of inference results, a
simple task relative to the prohibitively daunting work of
creating a mosaic out of many overlapping components in an
interactive computer-aided setup, as is commonly done.
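To make the idea of conditioning a stochastic mosaicing model on a target concrete, the following is a minimal sketch for a single target fragment. It is an illustration under stated assumptions, not the implementation used in this paper: the function names (simulate, mosaic_fragment), the parameters (k, n_sims, keep), the uniform prior over source fragments, and the Euclidean distance are all hypothetical choices, and a plain rejection-style scheme (keep the simulated traces closest to the target) is swapped in for the efficient simulation-based inference techniques referred to above.

```python
import numpy as np

def simulate(source_frags, k, rng):
    """One run (trace) of the assumed model: choose k distinct source
    fragments uniformly at random and return their simple average."""
    idx = rng.choice(len(source_frags), size=k, replace=False)
    return idx, source_frags[idx].mean(axis=0)

def mosaic_fragment(target_frag, source_frags, k=2, n_sims=2000, keep=20, seed=0):
    """Rejection-style approximation (an assumption, not the paper's method):
    simulate many traces and keep the `keep` traces whose output is closest
    to the target fragment in Euclidean distance."""
    rng = np.random.default_rng(seed)
    traces = []
    for _ in range(n_sims):
        idx, out = simulate(source_frags, k, rng)
        dist = float(np.linalg.norm(out - target_frag))
        traces.append((dist, tuple(sorted(idx.tolist()))))
    traces.sort(key=lambda t: t[0])  # smallest distance first
    return traces[:keep]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    source = rng.normal(size=(10, 16))      # 10 hypothetical source fragments
    target = source[[3, 7]].mean(axis=0)    # target built from fragments 3 and 7
    for dist, idx in mosaic_fragment(target, source, keep=3):
        print(idx, round(dist, 4))
```

Because each target fragment is handled independently in this sketch, calls to mosaic_fragment for different fragments could be run in parallel; the retained traces play the role of approximate posterior samples over which source fragments to average.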
The main contributions of this work are as follows:
• We introduce a stochastic model for data generation
through mosaicing via simple averaging.
• We show that this model can be implemented in a prob-
abilistic programming language and effectively condi-
arXiv:2210.14602v2 [cs.SD] 1 Feb 2023