Efficient Data Mosaicing with Simulation-based Inference
Andrew Gambardella1, Youngjun Choi1, Doyo Choi1 and Jinjoon Lee1
1Graduate School of Culture Technology, KAIST
{atgambardella, youngjun.choi, doyochoi, jinjoon.lee}@kaist.ac.kr
Abstract
We introduce an efficient algorithm for general data
mosaicing, based on the simulation-based infer-
ence paradigm. Our algorithm takes as input a tar-
get datum, source data, and partitions of the target
and source data into fragments, learning distribu-
tions over averages of fragments of the source data
such that samples from those distributions approx-
imate fragments of the target datum. We utilize a
model that can be trivially parallelized in conjunc-
tion with the latest advances in efficient simulation-
based inference in order to find approximate poste-
riors fast enough for use in practical applications.
We demonstrate our technique is effective in both
audio and image mosaicing problems.
1 Introduction
Among post-structuralist texts, the 1980 book “A Thousand
Plateaus: Capitalism and Schizophrenia” by Deleuze and
Guattari stands out as a seminal experimental work of phi-
losophy, dealing with a wide range of topics originating from
the natural world. On the topic of language, the authors state
the following:
...relatively few linguists have analyzed the neces-
sarily social character of enunciation...The social
character of enunciation is intrinsically founded
only if one succeeds in demonstrating how enun-
ciation in itself implies collective assemblages. It
then becomes clear that the statement is individu-
ated, and enunciation subjectified, only to the ex-
tent that an impersonal collective assemblage re-
quires it and determines it to be so...every state-
ment of a collective assemblage of enunciation be-
longs to indirect discourse...Direct discourse is a
detached fragment of a mass and is born of the
dismemberment of the collective assemblage; but
the collective assemblage is always like the mur-
mur from which I take my proper name, the con-
stellation of voices, concordant or not, from which
I draw my voice [Deleuze and Guattari, 1980].
Whereas philosophers such as Deleuze describe the world
in natural language, artists evoke the world itself through me-
dia. The most natural tool with which a contemporary me-
dia artist could evoke the ideas espoused in this quote would
be a mosaic, directly constructing a collective assemblage of
“voices” which materialize into a so-called novel “subjec-
tified enunciation” directly from the aggregate of detached
fragments of “direct discourse.” In more common parlance,
this means finding, from a set of source data, fragments of
the data which could be rearranged and overlapped so as to
approximate an entirely different target datum. Such a mosaic
would serve as a metaphor for how linguistic meaning
is created in a social manner: every enunciation of
natural language that can be understood must be inherited
from other enunciations which others had heard and understood
previously. This idea has resonated throughout
Western philosophy for centuries, notably having been used
to humorous effect by Humpty Dumpty in Lewis Carroll’s
1871 novel “Through the Looking-Glass” [Carroll, 1871] and
expanded upon in Wittgenstein’s “Philosophical Investigations”
[Wittgenstein, 1953] and Davidson’s “A Nice Derangement
of Epitaphs” [Davidson, 1986].
The algorithmic art community has developed a number of
tools and approaches to mosaicing in many different modalities.
As a further contribution to this field, in this paper
we propose a generalized and data-agnostic approach that
treats data mosaicing as a Bayesian inference problem.
Recent advances in probabilistic programming
[van de Meent et al., 2018] and simulation-based inference
[Cranmer et al., 2020] allow statisticians to write
a stochastic model for mosaicing naturally in a programming
language, using inference techniques to condition on a target
datum and discover a posterior over traces (i.e., runs of the
model) such that, in the aggregate, samples from the posterior
produce output which closely approximates the target datum.
This process requires only the model specification and the
verification of inference results, a simple task relative to the
prohibitively daunting requirements imposed when creating a
mosaic out of many overlapping components in an interactive
computer-aided setup, as is commonly done.
arXiv:2210.14602v2 [cs.SD] 1 Feb 2023
Figure 1: JFK-MM by Adam Finkelstein and Sandy Farrier. A
photographic mosaic of John F. Kennedy made from parts of Marilyn
Monroe pictures which was exhibited in the Xerox PARC
Algorithmic Art Show in 1994. Our method uses simulation-based inference
in order to create similar mosaics using arbitrary data.
The main contributions of this work are as follows:
• We introduce a stochastic model for data generation
through mosaicing via simple averaging.
• We show that this model can be implemented in a
probabilistic programming language and effectively
conditioned on a real datum which did not originate from the
model, allowing us to discover an interpretable and
disentangled representation of arbitrary data as mixtures of
other data in the form of a posterior distribution, from
which we can sample a potentially limitless collection
of mosaics for any given target datum.
• We show how recent advances in simulation-based
inference allow for one to perform inference efficiently,
creating mosaics many thousands of times faster than a
naive baseline approach.
• We demonstrate that one singular mosaicing model can
be applied to multiple modalities in a data-agnostic manner,
showing numerous experiments in both audio and
image mosaicing.
• We demonstrate that our model produces qualitatively
good results even in extremely low compute regimes.
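The averaging model named in the first contribution can be
illustrated with a minimal forward-simulation sketch. This is not
the authors' implementation: the 1-D data, the equal-length
partition, the uniform prior over source fragments, the number of
averaged fragments k, the mean-squared-error distance, and all
function names (split_into_fragments, sample_mosaic, distance) are
assumptions made here purely for illustration.

```python
import numpy as np

def split_into_fragments(datum, fragment_len):
    """Partition a 1-D array into equal-length fragments (remainder dropped)."""
    n = len(datum) // fragment_len
    return datum[: n * fragment_len].reshape(n, fragment_len)

def sample_mosaic(source_fragments, n_target_fragments, k, rng):
    """One forward run of the hypothetical model: for every target fragment,
    draw k source-fragment indices uniformly (an assumed prior) and average
    the selected source fragments."""
    idx = rng.integers(0, len(source_fragments), size=(n_target_fragments, k))
    # source_fragments[idx] has shape (n_target_fragments, k, fragment_len)
    mosaic = source_fragments[idx].mean(axis=1)
    return idx, mosaic

def distance(mosaic, target_fragments):
    """Per-fragment mean squared error (an assumed notion of closeness)."""
    return ((mosaic - target_fragments) ** 2).mean(axis=1)

rng = np.random.default_rng(0)
source = rng.normal(size=1000)                       # stand-in source data
target = np.sin(np.linspace(0.0, 8.0 * np.pi, 400))  # stand-in target datum

src_frags = split_into_fragments(source, 50)
tgt_frags = split_into_fragments(target, 50)
idx, mosaic = sample_mosaic(src_frags, len(tgt_frags), k=3, rng=rng)
errors = distance(mosaic, tgt_frags)
```

Because each target fragment is generated from its own set of
indices, the fragments in this sketch can be simulated and scored
independently of one another. Inference would then search for index
choices whose averages lie close to the target fragments.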
2 Preliminaries
2.1 Simulation-based Inference
Simulation-based inference refers to a collection of tech-
niques used to perform Bayesian inference over stochastic la-
tent variables in a computer program. Typically these pro-
grams are written using a probabilistic programming lan-
guage which allows for this inference procedure to be done
automatically, using approximate inference algorithms which
are specially tuned to work over runs of computer programs.
Each run of a computer program, used to form empirical dis-
tributions over its random variables, is referred to as a trace
in the probabilistic programming literature.
Simulation-based inference techniques are rooted in the
theory of Approximate Bayesian Computation (ABC) [Mar-
joram et al., 2003; Wilkinson, 2013], in which the likeli-
hood is typically not computed directly. In ABC, multiple
traces of the simulator are aggregated into an empirical pos-
terior, with traces being accepted or given high weight when
their observed variables align closely with a target observa-
tion (where “closeness” is defined via a tolerance hyperpa-
rameter), and rejected or given low weight otherwise. Several
different approximate inference algorithms can be used un-
der this framework, but the present work examines the use of
Markov chain Monte Carlo (MCMC) [Wingate et al., 2011;
van de Meent et al., 2018] due to its verifiable convergence
guarantees.
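The accept/reject idea behind ABC can be made concrete with a small
rejection sampler. The sketch below is a toy example, not the model
used in this paper: the Gaussian simulator, the uniform prior on
[-5, 5], the use of the sample mean as the observed variable, and
the names simulator and abc_rejection are all assumptions chosen
for illustration.

```python
import numpy as np

def simulator(theta, rng, n=20):
    """Toy stochastic simulator: n draws from Normal(theta, 1)."""
    return rng.normal(loc=theta, scale=1.0, size=n)

def abc_rejection(observed_mean, n_traces, tolerance, rng):
    """Keep the latent value of every trace whose simulated sample mean
    falls within `tolerance` of the observed one; the likelihood is
    never evaluated."""
    accepted = []
    for _ in range(n_traces):
        theta = rng.uniform(-5.0, 5.0)       # latent drawn from the assumed prior
        simulated = simulator(theta, rng)    # one run of the simulator (a trace)
        if abs(simulated.mean() - observed_mean) < tolerance:
            accepted.append(theta)
    return np.array(accepted)

rng = np.random.default_rng(0)
posterior_samples = abc_rejection(
    observed_mean=2.0, n_traces=20000, tolerance=0.1, rng=rng
)
```

The accepted values form the empirical posterior. Shrinking the
tolerance tightens the approximation but causes more traces to be
rejected, which is one reason more sample-efficient schemes such as
MCMC are attractive.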
2.2 Markov Chain Monte Carlo (MCMC)
Markov chain Monte Carlo (MCMC) is a class of algorithms
that allow for sampling from a target probability distribu-
tion which cannot be determined analytically. These algo-
rithms all involve the construction of a Markov chain which
has the target probability distribution as its equilibrium dis-
tribution, but each algorithm differs in its construction of this
chain. The most basic of these algorithms, which is com-
monly implemented in probabilistic programming libraries,
is known as random walk Metropolis-Hastings [Metropolis
et al., 1953; Hastings, 1970; Wingate et al., 2011], which
involves finding new model parameters at each step via a
random walk, and accepting or rejecting these new param-
eters based on how closely the model outputs match the
observation relative to the previous model parameters in
the chain. A much more effective, but much harder to
implement, variation of Metropolis-Hastings can be found
in Hamiltonian Monte Carlo (HMC) [Duane et al., 1987;
Neal, 1996], which proposes new moves in state space with
an approximate Hamiltonian dynamics simulator. The advan-
tage of this approach over random walk Metropolis-Hastings
is that successive moves in state space using HMC are much
less correlated with previous states than they would be us-
ing random walk Metropolis-Hastings. This in turn leads to a
drastic reduction in the number of forward runs of the model
that are needed both during the warmup stage (i.e., before the
chain has converged) and when collecting posterior samples,
ultimately leading to convergence many orders of magnitude
faster in terms of wall-clock time.
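A minimal sketch of random walk Metropolis-Hastings follows. It
targets a toy one-dimensional density rather than the mosaicing
posterior: the Normal(3, 1) target, the step size, the warmup
length, and the names log_target and random_walk_metropolis are
assumptions made for illustration. In the setting described above,
the log-density would instead score how closely the model output
matches the observation.

```python
import numpy as np

def log_target(x):
    """Unnormalized log-density of a toy Normal(3, 1) target."""
    return -0.5 * (x - 3.0) ** 2

def random_walk_metropolis(log_density, x0, n_steps, step_size, rng):
    """Propose x' = x + step_size * noise and accept with probability
    min(1, p(x') / p(x)); the Gaussian proposal is symmetric, so no
    proposal-ratio correction is needed."""
    samples = np.empty(n_steps)
    x, logp = x0, log_density(x0)
    for i in range(n_steps):
        proposal = x + step_size * rng.normal()
        logp_prop = log_density(proposal)
        # 1 - u lies in (0, 1], so the logarithm is always finite
        if np.log(1.0 - rng.uniform()) < logp_prop - logp:
            x, logp = proposal, logp_prop
        samples[i] = x  # on rejection the previous state is repeated
    return samples

rng = np.random.default_rng(0)
chain = random_walk_metropolis(
    log_target, x0=0.0, n_steps=20000, step_size=1.0, rng=rng
)
posterior = chain[2000:]  # discard the warmup portion of the chain
```

Successive states in such a chain are strongly correlated because
each proposal is a small perturbation of the current state. HMC
instead uses the gradient of the log-density to propose distant
states, which is the source of the reduced correlation described
above.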
3 Prior Work
Data mosaicing, particularly photographic mosaicing, has
been one of the mainstays of algorithmic art for decades. The
first algorithmic photographic mosaic exhibited as art was
most likely JFK-MM by Adam Finkelstein and Sandy Far-
rier, reproduced in Figure 1. These early methods, includ-
ing extensions to the video domain, relied heavily on color