Efficient Data Mosaicing with Simulation-based Inference

Andrew Gambardella¹, Youngjun Choi¹, Doyo Choi¹ and Jinjoon Lee¹

¹Graduate School of Culture Technology, KAIST

{atgambardella, youngjun.choi, doyochoi, jinjoon.lee}@kaist.ac.kr

Abstract

We introduce an efficient algorithm for general data mosaicing, based on the simulation-based inference paradigm. Our algorithm takes as input a target datum, source data, and partitions of the target and source data into fragments, learning distributions over averages of fragments of the source data such that samples from those distributions approximate fragments of the target datum. We utilize a model that can be trivially parallelized in conjunction with the latest advances in efficient simulation-based inference in order to find approximate posteriors fast enough for use in practical applications. We demonstrate our technique is effective in both audio and image mosaicing problems.

1 Introduction

Among post-structuralist texts, the 1980 book “A Thousand Plateaus: Capitalism and Schizophrenia” by Deleuze and Guattari stands out as a seminal experimental work of philosophy, dealing with a wide range of topics originating from the natural world. On the topic of language, the authors state the following:

...relatively few linguists have analyzed the necessarily social character of enunciation...The social character of enunciation is intrinsically founded only if one succeeds in demonstrating how enunciation in itself implies collective assemblages. It then becomes clear that the statement is individuated, and enunciation subjectified, only to the extent that an impersonal collective assemblage requires it and determines it to be so...every statement of a collective assemblage of enunciation belongs to indirect discourse...Direct discourse is a detached fragment of a mass and is born of the dismemberment of the collective assemblage; but the collective assemblage is always like the murmur from which I take my proper name, the constellation of voices, concordant or not, from which I draw my voice [Deleuze and Guattari, 1980].

Whereas philosophers such as Deleuze describe the world in natural language, artists evoke the world itself through media. The most natural tool with which a contemporary media artist could evoke the ideas espoused in this quote would be a mosaic, directly constructing a collective assemblage of “voices” which materialize into a so-called novel “subjectified enunciation” directly from the aggregate of detached fragments of “direct discourse.” In more common parlance, this means finding, from a set of source data, fragments of the data which could be rearranged and overlapped so as to approximate an entirely different target datum. Such a mosaic would serve as a metaphor for how linguistic meaning is created in a social manner, and for the idea that all enunciations of natural language that can be understood must be inherited from other enunciations which others had heard and understood previously. This idea has resonated throughout Western philosophy for centuries, notably having been used to humorous effect by Humpty Dumpty in Lewis Carroll’s 1871 novel “Through the Looking-Glass” [Carroll, 1871] and expanded upon in Wittgenstein’s “Philosophical Investigations” [Wittgenstein, 1953] and Davidson’s “A Nice Derangement of Epitaphs” [Davidson, 1986].

The algorithmic art community has developed a number of tools and approaches to mosaicing in many different modalities. As a further contribution to this field, in this paper we propose a generalized and data-agnostic approach to mosaicing by treating data mosaicing as a Bayesian inference problem. Recent advances in probabilistic programming [van de Meent et al., 2018] and simulation-based inference [Cranmer et al., 2020] allow statisticians to write a stochastic model for mosaicing naturally in a programming language, using inference techniques to condition on a target datum and discover a posterior over traces (i.e., runs of the model) such that in the aggregate, samples from the posterior produce output which closely approximates the target datum. This process requires only the model specification and the verification of inference results, a simple task relative to the prohibitively daunting requirements imposed when creating a mosaic out of many overlapping components in an interactive computer-aided setup, as is commonly done.

The main contributions of this work are as follows:

• We introduce a stochastic model for data generation through mosaicing via simple averaging.

• We show that this model can be implemented in a probabilistic programming language and effectively conditioned on a real datum which did not originate from the model, allowing us to discover an interpretable and disentangled representation of arbitrary data as mixtures of other data in the form of a posterior distribution, from which we can sample a potentially limitless collection of mosaics for any given target datum.

• We show how recent advances in simulation-based inference allow for one to perform inference efficiently, creating mosaics many thousands of times faster than a naive baseline approach.

• We demonstrate that one singular mosaicing model can be applied to multiple modalities in a data-agnostic manner, showing numerous experiments in both audio and image mosaicing.

• We demonstrate that our model produces qualitatively good results even in extremely low compute regimes.

arXiv:2210.14602v2 [cs.SD] 1 Feb 2023

Figure 1: JFK-MM by Adam Finkelstein and Sandy Farrier. A photographic mosaic of John F. Kennedy made from parts of Marilyn Monroe pictures which was exhibited in the Xerox PARC Algorithmic Art Show in 1994. Our method uses simulation-based inference in order to create similar mosaics using arbitrary data.
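To make the first contribution concrete, the following is a minimal sketch of what a forward run of an averaging-based mosaicing model could look like. It is an illustration under stated assumptions rather than the model used in this paper: the function names, the choice of drawing k fragments with replacement, the uniform weights, and the toy two-element fragments are all hypothetical.

```python
import random


def average_fragments(fragments):
    """Element-wise mean of equally sized fragments (lists of floats)."""
    count = len(fragments)
    return [sum(values) / count for values in zip(*fragments)]


def simulate_fragment(source_fragments, weights, k, rng):
    """One forward run (one trace): draw k source fragments and average them.

    The latent variables of the trace are the chosen indices; the
    observed variable is the averaged output.
    """
    chosen = rng.choices(range(len(source_fragments)), weights=weights, k=k)
    output = average_fragments([source_fragments[i] for i in chosen])
    return chosen, output


def squared_error(a, b):
    """Sum of squared differences between two fragments."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical toy data: four source fragments and one target fragment.
rng = random.Random(0)
source_fragments = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
target_fragment = [0.5, 0.5]
uniform_weights = [1.0] * len(source_fragments)

chosen, output = simulate_fragment(source_fragments, uniform_weights, k=2, rng=rng)
error = squared_error(output, target_fragment)
print("chosen indices:", chosen, "output:", output, "error:", error)
```

Because each target fragment is simulated independently of the others in this sketch, many such runs could be executed in parallel, which is the property the abstract refers to as trivially parallelizable.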

2 Preliminaries

2.1 Simulation-based Inference

Simulation-based inference refers to a collection of techniques used to perform Bayesian inference over stochastic latent variables in a computer program. Typically these programs are written using a probabilistic programming language which allows for this inference procedure to be done automatically, using approximate inference algorithms which are specially tuned to work over runs of computer programs. Each run of a computer program, used to form empirical distributions over its random variables, is referred to as a trace in the probabilistic programming literature.

Simulation-based inference techniques are rooted in the theory of Approximate Bayesian Computation (ABC) [Marjoram et al., 2003; Wilkinson, 2013], in which the likelihood is typically not computed directly. In ABC, multiple traces of the simulator are aggregated into an empirical posterior, with traces being accepted or given high weight when their observed variables align closely with a target observation (where “closeness” is defined via a tolerance hyperparameter), and rejected or given low weight otherwise. Several different approximate inference algorithms can be used under this framework, but the present work examines the use of Markov chain Monte Carlo (MCMC) [Wingate et al., 2011; van de Meent et al., 2018] due to its verifiable convergence guarantees.
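The accept-or-reject description of ABC above can be illustrated with a small rejection sampler. This is a sketch of plain ABC rejection sampling, not the MCMC procedure examined in this work; the simulator, the maximum-absolute-difference distance, the tolerance value, and the toy data are assumptions chosen for illustration.

```python
import random


def simulate(source_fragments, k, rng):
    """One trace: pick k source fragments uniformly and average them."""
    chosen = [rng.randrange(len(source_fragments)) for _ in range(k)]
    picked = [source_fragments[i] for i in chosen]
    output = [sum(values) / k for values in zip(*picked)]
    return chosen, output


def distance(a, b):
    """Largest absolute element-wise difference between two fragments."""
    return max(abs(x - y) for x, y in zip(a, b))


def abc_rejection(source_fragments, target, k, tolerance, num_traces, rng):
    """Keep the latent choices of every trace whose output is within
    `tolerance` of the target; the kept traces form an empirical posterior."""
    accepted = []
    for _ in range(num_traces):
        chosen, output = simulate(source_fragments, k, rng)
        if distance(output, target) <= tolerance:
            accepted.append(chosen)
    return accepted


# Hypothetical toy data.
rng = random.Random(1)
source_fragments = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
target = [0.5, 0.5]

posterior_traces = abc_rejection(
    source_fragments, target, k=2, tolerance=0.05, num_traces=2000, rng=rng
)
print("accepted", len(posterior_traces), "of 2000 traces")
```

In this toy setting only the pairs {0, 1} and {2, 3} average to the target, so every accepted trace contains one of those two pairs. Shrinking the tolerance makes the empirical posterior more faithful but lowers the acceptance rate, which is one reason more efficient inference algorithms are of interest.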

2.2 Markov Chain Monte Carlo (MCMC)

Markov chain Monte Carlo (MCMC) is a class of algorithms that allow for sampling from a target probability distribution which cannot be determined analytically. These algorithms all involve the construction of a Markov chain which has the target probability distribution as its equilibrium distribution, but each algorithm differs in its construction of this chain. The most basic of these algorithms, which is commonly implemented in probabilistic programming libraries, is known as random walk Metropolis-Hastings [Metropolis et al., 1953; Hastings, 1970; Wingate et al., 2011], which involves finding new model parameters at each step via a random walk, and accepting or rejecting these new parameters based on how closely the model outputs match the observation relative to the previous model parameters in the chain. A much more effective, but much harder to implement, variation of Metropolis-Hastings can be found in Hamiltonian Monte Carlo (HMC) [Duane et al., 1987; Neal, 1996], which proposes new moves in state space with an approximate Hamiltonian dynamics simulator. The advantage of this approach over random walk Metropolis-Hastings is that successive moves in state space using HMC are much less correlated with previous states than they would be using random walk Metropolis-Hastings. This in turn leads to a drastic reduction in the number of forward runs of the model that are needed both during the warmup stage (i.e., before the chain has converged) and when collecting posterior samples, ultimately leading to convergence many orders of magnitude faster in terms of wall-clock time.
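The random walk Metropolis-Hastings procedure described above can be sketched in a few lines. The example below is illustrative only: it infers a single hypothetical blending weight w between two source fragments, and the standard normal prior, the Gaussian likelihood with an assumed noise scale, the step size, and the warmup length are all assumptions rather than settings taken from this paper. It does not implement HMC.

```python
import math
import random


def log_posterior(w, frag_a, frag_b, target, noise_scale):
    """Unnormalized log posterior of the blend weight w.

    Assumes a standard normal prior on w and a Gaussian likelihood that
    compares the blended output w*a + (1-w)*b with the target.
    """
    log_prior = -0.5 * w * w
    squared_error = sum(
        (w * a + (1.0 - w) * b - t) ** 2 for a, b, t in zip(frag_a, frag_b, target)
    )
    return log_prior - 0.5 * squared_error / noise_scale ** 2


def random_walk_mh(log_post, initial, step_size, num_steps, rng):
    """Random walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    current = initial
    current_lp = log_post(current)
    samples = []
    for _ in range(num_steps):
        proposal = current + rng.gauss(0.0, step_size)
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, exp(proposal_lp - current_lp)).
        # 1 - rng.random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


# Hypothetical toy data: the target is a 0.7 / 0.3 blend of the two fragments.
rng = random.Random(2)
frag_a = [1.0, 0.0, 1.0]
frag_b = [0.0, 1.0, 0.0]
target = [0.7, 0.3, 0.7]

samples = random_walk_mh(
    lambda w: log_posterior(w, frag_a, frag_b, target, noise_scale=0.1),
    initial=0.0,
    step_size=0.1,
    num_steps=5000,
    rng=rng,
)
warmup = 1000  # discard early samples drawn before the chain has settled
posterior_mean = sum(samples[warmup:]) / len(samples[warmup:])
print("posterior mean of w:", posterior_mean)
```

Each step requires one forward evaluation of the model, and consecutive samples are strongly correlated because every proposal is a small perturbation of the current state. This is the inefficiency that HMC is described above as reducing.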

3 Prior Work

Data mosaicing, particularly photographic mosaicing, has been one of the mainstays of algorithmic art for decades. The first algorithmic photographic mosaic exhibited as art was most likely JFK-MM by Adam Finkelstein and Sandy Farrier, reproduced in Figure 1. These early methods, including extensions to the video domain, relied heavily on color
