A Distributed Block-Split Gibbs Sampler with Hypergraph Structure for High-Dimensional Inverse Problems

P.-A. Thouvenin, A. Repetti, P. Chainais*
Université de Lille, CNRS, Centrale Lille, UMR 9189 CRIStAL, F-59000 Lille, France
School of Mathematics and Computer Sciences, Heriot-Watt University, Edinburgh, UK
School of Engineering and Physical Sciences, Heriot-Watt University, Edinburgh, UK
{pierre-antoine.thouvenin, pierre.chainais}@centralelille.fr, a.repetti@hw.ac.uk
Abstract
Sampling-based algorithms are classical approaches to perform Bayesian inference in inverse problems. They provide estimators along with credibility intervals that quantify their uncertainty. Although these methods scale poorly to high dimensional problems, they have recently been paired with optimization techniques, such as proximal and splitting approaches, to address this issue. Such approaches pave the way to distributed samplers,
splitting computations to make inference more scalable and faster. We introduce a distributed split Gibbs sampler (SGS) to efficiently solve such problems, involving distributions with multiple smooth and non-smooth functions composed with linear operators. The proposed approach leverages a recent approximate augmentation technique reminiscent of primal-dual optimization methods. It is further combined with a block-coordinate approach to split the primal and dual variables into blocks, leading to a distributed block-coordinate SGS. The resulting algorithm exploits the hypergraph structure of the involved linear operators to efficiently distribute the variables over multiple workers under controlled communication costs. It accommodates several distributed architectures, such as the Single Program Multiple Data and client-server architectures. Experiments on a large image deblurring problem demonstrate the ability of the proposed approach to produce high quality estimates with credibility intervals in a small amount of time. Supplementary material to reproduce the experiments is available online.
Keywords: MCMC algorithm, Bayesian inference, block-coordinate algorithm, distributed architecture, high dimensional imaging inverse problems
*PC and PAT acknowledge support from the ANR project “Chaire IA Sherlock” ANR-20-CHIA-0031-01, the programme d’investissements d’avenir ANR-16-IDEX-0004 ULNE and Région HDF. AR acknowledges support from the Royal Society of Edinburgh.
arXiv:2210.02341v4 [stat.ME] 26 Nov 2023
1 Introduction
This work focuses on sampling from a generic distribution of the form
π(x) ∝ exp(−h(x) − f(x) − g(Dx)),  (1.1)
where h : H → ]−∞, +∞] is a Lipschitz-differentiable function, f : H → ]−∞, +∞] and g : G → ]−∞, +∞] are possibly non-smooth functions, and D : H → G is a linear operator. Such a distribution typically arises as the posterior distribution involved in imaging inverse problems, from which Bayesian estimators need to be formed [26]. In the remainder, we will consider that H = ℝ^N and G = ℝ^M, where N and M are very large. Sampling from (1.1) is challenging due to (i) the presence of the composite function g∘D and (ii) the large dimension of H and G. To address
these issues, this paper proposes a distributed MCMC algorithm which (i) leverages the approximate augmentation AXDA [37] to decouple (split) the functions involved in (1.1), and (ii) exploits
the structure of (1.1) to design a distributed sampler reminiscent of block-coordinate approaches
in optimization. We briefly discuss recent splitting-based distributed samplers in Section 1.1, and
highlight some of their limitations. Scalable splitting optimization approaches are reviewed in Section 1.2 to motivate this paper. The proposed approach, which takes further inspiration from the
optimization literature, is outlined in Section 1.3.
1.1 Sampling methods: splitting and distributed techniques
Markov chain Monte Carlo (MCMC) algorithms are generic approaches providing estimates with
associated credibility intervals [29]. They aim to generate a Markov chain that yields samples
from the target distribution (1.1) in the stationary regime. Nevertheless, they are often considered
computationally too expensive to handle high dimensional problems, especially when composite
functions are involved. This is often the case with inverse problems in image processing. Over
the last decade, many authors have proposed more versatile and scalable optimization-inspired
MCMC algorithms [16,25,31]. These approaches exploit quantities repeatedly used in optimization to efficiently explore high dimensional parameter spaces, most often gradients and proximal operators¹.
A splitting approach based on an asymptotically exact data augmentation (AXDA) has also recently been proposed by [36,37]. Inspired by splitting optimization approaches [20], AXDA introduces auxiliary variables to split composite distributions. The density (1.1) is then approximated by
π^(α,β)(x, z, u) ∝ exp(−h(x) − f(x) − g(z) − φ_α(Dx, z − u) − ψ_β(u)),  (1.2)
¹The proximal operator of a proper, lower semi-continuous function f : ℝ^N → ]−∞, +∞] is defined for any y ∈ ℝ^N by [18]: prox_f(y) = argmin_{x ∈ ℝ^N} f(x) + ‖x − y‖²₂/2.
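To make the definition above concrete, here is a minimal sketch (a standard textbook example, not taken from this paper): for f(x) = λ‖x‖₁, the proximal operator has a closed form, the componentwise soft-thresholding rule.

```python
import numpy as np

def prox_l1(y, lam):
    """prox of f(x) = lam * ||x||_1: componentwise soft-thresholding."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

# entries with magnitude below lam are set to zero, larger ones shrink by lam
x_hat = prox_l1(np.array([2.0, -0.5, 0.1]), 1.0)
```

Such closed-form proximal maps are precisely what makes splitting-based samplers and optimizers cheap per iteration.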
where φ_α : G × G → ]−∞, +∞], ψ_β : G → ]−∞, +∞], α controls the discrepancy between Dx and z − u, and β is an augmentation parameter. The variable z is to be interpreted as a splitting variable, and u is an additional augmentation variable. The role of φ_α is to strongly couple Dx and z − u, while ψ_β keeps u small enough. This latter variable is not mandatory, but improves the mixing properties of the sampler by further decoupling Dx and z [36].
For appropriate choices of φ_α and ψ_β, the marginal distribution of x with respect to (1.2) converges to the target distribution (1.1) as (α, β) → (0, 0). A Gibbs sampler, referred to as the split Gibbs sampler (SGS), is proposed in [36,37] to draw samples from (1.2). The additional cost of the new variables is compensated by the benefit of this divide-to-conquer strategy. Further details on this splitting technique are provided in Section 2.1.

Designing efficient algorithms to handle distributions of the form (1.1) becomes even more challenging when potentially many composite functions are considered. An extension of SGS has been considered in [36,28] for distributions involving C ∈ ℕ composite terms, with a density of the form
π(x) ∝ exp(−h(x) − f(x) − Σ_{c=1}^{C} g_c(D_c x)),  (1.3)
where, for every c ∈ {1, …, C}, G_c = ℝ^{M_c}, M = Σ_{c=1}^{C} M_c, D_c : H → G_c and g_c : G_c → ]−∞, +∞].
Applying AXDA to (1.3) leads to an approximation with density
π^(α,β)(x, (z_c, u_c)_{1≤c≤C}) ∝ exp(−h(x) − f(x) − Σ_{c=1}^{C} [g_c(z_c) + φ_{c,α_c}(D_c x, z_c − u_c) + ψ_{c,β_c}(u_c)]),  (1.4)
where (z_c, u_c)_{1≤c≤C} are auxiliary variables and, for c ∈ {1, …, C}, φ_{c,α_c} : G_c × G_c → ]−∞, +∞] and ψ_{c,β_c} : G_c → ]−∞, +∞]. The variables (z_c, u_c)_{1≤c≤C} are conditionally independent, paving the way to a distributed implementation on a client-server architecture.
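To illustrate how this conditional independence enables parallel updates, the following toy sketch runs a split-Gibbs-style sampler on a scalar Gaussian target with C = 2 terms; the quadratic potentials h(x) = x²/2, g_c(z) = z²/2, the choice D_c = Id, the omission of the u_c variables, and the parameter ρ (playing the role of the α_c's) are all illustrative assumptions, not the setting of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
rho, C, n_iter = 0.3, 2, 20000      # rho plays the role of the alpha_c's

x, z = 0.0, np.zeros(C)
samples = np.empty(n_iter)
for t in range(n_iter):
    # x | z: Gaussian with precision 1 + C/rho^2 (from h(x) = x^2/2
    # and the quadratic couplings (x - z_c)^2 / (2 rho^2))
    prec_x = 1.0 + C / rho**2
    x = z.sum() / rho**2 / prec_x + rng.standard_normal() / np.sqrt(prec_x)
    # z_c | x: conditionally independent given x, so each z_c could live
    # on its own worker and be updated in parallel (g_c(z) = z^2/2 here)
    prec_z = 1.0 + 1.0 / rho**2
    z = x / rho**2 / prec_z + rng.standard_normal(C) / np.sqrt(prec_z)
    samples[t] = x
```

In this toy model the augmented marginal of x has precision 1 + 2/(1 + ρ²), which tends to the target precision 3 as ρ → 0; with ρ = 0.3 the sampled variance is already close to 1/3.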
Only a few distributed samplers have been proposed in the literature [28,36,37]. However, these samplers only focus on distributing the splitting variables associated with the composite functions g_c ∘ D_c from (1.3), without decomposing the global variable x into blocks. In particular, [28] propose a consensus-based approximate posterior distribution, addressed with a distributed Metropolis-within-Gibbs sampler relying on a client-server architecture. The sampler exploits the conditional independence between the splitting variables to parallelize computations. This is especially relevant for data-distributed applications, in which a shared parameter value needs to be inferred from a dataset distributed over multiple workers, e.g., for distributed logistic regression. However, this setting has several drawbacks when turning to high dimensional problems. First, the consensus constraint requires duplicating the variables of interest on the different workers. Second, the client-server architecture may induce communication bottlenecks, as all workers (clients) need to communicate with the server. It also exhibits limitations in terms of distribution flexibility, since it separates composite functions only, without splitting the high dimensional global variable of interest x into blocks. In high dimension, it may be of interest to split the variable x itself, not only the data or the composite functions.
To summarize, splitting techniques have been introduced in sampling methods, making it possible to handle multiple composite functions in parallel. Distributed versions have also been proposed in the literature, but they are restricted to client-server approaches, which can be critical in a high dimensional setting. In addition, none of these methods considered splitting the high-dimensional variable of interest x into blocks. Block-coordinate approaches can be necessary in practice to handle high-dimensional problems, e.g., when x is an image with more than 10^6 unknown parameters. Since the distributed split Gibbs sampler proposed in this work is essentially inspired by optimization approaches, the next paragraph reviews methods from the optimization literature that combine all at once composite-function splitting, variable splitting (i.e., block-coordinate approaches) and distributed techniques.
1.2 Splitting, distributed and block-coordinate methods in optimization
Optimization-based inference consists in estimating a mode of the distribution (1.1) (e.g., the maximum a posteriori (MAP) estimator), defined as a solution to
minimize_{x ∈ H}  h(x) + f(x) + g(Dx).  (1.5)
Optimization algorithms aim to build sequences that asymptotically converge to a solution to problem (1.5). Problem (1.5) can be efficiently solved with proximal primal-dual methods [20,13,38], which can be seen as the optimization counterpart of the splitting approaches discussed in
Section 1.1. Similarly to AXDA, primal-dual methods rely on an auxiliary variable z ∈ G associated with the composite function g∘D. The variable x ∈ H is referred to as the primal variable, while z is called the dual variable, since it is associated with the dual space G induced by the operator D. It is worth noticing that the SGS approach developed by [36] was directly inspired
by such an optimization splitting technique. Splitting proximal algorithms benefit from many acceleration techniques (e.g., inertia [2,24], preconditioning [7,12]), and they are versatile and scalable. In particular, they are highly parallelizable, and can be efficiently distributed to split the computational cost per iteration, under well-established theoretical guarantees [1,10,20].
Designing efficient algorithms to handle distributions of the form (1.3) becomes more challenging when multiple composite functions are involved, especially when the dimensions of x and y increase. In this context, the counterpart of problem (1.3) is given by
minimize_{x ∈ H}  h(x) + f(x) + Σ_{c=1}^{C} g_c(D_c x).  (1.6)
Primal-dual proximal algorithms [13,38,20,27,9] can handle the composite terms in (1.6) in parallel in the dual domains induced by the operators (D_c)_{1≤c≤C}. This is possible since the dual variables can be distributed over multiple workers.
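As a concrete sketch of such parallel dual updates, the following runs a generic Chambolle–Pock-style primal-dual iteration on a toy 1-D denoising problem, minimize_x ½‖x − y‖² + λ‖Dx‖₁, with the rows of the difference operator D split into two dual blocks that are updated independently; the problem, sizes, and step sizes are illustrative assumptions, not the algorithms of the cited works.

```python
import numpy as np

rng = np.random.default_rng(1)
N, lam, tau, sigma = 50, 0.5, 0.25, 0.25     # steps: tau*sigma*||D||^2 <= 1
y = np.repeat([0.0, 1.0], N // 2) + 0.1 * rng.standard_normal(N)  # noisy step

D = np.eye(N - 1, N, k=1) - np.eye(N - 1, N)  # 1-D finite differences
D_blocks = np.array_split(D, 2)               # two dual blocks ("workers")
z = [np.zeros(Dc.shape[0]) for Dc in D_blocks]
x = y.copy()
x_bar = x.copy()

for _ in range(300):
    # dual steps: prox of (lam*||.||_1)^* is the projection onto [-lam, lam];
    # the blocks are independent, hence parallelizable across workers
    z = [np.clip(zc + sigma * Dc @ x_bar, -lam, lam)
         for zc, Dc in zip(z, D_blocks)]
    # primal step: prox of tau*f with f(x) = ||x - y||^2 / 2, after gathering
    # the adjoint contributions D_c^T z_c from the workers
    x_old = x
    grad = sum(Dc.T @ zc for zc, Dc in zip(z, D_blocks))
    x = (x_old - tau * grad + tau * y) / (1 + tau)
    x_bar = 2 * x - x_old                     # extrapolation step
```

Only the adjoint contributions D_cᵀ z_c need to be gathered at the primal update; each dual block otherwise works on its own slice of D.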
Figure 1: Example of the block-sparse structure of the matrix D involved in (1.1). The columns of D are split into K contiguous blocks (black dashed lines), with a small overlap compared to the size of the blocks. The orange dashed rectangles highlight the subparts of D implemented on each worker k. These act on a parameter block stored on worker k, and a few parameters stored on worker k + 1.

Further, to address problems with a high dimensional variable x, a usual strategy in optimization consists in splitting x into blocks (x_k)_{1≤k≤K}, and either alternating between the blocks [4,8,22,34] or distributing the blocks over multiple workers [11,27]. In particular, [27] combine such block-
coordinate approaches with primal-dual splitting techniques to parallelize and distribute both the
primal and dual variables. Then, the associated minimization problem is of the form
minimize_{x = (x_k)_{1≤k≤K} ∈ H}  Σ_{k=1}^{K} (h_k(x_k) + f_k(x_k)) + Σ_{c=1}^{C} g_c(Σ_{k=1}^{K} D_{c,k} x_k),  (1.7)
where, for every k ∈ {1, …, K}, h_k and f_k only act on the k-th block of the variable x ∈ H. Such a formulation is of particular interest when considering block-sparse matrices D_c, as illustrated in Figure 1. Finally, [27] also developed asynchronous distributed algorithms over hypergraph²
structures by combining the resulting block-coordinate primal-dual algorithms with consensus constraints that impose all (x_k)_{1≤k≤K} to be equal. This strategy provides high flexibility in the choice of the distribution architecture. Section 3.2 will detail the interest of hypergraphs. Unfortunately, these algorithms only provide a point estimate, without additional information. In the absence of ground truth, such approaches do not directly quantify the uncertainty of the estimate. The proposed approach will focus on drawing samples from the distribution corresponding to (1.7) using a distributed architecture.
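To illustrate the block-sparse structure depicted in Figure 1, the following sketch partitions the columns of a banded operator D into K blocks and computes, for each worker, the few out-of-block parameters it must receive from a neighbour; the sizes and the choice of a first-order difference operator are arbitrary illustrative assumptions.

```python
import numpy as np

N, K = 12, 3                        # illustrative sizes, not from the paper
# D: first-order difference operator, a simple banded (block-sparse) matrix
D = np.eye(N - 1, N, k=1) - np.eye(N - 1, N)

# split the columns of D (i.e. the parameter vector x) into K contiguous blocks
col_blocks = np.array_split(np.arange(N), K)
owner = np.repeat(np.arange(K), [len(b) for b in col_blocks])  # col -> worker

borrowed = {}
first_col = np.argmax(D != 0, axis=1)   # leftmost nonzero column of each row
for k, cols in enumerate(col_blocks):
    # rows of D handled by worker k: those whose leftmost nonzero is in block k
    rows = np.nonzero(owner[first_col] == k)[0]
    # columns those rows touch beyond the block owned by worker k
    touched = np.nonzero(np.any(D[rows] != 0, axis=0))[0]
    borrowed[k] = sorted(int(c) for c in touched if owner[c] != k)
```

For this banded D, each worker only needs a one-parameter overlap from its right neighbour, so the communication cost stays small compared to the block sizes.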
1.3 Proposed distributed block-coordinate SGS
This work focuses on cases where x belongs to a high dimensional space, for instance with N larger than 10^6. We introduce a distributed block-coordinate SGS to sample from (1.3) by splitting the global variable of interest x into blocks (x_k)_{1≤k≤K}. This approach will also be able to handle multiple composite functions as in (1.7). To this aim, we pair SGS with optimization-inspired MCMC transition kernels [16,25,31] to sample from conditional distributions that would otherwise be either intractable or challenging to handle in a distributed setting.
²Hypergraphs generalize graphs by letting a single edge connect multiple nodes (i.e., variables) at once, hence generalizing communications between variables.
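As a minimal illustration of the footnote above (a purely hypothetical layout, not taken from the paper), a hypergraph can be stored as a map from each hyperedge to the set of nodes it connects; each worker's communication neighbourhood then follows directly.

```python
# Hypothetical layout: each dual block z_c defines a hyperedge over the
# primal blocks x_k that the corresponding operator D_c reads.
hyperedges = {
    "z1": {"x1", "x2"},        # g_1 couples blocks x1 and x2
    "z2": {"x2", "x3", "x4"},  # g_2 couples three blocks at once
    "z3": {"x4"},              # g_3 is local to block x4
}

def neighbours(node):
    """All nodes sharing at least one hyperedge with `node`."""
    return {n for e in hyperedges.values() if node in e for n in e} - {node}
```

Here x2 must exchange messages with x1, x3 and x4, while x4 only talks to x2 and x3: the hyperedge structure directly encodes who communicates with whom.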