A Distributed Block-Split Gibbs Sampler with Hypergraph Structure for High-Dimensional Inverse Problems

P.-A. Thouvenin, A. Repetti, P. Chainais*
Université de Lille, CNRS, Centrale Lille, UMR 9189 CRIStAL, F-59000 Lille, France
School of Mathematics and Computer Sciences, Heriot-Watt University, Edinburgh, UK
School of Engineering and Physical Sciences, Heriot-Watt University, Edinburgh, UK
{pierre-antoine.thouvenin, pierre.chainais}@centralelille.fr, a.repetti@hw.ac.uk
Abstract
Sampling-based algorithms are classical approaches to perform Bayesian inference in inverse problems. They provide estimators along with credibility intervals that quantify their uncertainty. Although these methods scale poorly to high dimensional problems, they have recently been paired with optimization techniques, such as proximal and splitting approaches, to address this issue. Such approaches pave the way to distributed samplers,
splitting computations to make inference more scalable and faster. We introduce a distributed split Gibbs sampler (SGS) to efficiently solve such problems, involving distributions with multiple smooth and non-smooth functions composed with linear operators. The proposed approach leverages a recent approximate augmentation technique reminiscent of primal-dual optimization methods. It is further combined with a block-coordinate approach to split the primal and dual variables into blocks, leading to a distributed block-coordinate SGS. The resulting algorithm exploits the hypergraph structure of the involved linear operators to efficiently distribute the variables over multiple workers under controlled communication costs. It accommodates several distributed architectures, such as the Single Program Multiple Data and client-server architectures. Experiments on a large image deblurring problem demonstrate the ability of the proposed approach to produce high quality estimates with credibility intervals in a small amount of time. Supplementary material to reproduce the experiments is available online.
Keywords: MCMC algorithm, Bayesian inference, block-coordinate algorithm, distributed architecture, high dimensional imaging inverse problems
*PC and PAT acknowledge support from the ANR project “Chaire IA Sherlock” ANR-20-CHIA-0031-01, the programme d’investissements d’avenir ANR-16-IDEX-0004 ULNE and Région HDF. AR acknowledges support from the Royal Society of Edinburgh.
arXiv:2210.02341v4 [stat.ME] 26 Nov 2023
1 Introduction
This work focuses on sampling from a generic distribution of the form
π(x) ∝ exp(−h(x) − f(x) − g(Dx)),  (1.1)
where h : H → ]−∞, +∞] is a Lipschitz-differentiable function, f : H → ]−∞, +∞] and g : G → ]−∞, +∞] are possibly non-smooth functions, and D : H → G is a linear operator. Such a distribution typically arises as the posterior distribution involved in imaging inverse problems, from which Bayesian estimators need to be formed [26]. In the remainder, we will consider that H = ℝ^N and G = ℝ^M, where N and M are very large. Sampling from (1.1) is challenging due to (i) the presence of the composite function g∘D and (ii) the large dimension of H and G. To address
these issues, this paper proposes a distributed MCMC algorithm which (i) leverages the approximate augmentation AXDA [37] to decouple (split) the functions involved in (1.1), and (ii) exploits
the structure of (1.1) to design a distributed sampler reminiscent of block-coordinate approaches
in optimization. We briefly discuss recent splitting-based distributed samplers in Section 1.1, and
highlight some of their limitations. Scalable splitting optimization approaches are reviewed in Section 1.2 to motivate this paper. The proposed approach, which takes further inspiration from the
optimization literature, is outlined in Section 1.3.
1.1 Sampling methods: splitting and distributed techniques
Markov chain Monte Carlo (MCMC) algorithms are generic approaches providing estimates with
associated credibility intervals [29]. They aim to generate a Markov chain that yields samples
from the target distribution (1.1) in the stationary regime. Nevertheless, they are often considered
computationally too expensive to handle high dimensional problems, especially when composite
functions are involved. This is often the case with inverse problems in image processing. Over
the last decade, many authors have proposed more versatile and scalable optimization-inspired
MCMC algorithms [16,25,31]. These approaches exploit quantities repeatedly used in optimization to efficiently explore high dimensional parameter spaces, most often gradients and proximal operators¹.
A splitting approach based on an asymptotically exact data augmentation (AXDA) has also recently been proposed by [36,37]. Inspired by splitting optimization approaches [20], AXDA introduces auxiliary variables to split composite distributions. The density (1.1) is then approximated by
π^(α,β)(x, z, u) ∝ exp(−h(x) − f(x) − g(z) − φ_α(Dx, z − u) − ψ_β(u)),  (1.2)
¹The proximal operator of a proper, lower semi-continuous function f : ℝ^N → ]−∞, +∞] is defined for any y ∈ ℝ^N by [18]: prox_f(y) = argmin_{x ∈ ℝ^N} f(x) + ‖x − y‖²₂/2.
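To make the definition above concrete, here is a minimal sketch (a standard textbook example, not taken from this paper): for f(x) = λ‖x‖₁, the proximal operator has a closed form, the componentwise soft-thresholding rule.

```python
import numpy as np

def prox_l1(y, lam):
    """prox of f(x) = lam * ||x||_1: componentwise soft-thresholding."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

# entries with magnitude below lam are set to zero, larger ones shrink by lam
x_hat = prox_l1(np.array([2.0, -0.5, 0.1]), 1.0)
```

Such closed-form proximal maps are precisely what makes splitting-based samplers and optimizers cheap per iteration.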
where φ_α : G × G → ]−∞, +∞], ψ_β : G → ]−∞, +∞], α controls the discrepancy between Dx and z − u, and β is an augmentation parameter. The variable z is to be interpreted as a splitting variable, and u is an additional augmentation variable. The role of φ_α is to strongly couple Dx and z − u, while ψ_β keeps u small enough. This latter variable is not mandatory, but improves the mixing properties of the sampler by further decoupling Dx and z [36].
For appropriate choices of φ_α and ψ_β, the marginal distribution of x with respect to (1.2) converges to the target distribution (1.1) as (α, β) → (0, 0). A Gibbs sampler, referred to as the split Gibbs sampler (SGS), is proposed in [36,37] to draw samples from (1.2). The additional cost of the new variables is compensated by the benefit of this divide-to-conquer strategy. Further details on this splitting technique are provided in Section 2.1.

Designing efficient algorithms to handle distributions of the form (1.1) becomes even more challenging when potentially many composite functions are considered. An extension of SGS has been considered in [36,28] for distributions involving C ∈ ℕ composite terms, with a density of the form
π(x) ∝ exp(−h(x) − f(x) − Σ_{c=1}^{C} g_c(D_c x)),  (1.3)
where, for every c ∈ {1, …, C}, G_c = ℝ^{M_c}, M = Σ_{c=1}^{C} M_c, D_c : H → G_c and g_c : G_c → ]−∞, +∞].
Applying AXDA to (1.3) leads to an approximation with density
π^(α,β)(x, (z_c, u_c)_{1≤c≤C}) ∝ exp(−h(x) − f(x) − Σ_{c=1}^{C} [g_c(z_c) + φ_{c,α_c}(D_c x, z_c − u_c) + ψ_{c,β_c}(u_c)]),  (1.4)
where (z_c, u_c)_{1≤c≤C} are auxiliary variables and, for c ∈ {1, …, C}, φ_{c,α_c} : G_c × G_c → ]−∞, +∞] and ψ_{c,β_c} : G_c → ]−∞, +∞]. The variables (z_c, u_c)_{1≤c≤C} are conditionally independent, paving the way to a distributed implementation on a client-server architecture.
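To illustrate how this conditional independence enables parallel updates, the following toy sketch runs a split-Gibbs-style sampler on a scalar Gaussian target with C = 2 terms; the quadratic potentials h(x) = x²/2, g_c(z) = z²/2, the choice D_c = Id, the omission of the u_c variables, and the parameter ρ (playing the role of the α_c's) are all illustrative assumptions, not the setting of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
rho, C, n_iter = 0.3, 2, 20000      # rho plays the role of the alpha_c's

x, z = 0.0, np.zeros(C)
samples = np.empty(n_iter)
for t in range(n_iter):
    # x | z: Gaussian with precision 1 + C/rho^2 (from h(x) = x^2/2
    # and the quadratic couplings (x - z_c)^2 / (2 rho^2))
    prec_x = 1.0 + C / rho**2
    x = z.sum() / rho**2 / prec_x + rng.standard_normal() / np.sqrt(prec_x)
    # z_c | x: conditionally independent given x, so each z_c could live
    # on its own worker and be updated in parallel (g_c(z) = z^2/2 here)
    prec_z = 1.0 + 1.0 / rho**2
    z = x / rho**2 / prec_z + rng.standard_normal(C) / np.sqrt(prec_z)
    samples[t] = x
```

In this toy model the augmented marginal of x has precision 1 + 2/(1 + ρ²), which tends to the target precision 3 as ρ → 0; with ρ = 0.3 the sampled variance is already close to 1/3.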
Only a few distributed samplers have been proposed in the literature [28,36,37]. However, these samplers only focus on distributing the splitting variables associated with the composite functions g_c ∘ D_c from (1.3), without decomposing the global variable x into blocks. In particular, [28] propose a consensus-based approximate posterior distribution, addressed with a distributed Metropolis-within-Gibbs sampler relying on a client-server architecture. The sampler exploits the conditional independence between the splitting variables to parallelize computations. This is especially relevant for data-distributed applications, in which a shared parameter value needs to be inferred from a dataset distributed over multiple workers, e.g., for distributed logistic regression. However, this setting has several drawbacks when turning to high dimensional problems. First, the consensus constraint requires duplicating the variables of interest on the different workers. Second, the client-server architecture may induce communication bottlenecks, as all workers (clients) need to communicate with the server. It also exhibits limitations in terms of distribution flexibility, since it separates composite functions only, without splitting the high dimensional global variable of interest x into blocks. In high dimension, it may be of interest to split the variable x itself, not only the data or the composite functions.
To summarize, splitting techniques have been introduced in sampling methods, making it possible to handle multiple composite functions in parallel. Distributed versions have also been proposed in the literature, but they are restricted to client-server approaches, which can be critical in a high dimensional setting. In addition, none of these methods considered splitting the high-dimensional variable of interest x into blocks. Block-coordinate approaches can be necessary in practice to handle high-dimensional problems, e.g., when x is an image with more than 10^6 unknown parameters. Since the distributed split Gibbs sampler proposed in this work is essentially inspired by optimization approaches, the next paragraph reviews methods from the optimization literature that combine all at once composite-function splitting, variable splitting (i.e., block-coordinate approaches) and distributed techniques.
1.2 Splitting, distributed and block-coordinate methods in optimization
Optimization-based inference consists in estimating a mode of the distribution (1.1) (e.g., the maximum a posteriori (MAP) estimator), defined as a solution to
minimize_{x ∈ H}  h(x) + f(x) + g(Dx).  (1.5)
Optimization algorithms aim to build sequences that asymptotically converge to a solution to problem (1.5). Problem (1.5) can be efficiently solved with proximal primal-dual methods [20,13,38], which can be seen as the optimization counterpart of the splitting approaches discussed in
Section 1.1. Similarly to AXDA, primal-dual methods rely on an auxiliary variable z ∈ G associated with the composite function g∘D. The variable x ∈ H is referred to as the primal variable, while z is called the dual variable, since it is associated with the dual space G induced by the operator D. It is worth noticing that the SGS approach developed by [36] was directly inspired
by such an optimization splitting technique. Splitting proximal algorithms benefit from many acceleration techniques (e.g., inertia [2,24], preconditioning [7,12]), and they are versatile and scalable. In particular, they are highly parallelizable, and can be efficiently distributed to split the computational cost per iteration, under well-established theoretical guarantees [1,10,20].
Designing efficient algorithms to handle distributions of the form (1.3) becomes more challenging when multiple composite functions are involved, especially when the dimensions of x and y increase. In this context, the counterpart of problem (1.3) is given by
minimize_{x ∈ H}  h(x) + f(x) + Σ_{c=1}^{C} g_c(D_c x).  (1.6)
Primal-dual proximal algorithms [13,38,20,27,9] can handle the composite terms in (1.6) in parallel in the dual domains induced by the operators (D_c)_{1≤c≤C}. This is possible since the dual variables can be distributed over multiple workers.
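As a concrete sketch of such parallel dual updates, the following runs a generic Chambolle–Pock-style primal-dual iteration on a toy 1-D denoising problem, minimize_x ½‖x − y‖² + λ‖Dx‖₁, with the rows of the difference operator D split into two dual blocks that are updated independently; the problem, sizes, and step sizes are illustrative assumptions, not the algorithms of the cited works.

```python
import numpy as np

rng = np.random.default_rng(1)
N, lam, tau, sigma = 50, 0.5, 0.25, 0.25     # steps: tau*sigma*||D||^2 <= 1
y = np.repeat([0.0, 1.0], N // 2) + 0.1 * rng.standard_normal(N)  # noisy step

D = np.eye(N - 1, N, k=1) - np.eye(N - 1, N)  # 1-D finite differences
D_blocks = np.array_split(D, 2)               # two dual blocks ("workers")
z = [np.zeros(Dc.shape[0]) for Dc in D_blocks]
x = y.copy()
x_bar = x.copy()

for _ in range(300):
    # dual steps: prox of (lam*||.||_1)^* is the projection onto [-lam, lam];
    # the blocks are independent, hence parallelizable across workers
    z = [np.clip(zc + sigma * Dc @ x_bar, -lam, lam)
         for zc, Dc in zip(z, D_blocks)]
    # primal step: prox of tau*f with f(x) = ||x - y||^2 / 2, after gathering
    # the adjoint contributions D_c^T z_c from the workers
    x_old = x
    grad = sum(Dc.T @ zc for zc, Dc in zip(z, D_blocks))
    x = (x_old - tau * grad + tau * y) / (1 + tau)
    x_bar = 2 * x - x_old                     # extrapolation step
```

Only the adjoint contributions D_cᵀ z_c need to be gathered at the primal update; each dual block otherwise works on its own slice of D.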
Figure 1: Example of the block-sparse structure of the matrix D involved in (1.1). The columns of D are split into K contiguous blocks (black dashed lines), with a small overlap compared to the size of the blocks. The orange dashed rectangles highlight the subparts of D implemented on each worker k. These act on a parameter block stored on worker k, and a few parameters stored on worker k + 1.

Further, to address problems with a high dimensional variable x, a usual strategy in optimization consists in splitting x into blocks (x_k)_{1≤k≤K}, and either alternating between the blocks [4,8,22,34] or distributing the blocks over multiple workers [11,27]. In particular, [27] combine such block-
coordinate approaches with primal-dual splitting techniques to parallelize and distribute both the
primal and dual variables. Then, the associated minimization problem is of the form
minimize_{x = (x_k)_{1≤k≤K} ∈ H}  Σ_{k=1}^{K} (h_k(x_k) + f_k(x_k)) + Σ_{c=1}^{C} g_c(Σ_{k=1}^{K} D_{c,k} x_k),  (1.7)
where, for every k ∈ {1, …, K}, h_k and f_k only act on the k-th block of the variable x ∈ H. Such a formulation is of particular interest when considering block-sparse matrices D_c, as illustrated in Figure 1. Finally, [27] also developed asynchronous distributed algorithms over hypergraph²
structures by combining the resulting block-coordinate primal-dual algorithms with consensus constraints that impose all (x_k)_{1≤k≤K} to be equal. This strategy provides high flexibility in the choice of the distribution architecture. Section 3.2 will detail the interest of hypergraphs. Unfortunately, these algorithms only provide a point estimate, without additional information. In the absence of ground truth, such approaches do not directly quantify the uncertainty of the estimate. The proposed approach will focus on drawing samples from the distribution corresponding to (1.7) using a distributed architecture.
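To illustrate the block-sparse structure depicted in Figure 1, the following sketch partitions the columns of a banded operator D into K blocks and computes, for each worker, the few out-of-block parameters it must receive from a neighbour; the sizes and the choice of a first-order difference operator are arbitrary illustrative assumptions.

```python
import numpy as np

N, K = 12, 3                        # illustrative sizes, not from the paper
# D: first-order difference operator, a simple banded (block-sparse) matrix
D = np.eye(N - 1, N, k=1) - np.eye(N - 1, N)

# split the columns of D (i.e. the parameter vector x) into K contiguous blocks
col_blocks = np.array_split(np.arange(N), K)
owner = np.repeat(np.arange(K), [len(b) for b in col_blocks])  # col -> worker

borrowed = {}
first_col = np.argmax(D != 0, axis=1)   # leftmost nonzero column of each row
for k, cols in enumerate(col_blocks):
    # rows of D handled by worker k: those whose leftmost nonzero is in block k
    rows = np.nonzero(owner[first_col] == k)[0]
    # columns those rows touch beyond the block owned by worker k
    touched = np.nonzero(np.any(D[rows] != 0, axis=0))[0]
    borrowed[k] = sorted(int(c) for c in touched if owner[c] != k)
```

For this banded D, each worker only needs a one-parameter overlap from its right neighbour, so the communication cost stays small compared to the block sizes.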
1.3 Proposed distributed block-coordinate SGS
This work focuses on cases where x belongs to a high dimensional space, for instance with N larger than 10^6. We introduce a distributed block-coordinate SGS to sample from (1.3) by splitting the global variable of interest x into blocks (x_k)_{1≤k≤K}. This approach will also be able to handle multiple composite functions as in (1.7). To this aim, we pair SGS with optimization-inspired MCMC transition kernels [16,25,31] to sample from conditional distributions that would otherwise be either intractable or challenging to handle in a distributed setting.
²Hypergraphs generalize graphs by letting a single edge connect multiple nodes (i.e., variables) at once, hence generalizing communications between variables.
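As a minimal illustration of the footnote above (a purely hypothetical layout, not taken from the paper), a hypergraph can be stored as a map from each hyperedge to the set of nodes it connects; each worker's communication neighbourhood then follows directly.

```python
# Hypothetical layout: each dual block z_c defines a hyperedge over the
# primal blocks x_k that the corresponding operator D_c reads.
hyperedges = {
    "z1": {"x1", "x2"},        # g_1 couples blocks x1 and x2
    "z2": {"x2", "x3", "x4"},  # g_2 couples three blocks at once
    "z3": {"x4"},              # g_3 is local to block x4
}

def neighbours(node):
    """All nodes sharing at least one hyperedge with `node`."""
    return {n for e in hyperedges.values() if node in e for n in e} - {node}
```

Here x2 must exchange messages with x1, x3 and x4, while x4 only talks to x2 and x3: the hyperedge structure directly encodes who communicates with whom.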