Federated Bayesian Computation via Piecewise Deterministic Markov Processes
Joris Bierkens*, Andrew B. Duncan†
October 26, 2022
Abstract
When performing Bayesian computations in practice, one is often faced with the challenge
that the constituent model components and/or the data are only available in a distributed
fashion, e.g. due to privacy concerns or sheer volume. While various methods have been
proposed for performing posterior inference in such federated settings, these either make
very strong assumptions on the data and/or model or otherwise introduce significant bias
when the local posteriors are combined to form an approximation of the target posterior.
By leveraging recently developed methods for Markov Chain Monte Carlo (MCMC) based
on Piecewise Deterministic Markov Processes (PDMPs), we develop a computation- and
communication-efficient family of posterior inference algorithms (Fed-PDMC) which
provides asymptotically exact approximations of the full posterior over a large class of Bayesian
models, allowing heterogeneous model and data contributions from each client. We show that
communication between clients and the server preserves the privacy of the individual data
sources by establishing differential privacy guarantees. We quantify the performance of Fed-
PDMC over a class of illustrative analytical case-studies and demonstrate its efficacy on a
number of synthetic examples along with realistic Bayesian computation benchmarks.
1 Introduction
In the problem of Federated Bayesian learning we are faced with the unique challenge that,
either due to privacy or scalability, the model and its data are distributed across a federation
of workers. In this setting, the model and/or data owned by an individual worker must not
be disclosed to the other workers, nor to any coordinator. While this problem has
been studied in previous works, much of the proposed methodology either sacrifices the
asymptotic exactness which is characteristic of MCMC-based sampling algorithms, in order to
facilitate the distribution and federation of the data, or is focused on a very narrow class
of models.
The problem of federated learning has largely been studied in the optimisation setting, where
a global loss function which can be decomposed into local worker contributions must be optimised.
Classical strategies including Fed-SGD and Fed-Avg [MMR+17], FedAc [YM20], and subsequent
extensions, build on the general idea that each worker seeks to locally move towards the optimiser
of the model based on their own local data contribution, computing a local gradient, and a central
server aggregates these local gradient candidates in an appropriate fashion to obtain an estimate
of the global minimiser.
*Delft University of Technology
†Imperial College London, Alan Turing Institute
arXiv:2210.13816v1 [stat.CO] 25 Oct 2022
The validity of these approaches typically hinges on strong assumptions. Firstly, it is assumed
that the data across workers is independent and identically distributed, though a number of more
recent federated learning methods seek to weaken these requirements, e.g. through knowledge
distillation [ZHZ21] or Bayesian non-parametric modelling, e.g. [YAG+19]. Secondly, it is as-
sumed that workers use the same local model, though recent work on model personalization has
suggested some strategies to address this [MMRS20].
In this work, we address the problem of federated learning in a Bayesian setting, i.e. we seek
to generate samples from a global posterior probability distribution obtained as a multiplicative
composition of local posteriors distributed across the workers, and without sharing of model
and/or data. This is an inherently more challenging problem due to the fact that far more
information pertaining to the local posterior distribution must be somehow communicated with
other workers while ensuring privacy of data/model.
Previous works have sought to lift methodology from the federated learning setting to the
Bayesian setting, employing Stochastic Gradient Langevin Dynamics (SGLD)-based generalisations
of Federated Learning counterparts, e.g. [ZLZ+19, EMMBK21, VPD+22]. Similarly, the Langevin-type
algorithm proposed in [SSR22] combines distributed MCMC with compression techniques to re-
duce the burden of communicating large gradients.
Related approaches seek to employ approximations of the local posterior contributions which
are used to communicate information to the central server. Such approaches include distributed
variational inference [ZBKM18, HWL+17], using Gaussian approximations [ASGXR20] and en-
semble approaches [LAD+21]. In [BGR22], local predictive posterior contributions are distilled
and stored into a neural network which is communicated with the central server.
Some works seek to reformulate the federated learning problem through the lens of Bayesian
model averaging, where the local model contributions are combined into an accurate global ap-
proximation as a model ensemble [CC20, TG20], building on other Bayesian uncertainty quan-
tification methods used in deep learning such as [MIG+19]. Related to this are approaches which
adopt a Bayesian hierarchical modelling view of Federated Learning, introducing hierarchical
priors and fixed and random effects to share global information across the different federated
workers, [KVMD22].
All of the approaches discussed above either employ local posterior approximations to enable
effective communication, and/or are contingent on very strict assumptions on the structure
of the model. To our knowledge, there is no approach which can perform full, (asymptotically)
exact Bayesian inference in this context for a general Bayesian model. In this paper we provide a
federated (or distributed) approach to Markov Chain Monte Carlo with the following properties:
(i) the correct posterior distribution is retained; (ii) the observational data may be distributed
among workers with no requirement to exchange information other than the algorithmic output;
(iii) the observational data amongst different workers does not have to be identically distributed,
nor do the local prior distributions have to be the same; (iv) the efficiency of the federated
approach compares favourably to the non-federated approach in the sense that the algorithmic
slowing down is compensated by the fact that computation is distributed among workers; (v) the
amount of information that is communicated between the workers and the server respects the
privacy requirements of the worker, which can be quantified from a differential privacy viewpoint.
We will base our approach on the framework of Piecewise Deterministic Monte Carlo [BVD17,
BFR19], which we will introduce in Section 2. As discussed in Section 3 this framework can be
easily extended to allow for a federated (or distributed) approach while retaining the correct
stationary distribution. The computational efficiency of our method is discussed in Section 4.
We will also consider our approach from the viewpoint of differential privacy in Section 5. We
provide numerical experiments for several examples to establish proof of concept and investigate
efficiency properties in Section 6.
2 Piecewise Deterministic Monte Carlo
Here we concisely describe the framework of Piecewise Deterministic Monte Carlo (PDMC) in
some generality. Essentially, a PDMC sampler is based on a piecewise deterministic Markov
process. This is a continuous time Markov process which moves along continuous deterministic
trajectories, until at random times, a jump within the state space is made. In PDMC the state
space consists of a position process $X(t) \in \mathbb{R}^d$ and a velocity process $V(t)$ taking values in a set
$\mathcal{V} \subseteq \mathbb{R}^d$. The jumps (or events) will only affect the velocity. The process will be designed to
have a particular stationary probability distribution $\mu(dx, dv)$ with marginal position distribution
$\pi(dx) = \int_{v \in \mathcal{V}} \mu(dx, dv)$. Here $\pi$ may be considered to be a Bayesian posterior distribution of
interest.
2.1 Deterministic dynamics
In the general setting, the deterministic dynamics are described as the solution of an ordinary
differential equation
$$\frac{dx(t)}{dt} = v(t), \qquad \frac{dv(t)}{dt} = \psi(x(t)), \tag{1}$$
where $\psi : \mathbb{R}^d \to \mathbb{R}^d$ is a sufficiently regular function so that solutions to (1) are defined uniquely,
e.g., $\psi$ may be assumed to be globally Lipschitz. The deterministic dynamics are assumed
to preserve a 'reference' stationary measure $\mu_0(dx, dv) = \pi_0(dx) \otimes \nu(dv)$, where $\pi_0(dx) = \exp(-U_0(x))\,dx$ for a suitable function $U_0 : \mathbb{R}^d \to \mathbb{R}$ and $\nu$ is a probability measure on $\mathcal{V}$. This
means that for a solution $\varphi(t; x_0, v_0) := (x(t; x_0, v_0), v(t; x_0, v_0))$ to (1) with initial condition
$(x_0, v_0)$, we have for all integrable $f : \mathbb{R}^d \times \mathcal{V} \to \mathbb{R}$ that
$$\int_{x_0 \in \mathbb{R}^d,\, v_0 \in \mathcal{V}} f(\varphi(t; x_0, v_0))\, \mu_0(dx_0, dv_0) = \int_{x \in \mathbb{R}^d,\, v \in \mathcal{V}} f(x, v)\, \mu_0(dx, dv).$$
An interesting special case is when $\pi_0(dx)$ is chosen to be the prior distribution in a Bayesian
inference problem, but this is not necessary.
Example 2.1 (Zig-Zag Sampler). For the Zig-Zag Sampler (ZZS, [BFR19]), we take $\psi(x) = 0$,
$U_0(x) = 0$, $\mathcal{V} = \{-1, +1\}^d$, and the stationary velocity distribution is taken to be $\nu = \mathrm{Uniform}(\mathcal{V})$.
We see that the velocities assume only discrete values which do not change under the deterministic
dynamics.
Example 2.2 (Bouncy Particle Sampler and Boomerang Sampler). Let $\mathcal{V} = \mathbb{R}^d$ equipped with a
Gaussian measure $\nu = N(0, \Sigma)$, where $\Sigma$ is a positive definite matrix. The dynamics (1) preserve
$\mu_0$ by taking $\psi(x) = -\Sigma \nabla U_0(x)$. In particular, for the Bouncy Particle Sampler (BPS, [BVD17])
we take $U_0(x) = 0$ and thus $\psi(x) = 0$, and usually $\Sigma = I_d$ so that $\mu_0(dx, dv) = \mathrm{Leb}(\mathbb{R}^d)(dx) \otimes N(0, I_d)(dv)$.$^1$ For the Boomerang Sampler [BGKR20], we take $U_0(x) = \frac{1}{2} x^T \Sigma^{-1} x$, so $\psi(x) = -x$,
and have $\mu_0(dx, dv) = N(0, \Sigma)(dx) \otimes N(0, \Sigma)(dv)$. In contrast to the ZZS, the BPS has a
continuous space of possible velocities, but as for the ZZS, the velocities do not change under the
deterministic dynamics. For the Boomerang Sampler the deterministic dynamics correspond to
a (skewed) harmonic oscillator.
$^1$ $\mathrm{Leb}(\mathbb{R}^d)$ denotes Lebesgue measure on $\mathbb{R}^d$.
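To make the Boomerang dynamics concrete: with $\psi(x) = -x$ the flow of (1) is a rotation in the $(x, v)$ plane, $x(t) = x_0 \cos t + v_0 \sin t$, $v(t) = -x_0 \sin t + v_0 \cos t$, which conserves the quadratic $x^T A x + v^T A v$ for any symmetric $A$. A minimal sketch (an illustration, not from the paper):

```python
import numpy as np

def boomerang_flow(t, x0, v0):
    """Closed-form solution of dx/dt = v, dv/dt = -x, i.e. the Boomerang
    Sampler's deterministic dynamics in the case psi(x) = -x."""
    x = x0 * np.cos(t) + v0 * np.sin(t)
    v = -x0 * np.sin(t) + v0 * np.cos(t)
    return x, v

# The flow conserves |x|^2 + |v|^2, consistent with preservation of the
# reference measure N(0, I) (x) N(0, I) when Sigma = I.
x0, v0 = np.array([1.0, 0.5]), np.array([-0.3, 2.0])
x1, v1 = boomerang_flow(0.7, x0, v0)
```

For general $\Sigma$ the same flow applies; the conserved quadratic becomes $x^T \Sigma^{-1} x + v^T \Sigma^{-1} v$, matching the preservation of $N(0, \Sigma) \otimes N(0, \Sigma)$.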
2.2 Jumps
Next we specify the jumping mechanism which changes the velocity at random times. This is
governed by a jump intensity $\lambda(x, v)$ and a Markov jump kernel $Q(x, v, dv') \colon \mathbb{R}^d \times \mathcal{V} \times \mathcal{B}(\mathcal{V}) \to [0, 1]$. More generally we may have multiple types of jumps, with rates $(\lambda_i)_{i=1}^k$
and jump distributions $(Q_i)_{i=1}^k$ competing for which event occurs first. Suppose we start from
time 0 at position $(x_0, v_0)$ and recall that we have deterministic dynamics $t \mapsto \varphi(t; x_0, v_0)$. The
distributions of the inter-event times $\tau_i$ are given by
$$\mathbb{P}(\tau_i \geq t) = \exp\left(-\int_0^t \lambda_i(\varphi(s; x_0, v_0))\, ds\right).$$
The event that actually takes place is specified by setting $i_0 = \arg\min_i \tau_i$. At time $\tau_{i_0}$ we make
a transition according to the selected jump kernel $Q_{i_0}$, so that the distribution of the velocity
after the jump is given by $Q_{i_0}(\varphi(\tau_{i_0}; x_0, v_0), \cdot)$.
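When the rate along the trajectory is tractable, the inter-event time above can be sampled exactly by inversion. As a hypothetical illustration (not from the paper), consider a rate of the affine form $\lambda(t) = \max(a + bt, 0)$ with $b > 0$, which is the form that arises, e.g., for Zig-Zag with a Gaussian target: solving $\int_0^\tau \lambda(s)\,ds = -\log U$ for $U \sim \mathrm{Uniform}(0,1)$ gives a closed-form sampler.

```python
import math, random

def sample_affine_event_time(a, b, rng=random):
    """Sample tau with P(tau >= t) = exp(-int_0^t max(a + b s, 0) ds), b > 0,
    by inversion: solve Lambda(tau) = -log(U) for U ~ Uniform(0,1)."""
    u = rng.random()
    target = -math.log(u)  # an Exponential(1) variate
    if a >= 0.0:
        # Lambda(t) = a t + b t^2 / 2; solve the quadratic for tau
        return (-a + math.sqrt(a * a + 2.0 * b * target)) / b
    # Rate is zero until t* = -a/b; afterwards Lambda(t) = b (t - t*)^2 / 2
    t_star = -a / b
    return t_star + math.sqrt(2.0 * target / b)
```

The two branches reflect that the rate stays at zero until the linear part becomes positive; general rates without tractable inversion fall back on Poisson thinning, discussed below.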
Remark 2.3. It is always possible to write a combination of jump mechanisms $(\lambda_i, Q_i)_{i=1}^k$ as a
single jump mechanism $(\lambda, Q)$ by defining
$$\lambda(x, v) = \sum_{i=1}^k \lambda_i(x, v), \qquad Q(x, v, dv') = \sum_{i=1}^k \frac{\lambda_i(x, v)}{\lambda(x, v)} \mathbb{1}_{\{\lambda(x,v) > 0\}}\, Q_i(x, v, dv').$$
This provides a convenient notational simplification which we will use whenever this does not
cause confusion.
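In simulation the competing-clocks view and the combined-mechanism view of Remark 2.3 coincide: one can draw a candidate time from each mechanism and let the earliest fire. A schematic sketch (with hypothetical per-mechanism time samplers):

```python
import random

def first_event(time_samplers):
    """Given callables time_samplers[i]() that each draw tau_i with survival
    P(tau_i >= t) = exp(-int_0^t lambda_i ds), return (i0, tau_i0) where
    i0 = argmin_i tau_i, i.e. the mechanism that fires first."""
    times = [s() for s in time_samplers]
    i0 = min(range(len(times)), key=times.__getitem__)
    return i0, times[i0]

# For constant rates lambda_i, tau_i is Exponential(lambda_i); the minimum
# is then Exponential(sum_i lambda_i), matching the combined rate lambda.
samplers = [lambda r=r: random.expovariate(r) for r in (0.5, 1.0, 2.0)]
i0, tau = first_event(samplers)
```

The superposition property checked in the comment is exactly why the single-mechanism rewriting leaves the process unchanged.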
Using the notation of the previous remark, Algorithm 1 describes a general PDMC sampler.
In practice it may be challenging to simulate $\tau$ satisfying (2). We discuss the usual approach
of Poisson thinning in the Appendix.
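The idea of Poisson thinning is to simulate candidate events from a tractable bounding rate $\bar{\lambda}(t) \geq \lambda(t)$ and accept each candidate with probability $\lambda(t)/\bar{\lambda}(t)$. A sketch with a constant bound, assuming a known global bound `lam_bar` along the current trajectory (an illustration, not the paper's appendix code):

```python
import random

def thinned_event_time(rate, lam_bar, rng=random):
    """Sample tau with survival exp(-int_0^t rate(s) ds) by Poisson thinning,
    assuming rate(s) <= lam_bar for all s along the current trajectory."""
    t = 0.0
    while True:
        t += rng.expovariate(lam_bar)          # candidate from bounding rate
        if rng.random() * lam_bar <= rate(t):  # accept w.p. rate(t)/lam_bar
            return t
```

A loose bound keeps the sampler exact but wastes candidates, so in practice one seeks the tightest bound that is still cheap to simulate from.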
2.3 Stationary distribution
It is possible to formulate conditions on $\lambda_i$ and $Q_i$ in order to have a prespecified stationary
distribution. For a function $f : \mathbb{R}^d \times \mathcal{V} \to \mathbb{R}$ we write
$$Q f(x, v) = \int_{v' \in \mathcal{V}} Q(x, v, dv')\, f(x, v').$$
Suppose we wish to have the distribution $\exp(-U(x))\, \pi_0(dx)$ as (marginal) stationary distribution. In order to achieve this we impose the following conditions (understood to hold for all
bounded measurable $f : \mathbb{R}^d \times \mathcal{V} \to \mathbb{R}$):
(i) Invariance of $\nu$ under the jump kernels: for all $x \in \mathbb{R}^d$,
$$\int_{v \in \mathcal{V}} Q_i f(x, v)\, \nu(dv) = \int_{v \in \mathcal{V}} f(x, v)\, \nu(dv) \tag{3}$$
$^2$ $\mathcal{B}(\mathcal{V})$ denotes the $\sigma$-field of Borel subsets of $\mathcal{V}$.
Algorithm 1 Piecewise Deterministic Monte Carlo
Input: Initial condition $(x, v) \in \mathbb{R}^d \times \mathcal{V}$.
Output: The sequence of skeleton points $(T_k, X_k, V_k)_{k=0}^\infty$.
1: Set $(T_0, X_0, V_0) = (0, x, v)$.
2: for $k = 0, 1, 2, \ldots$ (until a stopping criterion is met) do
3:   Simulate $\tau$ such that
$$\mathbb{P}(\tau \geq t) = \exp\left(-\int_0^t \lambda(\varphi(s; X_k, V_k))\, ds\right) \tag{2}$$
4:   Set $T_{k+1} = T_k + \tau$ and $(X_{k+1}, \widetilde{V}_{k+1}) = \varphi(\tau; X_k, V_k)$.
5:   Simulate $V_{k+1} \sim Q(X_{k+1}, \widetilde{V}_{k+1}, \cdot)$.
6: end for
(ii) Effective sign reversal under jumps: for all $i = 1, \ldots, k$ and $x \in \mathbb{R}^d$,
$$\int_{v \in \mathcal{V}} \lambda_i(x, v)\, Q_i f(x, v)\, \nu(dv) = \int_{v \in \mathcal{V}} \lambda_i(x, -v)\, f(x, v)\, \nu(dv), \tag{4}$$
and
(iii) Event intensity condition: for all $(x, v) \in \mathbb{R}^d \times \mathcal{V}$,
$$\sum_{i=1}^k \left[\lambda_i(x, v) - \lambda_i(x, -v)\right] = \langle v, \nabla U(x) \rangle. \tag{5}$$
Under the stated conditions it follows that the process with deterministic dynamics (1), and
jumps according to $(\lambda_i, Q_i)_{i=1}^k$, has stationary distribution $\mu(dx, dv) \propto \exp(-U(x))\, \pi_0(dx) \otimes \nu(dv)$. The proof of this result depends on the notion of the Markov process generator and is
beyond the scope of this work; see e.g. [BFR19, BVD17, BGKR20].
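Algorithm 1 can be transcribed almost line by line into code. The sketch below is an illustration (not the paper's implementation): it takes the deterministic flow $\varphi$, an event-time sampler implementing (2), and the jump kernel $Q$ as callables, and returns the skeleton points.

```python
import random

def pdmc_skeleton(x, v, flow, event_time, jump, n_events, rng=random):
    """Generic PDMC sampler (Algorithm 1): alternate deterministic flow with
    velocity jumps at random event times; returns skeleton (T_k, X_k, V_k)."""
    T, X, V = 0.0, x, v
    skeleton = [(T, X, V)]
    for _ in range(n_events):
        tau = event_time(X, V, rng)    # tau with P(tau >= t) = exp(-int lambda)
        T = T + tau
        X, V_tilde = flow(tau, X, V)   # deterministic dynamics (1)
        V = jump(X, V_tilde, rng)      # V_{k+1} ~ Q(X_{k+1}, V_tilde, .)
        skeleton.append((T, X, V))
    return skeleton

# Toy 1d instantiation: straight-line flow, constant unit rate (so event
# times are Exponential(1)), and a deterministic velocity flip as the jump.
flow = lambda t, x, v: (x + t * v, v)
event_time = lambda x, v, rng: rng.expovariate(1.0)
jump = lambda x, v, rng: -v
skel = pdmc_skeleton(0.0, 1.0, flow, event_time, jump, n_events=5)
```

Note that between skeleton points the continuous-time trajectory is recovered by applying `flow`, so time averages along the path (not just at the skeleton points) estimate posterior expectations.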
Example 2.4 (Zig-Zag Sampler). For Zig-Zag, we consider, for $i = 1, \ldots, d$, $\lambda_i(x, v) = \max(v_i \partial_i U(x), 0) + \gamma_i(x)$ and $Q_i f(x, v) = f(x, F_i v)$, where $\gamma_i$ is a non-negative function (called the excess switching
rate or refreshment rate) and
$$(F_i v)_j = \begin{cases} v_j & j \neq i, \\ -v_j & j = i. \end{cases}$$
This corresponds to flipping the $i$th direction of the velocity at a rate which depends on the $i$th
partial derivative of $U$ as indicated.
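For these Zig-Zag rates with $\gamma_i = 0$, condition (5) can be checked directly: $\max(v_i \partial_i U, 0) - \max(-v_i \partial_i U, 0) = v_i \partial_i U$, and summing over $i$ gives $\langle v, \nabla U(x) \rangle$. A quick numerical sanity check, using a standard Gaussian target ($U(x) = |x|^2/2$, so $\partial_i U = x_i$; an illustrative choice, not fixed by the paper):

```python
import itertools
import numpy as np

def zigzag_rates(x, v, grad_U):
    """Zig-Zag switching rates lambda_i(x,v) = max(v_i dU/dx_i, 0), gamma_i = 0."""
    return np.maximum(v * grad_U(x), 0.0)

grad_U = lambda x: x  # standard Gaussian target: U(x) = |x|^2 / 2
x = np.array([0.3, -1.2, 2.0])
# Condition (5): sum_i [lambda_i(x, v) - lambda_i(x, -v)] = <v, grad U(x)>
for v_tuple in itertools.product([-1.0, 1.0], repeat=3):
    v = np.array(v_tuple)
    lhs = zigzag_rates(x, v, grad_U).sum() - zigzag_rates(x, -v, grad_U).sum()
    assert abs(lhs - v @ grad_U(x)) < 1e-12
```

The same check applies with any non-negative refreshment $\gamma_i$, since $\gamma_i(x)$ cancels in the difference $\lambda_i(x, v) - \lambda_i(x, -v)$.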
Example 2.5 (Bouncy Particle Sampler and Boomerang Sampler). For BPS and Boomerang,
recall that $\nu(dv) \propto N(0, \Sigma)$ for a positive definite matrix $\Sigma$. We take
$$\lambda(x, v) = \max(\langle v, \nabla U(x) \rangle, 0)$$