The validity of these approaches typically hinges on strong assumptions. Firstly, it is assumed
that the data across workers are independent and identically distributed, though a number of more
recent federated learning methods seek to weaken this requirement, for instance through knowledge
distillation [ZHZ21] or Bayesian non-parametric modelling [YAG+19]. Secondly, it is assumed
that workers use the same local model, though recent work on model personalization has
suggested strategies to address this [MMRS20].
In this work, we address the problem of federated learning in a Bayesian setting, i.e. we seek
to generate samples from a global posterior distribution obtained as a multiplicative composition
of local posteriors distributed across the workers, without sharing models or data. This is an
inherently more challenging problem because far more information about the local posterior
distributions must somehow be communicated to the other workers while preserving the privacy
of data and models.
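To fix ideas, the multiplicative composition referred to above can be written in the standard form (notation introduced here for illustration, with $y_j$ denoting the data held by worker $j$ and $\pi_0$ a shared prior):

```latex
% Global posterior as a multiplicative composition of local contributions
% across m workers:
\pi(\theta \mid y_{1:m}) \;\propto\; \pi_0(\theta) \prod_{j=1}^{m} p(y_j \mid \theta).
```

Each factor $p(y_j \mid \theta)$ depends only on data local to worker $j$, which is why sampling from the global posterior requires substantially more information about each local contribution than, say, averaging point estimates.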
Previous works have sought to lift methodology from federated learning to the Bayesian
setting, employing Stochastic Gradient Langevin Dynamics (SGLD)-based generalisations of
federated learning counterparts, e.g. [ZLZ+19, EMMBK21, VPD+22]. Similarly, the Langevin-type
algorithm proposed in [SSR22] combines distributed MCMC with compression techniques to reduce
the burden of communicating large gradients.
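For context, these methods build on the standard SGLD update, sketched here in our own notation (a generic template, not the precise update of any one of the cited algorithms): at step $k$, with step size $\epsilon_k$,

```latex
% Generic SGLD update: the local log-likelihood gradient is estimated on a
% minibatch of worker j's data, and Gaussian noise is injected at each step.
\theta_{k+1} \;=\; \theta_k
  \;+\; \frac{\epsilon_k}{2}\,\nabla_\theta \log \pi_0(\theta_k)
  \;+\; \frac{\epsilon_k}{2}\,\widehat{\nabla_\theta \log p(y_j \mid \theta_k)}
  \;+\; \eta_k, \qquad \eta_k \sim \mathcal{N}(0, \epsilon_k I),
```

and in the federated variants it is these (possibly compressed) stochastic gradients that must be communicated, motivating the compression schemes mentioned above.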
Related approaches employ approximations of the local posterior contributions to communicate
information to the central server. These include distributed variational inference
[ZBKM18, HWL+17], Gaussian approximations [ASGXR20] and ensemble approaches [LAD+21].
In [BGR22], local predictive posterior contributions are distilled into a neural network
which is communicated to the central server.
Some works reformulate the federated learning problem through the lens of Bayesian model
averaging, where the local model contributions are combined into an accurate global
approximation as a model ensemble [CC20, TG20], building on Bayesian uncertainty
quantification methods used in deep learning such as [MIG+19]. Related to this are approaches
which adopt a Bayesian hierarchical modelling view of federated learning, introducing
hierarchical priors as well as fixed and random effects to share global information across the
different federated workers [KVMD22].
All of the approaches discussed above either employ local posterior approximations to enable
effective communication, or are contingent on strict assumptions on the structure of the
model, or both. To our knowledge, no existing approach can perform full, (asymptotically)
exact Bayesian inference in this context for a general Bayesian model. In this paper we provide a
federated (or distributed) approach to Markov Chain Monte Carlo with the following properties:
(i) the correct posterior distribution is retained; (ii) the observational data may be distributed
among workers with no requirement to exchange information other than the algorithmic output;
(iii) the observational data amongst different workers do not have to be identically distributed,
nor do the local prior distributions have to be the same; (iv) the efficiency of the federated
approach compares favourably to that of the non-federated approach, in the sense that the
algorithmic slowdown is compensated for by the distribution of computation among workers;
(v) the amount of information communicated between the workers and the server respects the
privacy requirements of the workers, which can be quantified from a differential privacy viewpoint.
We base our approach on the framework of Piecewise Deterministic Monte Carlo [BVD17,
BFR19], which we introduce in Section 2. As discussed in Section 3, this framework can be
easily extended to allow for a federated (or distributed) approach while retaining the correct
stationary distribution. The computational efficiency of our method is discussed in Section 4.
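The structural property that makes this extension natural can be sketched as follows (notation ours, anticipating the formal treatment in Section 2): when the negative log-posterior splits additively across workers as $U(\theta) = \sum_j U_j(\theta)$, the event rate of the piecewise deterministic process decomposes accordingly,

```latex
% Additive decomposition of the PDMC event rate over m workers; by the
% superposition property of Poisson processes, each worker can simulate
% candidate event times from its own local rate using only its local data.
\lambda(\theta, v) \;=\; \sum_{j=1}^{m} \lambda_j(\theta, v),
```

so that event simulation can be distributed among the workers without altering the stationary distribution of the process.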
We also consider our approach from the viewpoint of differential privacy in Section 5. In
Section 6 we provide numerical experiments for several examples to establish proof of concept
and to investigate efficiency properties.