methods to achieve a further performance boost. We also study
the effectiveness of several SOTA DG methods when they
are applied in the FL setting for image recognition. (d) We
give an intuitive analysis (Section 4.4) and an experimental
analysis (Section A) of the privacy-preserving performance of our
style vectors, demonstrating that one can hardly reconstruct
the original images merely from the style vectors, even with the
generator of a SOTA GAN [32], in the FL setting.
2. Related Work
Domain generalization. Domain generalization is a
popular research field that aims to learn a model from
multiple source domains such that the model can generalize
to the unseen target domain. Many works have been proposed
to tackle domain shift from various directions under the
centralized data setting. These methods can be divided into
three categories [42]: manipulating data to enrich data
diversity [18,44,37,47], learning domain-invariant representations
or disentangling domain-shared and domain-specific features to
enhance the generalization ability of the model [1,36,4,46], and
exploiting general learning strategies to promote generalization
capability [26,17,7,8].
However, many of these methods require centralized
data from different domains, violating the local data preservation
principle of federated learning. Specifically, access to more
than one domain is needed to augment data or generate new
data in [37,18]; domain-invariant representation learning or
feature decomposition is performed by comparing across
domains [1,36,46]; and some learning-strategy-based
methods require an extra domain for meta-update [26,7,8].
Nevertheless, some methods do not explicitly require centralized
domains or can be adapted to federated learning
with minor changes. For example, MixStyle [47] can optionally
conduct style randomization within a single domain
to augment data (see the sketch below); [44] uses Fourier-based
augmentation that does not require sharing data; JiGen [4] proposes
a self-supervised task to enhance representation capability;
and RSC [17] designs a learning strategy based on gradient
operations without an explicit multi-domain requirement.
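To illustrate why such single-domain style randomization is compatible with FL, below is a minimal PyTorch-style sketch of MixStyle-like mixing of channel-wise feature statistics within one client's mini-batch. The function name, tensor layout (B, C, H, W), and hyperparameters are our own illustrative assumptions rather than the exact implementation of [47].

```python
import torch

def mixstyle(x, alpha=0.1, eps=1e-6):
    """Randomize feature styles by mixing channel-wise statistics within a batch.

    x: feature map of shape (B, C, H, W) from a single client / domain.
    Illustrative sketch of the MixStyle idea, not the authors' code.
    """
    b = x.size(0)
    mu = x.mean(dim=[2, 3], keepdim=True)                 # per-sample channel means
    sig = (x.var(dim=[2, 3], keepdim=True) + eps).sqrt()  # per-sample channel stds
    x_norm = (x - mu) / sig                               # remove the instance's own style
    lam = torch.distributions.Beta(alpha, alpha).sample((b, 1, 1, 1)).to(x.device)
    perm = torch.randperm(b, device=x.device)             # shuffle styles inside the batch
    mu_mix = lam * mu + (1 - lam) * mu[perm]
    sig_mix = lam * sig + (1 - lam) * sig[perm]
    return sig_mix * x_norm + mu_mix                      # re-apply the mixed style
```

Since every tensor here comes from one client's own mini-batch, no cross-client data exchange is involved.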
Federated / decentralized domain generalization. Despite
the many works on centralized domain generalization and on
tackling non-IID issues in FL, few works address
the DG problem in FL. FedDG [33] exchanges amplitude
information across images from different clients and
utilizes episodic learning to further improve performance
(see the sketch below). However, it only focuses on a segmentation
task with superficial domain shift in the data, and its performance on
image recognition with larger domain shift remains unexplored.
COPA [43] proposes aggregating only the weights
of the domain-invariant feature extractor and maintaining an
ensemble of domain-specific classifier heads to tackle
decentralized DG. However, since COPA has to share the classifier
heads of all clients locally and globally, it may
lead to privacy issues, heavier communication, and higher
test-time inference cost.
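For intuition about the amplitude exchange in FedDG, below is a minimal sketch that recombines a local image's phase spectrum with another client's amplitude spectrum. For simplicity it swaps the full spectrum, whereas FedDG interpolates amplitudes only within a low-frequency band; the function name and tensor shapes are illustrative assumptions.

```python
import torch

def swap_amplitude(x_local, x_foreign):
    """Combine the local image's phase with a foreign amplitude spectrum.

    x_local, x_foreign: image tensors of shape (C, H, W).
    Simplified full-spectrum swap; FedDG interpolates only low frequencies.
    """
    fft_local = torch.fft.fft2(x_local)
    fft_foreign = torch.fft.fft2(x_foreign)
    amp_foreign = fft_foreign.abs()     # appearance / style information
    phase_local = fft_local.angle()     # semantic structure is largely in the phase
    mixed = amp_foreign * torch.exp(1j * phase_local)
    return torch.fft.ifft2(mixed).real
```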
Neural style transfer. Neural style transfer (NST) aims
to transfer the style of one image to another content image
while preserving the latter's semantic structure. The development
of NST has roughly gone through three stages: per-style-
per-model (PSPM), multiple-style-per-model (MSPM), and
arbitrary-style-per-model (ASPM) methods [20]. PSPM
methods [11,21,38] can only transfer a single style per
trained model. MSPM methods [9,5,45,31] are able to
transfer multiple styles with a single trained model. However,
PSPM and MSPM are expensive to deploy when many
styles need to be transferred, as in our setting.
ASPM methods [6,16,12,30] can transfer arbitrary styles to any
content image and are often faster than PSPM and MSPM,
which makes them more suitable for our scenario.
The first ASPM method was proposed by Chen and
Schmidt [6], but it cannot run in real time. AdaIN [16]
is the first real-time arbitrary style transfer method; it
utilizes the channel-wise mean and variance as style information.
It performs de-stylization by normalizing the content
VGG features with their own statistics and then stylizes them
through an affine transformation with the mean and variance of
the style image features. Another real-time ASPM method
[12] is a follow-up work of CIN [10]. It turns the
MSPM method CIN into an ASPM method by predicting
the affine transformation parameters for each style image
with an additional style prediction network. However, the
degree of style-content disentanglement in the predicted style
vector remains unknown, which may raise privacy issues in the
FL setting. Later, Li et al. [30] proposed a universal, style-
learning-free ASPM method, which utilizes a ZCA whitening
transform for de-stylization and a coloring transform for style
transfer. However, this method is much slower than previous
methods in practice. Therefore, we choose AdaIN, the neatest
and most efficient real-time ASPM method, as the style
transfer model in our framework.
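To make the adopted operation concrete, below is a minimal sketch of the AdaIN transformation on feature maps; variable names and shapes are our own illustrative choices, and the full pipeline in [16] additionally involves a VGG encoder and a learned decoder that are omitted here.

```python
import torch

def adain(content_feat, style_feat, eps=1e-5):
    """Adaptive instance normalization on feature maps of shape (B, C, H, W)."""
    c_mu = content_feat.mean(dim=[2, 3], keepdim=True)
    c_sig = (content_feat.var(dim=[2, 3], keepdim=True) + eps).sqrt()
    s_mu = style_feat.mean(dim=[2, 3], keepdim=True)
    s_sig = (style_feat.var(dim=[2, 3], keepdim=True) + eps).sqrt()
    normalized = (content_feat - c_mu) / c_sig   # de-stylization: remove the content's own style
    return s_sig * normalized + s_mu             # re-stylization with the style statistics
```

Because a style is described here only by channel-wise means and variances, such statistics are a natural candidate for the compact style vectors discussed above.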
3. Method
The core idea of our method is to make the data distributions
of the distributed clients as similar as possible by introducing
the styles of other clients into each of them via cross-client
style transfer, without any dataset leakage. Figure 3 shows
the data distribution before and after our CCST method. In
this way, the trained local models learn to fit all the source
client styles, and we avoid aggregating local models that are
biased toward different styles. As a result, each client
can be regarded as a deep-all [4] setting, and the local models
will share the same goal of fitting the styles of all the source
clients. We propose two types of styles that can be chosen
for transfer: one is the overall domain style, and the other is the
single image style. In the following sections, we will introduce