client alternately performs two steps in each round: 1) optimizing the personalized
offset that transforms the local data via distribution shifts, and 2) optimizing a local
model to fit the offset local data. After client-side optimization, the FL server
aggregates both the personalized offsets and the local models from all the clients
and sends the aggregated global model and offset back to each client. During
testing, each client adds its personalized offset to each testing input before using
the global model to predict its label.
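To make the per-round procedure concrete, the following is a minimal sketch of the client-side alternation and the test-time prediction, assuming a PyTorch-style setup. The function names, optimizers, and hyperparameters are illustrative and not taken from the paper; the model is assumed to accept both an input and an offset, anticipating the double-input-channel architecture described next.

```python
import torch

def client_round(model, offset, local_loader, lr_model=0.01, lr_offset=0.01):
    """Sketch of one client round: first optimize the personalized offset
    (model weights effectively frozen, since only the offset is in the
    optimizer), then optimize the local model on the offset local data."""
    loss_fn = torch.nn.CrossEntropyLoss()

    # Step 1: optimize the personalized offset; `offset` is assumed to be a
    # tensor with requires_grad=True.
    offset_opt = torch.optim.SGD([offset], lr=lr_offset)
    for x, y in local_loader:
        offset_opt.zero_grad()
        loss = loss_fn(model(x, offset), y)
        loss.backward()
        offset_opt.step()

    # Step 2: optimize the local model on the offset local data (offset fixed).
    model_opt = torch.optim.SGD(model.parameters(), lr=lr_model)
    for x, y in local_loader:
        model_opt.zero_grad()
        loss = loss_fn(model(x, offset.detach()), y)
        loss.backward()
        model_opt.step()

    # Both the local model and the personalized offset are sent to the server
    # for aggregation.
    return model.state_dict(), offset.detach()

@torch.no_grad()
def client_predict(model, offset, x):
    """Test time: the client applies its personalized offset to each testing
    input before using the aggregated global model to predict its label."""
    return model(x, offset).argmax(dim=1)
```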
DisTrans is designed with a special network architecture, called a double-
input-channel model, to accommodate client-side offsets. This double-input-
channel model has a backbone network shared by both channels, a dense layer
accepting outputs from the two channels in parallel, and a logits layer that merges
the channel-related outputs from the dense layer. This double architecture allows the
offset to be added to a (training or testing) input in one channel but subtracted
from the input in the other. Such addition and subtraction better preserve the
information in the original training and testing data, because the original data
can be recovered from the offset data in the two channels.
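The sketch below illustrates how such a double-input-channel model could be wired up, under the same PyTorch-style assumptions as above. The layer sizes and the use of concatenation to merge the two channel outputs are assumptions for illustration, not details confirmed by the text.

```python
import torch
import torch.nn as nn

class DoubleInputChannelModel(nn.Module):
    """Illustrative double-input-channel model: a backbone shared by both
    channels, a dense layer applied to each channel's output in parallel,
    and a logits layer that merges the channel-related outputs."""

    def __init__(self, backbone, feat_dim, hidden_dim, num_classes):
        super().__init__()
        self.backbone = backbone                     # shared by both channels
        self.dense = nn.Linear(feat_dim, hidden_dim)
        self.logits = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, x, offset):
        # One channel sees the input plus the offset, the other the input
        # minus the offset, so the original input is recoverable from the pair.
        f_plus = self.dense(self.backbone(x + offset))
        f_minus = self.dense(self.backbone(x - offset))
        return self.logits(torch.cat([f_plus, f_minus], dim=1))
```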
We perform an extensive evaluation of DisTrans on five different image
datasets and compare it against state-of-the-art (SOTA) methods. Our evaluation
shows that DisTrans outperforms SOTA FL methods by 1%–10% in testing
accuracy across various distributional settings of the clients' local data. Moreover,
DisTrans achieves 1%–7% higher testing accuracy than other data transformation
/ augmentation approaches, i.e., mixup [51] and AdvProp [46]. The code for
DisTrans is available at https://github.com/hyhmia/DisTrans.
2 Related Work
Existing federated learning (FL) studies focus on improving accuracy [35,50,43,39],
convergence [12,6,37,17,45,32], communication cost [24,41,22,3,23,13,34,49],
security and privacy [36,10,5,4], or other aspects [16,20,11,47]. Our work focuses
on FL accuracy.
Personalized Federated Learning.
Prior studies [43,50,39] have attempted to address personalization, i.e., making
a model better fit a client's local training data. For instance, FedAwS [50]
investigates FL problems where each local model only has access to the positive
data associated with a single class, and imposes a geometric regularizer at the
server after each round to encourage classes to spread out in the embedding
space. pFedMe [43] formulates a new bi-level optimization problem and uses
Moreau envelopes to regularize each client's loss function and to decouple
personalized model optimization from global model learning. pFedHN [39]
utilizes a hypernetwork as the global model to generate weights for each local
model. MOON [29] uses contrastive learning to maximize the agreement between
the local and global models.
Data Transformation.
Data transformation applies label-preserving transformations to images and is
a standard technique to improve model accuracy in centralized learning. Most
of the recent data transformation meth-