
Federated Continual Learning for Text Classification
via Selective Inter-client Transfer
Yatin Chaudhary1,2, Pranav Rai1,2, Matthias Schubert2, Hinrich Schütze2, Pankaj Gupta1
1DRIMCo GmbH, Munich, Germany | 2University of Munich (LMU), Munich, Germany
{firstname.lastname}@drimco.net
Abstract
In this work, we combine two paradigms, Federated Learning (FL) and Continual Learning (CL), for the text classification task in the cloud-edge continuum. The objective of Federated Continual Learning (FCL) is to improve deep learning models over their lifetime at each client through relevant and efficient knowledge transfer without sharing data. Here, we address the challenge of minimizing inter-client interference during knowledge sharing, which arises from heterogeneous tasks across clients in the FCL setup. To this end, we propose a novel framework, Federated Selective Inter-client Transfer (FedSeIT), which selectively combines model parameters of foreign clients. To further maximize knowledge transfer, we assess domain overlap and select informative tasks from the sequence of historical tasks at each foreign client while preserving privacy. Evaluating against the baselines, we show improved performance, an average gain of 12.4% in text classification over a sequence of tasks, using five datasets from diverse domains. To the best of our knowledge, this is the first work that applies FCL to NLP.
1 Introduction
Federated Learning (Yurochkin et al., 2019; Li et al., 2020; Zhang et al., 2020; Karimireddy et al., 2020; Caldas et al., 2018) in Edge Computing (Wang et al., 2019), which extends cloud computing services closer to data sources, has gained traction in recent years due to (a) data privacy and sovereignty, especially as imposed by government regulations (GDPR, CCPA, etc.), and (b) the need for sharing knowledge across edge (client) devices such as mobile phones, automobiles, and wearable gadgets while maintaining data localization. Federated Learning (FL) is a privacy-preserving machine learning (ML) technique that enables collaborative training of ML models by sharing model parameters across distributed clients through a central server, without sharing their data. In doing so, a central server aggregates model parameters from each participating client and then distributes the aggregated parameters, with which the ML models at each client are optimized, thus achieving inter-client transfer learning. In this direction, recent works such as FedAvg (McMahan et al., 2017), FedProx (Li et al., 2020) and FedCurv (Shoham et al., 2019) have introduced parameter aggregation techniques and shown improved learning at local clients, augmented by the parameters of foreign clients.
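To make the server-side aggregation step concrete, the following is a minimal sketch of FedAvg-style weighted parameter averaging. It is illustrative only, not the aggregation scheme proposed in this paper; the function name fedavg_aggregate and the plain-NumPy parameter dictionaries are our own assumptions for the sake of the example.

import numpy as np

def fedavg_aggregate(client_params, client_sizes):
    """FedAvg-style aggregation: average each parameter tensor across
    clients, weighting every client by its number of local examples."""
    total = float(sum(client_sizes))
    weights = [n / total for n in client_sizes]
    aggregated = {}
    for name in client_params[0]:
        aggregated[name] = sum(w * p[name] for w, p in zip(weights, client_params))
    return aggregated

# Toy usage: two clients, each holding one linear layer.
clients = [{"W": np.ones((2, 2)), "b": np.zeros(2)},
           {"W": 3 * np.ones((2, 2)), "b": np.ones(2)}]
global_params = fedavg_aggregate(clients, client_sizes=[100, 300])
# W averages to 2.5, b to 0.75: weighted toward the larger client.

The server would then broadcast the aggregated parameters back to the clients, which continue local training from them.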
On the other hand, edge devices generate a continuous stream of data whose distribution can drift over time; hence the need for continual learning, much as humans learn. Continual Learning (CL) (Thrun, 1995; Kumar and Daume III, 2012; Kirkpatrick et al., 2017; Schwarz et al., 2018; Gupta et al., 2020) empowers deep learning models to continually accumulate knowledge from a sequence of tasks, reusing historical knowledge while minimizing catastrophic forgetting (drift in the learning of historical tasks) over their lifetime.
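As an illustrative aside (not the method proposed in this paper), one common way to limit catastrophic forgetting is to penalize drift from parameters that were important for earlier tasks, in the spirit of EWC (Kirkpatrick et al., 2017). The sketch below assumes a PyTorch model together with hypothetical fisher and old_params dictionaries computed after finishing the previous task.

import torch

def ewc_penalty(model, fisher, old_params, lam=1.0):
    # Quadratic penalty discouraging parameters that were important for the
    # previous task (per the Fisher estimates) from drifting away from the
    # values they had after that task; added to the current task's loss.
    penalty = torch.zeros(())
    for name, param in model.named_parameters():
        if name in fisher:
            penalty = penalty + (fisher[name] * (param - old_params[name]) ** 2).sum()
    return lam * penalty

# During training on the current task:
#   loss = task_loss + ewc_penalty(model, fisher, old_params, lam=0.1)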
Federated Continual Learning (FCL): This work investigates the combination of the two ML paradigms, Federated Learning and Continual Learning, with the objective of modeling a sequence of tasks over time at each client via inter-client transfer learning, while preserving privacy and addressing the heterogeneity of tasks across clients. FCL poses two key challenges: (1) catastrophic forgetting, and (2) inter-client interference due to the heterogeneity of tasks (domains) at clients. At the central server, FedAvg (McMahan et al., 2017) averages model parameters from each client without considering inter-client interference. To address this, the FedWeIT (Yoon et al., 2021) approach performs FCL by sharing task-generic (via dense base parameters) and task-specific (via task-adaptive parameters) knowledge across clients. In doing so, the server aggregates the dense base parameters; however, no aggregation of the