
Federated Continual Learning for Text Classification
via Selective Inter-client Transfer
Yatin Chaudhary1,2, Pranav Rai1,2, Matthias Schubert2, Hinrich Schütze2, Pankaj Gupta1
1DRIMCo GmbH, Munich, Germany | 2University of Munich (LMU), Munich, Germany
{firstname.lastname}@drimco.net
Abstract
In this work, we combine two paradigms, Federated Learning (FL) and Continual Learning (CL), for the text classification task in the cloud-edge continuum. The objective of Federated Continual Learning (FCL) is to improve deep learning models over their lifetime at each client through (relevant and efficient) knowledge transfer without sharing data. Here, we address the challenge of minimizing inter-client interference during knowledge sharing that arises from heterogeneous tasks across clients in the FCL setup. In doing so, we propose a novel framework, Federated Selective Inter-client Transfer (FedSeIT), which selectively combines model parameters of foreign clients. To further maximize knowledge transfer, we assess domain overlap and select informative tasks from the sequence of historical tasks at each foreign client while preserving privacy. Evaluating against the baselines, we show improved performance, a gain of 12.4% (average) in text classification over a sequence of tasks using five datasets from diverse domains. To the best of our knowledge, this is the first work to apply FCL to NLP.
1 Introduction
Federated Learning (Yurochkin et al., 2019; Li et al., 2020; Zhang et al., 2020; Karimireddy et al., 2020; Caldas et al., 2018) in Edge Computing1 (Wang et al., 2019) has gained traction in recent years due to (a) data privacy and sovereignty, especially as imposed by government regulations (GDPR, CCPA, etc.), and (b) the need to share knowledge across edge (client) devices such as mobile phones, automobiles, and wearable gadgets while maintaining data localization. Federated Learning (FL) is a privacy-preserving machine learning (ML) technique that enables collaborative training of ML models by sharing model parameters across distributed clients through a central server, without sharing their data. In doing so, the central server aggregates model parameters from each participating client and then distributes the aggregated parameters, with which the ML model at each client is optimized, thereby achieving inter-client transfer learning. In this direction, recent works such as FedAvg (McMahan et al., 2017), FedProx (Li et al., 2020), and FedCurv (Shoham et al., 2019) have introduced parameter aggregation techniques and shown improved learning at local clients, augmented by the parameters of foreign clients.

1Edge computing extends cloud computing services closer to data sources.
On the other hand, edge devices generate a continuous stream of data whose distribution can drift over time; hence the need for Continual Learning, as humans do. Continual Learning (CL) (Thrun, 1995; Kumar and Daume III, 2012; Kirkpatrick et al., 2017; Schwarz et al., 2018; Gupta et al., 2020) empowers deep learning models to continually accumulate knowledge from a sequence of tasks, reusing historical knowledge while minimizing catastrophic forgetting (drift in the learning of historical tasks) over their lifetime.
Federated Continual Learning (FCL): This work investigates the combination of the two ML paradigms, Federated Learning and Continual Learning, with the objective of modeling a sequence of tasks over time at each client via inter-client transfer learning while preserving privacy and addressing the heterogeneity of tasks across clients. There are two key challenges in FCL: (1) catastrophic forgetting, and (2) inter-client interference due to the heterogeneity of tasks (domains) at clients. At the central server, FedAvg (McMahan et al., 2017) aggregates (i.e., averages) model parameters from each client without considering inter-client interference. To address this, the FedWeIT (Yoon et al., 2021) approach performs FCL by sharing task-generic (via dense base parameters) and task-specific (via task-adaptive parameters) knowledge across clients. In doing so, the server aggregates the dense base parameters but does not aggregate the task-adaptive parameters, and then broadcasts both types of parameters; see further details in Figure 2 and Section 2.2. FedWeIT, the first approach to FCL, investigates computer vision tasks (e.g., image classification); however, the technique has limitations in aligning the domains of foreign clients during augmented learning at each local client using task-adaptive parameters, which are often misaligned with the local model parameters in parameter space (McMahan et al., 2017) due to heterogeneity in tasks. Therefore, a simple weighted additive composition technique neither addresses inter-client interference nor determines domain relevance of foreign clients while performing transfer learning.
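For concreteness, the weighted additive composition that FedWeIT performs at each client, as described above, can be sketched as follows. This is a minimal NumPy illustration, not FedWeIT's implementation: the parameter shapes, the function name, and the fixed attention weights are our assumptions.

```python
# Minimal sketch (not FedWeIT's code) of weighted additive filter composition:
# all foreign task-adaptive parameters are summed into ONE composite filter bank.
import numpy as np

def fedweit_style_compose(B, m, A_local, A_foreign, alpha):
    """B: dense base parameters; m: sparse mask (same shape as B);
    A_local: local task-adaptive parameters; A_foreign: list of foreign
    task-adaptive parameters; alpha: one scalar attention weight per foreign set."""
    theta = B * m + A_local                      # masked task-generic + task-specific part
    for A_j, a_j in zip(A_foreign, alpha):
        theta = theta + a_j * A_j                # weighted additive transfer from foreign tasks
    return theta                                 # single composite parameter set

# Toy usage with convolution-filter-shaped parameters (assumed shapes).
rng = np.random.default_rng(0)
shape = (32, 3, 100)                             # 32 filters, width 3, embedding dim 100
B, m = rng.normal(size=shape), (rng.random(shape) > 0.5).astype(float)
A_local = 0.01 * rng.normal(size=shape)
A_foreign = [0.01 * rng.normal(size=shape) for _ in range(2)]
theta = fedweit_style_compose(B, m, A_local, A_foreign, alpha=[0.3, 0.7])
print(theta.shape)                               # (32, 3, 100)
```

Because unrelated foreign filters are folded into the same composite bank, they can pull the local parameters in erroneous directions; FedSeIT instead keeps foreign filters segregated (Section 2.2).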
Contributions: To the best of our knowledge, this is the first work that applies FCL to an NLP task (text classification). At each local client, to maximize inter-client transfer learning and minimize inter-client interference, we propose a novel approach, Federated Selective Inter-client Transfer (FedSeIT), that aligns the domains of the foreign task-adaptive parameters via projection during augmented transfer learning. To exploit the effectiveness of domain relevance in handling a large number of foreign clients, we further extend FedSeIT with a novel task selection strategy, Selective Inter-client Transfer (SIT), that efficiently selects the relevant task-adaptive parameters from the historical tasks of (many) foreign clients by assessing domain overlap at the global server using encoded data representations while preserving privacy (a schematic sketch of this selection step follows the contribution list below). We evaluate our proposed approaches, FedSeIT and SIT, for the text classification task in the FCL setup using five NLP datasets from diverse domains and show that they outperform existing methods. Our main contributions are as follows:

(1) We introduce the Federated Continual Learning paradigm to the NLP task of text classification, collaboratively learning deep learning models at distributed clients through a global server while maintaining data localization, and continually learning over a sequence of tasks over their lifetime, minimizing catastrophic forgetting, minimizing inter-client interference and maximizing inter-client knowledge transfer.

(2) We present novel techniques, FedSeIT and SIT, that align domains and select relevant task-adaptive parameters of the foreign clients during augmented transfer learning at each client via a global server. Evaluating against the baselines, we demonstrate improved performance, a gain of 12.4% (average) in text classification over a sequence of tasks using five datasets. The implementation of FedSeIT is available at https://github.com/RaiPranav/FCL-FedSeIT (see Appendix C and D).
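As a forward illustration of the SIT selection step described above, the following sketch shows one way the server could rank foreign (client, task) pairs by domain overlap using only fixed-size encoded task representations, so that no raw documents leave a client. The mean-pooled document embeddings and cosine scoring here are our illustrative assumptions; the paper's exact encoding and selection procedure are specified later.

```python
# Hypothetical sketch of Selective Inter-client Transfer (SIT) task selection.
# Clients share only an encoded summary of each task's data with the server;
# the encoder (mean pooling) and similarity (cosine) are illustrative choices.
import numpy as np

def encode_task(doc_embeddings):
    """Summarize one task dataset as a single vector (assumed: mean embedding)."""
    return doc_embeddings.mean(axis=0)

def select_top_k(current_repr, foreign_reprs, k):
    """Server side: rank foreign (client, task) entries by domain overlap
    with the current task representation and keep the k most relevant."""
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    ranked = sorted(foreign_reprs,
                    key=lambda key: cosine(current_repr, foreign_reprs[key]),
                    reverse=True)
    return ranked[:k]

# Toy usage: three foreign tasks, the third shares the current task's domain.
rng = np.random.default_rng(1)
current = encode_task(rng.normal(size=(50, 100)))            # 50 docs, 100-dim embeddings
foreign = {("client_1", 0): rng.normal(size=100),
           ("client_2", 0): rng.normal(size=100),
           ("client_2", 1): current + 0.05 * rng.normal(size=100)}
print(select_top_k(current, foreign, k=2))                   # ("client_2", 1) ranks first
```

Only the K selected foreign task-adaptive parameter sets are then transferred, which is intended to keep transfer relevant (and communication bounded) as the number of foreign clients and historical tasks grows.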
Notation        Description
c, s            current client, global server
t, r            current task, current round
C, T, R         total number of clients, tasks, rounds
T^t_c           training dataset for task t of client c
θ^t_c           model parameter set for task t of client c
θ_G             global aggregated server parameter
x^t_i, y^t_i    input document, label pair in T^t_c
z, ŷ            CNN output dense vector; predicted label
B^t_c, A^t_c    local base and task-adaptive parameters
α^t_c, m^t_c    scalar attention parameters, mask parameters
⊕               concatenation operation
W^t_c, W^t_f    projection matrices: alignment and augmentation
K               number of parameters selected for transfer in SIT
D, D            word embedding dimension, dataset
L_d, L^t_c      unique labels in dataset d, each task dataset T^t_c
F               filter size of convolution layer
N_F             number of filters in convolution layer
λ1, λ2          hyperparameters: sparsity, catastrophic forgetting

Table 1: Description of the notations used in this work, where matrices and vectors are denoted by uppercase and lowercase bold characters, respectively.
2 Methodology
2.1 Federated Continual Learning
Consider a global server s and C distributed clients, such that each client c_c ∈ {c_1, ..., c_C} learns a local ML model on its privately accessible sequence of tasks {1, ..., t, ..., T} with datasets T_c = {T^1_c, ..., T^t_c, ..., T^T_c}, where T^t_c = {x^t_i, y^t_i}_{i=1}^{N_t} is a labeled dataset for the t-th task consisting of N_t pairs of documents x^t_i and their corresponding labels y^t_i. Please note that there is no relation among the datasets {T^t_c}_{c=1}^{C} for task t across all clients. Now, in each training round r ∈ {1, ..., R} for task t, the training within the FCL setup can be broken down into three steps:
Continual learning at client: Each client c_c effectively optimizes its model parameters θ^{t(r)}_c (for task t) using task dataset T^t_c in a continual learning setting such that: (a) it minimizes catastrophic forgetting of past tasks, and (b) it boosts learning on the current task using the knowledge accumulated from the past tasks.
Parameter aggregation at server: After training on task t, each client c_c transmits its updated model parameters θ^{t(r)}_c to the server s, and the server aggregates them into the global parameter θ_G to accumulate the knowledge across all clients.
Figure 1: (a) Illustration of the proposed FedSeIT framework, where the task-adaptive parameters of foreign clients are segregated and domain-aligned for selective utilization. How to read: note the coloring scheme in the convolution filters of the local and foreign clients and their application in convolution. (b) Weighted additive filter composition performed in the baseline model FedWeIT. Note the composite θ^t_c vs. the segregated convolution filters of FedSeIT.
Inter-client knowledge transfer: The server transmits the global aggregated parameter θ_G to all participating clients for inter-client knowledge transfer in the next training round r + 1.
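To make the round structure concrete, the following is a minimal, runnable schematic of these three steps for a single task; the mean aggregation at the server and the toy local update are placeholders standing in for the actual continual-learning objective and aggregation rule, not this paper's method.

```python
# Schematic of R communication rounds for one task: (1) continual learning at each
# client, (2) parameter aggregation at the server, (3) broadcast of theta_G so that
# clients start round r+1 from the aggregated parameter. Both the toy local update
# and the simple mean aggregation are placeholders.
import numpy as np

def local_update(theta, task_data, lr=0.1):
    """Stand-in for one client-side optimization pass on the current task."""
    return theta + lr * (task_data.mean(axis=0) - theta)    # toy objective: fit task mean

def train_one_task(clients_data, rounds, dim):
    theta_G = np.zeros(dim)                                 # global aggregated parameter
    thetas = {c: theta_G.copy() for c in clients_data}      # per-client parameters
    for _ in range(rounds):
        for c, data in clients_data.items():                # step 1: continual learning at client
            thetas[c] = local_update(thetas[c], data)
        theta_G = np.mean(list(thetas.values()), axis=0)    # step 2: aggregation at server
        for c in thetas:                                    # step 3: inter-client knowledge transfer
            thetas[c] = theta_G.copy()
    return theta_G

# Toy usage: three clients with differently distributed task data.
rng = np.random.default_rng(2)
clients = {f"client_{i}": rng.normal(loc=i, size=(20, 8)) for i in range(3)}
print(train_one_task(clients, rounds=5, dim=8).round(2))
```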
Challenges: However, there are two main sources of inter-client interference within the FCL setup: (1) the use of a single global parameter θ_G during parameter aggregation at the server to capture cross-client knowledge (Yoon et al., 2021), since it mixes in model parameters trained on irrelevant foreign client tasks, and (2) the non-alignment of foreign client model parameters given the heterogeneous task domains across clients. This hinders local model training at each client by updating its parameters in erroneous directions, resulting in: (a) catastrophic forgetting of the client's historical tasks, and (b) sub-optimal learning of the client's current task. For brevity, we omit the notation of round r from the following equations and mathematical formulations, except in the algorithms.
2.2 Federated Selective Inter-client Transfer
To tackle the above-mentioned challenges, we propose the Federated Selective Inter-client Transfer (FedSeIT) framework, which aims to minimize inter-client interference and communication cost while maximizing inter-client knowledge transfer in the FCL paradigm. Motivated by Yoon et al. (2021), the FedSeIT model decomposes each client's model parameters θ^t_c into a set of three different parameters: (1) dense local base parameters B^t_c, which capture and accumulate the task-generic knowledge across the client's private task sequence T_c, (2) sparse task-adaptive parameters A^t_c, which capture the task-specific knowledge for each task in T_c, and (3) sparse mask parameters m^t_c, which allow the client model to selectively utilize the global knowledge. For each client c_c, B^t_c is randomly initialized only once before training on the first task and is shared throughout the task sequence T_c, while new A^t_c and m^t_c parameters are initialized for each task t. At the global server, we have the global parameter θ_G, which accumulates task-generic knowledge across all clients, i.e., global knowledge, by aggregating the local base parameters sent from all clients. Finally, for each client c_c and task t, the model parameters θ^t_c can be described as:
B^t_c ← θ_G
θ^t_c = B^t_c ⊙ m^t_c + A^t_c    (1)
where each client initializes B^t_c using θ_G received from the server, containing global knowledge, before training on task t, to enable inter-client knowledge transfer. Therefore, the first term signifies selective utilization of global knowledge using the mask parameter m^t_c, which restricts the impact of inter-client interference during server aggregation. Due to the additive decomposition of parameters, the second term A^t_c captures task-specific knowledge. Another key benefit of parameter decomposition is that, by accessing the task-adaptive parameters A^t_c of past tasks from foreign clients, a client can selectively utilize task-specific knowledge of the relevant tasks, thus further minimizing inter-client interference and maximizing knowledge transfer.
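The decomposition in Eq. (1), together with the segregated use of selected foreign task-adaptive filters sketched in Figure 1(a), can be illustrated as follows. This is a simplified sketch: the 1-D convolution with max-over-time pooling follows the figure, but the identity stand-in for the projection-based domain alignment (the W matrices in Table 1) and all shapes are our assumptions, not the paper's exact formulation.

```python
# Sketch of FedSeIT-style composition: local filters follow Eq. (1), while foreign
# task-adaptive filters are kept segregated (applied as separate filter banks whose
# pooled features are concatenated) rather than summed into one composite filter.
import numpy as np

def conv_max_pool(doc, filters):
    """Apply a filter bank (n_filters, F, D) over a document (L, D) with valid
    1-D convolution and max-over-time pooling -> (n_filters,) feature vector."""
    L, D = doc.shape
    n_filters, F, _ = filters.shape
    windows = np.stack([doc[i:i + F].ravel() for i in range(L - F + 1)])   # (L-F+1, F*D)
    feats = windows @ filters.reshape(n_filters, -1).T                     # (L-F+1, n_filters)
    return feats.max(axis=0)

def fedseit_features(doc, B, m, A_local, A_foreign_selected, align=lambda A: A):
    theta_local = B * m + A_local                    # Eq. (1): local decomposition
    z = [conv_max_pool(doc, theta_local)]            # features from local filters
    for A_f in A_foreign_selected:                   # SIT-selected foreign task-adaptive filters
        z.append(conv_max_pool(doc, align(A_f)))     # segregated, (to-be) domain-aligned use
    return np.concatenate(z)                         # concatenated features -> softmax classifier

# Toy usage: one document, local filters plus two selected foreign filter banks.
rng = np.random.default_rng(3)
doc = rng.normal(size=(40, 100))                     # 40 tokens, 100-dim embeddings
B = rng.normal(size=(32, 3, 100)); m = (rng.random(B.shape) > 0.5).astype(float)
A_local = 0.01 * rng.normal(size=B.shape)
A_foreign = [0.01 * rng.normal(size=B.shape) for _ in range(2)]
print(fedseit_features(doc, B, m, A_local, A_foreign).shape)   # (96,) = 3 banks x 32 filters
```

Keeping each foreign filter bank segregated lets the downstream softmax layer weight features from each (aligned) foreign domain independently, instead of letting misaligned foreign filters perturb the composite local filters as in Figure 1(b).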