
Federated Continual Learning for Text Classification
via Selective Inter-client Transfer
Yatin Chaudhary1,2, Pranav Rai1,2, Matthias Schubert2, Hinrich Schütze2, Pankaj Gupta1
1DRIMCo GmbH, Munich, Germany | 2University of Munich (LMU), Munich, Germany
{firstname.lastname}@drimco.net
Abstract
In this work, we combine two paradigms, Federated Learning (FL) and Continual Learning (CL), for the text classification task in the cloud-edge continuum. The objective of Federated Continual Learning (FCL) is to improve deep learning models over their lifetime at each client through (relevant and efficient) knowledge transfer without sharing data. Here, we address the challenge of minimizing inter-client interference during knowledge sharing that arises from heterogeneous tasks across clients in the FCL setup. In doing so, we propose a novel framework, Federated Selective Inter-client Transfer (FedSeIT), which selectively combines model parameters of foreign clients. To further maximize knowledge transfer, we assess domain overlap and select informative tasks from the sequence of historical tasks at each foreign client while preserving privacy. Evaluating against the baselines, we show improved performance, a gain of 12.4% (average) in text classification over a sequence of tasks using five datasets from diverse domains. To the best of our knowledge, this is the first work to apply FCL to NLP.
1 Introduction
Federated Learning (Yurochkin et al., 2019; Li et al., 2020; Zhang et al., 2020; Karimireddy et al., 2020; Caldas et al., 2018) in Edge Computing1 (Wang et al., 2019) has gained traction in recent years due to (a) data privacy and sovereignty, especially as imposed by government regulations (GDPR, CCPA, etc.), and (b) the need to share knowledge across edge (client) devices such as mobile phones, automobiles, and wearable gadgets while maintaining data localization. Federated Learning (FL) is a privacy-preserving machine learning (ML) technique that enables collaborative training of ML models by sharing model parameters across distributed clients through a central server, without sharing their data. In doing so, the central server aggregates model parameters from each participating client and then distributes the aggregated parameters, with which the ML model at each client is optimized, thereby achieving inter-client transfer learning. In this direction, recent works such as FedAvg (McMahan et al., 2017), FedProx (Li et al., 2020), and FedCurv (Shoham et al., 2019) have introduced parameter aggregation techniques and shown improved learning at local clients, augmented by the parameters of foreign clients.

1Edge computing extends cloud computing services closer to data sources.
On the other hand, edge devices generate a continuous stream of data whose distribution can drift over time; hence the need for Continual Learning, as humans do. Continual Learning (CL) (Thrun, 1995; Kumar and Daume III, 2012; Kirkpatrick et al., 2017; Schwarz et al., 2018; Gupta et al., 2020) empowers deep learning models to continually accumulate knowledge from a sequence of tasks, reusing historical knowledge while minimizing catastrophic forgetting (drift in the learning of historical tasks) over their lifetime.
Federated Continual Learning (FCL): This work investigates the combination of the two ML paradigms, Federated Learning and Continual Learning, with the objective of modeling a sequence of tasks over time at each client via inter-client transfer learning while preserving privacy and addressing the heterogeneity of tasks across clients. There are two key challenges in FCL: (1) catastrophic forgetting, and (2) inter-client interference due to the heterogeneity of tasks (domains) at clients. At the central server, FedAvg (McMahan et al., 2017) aggregates (i.e., averages) model parameters from each client without considering inter-client interference. To address this, the FedWeIT (Yoon et al., 2021) approach performs FCL by sharing task-generic (via dense base parameters) and task-specific (via task-adaptive parameters) knowledge across clients. In doing so, the server aggregates the dense base parameters but does not aggregate the task-adaptive parameters, and then broadcasts both types of parameters; see further details in Figure 2 and Section 2.2. FedWeIT, the first approach to FCL, investigates computer vision tasks (e.g., image classification); however, the technique has limitations in aligning the domains of foreign clients during augmented learning at each local client using task-adaptive parameters, which are often misaligned with the local model parameters in parameter space (McMahan et al., 2017) due to heterogeneity in tasks. Therefore, a simple weighted additive composition technique neither addresses inter-client interference nor determines domain relevance of foreign clients while performing transfer learning.
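For concreteness, the weighted additive composition that FedWeIT performs at each client, as described above, can be sketched as follows. This is a minimal NumPy illustration, not FedWeIT's implementation: the parameter shapes, the function name, and the fixed attention weights are our assumptions.

```python
# Minimal sketch (not FedWeIT's code) of weighted additive filter composition:
# all foreign task-adaptive parameters are summed into ONE composite filter bank.
import numpy as np

def fedweit_style_compose(B, m, A_local, A_foreign, alpha):
    """B: dense base parameters; m: sparse mask (same shape as B);
    A_local: local task-adaptive parameters; A_foreign: list of foreign
    task-adaptive parameters; alpha: one scalar attention weight per foreign set."""
    theta = B * m + A_local                      # masked task-generic + task-specific part
    for A_j, a_j in zip(A_foreign, alpha):
        theta = theta + a_j * A_j                # weighted additive transfer from foreign tasks
    return theta                                 # single composite parameter set

# Toy usage with convolution-filter-shaped parameters (assumed shapes).
rng = np.random.default_rng(0)
shape = (32, 3, 100)                             # 32 filters, width 3, embedding dim 100
B, m = rng.normal(size=shape), (rng.random(shape) > 0.5).astype(float)
A_local = 0.01 * rng.normal(size=shape)
A_foreign = [0.01 * rng.normal(size=shape) for _ in range(2)]
theta = fedweit_style_compose(B, m, A_local, A_foreign, alpha=[0.3, 0.7])
print(theta.shape)                               # (32, 3, 100)
```

Because unrelated foreign filters are folded into the same composite bank, they can pull the local parameters in erroneous directions; FedSeIT instead keeps foreign filters segregated (Section 2.2).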
Contributions: To the best of our knowledge, this is the first work that applies FCL to an NLP task (text classification). At each local client, to maximize inter-client transfer learning and minimize inter-client interference, we propose a novel approach, Federated Selective Inter-client Transfer (FedSeIT), that aligns the domains of the foreign task-adaptive parameters via projection during augmented transfer learning. To exploit the effectiveness of domain relevance in handling a large number of foreign clients, we further extend FedSeIT with a novel task selection strategy, Selective Inter-client Transfer (SIT), that efficiently selects the relevant task-adaptive parameters from the historical tasks of (many) foreign clients by assessing domain overlap at the global server using encoded data representations while preserving privacy (a schematic sketch of this selection step follows the contribution list below). We evaluate our proposed approaches, FedSeIT and SIT, for the text classification task in the FCL setup using five NLP datasets from diverse domains and show that they outperform existing methods. Our main contributions are as follows:

(1) We introduce the Federated Continual Learning paradigm to the NLP task of text classification, collaboratively learning deep learning models at distributed clients through a global server while maintaining data localization, and continually learning over a sequence of tasks over their lifetime, minimizing catastrophic forgetting, minimizing inter-client interference and maximizing inter-client knowledge transfer.

(2) We present novel techniques, FedSeIT and SIT, that align domains and select relevant task-adaptive parameters of the foreign clients during augmented transfer learning at each client via a global server. Evaluating against the baselines, we demonstrate improved performance, a gain of 12.4% (average) in text classification over a sequence of tasks using five datasets. The implementation of FedSeIT is available at https://github.com/RaiPranav/FCL-FedSeIT (see Appendix C and D).
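As a forward illustration of the SIT selection step described above, the following sketch shows one way the server could rank foreign (client, task) pairs by domain overlap using only fixed-size encoded task representations, so that no raw documents leave a client. The mean-pooled document embeddings and cosine scoring here are our illustrative assumptions; the paper's exact encoding and selection procedure are specified later.

```python
# Hypothetical sketch of Selective Inter-client Transfer (SIT) task selection.
# Clients share only an encoded summary of each task's data with the server;
# the encoder (mean pooling) and similarity (cosine) are illustrative choices.
import numpy as np

def encode_task(doc_embeddings):
    """Summarize one task dataset as a single vector (assumed: mean embedding)."""
    return doc_embeddings.mean(axis=0)

def select_top_k(current_repr, foreign_reprs, k):
    """Server side: rank foreign (client, task) entries by domain overlap
    with the current task representation and keep the k most relevant."""
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    ranked = sorted(foreign_reprs,
                    key=lambda key: cosine(current_repr, foreign_reprs[key]),
                    reverse=True)
    return ranked[:k]

# Toy usage: three foreign tasks, the third shares the current task's domain.
rng = np.random.default_rng(1)
current = encode_task(rng.normal(size=(50, 100)))            # 50 docs, 100-dim embeddings
foreign = {("client_1", 0): rng.normal(size=100),
           ("client_2", 0): rng.normal(size=100),
           ("client_2", 1): current + 0.05 * rng.normal(size=100)}
print(select_top_k(current, foreign, k=2))                   # ("client_2", 1) ranks first
```

Only the K selected foreign task-adaptive parameter sets are then transferred, which is intended to keep transfer relevant (and communication bounded) as the number of foreign clients and historical tasks grows.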
Notation        Description
c, s            current client, global server
t, r            current task, current round
C, T, R         total number of clients, tasks, rounds
T^t_c           training dataset for task t of client c
θ^t_c           model parameter set for task t of client c
θ_G             global aggregated server parameter
x^t_i, y^t_i    input document, label pair in T^t_c
z, ŷ            CNN output dense vector; predicted label
B^t_c, A^t_c    local base and task-adaptive parameters
α^t_c, m^t_c    scalar attention parameters, mask parameters
⊕               concatenation operation
W^t_c, W^t_f    projection matrices: alignment and augmentation
K               number of parameters selected for transfer in SIT
D, D            word embedding dimension, dataset
L_d, L^t_c      unique labels in dataset d, each task dataset T^t_c
F               filter size of convolution layer
N_F             number of filters in convolution layer
λ1, λ2          hyperparameters: sparsity, catastrophic forgetting

Table 1: Description of the notations used in this work, where matrices and vectors are denoted by uppercase and lowercase bold characters, respectively.
2 Methodology
2.1 Federated Continual Learning
Consider a global server s and C distributed clients, such that each client c_c ∈ {c_1, ..., c_C} learns a local ML model on its privately accessible sequence of tasks {1, ..., t, ..., T} with datasets T_c = {T^1_c, ..., T^t_c, ..., T^T_c}, where T^t_c = {x^t_i, y^t_i}_{i=1}^{N_t} is a labeled dataset for the t-th task consisting of N_t pairs of documents x^t_i and their corresponding labels y^t_i. Please note that there is no relation among the datasets {T^t_c}_{c=1}^{C} for task t across all clients. Now, in each training round r ∈ {1, ..., R} for task t, the training within the FCL setup can be broken down into three steps:
Continual learning at client: Each client c_c effectively optimizes its model parameters θ^{t(r)}_c (for task t) using task dataset T^t_c in a continual learning setting such that: (a) it minimizes catastrophic forgetting of past tasks, and (b) it boosts learning on the current task using the knowledge accumulated from the past tasks.
Parameter aggregation at server: After training on task t, each client c_c transmits its updated model parameters θ^{t(r)}_c to the server s, and the server aggregates them into the global parameter θ_G to accumulate the knowledge across all clients.
Figure 1: (a) Illustration of the proposed FedSeIT framework, where the task-adaptive parameters of foreign clients are segregated and domain-aligned for selective utilization. How to read: note the coloring scheme in the convolution filters of the local and foreign clients and their application in convolution. (b) Weighted additive filter composition performed in the baseline model FedWeIT. Note the composite θ^t_c vs. the segregated convolution filters of FedSeIT.
Inter-client knowledge transfer: The server transmits the global aggregated parameter θ_G to all participating clients for inter-client knowledge transfer in the next training round r + 1.
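To make the round structure concrete, the following is a minimal, runnable schematic of these three steps for a single task; the mean aggregation at the server and the toy local update are placeholders standing in for the actual continual-learning objective and aggregation rule, not this paper's method.

```python
# Schematic of R communication rounds for one task: (1) continual learning at each
# client, (2) parameter aggregation at the server, (3) broadcast of theta_G so that
# clients start round r+1 from the aggregated parameter. Both the toy local update
# and the simple mean aggregation are placeholders.
import numpy as np

def local_update(theta, task_data, lr=0.1):
    """Stand-in for one client-side optimization pass on the current task."""
    return theta + lr * (task_data.mean(axis=0) - theta)    # toy objective: fit task mean

def train_one_task(clients_data, rounds, dim):
    theta_G = np.zeros(dim)                                 # global aggregated parameter
    thetas = {c: theta_G.copy() for c in clients_data}      # per-client parameters
    for _ in range(rounds):
        for c, data in clients_data.items():                # step 1: continual learning at client
            thetas[c] = local_update(thetas[c], data)
        theta_G = np.mean(list(thetas.values()), axis=0)    # step 2: aggregation at server
        for c in thetas:                                    # step 3: inter-client knowledge transfer
            thetas[c] = theta_G.copy()
    return theta_G

# Toy usage: three clients with differently distributed task data.
rng = np.random.default_rng(2)
clients = {f"client_{i}": rng.normal(loc=i, size=(20, 8)) for i in range(3)}
print(train_one_task(clients, rounds=5, dim=8).round(2))
```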
Challenges: However, there are two main sources of inter-client interference within the FCL setup: (1) the use of a single global parameter θ_G during parameter aggregation at the server to capture cross-client knowledge (Yoon et al., 2021), since it mixes in model parameters trained on irrelevant foreign client tasks, and (2) the non-alignment of foreign client model parameters given the heterogeneous task domains across clients. This hinders local model training at each client by updating its parameters in erroneous directions, resulting in: (a) catastrophic forgetting of the client's historical tasks, and (b) sub-optimal learning of the client's current task. For brevity, we omit the notation of round r from the following equations and mathematical formulations, except in the algorithms.
2.2 Federated Selective Inter-client Transfer
To tackle the above-mentioned challenges, we propose the Federated Selective Inter-client Transfer (FedSeIT) framework, which aims to minimize inter-client interference and communication cost while maximizing inter-client knowledge transfer in the FCL paradigm. Motivated by Yoon et al. (2021), the FedSeIT model decomposes each client's model parameters θ^t_c into a set of three different parameters: (1) dense local base parameters B^t_c, which capture and accumulate the task-generic knowledge across the client's private task sequence T_c, (2) sparse task-adaptive parameters A^t_c, which capture the task-specific knowledge for each task in T_c, and (3) sparse mask parameters m^t_c, which allow the client model to selectively utilize the global knowledge. For each client c_c, B^t_c is randomly initialized only once before training on the first task and is shared throughout the task sequence T_c, while new A^t_c and m^t_c parameters are initialized for each task t. At the global server, we have the global parameter θ_G, which accumulates task-generic knowledge across all clients, i.e., global knowledge, by aggregating the local base parameters sent from all clients. Finally, for each client c_c and task t, the model parameters θ^t_c can be described as:
B^t_c ← θ_G
θ^t_c = B^t_c ⊙ m^t_c + A^t_c    (1)
where each client initializes B^t_c using θ_G received from the server, containing global knowledge, before training on task t, to enable inter-client knowledge transfer. Therefore, the first term signifies selective utilization of global knowledge using the mask parameter m^t_c, which restricts the impact of inter-client interference during server aggregation. Due to the additive decomposition of parameters, the second term A^t_c captures task-specific knowledge. Another key benefit of parameter decomposition is that, by accessing the task-adaptive parameters A^t_c of past tasks from foreign clients, a client can selectively utilize task-specific knowledge of the relevant tasks, thus further minimizing inter-client interference and maximizing knowledge transfer.
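The decomposition in Eq. (1), together with the segregated use of selected foreign task-adaptive filters sketched in Figure 1(a), can be illustrated as follows. This is a simplified sketch: the 1-D convolution with max-over-time pooling follows the figure, but the identity stand-in for the projection-based domain alignment (the W matrices in Table 1) and all shapes are our assumptions, not the paper's exact formulation.

```python
# Sketch of FedSeIT-style composition: local filters follow Eq. (1), while foreign
# task-adaptive filters are kept segregated (applied as separate filter banks whose
# pooled features are concatenated) rather than summed into one composite filter.
import numpy as np

def conv_max_pool(doc, filters):
    """Apply a filter bank (n_filters, F, D) over a document (L, D) with valid
    1-D convolution and max-over-time pooling -> (n_filters,) feature vector."""
    L, D = doc.shape
    n_filters, F, _ = filters.shape
    windows = np.stack([doc[i:i + F].ravel() for i in range(L - F + 1)])   # (L-F+1, F*D)
    feats = windows @ filters.reshape(n_filters, -1).T                     # (L-F+1, n_filters)
    return feats.max(axis=0)

def fedseit_features(doc, B, m, A_local, A_foreign_selected, align=lambda A: A):
    theta_local = B * m + A_local                    # Eq. (1): local decomposition
    z = [conv_max_pool(doc, theta_local)]            # features from local filters
    for A_f in A_foreign_selected:                   # SIT-selected foreign task-adaptive filters
        z.append(conv_max_pool(doc, align(A_f)))     # segregated, (to-be) domain-aligned use
    return np.concatenate(z)                         # concatenated features -> softmax classifier

# Toy usage: one document, local filters plus two selected foreign filter banks.
rng = np.random.default_rng(3)
doc = rng.normal(size=(40, 100))                     # 40 tokens, 100-dim embeddings
B = rng.normal(size=(32, 3, 100)); m = (rng.random(B.shape) > 0.5).astype(float)
A_local = 0.01 * rng.normal(size=B.shape)
A_foreign = [0.01 * rng.normal(size=B.shape) for _ in range(2)]
print(fedseit_features(doc, B, m, A_local, A_foreign).shape)   # (96,) = 3 banks x 32 filters
```

Keeping each foreign filter bank segregated lets the downstream softmax layer weight features from each (aligned) foreign domain independently, instead of letting misaligned foreign filters perturb the composite local filters as in Figure 1(b).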