A Finite Blocklength Approach for Wireless
Hierarchical Federated Learning in the Presence of
Physical Layer Security
Haonan Zhang§, Chuanchuan Yang‡§, Bin Dai§
School of Information Science and Technology, Southwest Jiaotong University, Chengdu, 610031, China.
‡Department of Electronics, Peking University, Beijing, 100871, China.
§Peng Cheng Laboratory, Shenzhen, 518055, China.
zhanghaonan@my.swjtu.edu.cn, yangchuanchuan@pku.edu.cn, daibin@home.swjtu.edu.cn.
Abstract—Wireless hierarchical federated learning (HFL) in the presence of the physical layer security (PLS) issue is revisited. Although a framework for this problem was established in previous work, a practical secure finite blocklength (FBL) coding scheme remains unknown. In this paper, we extend an existing FBL coding scheme for the white Gaussian channel with noisy feedback to wireless HFL over a quasi-static fading duplex channel, and derive an achievable rate and an upper bound on the eavesdropper's uncertainty for the extended scheme. The results of this paper are further illustrated via simulation results.
Index Terms—Finite blocklength coding, physical layer security,
privacy-utility trade-off, wireless federated learning
I. INTRODUCTION
Wireless federated learning (FL) has been extensively studied in the literature [1]-[4]. Recently, with the development of edge computing, client-edge-cloud hierarchical federated learning (HFL) systems have received much attention [5]-[6]. However, due to the broadcast nature of wireless communications, wireless FL is susceptible to eavesdropping. In this paper, we study wireless HFL in the presence of eavesdropping, see Figure 1. In Figure 1, the users, the edge servers and the cloud server cooperate to jointly train a learning model, while the malicious cloud server may infer the presence of an individual data sample from the learnt model via various attacks. Differential privacy (DP) has proved to be an effective way to protect individual data against such attacks; hence, before the users' gradients are aggregated at the edge servers, Gaussian noise serving as a local differential privacy (LDP) mechanism [7] is added to each user's gradient. Moreover, each edge server communicates with the cloud server via a duplex fading channel, and due to the broadcast nature of wireless communication, this channel is overheard by an external eavesdropper. The main objective of the system in Figure 1 is to minimize the information leakage to the malicious cloud server subject to a certain amount of utility of the perturbed gradients (i.e., the gradients corrupted by the Gaussian noise), and to protect the data transmitted over the wireless channels from eavesdropping. In [8], the fundamental limit of the utility-privacy-physical layer security (PLS) trade-off was established. However, note that the secrecy capacity in [8] cannot be approached by a finite blocklength (FBL) coding scheme since it is proved via a random binning coding scheme [9]. It is then natural to ask: can we design a constructive FBL coding scheme for the edge server that confuses the eavesdropper as much as possible?
In this paper, we first extend an existing FBL coding scheme for the white Gaussian channel with noisy feedback [10] to the model of Figure 1, and then derive an achievable rate and an upper bound on the eavesdropper's uncertainty for the extended scheme. Finally, we illustrate the relationship between utility, privacy, PLS and other parameters via simulation examples.
Figure 1: The wireless HFL in the presence of eavesdroppers
II. PRELIMINARY, MODEL FORMULATION AND MAIN
RESULTS
A. Preliminary: learning protocol
In Figure 1, there are a cloud server, $L$ edge servers indexed by $\ell$, and $K$ users indexed by $k$ and $\ell$. $\{\mathcal{C}_\ell\}_{\ell=1}^{L}$ represents the disjoint user sets and $|\mathcal{C}_\ell|$ is the number of users in edge domain $\ell$. $\{\mathcal{S}_{\ell,k}\}_{k=1}^{|\mathcal{C}_\ell|}$ represents the distributed datasets and $S_{\ell,k}=|\mathcal{S}_{\ell,k}|$ is the cardinality of $\mathcal{S}_{\ell,k}$, where $\mathcal{S}_{\ell,k}=\{(\mathbf{u}_{k,j},v_{k,j})\}_{j=1}^{|\mathcal{S}_{\ell,k}|}$, $\mathbf{u}_{k,j}\in\mathbb{R}^{q}$ is the $j$-th vector of covariates with $q$ features and $v_{k,j}\in\mathbb{R}$ is the corresponding label at user $k$. Denote the aggregated dataset in edge domain $\ell$ by $\mathcal{S}_\ell$; each edge server aggregates gradients from its users. The global loss function $F(\mathbf{m})$ is given by
$$F(\mathbf{m})=\frac{1}{S}\sum_{\ell=1}^{L}\sum_{k=1}^{|\mathcal{C}_\ell|}S_{\ell,k}\,F_{\ell,k}(\mathbf{m}),\qquad(2.1)$$
where $\mathbf{m}\in\mathbb{R}^{q}$ is the model vector and $S=\sum_{\ell}\sum_{k}S_{\ell,k}$. $F_{\ell,k}(\cdot)$ is the local loss function for user $k$, where
$$F_{\ell,k}(\mathbf{m})=\frac{1}{S_{\ell,k}}\sum_{(\mathbf{u}_{k,j},v_{k,j})\in\mathcal{S}_{\ell,k}}f(\mathbf{m};\mathbf{u}_{k,j},v_{k,j})+\lambda R(\mathbf{m}),\qquad(2.2)$$
and $f(\mathbf{m};\mathbf{u}_{k,j},v_{k,j})$ is the sample-wise loss function. $R(\mathbf{m})$ is a strongly convex regularization function and $\lambda\ge 0$. The model is trained by minimizing the global loss function as
$$\mathbf{m}^{\star}=\arg\min_{\mathbf{m}}F(\mathbf{m}).\qquad(2.3)$$
To minimize $F(\mathbf{m})$, we use a distributed gradient descent iterative algorithm. Specifically, in the $t$-th ($t\in\{1,2,...,T\}$) communication round (the total number of communication rounds is $T$), each user $k$ computes its own local gradient $\nabla F_{\ell,k}(\mathbf{m}_t)$ and the users send the corrupted local gradients (perturbed by Gaussian noise for LDP) to the edge servers. Then, edge server $\ell$ computes its estimation $\widehat{\nabla F_\ell}(\mathbf{m}_t)$ of the partial gradient $\nabla F_\ell(\mathbf{m}_t)=\frac{1}{S_\ell}\sum_{k\in\mathcal{C}_\ell}S_{\ell,k}\nabla F_{\ell,k}(\mathbf{m}_t)$, where $S_\ell=|\mathcal{S}_\ell|$ is the total number of samples in $\mathcal{S}_\ell$. The cloud server's estimation $\widehat{\nabla F}(\mathbf{m}_t)$ of the global gradient $\nabla F(\mathbf{m}_t)=\frac{1}{S}\sum_{\ell=1}^{L}S_\ell\nabla F_\ell(\mathbf{m}_t)$ is then formed. The global model $\mathbf{m}_{t+1}$ updated by the cloud server is given by
$$\mathbf{m}_{t+1}=\mathbf{m}_t-\mu\,\widehat{\nabla F}(\mathbf{m}_t),\qquad(2.4)$$
where $\mu$ is the learning rate. For convenience, in the $t$-th communication round, we denote $\mathbf{W}_{t,k}=S_{\ell,k}\nabla F_{\ell,k}(\mathbf{m}_t)$.
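To make the protocol concrete, the following is a minimal NumPy sketch of one communication round. The squared-error sample loss, the function names, and the plain-summation aggregation are illustrative assumptions for this sketch, not the coding scheme analyzed in this paper.

```python
import numpy as np

def local_gradient(m, U, v, lam):
    # Hypothetical sample-wise loss f(m; u, v) = 0.5*(u^T m - v)^2 with
    # regularizer R(m) = 0.5*||m||^2, averaged over the local dataset (U, v).
    residual = U @ m - v                       # shape (S_lk,)
    return U.T @ residual / len(v) + lam * m   # grad of F_{l,k}(m)

def hfl_round(m_t, edge_domains, lam, sigma, mu):
    """One client-edge-cloud round with Gaussian LDP noise, following (2.1)-(2.5)."""
    q = m_t.shape[0]
    S_total = 0
    global_sum = np.zeros(q)
    for users in edge_domains:                     # one entry per edge server l
        edge_sum = np.zeros(q)
        for (U, v) in users:                       # user k holds local dataset (U, v)
            S_lk = len(v)
            W_tk = S_lk * local_gradient(m_t, U, v, lam)    # W_{t,k} = S_{l,k} * grad F_{l,k}(m_t)
            W_tk_noisy = W_tk + sigma * np.random.randn(q)  # eq. (2.5): add eta_{t,k} ~ N(0, sigma^2 I)
            edge_sum += W_tk_noisy
            S_total += S_lk
        global_sum += edge_sum                     # cloud aggregates the edge estimates
    grad_hat = global_sum / S_total                # estimate of (1/S) sum_l sum_k S_{l,k} grad F_{l,k}
    return m_t - mu * grad_hat                     # eq. (2.4)

# Example usage with toy data:
# edge_domains = [[(np.random.randn(20, 5), np.random.randn(20))],
#                 [(np.random.randn(30, 5), np.random.randn(30))]]
# m_next = hfl_round(np.zeros(5), edge_domains, lam=1e-3, sigma=0.1, mu=0.05)
```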
B. Model formulation
In this paper, we assume that each edge server communicates with the cloud server without interference from the other edge servers. Besides, we assume that the downlink communication is perfect, which is similar to [2], and that the eavesdropper is only interested in the data transmitted in the uplink communication between the edge servers and the cloud server. Hence we only focus on the $T$ rounds of uplink communication between one of the edge servers and the cloud server. An information-theoretic approach to Figure 1 is illustrated in Figure 2. For simplification, we make the following assumptions:
• Similar to [2]-[3], we assume that the channel coefficients stay constant during the transmission (quasi-static fading channel).
• Similar to [3]-[4], we assume that the cloud server and the edge server have perfect channel state information (CSI) of the feedforward channel and the feedback channel.
• Following similar arguments in [11], we assume that the eavesdropper is an active user that is not trusted by the cloud server, which indicates that the perfect CSI of the eavesdropper's channel is known to both the eavesdropper and the edge server. Moreover, we assume that the eavesdropper also knows the perfect CSI of the edge server-cloud server channels.
Figure 2: An information-theoretic approach to Figure 1: (a) encoding; (b) decoding.
Information source: In Figure 2(a), we assume that $\mathbf{W}_{t,k}\in\mathbb{R}^{q}$ is the $k$-th ($k\in\{1,2,...,K\}$) user's overall local gradient vector in the $t$-th ($t\in\{1,2,...,T\}$) communication round, where $\mathbf{W}_{t,k}=(W_{t,k,1},...,W_{t,k,q})^{T}$. Similar to [12], the elements of $\mathbf{W}_{t,k}$ are independent and identically distributed (i.i.d.) and $\mathbf{W}_{t,k}\sim\mathcal{N}(\mathbf{0},S_{\ell,k}\sigma_{w,t}^{2}\mathbf{I})$. Let $\boldsymbol{\eta}_{t,k}=(\eta_{t,k,1},...,\eta_{t,k,q})^{T}$ be the local artificial Gaussian noise, i.i.d. according to the distribution $\mathcal{N}(\mathbf{0},\sigma^{2}\mathbf{I})$. The corrupted local gradient $\mathbf{W}'_{t,k}=(W'_{t,k,1},...,W'_{t,k,q})^{T}$ that is aggregated by the edge server is given by
$$\mathbf{W}'_{t,k}=\mathbf{W}_{t,k}+\boldsymbol{\eta}_{t,k},\qquad(2.5)$$
where $\mathbf{W}'_{t,k}\sim\mathcal{N}(\mathbf{0},(S_{\ell,k}\sigma_{w,t}^{2}+\sigma^{2})\mathbf{I})$ for $k\in\{1,2,...,K\}$. The overall local gradients and the overall noises are defined as $\mathbf{W}_{t}=(W_{t,1},...,W_{t,q})^{T}$ and $\boldsymbol{\eta}_{t}=(\eta_{t,1},...,\eta_{t,q})^{T}$, respectively, where $W_{t,i}=\sum_{k=1}^{K}W_{t,k,i}$ and $\eta_{t,i}=\sum_{k=1}^{K}\eta_{t,k,i}$ ($i\in\{1,2,...,q\}$). According to (2.5), we define the overall corrupted local gradients sent to the edge server as $\mathbf{W}'_{t}=(W'_{t,1},...,W'_{t,q})^{T}$, where $W'_{t,i}=\sum_{k=1}^{K}W'_{t,k,i}$ ($i\in\{1,2,...,q\}$). Note that since $\mathbf{W}_{t,k}$ and $\boldsymbol{\eta}_{t,k}$ are i.i.d. generated, $\mathbf{W}'_{t}$ is also composed of i.i.d. components, where $\mathbf{W}'_{t}\sim\mathcal{N}(\mathbf{0},(S_{\ell}\sigma_{w,t}^{2}+K\sigma^{2})\mathbf{I})$.
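The variance of the aggregate follows by summing the independent per-user variances and using $\sum_{k}S_{\ell,k}=S_{\ell}$ (a one-line check, under the assumption that the gradients and the artificial noises are mutually independent):

```latex
\mathrm{Var}\!\left(W'_{t,i}\right)
  = \sum_{k=1}^{K}\left(S_{\ell,k}\,\sigma_{w,t}^{2} + \sigma^{2}\right)
  = S_{\ell}\,\sigma_{w,t}^{2} + K\sigma^{2}.
```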
Definition 1 (Privacy by mutual information [13]): If the mutual information between $\mathbf{W}_{t}$ and $\mathbf{W}'_{t}$ satisfies $\frac{1}{qT}\sum_{t=1}^{T}I(\mathbf{W}_{t};\mathbf{W}'_{t})\le\epsilon$, we say the LDP mechanism satisfies $\epsilon$-mutual-information privacy for some $\epsilon>0$.
Definition 2 (Utility by quadratic distortion [14]): The utility of $\mathbf{W}'_{t}$ is characterized by $d(\mathbf{W}_{t},\mathbf{W}'_{t})=\|\mathbf{W}'_{t}-\mathbf{W}_{t}\|^{2}$, where $\|\mathbf{X}\|$ represents the $l_{2}$-norm of the vector $\mathbf{X}$. If $\frac{1}{qT}\sum_{t=1}^{T}\mathbb{E}\big[d(\mathbf{W}_{t},\mathbf{W}'_{t})\big]\le U$, we say the utility of $\mathbf{W}'_{t}$ is up to $U$.
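Under the i.i.d. Gaussian model above, and assuming the local gradients and the artificial noises are mutually independent, both definitions admit simple closed forms. The following worked expressions are a sketch of ours, not results stated in this section:

```latex
\frac{1}{q}\,I(\mathbf{W}_{t};\mathbf{W}'_{t})
   = \frac{1}{2}\log\!\left(1+\frac{S_{\ell}\,\sigma_{w,t}^{2}}{K\sigma^{2}}\right),
\qquad
\frac{1}{q}\,\mathbb{E}\big[d(\mathbf{W}_{t},\mathbf{W}'_{t})\big]
   = \frac{1}{q}\,\mathbb{E}\|\boldsymbol{\eta}_{t}\|^{2}
   = K\sigma^{2}.
```

Hence, averaging over the $T$ rounds, enlarging the artificial-noise variance $\sigma^{2}$ decreases the privacy level $\epsilon$ (less leakage to the cloud server) while increasing the distortion level $U$ (lower utility), which is the utility-privacy trade-off discussed in the introduction and in [8].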
Channels: At time instant $i$ ($i\in\{1,2,...,N_{t}\}$) of the $t$-th communication round, the channel inputs and outputs are given by
$$Y_{i}(t)=hX_{i}(t)+\eta_{1,i}(t),\quad i=1,2,...,N_{t},\qquad(2.6)$$
$$\tilde{Y}_{i}(t)=\tilde{h}\tilde{X}_{i}(t)+\eta_{2,i}(t),\quad i=1,2,...,N_{t}-1,\qquad(2.7)$$
$$Z_{i}(t)=gX_{i}(t)+\tilde{g}\tilde{X}_{i}(t)+\eta_{e,i}(t),\quad i=1,2,...,N_{t},\qquad(2.8)$$
where $X_{i}(t)$ and $\tilde{X}_{i}(t)$ are respectively the feedforward and feedback channel inputs, which satisfy the average power constraints $\frac{1}{N_{t}}\sum_{i=1}^{N_{t}}\mathbb{E}[X_{i}(t)X_{i}(t)^{H}]\le P$ and $\frac{1}{N_{t}-1}\sum_{i=1}^{N_{t}-1}\mathbb{E}[\tilde{X}_{i}(t)\tilde{X}_{i}(t)^{H}]\le\tilde{P}$. $h,\tilde{h},g,\tilde{g}\in\mathbb{C}$ are the CSI of the feedforward and feedback channels of the cloud server,
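To illustrate the channel model, the following is a minimal complex-baseband simulation of (2.6)-(2.8) for one communication round. The i.i.d. Gaussian channel inputs, the unit noise variances, and the alignment of the feedback symbols inside $Z_{i}(t)$ are illustrative assumptions rather than the coding scheme of this paper:

```python
import numpy as np

def simulate_round(N_t, h, h_tilde, g, g_tilde, P, P_tilde, noise_var=1.0, seed=0):
    """Simulate one round of the quasi-static fading duplex channel, eqs. (2.6)-(2.8)."""
    rng = np.random.default_rng(seed)

    def cgauss(n, var):
        # Circularly symmetric complex Gaussian samples with variance `var`.
        return np.sqrt(var / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))

    # Illustrative i.i.d. Gaussian inputs meeting the average power constraints
    # E[|X_i|^2] <= P (edge server) and E[|X~_i|^2] <= P_tilde (cloud server feedback).
    X = cgauss(N_t, P)                  # feedforward input of the edge server
    X_tilde = cgauss(N_t - 1, P_tilde)  # feedback input of the cloud server

    Y = h * X + cgauss(N_t, noise_var)                        # (2.6): cloud server observation
    Y_tilde = h_tilde * X_tilde + cgauss(N_t - 1, noise_var)  # (2.7): noisy feedback at the edge server
    Z = g * X + cgauss(N_t, noise_var)                        # (2.8): eavesdropper observation
    Z[: N_t - 1] += g_tilde * X_tilde   # assumption: no feedback symbol exists at time N_t
    return Y, Y_tilde, Z

# Example usage:
# Y, Y_tilde, Z = simulate_round(N_t=128, h=0.9 + 0.1j, h_tilde=0.8,
#                                g=0.3, g_tilde=0.2, P=1.0, P_tilde=1.0)
```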