SecureFedYJ: a safe feature Gaussianization protocol
for Federated Learning
Tanguy Marchand
Owkin Inc., New York, USA.
tanguy.marchand@owkin.com
Boris Muzellec
Owkin Inc., New York, USA.
boris.muzellec@owkin.com
Constance Beguier
Jean Ogier du Terrail
Owkin Inc., New York, USA.
jean.du-terrail@owkin.com
Mathieu Andreux
Owkin Inc., New York, USA.
mathieu.andreux@owkin.com
Abstract
The Yeo-Johnson (YJ) transformation is a standard parametrized per-feature unidimensional transformation often used to Gaussianize features in machine learning. In this paper, we investigate the problem of applying the YJ transformation in a cross-silo Federated Learning setting under privacy constraints. For the first time, we prove that the YJ negative log-likelihood is in fact convex, which allows us to optimize it with exponential search. We numerically show that the resulting algorithm is more stable than the state-of-the-art approach based on the Brent minimization method. Building on this simple algorithm and Secure Multiparty Computation routines, we propose SECUREFEDYJ, a federated algorithm that performs a pooled-equivalent YJ transformation without leaking more information than the final fitted parameters do. Quantitative experiments on real data demonstrate that, in addition to being secure, our approach reliably normalizes features across silos as well as if data were pooled, making it a viable approach for safe federated feature Gaussianization.
1 Introduction
Federated Learning (FL) [45, 32] is an approach that was recently proposed to train machine learning (ML) models across multiple data holders, or clients, without centralizing data points, notably for privacy reasons. While many FL applications have been proposed, two main settings have emerged [23]: cross-device FL, involving a large number of small edge devices, and cross-silo FL, dealing with a smaller number of clients with larger computational capabilities. Due to the sensitivity and relative local scarcity of medical data, healthcare is a promising application of cross-silo FL [40], e.g. to train a biomedical ML model between different hospitals as if all the datasets were pooled in a central server. In this paper, we focus on the cross-silo setting.
The constraints of cross-silo FL
Although cross-silo FL resembles standard distributed learning, it faces at least two important distinct challenges: privacy and heterogeneity. Due to data sensitivity, clients might impose stringent security and privacy constraints on FL collaborations. This arises in coopetitive FL projects, where models are jointly trained on industrial competitors' datasets [55], as well as in medical FL applications, where conservative data regulations might apply. In this setting, using standard FL algorithms such as FEDAVG [32] might not provide enough privacy guarantees, as privacy attacks such as data reconstruction can be carried out based on the clients' gradients [56, 54].
Contribution done while at Owkin, Inc.
36th Conference on Neural Information Processing Systems (NeurIPS 2022).
arXiv:2210.01639v2 [cs.LG] 13 Oct 2022
Various protocols based on Secure Multiparty Computation (SMC) (see Section 2 for more details), such as Secure Aggregation [4], can mitigate this shortcoming by disclosing only the sum of the gradients from all clients to the server, without disclosing each gradient individually.
An additional constraint is that data might present statistical heterogeneity across clients, i.e. the local clients' data distributions may not be identical. In the case of medical applications, such heterogeneity may be caused e.g. by environmental variations or differences in the material that was used for acquisition [43, 47, 2]. While different ways of adapting federated training algorithms have been proposed to automatically tackle heterogeneity [28, 29, 24], these solutions do not address data harmonization and normalization prior to FL training.
Preprocessing in ML
Data preprocessing is a crucial step in many ML applications, leading to important performance gains. Among others, common preprocessing methods include data whitening, principal component analysis (PCA) [22] or zero component analysis [27, 20, 46]. However, linear normalization methods might not suffice when the original data distribution is highly non-Gaussian. For tabular and time series data, a popular approach to Gaussianize the marginal distributions is to apply feature-wise non-linear transformations. Two commonly-used parametric methods are the Box-Cox transformation [5] and its extension, the Yeo-Johnson (YJ) transformation [52]. Both have been used in multiple applications, such as climate and weather forecasting [53, 50, 51], economics [13] and genomic studies [7, 58, 9].
Problem and contributions
In this paper, we investigate the problem of data normalization in the cross-silo FL setting, by exploring how to apply the YJ transformation to a distributed dataset. This problem arises frequently in medical cross-silo FL, e.g. when trying to jointly train models on genetic data (see e.g. [19, 57]). Due to data heterogeneity, no single client can act as a reference client: indeed, there is no guarantee that transformation parameters fitted on a single client would be relevant for other clients' data. Hence, it is necessary to fit normalization methods on the full federated dataset. Moreover, in this setting, data privacy is of paramount importance, and therefore FL protocols should be carefully designed. Our main contributions to this problem are as follows:
1. We prove that the negative YJ log-likelihood is convex (Section 3), which is, to the best of our knowledge, a novel result.
2. Building on this property, we introduce EXPYJ, a method to fit the YJ transformation based on exponential search (Section 3). We numerically show that this method is more stable than standard approaches for fitting the YJ transformation based on the Brent minimization method [6].
3. We propose SECUREFEDYJ (Section 4), a secure way to extend EXPYJ to the cross-silo FL setting using SMC. We show that SECUREFEDYJ does not leak any information on the datasets apart from what is leaked by the parameters minimizing the YJ negative log-likelihood (Section 4 and Proposition 4.1). By construction, SECUREFEDYJ provides the same results as the pooled-equivalent EXPYJ, regardless of how the data is split across the clients. We check this property in numerical experiments (Section 4). The core ideas behind the resulting algorithm, SECUREFEDYJ, are summarised in Figure 7.
Finally, we illustrate our contributions in numerical applications on synthetic and genomic data in Section 5.
2 Background
The Yeo-Johnson transformation
The YJ transformation [52] was introduced in order to Gaussianize data that can be either positive or negative. It was proposed as a generalization of the Box-Cox transformation [5], which only applies to non-negative data. The YJ transformation consists in applying to each feature a monotonic function Ψ(λ, ·) parametrized by a scalar λ, independently of the other features. Thus, there are as many λs as there are features. For a real number x, Ψ(λ, x) is defined as:
\[
\Psi(\lambda, x) =
\begin{cases}
[(x+1)^{\lambda} - 1]/\lambda, & \text{if } x \geq 0,\ \lambda \neq 0,\\
\ln(x+1), & \text{if } x \geq 0,\ \lambda = 0,\\
-[(-x+1)^{2-\lambda} - 1]/(2-\lambda), & \text{if } x < 0,\ \lambda \neq 2,\\
-\ln(-x+1), & \text{if } x < 0,\ \lambda = 2.
\end{cases}
\tag{1}
\]
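To make Eq. (1) concrete, here is a minimal NumPy sketch of the transform (the helper name yeo_johnson and the branch tolerances are ours; scikit-learn exposes an equivalent transform through sklearn.preprocessing.PowerTransformer):

import numpy as np

def yeo_johnson(x, lam):
    """Apply the Yeo-Johnson transform Psi(lambda, x) of Eq. (1) element-wise."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    if abs(lam) > 1e-12:                 # x >= 0, lambda != 0
        out[pos] = ((x[pos] + 1.0) ** lam - 1.0) / lam
    else:                                # x >= 0, lambda == 0
        out[pos] = np.log1p(x[pos])
    if abs(lam - 2.0) > 1e-12:           # x < 0, lambda != 2
        out[~pos] = -((1.0 - x[~pos]) ** (2.0 - lam) - 1.0) / (2.0 - lam)
    else:                                # x < 0, lambda == 2
        out[~pos] = -np.log1p(-x[~pos])
    return out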
Figure 1: The Yeo-Johnson transformation applies a 1-D univariate transform to Gaussianize data. (a) Ψ(λ, ·) for various values of λ. (b) The YJ transformation applied to a skew-normal distribution.
Figure 1a shows the shape of the YJ function for various values of λ.
The Yeo-Johnson likelihood
Let us consider real-valued samples {x_i}_{i=1,...,n}, and let us apply the YJ transformation Ψ(λ, ·) to these samples to Gaussianize their distribution. The log-likelihood that {Ψ(λ, x_i)}_{i=1,...,n} comes from a Gaussian with mean µ and variance σ² is given by (derivation details are provided in Appendix A.1):
\[
\log \mathcal{L}_{\mathrm{YJ}}(\lambda, \sigma^2, \mu) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left[\Psi(\lambda, x_i) - \mu\right]^2 + (\lambda - 1)\sum_{i=1}^{n}\operatorname{sgn}(x_i)\log(|x_i| + 1).
\]
For a given λ, the log-likelihood is maximized for µ* = (1/n) Σ_{i=1}^{n} Ψ(λ, x_i) and σ²* = (1/n) Σ_{i=1}^{n} (Ψ(λ, x_i) − µ*)². Once we replace µ and σ² by µ* and σ²*, it becomes:
\[
\log \mathcal{L}_{\mathrm{YJ}}(\lambda) = -\frac{n}{2}\log\left(\sigma^{2*}_{\Psi(\lambda,\{x_i\})}\right) + (\lambda - 1)\sum_{i=1}^{n}\operatorname{sgn}(x_i)\log(|x_i| + 1) - \frac{n}{2}\log(2\pi),
\tag{2}
\]
see [52]. Maximizing the YJ log-likelihood is therefore a 1-dimensional problem for each feature. Once the optimal λ* is found, the transformed data Ψ(λ*, x_i) is usually renormalized by subtracting its empirical mean µ* and dividing by the square root of its empirical variance σ²*. Figure 1b shows an example of the YJ transformation applied to a skew-normal distribution. Note that in a typical application, the triplet (λ*, µ*, σ²*) is fitted on the training data only, and is then used to Gaussianize the test dataset during inference.
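For illustration, the profiled negative log-likelihood of Eq. (2) and the subsequent renormalization can be sketched as follows (this reuses the yeo_johnson helper above; names are ours, and the constant (n/2) log(2π) is dropped since it does not affect the optimum):

import numpy as np

def yj_neg_log_likelihood(x, lam):
    """Negative of the profiled log-likelihood in Eq. (2), up to an additive constant."""
    x = np.asarray(x, dtype=float)
    psi = yeo_johnson(x, lam)
    n = x.size
    return 0.5 * n * np.log(psi.var()) - (lam - 1.0) * np.sum(np.sign(x) * np.log1p(np.abs(x)))

def yj_normalize(x, lam):
    """Gaussianize x with a fitted lambda, then center and scale by the fitted (mu*, sigma2*)."""
    psi = yeo_johnson(np.asarray(x, dtype=float), lam)
    mu, sigma2 = psi.mean(), psi.var()
    return (psi - mu) / np.sqrt(sigma2), (lam, mu, sigma2)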
Minimization methods in dimension 1
As seen above, fitting a YJ transformation can be reduced to a 1D optimization problem. To tackle this problem, we introduce two standard 1D minimization methods: (i) Brent minimization [6] and (ii) exponential search [3].
Brent minimization [6] (not to be confused with the Brent-Dekker method, see [6], chapters 3 and 4) is a widely used method for 1D optimization. It is based on golden section search and successive parabolic interpolations, and does not require evaluating any derivatives. This algorithm is guaranteed to converge to a local minimum with superlinear convergence of order at least 1.3247. Standard implementations of the YJ transformation, in particular the scikit-learn implementation [36], are based on the Brent minimization method to minimize the negative log-likelihood provided by Eq. (2).
Exponential search [3] is a dichotomic algorithm designed for unbounded search spaces. The idea is to first find bounds, and then to perform a classic binary search within these bounds. This algorithm can be used to find the minimum of convex differentiable functions with linear convergence, as explained in Appendix B. In this work, we build on exponential search to propose a federated version of the YJ transform, for three main reasons: (i) it is more numerically stable than Brent minimization, as shown in Section 3 and Figure 2; (ii) it may conveniently be adapted to a federated setting, as shown in Section 4; and (iii) this latter federated adaptation offers strong privacy guarantees, as shown by Proposition 4.1.
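As a rough single-client illustration of this exponential search (variable and function names are ours, and the update rule mirrors the bracketing-then-bisection idea described above), the following sketch minimizes a convex differentiable 1D function using only the sign of its derivative:

def exponential_search_min(df, n_steps=40, x0=0.0):
    """Minimize a convex differentiable 1D function given its derivative df."""
    x, lo, hi = x0, float("-inf"), float("inf")
    for _ in range(n_steps):
        if df(x) > 0:                                        # minimizer lies to the left of x
            hi = x
            x = 0.5 * (lo + x) if lo > float("-inf") else min(2 * x, -1.0)
        else:                                                # minimizer lies to the right of x
            lo = x
            x = 0.5 * (hi + x) if hi < float("inf") else max(2 * x, 1.0)
    return x

# Example: minimize f(x) = (x - 3)^2 via its derivative 2(x - 3); converges to ~3.
x_star = exponential_search_min(lambda x: 2 * (x - 3.0))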
Secure Multiparty Computation
As illustrated by various gradient-based privacy attacks [56, 54], sensitive information on the clients' datasets can be leaked to the central server during an FL training. One way to mitigate this risk is to use Secure Multiparty Computation (SMC) protocols to hide individual contributions from the server. SMC enables one to evaluate functions with inputs distributed across different users without revealing intermediate results, and is often based on secret sharing. SMC protocols tailored for ML use-cases have recently been proposed [12, 14, 34, 39, 48, 33, 49, 41]. These protocols are either designed to enhance the privacy of FL trainings, or to perform secure inference, i.e. to enable the evaluation of a model trained privately on a server without revealing either the data or the model.
A popular FL algorithm relying on SMC is Secure Aggregation (SA) [4]. Schematically, in SA each client adds a random mask to their model update before sending it to the central server. These masks are constructed in such a way that they all sum to zero. Therefore, the central server cannot see the individual updates of the clients, but it can recover the sum of these updates by adding all the masked quantities it receives.
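The following toy sketch conveys the masking idea (it is a simplification of the actual Secure Aggregation protocol of [4], which additionally handles client dropouts and derives the pairwise masks from key agreement): each pair of clients shares a random mask that one adds and the other subtracts, so all masks cancel in the server-side sum.

import numpy as np

def masked_updates(updates, seed=0):
    """Toy pairwise additive masking: the masks cancel when the server sums all uploads."""
    rng = np.random.default_rng(seed)
    masked = [np.asarray(u, dtype=float).copy() for u in updates]
    k = len(updates)
    for i in range(k):
        for j in range(i + 1, k):
            m = rng.normal(size=masked[i].shape)  # mask shared by clients i and j
            masked[i] += m                        # client i adds the mask
            masked[j] -= m                        # client j subtracts it
    return masked                                 # sum(masked) equals sum(updates)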
More generally, an SMC routine schematically works as follows (we refer to Appendix D for further details). Let us consider the setting where K parties k = 1, ..., K want to compute g = f(h^(1), ..., h^(K)) for a known function f, where (h^(1), ..., h^(K)) denote private inputs. Each party k knows h^(k) and is not willing to share it. During the first step, secret sharing, each party splits its private input h^(k) into K secret shares h^(k)_1, ..., h^(k)_K, and sends the share h^(k)_{k'} to party k'. These secret shares are constructed in such a way that (i) knowing h^(k)_{k'} does not provide any information on the value of h^(k), and (ii) h^(k) can be reconstructed from the vector (h^(k)_1, ..., h^(k)_K). For simplicity, we denote by ⟦h^(k)⟧ = (h^(k)_1, ..., h^(k)_K) the vector of secret shares. In the second step, the computation, each party k' computes a quantity denoted g_{k'} using the secret shares it knows, along with intermediate quantities exchanged with the other parties. The way to compute g_{k'} depends on f and on the SMC protocol that is used, and is chosen so that g = f(h^(1), ..., h^(K)) can be reconstructed from (g_1, ..., g_K). Said otherwise, the g_{k'} are secret shares of g: ⟦g⟧ = (g_1, ..., g_K). Finally, during the reveal step, each party k reveals g_k to all other parties, and each party can reconstruct g from (g_1, ..., g_K).
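A minimal sketch of this share/compute/reveal pattern with additive secret sharing is given below (real SMC protocols operate over finite rings and involve actual communication between parties; the function names here are ours):

import numpy as np

def share(secret, n_parties, rng):
    """Secret-sharing step: split a value into n_parties additive shares."""
    shares = list(rng.normal(size=n_parties - 1))
    shares.append(secret - sum(shares))     # last share makes the shares sum to the secret
    return shares

def secure_sum(private_inputs, seed=0):
    """Compute the sum of private inputs without any party revealing its own input."""
    rng = np.random.default_rng(seed)
    k = len(private_inputs)
    all_shares = [share(h, k, rng) for h in private_inputs]                 # each party shares h^(k)
    partial = [sum(all_shares[i][j] for i in range(k)) for j in range(k)]   # computation step: g_k'
    return sum(partial)                                                     # reveal step: reconstruct g

# Example: three parties compute 1.0 + 2.5 + 4.0 = 7.5 without exchanging their raw inputs.
print(secure_sum([1.0, 2.5, 4.0]))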
Threat model
In this work, we consider an honest-but-curious setting [35]. Neither the clients nor the server will deviate from the agreed protocol, but each party can potentially try to infer as much information as possible from the data they see during the protocol. This setting is relevant for cross-silo FL, where participants are often large institutions whose reputation could be tarnished by malicious behaviour.
3 A novel method to optimize the Yeo-Johnson log-likelihood: EXPYJ
Algorithm 1 EXPYJ
Input: data {x_i}, total data size n, number of steps t_max
Initialize λ_{t=0} ← 0, λ⁺_{t=0} ← +∞, λ⁻_{t=0} ← −∞
Compute S_ϕ
for t = 1 to t_max do
    for g ∈ {Ψ(λ, ·), Ψ(λ, ·)², ∂_λΨ(λ, ·), ∂_λΨ(λ, ·)²} do
        Compute S_g
    end for
    ∆_t ← sgn[−n S_{∂_λΨ²} + 2 S_Ψ S_{∂_λΨ} + 2 S_ϕ (S_{Ψ²} − S_Ψ²/n)]
    λ_t, λ⁻_t, λ⁺_t ← EXPUPDATE(λ_{t−1}, λ⁻_{t−1}, λ⁺_{t−1}, ∆_t)
end for
λ* ← λ_{t_max}
Compute µ* = S_Ψ/n and σ²* = S_{Ψ²}/n − (µ*)²
Output: The fitted triplet (λ*, µ*, σ²*)
In this section, we leverage the convexity of the negative log-likelihood of the YJ transformation (see Proposition 3.1) to propose a new method to find the optimal λ using exponential search. While this method only offers linear convergence, compared to the super-linear convergence of the Brent minimization method, we demonstrate two of its advantages: (i) it is more numerically stable, and (ii) it is easily amenable to an FL setting with strong privacy guarantees. The method proposed in this section is based on the following result.
Proposition 3.1. The negative log-likelihood λ ↦ −log L_YJ(λ) of Eq. (2) is strictly convex.
The proof of Proposition 3.1 builds upon the work of [26], which shows that the negative log-likelihood of the Box-Cox transformation [5] is convex. The complete proof is deferred to Appendix C.
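As an informal numerical illustration of Proposition 3.1 (not a substitute for the proof in Appendix C), one can check that second-order finite differences of the negative log-likelihood are non-negative on a grid of λ values; this reuses the yj_neg_log_likelihood sketch above on toy data:

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_exponential(500) - 0.3                 # toy data with both signs
lams = np.linspace(-3.0, 5.0, 401)
nll = np.array([yj_neg_log_likelihood(x, l) for l in lams])
second_diff = nll[:-2] - 2.0 * nll[1:-1] + nll[2:]      # discrete curvature
print(second_diff.min())                                # should be >= 0 up to numerical error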
The exponential YJ algorithm
The pseudo-code of the proposed algorithm is presented in Algorithm 1, and relies on the exponential search presented in Algorithm 2 (cf. Appendix B for more details on exponential search). An illustration of EXPYJ is shown in Figure 6 in Appendix B. Due to the strict convexity of the negative log-likelihood of the YJ transformation, we may perform the exponential search described in Section 2 and Appendix B. To do so, it is enough to obtain the sign of the derivative. Let ∂_λΨ(λ, ·)² = 2Ψ(λ, ·) ∂_λΨ(λ, ·) and ϕ(x) = sgn(x) log(|x| + 1). Further, for g ∈ {Ψ(λ, ·), ∂_λΨ(λ, ·), Ψ(λ, ·)², ∂_λΨ(λ, ·)², ϕ}, let us define S_g := Σ_{i=1}^{n} g(x_i). The derivative of the log-likelihood is available in closed form (see Appendix A.3):
\[
\partial_\lambda \log \mathcal{L}_{\mathrm{YJ}} = -\frac{n}{2}\,\frac{S_{\partial_\lambda\Psi^2} - 2\,S_{\Psi}\,S_{\partial_\lambda\Psi}/n}{S_{\Psi^2} - S_{\Psi}^2/n} + S_{\phi}.
\]
Notice that S_{Ψ²} − S_Ψ²/n can be expressed as a variance, hence is non-negative. We may therefore obtain sgn[∂_λ log L_YJ] while avoiding any division by computing
\[
\operatorname{sgn}\left[\partial_\lambda \log \mathcal{L}_{\mathrm{YJ}}\right] = \operatorname{sgn}\left[-n\,S_{\partial_\lambda\Psi^2} + 2\,S_{\Psi}\,S_{\partial_\lambda\Psi} + 2\,S_{\phi}\left(S_{\Psi^2} - S_{\Psi}^2/n\right)\right].
\tag{3}
\]
Avoiding this division is crucial to make the overall procedure more numerically stable, as explained below, and eases the use of SMC routines.
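Putting the pieces together, here is a single-client sketch of this procedure (our own illustrative code, reusing the yeo_johnson helper above; in particular, ∂_λΨ is approximated by a central finite difference rather than the closed form used in the paper, and only the aggregated sums S_g would be exchanged through SMC in the federated version of Section 4):

import numpy as np

def expyj_fit(x, t_max=40, eps=1e-6):
    """Fit lambda by exponential search driven by the division-free sign test of Eq. (3)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    s_phi = np.sum(np.sign(x) * np.log1p(np.abs(x)))                     # S_phi
    lam, lo, hi = 0.0, float("-inf"), float("inf")
    for _ in range(t_max):
        psi = yeo_johnson(x, lam)
        dpsi = (yeo_johnson(x, lam + eps) - yeo_johnson(x, lam - eps)) / (2.0 * eps)
        s_psi, s_psi2 = psi.sum(), np.sum(psi ** 2)                      # S_Psi, S_{Psi^2}
        s_dpsi, s_dpsi2 = dpsi.sum(), np.sum(2.0 * psi * dpsi)           # S_{dPsi}, S_{d(Psi^2)}
        # Sign of the log-likelihood derivative, Eq. (3): no division needed.
        delta = np.sign(-n * s_dpsi2 + 2.0 * s_psi * s_dpsi + 2.0 * s_phi * (s_psi2 - s_psi ** 2 / n))
        if delta > 0:                      # log-likelihood increasing: lambda* is to the right
            lo = lam
            lam = 0.5 * (hi + lam) if hi < float("inf") else max(2.0 * lam, 1.0)
        else:                              # log-likelihood decreasing: lambda* is to the left
            hi = lam
            lam = 0.5 * (lo + lam) if lo > float("-inf") else min(2.0 * lam, -1.0)
    psi = yeo_johnson(x, lam)
    return lam, psi.mean(), psi.var()      # the fitted triplet (lambda*, mu*, sigma2*)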
Figure 2: Comparison of EXPYJ and scikit-learn. Left: For each of the 106 features (see Appendix E.1), we compute the relative difference δλ = |λ_EXPYJ − λ_sk|/|λ_sk| and plot its median, maximum, and 25%-75% and 10%-90% percentiles across the 106 features. Right: Negative log-likelihood of the YJ transformation for the mean cell area of each sample of the Breast Cancer dataset. Full orange bars correspond to values of λ for which the likelihood computed using scikit-learn returns −∞, as σ²_λ({x_i}) is equal to 0 up to float-64 machine precision. Dotted lines correspond to the λ found using Brent minimization or EXPYJ with one client.
Algorithm 2 EXPUPDATE
Input: λ, λ⁺, λ⁻, ∆ ∈ {−1, 1}
if ∆ = 1 then
    λ⁻ ← λ
    λ ← (λ⁺ + λ)/2 if λ⁺ < ∞, else λ ← max(2λ, 1)
else
    λ⁺ ← λ
    λ ← (λ⁻ + λ)/2 if λ⁻ > −∞, else λ ← min(2λ, −1)
end if
Output: Updated λ, λ⁺, λ⁻
Accuracy of EXPYJ
We check the accuracy of EXPYJ on the datasets presented in Appendix E.1. In particular, we compare the results provided by EXPYJ with the outputs of the scikit-learn algorithm based on Brent minimization. For 2 of the 108 features present in the datasets, the scikit-learn implementation leads to numerical instabilities, discussed hereafter. Therefore, we focus our comparison on the 106 remaining features, which we aggregate regardless of the dataset. Figure 2 reports the relative difference δλ between the results obtained by EXPYJ and by the scikit-learn implementation as a function of the number of iterations t_max (as defined in Algorithm 1). These results show that this relative difference is of order less than 10⁻⁶ when t_max = 40.
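A quick way to reproduce this kind of comparison on toy data is to fit scikit-learn's PowerTransformer and compare its fitted lambdas_ to the output of the expyj_fit sketch above (which is our illustrative code, not the paper's reference implementation):

import numpy as np
from sklearn.preprocessing import PowerTransformer

x = np.random.default_rng(0).gamma(shape=2.0, scale=3.0, size=1000)   # skewed toy feature
lam_sk = PowerTransformer(method="yeo-johnson").fit(x.reshape(-1, 1)).lambdas_[0]
lam_exp, _, _ = expyj_fit(x, t_max=40)
print(abs(lam_exp - lam_sk) / abs(lam_sk))                             # relative difference, expected to be small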
Numerical stability of EXPYJ
Our experiments demonstrate that EXPYJ is numerically more stable than Brent minimization. Indeed, for some values of λ and some datasets {x_i}, the transformation Ψ(λ, ·) concentrates all data points in a small interval such that the values of Ψ(λ, {x_i}) are all