SecureFedYJ: a safe feature Gaussianization protocol
for Federated Learning
Tanguy Marchand
Owkin Inc., New York, USA.
tanguy.marchand@owkin.com
Boris Muzellec
Owkin Inc., New York, USA.
boris.muzellec@owkin.com
Constance Beguier
Jean Ogier du Terrail
Owkin Inc., New York, USA.
jean.du-terrail@owkin.com
Mathieu Andreux
Owkin Inc., New York, USA.
mathieu.andreux@owkin.com
Abstract
The Yeo-Johnson (YJ) transformation is a standard parametrized per-feature unidimensional transformation often used to Gaussianize features in machine learning. In this paper, we investigate the problem of applying the YJ transformation in a cross-silo Federated Learning setting under privacy constraints. For the first time, we prove that the YJ negative log-likelihood is in fact convex, which allows us to optimize it with exponential search. We numerically show that the resulting algorithm is more stable than the state-of-the-art approach based on the Brent minimization method. Building on this simple algorithm and Secure Multiparty Computation routines, we propose SECUREFEDYJ, a federated algorithm that performs a pooled-equivalent YJ transformation without leaking more information than the final fitted parameters do. Quantitative experiments on real data demonstrate that, in addition to being secure, our approach reliably normalizes features across silos as well as if data were pooled, making it a viable approach for safe federated feature Gaussianization.
1 Introduction
Federated Learning (FL) [45, 32] is an approach that was recently proposed to train machine learning (ML) models across multiple data holders, or clients, without centralizing data points, notably for privacy reasons. While many FL applications have been proposed, two main settings have emerged [23]: cross-device FL, involving a large number of small edge devices, and cross-silo FL, dealing with a smaller number of clients with larger computational capabilities. Due to the sensitivity and relative local scarcity of medical data, healthcare is a promising application of cross-silo FL [40], e.g. to train a biomedical ML model between different hospitals as if all the datasets were pooled in a central server. In this paper, we focus on the cross-silo setting.
The constraints of cross-silo FL
Although cross-silo FL resembles standard distributed learning, it faces at least two important distinct challenges: privacy and heterogeneity. Due to data sensitivity, clients might impose stringent security and privacy constraints on FL collaborations. This arises in coopetitive FL projects, where models are jointly trained on industrial competitors' datasets [55], as well as in medical FL applications, where conservative data regulations might apply. In this setting, using standard FL algorithms such as FEDAVG [32] might not provide enough privacy guarantees, as privacy attacks such as data reconstruction can be carried out based on the clients' gradients [56, 54].
Contribution done while at Owkin, Inc.
36th Conference on Neural Information Processing Systems (NeurIPS 2022).
arXiv:2210.01639v2 [cs.LG] 13 Oct 2022
Various protocols based on Secure Multiparty Computation (SMC) (see Section 2 for more details), such as Secure Aggregation [4], can mitigate this shortcoming by disclosing only the sum of the gradients from all clients to the server, without disclosing each gradient individually.
An additional constraint is that data might present statistical heterogeneity across clients, i.e. the local clients' data distributions may not be identical. In the case of medical applications, such heterogeneity may be caused e.g. by environmental variations or differences in the material that was used for acquisition [43, 47, 2]. While different ways of adapting federated training algorithms have been proposed to automatically tackle heterogeneity [28, 29, 24], these solutions do not address data harmonization and normalization prior to FL training.
Preprocessing in ML
Data preprocessing is a crucial step in many ML applications, leading to important performance gains. Among others, common preprocessing methods include data whitening, principal component analysis (PCA) [22] or zero component analysis [27, 20, 46]. However, linear normalization methods might not suffice when the original data distribution is highly non-Gaussian. For tabular and time series data, a popular approach to Gaussianize the marginal distributions is to apply feature-wise non-linear transformations. Two commonly-used parametric methods are the Box-Cox transformation [5] and its extension, the Yeo-Johnson (YJ) transformation [52]. Both have been used in multiple applications, such as climate and weather forecasting [53, 50, 51], economics [13] and genomic studies [7, 58, 9].
Problem and contributions
In this paper, we investigate the problem of data normalization in the cross-silo FL setting, by exploring how to apply the YJ transformation to a distributed dataset. This problem arises frequently in medical cross-silo FL, e.g. when trying to jointly train models on genetic data (see e.g. [19, 57]). Due to data heterogeneity, no single client can act as a reference client: indeed, there is no guarantee that transformation parameters fitted on a single client would be relevant for other clients' data. Hence, it is necessary to fit normalization methods on the full federated dataset. Moreover, in this setting, data privacy is of paramount importance, and therefore FL protocols should be carefully designed. Our main contributions to this problem are as follows:
1. We prove that the negative YJ log-likelihood is convex (Section 3), which is, to the best of our knowledge, a novel result.
2. Building on this property, we introduce EXPYJ, a method to fit the YJ transformation based on exponential search (Section 3). We numerically show that this method is more stable than standard approaches for fitting the YJ transformation based on the Brent minimization method [6].
3. We propose SECUREFEDYJ (Section 4), a secure way to extend EXPYJ to the cross-silo FL setting using SMC. We show that SECUREFEDYJ does not leak any information on the datasets apart from what is leaked by the parameters minimizing the YJ negative log-likelihood (Section 4 and Proposition 4.1). By construction, SECUREFEDYJ provides the same results as the pooled-equivalent EXPYJ, regardless of how the data is split across the clients. We check this property in numerical experiments (Section 4). The core ideas behind the resulting algorithm, SECUREFEDYJ, are summarised in Figure 7.
Finally, we illustrate our contributions in numerical applications on synthetic and genomic data in Section 5.
2 Background
The Yeo-Johnson transformation
The YJ transformation [52] was introduced in order to Gaussianize data that can be either positive or negative. It was proposed as a generalization of the Box-Cox transformation [5], which only applies to non-negative data. The YJ transformation consists in applying to each feature a monotonic function Ψ(λ, ·) parametrized by a scalar λ, independently of the other features. Thus, there are as many λs as there are features. For a real number x, Ψ(λ, x) is defined as:
\[
\Psi(\lambda, x) =
\begin{cases}
[(x+1)^{\lambda} - 1]/\lambda, & \text{if } x \geq 0,\ \lambda \neq 0,\\
\ln(x+1), & \text{if } x \geq 0,\ \lambda = 0,\\
-[(-x+1)^{2-\lambda} - 1]/(2-\lambda), & \text{if } x < 0,\ \lambda \neq 2,\\
-\ln(-x+1), & \text{if } x < 0,\ \lambda = 2.
\end{cases}
\tag{1}
\]
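To make Eq. (1) concrete, here is a minimal NumPy sketch of the transform (the helper name yeo_johnson and the branch tolerances are ours; scikit-learn exposes an equivalent transform through sklearn.preprocessing.PowerTransformer):

import numpy as np

def yeo_johnson(x, lam):
    """Apply the Yeo-Johnson transform Psi(lambda, x) of Eq. (1) element-wise."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    if abs(lam) > 1e-12:                 # x >= 0, lambda != 0
        out[pos] = ((x[pos] + 1.0) ** lam - 1.0) / lam
    else:                                # x >= 0, lambda == 0
        out[pos] = np.log1p(x[pos])
    if abs(lam - 2.0) > 1e-12:           # x < 0, lambda != 2
        out[~pos] = -((1.0 - x[~pos]) ** (2.0 - lam) - 1.0) / (2.0 - lam)
    else:                                # x < 0, lambda == 2
        out[~pos] = -np.log1p(-x[~pos])
    return out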
Figure 1: The Yeo-Johnson transformation applies a 1-D univariate transform to Gaussianize data. (a) Ψ(λ, ·) for various values of λ. (b) The YJ transformation applied to a skew-normal distribution.
Figure 1a shows the shape of the YJ function for various values of λ.
The Yeo-Johnson likelihood
Let us consider real-valued samples {x_i}_{i=1,...,n}, and let us apply the YJ transformation Ψ(λ, ·) to these samples to Gaussianize their distribution. The log-likelihood that {Ψ(λ, x_i)}_{i=1,...,n} comes from a Gaussian with mean µ and variance σ² is given by (derivation details are provided in Appendix A.1):
\[
\log \mathcal{L}_{\mathrm{YJ}}(\lambda, \sigma^2, \mu) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left[\Psi(\lambda, x_i) - \mu\right]^2 + (\lambda - 1)\sum_{i=1}^{n}\operatorname{sgn}(x_i)\log(|x_i| + 1).
\]
For a given λ, the log-likelihood is maximized for µ* = (1/n) Σ_{i=1}^{n} Ψ(λ, x_i) and σ²* = (1/n) Σ_{i=1}^{n} (Ψ(λ, x_i) − µ*)². Once we replace µ and σ² by µ* and σ²*, it becomes:
\[
\log \mathcal{L}_{\mathrm{YJ}}(\lambda) = -\frac{n}{2}\log\left(\sigma^{2*}_{\Psi(\lambda,\{x_i\})}\right) + (\lambda - 1)\sum_{i=1}^{n}\operatorname{sgn}(x_i)\log(|x_i| + 1) - \frac{n}{2}\log(2\pi),
\tag{2}
\]
see [52]. Maximizing the YJ log-likelihood is therefore a 1-dimensional problem for each feature. Once the optimal λ* is found, the transformed data Ψ(λ*, x_i) is usually renormalized by subtracting its empirical mean µ* and dividing by the square root of its empirical variance σ²*. Figure 1b shows an example of the YJ transformation applied to a skew-normal distribution. Note that in a typical application, the triplet (λ*, µ*, σ²*) is fitted on the training data only, and is then used to Gaussianize the test dataset during inference.
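For illustration, the profiled negative log-likelihood of Eq. (2) and the subsequent renormalization can be sketched as follows (this reuses the yeo_johnson helper above; names are ours, and the constant (n/2) log(2π) is dropped since it does not affect the optimum):

import numpy as np

def yj_neg_log_likelihood(x, lam):
    """Negative of the profiled log-likelihood in Eq. (2), up to an additive constant."""
    x = np.asarray(x, dtype=float)
    psi = yeo_johnson(x, lam)
    n = x.size
    return 0.5 * n * np.log(psi.var()) - (lam - 1.0) * np.sum(np.sign(x) * np.log1p(np.abs(x)))

def yj_normalize(x, lam):
    """Gaussianize x with a fitted lambda, then center and scale by the fitted (mu*, sigma2*)."""
    psi = yeo_johnson(np.asarray(x, dtype=float), lam)
    mu, sigma2 = psi.mean(), psi.var()
    return (psi - mu) / np.sqrt(sigma2), (lam, mu, sigma2)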
Minimization methods in dimension 1
As seen above, fitting a YJ transformation can be reduced to a 1D optimization problem. To tackle this problem, we introduce two standard 1D minimization methods: (i) Brent minimization [6] and (ii) exponential search [3].
Brent minimization [6] (not to be confused with the Brent-Dekker method, see [6], chapters 3 and 4) is a widely used method for 1D optimization. It is based on golden section search and successive parabolic interpolations, and does not require evaluating any derivatives. This algorithm is guaranteed to converge to a local minimum with superlinear convergence of order at least 1.3247. Standard implementations of the YJ transformation, in particular the scikit-learn implementation [36], are based on the Brent minimization method to minimize the negative log-likelihood provided by Eq. (2).
Exponential search [3] is a dichotomic algorithm designed for unbounded search spaces. The idea is to first find bounds, and then to perform a classic binary search within these bounds. This algorithm can be used to find the minimum of convex differentiable functions with linear convergence, as explained in Appendix B. In this work, we build on exponential search to propose a federated version of the YJ transform, for three main reasons: (i) it is more numerically stable than Brent minimization, as shown in Section 3 and Figure 2; (ii) it may conveniently be adapted to a federated setting, as shown in Section 4; and (iii) this latter federated adaptation offers strong privacy guarantees, as shown by Proposition 4.1.
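As a rough single-client illustration of this exponential search (variable and function names are ours, and the update rule mirrors the bracketing-then-bisection idea described above), the following sketch minimizes a convex differentiable 1D function using only the sign of its derivative:

def exponential_search_min(df, n_steps=40, x0=0.0):
    """Minimize a convex differentiable 1D function given its derivative df."""
    x, lo, hi = x0, float("-inf"), float("inf")
    for _ in range(n_steps):
        if df(x) > 0:                                        # minimizer lies to the left of x
            hi = x
            x = 0.5 * (lo + x) if lo > float("-inf") else min(2 * x, -1.0)
        else:                                                # minimizer lies to the right of x
            lo = x
            x = 0.5 * (hi + x) if hi < float("inf") else max(2 * x, 1.0)
    return x

# Example: minimize f(x) = (x - 3)^2 via its derivative 2(x - 3); converges to ~3.
x_star = exponential_search_min(lambda x: 2 * (x - 3.0))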
Secure Multiparty Computation
As illustrated by various gradient-based privacy attacks [56, 54], sensitive information on the clients' datasets can be leaked to the central server during an FL training. One way to mitigate this risk is to use Secure Multiparty Computation (SMC) protocols to hide individual contributions from the server. SMC enables one to evaluate functions with inputs distributed across different users without revealing intermediate results, and is often based on secret sharing. SMC protocols tailored for ML use-cases have recently been proposed [12, 14, 34, 39, 48, 33, 49, 41]. These protocols are either designed to enhance the privacy of FL trainings, or to perform secure inference, i.e. to enable the evaluation of a model trained privately on a server without revealing either the data or the model.
A popular FL algorithm relying on SMC is Secure Aggregation (SA) [4]. Schematically, in SA each client adds a random mask to their model update before sending it to the central server. These masks are constructed in such a way that they all sum to zero. Therefore, the central server cannot see the individual updates of the clients, but it can recover the sum of these updates by adding all the masked quantities it receives.
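The following toy sketch conveys the masking idea (it is a simplification of the actual Secure Aggregation protocol of [4], which additionally handles client dropouts and derives the pairwise masks from key agreement): each pair of clients shares a random mask that one adds and the other subtracts, so all masks cancel in the server-side sum.

import numpy as np

def masked_updates(updates, seed=0):
    """Toy pairwise additive masking: the masks cancel when the server sums all uploads."""
    rng = np.random.default_rng(seed)
    masked = [np.asarray(u, dtype=float).copy() for u in updates]
    k = len(updates)
    for i in range(k):
        for j in range(i + 1, k):
            m = rng.normal(size=masked[i].shape)  # mask shared by clients i and j
            masked[i] += m                        # client i adds the mask
            masked[j] -= m                        # client j subtracts it
    return masked                                 # sum(masked) equals sum(updates)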
More generally, an SMC routine schematically works as follows (we refer to Appendix D for further details). Let us consider the setting where K parties k = 1, ..., K want to compute g = f(h^(1), ..., h^(K)) for a known function f, where (h^(1), ..., h^(K)) denote private inputs. Each party k knows h^(k) and is not willing to share it. During the first step, secret sharing, each party splits its private input h^(k) into K secret shares h^(k)_1, ..., h^(k)_K, and sends the share h^(k)_{k'} to party k'. These secret shares are constructed in such a way that (i) knowing h^(k)_{k'} does not provide any information on the value of h^(k), and (ii) h^(k) can be reconstructed from the vector (h^(k)_1, ..., h^(k)_K). For simplicity, we denote by ⟦h^(k)⟧ = (h^(k)_1, ..., h^(k)_K) the vector of secret shares. In the second step, the computation, each party k' computes a quantity denoted g_{k'} using the secret shares it knows, along with intermediate quantities exchanged with the other parties. The way to compute g_{k'} depends on f and on the SMC protocol that is used, and is chosen so that g = f(h^(1), ..., h^(K)) can be reconstructed from (g_1, ..., g_K). Said otherwise, the g_{k'} are secret shares of g: ⟦g⟧ = (g_1, ..., g_K). Finally, during the reveal step, each party k reveals g_k to all other parties, and each party can reconstruct g from (g_1, ..., g_K).
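A minimal sketch of this share/compute/reveal pattern with additive secret sharing is given below (real SMC protocols operate over finite rings and involve actual communication between parties; the function names here are ours):

import numpy as np

def share(secret, n_parties, rng):
    """Secret-sharing step: split a value into n_parties additive shares."""
    shares = list(rng.normal(size=n_parties - 1))
    shares.append(secret - sum(shares))     # last share makes the shares sum to the secret
    return shares

def secure_sum(private_inputs, seed=0):
    """Compute the sum of private inputs without any party revealing its own input."""
    rng = np.random.default_rng(seed)
    k = len(private_inputs)
    all_shares = [share(h, k, rng) for h in private_inputs]                 # each party shares h^(k)
    partial = [sum(all_shares[i][j] for i in range(k)) for j in range(k)]   # computation step: g_k'
    return sum(partial)                                                     # reveal step: reconstruct g

# Example: three parties compute 1.0 + 2.5 + 4.0 = 7.5 without exchanging their raw inputs.
print(secure_sum([1.0, 2.5, 4.0]))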
Threat model
In this work, we consider an honest-but-curious setting [35]. Neither the clients nor the server will deviate from the agreed protocol, but each party can potentially try to infer as much information as possible from the data they see during the protocol. This setting is relevant for cross-silo FL, where participants are often large institutions whose reputation could be tarnished by malicious behaviour.
3 A novel method to optimize the Yeo-Johnson log-likelihood: EXPYJ
Algorithm 1 EXPYJ
Input: data {x_i}, total data size n, number of steps t_max
Initialize λ_{t=0} ← 0, λ⁺_{t=0} ← +∞, λ⁻_{t=0} ← −∞
Compute S_ϕ
for t = 1 to t_max do
    for g ∈ {Ψ(λ, ·), Ψ(λ, ·)², ∂_λΨ(λ, ·), ∂_λΨ(λ, ·)²} do
        Compute S_g
    end for
    ∆_t ← sgn[−n S_{∂_λΨ²} + 2 S_Ψ S_{∂_λΨ} + 2 S_ϕ (S_{Ψ²} − S_Ψ²/n)]
    λ_t, λ⁻_t, λ⁺_t ← EXPUPDATE(λ_{t−1}, λ⁻_{t−1}, λ⁺_{t−1}, ∆_t)
end for
λ* ← λ_{t_max}
Compute µ* = S_Ψ/n and σ²* = S_{Ψ²}/n − (µ*)²
Output: The fitted triplet (λ*, µ*, σ²*)
In this section, we leverage the convexity of the negative log-likelihood of the YJ transformation (see Proposition 3.1) to propose a new method to find the optimal λ using exponential search. While this method only offers linear convergence, compared to the super-linear convergence of the Brent minimization method, we demonstrate two of its advantages: (i) it is more numerically stable, and (ii) it is easily amenable to an FL setting with strong privacy guarantees. The method proposed in this section is based on the following result.
Proposition 3.1. The negative log-likelihood λ ↦ −log L_YJ(λ) of Eq. (2) is strictly convex.
The proof of Proposition 3.1 builds upon the work of [26], which shows that the negative log-likelihood of the Box-Cox transformation [5] is convex. The complete proof is deferred to Appendix C.
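As an informal numerical illustration of Proposition 3.1 (not a substitute for the proof in Appendix C), one can check that second-order finite differences of the negative log-likelihood are non-negative on a grid of λ values; this reuses the yj_neg_log_likelihood sketch above on toy data:

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_exponential(500) - 0.3                 # toy data with both signs
lams = np.linspace(-3.0, 5.0, 401)
nll = np.array([yj_neg_log_likelihood(x, l) for l in lams])
second_diff = nll[:-2] - 2.0 * nll[1:-1] + nll[2:]      # discrete curvature
print(second_diff.min())                                # should be >= 0 up to numerical error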
The exponential YJ algorithm
The pseudo-code of the proposed algorithm is presented in Algorithm 1, and relies on the exponential search presented in Algorithm 2 (cf. Appendix B for more details on exponential search). An illustration of EXPYJ is shown in Figure 6 in Appendix B. Due to the strict convexity of the negative log-likelihood of the YJ transformation, we may perform the exponential search described in Section 2 and Appendix B. To do so, it is enough to obtain the sign of the derivative. Let ∂_λΨ(λ, ·)² = 2Ψ(λ, ·) ∂_λΨ(λ, ·) and ϕ(x) = sgn(x) log(|x| + 1). Further, for g ∈ {Ψ(λ, ·), ∂_λΨ(λ, ·), Ψ(λ, ·)², ∂_λΨ(λ, ·)², ϕ}, let us define S_g := Σ_{i=1}^{n} g(x_i). The derivative of the log-likelihood is available in closed form (see Appendix A.3):
\[
\partial_\lambda \log \mathcal{L}_{\mathrm{YJ}} = -\frac{n}{2}\,\frac{S_{\partial_\lambda\Psi^2} - 2\,S_{\Psi}\,S_{\partial_\lambda\Psi}/n}{S_{\Psi^2} - S_{\Psi}^2/n} + S_{\phi}.
\]
Notice that S_{Ψ²} − S_Ψ²/n can be expressed as a variance, hence is non-negative. We may therefore obtain sgn[∂_λ log L_YJ] while avoiding any division by computing
\[
\operatorname{sgn}\left[\partial_\lambda \log \mathcal{L}_{\mathrm{YJ}}\right] = \operatorname{sgn}\left[-n\,S_{\partial_\lambda\Psi^2} + 2\,S_{\Psi}\,S_{\partial_\lambda\Psi} + 2\,S_{\phi}\left(S_{\Psi^2} - S_{\Psi}^2/n\right)\right].
\tag{3}
\]
Avoiding this division is crucial to make the overall procedure more numerically stable, as explained below, and eases the use of SMC routines.
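Putting the pieces together, here is a single-client sketch of this procedure (our own illustrative code, reusing the yeo_johnson helper above; in particular, ∂_λΨ is approximated by a central finite difference rather than the closed form used in the paper, and only the aggregated sums S_g would be exchanged through SMC in the federated version of Section 4):

import numpy as np

def expyj_fit(x, t_max=40, eps=1e-6):
    """Fit lambda by exponential search driven by the division-free sign test of Eq. (3)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    s_phi = np.sum(np.sign(x) * np.log1p(np.abs(x)))                     # S_phi
    lam, lo, hi = 0.0, float("-inf"), float("inf")
    for _ in range(t_max):
        psi = yeo_johnson(x, lam)
        dpsi = (yeo_johnson(x, lam + eps) - yeo_johnson(x, lam - eps)) / (2.0 * eps)
        s_psi, s_psi2 = psi.sum(), np.sum(psi ** 2)                      # S_Psi, S_{Psi^2}
        s_dpsi, s_dpsi2 = dpsi.sum(), np.sum(2.0 * psi * dpsi)           # S_{dPsi}, S_{d(Psi^2)}
        # Sign of the log-likelihood derivative, Eq. (3): no division needed.
        delta = np.sign(-n * s_dpsi2 + 2.0 * s_psi * s_dpsi + 2.0 * s_phi * (s_psi2 - s_psi ** 2 / n))
        if delta > 0:                      # log-likelihood increasing: lambda* is to the right
            lo = lam
            lam = 0.5 * (hi + lam) if hi < float("inf") else max(2.0 * lam, 1.0)
        else:                              # log-likelihood decreasing: lambda* is to the left
            hi = lam
            lam = 0.5 * (lo + lam) if lo > float("-inf") else min(2.0 * lam, -1.0)
    psi = yeo_johnson(x, lam)
    return lam, psi.mean(), psi.var()      # the fitted triplet (lambda*, mu*, sigma2*)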
Figure 2: Comparison of EXPYJ and scikit-learn. Left: For each of the 106 features (see Appendix E.1), we compute the relative difference δλ = |λ_EXPYJ − λ_sk|/|λ_sk| and plot its median, maximum, and 25%-75% and 10%-90% percentiles across the 106 features. Right: Negative log-likelihood of the YJ transformation for the mean cell area of each sample of the Breast Cancer dataset. Full orange bars correspond to values of λ for which the likelihood computed using scikit-learn returns −∞, as σ²_λ({x_i}) is equal to 0 up to float-64 machine precision. Dotted lines correspond to the λ found using Brent minimization or EXPYJ with one client.
Algorithm 2 EXPUPDATE
Input: λ, λ⁺, λ⁻, ∆ ∈ {−1, 1}
if ∆ = 1 then
    λ⁻ ← λ
    λ ← (λ⁺ + λ)/2 if λ⁺ < ∞, else λ ← max(2λ, 1)
else
    λ⁺ ← λ
    λ ← (λ⁻ + λ)/2 if λ⁻ > −∞, else λ ← min(2λ, −1)
end if
Output: Updated λ, λ⁺, λ⁻
Accuracy of EXPYJ
We check the accuracy of EXPYJ on the datasets presented in Appendix E.1. In particular, we compare the results provided by EXPYJ with the outputs of the scikit-learn algorithm based on Brent minimization. For 2 of the 108 features present in the datasets, the scikit-learn implementation leads to numerical instabilities, discussed hereafter. Therefore, we focus our comparison on the 106 remaining features, which we aggregate regardless of the dataset. Figure 2 reports the relative difference δλ between the results obtained by EXPYJ and by the scikit-learn implementation as a function of the number of iterations t_max (as defined in Algorithm 1). These results show that this relative difference is of order less than 10⁻⁶ when t_max = 40.
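A quick way to reproduce this kind of comparison on toy data is to fit scikit-learn's PowerTransformer and compare its fitted lambdas_ to the output of the expyj_fit sketch above (which is our illustrative code, not the paper's reference implementation):

import numpy as np
from sklearn.preprocessing import PowerTransformer

x = np.random.default_rng(0).gamma(shape=2.0, scale=3.0, size=1000)   # skewed toy feature
lam_sk = PowerTransformer(method="yeo-johnson").fit(x.reshape(-1, 1)).lambdas_[0]
lam_exp, _, _ = expyj_fit(x, t_max=40)
print(abs(lam_exp - lam_sk) / abs(lam_sk))                             # relative difference, expected to be small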
Numerical stability of EXPYJ
Our experiments demonstrate that EXPYJ is numerically more stable than Brent minimization. Indeed, for some values of λ and some datasets {x_i}, the transformation Ψ(λ, ·) concentrates all data points in a small interval such that the values of Ψ(λ, {x_i}) are all