Various protocols based on Secure Multiparty Computation (SMC) (see Section 2 for more details),
such as Secure Aggregation [4], can mitigate this shortcoming by disclosing only the sum of the
gradients from all clients to the server, without disclosing each gradient individually.
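The core idea of such secure-summation protocols can be illustrated with pairwise additive masking: each pair of clients shares a random mask that one adds and the other subtracts, so the masks cancel in the server-side sum. The toy Python sketch below shows only this cancellation idea under simplifying assumptions (integer-encoded inputs, no key agreement or dropout handling, which the actual protocol of [4] provides); all names and the modulus are illustrative.

```python
import random

PRIME = 2**61 - 1  # arithmetic modulo a large prime

def mask_inputs(values):
    """Return per-client masked values whose sum equals sum(values) mod PRIME."""
    n = len(values)
    masked = [v % PRIME for v in values]
    for i in range(n):
        for j in range(i + 1, n):
            # Pairwise shared mask r_ij: client i adds it, client j subtracts it,
            # so it cancels in the aggregate but hides each individual value.
            r = random.randrange(PRIME)
            masked[i] = (masked[i] + r) % PRIME
            masked[j] = (masked[j] - r) % PRIME
    return masked

gradients = [3, 10, 29]            # toy integer-encoded client gradients
masked = mask_inputs(gradients)
# The server only sees `masked`; summing recovers the aggregate, not the parts.
assert sum(masked) % PRIME == sum(gradients) % PRIME
```

Each individual masked value is uniformly distributed, so the server learns nothing beyond the sum.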
An additional constraint is that data might present statistical heterogeneity across clients, i.e. the
local clients' data distributions may not be identical. In the case of medical applications, such
heterogeneity may be caused e.g. by environmental variations or differences in the material that was
used for acquisition [43, 47, 2]. While different ways of adapting federated training algorithms have
been proposed to automatically tackle heterogeneity [28, 29, 24], these solutions do not address data
harmonization and normalization prior to FL training.
Preprocessing in ML
Data preprocessing is a crucial step in many ML applications, leading to
important performance gains. Common preprocessing methods include, among others, data whitening,
principal component analysis (PCA) [22], and zero component analysis [27, 20, 46]. However, linear
normalization methods might not suffice when the original data distribution is highly non-Gaussian.
For tabular and time series data, a popular approach to Gaussianize the marginal distributions is to
apply feature-wise non-linear transformations. Two commonly used parametric methods are the
Box-Cox transformation [5] and its extension, the Yeo-Johnson (YJ) transformation [52]. Both have
been used in multiple applications, such as climate and weather forecasting [53, 50, 51], economics [13],
and genomic studies [7, 58, 9].
Problem and contributions
In this paper, we investigate the problem of data normalization in
the cross-silo FL setting, by exploring how to apply the YJ transformation to a distributed dataset.
This problem arises frequently in medical cross-silo FL, e.g. when trying to jointly train models on
genetic data (see e.g. [19, 57]). Due to data heterogeneity, no single client can act as a reference
client: indeed, there is no guarantee that transformation parameters fitted on a single client would
be relevant for other clients’ data. Hence, it is necessary to fit normalization methods on the full
federated dataset. Moreover, in this setting, data privacy is of paramount importance, and therefore
FL protocols should be carefully designed. Our main contributions to this problem are as follows:
1. We prove that the negative YJ log-likelihood is convex (Section 3), which is, to the best of our knowledge, a novel result.
2. Building on this property, we introduce EXPYJ, a method to fit the YJ transformation based on exponential search (Section 3). We numerically show that this method is more stable than standard approaches that fit the YJ transformation with the Brent minimization method [6].
3. We propose SECUREFEDYJ (Section 4), a secure way to extend EXPYJ to the cross-silo FL setting using SMC. We show that SECUREFEDYJ does not leak any information on the datasets apart from what is leaked by the parameters minimizing the YJ negative log-likelihood (Section 4 and Proposition 4.1). By construction, SECUREFEDYJ provides the same results as the pooled-equivalent EXPYJ, regardless of how the data is split across the clients. We check this property in numerical experiments (Section 4). The core ideas behind the resulting algorithm, SECUREFEDYJ, are summarised in Figure 7.
Finally, we illustrate our contributions in numerical applications on synthetic and genomic data in
Section 5.
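Contribution 2 rests on the convexity result of contribution 1: a one-dimensional convex objective can be minimized by exponentially widening a bracket until its gradient changes sign, then narrowing the bracket by bisection on that sign. The Python sketch below illustrates this generic scheme on a toy gradient; it is not the actual EXPYJ criterion, which is specified in Section 3, and the function names are ours.

```python
def exp_search_min(grad, tol=1e-8):
    """Minimize a differentiable convex 1-D function given its gradient,
    by exponential search for a sign change followed by bisection.
    Illustrative sketch only; EXPYJ's actual update rule is in Section 3."""
    # Exponential search: widen [lo, hi] until grad(lo) <= 0 <= grad(hi).
    lo, hi = -1.0, 1.0
    while grad(lo) > 0:
        lo *= 2.0
    while grad(hi) < 0:
        hi *= 2.0
    # Bisection on the sign of the gradient (valid because grad is monotone
    # for a convex objective).
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if grad(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Example: minimizing f(x) = (x - 3)^2 via its gradient 2(x - 3).
x_star = exp_search_min(lambda x: 2.0 * (x - 3.0))
```

Only the sign of the gradient is needed at each step, which is what makes this style of search amenable to secure aggregation of per-client quantities.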
2 Background
The Yeo-Johnson transformation
The YJ transformation [52] was introduced in order to Gaussianize data that can be either positive
or negative. It was proposed as a generalization of the Box-Cox transformation [5], that only applies
to non-negative data. The YJ transformation consists in applying to each feature a monotonic function
Ψ(λ, ·) parametrized by a scalar λ, independently of the other features. Thus, there are as many λ's
as there are features. For a real number x, Ψ(λ, x) is defined as:
\[
\Psi(\lambda, x) =
\begin{cases}
\left[(x+1)^{\lambda} - 1\right]/\lambda, & \text{if } x \geq 0,\ \lambda \neq 0,\\
\ln(x+1), & \text{if } x \geq 0,\ \lambda = 0,\\
-\left[(-x+1)^{2-\lambda} - 1\right]/(2-\lambda), & \text{if } x < 0,\ \lambda \neq 2,\\
-\ln(-x+1), & \text{if } x < 0,\ \lambda = 2.
\end{cases}
\tag{1}
\]
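Eq. (1) translates directly into code. The minimal Python sketch below implements the four cases for a single scalar; the function name is ours, and in practice a separate λ would be fitted per feature.

```python
import math

def yeo_johnson(lmbda, x):
    """Yeo-Johnson transform Psi(lambda, x) of Eq. (1) for a scalar x."""
    if x >= 0:
        if lmbda != 0:
            return ((x + 1.0) ** lmbda - 1.0) / lmbda
        return math.log(x + 1.0)          # limit case lambda = 0
    else:
        if lmbda != 2:
            return -((-x + 1.0) ** (2.0 - lmbda) - 1.0) / (2.0 - lmbda)
        return -math.log(-x + 1.0)        # limit case lambda = 2

# lambda = 1 reduces to the identity on both branches:
assert abs(yeo_johnson(1.0, 3.0) - 3.0) < 1e-12
assert abs(yeo_johnson(1.0, -2.5) + 2.5) < 1e-12
```

Note that Ψ is monotonic in x for any fixed λ, and the λ = 0 and λ = 2 branches are the continuous limits of the neighbouring cases.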