
dataset. Thus, previous algorithms usually do not scale well to, or perform well in, big-data scenarios.²
This leads us to consider the following question:
How can we train machine learning models with communication complexity sublinear in the dataset size, without sacrificing performance, in the vertical federated learning (VFL) setting?
In this paper, we try to answer this question, and our method is based on the notion of coreset [27, 22, 23]. Roughly speaking, a coreset can be viewed as a small data summary of the original dataset, such that a machine learning model trained on the coreset performs similarly to the model trained on the full dataset. Therefore, as long as we can obtain a coreset in the VFL setting in a communication-efficient way, we can run existing algorithms on the coreset instead of the full dataset.
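To make "performs similarly" precise, it helps to recall the standard ε-coreset guarantee from the literature [22, 23]; this is a sketch in our own notation (the formal statements in this paper, e.g., those behind Theorem 2.5, may differ in details such as the query space and weighting). A weighted subset S ⊆ X with weights w: S → ℝ≥0 is an ε-coreset if
\[
(1-\varepsilon)\,\mathrm{cost}(X,\theta)\;\le\;\sum_{x\in S} w(x)\,\mathrm{cost}(x,\theta)\;\le\;(1+\varepsilon)\,\mathrm{cost}(X,\theta)\qquad\text{for every candidate solution }\theta,
\]
where, e.g., for k-means, θ ranges over all sets of k centers and cost(x, θ) is the squared distance from x to its nearest center in θ.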
Our contribution
We study communication-efficient methods for vertical federated learning with an emphasis on scalability, and design a general paradigm through the lens of coresets. Concretely, we make the following key contributions:
1. We design a unified framework for coreset construction in the vertical federated learning setting (Section 3), which can help reduce the communication complexity (Theorem 2.5).
2. We study the regularized linear regression (Definition 2.1) and k-means (Definition 2.2) problems in the VFL setting, and apply our unified coreset construction framework to them. We show that we can obtain an ε-approximation for these two problems using only o(n) (sublinear) communication under mild conditions, where n is the size of the dataset (Sections 4 and 5); the standard forms of both objectives are recalled right after this list.
3. We conduct numerical experiments to validate our theoretical results. They corroborate our finding that using coresets can drastically reduce the communication complexity while maintaining the quality of the solution (Section 6). Moreover, compared to uniform sampling, applying our coresets achieves a better solution with the same or smaller communication complexity.
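For orientation, the standard forms of the two objectives are as follows (a sketch only; Definitions 2.1 and 2.2 may differ in details such as the choice of regularizer, norms, or weighting):
\[
\min_{\theta\in\mathbb{R}^d}\;\|A\theta-b\|_2^2+\lambda\|\theta\|_2^2\qquad\text{(regularized linear regression, e.g., ridge regression),}
\]
\[
\min_{C\subset\mathbb{R}^d,\,|C|=k}\;\sum_{i=1}^{n}\min_{c\in C}\|x_i-c\|_2^2\qquad\text{(k-means).}
\]
In the VFL setting, the features of each data point are split across parties, so even evaluating these objectives requires communication.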
1.1 More related works
Federated learning
Federated learning was introduced by McMahan et al. [54] and has received increasing attention in recent years. Many works study the horizontal federated learning (HFL) setting, such as algorithms with multiple local update steps [54, 17, 39, 25, 56, 77].
There are also many algorithms with communication compression [38, 55, 47, 45, 24, 46, 60, 21, 76, 61, 78] and algorithms with privacy preservation [70, 30, 79, 64, 48].
Vertical federated learning
Due to the difficulties of VFL, prior work designed VFL algorithms for particular machine learning models, including linear regression [50, 74], logistic regression [75, 73, 29], gradient boosting trees [63, 11, 10], and k-means [19]. For k-means, Ding et al. [19] proposed an algorithm that computes the global centers based on the product of local centers, which requires O(nT) communication complexity. For linear regression, Liu et al. [50] and Yang et al. [74] used uniform sampling to obtain unbiased gradient estimates and improve the communication efficiency, but the performance may not match that of training without sampling. Yang et al. [73] also applied uniform sampling to a quasi-Newton algorithm and improved the communication complexity for logistic regression. Other aspects of VFL have also been studied, e.g., how to align the data among different parties [62], how to adopt asynchronous training [9, 26], and how to defend against attacks in VFL [49, 53]. In this work, we aim to develop communication-efficient algorithms that handle large-scale VFL problems.
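To make the uniform-sampling baseline above concrete, the following is a minimal, hypothetical sketch in Python/NumPy; it is not the algorithm of [50], [74], or [73], and the two-party roles, the shared sampling seed, and all names are our own assumptions. It shows how parties holding disjoint feature blocks can form an unbiased stochastic gradient for least squares while exchanging only O(batch size) numbers per step, rather than O(n):

import numpy as np

rng = np.random.default_rng(0)

# Toy vertically partitioned data: party A holds X_A; party B holds X_B and the labels y.
n, d_A, d_B = 10_000, 5, 3
X_A = rng.normal(size=(n, d_A))
X_B = rng.normal(size=(n, d_B))
y = X_A @ rng.normal(size=d_A) + X_B @ rng.normal(size=d_B) + 0.1 * rng.normal(size=n)

theta_A, theta_B = np.zeros(d_A), np.zeros(d_B)
batch_size, lr = 64, 0.1

for _ in range(200):
    # Both parties draw the same uniformly random indices (e.g., from a shared seed).
    idx = rng.choice(n, size=batch_size, replace=False)
    # Each party computes partial predictions only for the sampled rows, so only
    # O(batch_size) numbers are exchanged per step instead of O(n).
    partial_A = X_A[idx] @ theta_A               # sent by party A
    partial_B = X_B[idx] @ theta_B
    residual = partial_A + partial_B - y[idx]    # formed by the label holder, sent back
    # Unbiased stochastic gradient of the least-squares loss (1/(2n)) * ||X @ theta - y||^2.
    grad_A = X_A[idx].T @ residual / batch_size
    grad_B = X_B[idx].T @ residual / batch_size
    theta_A -= lr * grad_A
    theta_B -= lr * grad_B

Exchanging raw partial predictions is only for illustration; practical VFL algorithms typically add protections (e.g., encryption or perturbation), which this sketch omits.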
Coreset
Coresets have been applied to a large family of problems in machine learning and statistics, including clustering [22, 7, 31, 15, 16], regression [20, 43, 6, 13, 34, 12], low-rank approximation [14], and mixture models [52, 33]. Specifically, Chhaya et al. [12] investigated coreset construction for regularized regression with different norms. Feldman and Langberg [22] and Braverman et al. [7] proposed an importance sampling framework for coreset construction for clustering (including k-means). The coreset size for k-means clustering has been improved by several follow-up works [31, 15, 16] to Õ(kε⁻⁴), and Cohen-Addad et al. [16] proved a lower bound of size Ω(ε⁻²k). Due to the mergeable
²In our numerical experiments (Section 6), we provide some results to justify this claim.