Federated Boosted Decision Trees with Dierential Privacy
Samuel Maddock
University of Warwick
Graham Cormode
Meta AI
Tianhao Wang
University of Virginia
Carsten Maple
University of Warwick
Somesh Jha
University of Wisconsin-Madison

Full version of a paper to appear at ACM CCS'22. Author correspondence to s.maddock@warwick.ac.uk. Tianhao Wang and Somesh Jha: work was done while part-time at Meta AI.
ABSTRACT
There is great demand for scalable, secure, and efficient privacy-preserving machine learning models that can be trained over distributed data. While deep learning models typically achieve the best results in a centralized non-secure setting, different models can excel when privacy and communication constraints are imposed. Instead, tree-based approaches such as XGBoost have attracted much attention for their high performance and ease of use; in particular, they often achieve state-of-the-art results on tabular data. Consequently, several recent works have focused on translating Gradient Boosted Decision Tree (GBDT) models like XGBoost into federated settings, via cryptographic mechanisms such as Homomorphic Encryption (HE) and Secure Multi-Party Computation (MPC). However, these do not always provide formal privacy guarantees, or consider the full range of hyperparameters and implementation settings. In this work, we implement the GBDT model under Differential Privacy (DP). We propose a general framework that captures and extends existing approaches for differentially private decision trees. Our framework of methods is tailored to the federated setting, and we show that with a careful choice of techniques it is possible to achieve very high utility while maintaining strong levels of privacy.
KEYWORDS
Gradient Boosting, Differential Privacy, Federated Learning
1 INTRODUCTION
It is well known that machine learning models can leak private information about individuals in the training set [14, 58]. Differential privacy (DP) [21] is a popular definition that has been developed to mitigate such privacy risks and has become the dominant notion of privacy in recent years. Much of the current research on private machine learning is focused on training deep learning models with differential privacy [1, 35, 41, 60]. DP is often combined with federated learning, where data resides on client devices, and only small information about model updates is collected from clients, in order to further minimize the privacy risk [36].
While deep learning models are powerful for a range of real-world tasks in a centralized setting, they are sometimes beaten by "simpler" models on tabular datasets. One such competitor is Gradient Boosted Decision Trees (GBDTs) [22, 31, 32, 59]. GBDT methods build an ensemble of weak decision trees that incrementally correct for past mistakes in training to improve predictions. Many GBDT frameworks such as XGBoost [16], LightGBM [38], and CatBoost [19] have seen widespread industry adoption [6, 11, 40, 44]. GBDT methods are an attractive alternative to deep learning due to their speed, scalability, ease of use, and impressive performance on tabular datasets.
Recent works have studied GBDT implementations such as XGBoost under secure training in the federated setting [17, 18, 24, 48]. These methods typically rely on cryptographic techniques such as Homomorphic Encryption (HE) or Secure Multi-Party Computation (MPC). While this allows secure joint training of a GBDT model without any participant directly releasing their data, the end model may not necessarily be private and will not guarantee formal differential privacy (DP) [23]. For instance, in the case of decision trees, split decisions in a tree can directly reveal sensitive information regarding the training set. Moreover, reliance on heavyweight cryptographic techniques such as HE or MPC often makes methods computationally intensive or requires a large number of communication rounds, making them impractical to scale beyond more than a few participants [17, 42].
In parallel, many works have studied decision tree models under the central model of DP [25, 52, 65]. Most studies focus on training random forest (RF) models, and there has been little research exploring the trade-offs between gradient boosting and DP; those works that do often use central DP mechanisms that cannot easily be extended to federated settings [33]. It therefore remains an open problem to implement GBDTs in the federated DP setting, and to show how to obtain utility comparable to their centralized non-private counterparts.
Our focus is on DP-GBDT methods that operate within the federated setting via lightweight MPC methods such as secure aggregation [7, 10]. This setting has recently risen to prominence, as it promises an attractive trade-off between the computational efficiency of central DP techniques and the security of cryptographic methods. Recent federated works that consider GBDTs have proposed methods under the local model of DP, but due to the use of local noise they incur a significant loss in utility [43, 61, 62].
In this paper, we bring together existing methods under a unified framework where we propose techniques to satisfy DP that are well suited to the federated setting. We find that by dissecting the GBDT algorithm into its constituent parts and carefully considering the options for each component of the algorithm, we can identify specific combinations that achieve the best balance of privacy and utility. We also emphasise variants that can train such private GBDT models in only a small number of communication rounds, which is of particular importance to the federated setting.
Our high-level finding is that it is possible to achieve high performance with GBDT models, even comparable to that of non-private methods. In order to do so, one must allocate privacy budget to the quantities that are most important for the learning process. For example, we show that spending such budget on computing split decisions of trees is not as important as spending it on the leaf
weights. Using our findings under the efficient privacy accounting of Rényi Differential Privacy (RDP) leads to performance that is far closer to the non-private setting than seen in previous works.
Our main contributions are as follows:
• A clear and concise framework for differentially private gradient boosting with decision trees. We deconstruct the GBDT algorithm into five main components, showing how to federate each component while satisfying Rényi Differential Privacy (RDP). We present a unifying approach, capturing recently proposed DP tree-based models as special cases.
• A new set of techniques for improving the utility of private federated GBDT models. For example, we propose a private method for discretising continuous features that makes as much use of the private training information as possible, incurring little additional privacy cost. Additionally, we explore batching weight updates, showing it is possible to maintain competitive model performance while reducing the number of communication rounds needed.
• An extensive set of experiments on a range of benchmark datasets exploring the trade-offs between various options in our framework. By evaluating the choices in each of the components of our framework, we find that a clear dominant approach is formed by adapting and simplifying the GBDT algorithm while combining it with our improved split candidate method. We show it is possible to achieve higher utility than state-of-the-art (SOTA) DP-RF and DP-GBDT methods on a range of datasets with reasonable levels of privacy.
• We provide open-source code at https://github.com/Samuel-Maddock/federated-boosted-dp-trees
Roadmap. In Section 2 we outline the technical preliminaries required to understand differentially private GBDTs, before covering related work in Section 3. In Sections 4 and 5 we describe our framework for DP-GBDTs, fitting existing methods within it and proposing combinations to study. In Section 6 we provide extensive experimental evaluations, comparing our methods to existing baselines within our framework, before concluding in Section 7.
2 PRELIMINARIES
2.1 Gradient Boosted Decision Trees (GBDT)
Tree-based ensemble methods form a collection of $T$ decision trees that predict $\hat{y}_i$ for each input $\boldsymbol{x}_i$:
$$\hat{y}_i := f(\boldsymbol{x}_i) = \sum_{t=1}^{T} f_t(\boldsymbol{x}_i)$$
For a specific tree $f_t$, let $L_t$ denote the number of leaf nodes. Each leaf node of a tree contains a weight, which will be the output of the tree for observations that are classified into that leaf. We denote $\boldsymbol{w}^{(t)} \in \mathbb{R}^{L_t}$ as the vector of leaf node weights for a tree $f_t$.
GBDT methods train trees sequentially, making use of past predictions to correct for mistakes. This is in contrast to random forest (RF) methods that train $T$ trees in parallel, averaging the weights of trees for the final prediction.
For a set of examples $D = \{(\boldsymbol{x}_i, y_i)\}_{i=1}^{n}$ with corresponding predictions $\{\hat{y}_i\}_{i=1}^{n}$, the GBDT objective function is defined as
$$\mathcal{L}(f) = \sum_{i=1}^{n} \ell(y_i, \hat{y}_i) + \sum_{t=1}^{T} \Omega(f_t) \quad (1)$$
where $\ell$ is a twice-differentiable loss function, typically the cross-entropy loss (binary classification) or squared-error loss (regression). The term $\Omega(f_t) = \gamma L_t + \frac{\lambda}{2}\|\boldsymbol{w}^{(t)}\|_2^2$ is a form of regularisation such that $\gamma \geq 0$ penalises the size of the tree and $\lambda \geq 0$ penalises the magnitude of weights. This regularisation term is present in the popular XGBoost algorithm but is often omitted in other GBDT variants; we adopt it for our experimental study.
Equation (1) evades direct optimization. Rather, GBDT models are trained sequentially based on previous models. At step $t$ we can define the model prediction $\hat{y}_i^{(t)} = \sum_{k=1}^{t} f_k(\boldsymbol{x}_i) = \hat{y}_i^{(t-1)} + f_t(\boldsymbol{x}_i)$. The objective for optimising $f_t$ becomes
$$\mathcal{L}^{(t)}(f_t) = \sum_{i=1}^{n} \ell\big(y_i, \hat{y}_i^{(t-1)} + f_t(\boldsymbol{x}_i)\big) + \Omega(f_t) \quad (2)$$
For step $t$ we are concerned with finding a tree $f_t$ that minimises (2). Since $f_t$ is not differentiable we can use a Taylor approximation. Taking the first-order approximation leads to the standard Gradient Boosting Machine (GBM) method. Taking a second-order approximation leads to Newton boosting as used by XGBoost [16].
When taking a first-order approximation we obtain
$$\mathcal{L}^{(t)}(f_t) \approx \sum_{i=1}^{n} \Big[\ell\big(y_i, \hat{y}_i^{(t-1)}\big) + g_i^{(t)} f_t(\boldsymbol{x}_i)\Big] + \Omega(f_t)$$
where $g_i^{(t)} = \frac{\partial}{\partial \hat{y}_i^{(t-1)}} \ell\big(y_i, \hat{y}_i^{(t-1)}\big)$ is the gradient of the loss function at the start of step $t$. By considering the index set of examples mapped to leaf node $l$, i.e., $I_l = \{i \mid \boldsymbol{x}_i \text{ belongs to leaf } l \text{ of } f_t\}$, one can show by expanding the above and differentiating with respect to $w_l^{(t)}$ that the optimal leaf weight is
$$w_l^{(t)} = -\frac{\sum_{i \in I_l} g_i^{(t)}}{|I_l| + \lambda} \quad (3)$$
We denote this as a gradient weight update. Taking a second-order approximation of (2) instead gives
$$\mathcal{L}^{(t)}(f_t) \approx \sum_{i=1}^{n} \Big[\ell\big(y_i, \hat{y}_i^{(t-1)}\big) + g_i^{(t)} f_t(\boldsymbol{x}_i) + h_i^{(t)} \frac{f_t^2(\boldsymbol{x}_i)}{2}\Big] + \Omega(f_t)$$
where $h_i^{(t)} = \frac{\partial^2}{\partial (\hat{y}_i^{(t-1)})^2} \ell\big(y_i, \hat{y}_i^{(t-1)}\big)$ is the Hessian of the loss at the start of step $t$. As before, one can show that the optimal weights this time are
$$w_l^{(t)} = -\frac{\sum_{i \in I_l} g_i^{(t)}}{\sum_{i \in I_l} h_i^{(t)} + \lambda} \quad (4)$$
which we denote as a Newton weight update. Substituting optimal weights from either the first or second-order approximation into Equation (2) leads to quantities that can be used to measure a split score. In other words, when considering a split option that partitions examples into disjoint index sets $I = I_1 \cup I_2$, the split score is a measure of how useful a split is for classification. The split score for Newton updates can be computed as
$$SS(I_1, I_2) = \frac{1}{2}\left[\frac{\big(\sum_{i \in I_1} g_i\big)^2}{\sum_{i \in I_1} h_i + \lambda} + \frac{\big(\sum_{i \in I_2} g_i\big)^2}{\sum_{i \in I_2} h_i + \lambda} - \frac{\big(\sum_{i \in I} g_i\big)^2}{\sum_{i \in I} h_i + \lambda}\right] - \gamma \quad (5)$$
In practice, to form such split options, GBDT methods often discretize continuous features (e.g., via quantiles) into $Q$ split candidates. In order to handle categorical features, GBDT methods like XGBoost typically transform them, e.g., via a one-hot encoding. In either case, this leads to splits of the form $I_{\leq} = \{i : x_{ij} \leq s^j_q\}$ for a split candidate $s^j_q$. Equation (5) can then be used to greedily choose the feature split-candidate pair with the largest score when growing the tree structure during training.
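To make these quantities concrete, the following is a minimal, non-private sketch of the Newton weight update (Equation (4)) and split score (Equation (5)); the function names and NumPy-array interface are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def newton_leaf_weight(g, h, lam=1.0):
    # Equation (4): w_l = -(sum of gradients) / (sum of Hessians + lambda)
    return -np.sum(g) / (np.sum(h) + lam)

def newton_split_score(g, h, left_mask, lam=1.0, gamma=0.0):
    # Equation (5): score of splitting a node's examples (g, h) into
    # I_1 (left_mask) and I_2 (~left_mask), minus the size penalty gamma.
    def term(gs, hs):
        return np.sum(gs) ** 2 / (np.sum(hs) + lam)
    return 0.5 * (term(g[left_mask], h[left_mask])
                  + term(g[~left_mask], h[~left_mask])
                  - term(g, h)) - gamma
```

Note that both functions depend on the data only through sums of gradients and Hessians, which is what makes the secure-aggregation approach of Section 4 natural.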
2.2 Differential Privacy
Differential Privacy (DP) is a formal definition of privacy that guarantees the output of a data analysis does not depend significantly on a single individual's data item. Such a definition can be based on the notion of privacy loss.
Definition 2.1 (Privacy Loss Random Variable). Given a randomised mechanism $\mathcal{M} : \mathcal{X} \to \mathcal{Y}$, we define the privacy loss random variable $L_{\mathcal{M},x,x'}$ over "neighbouring" datasets $x, x' \in \mathcal{X}$ as
$$L_{\mathcal{M},x,x'} = \log \frac{p_{\mathcal{M}(x)}(X)}{p_{\mathcal{M}(x')}(X)}$$
where $X \sim \mathcal{M}(x)$ and $p_{\mathcal{M}(\cdot)}$ is the density of the mechanism applied to the respective dataset.
We take neighbouring datasets $x, x' \in \mathcal{X}$ to mean that $x$ and $x'$ differ on a single individual. The privacy loss allows us to succinctly describe differential privacy.
Denition 2.2 (Dierential Privacy in terms of privacy loss [
5
]). We
say that a randomised mechanism
M
:
X → Y
satises
(𝜖, 𝛿)
-DP
if for any adjacent datasets 𝑥, 𝑥 ∈ X
P(𝐿M,𝑥,𝑥𝜖) 𝛿
The privacy parameter
𝜖
is referred to as the privacy budget. When
𝛿=
0, we say that
M
satises
𝜖
-DP. In this work we only consider
privacy guarantees where 𝛿>0i.e., the case of approximate-DP.
While $(\epsilon, \delta)$-DP is a useful definition of privacy, it does not allow us to tightly quantify the privacy loss from the composition of multiple mechanisms [37]. This is particularly important in machine learning, where we wish to use mechanisms many times over the same dataset to train models. Instead, the notion of Rényi Differential Privacy (RDP) provides a succinct way to track the privacy loss from a composition of multiple mechanisms by representing privacy guarantees through moments of the privacy loss.
Definition 2.3 (Rényi Differential Privacy [49]). A mechanism $\mathcal{M} : \mathcal{X} \to \mathcal{Y}$ is said to satisfy $(\alpha, \tau)$-RDP if the following holds for any two adjacent datasets $x, x' \in \mathcal{X}$:
$$\mathbb{E}\Big[e^{(\alpha - 1) L_{\mathcal{M},x,x'}}\Big] \leq \exp\big((\alpha - 1)\tau\big)$$
One of the simplest and most widely-used mechanisms to guarantee $(\alpha, \tau)$-RDP is the Gaussian mechanism.
Fact 2.1 (Gaussian Mechanism [21, 49]). The Gaussian mechanism $\mathcal{M} : \mathcal{X} \to \mathbb{R}^m$ of the form
$$\mathcal{M}(x) = q(x) + N\big(0, \Delta_2(q)^2 \sigma^2 I_m\big)$$
satisfies $(\alpha, \tau)$-RDP with $\tau = \frac{\alpha}{2\sigma^2}$ and
$$\Delta_2(q) = \max_{x, x'} \|q(x) - q(x')\|_2$$
The quantity $\Delta_2(q)$ is the $L_2$-sensitivity of the query $q$. The above shows that in order to make a real-valued query $q$ differentially private we just need to add suitably calibrated Gaussian noise.
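For illustration, a minimal sketch of the Gaussian mechanism for a vector-valued query (the helper name and interface are our own; in the federated protocol of this paper the noise is added to securely aggregated sums rather than by a single data holder):

```python
import numpy as np

def gaussian_mechanism(query_value, l2_sensitivity, sigma, rng=None):
    # Releases q(x) + N(0, (Delta_2(q) * sigma)^2 I_m), which satisfies
    # (alpha, alpha / (2 sigma^2))-RDP for every alpha > 1 (Fact 2.1).
    rng = rng or np.random.default_rng()
    q = np.asarray(query_value, dtype=float)
    return q + rng.normal(0.0, l2_sensitivity * sigma, size=q.shape)
```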
An attractive property of this formulation of DP is that it is easy to reason about the privacy of an analysis where mechanisms are used multiple times on the same dataset.
Fact 2.2 (Parallel Composition). Given a dataset $X$, a disjoint partition $X = X_1 \cup X_2 \cup \cdots \cup X_k$, and a mechanism $\mathcal{M}$ that satisfies $(\alpha, \tau)$-RDP, the mechanism $\mathcal{M}(X) := (\mathcal{M}(X_1), \ldots, \mathcal{M}(X_k))$ satisfies $(\alpha, \tau)$-RDP.
Fact 2.3 (Sequential Composition). If $\mathcal{M}_1$ and $\mathcal{M}_2$ are $(\alpha, \tau_1)$-RDP and $(\alpha, \tau_2)$-RDP respectively, then the mechanism that releases $(\mathcal{M}_1(\cdot), \mathcal{M}_2(\cdot))$ is $(\alpha, \tau_1 + \tau_2)$-RDP.
Fact 2.4 (Post-Processing). If $\mathcal{M}$ is an $(\alpha, \tau)$-RDP mechanism and $f$ is any function that does not depend on any private data, then $f(\mathcal{M}(\cdot))$ is also $(\alpha, \tau)$-RDP.
Sequential composition tells us that using a mechanism multiple times on the same data leads to an increase in privacy loss. In the case of composing $k$ Gaussian mechanisms, we must increase the noise added through $\sigma$ by the order of $\sqrt{k}$ under RDP.
In practice, we care about obtaining the more meaningful notion of $(\epsilon, \delta)$-DP. When working with RDP we can rely on conversion lemmas such as those presented in [13] to convert between $(\alpha, \tau)$-RDP and $(\epsilon, \delta)$-DP. In our implementations, we use the analytical moments accountant developed by Wang et al. to provide tight numerical accounting of the privacy loss under RDP [63].
It is common to fix the privacy parameters $(\epsilon, \delta)$ before the analysis and then minimise $\sigma$ over a range of $\alpha$ values to obtain the smallest such noise needed to guarantee the chosen level of privacy. We use the autodp package (https://github.com/yuxiangw/autodp) to verify that our accounting provides the correct $(\epsilon, \delta)$-DP guarantee. An additional benefit of working with RDP is that our framework easily extends to other mechanisms that satisfy RDP, such as the Skellam mechanism, which may be more suited to distributed settings [2].
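As a simple stand-in for that accountant, the standard conversion $\epsilon = \tau + \log(1/\delta)/(\alpha - 1)$, minimised over $\alpha$, can be used to calibrate $\sigma$; a minimal sketch (our own helper names, not the paper's code) is:

```python
import numpy as np

def eps_for_sigma(sigma, n_queries, delta, alphas=np.arange(1.5, 128.0, 0.5)):
    # n_queries composed Gaussian mechanisms give tau = n_queries * alpha / (2 sigma^2);
    # convert RDP to (eps, delta)-DP and take the best Renyi order alpha.
    eps = n_queries * alphas / (2.0 * sigma ** 2) + np.log(1.0 / delta) / (alphas - 1.0)
    return float(eps.min())

def calibrate_sigma(target_eps, n_queries, delta, lo=0.05, hi=1e4):
    # Binary search for the smallest sigma whose epsilon meets the target.
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if eps_for_sigma(mid, n_queries, delta) > target_eps:
            lo = mid  # sigma too small: epsilon still above target
        else:
            hi = mid
    return hi
```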
2.3 The Federated Model of Computation
Federated Learning (FL) has become a popular paradigm for large-scale distributed training of machine learning models [36]. In this work, we consider the horizontal setting, where a set of participants each hold a local dataset over the same space of $m$ features. We assume that there are $n$ data items in total, and we consider the problem of training a differentially private GBDT model over the distributed dataset. A powerful tool is secure aggregation, which allows the computation of a sum without revealing any intermediate values [7, 10]. Specifically, when each participant $P_k$ has a number $x_k \in \mathbb{Z}$, secure aggregation computes the result $\sum_k x_k$ securely without any participant directly sharing their $x_k$.
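A minimal sketch of the masking idea behind secure aggregation (the full protocols in [7, 10] additionally handle client dropouts and key agreement; the modulus and helper names here are illustrative assumptions):

```python
import secrets

MOD = 2 ** 32  # inputs are encoded as fixed-point values in Z_{2^32}

def masked_uploads(values):
    # Each pair (k, l) shares a random mask that k adds and l subtracts;
    # every individual upload looks uniformly random, yet masks cancel in the sum.
    uploads = [v % MOD for v in values]
    for k in range(len(values)):
        for l in range(k + 1, len(values)):
            mask = secrets.randbelow(MOD)
            uploads[k] = (uploads[k] + mask) % MOD
            uploads[l] = (uploads[l] - mask) % MOD
    return uploads

def secure_sum(values):
    return sum(masked_uploads(values)) % MOD  # equals sum(values) mod 2^32

assert secure_sum([3, 5, 7]) == 15
```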
Our focus is on a framework that combines secure aggregation with DP to securely and privately train GBDT models. For the rest of this paper, we present algorithms as if the data were held centrally, with the understanding that all the operations we use can be performed in the federated model, with rounding to fixed precision. (The rounding introduces a small amount of imprecision in representing values, but this is overwhelmed by the noise added for privacy.) This means that we avoid techniques designed for central evaluation such as the exponential mechanism [33, 45, 65].
Threat Model: In this work, in common with many other works in the federated setting, we assume an honest-but-curious model, where the clients do not trust others with their raw data.
Table 1: Summary of our Private Federated GBDT Framework

Component                 | Methods                                                | Privacy Cost (in terms of $\kappa_s, \kappa_w, \kappa_c$)
(C1) Split Method         | Histogram-based (Hist) (§4.3.1)                        | $\kappa_s = Tmd$
                          | Partially Random (PR) (§4.3.2)                         | $\kappa_s = Tmd$; does not require construction of a histogram
                          | Totally Random (TR) (§4.3.2)                           | $\kappa_s = 0$
(C2) Weight Update        | Averaging (§4.4.1), Gradient (§4.4.2), Newton (§4.4.3) | $\kappa_w = 0$ if using Hist or PR; otherwise $\kappa_w = T$
(C3) Split Candidate      | Uniform, Log (§4.5.1)                                  | Data-independent, $\kappa_c = 0$
                          | Quantiles (non-private) (§4.5.1)                       | N/A
                          | Iterative Hessian (IH) (§4.5.2)                        | $\kappa_c = 0$ if using Hist; with TR and $s$ rounds of IH, $\kappa_c = sm$
(A1) Feature Interactions | Cyclical $k$-way (§5.1)                                | If using Hist or PR, $\kappa_s = Tkd$ ($\kappa_s = T$ if $k = 1$)
                          | Random $k$-way (§5.1)                                  | If using TR with IH, $\kappa_c = sk$
(A2) Batched Updates      | $B = 1$ (Boosting) (§5.2)                              | Post-processing, no effect on privacy
                          | $B = T$ (RF-type predictions) (§5.2)                   |
                          | $B = p \cdot T$ for some $p \in (0, 1)$ (§5.2)         |
We study the aggregating server's knowledge based on the information gathered from clients. While there is potential for clients to attempt to disrupt the protocol, we leave the detailed study of more malicious threat models and model poisoning to future work. In order to combine secure aggregation with DP, we act as if there were a trusted central server that securely aggregates quantities and adds the required DP noise before sending the updated (private) model back to participants (as assumed in [47]). In practice, we can eliminate the need for a central server by well-established implementations of secure computation that rely on techniques from secure multi-party computation, either among a small number of honest-but-curious servers, or via clients working with small groups of neighbors and a single untrusted server [7]. Sufficient noise for DP guarantees can be added by honest-but-curious servers, or introduced by each client adding a small amount of discrete noise, such that the total noise across clients adds up to the desired volume [8, 9, 15, 53-57].
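A minimal sketch of this client-side noise-splitting idea (continuous Gaussian noise for clarity; the cited works use discrete noise so that each contribution fits the finite-group encoding of secure aggregation, and the helper name is our own):

```python
import numpy as np

def noisy_contribution(x_k, total_sigma, n_clients, rng):
    # Each of the n clients adds N(0, total_sigma^2 / n); the independent
    # shares sum to N(0, total_sigma^2), the noise a trusted server would add.
    return x_k + rng.normal(0.0, total_sigma / np.sqrt(n_clients))
```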
3 RELATED WORK
Dierentially private decision trees have been well studied in the
central setting with a strong focus on random forest (RF) models [
25
,
27
,
65
]. However, the boosted approach (i.e., private GBDT models)
has been less well-explored. Recently, federated XGBoost models
have been presented, with most works focused on secure training
via cryptographic primitives such as Homomorphic Encryption
(HE) and Secure Multi-Party Computation (MPC) and with no DP
guarantees [17, 18, 24].
Some related works (e.g., [61]) study XGBoost in a federated setting with local DP (LDP) guarantees. The closest work to ours in this regard is the FEVERLESS method [62], which translates the XGBoost algorithm into the vertical federated setting using secure aggregation and the Gaussian mechanism. In particular, FEVERLESS securely aggregates gradient information into a private histogram which is used to compute split scores and leaf weights (Equations (4) and (5)). A certain subset of the participants are chosen as "noise leaders" who add Gaussian noise to their gradient information before aggregation, achieving an overall DP guarantee after securely aggregating across all participants. As we will see, the main disadvantage of directly translating the XGBoost algorithm in this way is the high privacy cost of repeatedly computing split scores. This results in having to add more noise into the split score and leaf weight calculations, and a lower-utility model.
To reduce this privacy cost, one can consider making split decisions independently of the data. These so-called totally random (TR) trees have been studied in both the non-private and private settings with random forests [26, 29]. In the private setting, proposed methods often use central DP mechanisms that are hard to federate [3, 27]. For example, Fletcher and Islam [27] propose a DP-RF method that utilises the exponential mechanism to output the majority label in leaf nodes under the notion of smooth sensitivity, which is unsuited to the federated setting.
In this work, we also consider TR trees as an option under our framework, but for a federated and private GBDT model. To the best of our knowledge, the only other work that considers private boosting with random trees is that of Nori et al. [51]. They consider a central DP setting with a focus on training private explainable models via Explainable Boosting Machines (EBMs). We compare the technical differences in Section 5.1 and empirically in Section 6.6.
4 PRIVATE GBDT FRAMEWORK
In this section, we perform a comprehensive investigation of the main components needed to train GBDT models in the federated setting. We propose a framework of methods for training DP-GBDT models by identifying three main components that require DP noise and two additional components that interact with these. The full framework is summarized in Table 1.
We explain the various options in each component and how they affect privacy guarantees, and conclude by instantiating related work within the framework before empirically evaluating methods in Section 6. A particular strategy we highlight is replacing data-dependent choices with random or uniform choices. Although counter-intuitive, it often holds that the privacy "cost" of fitting the choices to the data is not made up for by the utility gain, and picking among a set of random options is sufficient for good results. This is evaluated in our experimental study.
Algorithm 1 General GBDT
Input: Number of trees $T$, maximum depth $d$, number of split candidates $Q$, privacy parameters $\epsilon, \delta$
1: For each feature $j = 1, \ldots, m$ generate $Q$ split candidates $S_j := \{s^j_1, \ldots, s^j_Q\}$ (C3)
2: Initialise the forest $\mathcal{T} \leftarrow \emptyset$
3: for $t = 1, \ldots, T$ do
4:   For each $(\boldsymbol{x}_i, y_i) \in D$ compute the required gradient information $(g_i, h_i)$ based on $\hat{y}^{(t-1)}_i$ (C2)
5:   Choose a subset of features $F^{(t)} \subseteq \{1, \ldots, m\}$ with $|F^{(t)}| = k$ for the current tree $f_t$ (A1)
6:   while depth of the current node (in $f_t$) is $\leq d$ do
7:     Choose a feature split-candidate pair $(j, s^j_q)$ from $F^{(t)}$ (C1)
8:     Split the current node with observations $I$ into two child nodes with index sets $I_{\leq} = \{i : x_{ij} \leq s^j_q\}$ and $I_{>} = I \setminus I_{\leq}$
9:     Repeat (6)-(9), recursing separately on the child nodes
10:  For each leaf $l$ calculate a weight $w^{(t)}_l$ from the examples in the leaf according to the chosen update method (C2)
11:  Update predictions $\hat{y}^{(t)}_i$ or batch updates (A2)
12:  Add the $t$-th tree $f_t$ to the ensemble, $\mathcal{T} \leftarrow \mathcal{T} \cup \{f_t\}$
13: return the trained forest $\mathcal{T}$
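For concreteness, a compact Python skeleton of Algorithm 1 (the hook functions standing in for components (C1)-(C3), the node representation, and all names are our own illustration; DP noise and secure aggregation are hidden behind those hooks):

```python
import numpy as np

def predict_tree(tree, x):
    if tree[0] == "leaf":
        return tree[1]
    _, j, s, left, right = tree
    return predict_tree(left, x) if x[j] <= s else predict_tree(right, x)

def train_gbdt(X, y, T, d, candidates, choose_split, leaf_weight, grad_info):
    # candidates: per-feature split candidates (C3); choose_split: (C1);
    # leaf_weight: (C2); grad_info: per-example (g_i, h_i) for current scores.
    forest, raw_score = [], np.zeros(len(y))

    def grow(idx, depth, g, h):
        if depth == d or len(idx) == 0:
            return ("leaf", leaf_weight(g[idx], h[idx]))           # line 10, (C2)
        j, s = choose_split(X[idx], g[idx], h[idx], candidates)    # line 7, (C1)
        left, right = idx[X[idx, j] <= s], idx[X[idx, j] > s]      # line 8
        return ("node", j, s, grow(left, depth + 1, g, h),
                grow(right, depth + 1, g, h))

    for _ in range(T):
        g, h = grad_info(y, raw_score)                             # line 4, (C2)
        tree = grow(np.arange(len(y)), 0, g, h)
        forest.append(tree)
        raw_score += np.array([predict_tree(tree, x) for x in X])  # line 11
    return forest
```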
For simplicity, we assume that each participant holds a single data item $(\boldsymbol{x}_i, y_i)$, with $n$ participants (data items) in total. We additionally assume that we have (publicly) known bounds on each feature. All of these assumptions can be easily removed, potentially with some additional privacy cost. Table 7 in the appendix displays commonly used notation for convenience.
4.1 A General Recipe
In order to train the GBDT algorithm outlined in Section 2.1, we only need to specify a few core choices: how to pick split candidates (for discretizing continuous features), how to calculate the split scores at each internal node, and how to compute the leaf weights for prediction. One can note from Equations (4) and (5) that the leaf weights and split scores only depend on the sum of gradients and Hessians at an internal or leaf node of a decision tree. It is therefore natural to utilise secure aggregation as a tool to federate the GBDT algorithm. In Algorithm 1 we present the general GBDT algorithm assuming these quantities can be gathered. Looking closely at Algorithm 1, the only time we need to directly query participants' data is when we compute the three quantities just mentioned.
Based on this, we divide the general algorithm into three core components that require some form of DP noise: Split Methods (C1), Weight Updates (C2), and Split Candidates (C3). These are the core components required for training a GBDT model. We also consider two additional aspects to specify when training a GBDT model: Feature Interactions (A1) and Batched Updates (A2). These are aspects that interact with the core components but do not require any additional noise. To reason about the privacy guarantees of our GBDT framework, we introduce some variables to count the number of queries needed when training a GBDT model with $T$ trees. Let $\kappa_c$ denote the number of queries needed to calculate split candidates, $\kappa_s$ the queries needed to calculate inner-node splits, and $\kappa_w$ the queries needed to calculate leaf weights. Counting the number of queries needed for each component is enough to give a privacy guarantee for Algorithm 1.
Theorem 4.1. Suppose that the mechanisms for the framework components satisfy $(\alpha, \tau_c)$-, $(\alpha, \tau_s)$-, and $(\alpha, \tau_w)$-RDP respectively. Then the GBDT algorithm satisfies $(\alpha, \tau)$-RDP with $\tau = \kappa_c \tau_c + \kappa_s \tau_s + \kappa_w \tau_w$.
The above simply follows from the sequential composition properties of RDP. In our experimental study we utilise the Gaussian mechanism for each core component, hence $\tau = (\kappa_c + \kappa_s + \kappa_w) \frac{\alpha}{2\sigma^2}$ and so $\sigma = O_\epsilon\big(\sqrt{\kappa_c + \kappa_s + \kappa_w}\big)$. This shows that if we can minimise the number of queries that each main component requires, then we reduce the amount of noise we add to the learning process while still maintaining privacy. Various methods affect the privacy cost in different ways. The privacy implications for different choices in terms of $\kappa_c, \kappa_s, \kappa_w$ are shown in Table 1.
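A minimal sketch of this bookkeeping (the query counts follow Table 1 for the two extreme split methods; the helper is our own illustration and can be combined with the $\sigma$-calibration sketch from Section 2.2):

```python
def query_counts(T, m, d, split_method="hist", s_rounds=0):
    # Returns (kappa_c, kappa_s, kappa_w) per Table 1: Hist pays T*m*d split
    # queries (with kappa_w = 0, since leaf weights reuse the histogram),
    # while TR pays no split queries but one weight query per tree, plus
    # s*m candidate queries if Iterative Hessian is used.
    if split_method == "hist":
        return 0, T * m * d, 0
    if split_method == "tr":
        return s_rounds * m, 0, T
    raise ValueError(f"unknown split method: {split_method}")

kc, ks, kw = query_counts(T=50, m=10, d=4, split_method="tr", s_rounds=2)
n_queries = kc + ks + kw  # feeds into calibrate_sigma(target_eps, n_queries, delta)
```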
4.2 Federating GBDTs
At the start of building the $t$-th tree, each participant calculates gradient information $(g^{(t)}_i, h^{(t)}_i)$ for their examples. Throughout the training of a single tree, to calculate the desired components we can rely on querying data in the form
$$q(I) = \Big(\sum_{i \in I} g^{(t)}_i, \; \sum_{i \in I} h^{(t)}_i\Big)$$
over some set of observations $I$, e.g., all observations in a specific tree node. To do this securely, we can apply secure aggregation to aggregate gradient information at the various stages that require it in Algorithm 1 (C1-C3).
In order to apply the Gaussian mechanism, we must bound the sensitivity of such a query function. In this case, we need bounds on the gradient quantities $g^{(t)}_i, h^{(t)}_i$. Our focus in this paper is on binary classification problems. In binary classification our loss function is of the form $\ell(y_i, \hat{y}_i) = -[y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i)]$ (i.e., binary cross-entropy) and has gradients $g_i \in [-1, 1]$ and Hessians $h_i \in [0, \frac{1}{4}]$. Hence the sensitivity of aggregating gradient information is $\Delta_2(q) = \sqrt{1 + \frac{1}{16}} = \frac{\sqrt{17}}{4}$. If the chosen loss function has unbounded gradient information (e.g., regression problems) we can employ gradient clipping (similarly to DP-SGD) to obtain a bounded sensitivity [1].
The computational and communication costs of these steps are
low. Decision tree-based methods are often preferred for their ease
of construction, and this translates to the federated setting: each
client computes its local updates (e.g., gradients and Hessians) and
shares these through secure aggregation. The communication costs
are linear in the size of the updates computed, which are fairly low
dimensional: we quantify this in the subsequent sections.
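For illustration, a minimal sketch of the per-example computation (our own function names; with $p_i$ the sigmoid of the raw score, binary cross-entropy gives the familiar $g_i = p_i - y_i$ and $h_i = p_i(1 - p_i)$):

```python
import numpy as np

def bce_gradient_info(y, raw_score):
    # Gradients/Hessians of binary cross-entropy w.r.t. the raw (pre-sigmoid)
    # score: g = p - y lies in [-1, 1] and h = p(1 - p) lies in [0, 1/4].
    p = 1.0 / (1.0 + np.exp(-raw_score))
    return p - y, p * (1.0 - p)

def clip_l2(v, bound):
    # Gradient clipping (as in DP-SGD) for losses with unbounded gradients.
    norm = np.linalg.norm(v)
    return v if norm <= bound else v * (bound / norm)
```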
4.3 Component 1: Split Methods
4.3.1 Greedy Approach: Histogram-Based. As described in Section 2.1, the standard GBDT algorithm will calculate $Q$ split scores for each feature at every internal node.