
to other privacy quantifications. Further, we do not focus on
aggregating multiple methods into new DP frameworks.
We only present the performance of loss functions on EM-processed
data. Unlike the examples in [4], which purely
pursue better performance with unsupervised or ensemble-learning
pre-processing, our experiments still demonstrate the
insufficiency of depending on loss functions alone, especially
when the privacy budget is small. However, as loss functions
are fundamental elements of ML, testing them can provide
a reference for modifying a series of ML-DP hybrid
frameworks.
C. Paper Organization
The rest of the paper is organized as follows. In section
II, we introduce symmetric losses and their properties, as well
as EM and its implementation. In section III, we discuss some
theoretical properties of EM. In section IV, we present the
experimental results on the CIFAR-10 dataset, which demonstrate
the robustness of the barrier hinge loss under EM.
Following the PDAA regular paper format, the proofs of the
properties in section III are moved to the Appendix, which is not
included in the publication. The full version can be found on
arXiv.
II. PRELIMINARIES
A. Symmetric Loss in Machine Learning
Any given data point can be represented by a series of real values,
a (d+k)-tuple (x ∈ R^d, y ∈ R^k). In ML, x is called the features,
and y the labels. Our dataset D is a collection of such
tuples; i.e., D := ⟨x_i, y_i⟩_{i=1}^{n} is a dataset of size n. It is
not necessary that every entry of every tuple in a dataset D
holds a meaningful value. For example, we may have 0's or wrong
values in some entries. Normally, features are treated as fully
and correctly known; otherwise, we can move the
unclear entry or entries to the label side.
Machine learning is meant to find the internal relationship
between features and labels in a set, and to make predictions for
prospective data. Traditionally, if all y's are correctly valued,
the process is called supervised learning; if they are partially
unclear, it is called semi-supervised learning; if all labels are
unknown, it is called unsupervised learning. Formally, the learning
goal is a function f : R^d → R^k that describes this relation as
well as possible over the whole dataset (i.e., |f(x) − y| = 0 for
all (x, y) is desired).
In this work, we consider binary classification problems,
i.e., tasks where y ∈ {1, −1}. We call (x, +1) and
(x, −1) positive and negative data, respectively. We also
define D_P = {(x, y) ∈ D : y = 1} and D_N = {(x, y) ∈ D : y = −1}.
The function f is controlled by parameters and hyperparameters.
The learning process optimizes these parameters according to given
objectives. An intuitive objective, for example, is the Bayes risk
R(f) = E[l(f(x), y)], where l, which satisfies the definition of a
distance function, is called the loss function. Since it is
impossible to know and calculate all possible or potential data of
the same problem, we normally restrict the expectation to datasets
of finite size; the resulting quantities are named empirical risks,
of the form (1/|D|) Σ_{(x,y)∈D} l(f(x), y).
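As a concrete illustration, the empirical risk can be computed directly from a dataset; the following is a minimal numpy sketch (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def empirical_risk(f, loss, X, y):
    """(1/|D|) * sum of loss(f(x), y) over all (x, y) in D."""
    preds = np.array([f(x) for x in X])
    return np.mean(loss(preds, y))

# Toy example: squared loss with a predictor that fits perfectly.
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 2.0])
squared = lambda p, t: (p - t) ** 2
risk = empirical_risk(lambda x: x[0], squared, X, y)  # → 0.0
```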
The trivial empirical risk weighs the losses from all data points
equally, which yields functions that suffer from disparate
treatment of the data. Manipulating it is therefore necessary,
and different predictors can be derived accordingly. In binary
classification problems, to fairly evaluate the errors from
positively and negatively labeled data, one can leverage the
area under the receiver operating characteristic curve (AUC) [13]
or the balanced error rate (BER) [14], [15]. The AUC- and BER-risks
are defined as follows.
Definition 1. The empirical AUC-risk is defined as:

R_AUC = E_{D_P}[E_{D_N}[l(f(x_P) − f(x_N))]]
      = ( Σ_{(x_P,1)∈D_P} Σ_{(x_N,−1)∈D_N} l(f(x_P) − f(x_N)) ) / (|D_P| |D_N|)
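The empirical AUC-risk averages the loss over all positive/negative pairs. A minimal numpy sketch (identifiers are illustrative, not from the paper):

```python
import numpy as np

def auc_risk(f, loss, X_pos, X_neg):
    """Empirical AUC-risk: mean of loss(f(x_P) - f(x_N))
    over all |D_P| * |D_N| positive/negative pairs (Definition 1)."""
    fp = np.array([f(x) for x in X_pos])
    fn = np.array([f(x) for x in X_neg])
    margins = fp[:, None] - fn[None, :]  # pairwise differences via broadcasting
    return np.mean(loss(margins))

# With the 0-1 loss this counts the fraction of mis-ordered pairs.
zero_one = lambda m: (m <= 0).astype(float)
X_pos, X_neg = [2.0, 3.0], [1.0, 2.5]
r = auc_risk(lambda x: x, zero_one, X_pos, X_neg)  # → 0.25 (1 of 4 pairs)
```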
Definition 2. The empirical BER-risk is defined as:

R_BER = (1/2) [E_{D_P}[l(y f(x))] + E_{D_N}[l(y f(x))]]
      = (1/2) [ (1/|D_P|) Σ_{(x,1)∈D_P} l(f(x)) + (1/|D_N|) Σ_{(x,−1)∈D_N} l(−f(x)) ]
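The empirical BER-risk instead averages the loss within each class first and then weighs the two classes equally, regardless of class sizes. A minimal numpy sketch (identifiers are illustrative):

```python
import numpy as np

def ber_risk(f, loss, X_pos, X_neg):
    """Empirical BER-risk (Definition 2): per-class average losses,
    combined with equal weight 1/2 each."""
    pos_term = np.mean([loss(f(x)) for x in X_pos])
    neg_term = np.mean([loss(-f(x)) for x in X_neg])
    return 0.5 * (pos_term + neg_term)

# With the 0-1 loss this is the balanced error rate itself.
zero_one = lambda m: float(m <= 0)
X_pos, X_neg = [2.0, -1.0], [-1.0, -2.0, 3.0]
r = ber_risk(lambda x: x, zero_one, X_pos, X_neg)
```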
The restrictions to certain datasets still induce distances
between empirical minimizers and Bayes minimizers. Let f* be the
optimal function describing the relation of the problem, and let
F be a collection of functions restricted to D. Such distances are
described in the Bayes sense as R(f*) − inf_{f∈F} R(f), and named
excess risks [16]. Classical excess risks are defined with the 0-1
loss function, i.e., the function l_01 such that l_01(x) = 1 for
x ≤ 0, and l_01(x) = 0 otherwise. However, because of the
non-differentiability and non-convexity of the 0-1 loss at the
crucial point 0, which harm the fluency and cost of learning, we
normally adopt other (surrogate) losses instead, such as the
logistic sigmoid function, hinge functions, and so forth.
The unavoidable issue is how to guarantee that the minimizer of a
surrogate excess risk coincides with the minimizer of the 0-1
excess risk. This fundamental property is called classification-
calibration, and it has been proven to hold for most commonly used
surrogate losses (e.g., exponential, quadratic, hinge, and sigmoid
[16]). [5] has extended this property to symmetric functions,
which are functions f such that there exists a constant C with
f(x) + f(−x) = C for all x. (Hence, such a function is symmetric
about its intersection with the y-axis, not symmetric about the
y-axis.) Despite this merit, ordinary symmetric functions cannot
simultaneously be non-negative and convex [17], [18] (except for
horizontal lines). To cope with this issue, [5] proposes the use
of a restricted symmetric loss, called the barrier hinge function,
which is defined as follows:
Definition 3 ([5]). A barrier hinge function with parameters
b, r is defined as follows:

f_{b,r}(x) = max(−b(r + x) + r, max(b(x − r), r − x)).
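Definition 3 can be implemented directly. The sketch below (with illustrative parameter values, not from the paper) also checks that, on the interval [−r, r] with b large, the function reduces to r − x and therefore satisfies the symmetry condition f(x) + f(−x) = 2r there:

```python
import numpy as np

def barrier_hinge(x, b, r):
    """Barrier hinge loss of Definition 3:
    f_{b,r}(x) = max(-b(r + x) + r, max(b(x - r), r - x))."""
    return np.maximum(-b * (r + x) + r, np.maximum(b * (x - r), r - x))

# Illustrative parameters; on [-r, r] the max is attained by r - x,
# so f(x) + f(-x) = (r - x) + (r + x) = 2r, a symmetric loss there.
b, r = 200.0, 50.0
xs = np.linspace(-r, r, 101)
sym = barrier_hinge(xs, b, r) + barrier_hinge(-xs, b, r)  # → all equal to 2r
```

Outside [−r, r], the steep linear "barrier" branches (slope ±b) dominate, which is what restores non-negativity while keeping symmetry on the central interval.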