Performances of Symmetric Loss for
Private Data from Exponential Mechanism
Jing Bi, Vorapong Suppakitpaisarn
Graduate School of Information Science and Technology
The University of Tokyo
Abstract—This study explores the robustness of learning with symmetric losses on private data. Specifically, we leverage the exponential mechanism (EM) on private labels. First, we theoretically revisit the properties of EM when it is used for private learning with symmetric losses. Then, we propose numerical guidance on privacy budgets for different data scales and utility guarantees. Further, we conduct experiments on the CIFAR-10 dataset to present the traits of symmetric losses. Since EM is a generic differential privacy (DP) technique, its robustness has the potential to generalize and to make other DP techniques more robust.
Index Terms—Machine learning, Symmetric loss, Differential
privacy, Label differential privacy, Exponential mechanism
I. INTRODUCTION
Individual information privacy has become a serious issue with the development of data analysis techniques. The risk of exposing sensitive data can discourage respondents from answering our surveys honestly. Even worse, it can discourage respondents from participating at all. Therefore, many researchers propose to corrupt part of the data before publication. Data corruption and publication techniques that reveal the maximum possible amount of knowledge, with the minimum possible risk of exposing sensitive data, are imperative for data analysis.
There are several notions proposed for private data publication, such as k-anonymity [1] or ℓ-diversity [2]. One of the most commonly known notions is DP [3]. Many researchers are interested in DP because it can precisely quantify the information leakage, usually called the privacy budget, of each publication.
Several studies in DP aim to find the most precise publication under a given privacy budget. Many, such as [4], aim to publish a precise machine learning (ML) output under DP. However, none of them consider the relationship between the precision and the loss function, which is one of the most important components in several ML approaches.
A. Our Contributions
In this study, we consider a type of loss function called symmetric loss. Symmetric losses, in particular the barrier hinge loss, are proven to be robust against corrupted data [5]. As we corrupt data for privacy in DP, we believe such losses are suitable for DP. Indeed, the corruption model considered in [5] is identical to the corruption in randomized response (RR) [6], [7], which is one of the publication techniques under DP. We can thus immediately apply the results in [5] to RR.
We ask whether symmetric losses can also be robust for other publication techniques. In this work, we consider publications under EM [8], [9]. While RR requires that all data records be independently sampled, EM does not need that assumption. The exponential mechanism is therefore more widely applicable than RR, and thus worth considering.¹ Our contributions can be summarized as follows:
1) Theoretical Results: We present and prove a series of properties relating the data scale and the expected utility range when implementing EM. Further, we provide numerical guidance on privacy budgets for EM. In particular, we provide theoretical evidence that an accurate learning output can be obtained with almost certain probability using EM and a symmetric loss.
2) Experimental Results: We investigate the robustness of the barrier hinge loss on an EM-processed CIFAR-10 dataset. Even with a small privacy budget ε = 0.5, the output accuracy can reach a level of 80%. The accuracy is larger for larger ε. In this sense, we are allowed to use highly protected data.
3) We correct a mistake in the formula for implementing EM in [9].
B. Related Works
Current DP research includes two major directions. The first direction concerns how to quantify privacy distances. The original definition of DP is sometimes too restrictive for varied privacy demands. Therefore, a wide range of definitions has been proposed, such as the relaxed (ε, δ)-DP [10], Rényi DP [11], and so forth, which gives researchers more room to design new mechanisms.
The second direction is to provide new mechanisms, several of which are for ML applications. As ML and DP are both data-based, applying ML approaches in DP mechanisms has become popular. Specifically, as the label-DP regime [4] naturally fits ML methods, rolling up ML approaches into better-utility DP frameworks is simple, as in [4], [12]. In [4], many aggregated ML-DP frameworks are presented.
Our discussion is different from the above two directions
as follows. Firstly, this work is still based on the traditional
DP definition. Therefore, this research can easily be extended
¹Unfortunately, the robustness proof in [5] assumes that the records are independently sampled. We are considering a robustness proof that does not require this independence.
arXiv:2210.04132v1 [cs.CR] 9 Oct 2022
to other privacy quantifications. Further, we do not focus on aggregating multiple methods into new DP frameworks. We only present the performance of loss functions on EM-processed data. Unlike the examples in [4], which purely pursue better performance with unsupervised or ensemble-learning pre-processing, our experiments still demonstrate the insufficiency of depending on loss functions alone, especially when the privacy budget is small. However, as loss functions are fundamental elements of ML, testing them can provide a reference for modifying a series of ML-DP hybrid frameworks.
C. Paper Organization
The rest of the paper is organized as follows. In Section II, we introduce symmetric losses and their properties, and EM with its implementation. In Section III, we discuss some theoretical properties of EM. In Section IV, we show the experiment results on the CIFAR-10 dataset, which demonstrate the robustness of the barrier hinge loss under EM.
Following the PDAA regular paper format, the proofs of the properties in Section III are moved to the Appendix, which is not included in the publication. One can find the full version on arXiv.
II. PRELIMINARIES
A. Symmetric Loss in Machine Learning
Any given data point can be represented by a series of real values, a $(d+k)$-tuple $(\mathbf{x} \in \mathbb{R}^d, \mathbf{y} \in \mathbb{R}^k)$. In ML, $\mathbf{x}$ is called the features and $\mathbf{y}$ the labels. Our dataset $D$ is a collection of such tuples; i.e., $D := \langle \mathbf{x}_i, \mathbf{y}_i \rangle_{i=1}^{n}$ is a dataset of size $n$. It is not necessary for all entries of a tuple in a dataset $D$ to have meaningful values. For example, we may have zeros or wrong values for some entries. Normally, features are treated as fully and correctly known; otherwise, we can move the unclear entries to the label side.
Machine learning aims to find the internal relationship between features and labels in a dataset, and to make predictions for future data. Traditionally, if all $\mathbf{y}$s are correctly valued, the process is called supervised learning; if they are partially unclear, semi-supervised learning; if all labels are unknown, unsupervised learning. Formally, the learning goal is a function $f: \mathbb{R}^d \to \mathbb{R}^k$ that describes this relation as well as possible over the whole dataset (i.e., $|f(\mathbf{x}) - \mathbf{y}| = 0$ for all $(\mathbf{x}, \mathbf{y})$ is desired).
In this work, we consider binary classification problems, i.e., the task where $y \in \{-1, 1\}$. We call $(\mathbf{x}, +1)$ and $(\mathbf{x}, -1)$ positive and negative data, respectively. We also define $D_P = \{(\mathbf{x}, y) \in D : y = 1\}$ and $D_N = \{(\mathbf{x}, y) \in D : y = -1\}$.
The function $f$ is controlled by parameters and hyperparameters. The learning process optimizes these parameters with respect to given objectives. An intuitive objective, for example, is the Bayes risk $R(f) = \mathbb{E}[\ell(f(\mathbf{x}), \mathbf{y})]$, where $\ell$, satisfying the definition of a distance function, is called the loss function. Since it is impossible to know and calculate all possible or potential data of the same problem, we normally restrict the expectation to a finite dataset, obtaining the empirical risk $\frac{1}{|D|} \sum_{(\mathbf{x}, \mathbf{y}) \in D} \ell(f(\mathbf{x}), \mathbf{y})$.
The plain empirical risk weighs the loss of every data point equally, which yields functions that suffer from imbalanced data. Modifying it is therefore necessary, and different predictors can be derived accordingly. In binary classification problems, to fairly evaluate the errors from positively and negatively labeled data, one can leverage the area under the receiver operating characteristic curve (AUC) [13] or the balanced error rate (BER) [14], [15]. The AUC- and BER-risks are defined as follows.
Definition 1. The empirical AUC-risk is defined as:
$$R_{\mathrm{AUC}} = \mathbb{E}_{D_P}\left[\mathbb{E}_{D_N}\left[\ell(f(\mathbf{x}_P) - f(\mathbf{x}_N))\right]\right] = \frac{\sum_{(\mathbf{x}_P, 1) \in D_P} \sum_{(\mathbf{x}_N, -1) \in D_N} \ell(f(\mathbf{x}_P) - f(\mathbf{x}_N))}{|D_P||D_N|}$$
Definition 2. The empirical BER-risk is defined as:
$$R_{\mathrm{BER}} = \frac{1}{2}\left[\mathbb{E}_{D_P}[\ell(yf(\mathbf{x}))] + \mathbb{E}_{D_N}[\ell(yf(\mathbf{x}))]\right] = \frac{1}{2}\left[\frac{1}{|D_P|}\sum_{(\mathbf{x}, 1) \in D_P} \ell(f(\mathbf{x})) + \frac{1}{|D_N|}\sum_{(\mathbf{x}, -1) \in D_N} \ell(-f(\mathbf{x}))\right]$$
The restriction to a particular dataset still leaves a gap between empirical minimizers and Bayes minimizers. Let $f^*$ be the optimal function describing the relation of the problem, and let $\mathcal{F}$ be a collection of functions restricted to $D$. This gap is described in a Bayes sense as $R(f) - \inf_{f \in \mathcal{F}} R(f)$, named the excess risk [16]. The classical excess risk is defined with the 0-1 loss, the function $\ell_{01}$ such that $\ell_{01}(yf(\mathbf{x})) = 1$ when $yf(\mathbf{x}) \le 0$, and $\ell_{01}(yf(\mathbf{x})) = 0$ otherwise. However, because the 0-1 loss is non-differentiable and non-convex at the crucial point 0, which harms learning efficiency and cost, we normally adopt other (surrogate) losses instead, such as the logistic sigmoid function or hinge functions.
The unavoidable issue is how to guarantee that the minimizer of the surrogate excess risk coincides with the minimizer of the 0-1 excess risk. This fundamental property is called classification-calibration, and it has been proven to hold for most commonly used surrogate losses (e.g., exponential, quadratic, hinge, and sigmoid [16]). [5] extended this property to symmetric functions, i.e., functions $f$ such that there is a constant $C$ where $f(x) + f(-x) = C$ for all $x$. (The function is therefore symmetric about its intersection point with the y-axis, not symmetric about the y-axis itself.) Despite this merit, symmetric functions cannot simultaneously satisfy non-negativity and convexity [17], [18] (except for horizontal lines). To cope with this issue, [5] proposes a restricted symmetric loss, called the barrier hinge function, defined as follows:
Definition 3 ([5]). A barrier hinge function with parameters $b, r$ is defined as:
$$f_{b,r}(x) = \max(-b(r + x) + r, \max(b(x - r), r - x)).$$
The barrier hinge function is symmetric on $x \in [-r, r]$ when $b > 1$ and $r > 0$. Therefore, it satisfies classification-calibration on the sensitive interval $[-r, r]$ while satisfying the non-negativity and convexity conditions at the same time.
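A minimal sketch of this function, together with a numerical check of its symmetry property $f(x) + f(-x) = 2r$ on $[-r, r]$ (the name `barrier_hinge` and the sample parameters are ours):

```python
def barrier_hinge(x, b=2.0, r=1.0):
    """Barrier hinge loss f_{b,r}(x) = max(-b(r+x)+r, max(b(x-r), r-x))."""
    return max(-b * (r + x) + r, max(b * (x - r), r - x))

# symmetry check: f(x) + f(-x) should equal the constant 2r on [-r, r]
b, r = 2.0, 1.0
for k in range(-10, 11):
    x = k * r / 10
    assert abs(barrier_hinge(x, b, r) + barrier_hinge(-x, b, r) - 2 * r) < 1e-9
```

Outside $[-r, r]$ the function grows linearly, which is what restores non-negativity and convexity.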
In our experiments, we leverage both the AUC- and BER-risks, as complements to each other, to evaluate the overall performance of several surrogate losses. Further, we show the superiority of the barrier hinge loss over other losses under mild privacy cost.
B. Exponential Mechanism
Differential privacy [3] aims to protect the sensitive information of each individual with minimum harm to the utility (the learning results). The noise-adding strategy obfuscates every possible dataset $D$ with its set of most likely neighboring datasets $\{D' \mid D' \sim D\}$, using a distribution corresponding to their similarity (e.g., the Hamming distance, as in the following). Formally, we say a method or randomized algorithm $A$ preserves $\varepsilon$-differential privacy [3] if, for all possible measurements $\mathcal{M}$,
$$\frac{\Pr[A(D) \in \mathcal{M}]}{\Pr[A(D') \in \mathcal{M}]} \in [\exp(-\varepsilon), \exp(\varepsilon)],$$
for any given privacy budget $\varepsilon$.
Label DP [4] is a relaxed version of DP that constrains the private values to lie only in a subset of the columns of an information table. Since this relaxation naturally fits the ML regime, ML models can be incorporated without extra work. [4] demonstrated frameworks leveraging unsupervised and semi-supervised learning models to improve the utility of private data.
One of the most classical and widely used DP methods is RR, introduced in [6]. By simply flipping each data record randomly with a fixed probability, we can protect individual privacy while still inferring the classification ratio of the sensitive information. However, this classical method requires the plausibly deniable records to be mutually independent; otherwise, flipping a part of the dependent data would not be sufficient. Unfortunately, several existing datasets are not entirely independent, so RR is not effective on them. Instead, researchers prefer to adopt EM, a generalization of the Laplace mechanism [10].
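For intuition, binary RR commonly flips a label with probability $1/(1+e^{\varepsilon})$; a quick check that the resulting likelihood ratio meets the $\varepsilon$-DP bound (our own sketch, not from the paper; `rr_flip_prob` is a hypothetical helper):

```python
import math

def rr_flip_prob(epsilon):
    """Flip probability for binary randomized response (assumed form)."""
    return 1.0 / (1.0 + math.exp(epsilon))

epsilon = 0.5
p = rr_flip_prob(epsilon)
# Pr[output = y | true label y] / Pr[output = y | true label -y]
ratio = (1.0 - p) / p
assert abs(ratio - math.exp(epsilon)) < 1e-12
```

A larger $\varepsilon$ means a smaller flip probability, i.e., less protection but higher utility.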
Definition 4 (Exponential Mechanism (EM) [8]). Given any real-valued score function $q: \mathcal{D} \times \mathcal{D} \to \mathbb{R}$, let $\Delta q := \max_{D \sim D'} |q(D) - q(D')|$. The EM outputs any possible dataset $D'' \in \mathcal{D}$ with probability $P(D'') \propto \exp\left(\frac{\varepsilon\, q(D'', D)}{2 \Delta q}\right)$.
Proposition 1. The exponential mechanism is $\varepsilon$-differentially private.
Defining the real-valued score function $q$ is the essential step [8], which provides the power of this mechanism. The global sensitivity $\Delta q$, the maximum score distance between adjacent datasets, bounds the range of the distribution. One common approach to defining $q$ is based on the Hamming distance [19]: let $\delta(D = \langle x_i, y_i \rangle_{i=1}^n, D' = \langle x_i, y'_i \rangle_{i=1}^n) := |\{i \mid y_i \ne y'_i\}|$. We then define the Hamming-based score as $q(D', D) := |D_P \cap D'_P| + |D_N \cap D'_N| = |D| - \delta(D', D)$ [9].
The implementation of EM involves calculating the probability distribution over $\mathcal{D} := \{D' = \langle x_i, y'_i \rangle_{i=1}^n : \langle y'_i \rangle_{i=1}^n \in \{-1, +1\}^n\}$. A straightforward implementation would cost $O(n^2)$ time and space, and redundant evaluations of the exponential function carry an extra burden. Therefore, we apply the two-step approach of [9]. First, we compute the distribution of each possible score by a logarithmic recursion. Please note that there is a misprint in [9], where the term $\varepsilon/2$ is dropped:
$$\log \Pr(q = i) = \begin{cases} -|D| \log\left(1 + \exp\left(\frac{\varepsilon}{2\Delta q}\right)\right) & \text{when } i = 0 \\ \log(|D| - i + 1) - \log i + \frac{\varepsilon}{2\Delta q} + \log \Pr(q = i - 1) & \text{when } i > 0. \end{cases}$$
Then, we select one table uniformly at random within the same score group; i.e., given the sampled value of $q$, we select the output dataset from the set $\{D' \in \mathcal{D} : q(D', D) = q\}$ with uniform probability. This two-step method cuts the time and space complexity down to linear. Notably, under this definition of $q$, EM degrades to RR as the dataset size goes to infinity. This degradation does not occur for finite datasets in practice, but it is still worth studying; theoretically proving the relation between EM and RR remains an open problem.
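The two-step method above can be sketched as follows. This is our own illustrative implementation (the function name, the Gumbel-max sampling trick, and the assumption $\Delta q = 1$ are ours, not from [9]):

```python
import math
import random

def em_label_flip(labels, epsilon, delta_q=1.0):
    """Two-step EM sampling over label tables.
    Step 1: sample a score q (number of labels kept unchanged) via the
    log-domain recursion.  Step 2: flip n - q uniformly chosen labels.
    Sketch assuming Delta q = 1 and labels in {-1, +1}."""
    n = len(labels)
    # base case: log Pr(q = 0) = -|D| * log(1 + exp(eps / (2 * dq)))
    log_p = [-n * math.log1p(math.exp(epsilon / (2 * delta_q)))]
    # recursion: log Pr(q = i) = log(n - i + 1) - log(i)
    #            + eps / (2 * dq) + log Pr(q = i - 1)
    for i in range(1, n + 1):
        log_p.append(log_p[-1] + math.log(n - i + 1) - math.log(i)
                     + epsilon / (2 * delta_q))
    # step 1: Gumbel-max trick samples from the categorical
    # distribution while staying in log space
    q = max(range(n + 1),
            key=lambda i: log_p[i] - math.log(-math.log(random.random())))
    # step 2: flip a uniformly random subset of n - q labels
    flipped = set(random.sample(range(n), n - q))
    return [-y if i in flipped else y for i, y in enumerate(labels)]
```

Both steps are linear in $n$, matching the complexity claim above; as $\varepsilon$ grows, the sampled score concentrates at $q = n$ and the table is published almost unchanged.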
III. PROPERTIES IN EXPONENTIAL MECHANISM
There is, however, one issue with EM for binary classification using semi-supervised learning. The table we use for classification might be so heavily flipped that the learning algorithms cannot correctly learn the function. When we use a symmetric loss, it is known that a learning algorithm can still provide a good result if no more than half of the dataset is flipped, i.e., $q \ge |D|/2$ [5].
In this section, we consider the probability that $q \ge |D|/2$ for each dataset size $n = |D|$ and each privacy budget $\varepsilon$. Specifically, we consider $\sum_{i \ge |D|/2} \Pr[q = i]$. We conduct theoretical analyses, then give numerical guidance for real implementations.
In the following subsections, we only give the short and direct proofs of the corollaries. The important proofs of the properties are given in the appendices.
A. Success probability and the dataset size
In this subsection, we study the relationship between $\sum_{i \ge n/2} \Pr[q = i]$ and the dataset size $n$. We begin our illustration by proving generic properties of the binomial distribution.
Definition 5. A truncated binomial distribution $S$ with $n$ trials is the partial sum of the binomial distribution restricted to a certain range of events, for example, between $j$ and $k$:
$$S(n, j, k) = \Pr[j \le I_{n,p} \le k] = \sum_{i=j}^{k} \binom{n}{i} p^i (1 - p)^{n - i},$$