
Algorithmic Fairness. A multitude of formal fairness definitions have been put forward in the literature (Verma and Rubin 2018). Examples include statistical parity (Dwork et al. 2012), predictive parity (Chouldechova 2017), equalized odds, equality of opportunity (Hardt, Price, and Srebro 2016), and individual fairness (Dwork et al. 2012). However, these definitions remain a topic of discussion, for instance because they are known to be mutually incompatible (Kleinberg, Mullainathan, and Raghavan 2016; Lipton, McAuley, and Chouldechova 2018). Additionally, there are several definitions that rely on causal mechanisms to assess fairness, e.g., counterfactual fairness (Kusner et al. 2017) and the notion of unresolved discrimination (Kilbertus et al. 2017). While causal approaches to fairness might be preferable, they require information about the causal structure of the data-generating process. Moreover, it has recently been shown that causal definitions may lead to adverse consequences, such as lower diversity (Nilforoshan et al. 2022). We discuss how existing fairness definitions could possibly be applied to the setting with optional features, but we find that none of them aligns with our desiderata, either theoretically or experimentally (see Appendix A.2).
Strategic Classification. In an even broader context, this work also relates to the field of strategic classification (Hardt et al. 2016). However, research on strategic classification primarily focuses on users who strategically manipulate their features to obtain favorable outcomes, which may also involve withholding information (Krishnaswamy et al. 2021). In contrast to our work, this research stream neglects privacy concerns. As far as we are aware, there is no prior work on the specific problem of balancing the interests of all three groups of stakeholders (the non-sharers, the sharers, and the decision makers).
3 Problem Formulation
3.1 Formalization and Notation
In this work, each data instance contains a realization of a number of base features $b \in \mathcal{X}_b$, where $\mathcal{X}_b \subseteq \mathbb{R}^n$ is the space of the base features. Furthermore, let there be some optional information $z \in \mathcal{X}_z$, where $\mathcal{X}_z \subseteq \mathbb{R}$ is the value space of the optional feature.¹ It is the users' choice whether to disclose $z$ to the system, which results in an availability variable $a \in \{0, 1\}$. Accordingly, only imputed samples $z^* = z$ if $a = 1$, and $z^* = \text{N/A}$ otherwise, are observed, where a value of N/A indicates that a user did not reveal the optional information, e.g., did not use the companion app. In summary, the data observations are tuples $x = (b, a, z^*)$ that reside in $\mathcal{X} = \mathcal{X}_b \times \{0, 1\} \times (\mathcal{X}_z \cup \{\text{N/A}\})$. Each training sample comes with a label $y \in \mathcal{Y}$. Further, there is a data-generating distribution $p$ with support $\mathcal{X} \times \mathcal{Y}$, and we have access to an i.i.d. training sample $(x, y) \sim p$. Figure 2 shows such a data sample. We denote the random variables for the respective quantities by $B, A, Z, Z^*, Y$.
¹We extend our definitions to integrate multiple optional features in a later section.
state (b)           plan (b)    fitness score (z*)   avail. (a)   treatment costs (y)
New South Wales     basic       87%                  1            3k$
Queensland          gold        N/A                  0            17k$
New South Wales     basic       92%                  1            5k$
New South Wales     basic       N/A                  0            64k$
Victoria            premium     56%                  1            22k$

Figure 2: Samples for the insurance use case. We have two base features $b$ and one optional feature $z^*$, which either takes an observed value $z$ or takes a value of N/A if unobserved. The variable $a \in \{0, 1\}$ indicates the availability of the feature. The goal is to predict the label $y$.
The label is probabilistically determined through the base features $B$ and the hidden feature $Z$, but the sharing decision does not influence the true label for a given $B, Z$, such that $Y \perp\!\!\!\perp A \mid B, Z$.
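To make the data layout concrete, the following is a minimal sketch of how the observations from Figure 2 could be encoded, assuming a pandas-style representation in which NaN stands in for N/A; the column names and the derivation of the availability indicator are illustrative choices, not part of our formalization.

```python
import numpy as np
import pandas as pd

# Toy encoding of the Figure 2 samples: base features b = (state, plan),
# optional feature z* (fitness score), and label y (treatment costs).
data = pd.DataFrame({
    "state": ["New South Wales", "Queensland", "New South Wales",
              "New South Wales", "Victoria"],
    "plan": ["basic", "gold", "basic", "basic", "premium"],
    "fitness_score": [0.87, np.nan, 0.92, np.nan, 0.56],  # z*, NaN = N/A
    "cost": [3_000, 17_000, 5_000, 64_000, 22_000],       # label y
})

# The availability variable a is determined by whether z* was shared.
data["avail"] = data["fitness_score"].notna().astype(int)

print(data[["state", "plan", "fitness_score", "avail", "cost"]])
```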
In many applications, the goal is to find a function $f: \mathcal{X} \to \mathcal{Y}$ that models the observed data. In particular, $f: \mathcal{X} \to [0, 1]$ may predict a probability of a positive outcome, or $f: \mathcal{X} \to \mathbb{R}$ may return a numerical score. The test data for which the model will be used come from the same distribution $p$, though with the label $y$ unobserved, and we suppose that the information provided is always correct. We consider a convex loss function $L: \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}$, e.g., mean squared error (MSE) or binary cross-entropy (BCE), for which we minimize the expected loss for a sample from the data distribution. For instance, using the common MSE loss $L(f(x), y) = (f(x) - y)^2$, an optimal predictor is given by
$$f^*_L(x) = \arg\min_{f(x)} \mathbb{E}_{p(Y|x)}\big[(f(x) - Y)^2\big] = \mathbb{E}[Y \mid x],$$
the conditional expectation. However, this notion can be generalized to other loss functions: An optimal predictor $f^*_L(x)$ for the loss function $L$ fulfills, for all $x$,
$$f^*_L(x) = \mathcal{F}^L_p[Y \mid x] := \arg\min_{f(x)} \mathbb{E}_{p(Y|x)}\big[L(f(x), Y)\big]. \quad (1)$$
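As a sanity check on Eq. (1), a short derivation (our own, for the MSE case) shows why the minimizer reduces to the conditional expectation:

```latex
% Write c = f(x) and expand the conditional objective:
\mathbb{E}_{p(Y|x)}\big[(c - Y)^2\big]
  = c^2 - 2c\,\mathbb{E}[Y \mid x] + \mathbb{E}[Y^2 \mid x].
% Setting the derivative with respect to c to zero yields the minimizer:
\frac{\partial}{\partial c}\,\mathbb{E}_{p(Y|x)}\big[(c - Y)^2\big]
  = 2c - 2\,\mathbb{E}[Y \mid x] = 0
  \;\;\Longrightarrow\;\;
  c = \mathbb{E}[Y \mid x] = \mathcal{F}^{\mathrm{MSE}}_p[Y \mid x].
```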
We use $\mathcal{F}^L[Y \mid x]$ to denote a generalized expected value that minimizes the expected loss conditioned on $x$. To ease our derivations, we suppose this minimum to be unique and finite. Intuitively, it represents the best guess of $Y$ given $x$. For the MSE loss, $\mathcal{F}^L$ is equivalent to the expectation operator $\mathbb{E}$. In the following statements, the reader may thus mentally replace $\mathcal{F}^L$ with an expectation $\mathbb{E}$ without further ramifications in order to get the high-level intuition. Finally, we introduce two key terms, namely, base feature model and full feature model. The former refers to a model trained on the base features only, while the latter refers to a model trained on all features, where some strategy is used to replace unavailable feature values. Typically, these strategies are called imputation and replace unavailable values with zeros or with a feature's mean or median (Emmanuel et al. 2021).
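As an illustration, the following sketch contrasts the two model types under a simple mean-imputation strategy; the use of scikit-learn, the linear model, and the toy feature split are our own illustrative choices rather than part of the formalization.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression

# Toy data: two numeric base features b and one optional feature z* (NaN = N/A).
X_base = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
z_star = np.array([[0.87], [np.nan], [0.92], [np.nan]])
y = np.array([3.0, 17.0, 5.0, 64.0])

# Base feature model: trained on the base features only.
base_model = LinearRegression().fit(X_base, y)

# Full feature model: unavailable values of z* are replaced by the feature mean,
# and the availability indicator a is included as an additional input.
a = (~np.isnan(z_star[:, 0])).astype(float).reshape(-1, 1)
z_imputed = SimpleImputer(strategy="mean").fit_transform(z_star)
X_full = np.hstack([X_base, a, z_imputed])
full_model = LinearRegression().fit(X_full, y)
```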
3.2 Desiderata
Our goal is to learn models $f: \mathcal{X} \to \mathcal{Y}$ that comply with
the desideratum of Availability Inference Restriction, which
we briefly introduced in Section 1, to protect the interests
of the non-sharers. Under this constraint, the model should
provide the best predictive performance to reflect the need