
Figure 1: Overview of a machine learning model with a concurrent verifier that checks whether input-output pairs of a model satisfy requirements.
pair (x, h(x)) satisfies the required properties. If it does, the verifier outputs h(x); if not, it rejects h(x) and either modifies the output itself or requests that the learning model modify it. A machine learning model paired with a verifier can be seen as another machine learning model whose input-output pairs are guaranteed to satisfy the required conditions.
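As a concrete illustration, the verify-then-modify loop can be sketched in a few lines of Python (the function names and the fallback `modify` step are our own illustrative assumptions, not a fixed interface from the paper):

```python
def concurrent_verify(model_predict, satisfies, modify, x):
    """Run the model, then verify the input-output pair.

    model_predict: maps input x to a candidate output y = h(x)
    satisfies:     returns True iff the pair (x, y) meets the requirements
    modify:        fallback mapping (x, y) to a compliant output
    """
    y = model_predict(x)
    if satisfies(x, y):
        return y            # requirements hold; pass the output through
    return modify(x, y)     # reject y and substitute a compliant output

# Toy usage: require non-negative outputs; clamp violations to 0.
wrapped = lambda x: concurrent_verify(
    lambda v: v - 3,        # base model h
    lambda v, y: y >= 0,    # requirement check
    lambda v, y: 0,         # modification step
    x,
)
```

The wrapped function is itself a predictor, which is exactly the "model plus verifier as another model" view described above.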
Although a model with a verifier can guarantee that its input-output pairs satisfy the requirements, its effect on prediction error is unclear. This paper gives theoretical analyses of the generalization errors of a machine learning model with a CV. We focus on how the learnability of the original model, represented by hypothesis class H, can change when we use the verifier. First, we consider a situation where we use a CV only in the inference phase. This setting corresponds to the case where the required properties are unknown during the training phase. If the hypothesis class is PAC-learnable, we can obtain a hypothesis with a guarantee by using a verifier only at inference time.
Second, we consider a situation where we know the requirements when learning the model. This situation corresponds to analyzing the learnability of the hypothesis set Hc, which is obtained by modifying every hypothesis h ∈ H to satisfy the requirements. Hence we compare the generalization error upper bounds of Hc with those of H. In the multi-class classification setting, we show that existing error bounds [15, 18] based on the Rademacher complexity of H are also bounds for the modified hypothesis class Hc under any input-output requirements. Moreover, we give similar analyses for structured prediction, a kind of multi-class classification in which the set of classes Y can be decomposed into substructures. This task is worth analyzing since many works address constraints in structured prediction, and some give error bounds for structured prediction that are tighter than those obtained by simply applying the multi-class classification bounds [16, 6, 19]. As in the multi-class classification case, we show that existing Rademacher complexity-based bounds for structured prediction with H are also bounds for Hc.
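For intuition, one simple way to obtain a modified hypothesis in multi-class classification is to mask the scores of classes that violate the requirement for a given input, so that only admissible classes can be predicted. The following Python sketch illustrates this under our own assumptions (the function names are illustrative, not the paper's construction of Hc):

```python
import math

def modified_hypothesis(scores, admissible, x):
    """Predict the best class among those allowed for input x.

    scores:     maps x to a dict {class: score} from the base hypothesis h
    admissible: maps x to the set of classes satisfying the requirement
    """
    s = scores(x)
    allowed = admissible(x)
    # Mask forbidden classes with -inf so they can never be predicted.
    masked = {c: (v if c in allowed else -math.inf) for c, v in s.items()}
    return max(masked, key=masked.get)

# Toy usage: class "b" has the top score but is forbidden for this input.
pred = modified_hypothesis(
    lambda x: {"a": 0.2, "b": 0.9, "c": 0.5},
    lambda x: {"a", "c"},
    x=None,
)
```

Applying this masking to every h in H yields a family of predictors whose outputs satisfy the requirement by construction, which is the kind of modified class the bounds above concern.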
Our main contributions are as follows: a) We introduce the concurrent verifier, a model-agnostic way to guarantee that machine learning models satisfy required properties. Although a similar mechanism has been used in some existing models, we give a generalization analysis that does not depend on a specific model. b) We show that if hypothesis class H is PAC-learnable, then using a verifier at inference time yields a hypothesis with a guarantee on its generalization error. Interestingly, if H is not PAC-learnable, we might fail to obtain a guaranteed hypothesis even if the requirements are consistent with distribution D. c) We show that if we use a CV in the learning phase of a multi-class classification task, the theoretical error bounds of H based on the Rademacher complexity do not increase under any input-output requirements. We also give similar results for structured prediction tasks.
1.1 Use Cases of a Concurrent Verifier
The following are some typical use cases for CVs.
Error-sensitive applications: A typical situation in which we want to use a verifier is one where certain prediction errors could have severe consequences that we want to avoid. For example, a recommender system might limit the set of candidate items depending on user attributes. Although such a rule might degrade prediction accuracy, a safer model is often preferable in practice.
Controlling outputs of structured prediction: Constraints are frequently used in structured prediction tasks to improve performance or the controllability of the outputs. For example, some works [23, 5] exploited constraints in sequence labeling tasks to reflect background knowledge and improve prediction results. More recently, some works [9, 2] exploited constraints in language generation tasks, including image captioning and machine translation, and restricted a