Generalization Analysis on
Learning with a Concurrent Verifier
Masaaki Nishino, Kengo Nakamura, Norihito Yasuda
NTT Communication Science Laboratories, NTT Corporation
{masaaki.nishino.uh, kengo.nakamura.dx, norihito.yasuda.hn}@hco.ntt.co.jp
Abstract
Machine learning technologies have been used in a wide range of practical systems.
In practical situations, it is natural to expect the input-output pairs of a machine
learning model to satisfy some requirements. However, it is difficult to obtain a
model that satisfies such requirements just by learning from examples. A simple solution
is to add a module that checks whether an input-output pair meets the requirements
and, if not, modifies the model's output. Such a module, which we call a
concurrent verifier (CV), can certify the model's input-output behavior, although how
the generalizability of the machine learning model changes when using a CV is unclear.
This paper gives a generalization analysis of learning with a CV. We analyze how the
learnability of a machine learning model changes with a CV and show a condition under
which we can obtain a guaranteed hypothesis using a verifier only at inference time.
We also show that typical error bounds based on Rademacher complexity will be no larger
than those of the original model when using a CV in multi-class classification and
structured prediction settings.
1 Introduction
As machine learning technology matures, many systems have been developed that exploit machine
learning models. When developing such a system, a model with merely a small prediction error is
not always satisfactory because of real-world requirements. For example, an object recognition
model that is sensitive to slight noise can cause security issues [4, 28], and a model with
unexpected outputs increases the cost of the system that has to deal with them. Thus, we want
the input-output pairs of a machine learning model to satisfy some requirements. However, it is
difficult to obtain a model that satisfies the requirements just by learning from examples.
Moreover, since learned models tend to be complex and the input domain tends to be quite large,
it is unrealistic to certify that every input-output pair satisfies the requirements. In addition,
even if we find an input-output pair that does not satisfy the requirements, modifying the model
is difficult since we have to re-estimate it from the training examples.
This paper considers a way to obtain a machine learning model whose input-output pairs satisfy
required properties. We make the following assumptions about the situation in which a machine
learning model is used. First, we can judge whether an input-output pair $(x, h(x))$ satisfies
the requirements, where $h : \mathcal{X} \to \mathcal{Y}$ is a machine learning model or a
hypothesis. As we show below, important use cases fit this setting. Second, a machine learning
model already exists whose prediction error is small enough, although its input-output pairs
are not guaranteed to satisfy the requirements. This second assumption is also reasonable since
modern machine learning models show sufficient prediction accuracy in various tasks. Under these
assumptions, a practical choice is not to change the machine learning model but to add a module
that checks the input-output pairs of machine learning model $h$. We call this module a
concurrent verifier (CV). Fig. 1 shows the system configuration of a machine learning model
with a CV.
Figure 1: Overview of a machine learning model with a concurrent verifier that checks whether
input-output pairs of a model satisfy requirements.
The verifier checks whether the input-output pair $(x, h(x))$ satisfies the required properties.
If it does, the verifier outputs $h(x)$; if not, it rejects $h(x)$ and modifies, or requests the
learning model to modify, the output. A machine learning model paired with a verifier can be seen
as another machine learning model whose input-output pairs are guaranteed to satisfy the required
conditions.
Although a model with a verifier can guarantee that its input-output pairs satisfy the
requirements, the effect on its prediction error is unclear. This paper gives theoretical
analyses of the generalization errors of a machine learning model with a CV. We focus on how
the learnability of the original model, represented as hypothesis class $\mathcal{H}$, changes
when the verifier is used. First, we consider a situation where we use a CV only in the
inference phase. This setting corresponds to the case where the required properties are unknown
during the training phase. If the hypothesis class is PAC-learnable, we can obtain a guaranteed
hypothesis using a verifier only at inference time.
Second, we consider a situation where we know the requirements when learning the model. This
situation corresponds to analyzing the learnability of hypothesis set $\mathcal{H}_c$, which is
obtained by modifying every hypothesis $h \in \mathcal{H}$ to satisfy the requirements. Hence we
compare the generalization error upper bounds of $\mathcal{H}_c$ with those of $\mathcal{H}$. In
the multi-class classification setting, we show that existing error bounds [15, 18] based on the
Rademacher complexity of $\mathcal{H}$ are also bounds for the modified hypothesis set
$\mathcal{H}_c$ under any input-output requirements. Moreover, we give similar analyses for
structured prediction, a kind of multi-class classification where the set of classes
$\mathcal{Y}$ can be decomposed into substructures. The task is worth analyzing since many works
address constraints in structured prediction, and some give error bounds for structured
prediction tasks that are tighter than simply applying the bounds for multi-class classification
[16, 6, 19]. As in the multi-class classification case, we show that existing Rademacher
complexity-based bounds for the structured prediction of $\mathcal{H}$ are also bounds for
$\mathcal{H}_c$.
Our main contributions are as follows: a) We introduce the concurrent verifier, a model-agnostic
way to guarantee that machine learning models satisfy required properties. Although a similar
mechanism is used in some existing models, we give a generalization analysis that does not depend
on a specific model. b) We show that if hypothesis class $\mathcal{H}$ is PAC-learnable, then
using a verifier at inference time gives a hypothesis with a guarantee on its generalization
error. Interestingly, if $\mathcal{H}$ is not PAC-learnable, we might fail to obtain a guaranteed
hypothesis even if the requirements are consistent with distribution $\mathcal{D}$. c) We show
that if we use a CV in the learning phase of multi-class classification tasks, the theoretical
error bounds of $\mathcal{H}$ based on Rademacher complexity do not increase under any
input-output requirements. We also give similar results for structured prediction tasks.
1.1 Use Cases of a Concurrent Verifier
The following are some typical use cases for CVs.
Error-sensitive applications: A typical situation where we want a verifier is when some
prediction errors can have severe consequences that we want to avoid. For example, a recommender
system might limit the set of candidate items depending on user attributes. Although such a rule
might degrade prediction accuracy, a safer model is preferable in practice.
Controlling outputs of structured prediction: Constraints are frequently used in structured
prediction tasks to improve the performance or the controllability of the outputs. For example,
some works [23, 5] exploited constraints in sequence labeling tasks to reflect background
knowledge and improve the prediction results. More recently, some works [9, 2] exploited
constraints in language generation tasks, including image captioning and machine translation, to
restrict the model to output sentences that include given keywords. Since the constraints used in
these works can be written as logical formulas, our CV model can represent them as requirements.
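For illustration, such a keyword-inclusion constraint can be written directly as a requirement
function $c$; the following is a minimal sketch (the helper name and whitespace tokenization are
our own simplifying assumptions, not taken from any of the cited systems):

```python
def make_keyword_requirement(keywords):
    """Build a requirement function c(x, y) in {0, 1} for language generation.

    x is the source input (e.g., an image or a source sentence) and y is the
    generated sentence; the pair is accepted only if every required keyword
    appears in y. Whitespace tokenization is a simplifying assumption.
    """
    def c(x, y):
        tokens = set(y.split())
        return 1 if all(k in tokens for k in keywords) else 0
    return c

c = make_keyword_requirement({"dog", "beach"})
print(c(None, "a dog runs on the beach"))  # 1: requirement satisfied
print(c(None, "a cat sits on a mat"))      # 0: requirement violated
```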
Robustness against input perturbations: If a machine learning model changes its output when we
modify its input from $x$ to $x'$, where $x'$ is very close to $x$, the model is said to be
sensitive to small changes [27]. A sensitive model can be a security risk since its behavior is
unpredictable. Therefore, some methods evaluate and verify the robustness of neural networks
against small perturbations [28, 4]. Existing verification methods check a model's robustness
around verification samples $x_1, \ldots, x_n$ by determining whether there exists $x'$ close to
some $x_i$ for which the model gives a different output, i.e., $h(x_i) \neq h(x')$. Although
these verification methods can test a model, they do not directly show how to obtain a robust
model. A CV can fix a model to achieve robustness around samples $x_1, \ldots, x_n$ by setting a
rule of the form: $h(x')$ must equal $h(x_i)$ if $x'$ is close to $x_i$. Although this solution
might not guarantee robustness where samples are scarce, adding enough unlabeled verification
samples is often a reasonable choice.
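A minimal sketch of such a robustness rule as a requirement function, assuming Euclidean distance
and a hypothetical closeness threshold `eps` (these modeling choices are ours, not prescribed by
the verification literature cited above):

```python
import numpy as np

def make_robustness_requirement(samples, outputs, eps):
    """Requirement c(x, y): if x lies within eps of a verification sample x_i,
    then y must equal the reference output for x_i.

    samples holds the verification inputs x_1, ..., x_n and outputs holds the
    fixed reference outputs h(x_1), ..., h(x_n); eps is a closeness threshold.
    """
    def c(x, y):
        for x_i, y_i in zip(samples, outputs):
            close = np.linalg.norm(np.asarray(x) - np.asarray(x_i)) <= eps
            if close and y != y_i:
                return 0  # violates the robustness rule around x_i
        return 1
    return c
```

Note that the assumption that every input admits some feasible output (Section 4) requires the
reference outputs of overlapping neighborhoods to be mutually consistent.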
2 Related Work
Machine learning models that can exploit constraints have been investigated in many research
fields, including statistical symbolic learning and structured prediction. For example, Markov
logic networks [22], ProbLog [8], and probabilistic circuit models [11] integrate statistical
models with symbolic logic formulations. Since these models can incorporate hard constraints
represented by symbolic logic, they can give guarantees on input-output pairs. However, previous
research focused on their practical performance and gave little theoretical analysis of their
learnability when hard constraints are used. Moreover, previous works integrated the ability to
exploit constraints into specific models. In contrast, our CV is model-agnostic and can be
combined with a wide range of machine learning models.
Recently, the verification of machine learning models has been gathering more attention. Several
attempts have been made to verify whether a machine learning model has desired properties
[4, 28, 10, 26]. Exact verification methods use mixed integer programming (MIP) [28], Boolean
satisfiability (SAT) [20], and satisfiability modulo theories (SMT) solvers [10] to assess the
robustness of a neural network model against input noise. These approaches aim to obtain models
that fulfill the required properties. However, verification methods cannot help modify a model
that does not satisfy the requirements. If we want machine learning models to meet requirements,
post-processing such as our concurrent verifier is needed.
Other methods can give upper bounds on generalization error, including VC-dimension [29] and its
extensions [7, 21], Rademacher complexity [3, 12], stability [25], and PAC-Bayes [17, 1]. We use
Rademacher complexity in the following analysis since it is among the most popular tools for
giving theoretical upper bounds on generalization error. Rademacher complexity also has
extensions, including local Rademacher complexity [15] and factor graph Rademacher complexity
[6], and we can provide theoretical guarantees based on these extended measures.
3 Preliminaries
Our notation follows a previous work [24]. We first introduce the notation used in the following
sections. Let $\mathcal{X}$ denote the domain of the inputs, let $\mathcal{Y}$ be the domain of
the labels, and let $\mathcal{Z}$ be the domain of the examples, defined as
$\mathcal{Z} := \mathcal{X} \times \mathcal{Y}$. Let $\mathcal{H}$ be a hypothesis class, and let
$\ell : \mathcal{H} \times \mathcal{Z} \to \mathbb{R}_+$ be a loss function. Training data
$S = (z_1, \ldots, z_m) \in \mathcal{Z}^m$ is a finite sequence of size $m$ drawn i.i.d. from a
fixed but unknown probability distribution $\mathcal{D}$ on $\mathcal{Z}$. A learning algorithm
$A$ maps training data $S$ to a hypothesis $h$; we use notation $A(S)$ to denote the hypothesis
that learning algorithm $A$ returns upon receiving $S$. We represent set $\{1, \ldots, K\}$ as
$[K]$.
Given distribution $\mathcal{D}$ on $\mathcal{Z}$, we denote by $L_{\mathcal{D}}(h)$ the
generalization error and by $L_S(h)$ the empirical error of $h$ over $S$, defined by
$$L_{\mathcal{D}}(h) := \mathbb{E}_{z \sim \mathcal{D}}[\ell(h, z)], \qquad
L_S(h) := \frac{1}{m} \sum_{i=1}^{m} \ell(h, z_i). \quad (1)$$
PAC learnability: We introduce PAC learnability and agnostic PAC learnability as follows.
Definition 3.1 (Agnostic PAC learnability). Hypothesis class $\mathcal{H}$ is agnostic
PAC-learnable if there exist a function $m_{\mathcal{H}} : (0,1)^2 \to \mathbb{N}$ and a learning
algorithm $A$ with the following property: for every $\epsilon, \delta \in (0,1)$ and every
distribution $\mathcal{D}$ over $\mathcal{Z}$, if $S$ consists of
$m \geq m_{\mathcal{H}}(\epsilon, \delta)$ i.i.d. examples generated by $\mathcal{D}$, then with
probability at least $1 - \delta$, the following holds:
$$L_{\mathcal{D}}(A(S)) \leq \min_{h' \in \mathcal{H}} L_{\mathcal{D}}(h') + \epsilon. \quad (2)$$
Distribution $\mathcal{D}$ is realizable by hypothesis set $\mathcal{H}$ if there exists
$h \in \mathcal{H}$ such that $L_{\mathcal{D}}(h) = 0$. If $\mathcal{D}$ is realizable by an
agnostic PAC-learnable hypothesis class $\mathcal{H}$, then $\mathcal{H}$ is PAC-learnable. If
$\mathcal{H}$ is PAC-learnable, then Eq. (2) becomes $L_{\mathcal{D}}(A(S)) \leq \epsilon$ since
$\min_{h' \in \mathcal{H}} L_{\mathcal{D}}(h') = 0$.
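As a concrete illustration of Definition 3.1 (a standard finite-class result from the learning
theory literature, not a contribution of this paper): a finite hypothesis class $\mathcal{H}$
with the 0-1 loss is agnostic PAC-learnable by empirical risk minimization, with a sample
complexity that follows from Hoeffding's inequality and a union bound over $\mathcal{H}$:

```latex
% Standard finite-class sample complexity (illustration only):
\[
  m_{\mathcal{H}}(\epsilon, \delta)
    \;\leq\;
  \left\lceil \frac{2 \log\left( 2 |\mathcal{H}| / \delta \right)}{\epsilon^{2}} \right\rceil .
\]
```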
Rademacher complexity: In the following sections, we use Rademacher complexity to derive
generalization bounds. Given loss function $\ell(h, z)$ and hypothesis class $\mathcal{H}$, we
denote by $\mathcal{G}$ the function family
$$\mathcal{G} := \ell \circ \mathcal{H} := \{ z \mapsto \ell(h, z) : h \in \mathcal{H} \}.$$
Definition 3.2 (Empirical Rademacher complexity). Let $\mathcal{G}$ be a family of functions
mapping from $\mathcal{Z}$ to $\mathbb{R}$, and let $S = (z_1, \ldots, z_m) \in \mathcal{Z}^m$ be
training data of size $m$. The empirical Rademacher complexity of $\mathcal{G}$ with respect to
$S$ is defined as
$$\mathfrak{R}_S(\mathcal{G}) := \mathbb{E}_{\sigma}\left[ \sup_{g \in \mathcal{G}}
\frac{1}{m} \sum_{i=1}^{m} \sigma_i g(z_i) \right],$$
where $\sigma = (\sigma_1, \ldots, \sigma_m) \in \{\pm 1\}^m$ are random variables distributed
i.i.d. according to $\mathbb{P}[\sigma_i = 1] = \mathbb{P}[\sigma_i = -1] = 1/2$. The Rademacher
complexity of $\mathcal{G}$ is defined as the expected value of the empirical Rademacher
complexity:
$$\mathfrak{R}_m(\mathcal{G}) := \mathbb{E}_{S \sim \mathcal{D}^m}[\mathfrak{R}_S(\mathcal{G})].$$
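To make Definition 3.2 concrete, the following Monte Carlo sketch estimates
$\mathfrak{R}_S(\mathcal{G})$ for a small finite function family by sampling $\sigma$ and
replacing the supremum with a maximum over rows of a precomputed loss matrix (an illustration
under our own simplifying assumptions; the supremum is intractable for general $\mathcal{G}$):

```python
import numpy as np

def empirical_rademacher(loss_matrix, n_draws=1000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity R_S(G).

    loss_matrix[j, i] holds g_j(z_i), the loss of hypothesis j on example z_i,
    so the sup over G reduces to a max over rows. Estimates
    E_sigma[ sup_g (1/m) * sum_i sigma_i * g(z_i) ] by averaging over draws.
    """
    rng = np.random.default_rng(seed)
    _, m = loss_matrix.shape
    total = 0.0
    for _ in range(n_draws):
        sigma = rng.choice([-1.0, 1.0], size=m)   # i.i.d. Rademacher signs
        total += np.max(loss_matrix @ sigma) / m  # sup over G for this draw
    return total / n_draws

# Usage: three hypotheses evaluated on five examples under the 0-1 loss.
losses = np.array([[0, 1, 0, 1, 1],
                   [1, 0, 0, 0, 1],
                   [0, 0, 1, 1, 0]], dtype=float)
print(empirical_rademacher(losses))
```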
4 Concurrent Verifier
Next we give a formal definition of a CV. A CV works with a machine learning model, which is a
function $h : \mathcal{X} \to \mathcal{Y}$. If input $x$ is given to the model, which outputs
$h(x)$, the verifier checks whether $(x, h(x))$ satisfies the required property. We assume that
the required property can be represented as a requirement function
$c : \mathcal{X} \times \mathcal{Y} \to \{0, 1\}$. If $c(x, h(x)) = 1$, the pair satisfies the
property; if $c(x, h(x)) = 0$, it does not. Requirement function $c$ can be represented by a set
of deterministic rules. For example, if $\mathcal{X} = \mathbb{R}$ and $\mathcal{Y} = \{0, 1\}$,
then a requirement can take the following form: "if $x > 0$, then $y \neq 0$." We assume that for
every possible input $x \in \mathcal{X}$, there exists $y \in \mathcal{Y}$ such that
$c(x, y) = 1$, which avoids the situation where the requirements are unsatisfiable for any output
$y$. This assumption can easily be relaxed if we allow the machine learning model to reject an
unsatisfiable input $x$.
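For instance, the rule above ("if $x > 0$, then $y \neq 0$") corresponds to the following
requirement function (a minimal sketch):

```python
def c(x, y):
    """Requirement for X = R and Y = {0, 1}: if x > 0, then y must not be 0."""
    if x > 0 and y == 0:
        return 0  # the pair violates the rule
    return 1      # the pair satisfies the requirement
```

Here the feasibility assumption holds because $y = 1$ is acceptable for every $x$.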
After checking the input-output pair, the verifier modifies output $h(x)$ depending on the value
of $c(x, h(x))$. If $c(x, h(x)) = 1$, the verifier outputs $h(x)$ since it satisfies the
requirements. If $c(x, h(x)) = 0$, the verifier modifies $h(x)$ to some $y \in \mathcal{Y}$ that
satisfies $c(x, y) = 1$. If we use a verifier with a machine learning model that corresponds to
$h$, the combination of the model and the verifier can be seen as a function
$h_c : \mathcal{X} \to \mathcal{Y}$, defined as
$$h_c(x) := \begin{cases} h(x) & \text{if } c(x, h(x)) = 1 \\
y_c & \text{if } c(x, h(x)) = 0, \end{cases} \quad (3)$$
where $y_c \in \mathcal{Y}$ satisfies $c(x, y_c) = 1$ and is selected deterministically. When
$\mathcal{Y} = [K]$, for example, selecting the minimum $i \in [K]$ satisfying $c(x, i) = 1$ as
$y_c$ is a reasonable choice. When $\mathcal{Y} = [K]$ and $h(x)$ is determined by a scoring
function $h(x, y) : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$, it is also reasonable to
select $y_c$ such that $y_c = \operatorname{argmax}_{y \in \mathcal{Y}, c(x, y) = 1} h(x, y)$.
Learning a model corresponds to selecting hypothesis $h$ from hypothesis class $\mathcal{H}$, so
learning a model with a CV corresponds to choosing a hypothesis from the modified hypothesis
class $\mathcal{H}_c := \{h_c : h \in \mathcal{H}\}$. By definition, every hypothesis in
$\mathcal{H}_c$ satisfies the requirements; thus we can guarantee that the model satisfies the
condition by selecting a hypothesis from $\mathcal{H}_c$. In the following sections, we analyze
the learnability of $\mathcal{H}_c$ by comparing it with that of $\mathcal{H}$.
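The following sketch implements Eq. (3) for $\mathcal{Y} = [K]$ with a scoring function, using
the argmax selection rule described above (function names are our own; enumerating $\mathcal{Y}$
is feasible only for small label sets, and structured prediction would instead require
constraint-aware decoding):

```python
def make_verified_model(score, c, K):
    """Wrap scoring function score(x, y) with a concurrent verifier for Y = [K].

    Implements Eq. (3): output h(x) when c(x, h(x)) = 1; otherwise output the
    highest-scoring label y_c with c(x, y_c) = 1. Assumes, as in the text,
    that every x admits at least one y with c(x, y) = 1.
    """
    def h_c(x):
        labels = range(1, K + 1)
        y_hat = max(labels, key=lambda y: score(x, y))  # unverified prediction h(x)
        if c(x, y_hat) == 1:
            return y_hat
        allowed = [y for y in labels if c(x, y) == 1]   # feasible outputs under c
        return max(allowed, key=lambda y: score(x, y))  # deterministic choice of y_c
    return h_c
```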