Learning in RKHM: a C*-Algebraic Twist for Kernel Machines
Yuka Hashimoto1,2, Masahiro Ikeda2,3, Hachem Kadri4
1. NTT Network Service Systems Laboratories, NTT Corporation, Tokyo, Japan
2. Center for Advanced Intelligence Project, RIKEN, Tokyo, Japan
3. Keio University, Yokohama, Japan
4. Aix-Marseille University, CNRS, LIS, Marseille, France
Abstract
Supervised learning in reproducing kernel Hilbert space (RKHS) and vector-valued RKHS (vvRKHS)
has been investigated for more than 30 years. In this paper, we provide a new twist to this rich literature by
generalizing supervised learning in RKHS and vvRKHS to reproducing kernel Hilbert C*-module (RKHM),
and show how to construct effective positive-definite kernels by considering the perspective of C*-algebra.
Unlike the cases of RKHS and vvRKHS, we can use C*-algebras to enlarge representation spaces. This
enables us to construct RKHMs whose representation power goes beyond RKHSs, vvRKHSs, and existing
methods such as convolutional neural networks. Our framework is suitable, for example, for effectively
analyzing image data by allowing the interaction of Fourier components.
1 INTRODUCTION
Supervised learning in reproducing kernel Hilbert space (RKHS) has been actively investigated since the early
1990s (Murphy, 2012; Christmann & Steinwart, 2008; Shawe-Taylor & Cristianini, 2004; Schölkopf & Smola,
2002; Boser et al., 1992). The notion of reproducing kernels as dot products in Hilbert spaces was first brought to
the field of machine learning by Aizerman et al. (1964), while the theoretical foundation of reproducing kernels
and their Hilbert spaces dates back to at least Aronszajn (1950). By virtue of the representer theorem (Schölkopf
et al., 2001), we can compute the solution of an infinite-dimensional minimization problem in RKHS with given
finite samples. In addition to the standard RKHSs, applying vector-valued RKHSs (vvRKHSs) to supervised
learning has also been proposed and used for analyzing vector-valued data (Micchelli & Pontil, 2005; Álvarez
et al., 2012; Kadri et al., 2016; Minh et al., 2016; Brouard et al., 2016; Laforgue et al., 2020; Huusari & Kadri,
2021). Generalization bounds for supervised learning problems in RKHS and vvRKHS have also been derived (Mohri et al.,
2018; Caponnetto & De Vito, 2007; Audiffren & Kadri, 2013; Huusari & Kadri, 2021).
Reproducing kernel Hilbert C*-module (RKHM) is a generalization of RKHS and vvRKHS by means of
C*-algebra. A C*-algebra is a generalization of the space of complex numbers: it is equipped with a product and
an involution. Important examples are the C*-algebra of bounded linear operators on a Hilbert space and the
C*-algebra of continuous functions on a compact space. RKHMs were originally studied for problems in pure
operator algebra and mathematical physics (Manuilov & Troitsky, 2000; Heo, 2008; Moslehian, 2022). Recently,
applying RKHMs to data analysis was proposed by Hashimoto et al. (2021), who generalized the
representer theorem in RKHS to RKHM, which allows us to analyze structured data, such as functional data,
with C*-algebras.
In this paper, we investigate supervised learning in RKHM. This provides a new twist to the state-of-the-art
kernel-based learning algorithms and the development of a novel kind of reproducing kernels. An advantage of
RKHM over RKHS and vvRKHS is that we can enlarge the C*-algebra characterizing the RKHM to construct
a representation space. This allows us to represent more functions than in the case of RKHS and to make use of the
product structure in the C*-algebra. Our main contributions are:
arXiv:2210.11855v3 [stat.ML] 26 Jun 2024
• We define positive definite kernels from the perspective of C*-algebra, which are suitable for learning in RKHM and adapted to analyzing image data.
• We derive a generalization bound for the supervised learning problem in RKHM, which generalizes existing results for RKHS and vvRKHS. We also show that the computational complexity of our method can be reduced if the parameters in the C*-algebra-valued positive definite kernels have specific structures.
• We show that our framework generalizes existing methods based on convolution operations.
Important applications of the supervised learning in RKHM are tasks whose inputs and outputs are images. If
the proposed kernels have specific parameters, then the product structure is the convolution, which corresponds
to the pointwise product of Fourier components. By extending the C*-algebra to a larger one, we can use
more general operations than convolutions. This enables us to analyze image data effectively by allowing
interactions between Fourier components. Regarding the generalization bound, we derive the same type of
bound as those obtained for RKHS and vvRKHS via Rademacher complexity theory. This is, to our knowledge,
the first generalization bound for RKHM hypothesis classes. Concerning the connection with existing methods,
we show that using our framework, we can reconstruct existing methods such as the convolutional neural
network (LeCun et al., 1998) and the convolutional kernel (Mairal et al., 2014) and further generalize them.
This fact implies that the representation power of our framework goes beyond the existing methods.
The remainder of this paper is organized as follows: In Section 2, we review mathematical notions related
to this paper. We propose C*-algebra-valued positive definite kernels in Section 3 and investigate supervised
learning in RKHM in Section 4. Then, we show connections with existing convolution-based methods in
Section 5. We confirm the advantage of our method numerically in Section 6 and conclude the paper in
Section 7. All technical proofs are in Section B.
2 PRELIMINARIES
2.1 C*-Algebra and Hilbert C*-Module
A C*-algebra is a Banach space equipped with a product and an involution that satisfies the C*-identity. See
Section A for more details. An example of a C*-algebra is the group C*-algebra (Kirillov, 1976). Let p ∈ N, and let
Z/pZ be the set of integers modulo p.
Definition 2.1 (Group C*-algebra on a finite cyclic group) Let $\omega = e^{2\pi\sqrt{-1}/p}$. The group C*-algebra on
Z/pZ, denoted C*(Z/pZ), is the set of maps from Z/pZ to C equipped with the following product, involution, and norm:
$(x \cdot y)(z) = \sum_{w \in \mathbb{Z}/p\mathbb{Z}} x(z - w)\, y(w)$ for $z \in \mathbb{Z}/p\mathbb{Z}$,
$x^*(z) = \overline{x(-z)}$,
$\|x\| = \max_{n \in \{0, \ldots, p-1\}} \bigl|\sum_{z \in \mathbb{Z}/p\mathbb{Z}} x(z)\, \omega^{zn}\bigr|$.
Since the product is the convolution, group C*-algebras offer a new way to define positive definite kernels,
which are effective in analyzing image data, as we will see in Section 3. Elements of C*(Z/pZ) are described by
circulant matrices (Gray, 2006). Let $\mathrm{Circ}(p) = \{x \in \mathbb{C}^{p \times p} \mid x \text{ is a circulant matrix}\}$. Moreover, we denote the
circulant matrix whose first row is v by circ(v). The discrete Fourier transform (DFT) matrix, whose (i, j)-entry
is $\omega^{(i-1)(j-1)}/\sqrt{p}$, is denoted by F.
Lemma 2.2 Any circulant matrix $x \in \mathrm{Circ}(p)$ has an eigenvalue decomposition $x = F \Lambda_x F^*$, where
$\Lambda_x = \mathrm{diag}\Bigl(\sum_{z \in \mathbb{Z}/p\mathbb{Z}} x(z)\, \omega^{z \cdot 0}, \ldots, \sum_{z \in \mathbb{Z}/p\mathbb{Z}} x(z)\, \omega^{z(p-1)}\Bigr)$.
Lemma 2.3 The group C*-algebra C*(Z/pZ) is C*-isomorphic to Circ(p).
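As a quick numerical check (our illustration, not part of the paper; a sketch with numpy and p = 4), the following verifies that the group-algebra product of Definition 2.1 is realized by circulant matrix multiplication (Lemma 2.3) and that circulants diagonalize as in Lemma 2.2:

```python
import numpy as np

p = 4
omega = np.exp(2j * np.pi / p)
# DFT matrix with (i, j)-entry omega^{ij} / sqrt(p) (0-indexed); it is unitary
F = np.array([[omega ** (i * j) for j in range(p)] for i in range(p)]) / np.sqrt(p)

def circ(v):
    """Circulant matrix whose first row is v: entry (k, j) = v[(j - k) mod p]."""
    return np.array([np.roll(v, k) for k in range(p)])

rng = np.random.default_rng(0)
x, y = rng.standard_normal(p), rng.standard_normal(p)

# product in C*(Z/pZ): (x . y)(z) = sum_w x(z - w) y(w)  (circular convolution)
xy = np.array([sum(x[(z - w) % p] * y[w] for w in range(p)) for z in range(p)])

# Lemma 2.3: the group-algebra product matches the product of circulant matrices
assert np.allclose(circ(xy), circ(x) @ circ(y))

# Lemma 2.2: circ(x) = F Lambda_x F*, with n-th eigenvalue sum_z x(z) omega^{zn}
lam = np.array([sum(x[z] * omega ** (z * n) for z in range(p)) for n in range(p)])
assert np.allclose(circ(x), F @ np.diag(lam) @ F.conj().T)
```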
We now review important notions about C*-algebras. We denote a C*-algebra by A.
Definition 2.4 (Positive) An element a of A is called positive if there exists b ∈ A such that $a = b^* b$ holds.
For a, b ∈ A, we write $a \leq_{\mathcal{A}} b$ if $b - a$ is positive, and $a <_{\mathcal{A}} b$ if $b - a$ is positive and not zero. We denote by
$\mathcal{A}_+$ the subset of A composed of all positive elements of A.
Definition 2.5 (Minimum) For a subset S of A, a ∈ A is said to be a lower bound of S with respect to the order
$\leq_{\mathcal{A}}$ if $a \leq_{\mathcal{A}} b$ for any b ∈ S. A lower bound c ∈ A is said to be an infimum of S if $a \leq_{\mathcal{A}} c$ for any lower
bound a of S. If c ∈ S, then c is said to be a minimum of S.
Hilbert C*-module is a generalization of Hilbert space. We can define an A-valued inner product and a (real
nonnegative-valued) norm as natural generalizations of the complex-valued inner product. See Section A for
further details. Then, we define Hilbert C*-module as follows.
Definition 2.6 (Hilbert C*-module) Let M be a C*-module over A equipped with an A-valued inner product.
If M is complete with respect to the norm induced by the A-valued inner product, it is called a Hilbert
C*-module over A, or a Hilbert A-module.
2.2 Reproducing Kernel Hilbert C*-Module
RKHM is a generalization of RKHS by means of C*-algebra. Let X be a non-empty set of input data.
Definition 2.7 (A-valued positive definite kernel) An A-valued map $k: \mathcal{X} \times \mathcal{X} \to \mathcal{A}$ is called a positive
definite kernel if it satisfies the following conditions:
$k(x, y) = k(y, x)^*$ for $x, y \in \mathcal{X}$,
$\sum_{i,j=1}^{n} c_i^* k(x_i, x_j) c_j \geq_{\mathcal{A}} 0$ for $n \in \mathbb{N}$, $c_i \in \mathcal{A}$, $x_i \in \mathcal{X}$.
Let $\phi: \mathcal{X} \to \mathcal{A}^{\mathcal{X}}$ be the feature map associated with k, defined as $\phi(x) = k(\cdot, x)$ for x ∈ X. We
construct the following C*-module composed of A-valued functions:
$\mathcal{M}_{k,0} := \bigl\{ \textstyle\sum_{i=1}^{n} \phi(x_i) c_i \bigm| n \in \mathbb{N},\ c_i \in \mathcal{A},\ x_i \in \mathcal{X} \bigr\}.$
Define an A-valued map $\langle \cdot, \cdot \rangle_{\mathcal{M}_k}: \mathcal{M}_{k,0} \times \mathcal{M}_{k,0} \to \mathcal{A}$ as
$\bigl\langle \textstyle\sum_{i=1}^{n} \phi(x_i) c_i,\ \textstyle\sum_{j=1}^{l} \phi(y_j) b_j \bigr\rangle_{\mathcal{M}_k} := \textstyle\sum_{i=1}^{n} \sum_{j=1}^{l} c_i^* k(x_i, y_j) b_j.$
By the properties of k in Definition 2.7, $\langle \cdot, \cdot \rangle_{\mathcal{M}_k}$ is well defined and has the reproducing property
$\langle \phi(x), v \rangle_{\mathcal{M}_k} = v(x)$
for $v \in \mathcal{M}_{k,0}$ and x ∈ X. It is also an A-valued inner product. The reproducing kernel Hilbert A-
module (RKHM) associated with k is defined as the completion of $\mathcal{M}_{k,0}$. We denote by $\mathcal{M}_k$ the RKHM
associated with k. In the following, we denote the inner product, absolute value, and norm in $\mathcal{M}_k$ by $\langle \cdot, \cdot \rangle_k$,
$|\cdot|_k$, and $\|\cdot\|_k$, respectively. Hashimoto et al. (2021) showed the representer theorem in RKHM.
Proposition 2.8 (Representer theorem) Let A be a unital C*-algebra. Let $x_1, \ldots, x_n \in \mathcal{X}$ and $y_1, \ldots, y_n \in \mathcal{A}$.
Let $h: \mathcal{X} \times \mathcal{A} \times \mathcal{A} \to \mathcal{A}_+$ be an error function, and let $g: \mathcal{A}_+ \to \mathcal{A}_+$ satisfy $g(a) \leq_{\mathcal{A}} g(b)$ for
$a \leq_{\mathcal{A}} b$. Assume the module (algebraically) generated by $\{\phi(x_i)\}_{i=1}^{n}$ is closed. Then, any $u \in \mathcal{M}_k$ minimizing
$\sum_{i=1}^{n} h(x_i, y_i, u(x_i)) + g(|u|_{\mathcal{M}_k})$ admits a representation of the form $\sum_{i=1}^{n} \phi(x_i) c_i$ for some $c_1, \ldots, c_n \in \mathcal{A}$.
Figure 1: Representing samples in RKHM
3 C*-ALGEBRA-VALUED POSITIVE DEFINITE KERNELS
To investigate the supervised learning problem in RKHM, we begin by constructing suitable C*-algebra-valued
positive definite kernels. The product structure used in these kernels will be shown to be effective in analyzing
image data. However, the proposed kernels are general, and their application is not limited to image data.
Let A1 be a C*-algebra. By the Gelfand–Naimark theorem (see, for example, Murphy 1990), there exists
a Hilbert space H such that A1 is a subalgebra of the C*-algebra A2 of bounded linear operators on H. For
image data, we can set A1 and A2 as follows.
Example 3.1 Let p ∈ N, A1 = C*(Z/pZ), and A2 = C^{p×p}. Then, A1 is a subalgebra of A2. Indeed, by
Lemma 2.3, A1 ≅ Circ(p). For example, in image processing, filters are represented by circulant matrices (Chanda
& Majumder, 2011). If we regard Z/pZ as the space of p pixels, then elements of C*(Z/pZ) can be regarded as
functions from pixels to intensities. Thus, we can also regard grayscale and color images with p pixels as elements
of C*(Z/pZ) and C*(Z/pZ)³, respectively. Note that A2 is noncommutative, although A1 is commutative.
We consider the case where the inputs are in $\mathcal{A}_1^d$ for d ∈ N, and define linear, polynomial, and Gaussian C*-
algebra-valued positive definite kernels as follows. For example, we can consider the case where the inputs are d
images.
Definition 3.2 Let $\mathcal{X} \subseteq \mathcal{A}_1^d$ and $x = [x_1, \ldots, x_d] \in \mathcal{X}$.
1. For $a_{i,1}, a_{i,2} \in \mathcal{A}_2$, the linear kernel $k: \mathcal{X} \times \mathcal{X} \to \mathcal{A}_2$ is defined as
$k(x, y) = \sum_{i=1}^{d} a_{i,1}^* x_i^* a_{i,2}^* a_{i,2} y_i a_{i,1}$.
2. For $q \in \mathbb{N}$ and $a_{i,j} \in \mathcal{A}_2$ ($i = 1, \ldots, d$, $j = 1, \ldots, q+1$), the polynomial kernel $k: \mathcal{X} \times \mathcal{X} \to \mathcal{A}_2$ is defined as
$k(x, y) = \sum_{i=1}^{d} \Bigl(\prod_{j=1}^{q} a_{i,j}^* x_i^*\Bigr) a_{i,q+1}^* a_{i,q+1} \Bigl(\prod_{j=1}^{q} y_i a_{i,q+1-j}\Bigr)$.
3. Let Ω be a measurable space and µ an A2-valued positive measure on Ω.¹ For $a_{i,1}, a_{i,2}: \Omega \to \mathcal{A}_2$, the
Gaussian kernel $k: \mathcal{X} \times \mathcal{X} \to \mathcal{A}_2$ is defined as
$k(x, y) = \int_{\Omega} e^{-\sqrt{-1} \sum_{i=1}^{d} a_{i,1}(\omega)^* x_i^* a_{i,2}(\omega)^*} \, d\mu(\omega) \, e^{\sqrt{-1} \sum_{i=1}^{d} a_{i,2}(\omega) y_i a_{i,1}(\omega)}$.
Here, we assume the integral does not diverge.
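As a concrete instantiation (ours, for illustration) of the linear kernel in Definition 3.2 with the simplifying choice A1 = A2 = C^{p×p}, the following checks the two positive-definiteness conditions of Definition 2.7 numerically:

```python
import numpy as np

# linear kernel of Definition 3.2 with A1 = A2 = C^{p x p} (illustrative choice)
p, d = 3, 2
rng = np.random.default_rng(3)
a1 = rng.standard_normal((d, p, p))   # parameters a_{i,1}
a2 = rng.standard_normal((d, p, p))   # parameters a_{i,2}

def k(x, y):
    # k(x, y) = sum_i a_{i,1}* x_i* a_{i,2}* a_{i,2} y_i a_{i,1}
    return sum(a1[i].conj().T @ x[i].conj().T @ a2[i].conj().T
               @ a2[i] @ y[i] @ a1[i] for i in range(d))

x = rng.standard_normal((d, p, p))
y = rng.standard_normal((d, p, p))

assert np.allclose(k(x, y).conj().T, k(y, x))      # k(x, y) = k(y, x)*
# k(x, x) = sum_i (a_{i,2} x_i a_{i,1})* (a_{i,2} x_i a_{i,1}) is positive
assert np.linalg.eigvalsh(k(x, x)).min() >= -1e-10
```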
Remark 3.3 We can construct new kernels by composing functions with the kernels defined in Definition 3.2.
For example, let $\psi_{i,j}: \mathcal{A}_1 \to \mathcal{A}_2$ for i = 1, ..., d and j = 1, ..., q + 1. Then, the map defined by
replacing x_i and y_i in the polynomial kernel with $\psi_{i,j}(x_i)$ and $\psi_{i,j}(y_i)$ is also a C*-algebra-valued positive definite
kernel.
¹ See Hashimoto et al. (2021, Appendix B) for a rigorous definition.
Figure 2: Product in A1 and A2 in Example 3.4
Figure 3: Comparison of RKHM with vvRKHS
If A1 = A2 = C, then the above kernels reduce to the standard complex-valued positive definite
kernels, and the RKHMs associated with them reduce to RKHSs. In this case, if $\mathcal{X} = \mathcal{A}^d$, the input space
and the RKHS are both Hilbert spaces (Hilbert C*-modules). On the other hand, for RKHMs, if we choose
A1 ⊊ A2, then the input space X is a Hilbert A1-module, but the RKHM is a Hilbert A2-module, not an A1-
module. Applying RKHMs, we can not only construct higher-dimensional spaces than the input spaces but also enlarge the
C*-algebras characterizing the RKHMs, which allows us to represent more functions than RKHSs and to make
use of the product structure in A2. Figure 1 schematically shows the representation of samples in RKHM. We
show an example related to image data below.
Example 3.4 If A1 = C*(Z/pZ), A2 = C^{p×p} (A1 ⊆ A2), and $a_{i,j} \in \mathcal{A}_1$, then the $a_{i,j}$ in Definition 3.2 behave as
convolutional filters. In fact, by Definition 2.1, the multiplication of $a_{i,j}$ and $x_i$ is represented by the convolution.
The convolution of two functions corresponds to the pointwise multiplication of their Fourier components. Thus,
each Fourier component of $x_i$ does not interact with the other Fourier components. Choosing $a_{i,j} \in \mathcal{A}_2$ outside A1
corresponds to multiplying different Fourier components of the two functions. Indeed, let x ∈ A1. Then, by
Lemma 2.3, x is represented as a circulant matrix, and by Lemma 2.2, it is decomposed as $x = F \Lambda_x F^*$. In this
case, $\Lambda_x$ is the diagonal matrix whose ith diagonal entry is the ith Fourier component (FC) of x. Thus, if $a_{i,j} \in \mathcal{A}_1$,
then we have $x a_{i,j} = F \Lambda_x \Lambda_{a_{i,j}} F^*$, and each Fourier component of x is multiplied by the same Fourier component
of $a_{i,j}$. On the other hand, if $a_{i,j} \in \mathcal{A}_2 \setminus \mathcal{A}_1$, then $\Lambda_{a_{i,j}} := F^* a_{i,j} F$ is not a diagonal matrix, and the elements of $\Lambda_x \Lambda_{a_{i,j}}$
are composed of weighted sums of different Fourier components of x. Figure 2 summarizes this example.
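The contrast described above can be checked numerically. In this sketch of ours (p = 4), a circulant factor keeps the Fourier representation diagonal (no mixing of components), while a generic matrix in A2 \ A1 produces off-diagonal terms, i.e. interacting Fourier components:

```python
import numpy as np

p = 4
omega = np.exp(2j * np.pi / p)
F = np.array([[omega ** (i * j) for j in range(p)] for i in range(p)]) / np.sqrt(p)

def circ(v):
    # circulant matrix whose first row is v
    return np.array([np.roll(v, k) for k in range(p)])

def fourier(m):
    # representation of m in the Fourier basis: Lambda = F* m F (Lemma 2.2)
    return F.conj().T @ m @ F

def is_diagonal(m):
    return np.allclose(m, np.diag(np.diag(m)))

rng = np.random.default_rng(4)
x = circ(rng.standard_normal(p))        # x in A1 (a circulant, e.g. an image)
a_in = circ(rng.standard_normal(p))     # a in A1: a convolutional filter
a_out = rng.standard_normal((p, p))     # a in A2 \ A1: a generic matrix

assert is_diagonal(fourier(x))          # circulants are diagonal in Fourier basis
assert is_diagonal(fourier(x @ a_in))   # a in A1: FCs multiplied pointwise
assert not is_diagonal(fourier(x @ a_out))  # a outside A1: FCs interact
```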
Comparison with vvRKHS From the perspective of vvRKHS, defining kernels as in Definition 3.2 is difficult
since for vvRKHS, the output space is a Hilbert space, and we do not have product structures in it. Indeed,
the inner product in a vvRKHS is described by an action of an operator on a vector. We can regard the vector
as a rank-one operator whose range is the one-dimensional space spanned by the vector. Thus, the action is
regarded as the product of only two operators. On the other hand, from the perspective of C*-algebra, we can
multiply more than two elements of a C*-algebra, which allows us to define C*-algebra-valued kernels naturally
in the same manner as complex-valued kernels. See Figure 3 for a schematic explanation.
4 SUPERVISED LEARNING IN RKHM
We investigate supervised learning in RKHM. We first formulate the problem and derive a learning algorithm.
Then, we characterize its generalization error and investigate its computational complexity.
We do not assume $\mathcal{X} \subseteq \mathcal{A}_1^d$ in Subsections 4.1 and 4.2; the input space X can be an arbitrary non-empty
set in these subsections. Thus, although we focus on the case $\mathcal{X} \subseteq \mathcal{A}_1^d$ in this paper, supervised learning in
RKHM applies to general problems whose output space is a C*-algebra A.
4.1 Problem Setting
Let $x_1, \ldots, x_n \in \mathcal{X}$ be input training samples and $y_1, \ldots, y_n \in \mathcal{A}$ be output training samples. Let $k: \mathcal{X} \times \mathcal{X} \to \mathcal{A}$
be an A-valued positive definite kernel, and let ϕ and $\mathcal{M}_k$ be the feature map and the RKHM associated with k,