Learning in RKHM: a C*-Algebraic Twist for Kernel Machines
Yuka Hashimoto1,2, Masahiro Ikeda2,3, Hachem Kadri4
1. NTT Network Service Systems Laboratories, NTT Corporation, Tokyo, Japan
2. Center for Advanced Intelligence Project, RIKEN, Tokyo, Japan
3. Keio University, Yokohama, Japan
4. Aix-Marseille University, CNRS, LIS, Marseille, France
Abstract
Supervised learning in reproducing kernel Hilbert space (RKHS) and vector-valued RKHS (vvRKHS)
has been investigated for more than 30 years. In this paper, we provide a new twist to this rich literature by
generalizing supervised learning in RKHS and vvRKHS to reproducing kernel Hilbert C*-module (RKHM),
and show how to construct effective positive-definite kernels by considering the perspective of C*-algebra.
Unlike the cases of RKHS and vvRKHS, we can use C*-algebras to enlarge representation spaces. This
enables us to construct RKHMs whose representation power goes beyond RKHSs, vvRKHSs, and existing
methods such as convolutional neural networks. Our framework is suitable, for example, for effectively
analyzing image data by allowing the interaction of Fourier components.
1 INTRODUCTION
Supervised learning in reproducing kernel Hilbert space (RKHS) has been actively investigated since the early
1990s (Murphy, 2012; Christmann & Steinwart, 2008; Shawe-Taylor & Cristianini, 2004; Schölkopf & Smola,
2002; Boser et al., 1992). The notion of reproducing kernels as dot products in Hilbert spaces was first brought to
the field of machine learning by Aizerman et al. (1964), while the theoretical foundation of reproducing kernels
and their Hilbert spaces dates back to at least Aronszajn (1950). By virtue of the representer theorem (Schölkopf
et al., 2001), we can compute the solution of an infinite-dimensional minimization problem in RKHS with given
finite samples. In addition to the standard RKHSs, applying vector-valued RKHSs (vvRKHSs) to supervised
learning has also been proposed and used for analyzing vector-valued data (Micchelli & Pontil, 2005; Álvarez
et al., 2012; Kadri et al., 2016; Minh et al., 2016; Brouard et al., 2016; Laforgue et al., 2020; Huusari & Kadri,
2021). Generalization bounds for supervised learning problems in RKHS and vvRKHS have also been derived (Mohri et al.,
2018; Caponnetto & De Vito, 2007; Audiffren & Kadri, 2013; Huusari & Kadri, 2021).
Reproducing kernel Hilbert C*-module (RKHM) is a generalization of RKHS and vvRKHS by means of
C*-algebra. A C*-algebra is a generalization of the space of complex numbers: it is equipped with a product and
an involution. Important examples are the C*-algebra of bounded linear operators on a Hilbert space and the
C*-algebra of continuous functions on a compact space. RKHMs were originally studied for problems in pure
operator algebra and mathematical physics (Manuilov & Troitsky, 2000; Heo, 2008; Moslehian, 2022). Recently,
applying RKHMs to data analysis was proposed by Hashimoto et al. (2021), who generalized the
representer theorem in RKHS to RKHM, which allows us to analyze structured data, such as functional data,
with C*-algebras.
In this paper, we investigate supervised learning in RKHM. This provides a new twist to the state-of-the-art
kernel-based learning algorithms and the development of a novel kind of reproducing kernels. An advantage of
RKHM over RKHS and vvRKHS is that we can enlarge the C*-algebra characterizing the RKHM to construct
a representation space. This allows us to represent more functions than in the case of RKHS and to make use of the
product structure in the C*-algebra. Our main contributions are:
arXiv:2210.11855v3 [stat.ML] 26 Jun 2024
• We define positive definite kernels from the perspective of C*-algebra, which are suitable for learning in RKHM and adapted to analyzing image data.
• We derive a generalization bound for the supervised learning problem in RKHM, which generalizes existing results for RKHS and vvRKHS. We also show that the computational complexity of our method can be reduced if the parameters in the C*-algebra-valued positive definite kernels have specific structures.
• We show that our framework generalizes existing methods based on convolution operations.
Important applications of the supervised learning in RKHM are tasks whose inputs and outputs are images. If
the proposed kernels have specific parameters, then the product structure is the convolution, which corresponds
to the pointwise product of Fourier components. By extending the C*-algebra to a larger one, we can use
more general operations than convolutions. This enables us to analyze image data effectively by allowing
interactions between Fourier components. Regarding the generalization bound, we derive the same type of
bound as those obtained for RKHS and vvRKHS via Rademacher complexity theory. This is, to our knowledge,
the first generalization bound for RKHM hypothesis classes. Concerning the connection with existing methods,
we show that using our framework, we can reconstruct existing methods such as the convolutional neural
network (LeCun et al., 1998) and the convolutional kernel (Mairal et al., 2014) and further generalize them.
This fact implies that the representation power of our framework goes beyond the existing methods.
The remainder of this paper is organized as follows: In Section 2, we review mathematical notions related
to this paper. We propose C*-algebra-valued positive definite kernels in Section 3 and investigate supervised
learning in RKHM in Section 4. Then, we show connections with existing convolution-based methods in
Section 5. We confirm the advantage of our method numerically in Section 6 and conclude the paper in
Section 7. All technical proofs are in Section B.
2 PRELIMINARIES
2.1 C*-Algebra and Hilbert C*-Module
A C*-algebra is a Banach space equipped with a product and an involution that satisfies the C*-identity. See
Section A for more details. An example of a C*-algebra is the group C*-algebra (Kirillov, 1976). Let p ∈ N, and let
Z/pZ be the set of integers modulo p.
Definition 2.1 (Group C*-algebra on a finite cyclic group) Let $\omega = e^{2\pi\sqrt{-1}/p}$. The group C*-algebra on
Z/pZ, denoted C*(Z/pZ), is the set of maps from Z/pZ to C equipped with the following product, involution, and norm:
$(x \cdot y)(z) = \sum_{w \in \mathbb{Z}/p\mathbb{Z}} x(z - w)\, y(w)$ for $z \in \mathbb{Z}/p\mathbb{Z}$,
$x^*(z) = \overline{x(-z)}$,
$\|x\| = \max_{n \in \{0, \ldots, p-1\}} \bigl|\sum_{z \in \mathbb{Z}/p\mathbb{Z}} x(z)\, \omega^{zn}\bigr|$.
Since the product is the convolution, group C*-algebras offer a new way to define positive definite kernels,
which are effective in analyzing image data, as we will see in Section 3. Elements of C*(Z/pZ) are described by
circulant matrices (Gray, 2006). Let $\mathrm{Circ}(p) = \{x \in \mathbb{C}^{p \times p} \mid x \text{ is a circulant matrix}\}$. Moreover, we denote the
circulant matrix whose first row is v by circ(v). The discrete Fourier transform (DFT) matrix, whose (i, j)-entry
is $\omega^{(i-1)(j-1)}/\sqrt{p}$, is denoted by F.
Lemma 2.2 Any circulant matrix $x \in \mathrm{Circ}(p)$ has an eigenvalue decomposition $x = F \Lambda_x F^*$, where
$\Lambda_x = \mathrm{diag}\Bigl(\sum_{z \in \mathbb{Z}/p\mathbb{Z}} x(z)\, \omega^{z \cdot 0}, \ldots, \sum_{z \in \mathbb{Z}/p\mathbb{Z}} x(z)\, \omega^{z(p-1)}\Bigr)$.
Lemma 2.3 The group C*-algebra C*(Z/pZ) is C*-isomorphic to Circ(p).
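As a quick numerical check (our illustration, not part of the paper; a sketch with numpy and p = 4), the following verifies that the group-algebra product of Definition 2.1 is realized by circulant matrix multiplication (Lemma 2.3) and that circulants diagonalize as in Lemma 2.2:

```python
import numpy as np

p = 4
omega = np.exp(2j * np.pi / p)
# DFT matrix with (i, j)-entry omega^{ij} / sqrt(p) (0-indexed); it is unitary
F = np.array([[omega ** (i * j) for j in range(p)] for i in range(p)]) / np.sqrt(p)

def circ(v):
    """Circulant matrix whose first row is v: entry (k, j) = v[(j - k) mod p]."""
    return np.array([np.roll(v, k) for k in range(p)])

rng = np.random.default_rng(0)
x, y = rng.standard_normal(p), rng.standard_normal(p)

# product in C*(Z/pZ): (x . y)(z) = sum_w x(z - w) y(w)  (circular convolution)
xy = np.array([sum(x[(z - w) % p] * y[w] for w in range(p)) for z in range(p)])

# Lemma 2.3: the group-algebra product matches the product of circulant matrices
assert np.allclose(circ(xy), circ(x) @ circ(y))

# Lemma 2.2: circ(x) = F Lambda_x F*, with n-th eigenvalue sum_z x(z) omega^{zn}
lam = np.array([sum(x[z] * omega ** (z * n) for z in range(p)) for n in range(p)])
assert np.allclose(circ(x), F @ np.diag(lam) @ F.conj().T)
```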
We now review important notions about C*-algebras. We denote a C*-algebra by A.
Definition 2.4 (Positive) An element a of A is called positive if there exists b ∈ A such that $a = b^* b$ holds.
For a, b ∈ A, we write $a \leq_{\mathcal{A}} b$ if $b - a$ is positive, and $a <_{\mathcal{A}} b$ if $b - a$ is positive and not zero. We denote by
$\mathcal{A}_+$ the subset of A composed of all positive elements of A.
Definition 2.5 (Minimum) For a subset S of A, a ∈ A is said to be a lower bound of S with respect to the order
$\leq_{\mathcal{A}}$ if $a \leq_{\mathcal{A}} b$ for any b ∈ S. A lower bound c ∈ A is said to be an infimum of S if $a \leq_{\mathcal{A}} c$ for any lower
bound a of S. If c ∈ S, then c is said to be a minimum of S.
Hilbert C*-module is a generalization of Hilbert space. We can define an A-valued inner product and a (real
nonnegative-valued) norm as natural generalizations of the complex-valued inner product. See Section A for
further details. Then, we define Hilbert C*-module as follows.
Definition 2.6 (Hilbert C*-module) Let M be a C*-module over A equipped with an A-valued inner product.
If M is complete with respect to the norm induced by the A-valued inner product, it is called a Hilbert
C*-module over A, or a Hilbert A-module.
2.2 Reproducing Kernel Hilbert C*-Module
RKHM is a generalization of RKHS by means of C*-algebra. Let X be a non-empty set of input data.
Definition 2.7 (A-valued positive definite kernel) An A-valued map $k: \mathcal{X} \times \mathcal{X} \to \mathcal{A}$ is called a positive
definite kernel if it satisfies the following conditions:
$k(x, y) = k(y, x)^*$ for $x, y \in \mathcal{X}$,
$\sum_{i,j=1}^{n} c_i^* k(x_i, x_j) c_j \geq_{\mathcal{A}} 0$ for $n \in \mathbb{N}$, $c_i \in \mathcal{A}$, $x_i \in \mathcal{X}$.
Let $\phi: \mathcal{X} \to \mathcal{A}^{\mathcal{X}}$ be the feature map associated with k, defined as $\phi(x) = k(\cdot, x)$ for x ∈ X. We
construct the following C*-module composed of A-valued functions:
$\mathcal{M}_{k,0} := \bigl\{ \textstyle\sum_{i=1}^{n} \phi(x_i) c_i \bigm| n \in \mathbb{N},\ c_i \in \mathcal{A},\ x_i \in \mathcal{X} \bigr\}.$
Define an A-valued map $\langle \cdot, \cdot \rangle_{\mathcal{M}_k}: \mathcal{M}_{k,0} \times \mathcal{M}_{k,0} \to \mathcal{A}$ as
$\bigl\langle \textstyle\sum_{i=1}^{n} \phi(x_i) c_i,\ \textstyle\sum_{j=1}^{l} \phi(y_j) b_j \bigr\rangle_{\mathcal{M}_k} := \textstyle\sum_{i=1}^{n} \sum_{j=1}^{l} c_i^* k(x_i, y_j) b_j.$
By the properties of k in Definition 2.7, $\langle \cdot, \cdot \rangle_{\mathcal{M}_k}$ is well defined and has the reproducing property
$\langle \phi(x), v \rangle_{\mathcal{M}_k} = v(x)$
for $v \in \mathcal{M}_{k,0}$ and x ∈ X. It is also an A-valued inner product. The reproducing kernel Hilbert A-
module (RKHM) associated with k is defined as the completion of $\mathcal{M}_{k,0}$. We denote by $\mathcal{M}_k$ the RKHM
associated with k. In the following, we denote the inner product, absolute value, and norm in $\mathcal{M}_k$ by $\langle \cdot, \cdot \rangle_k$,
$|\cdot|_k$, and $\|\cdot\|_k$, respectively. Hashimoto et al. (2021) showed the representer theorem in RKHM.
Proposition 2.8 (Representer theorem) Let A be a unital C*-algebra. Let $x_1, \ldots, x_n \in \mathcal{X}$ and $y_1, \ldots, y_n \in \mathcal{A}$.
Let $h: \mathcal{X} \times \mathcal{A} \times \mathcal{A} \to \mathcal{A}_+$ be an error function, and let $g: \mathcal{A}_+ \to \mathcal{A}_+$ satisfy $g(a) \leq_{\mathcal{A}} g(b)$ for
$a \leq_{\mathcal{A}} b$. Assume the module (algebraically) generated by $\{\phi(x_i)\}_{i=1}^{n}$ is closed. Then, any $u \in \mathcal{M}_k$ minimizing
$\sum_{i=1}^{n} h(x_i, y_i, u(x_i)) + g(|u|_{\mathcal{M}_k})$ admits a representation of the form $\sum_{i=1}^{n} \phi(x_i) c_i$ for some $c_1, \ldots, c_n \in \mathcal{A}$.
Figure 1: Representing samples in RKHM
3 C*-ALGEBRA-VALUED POSITIVE DEFINITE KERNELS
To investigate the supervised learning problem in RKHM, we begin by constructing suitable C*-algebra-valued
positive definite kernels. The product structure used in these kernels will be shown to be effective in analyzing
image data. However, the proposed kernels are general, and their application is not limited to image data.
Let A1 be a C*-algebra. By the Gelfand–Naimark theorem (see, for example, Murphy 1990), there exists
a Hilbert space H such that A1 is a subalgebra of the C*-algebra A2 of bounded linear operators on H. For
image data, we can set A1 and A2 as follows.
Example 3.1 Let p ∈ N, A1 = C*(Z/pZ), and A2 = C^{p×p}. Then, A1 is a subalgebra of A2. Indeed, by
Lemma 2.3, A1 ≅ Circ(p). For example, in image processing, filters are represented by circulant matrices (Chanda
& Majumder, 2011). If we regard Z/pZ as the space of p pixels, then elements of C*(Z/pZ) can be regarded as
functions from pixels to intensities. Thus, we can also regard grayscale and color images with p pixels as elements
of C*(Z/pZ) and C*(Z/pZ)³, respectively. Note that A2 is noncommutative, although A1 is commutative.
We consider the case where the inputs are in $\mathcal{A}_1^d$ for d ∈ N, and define linear, polynomial, and Gaussian C*-
algebra-valued positive definite kernels as follows. For example, we can consider the case where the inputs are d
images.
Definition 3.2 Let $\mathcal{X} \subseteq \mathcal{A}_1^d$ and $x = [x_1, \ldots, x_d] \in \mathcal{X}$.
1. For $a_{i,1}, a_{i,2} \in \mathcal{A}_2$, the linear kernel $k: \mathcal{X} \times \mathcal{X} \to \mathcal{A}_2$ is defined as
$k(x, y) = \sum_{i=1}^{d} a_{i,1}^* x_i^* a_{i,2}^* a_{i,2} y_i a_{i,1}$.
2. For $q \in \mathbb{N}$ and $a_{i,j} \in \mathcal{A}_2$ ($i = 1, \ldots, d$, $j = 1, \ldots, q+1$), the polynomial kernel $k: \mathcal{X} \times \mathcal{X} \to \mathcal{A}_2$ is defined as
$k(x, y) = \sum_{i=1}^{d} \Bigl(\prod_{j=1}^{q} a_{i,j}^* x_i^*\Bigr) a_{i,q+1}^* a_{i,q+1} \Bigl(\prod_{j=1}^{q} y_i a_{i,q+1-j}\Bigr)$.
3. Let Ω be a measurable space and µ an A2-valued positive measure on Ω.¹ For $a_{i,1}, a_{i,2}: \Omega \to \mathcal{A}_2$, the
Gaussian kernel $k: \mathcal{X} \times \mathcal{X} \to \mathcal{A}_2$ is defined as
$k(x, y) = \int_{\Omega} e^{-\sqrt{-1} \sum_{i=1}^{d} a_{i,1}(\omega)^* x_i^* a_{i,2}(\omega)^*} \, d\mu(\omega) \, e^{\sqrt{-1} \sum_{i=1}^{d} a_{i,2}(\omega) y_i a_{i,1}(\omega)}$.
Here, we assume the integral does not diverge.
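As a concrete instantiation (ours, for illustration) of the linear kernel in Definition 3.2 with the simplifying choice A1 = A2 = C^{p×p}, the following checks the two positive-definiteness conditions of Definition 2.7 numerically:

```python
import numpy as np

# linear kernel of Definition 3.2 with A1 = A2 = C^{p x p} (illustrative choice)
p, d = 3, 2
rng = np.random.default_rng(3)
a1 = rng.standard_normal((d, p, p))   # parameters a_{i,1}
a2 = rng.standard_normal((d, p, p))   # parameters a_{i,2}

def k(x, y):
    # k(x, y) = sum_i a_{i,1}* x_i* a_{i,2}* a_{i,2} y_i a_{i,1}
    return sum(a1[i].conj().T @ x[i].conj().T @ a2[i].conj().T
               @ a2[i] @ y[i] @ a1[i] for i in range(d))

x = rng.standard_normal((d, p, p))
y = rng.standard_normal((d, p, p))

assert np.allclose(k(x, y).conj().T, k(y, x))      # k(x, y) = k(y, x)*
# k(x, x) = sum_i (a_{i,2} x_i a_{i,1})* (a_{i,2} x_i a_{i,1}) is positive
assert np.linalg.eigvalsh(k(x, x)).min() >= -1e-10
```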
Remark 3.3 We can construct new kernels by composing functions with the kernels defined in Definition 3.2.
For example, let $\psi_{i,j}: \mathcal{A}_1 \to \mathcal{A}_2$ for i = 1, ..., d and j = 1, ..., q + 1. Then, the map defined by
replacing x_i and y_i in the polynomial kernel with $\psi_{i,j}(x_i)$ and $\psi_{i,j}(y_i)$ is also a C*-algebra-valued positive definite
kernel.
¹ See Hashimoto et al. (2021, Appendix B) for a rigorous definition.
Figure 2: Product in A1 and A2 in Example 3.4
Figure 3: Comparison of RKHM with vvRKHS
If A1 = A2 = C, then the above kernels reduce to the standard complex-valued positive definite
kernels, and the RKHMs associated with them reduce to RKHSs. In this case, if $\mathcal{X} = \mathcal{A}^d$, the input space
and the RKHS are both Hilbert spaces (Hilbert C*-modules). On the other hand, for RKHMs, if we choose
A1 ⊊ A2, then the input space X is a Hilbert A1-module, but the RKHM is a Hilbert A2-module, not an A1-
module. Applying RKHMs, we can not only construct higher-dimensional spaces than the input spaces but also enlarge the
C*-algebras characterizing the RKHMs, which allows us to represent more functions than RKHSs and to make
use of the product structure in A2. Figure 1 schematically shows the representation of samples in RKHM. We
show an example related to image data below.
Example 3.4 If A1 = C*(Z/pZ), A2 = C^{p×p} (A1 ⊆ A2), and $a_{i,j} \in \mathcal{A}_1$, then the $a_{i,j}$ in Definition 3.2 behave as
convolutional filters. In fact, by Definition 2.1, the multiplication of $a_{i,j}$ and $x_i$ is represented by the convolution.
The convolution of two functions corresponds to the pointwise multiplication of their Fourier components. Thus,
each Fourier component of $x_i$ does not interact with the other Fourier components. Choosing $a_{i,j} \in \mathcal{A}_2$ outside A1
corresponds to multiplying different Fourier components of the two functions. Indeed, let x ∈ A1. Then, by
Lemma 2.3, x is represented as a circulant matrix, and by Lemma 2.2, it is decomposed as $x = F \Lambda_x F^*$. In this
case, $\Lambda_x$ is the diagonal matrix whose ith diagonal entry is the ith Fourier component (FC) of x. Thus, if $a_{i,j} \in \mathcal{A}_1$,
then we have $x a_{i,j} = F \Lambda_x \Lambda_{a_{i,j}} F^*$, and each Fourier component of x is multiplied by the same Fourier component
of $a_{i,j}$. On the other hand, if $a_{i,j} \in \mathcal{A}_2 \setminus \mathcal{A}_1$, then $\Lambda_{a_{i,j}} := F^* a_{i,j} F$ is not a diagonal matrix, and the elements of $\Lambda_x \Lambda_{a_{i,j}}$
are composed of weighted sums of different Fourier components of x. Figure 2 summarizes this example.
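The contrast described above can be checked numerically. In this sketch of ours (p = 4), a circulant factor keeps the Fourier representation diagonal (no mixing of components), while a generic matrix in A2 \ A1 produces off-diagonal terms, i.e. interacting Fourier components:

```python
import numpy as np

p = 4
omega = np.exp(2j * np.pi / p)
F = np.array([[omega ** (i * j) for j in range(p)] for i in range(p)]) / np.sqrt(p)

def circ(v):
    # circulant matrix whose first row is v
    return np.array([np.roll(v, k) for k in range(p)])

def fourier(m):
    # representation of m in the Fourier basis: Lambda = F* m F (Lemma 2.2)
    return F.conj().T @ m @ F

def is_diagonal(m):
    return np.allclose(m, np.diag(np.diag(m)))

rng = np.random.default_rng(4)
x = circ(rng.standard_normal(p))        # x in A1 (a circulant, e.g. an image)
a_in = circ(rng.standard_normal(p))     # a in A1: a convolutional filter
a_out = rng.standard_normal((p, p))     # a in A2 \ A1: a generic matrix

assert is_diagonal(fourier(x))          # circulants are diagonal in Fourier basis
assert is_diagonal(fourier(x @ a_in))   # a in A1: FCs multiplied pointwise
assert not is_diagonal(fourier(x @ a_out))  # a outside A1: FCs interact
```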
Comparison with vvRKHS From the perspective of vvRKHS, defining kernels as in Definition 3.2 is difficult
since for vvRKHS, the output space is a Hilbert space, and we do not have product structures in it. Indeed,
the inner product in a vvRKHS is described by an action of an operator on a vector. We can regard the vector
as a rank-one operator whose range is the one-dimensional space spanned by the vector. Thus, the action is
regarded as the product of only two operators. On the other hand, from the perspective of C*-algebra, we can
multiply more than two elements of a C*-algebra, which allows us to define C*-algebra-valued kernels naturally
in the same manner as complex-valued kernels. See Figure 3 for a schematic explanation.
4 SUPERVISED LEARNING IN RKHM
We investigate supervised learning in RKHM. We first formulate the problem and derive a learning algorithm.
Then, we characterize its generalization error and investigate its computational complexity.
We do not assume $\mathcal{X} \subseteq \mathcal{A}_1^d$ in Subsections 4.1 and 4.2; the input space X can be an arbitrary non-empty
set in these subsections. Thus, although we focus on the case $\mathcal{X} \subseteq \mathcal{A}_1^d$ in this paper, supervised learning in
RKHM applies to general problems whose output space is a C*-algebra A.
4.1 Problem Setting
Let $x_1, \ldots, x_n \in \mathcal{X}$ be input training samples and $y_1, \ldots, y_n \in \mathcal{A}$ be output training samples. Let $k: \mathcal{X} \times \mathcal{X} \to \mathcal{A}$
be an A-valued positive definite kernel, and let ϕ and $\mathcal{M}_k$ be the feature map and the RKHM associated with k,