gains compared to other approaches to approximate GPs. We demonstrate such performance gains on regression tasks
on standard UCI benchmarks [14].
1.1 Related work
Local learning algorithms were introduced by Bottou and Vapnik [3], with the main objective of estimating the optimal decision function for each individual test point. Examples of local learning algorithms include the well-known K-Nearest Neighbor regression [15] and local polynomial regression [16]. These methods provide simple means for solving regression problems in cases where the training data are nonstationary or their size is prohibitively large for building a global model. However, neither of these methods provides a way to quantify uncertainty in predictions, which is a highly desirable feature in cost-sensitive applications.
Gaussian Process Regression (GPR) [17] is a popular nonparametric regression method based on Bayesian principles which provides uncertainty estimates for its predictions. Like other kernel methods (e.g., SVMs and kernel ridge regression), GPR is a global method, meaning that it takes the whole dataset into account at prediction time. GPR thus inherits the computational complexity of global kernel methods, which is prohibitive for large datasets. Among the large class of scalable approximations for GPR, successful ones are based on Random Fourier Features [11] and on sparsification of the Gram matrix induced by the kernel [17].
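For concreteness, the following minimal numpy sketch of exact GPR prediction (assuming an RBF kernel and a fixed noise variance, neither of which is specified here) makes the global nature of the method explicit: every prediction involves the full n x n Gram matrix and an O(n^3) factorization.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    """Squared-exponential (RBF) kernel between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_predict(X_train, y_train, X_test, kernel=rbf, noise_var=1e-2):
    """Exact GPR predictive mean and variance.

    The whole training set enters every prediction through the n x n
    Gram matrix, so the cost is O(n^3) time and O(n^2) memory -- the
    bottleneck that the approximations discussed below try to remove.
    """
    K = kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
    K_star = kernel(X_test, X_train)                  # m x n cross-covariances
    L = np.linalg.cholesky(K)                         # O(n^3) factorization
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_star @ alpha
    v = np.linalg.solve(L, K_star.T)
    var = kernel(X_test, X_test).diagonal() - np.sum(v ** 2, axis=0)
    return mean, var
```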
The random feature approximation of the kernel proposed in [11] is based on Bochner's theorem and allows representing the kernel function as a dot product of (possibly infinite-dimensional) feature maps applied to the input data. In practice, the infinite feature maps are replaced by a finite Monte Carlo approximation. The disadvantage of this approach is that a specific random feature mapping must be constructed for each type of kernel. While random feature approximations are known for popular kernels such as the RBF [11] and polynomial [18] kernels, there is no straightforward way to apply this method to arbitrary kernels.
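As an illustration, the sketch below approximates the RBF kernel with Monte Carlo random Fourier features; the feature dimension and lengthscale are illustrative choices, not values used in this work.

```python
import numpy as np

def rff_features(X, num_features=256, lengthscale=1.0, seed=None):
    """Random Fourier features approximating the RBF kernel.

    By Bochner's theorem, k(x, y) = E_w[cos(w^T (x - y))] with w drawn
    from the kernel's spectral density (a Gaussian for the RBF kernel),
    so z(x)^T z(y) is a Monte Carlo approximation of k(x, y).
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=1.0 / lengthscale, size=(d, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

# Z @ Z.T approximates the RBF Gram matrix exp(-||x - y||^2 / (2 * lengthscale^2)).
X = np.random.randn(500, 3)
Z = rff_features(X, num_features=2000)
approx_K = Z @ Z.T
```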
The Gram matrix sparsification approach is based on the idea of introducing so-called inducing points to approximate the full Gram matrix. One of the most popular methods in this family is the Nyström approximation [17]. The main drawback of this approach is that a low number of inducing points might lead to a poor approximation of the original model, which affects predictive performance. An important advancement within this family of approaches, which provides a scalable variational formulation, was proposed in [19].
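A minimal sketch of the Nyström idea follows, assuming the inducing points are simply a random subset of the training inputs (more sophisticated selection and variational schemes exist, e.g. [19]).

```python
import numpy as np

def nystrom_gram(X, kernel, num_inducing=50, seed=None):
    """Nyström approximation of the Gram matrix via m inducing points.

    K ~= K_nm K_mm^{-1} K_mn, where the m inducing points are drawn at
    random from the training inputs; the cost of working with the
    approximation drops from O(n^3) to O(n m^2).
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=num_inducing, replace=False)
    Z = X[idx]                                        # inducing inputs
    K_mm = kernel(Z, Z) + 1e-8 * np.eye(num_inducing) # jitter for stability
    K_nm = kernel(X, Z)
    return K_nm @ np.linalg.solve(K_mm, K_nm.T)
```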
While providing good performance and scalability for large datasets, these approaches still require some design choices for the kernel. For stationary kernels, they assume that the same kernel is suitable for all regions of the input space; if the data are nonstationary, this assumption may harm predictive performance. The literature offers a wide range of proposals to address kernel design by incorporating ideas from deep learning [13, 20].
Recently, partitioning strategies have also gained some attention. The main idea is to divide the input space into regions where local estimators are defined [21, 22, 23, 24]. In partition-based methods, the main challenge is to define an effective partitioning of the space.
There are several approaches that use the idea of local learning for training GP models. The method proposed in [25] and extended in [26] mostly focuses on Bayesian parametric linear regression. The methods in these papers build an ensemble of local models centered at several fixed points, where each training point is weighted according to its distance from the center of the model. Predictions are computed as a weighted sum of the local models. The authors claim that their approach extends to GPR, but in that case each local model considers the full training set. This means that these methods use localization to address nonstationarity but scale poorly to large datasets. The method proposed in [27] builds local GP models that use only subsets of the training data, but it lacks a mechanism that assigns importance weights to the training points of each local model according to their distance from the center of the model. As a result, such a model can make overconfident predictions for points that lie far away from the centers of the local models. To obtain fast approximate predictions at a target point, the authors of [28] propose a forward step-wise variable selection procedure to find the optimal sub-design.
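For intuition, here is a minimal sketch of the distance-weighted ensemble idea used in [25, 26]; the `local_models` callables and the Gaussian weighting function are hypothetical placeholders, not the exact construction of those papers.

```python
import numpy as np

def weighted_local_prediction(x_star, centers, local_models, lengthscale=1.0):
    """Combine local models' predictions with distance-based weights.

    Each local model contributes according to how close the test point
    lies to that model's center; the weights are normalized to sum to one.
    `local_models[i](x_star)` is assumed to return that model's prediction.
    """
    d2 = ((centers - x_star) ** 2).sum(axis=1)
    w = np.exp(-0.5 * d2 / lengthscale ** 2)
    w = w / w.sum()
    preds = np.array([m(x_star) for m in local_models])
    return np.dot(w, preds)
```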
The paper is organized as follows: Section 2 introduces GPs and the idea of localization, along with a discussion of the connection between our localized GPs and local kernel ridge regression. The experimental campaign in Section 3 reports results on a variety of regression benchmarks, and Section 4 concludes the paper.