ONLINE AND LIGHTWEIGHT KERNEL-BASED APPROXIMATE POLICY ITERATION
FOR DYNAMIC P-NORM LINEAR ADAPTIVE FILTERING
Yuki Akiyama, Minh Vu, Konstantinos Slavakis
Tokyo Institute of Technology, Japan
Department of Information and Communications Engineering
Emails: {akiyama.y.am, vu.d.aa, slavakis.k.aa}@m.titech.ac.jp
ABSTRACT
This paper introduces a solution to the problem of selecting dynamically (online) the “optimal” p-norm to combat outliers in linear adaptive filtering without any knowledge of the probability density function of the outliers. The proposed online and data-driven framework is built on kernel-based reinforcement learning (KBRL). To this end, novel Bellman mappings on reproducing kernel Hilbert spaces (RKHSs) are introduced. These mappings require no knowledge of the transition probabilities of Markov decision processes, and are nonexpansive with respect to the underlying Hilbertian norm. The fixed-point sets of the proposed Bellman mappings are utilized to build an approximate policy-iteration (API) framework for the problem at hand. To address the “curse of dimensionality” in RKHSs, random Fourier features are utilized to bound the computational complexity of the API. Numerical tests on synthetic data for several outlier scenarios demonstrate the superior performance of the proposed API framework over several non-RL and KBRL schemes.
1. INTRODUCTION
The least-squares (LS) error/loss (between an observed value and its predicted one) plays a pivotal role in signal processing, e.g., adaptive filtering [1], and machine learning [2]. For example, the least-mean-squares (LMS) and recursive (R)LS [1] are two celebrated algorithms in adaptive filtering and stochastic approximation based on the LS-error criterion. Notwithstanding, LS methods are notoriously sensitive to the presence of outliers within data [3], where outliers are defined as (sparsely) contaminating data that do not adhere to a nominal data-generation model, and are often modeled as random variables (RVs) with non-Gaussian heavy-tailed distributions, e.g., α-stable ones [4]. To combat outliers, several non-LS criteria, such as least mean p-power (LMP) [5–10] and maximum correntropy (MC) [11], have been studied. This work focuses on the LMP criterion, owing to the well-documented robustness of LMP against outliers [12], while results on MC will be reported elsewhere.
This study is built on the classical data-generation model $y_n = \boldsymbol{\theta}_*^{\intercal} \mathbf{x}_n + o_n$, where $n \in \mathbb{N}$ denotes discrete time ($\mathbb{N}$ is the set of all non-negative integers), $\boldsymbol{\theta}_*$ is the $L \times 1$ vector whose entries are the system parameters that need to be identified, $o_n$ is the RV which models outliers/noise, $(\mathbf{x}_n, y_n)$ stands for the input-output pair of available data, where $\mathbf{x}_n$ is an $L \times 1$ vector and $y_n$ is real-valued, and $\intercal$ denotes vector/matrix transposition. For an arbitrarily fixed $\boldsymbol{\theta}_0$, the LMP algorithm [5] generates estimates $(\boldsymbol{\theta}_n)_{n \in \mathbb{N}}$ of $\boldsymbol{\theta}_*$ according to the following recursion:
\[
\boldsymbol{\theta}_{n+1} := \boldsymbol{\theta}_n + \rho\, p\, |e_n|^{p-2} e_n \mathbf{x}_n, \tag{1}
\]
where $e_n := y_n - \mathbf{x}_n^{\intercal} \boldsymbol{\theta}_n$, $\rho$ is the learning rate (step size), and $p$ is a fixed user-defined real-valued number within the interval $[1, 2]$ to ensure that the $p$-norm loss $|y_n - \mathbf{x}_n^{\intercal} \boldsymbol{\theta}|^p$ is a convex function of $\boldsymbol{\theta}$ [5]. Notice that if $p = 1$ and $2$, then (1) boils down to the classical sign-LMS and LMS, respectively [1].
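For concreteness, here is a minimal NumPy sketch of recursion (1). The function name `lmp_update`, the default values of $p$ and $\rho$, and the heavy-tailed noise used in the short run below are illustrative choices, not taken from the paper.

```python
import numpy as np

def lmp_update(theta, x, y, p=1.5, rho=0.01):
    """One LMP iteration of recursion (1):
    theta_{n+1} = theta_n + rho * p * |e_n|^{p-2} * e_n * x_n,
    with e_n = y_n - x_n^T theta_n.  p = 2 recovers LMS, p = 1 recovers sign-LMS."""
    e = y - x @ theta  # prediction error e_n
    # |e|^{p-2} * e is rewritten as |e|^{p-1} * sign(e) to avoid 0 raised to a
    # negative power when p < 2 and e = 0.
    return theta + rho * p * np.abs(e) ** (p - 1) * np.sign(e) * x

# Illustrative run on y_n = theta_*^T x_n + o_n with heavy-tailed (outlier-prone) noise.
rng = np.random.default_rng(0)
L = 5
theta_star = rng.standard_normal(L)
theta = np.zeros(L)
for n in range(2000):
    x_n = rng.standard_normal(L)
    y_n = theta_star @ x_n + 0.1 * rng.standard_t(df=1.5)  # noise/outlier RV o_n
    theta = lmp_update(theta, x_n, y_n, p=1.3, rho=0.05)
```

The rewrite of $|e_n|^{p-2} e_n$ as $|e_n|^{p-1}\,\mathrm{sign}(e_n)$ is algebraically equivalent for $e_n \neq 0$ and simply sidesteps the singularity at $e_n = 0$.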
Intuition suggests that the choice of $p$ should be based on the probability density function (PDF) of the RV $o_n$. For example, if $o_n$ obeys a Gaussian PDF, then $p = 2$ should be chosen (recall the maximum-likelihood criterion). To enhance robustness against outliers, combinations of adaptive filters with different forgetting factors, but with the same fixed $p$-norm, have also been introduced [8]. Nevertheless, it appears that an online and data-driven solution to the problem of dynamically selecting $p$, without any prior knowledge of the PDF of $o_n$, is yet to be found.
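As a brief illustrative aside (a standard maximum-likelihood argument, not taken from the original text): if $o_n$ is assumed to follow a generalized Gaussian PDF with shape parameter $p$ and scale $\sigma$,
\[
f(o) \propto \exp\!\bigl(-|o|^{p}/(p\sigma^{p})\bigr),
\quad\text{then}\quad
\arg\max_{\boldsymbol{\theta}} f\bigl(y_n - \mathbf{x}_n^{\intercal}\boldsymbol{\theta}\bigr)
= \arg\min_{\boldsymbol{\theta}} \bigl|y_n - \mathbf{x}_n^{\intercal}\boldsymbol{\theta}\bigr|^{p},
\]
so the Gaussian case ($p = 2$) recovers the LS criterion behind LMS, the Laplacian case ($p = 1$) recovers the absolute-error criterion behind sign-LMS, and values of $p$ between 1 and 2 interpolate between the two.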
This work offers a solution to the aforementioned open problem via reinforcement learning (RL) [13]; a machine-learning paradigm where an “agent” interacts with the surrounding environment to identify iteratively the policy which minimizes the cost of its “actions.” More specifically, the well-known policy-iteration (PI) framework [13] of RL is adopted, because of its well-documented merits (e.g., [14–16]) over the alternative RL frameworks of temporal-difference (TD) and Q-learning [13], especially for continuous and high-dimensional state spaces. PI comprises two stages at every iteration $n$: policy evaluation and policy improvement. At policy evaluation, the current policy is evaluated by a Q-function [13], which represents, loosely speaking, the long-term cost that the agent would suffer had the current policy been chosen to determine the next state, whereas at the policy-improvement stage, the agent uses the Q-function value to update the policy. The underlying state space is considered to be continuous, due to the nature of $(\mathbf{x}_n, y_n)$, while the action space is considered to be discrete: an action is a value of $p$ taken from a finite grid of the interval $[1, 2]$. Deep neural networks offer approximating spaces for Q-functions, e.g., [17], but they may require processing of batch data (even retraining) during online-mode operation, since they may face test data generated by PDFs different from those of the training ones (dynamic environments). Such batch processing inflicts large computational times and complexity, discouraging the application of deep neural networks to online modes of operation where a small computational footprint is desired. To meet such computational-complexity requirements, this study builds an approximate (A)PI framework for online RL along the lines of kernel-based (KB)RL [14–16, 18–25].
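To illustrate the two PI stages over a finite action grid of p-values, the following is a generic policy-iteration skeleton in Python. The names (`P_GRID`, `greedy_policy`, `policy_iteration`, `dummy_evaluator`) and the scalar-state toy setting are hypothetical simplifications for exposition only; they do not reproduce the paper's RKHS-based Q-function construction.

```python
import numpy as np

P_GRID = np.linspace(1.0, 2.0, 5)  # discrete action set: candidate p-values in [1, 2]

def greedy_policy(q, state):
    """Policy improvement: pick the action (p-value) with the smallest long-term cost."""
    return P_GRID[int(np.argmin([q(state, a) for a in P_GRID]))]

def policy_iteration(evaluate_policy, n_iterations=10, init_policy=lambda s: 2.0):
    """Generic PI loop: alternate policy evaluation and greedy policy improvement.
    `evaluate_policy(policy)` must return a Q-function q(state, action) estimating the
    long-term cost of taking `action` at `state` and following `policy` afterwards."""
    policy = init_policy
    for _ in range(n_iterations):
        q = evaluate_policy(policy)                  # policy evaluation
        policy = lambda s, q=q: greedy_policy(q, s)  # policy improvement
    return policy

# Toy usage with a dummy evaluator (purely illustrative): the "state" is a scalar proxy
# for recent error statistics; the cost prefers small p for an outlier-heavy state and
# p close to 2 otherwise.
def dummy_evaluator(policy):
    return lambda state, action: (action - (2.0 - min(abs(state), 1.0))) ** 2

pi = policy_iteration(dummy_evaluator)
print(pi(0.2), pi(5.0))  # p near 2 for a "clean" state, p = 1 for an "outlier" state
```

In the paper's setting, the placeholder `evaluate_policy` step is where the proposed RKHS-based Bellman mappings enter, replacing the dummy cost with a data-driven Q-function estimate.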
Central to the proposed API is the construction of novel Bellman mappings [13, 26]. The proposed Bellman mappings are defined on a reproducing kernel Hilbert space (RKHS) $\mathcal{H}$ [27, 28], which serves as the approximating space for the Q-functions. Unlike the classical Bellman operators, where information on transition probabilities in a Markov decision process is needed [13], the proposed Bellman mappings make no use of such information, and need neither training/offline data nor past policies, but sample and average the sample space on the fly, at each iteration $n$, to perform exploration of the surrounding environment. This suits the current adaptive-filtering setting, where the presence of outliers, with a possibly time-varying