DYNAMIC SELECTION OF P-NORM IN LINEAR ADAPTIVE FILTERING VIA
ONLINE KERNEL-BASED REINFORCEMENT LEARNING
Minh Vu Yuki Akiyama Konstantinos Slavakis
Tokyo Institute of Technology, Japan
Department of Information and Communications Engineering
Emails: {vu.d.aa, akiyama.y.am, slavakis.k.aa}@m.titech.ac.jp
ABSTRACT
This study addresses the problem of dynamically selecting, at each time instance, the "optimal" p-norm to combat outliers in linear adaptive filtering without any knowledge of the potentially time-varying probability density function of the outliers. To this end, an online and data-driven framework is designed via kernel-based reinforcement learning (KBRL). Novel Bellman mappings on reproducing kernel Hilbert spaces (RKHSs) are introduced that need no knowledge of the transition probabilities of Markov decision processes and are nonexpansive with respect to the underlying Hilbertian norm. An approximate policy-iteration framework is finally offered via the introduction of a finite-dimensional affine superset of the fixed-point set of the proposed Bellman mappings. The well-known "curse of dimensionality" in RKHSs is addressed by building a basis of vectors via an approximate-linear-dependency criterion. Numerical tests on synthetic data demonstrate that the proposed framework always selects the "optimal" p-norm for the outlier scenario at hand, while also outperforming several non-RL and KBRL schemes.
1. INTRODUCTION
The least-squares (LS) error/loss (between an observed value and its predicted one) plays a pivotal role in signal processing, e.g., adaptive filtering [1], and machine learning [2]. For example, the least-mean-squares (LMS) and recursive LS (RLS) algorithms [1], two celebrated methods in adaptive filtering and stochastic approximation, are based on the LS-error criterion. Notwithstanding, LS methods are notoriously sensitive to the presence of outliers within data [3], where outliers are defined as (sparsely) contaminating data that do not adhere to a nominal data-generation model and are often modeled as random variables (RVs) with non-Gaussian heavy-tailed distributions, e.g., α-stable ones [4, 5]. To combat the negative effects of outliers, several non-LS criteria, such as the least-mean-p-power (LMP) [6–11] and maximum-correntropy (MC) [12] criteria, have been studied. This work focuses on the LMP criterion, owing to the well-documented robustness of LMP against outliers [13], while results on MC will be reported elsewhere.
This study builds around the classical data-generation model $y_n = \boldsymbol{\theta}^\intercal \mathbf{x}_n + o_n$, where $n \in \mathbb{N}$ denotes discrete time ($\mathbb{N}$ is the set of all non-negative integers), $\boldsymbol{\theta}$ is the $L \times 1$ vector whose entries are the system parameters that need to be identified, $o_n$ is the RV which models outliers/noise, $(\mathbf{x}_n, y_n)$ stands for the input-output pair of available data, where $\mathbf{x}_n$ is an $L \times 1$ vector and $y_n$ is real-valued, and $^\intercal$ denotes vector/matrix transposition. For an arbitrarily fixed $\boldsymbol{\theta}_0$, the LMP algorithm [6] generates estimates $(\boldsymbol{\theta}_n)_{n \in \mathbb{N}}$ of $\boldsymbol{\theta}$ according to the following recursion:
$$\boldsymbol{\theta}_{n+1} := \boldsymbol{\theta}_n + \rho\, p\, |e_n|^{p-2} e_n \mathbf{x}_n, \tag{1}$$
where $e_n := y_n - \mathbf{x}_n^\intercal \boldsymbol{\theta}_n$, $\rho$ is the learning rate (step size), and $p$ is a fixed user-defined real-valued number within the interval $[1, 2]$ to ensure that the $p$-norm loss $|y_n - \mathbf{x}_n^\intercal \boldsymbol{\theta}|^p$ is a convex function of $\boldsymbol{\theta}$ [6]. Notice that if $p = 1$ and $2$, then (1) boils down to the classical sign-LMS and LMS, respectively [1].
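For concreteness, the following NumPy sketch implements recursion (1); the synthetic signal model, step size, and value of $p$ below are illustrative placeholders and not the settings used in the numerical tests of this paper.

```python
import numpy as np

def lmp_update(theta, x, y, rho=0.01, p=1.5):
    """One step of the LMP recursion (1):
    theta_{n+1} = theta_n + rho * p * |e_n|^{p-2} * e_n * x_n,
    with e_n = y_n - x_n^T theta_n and p in [1, 2]."""
    e = y - x @ theta
    # |e|^{p-2} * e == sign(e) * |e|^{p-1}; the latter form avoids a
    # division by zero when p < 2 and e == 0.
    return theta + rho * p * np.sign(e) * np.abs(e) ** (p - 1) * x

# Illustrative run on synthetic data (hypothetical settings).
rng = np.random.default_rng(0)
L, N = 5, 2000
theta_true = rng.standard_normal(L)
theta = np.zeros(L)
for n in range(N):
    x = rng.standard_normal(L)
    y = theta_true @ x + 0.05 * rng.standard_normal()  # Gaussian noise only
    theta = lmp_update(theta, x, y, rho=0.01, p=2.0)   # p = 2 recovers LMS
print(np.linalg.norm(theta - theta_true))
```

Setting `p=1.0` in the same loop yields the sign-LMS update, which is the robust extreme of the family.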
Intuition suggests that the choice of $p$ should be based on the distribution of $o_n$ in the data-generation model. For example, if $o_n$ obeys a Gaussian distribution, then $p = 2$ should be chosen (recall the maximum-likelihood criterion). To enhance robustness against outliers, combinations of adaptive filters with different forgetting factors, but with the same fixed $p$-norm, have also been introduced [9]. However, it seems that an online and data-driven solution to the problem of dynamically selecting $p$, without any prior knowledge of the distribution of $o_n$, which may change with time, is yet to be found.
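To make the maximum-likelihood remark concrete (an illustration added here, under the assumption of a generalized-Gaussian outlier density), the $p$-power loss is exactly the negative log-likelihood in that family:

```latex
% Assumed outlier model (illustrative): generalized-Gaussian density
%   f(o) \propto \exp\!\bigl(-|o|^{p} / (p\,\sigma^{p})\bigr),
% which reduces to the Gaussian for p = 2 and to the Laplacian for p = 1.
% Under y_n = \theta^\intercal x_n + o_n with i.i.d. o_n, the negative
% log-likelihood of the observations is, up to additive constants,
\begin{equation*}
  -\sum_{n} \log f\bigl(y_n - \mathbf{x}_n^{\intercal}\boldsymbol{\theta}\bigr)
  = \frac{1}{p\,\sigma^{p}}
    \sum_{n} \bigl| y_n - \mathbf{x}_n^{\intercal}\boldsymbol{\theta} \bigr|^{p}
    + \mathrm{const},
\end{equation*}
% so the maximum-likelihood estimate of \theta minimizes the p-power loss:
% Gaussian noise favors p = 2, heavier-tailed noise favors smaller p.
```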
This work offers a solution to the aforementioned open problem via reinforcement learning (RL) [14], a machine-learning paradigm where an "agent" interacts with the surrounding environment to iteratively identify the policy which minimizes the cost of its "actions." More specifically, the well-known policy-iteration (PI) framework [14] of RL is adopted, because of its well-documented merits (e.g., [15–17]) over the alternative RL frameworks of temporal-difference (TD) and Q-learning [14], especially for continuous and high-dimensional state spaces such as the one considered here. PI comprises two stages at every iteration $n$: policy evaluation and policy improvement. At policy evaluation, the current policy is evaluated by a Q-function [14], which represents, loosely speaking, the long-term cost that the agent would suffer had the current policy been used to determine the next state, whereas at the policy-improvement stage, the agent uses the Q-function values to update the policy. The underlying state space is considered to be continuous and high dimensional, due to the nature of the available data $(\mathbf{x}_n, y_n)$, while the action space is considered to be discrete: an action is a value of $p$ taken from a finite grid of the interval $[1, 2]$.
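As a rough illustration of these two PI stages (not the RKHS-based construction developed below), the following sketch runs approximate policy iteration over the finite grid of $p$-values; the linear Q-function model, the discount factor, and the `dummy_transition` environment stub are assumptions made only for this example.

```python
import numpy as np

# Discrete action space: a finite grid of p-values in [1, 2], as in the paper.
P_GRID = np.linspace(1.0, 2.0, 11)
GAMMA = 0.9      # discount factor (assumed; not specified on this page)
rng = np.random.default_rng(0)


class LinearQ:
    """Stand-in Q-function: one linear model per action over state features.
    The paper models Q in an RKHS instead; this toy model only illustrates
    the mechanics of the two PI stages."""

    def __init__(self, dim, n_actions):
        self.w = np.zeros((n_actions, dim))

    def predict(self, phi, a):
        return self.w[a] @ phi

    def fit(self, samples):
        """Least-squares fit of each action's model to its (phi, target) pairs."""
        for a in range(self.w.shape[0]):
            rows = [(phi, t) for (phi, i, t) in samples if i == a]
            if rows:
                Phi = np.array([r[0] for r in rows])
                tgt = np.array([r[1] for r in rows])
                self.w[a] = np.linalg.lstsq(Phi, tgt, rcond=None)[0]


def dummy_transition(policy, dim=4):
    """Hypothetical environment stub: in the actual setting the state would be
    built from the streaming data (x_n, y_n), the chosen action p would drive
    the LMP update (1), and the cost would be an observed loss, e.g. |e_n|."""
    phi = rng.standard_normal(dim)
    a = policy(phi) if rng.random() > 0.2 else rng.integers(len(P_GRID))
    cost = np.abs(rng.standard_normal()) * (1.0 + 0.1 * P_GRID[a])
    return phi, int(a), cost, rng.standard_normal(dim)


def approximate_policy_iteration(transition, dim=4, n_iters=10, n_samples=400):
    q = LinearQ(dim, len(P_GRID))
    policy = lambda phi: len(P_GRID) - 1            # start with p = 2 (LMS-like)
    for _ in range(n_iters):
        # Policy evaluation: fit Q to one-step targets under the current policy.
        samples = []
        for _ in range(n_samples):
            phi, a, cost, phi_next = transition(policy)
            target = cost + GAMMA * q.predict(phi_next, policy(phi_next))
            samples.append((phi, a, target))
        q.fit(samples)
        # Policy improvement: greedy (smallest long-term cost) over the p-grid.
        policy = lambda phi, q=q: int(np.argmin(
            [q.predict(phi, a) for a in range(len(P_GRID))]))
    return policy, q


policy, q = approximate_policy_iteration(dummy_transition)
```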
Deep neural networks offer approximating spaces for Q-functions, e.g., [18], but they may require processing of batch data (even re-training) during online-mode operation, since they may face test data generated by probability density functions (PDFs) different from those of the training data (dynamic environments). Such batch processing inflicts large computational times and complexity, discouraging the application of deep neural networks to online learning, where a small complexity footprint is desired.
To meet the desired computational-complexity requirements, this study builds an approximate (A)PI framework for online RL along the lines of kernel-based (KB)RL [15–17, 19–26]. Central to the API design is the construction of novel Bellman mappings [14, 27]. The proposed Bellman mappings are defined on a reproducing kernel Hilbert space (RKHS) $\mathcal{H}$ [28, 29], which serves as the approximating space for the Q-functions. In contrast to the prevailing route in KBRL [15–17, 19–26], which views Bellman mappings as contractions in $\mathcal{L}^\infty$-norm Banach spaces (by definition, no inner product available), this study introduces Bellman mappings that are nonexpansive [30] with respect to the Hilbertian norm of $\mathcal{H}$.
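For intuition on how the Q-function lives in $\mathcal{H}$ and how the "curse of dimensionality" mentioned in the abstract is kept in check, the following sketch models Q as a finite kernel expansion whose dictionary grows via a standard approximate-linear-dependency (ALD) test; the Gaussian kernel, the threshold `nu`, and the streaming interface are illustrative assumptions rather than the exact construction of this paper.

```python
import numpy as np

def rbf(z1, z2, sigma=1.0):
    """Gaussian reproducing kernel on state-action vectors (kernel choice is
    an assumption made for this sketch)."""
    d = np.asarray(z1) - np.asarray(z2)
    return np.exp(-d @ d / (2.0 * sigma ** 2))

class KernelQ:
    """Q-function modeled as a finite kernel expansion in an RKHS,
        Q(z) = sum_i alpha_i * k(z, d_i),
    where the dictionary {d_i} grows only when a new state-action vector z
    passes the ALD test; this bounds the size of the expansion."""

    def __init__(self, nu=1e-2):
        self.dict, self.alpha, self.nu = [], [], nu

    def value(self, z):
        # The coefficients alpha would be learned by the policy-evaluation
        # recursions (omitted here); they are initialized to zero.
        return sum(a * rbf(z, d) for a, d in zip(self.alpha, self.dict))

    def maybe_admit(self, z):
        """ALD test: admit z if its feature image cannot be nu-approximated by
        a linear combination of the feature images of the current dictionary,
        i.e. k(z, z) - k_vec^T K^{-1} k_vec > nu."""
        if not self.dict:
            self.dict.append(np.asarray(z)); self.alpha.append(0.0); return True
        K = np.array([[rbf(a, b) for b in self.dict] for a in self.dict])
        k = np.array([rbf(d, z) for d in self.dict])
        coeffs = np.linalg.solve(K + 1e-10 * np.eye(len(self.dict)), k)
        if rbf(z, z) - k @ coeffs > self.nu:   # RKHS-norm residual too large
            self.dict.append(np.asarray(z)); self.alpha.append(0.0)
            return True
        return False

# Usage sketch: stream state-action vectors z_n = [state features, p] and keep
# only the "novel" ones as basis vectors of the Q-function expansion.
rng = np.random.default_rng(0)
q = KernelQ(nu=1e-2)
for _ in range(300):
    z = np.append(rng.standard_normal(3), rng.choice(np.linspace(1.0, 2.0, 11)))
    q.maybe_admit(z)
print("dictionary size:", len(q.dict))
```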