DYNAMIC SELECTION OF P-NORM IN LINEAR ADAPTIVE FILTERING VIA
ONLINE KERNEL-BASED REINFORCEMENT LEARNING
Minh Vu Yuki Akiyama Konstantinos Slavakis
Tokyo Institute of Technology, Japan
Department of Information and Communications Engineering
Emails: {vu.d.aa, akiyama.y.am, slavakis.k.aa}@m.titech.ac.jp
ABSTRACT
This study addresses the problem of dynamically selecting, at each time instance, the "optimal" p-norm to combat outliers in linear adaptive filtering without any knowledge of the potentially time-varying probability density function of the outliers. To this end, an online and data-driven framework is designed via kernel-based reinforcement learning (KBRL). Novel Bellman mappings on reproducing kernel Hilbert spaces (RKHSs) are introduced that need no knowledge of the transition probabilities of Markov decision processes and are nonexpansive with respect to the underlying Hilbertian norm. An approximate policy-iteration framework is finally offered via the introduction of a finite-dimensional affine superset of the fixed-point set of the proposed Bellman mappings. The well-known "curse of dimensionality" in RKHSs is addressed by building a basis of vectors via an approximate-linear-dependency criterion. Numerical tests on synthetic data demonstrate that the proposed framework always selects the "optimal" p-norm for the outlier scenario at hand, while also outperforming several non-RL and KBRL schemes.
1. INTRODUCTION
The least-squares (LS) error/loss (between an observed value and its predicted one) plays a pivotal role in signal processing, e.g., adaptive filtering [1], and machine learning [2]. For example, the least-mean-squares (LMS) and recursive LS (RLS) algorithms [1], two celebrated methods in adaptive filtering and stochastic approximation, are based on the LS-error criterion. Notwithstanding, LS methods are notoriously sensitive to the presence of outliers within data [3], where outliers are defined as (sparsely) contaminating data that do not adhere to a nominal data-generation model and are often modeled as random variables (RVs) with non-Gaussian heavy-tailed distributions, e.g., α-stable ones [4, 5]. To combat the negative effects of outliers, several non-LS criteria, such as the least-mean-p-power (LMP) [6–11] and maximum-correntropy (MC) [12] criteria, have been studied. This work focuses on the LMP criterion, owing to the well-documented robustness of LMP against outliers [13], while results on MC will be reported elsewhere.
This study builds around the classical data-generation model $y_n = \boldsymbol{\theta}^\intercal \mathbf{x}_n + o_n$, where $n \in \mathbb{N}$ denotes discrete time ($\mathbb{N}$ is the set of all non-negative integers), $\boldsymbol{\theta}$ is the $L \times 1$ vector whose entries are the system parameters that need to be identified, $o_n$ is the RV which models outliers/noise, $(\mathbf{x}_n, y_n)$ stands for the input-output pair of available data, where $\mathbf{x}_n$ is an $L \times 1$ vector and $y_n$ is real-valued, and $^\intercal$ denotes vector/matrix transposition. For an arbitrarily fixed $\boldsymbol{\theta}_0$, the LMP algorithm [6] generates estimates $(\boldsymbol{\theta}_n)_{n \in \mathbb{N}}$ of $\boldsymbol{\theta}$ according to the following recursion:
$$\boldsymbol{\theta}_{n+1} := \boldsymbol{\theta}_n + \rho\, p\, |e_n|^{p-2} e_n \mathbf{x}_n, \tag{1}$$
where $e_n := y_n - \mathbf{x}_n^\intercal \boldsymbol{\theta}_n$, $\rho$ is the learning rate (step size), and $p$ is a fixed user-defined real-valued number within the interval $[1, 2]$ to ensure that the $p$-norm loss $|y_n - \mathbf{x}_n^\intercal \boldsymbol{\theta}|^p$ is a convex function of $\boldsymbol{\theta}$ [6]. Notice that if $p = 1$ and $2$, then (1) boils down to the classical sign-LMS and LMS, respectively [1].
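For concreteness, the following NumPy sketch implements recursion (1); the synthetic signal model, step size, and value of $p$ below are illustrative placeholders and not the settings used in the numerical tests of this paper.

```python
import numpy as np

def lmp_update(theta, x, y, rho=0.01, p=1.5):
    """One step of the LMP recursion (1):
    theta_{n+1} = theta_n + rho * p * |e_n|^{p-2} * e_n * x_n,
    with e_n = y_n - x_n^T theta_n and p in [1, 2]."""
    e = y - x @ theta
    # |e|^{p-2} * e == sign(e) * |e|^{p-1}; the latter form avoids a
    # division by zero when p < 2 and e == 0.
    return theta + rho * p * np.sign(e) * np.abs(e) ** (p - 1) * x

# Illustrative run on synthetic data (hypothetical settings).
rng = np.random.default_rng(0)
L, N = 5, 2000
theta_true = rng.standard_normal(L)
theta = np.zeros(L)
for n in range(N):
    x = rng.standard_normal(L)
    y = theta_true @ x + 0.05 * rng.standard_normal()  # Gaussian noise only
    theta = lmp_update(theta, x, y, rho=0.01, p=2.0)   # p = 2 recovers LMS
print(np.linalg.norm(theta - theta_true))
```

Setting `p=1.0` in the same loop yields the sign-LMS update, which is the robust extreme of the family.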
Intuition suggests that the choice of $p$ should be based on the distribution of $o_n$ in the data-generation model. For example, if $o_n$ obeys a Gaussian distribution, then $p = 2$ should be chosen (recall the maximum-likelihood criterion). To enhance robustness against outliers, combinations of adaptive filters with different forgetting factors, but with the same fixed $p$-norm, have also been introduced [9]. However, it seems that an online and data-driven solution to the problem of dynamically selecting $p$, without any prior knowledge of the distribution of $o_n$, which may change with time, is yet to be found.
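To make the maximum-likelihood remark concrete (an illustration added here, under the assumption of a generalized-Gaussian outlier density), the $p$-power loss is exactly the negative log-likelihood in that family:

```latex
% Assumed outlier model (illustrative): generalized-Gaussian density
%   f(o) \propto \exp\!\bigl(-|o|^{p} / (p\,\sigma^{p})\bigr),
% which reduces to the Gaussian for p = 2 and to the Laplacian for p = 1.
% Under y_n = \theta^\intercal x_n + o_n with i.i.d. o_n, the negative
% log-likelihood of the observations is, up to additive constants,
\begin{equation*}
  -\sum_{n} \log f\bigl(y_n - \mathbf{x}_n^{\intercal}\boldsymbol{\theta}\bigr)
  = \frac{1}{p\,\sigma^{p}}
    \sum_{n} \bigl| y_n - \mathbf{x}_n^{\intercal}\boldsymbol{\theta} \bigr|^{p}
    + \mathrm{const},
\end{equation*}
% so the maximum-likelihood estimate of \theta minimizes the p-power loss:
% Gaussian noise favors p = 2, heavier-tailed noise favors smaller p.
```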
This work offers a solution to the aforementioned open problem via reinforcement learning (RL) [14], a machine-learning paradigm where an "agent" interacts with the surrounding environment to iteratively identify the policy which minimizes the cost of its "actions." More specifically, the well-known policy-iteration (PI) framework [14] of RL is adopted, because of its well-documented merits (e.g., [15–17]) over the alternative RL frameworks of temporal-difference (TD) and Q-learning [14], especially for continuous and high-dimensional state spaces such as the one considered here. PI comprises two stages at every iteration $n$: policy evaluation and policy improvement. At policy evaluation, the current policy is evaluated by a Q-function [14], which represents, loosely speaking, the long-term cost that the agent would suffer had the current policy been used to determine the next state, whereas at the policy-improvement stage, the agent uses the Q-function values to update the policy. The underlying state space is considered to be continuous and high dimensional, due to the nature of the available data $(\mathbf{x}_n, y_n)$, while the action space is considered to be discrete: an action is a value of $p$ taken from a finite grid of the interval $[1, 2]$.
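As a rough illustration of these two PI stages (not the RKHS-based construction developed below), the following sketch runs approximate policy iteration over the finite grid of $p$-values; the linear Q-function model, the discount factor, and the `dummy_transition` environment stub are assumptions made only for this example.

```python
import numpy as np

# Discrete action space: a finite grid of p-values in [1, 2], as in the paper.
P_GRID = np.linspace(1.0, 2.0, 11)
GAMMA = 0.9      # discount factor (assumed; not specified on this page)
rng = np.random.default_rng(0)


class LinearQ:
    """Stand-in Q-function: one linear model per action over state features.
    The paper models Q in an RKHS instead; this toy model only illustrates
    the mechanics of the two PI stages."""

    def __init__(self, dim, n_actions):
        self.w = np.zeros((n_actions, dim))

    def predict(self, phi, a):
        return self.w[a] @ phi

    def fit(self, samples):
        """Least-squares fit of each action's model to its (phi, target) pairs."""
        for a in range(self.w.shape[0]):
            rows = [(phi, t) for (phi, i, t) in samples if i == a]
            if rows:
                Phi = np.array([r[0] for r in rows])
                tgt = np.array([r[1] for r in rows])
                self.w[a] = np.linalg.lstsq(Phi, tgt, rcond=None)[0]


def dummy_transition(policy, dim=4):
    """Hypothetical environment stub: in the actual setting the state would be
    built from the streaming data (x_n, y_n), the chosen action p would drive
    the LMP update (1), and the cost would be an observed loss, e.g. |e_n|."""
    phi = rng.standard_normal(dim)
    a = policy(phi) if rng.random() > 0.2 else rng.integers(len(P_GRID))
    cost = np.abs(rng.standard_normal()) * (1.0 + 0.1 * P_GRID[a])
    return phi, int(a), cost, rng.standard_normal(dim)


def approximate_policy_iteration(transition, dim=4, n_iters=10, n_samples=400):
    q = LinearQ(dim, len(P_GRID))
    policy = lambda phi: len(P_GRID) - 1            # start with p = 2 (LMS-like)
    for _ in range(n_iters):
        # Policy evaluation: fit Q to one-step targets under the current policy.
        samples = []
        for _ in range(n_samples):
            phi, a, cost, phi_next = transition(policy)
            target = cost + GAMMA * q.predict(phi_next, policy(phi_next))
            samples.append((phi, a, target))
        q.fit(samples)
        # Policy improvement: greedy (smallest long-term cost) over the p-grid.
        policy = lambda phi, q=q: int(np.argmin(
            [q.predict(phi, a) for a in range(len(P_GRID))]))
    return policy, q


policy, q = approximate_policy_iteration(dummy_transition)
```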
Deep neural networks offer approximating spaces for Q-functions, e.g., [18], but they may require processing of batch data (even re-training) during online-mode operation, since they may face test data generated by probability density functions (PDFs) different from those of the training data (dynamic environments). Such batch processing inflicts large computational times and complexity, discouraging the application of deep neural networks to online learning, where a small complexity footprint is desired.
To meet the desired computational-complexity requirements, this study builds an approximate (A)PI framework for online RL along the lines of kernel-based (KB)RL [15–17, 19–26]. Central to the API design is the construction of novel Bellman mappings [14, 27]. The proposed Bellman mappings are defined on a reproducing kernel Hilbert space (RKHS) $\mathcal{H}$ [28, 29], which serves as the approximating space for the Q-functions. In contrast to the prevailing route in KBRL [15–17, 19–26], which views Bellman mappings as contractions in $\mathcal{L}^\infty$-norm Banach spaces (by definition, no inner product available), this study introduces Bellman mappings that are nonexpansive [30] with respect to the Hilbertian norm of $\mathcal{H}$.
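For intuition on how the Q-function lives in $\mathcal{H}$ and how the "curse of dimensionality" mentioned in the abstract is kept in check, the following sketch models Q as a finite kernel expansion whose dictionary grows via a standard approximate-linear-dependency (ALD) test; the Gaussian kernel, the threshold `nu`, and the streaming interface are illustrative assumptions rather than the exact construction of this paper.

```python
import numpy as np

def rbf(z1, z2, sigma=1.0):
    """Gaussian reproducing kernel on state-action vectors (kernel choice is
    an assumption made for this sketch)."""
    d = np.asarray(z1) - np.asarray(z2)
    return np.exp(-d @ d / (2.0 * sigma ** 2))

class KernelQ:
    """Q-function modeled as a finite kernel expansion in an RKHS,
        Q(z) = sum_i alpha_i * k(z, d_i),
    where the dictionary {d_i} grows only when a new state-action vector z
    passes the ALD test; this bounds the size of the expansion."""

    def __init__(self, nu=1e-2):
        self.dict, self.alpha, self.nu = [], [], nu

    def value(self, z):
        # The coefficients alpha would be learned by the policy-evaluation
        # recursions (omitted here); they are initialized to zero.
        return sum(a * rbf(z, d) for a, d in zip(self.alpha, self.dict))

    def maybe_admit(self, z):
        """ALD test: admit z if its feature image cannot be nu-approximated by
        a linear combination of the feature images of the current dictionary,
        i.e. k(z, z) - k_vec^T K^{-1} k_vec > nu."""
        if not self.dict:
            self.dict.append(np.asarray(z)); self.alpha.append(0.0); return True
        K = np.array([[rbf(a, b) for b in self.dict] for a in self.dict])
        k = np.array([rbf(d, z) for d in self.dict])
        coeffs = np.linalg.solve(K + 1e-10 * np.eye(len(self.dict)), k)
        if rbf(z, z) - k @ coeffs > self.nu:   # RKHS-norm residual too large
            self.dict.append(np.asarray(z)); self.alpha.append(0.0)
            return True
        return False

# Usage sketch: stream state-action vectors z_n = [state features, p] and keep
# only the "novel" ones as basis vectors of the Q-function expansion.
rng = np.random.default_rng(0)
q = KernelQ(nu=1e-2)
for _ in range(300):
    z = np.append(rng.standard_normal(3), rng.choice(np.linspace(1.0, 2.0, 11)))
    q.maybe_admit(z)
print("dictionary size:", len(q.dict))
```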