LAYER-WISE RELEVANCE PROPAGATION FOR ECHO STATE
NETWORKS APPLIED TO EARTH SYSTEM VARIABILITY
Marco Landt-Hayen
GEOMAR Helmholtz Centre for Ocean Research
Kiel, Germany
mlandt-hayen@geomar.de
Peer Kröger
Christian-Albrechts-Universität
Kiel, Germany
pkr@informatik.uni-kiel.de
Martin Claus
Christian-Albrechts-Universität
Kiel, Germany
mclaus@geomar.de
Willi Rath
GEOMAR Helmholtz Centre for Ocean Research
Kiel, Germany
wrath@geomar.de
ABSTRACT
Artificial neural networks (ANNs) are known to be powerful methods for many hard problems (e.g.
image classification, speech recognition or time series prediction). However, these models tend
to produce black-box results and are often difficult to interpret. Layer-wise relevance propagation
(LRP) is a widely used technique to understand how ANN models come to their conclusion and to
understand what a model has learned.
Here, we focus on Echo State Networks (ESNs) as a certain type of recurrent neural networks, also
known as reservoir computing. ESNs are easy to train and only require a small number of trainable
parameters, but are still black-box models. We show how LRP can be applied to ESNs in order to
open the black-box. We also show how ESNs can be used not only for time series prediction but
also for image classification: Our ESN model serves as a detector for El Niño Southern Oscillation
(ENSO) from sea surface temperature anomalies. ENSO is actually a well-known problem and has
been extensively discussed before. But here we use this simple problem to demonstrate how LRP can
significantly enhance the explainability of ESNs.
Keywords Reservoir Computing · Echo State Networks · Layer-wise Relevance Propagation · Explainable AI
1 Introduction
Machine learning (ML) provides powerful techniques in the field of artificial intelligence (AI) to discover meaningful
relationships in all kinds of data. Within machine learning, artificial neural networks (ANNs) in shallow and deep
architectures are found to be promising and very versatile. While these models considerably push the state-of-the-art
solutions of many hard problems, they tend to produce black-box results that are difficult to interpret even by ML
experts. Consequently, the question of enhancing the explainability of complex models ("explainable AI" or "xAI") has
gained a lot of attention in the AI/ML community and stimulated a large amount of fundamental research [1], [2].
In its basic form, layers of perceptrons [3] are stacked on top of each other to create a multilayer perceptron (MLP) [4]. These models are usually trained using some form of stochastic gradient descent (SGD) [5]. The aim is to minimize some objective or loss function. More sophisticated architectures make use of, e.g., convolutional neural networks (CNNs) [6] or long short-term memory (LSTM) [7] units to have recurrence in time in so-called recurrent neural networks (RNNs).
This work was supported by the Helmholtz School for Marine Data Science (MarDATA) funded by the Helmholtz Association
(Grant HIDSS-0005).
Citation: Landt-Hayen, M., Kröger, P., Claus, M., and Rath, W.: "Layer-wise Relevance Propagation for Echo State Networks applied to Earth System Variability", In Proceedings of the 3rd International Conference on Machine Learning Techniques (MLTEC 2022), Zurich, Switzerland, vol. 12, no. 20, pp. 115-130 (2022).
arXiv:2210.09958v2 [cs.LG] 16 Nov 2022
In this paper, we focus on geospatial data, which typically feature non-linear relationships among observations. In this scenario, ANNs are good candidate models, since they are capable of handling complex and non-linear relations by learning from data and adjusting trainable weights and biases [8]. In recent years these methods have been used in various ways on geospatial data [9], [10], [11].
The problem with using ANNs on data of the Earth system is that we often only have relatively short time series to predict on or a small number of events to learn from. Sophisticated neural networks come with a large number of trainable parameters, and these models are prone to overfitting. Considerable expertise and effort are required to train these models and prevent them from getting stuck in local minima of the objective function. Well-known countermeasures include dropout, early stopping and regularization [12], [13], [14].
In this work we overcome these problems by using Echo State Networks (ESNs) [15]. ESNs are a certain type of RNNs and have been widely used for time series forecasting [16], [17]. In its basic form an ESN consists of an input and an output layer. In between we find a reservoir of sparsely connected units. The weights and biases connecting inputs to reservoir units, as well as the internal reservoir weights and biases, are randomly initialized. The input length determines the number of recurrent time steps inside the reservoir. We record the final reservoir states, and only the output weights and bias are trained. As opposed to other types of neural networks, this does not involve gradient descent but is done in a closed-form manner, by applying linear regression of the final reservoir states onto the desired target values to obtain the output weights and bias.
This makes ESN models extremely powerful since they require only a very small number of trainable parameters (the
output weights and bias). In addition to that, training an ESN is easy, fast and leads to stable and reproducible results.
This makes them especially suitable for applications in the domain of climate and ocean research.
But as long as ESNs remain black-boxes, there is only a low level of trust in the obtained results, and using these kinds of models is likely to be rejected by domain experts. This can be overcome by adopting techniques developed for image data in computer vision to climate data. Layer-wise relevance propagation (LRP) is a technique to trace the final prediction of a multilayered neural network back through its layers until reaching the input space [18], [19]. When applied to image classification, this reveals valuable insights into which input pixels have the highest relevance for the model to come to its conclusion.
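To make the idea concrete, the widely used LRP-ε rule for a single dense layer redistributes each output unit's relevance to the inputs in proportion to their contributions. This is a minimal sketch of that rule, not the implementation used in this paper; the function name and shapes are illustrative:

```python
import numpy as np

def lrp_epsilon_dense(x, W, b, R_out, eps=1e-6):
    """Propagate relevance R_out back through one dense layer y = W @ x + b.

    x: (D,) layer input, W: (M, D) weights, b: (M,) bias,
    R_out: (M,) relevance of the output units.
    Returns R_in: (D,) relevance redistributed onto the inputs.
    """
    z = W * x[np.newaxis, :]                   # contributions z_jk = W_jk * x_k, shape (M, D)
    s = z.sum(axis=1) + b                      # pre-activations, shape (M,)
    s = s + eps * np.where(s >= 0, 1.0, -1.0)  # epsilon stabiliser avoids division by ~0
    # Each output's relevance is split among inputs proportional to z_jk / s_j.
    R_in = (z / s[:, np.newaxis] * R_out[:, np.newaxis]).sum(axis=0)
    return R_in
```

For small eps and zero bias, the rule approximately conserves relevance: the sum of `R_in` matches the sum of `R_out`.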
Toms, Barnes and Ebert-Uphoff have shown in their work [20] that LRP can be successfully applied to MLPs used for classification of events related to a well-known mode of Earth system variability: the El Niño Southern Oscillation (ENSO).
This work is inspired by [20] and goes beyond their studies: We also pick the well-known ENSO problem [21]. ENSO has a strong zonal structure: It comes with anomalies in the sea surface temperature (SST) in the Tropical Pacific. This phenomenon is limited to a quite narrow range of latitudes and an extended region in terms of longitude. We use ESN models for image classification on SST anomaly fields. We then open the black-box and apply LRP to ESN models, which, to the best of our knowledge, has not been done before.
The SST anomaly fields used in this work are noisy. For this reason we focus on a special flavour of ESNs that uses a leaky reservoir, since leaky reservoirs have been found to be more powerful on noisy input data than standard ESNs [22]. With the help of our LRP application to ESNs, we find the leak rate used in the reservoir state transition to be a crucial parameter determining the memory of the reservoir. The leak rate needs to be chosen appropriately to enable ESN models to reach the desired high level of accuracy.
Our models yield competitive results compared to linear regression and MLP models used as baselines. However, ESN models require significantly fewer parameters and are hence less prone to overfitting. We even find our reservoirs to be robust against random permutation of the input fields, which destroys the zonal structure in the underlying ENSO anomalies. This opens the door to using ESNs on unsolved problems from the domain of climate and ocean science and to applying further techniques from the toolbox of xAI [23].
The rest of this work is structured as follows: In Section 2 we briefly introduce basic ESNs and focus on reservoir state
transition for leaky reservoirs. We then sketch an efficient way to use ESN models for image classification. Section 3
outlines the concept of LRP in general before we customize LRP for our base ESN models by unfolding the reservoir
recurrence. The classification of ENSO patterns and the application of LRP to ESN models is presented in Section
4. Our models are not only competitive classifiers but also reveal valuable insights into what the models have learned. We show robustness of our models on randomly permuted input samples and visualize how the leak rate determines the reservoir memory. Discussion and conclusion are found in Section 5, followed by technical details on the used ESN and baseline models in the Appendix.
Figure 1: Sketch of base ESN: An input and an output layer, in between we find the reservoir.
2 Echo State Networks
An ESN is a special type of RNN and comes with a strong theoretical background [15], [24], [25]. ESN models have
shown outstanding advantages over other types of RNNs that use gradient descent methods for training. We use in
this work a shallow ESN architecture consisting of an input and output layer. In between we find a single reservoir
of sparsely connected units. The weights connecting input layer and reservoir plus the input bias terms are randomly
initialized and kept fixed afterwards. We find some recurrence within the reservoir and reservoir weights and biases
are also randomly set and not trainable. Reservoir units are sparsely connected, with a sparsity usually in the range of 20-30%. Further constraints are put on the largest eigenvalue of the reservoir weight matrix $W_{res}$. This is required for the reservoir to be stable and show the so-called Echo State Property [26].
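The eigenvalue constraint is commonly enforced by rescaling the randomly initialized reservoir matrix to a target spectral radius below one. A minimal sketch assuming NumPy; the reservoir size, sparsity, seed and target radius of 0.9 are arbitrary example values:

```python
import numpy as np

rng = np.random.default_rng(42)
N, sparsity = 100, 0.2                 # reservoir size and connection density

# Randomly initialise a sparse reservoir weight matrix.
W_res = rng.uniform(-1.0, 1.0, size=(N, N))
W_res *= rng.random((N, N)) < sparsity  # zero out ~80% of the connections

# Rescale so the spectral radius (largest absolute eigenvalue) equals 0.9,
# a common heuristic to obtain the Echo State Property.
rho = np.max(np.abs(np.linalg.eigvals(W_res)))
W_res *= 0.9 / rho
```

Because eigenvalues scale linearly with the matrix, the rescaled `W_res` has spectral radius exactly 0.9.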
Only the output weights and bias are trained by solving a linear regression problem of final reservoir states onto desired
target outputs. A sketch of a base ESN model is shown in Figure 1.
In our ESN model, $u(t) \in \mathbb{R}^{D \times 1}$ denotes the input values at time $t$ with $D$ input features. Inputs are fed into the model for $T$ time steps, hence $t = 1..T$. Reservoir states at time $t = 1..T$ are denoted by $x(t) \in \mathbb{R}^{N \times 1}$; the final reservoir states are obtained as $x(T)$. The final model output $y(T) \in \mathbb{R}^{M \times 1}$ at time $T$ has $M$ output values.
We then find input weights $W_{in} \in \mathbb{R}^{N \times D}$, connecting the $D$ input units to the $N$ reservoir units. Reservoir weights are given by $W_{res} \in \mathbb{R}^{N \times N}$, and the output weights connecting the $N$ reservoir units to the $M$ output units read $W_{out} \in \mathbb{R}^{M \times N}$. In addition to the weight matrices we have bias vectors $b_{in} \in \mathbb{R}^{N \times 1}$, $b_{res} \in \mathbb{R}^{N \times 1}$ and $b_{out} \in \mathbb{R}^{M \times 1}$ for the input, reservoir and output units, respectively.
We use a leaky reservoir with leak rate $\alpha \in [0, 1]$, as discussed in [22]. The leak rate serves as a smoothing constant: The larger the leak rate, the faster the reservoir states react to new inputs. In other words, the leak rate can be understood as the inverse of the memory time scale of the ESN: The larger the leak rate, the faster the reservoir forgets previous time steps' inputs. The reservoir state transition is defined by Equation 1.

$$x(t) = (1 - \alpha)\, x(t-1) + \alpha \cdot \mathrm{act}\!\left[W_{in} u(t) + b_{in} + W_{res} x(t-1) + b_{res}\right] \quad (1)$$
Here $\mathrm{act}(.)$ is some activation function, e.g. sigmoid or tanh. From the initial reservoir states $x(t=1)$ we can then obtain further states $x(t)$ for $t = 2..T$ by keeping a fraction $(1 - \alpha)$ of the previous reservoir state $x(t-1)$. The current time step's input $W_{in} u(t) + b_{in}$ as well as the recurrence inside the reservoir $W_{res} x(t-1) + b_{res}$ are added after applying the activation function and multiplying with the leak rate $\alpha$. Reservoir states $x(t)$ are only defined for $t = 1..T$. This requires special treatment of $x(t=1)$, as outlined in Equation 2.

$$x(t=1) = \alpha \cdot \mathrm{act}\!\left[W_{in} u(t) + b_{in}\right] \quad (2)$$
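The state transition in Equations 1 and 2 translates directly into a short loop. A sketch assuming NumPy, with tanh standing in for the generic activation $\mathrm{act}(.)$; the function name is illustrative:

```python
import numpy as np

def reservoir_states(U, W_in, b_in, W_res, b_res, alpha, act=np.tanh):
    """Run inputs U of shape (T, D) through a leaky reservoir.

    Applies Equation 2 for t = 1 and Equation 1 for t = 2..T.
    Returns the final reservoir state x(T) of shape (N,).
    """
    T = U.shape[0]
    x = alpha * act(W_in @ U[0] + b_in)  # Eq. 2: initial state x(t=1)
    for t in range(1, T):
        # Eq. 1: keep a fraction (1 - alpha) of the previous state and add
        # the activated contribution of current input and reservoir recurrence.
        x = (1 - alpha) * x + alpha * act(W_in @ U[t] + b_in + W_res @ x + b_res)
    return x
```

For `alpha = 1` the reservoir has no memory of previous states beyond the recurrence term; smaller values blend in more of the past, which is the memory-time-scale interpretation given above.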
The model output $y(T)$ is derived as a linear combination of the output weights $W_{out}$ and bias $b_{out}$ with the final reservoir states $x(T)$, as shown in Equation 3.

$$y(T) = W_{out}\, x(T) + b_{out} \quad (3)$$

This is a linear problem that can be solved in a closed-form manner with multi-linear regression, minimizing the mean squared error to obtain the trained output weights and bias.