LAYER-WISE RELEVANCE PROPAGATION FOR ECHO STATE
NETWORKS APPLIED TO EARTH SYSTEM VARIABILITY
Marco Landt-Hayen
GEOMAR Helmholtz Centre for Ocean Research
Kiel, Germany
mlandt-hayen@geomar.de
Peer Kröger
Christian-Albrechts-Universität
Kiel, Germany
pkr@informatik.uni-kiel.de
Martin Claus
Christian-Albrechts-Universität
Kiel, Germany
mclaus@geomar.de
Willi Rath
GEOMAR Helmholtz Centre for Ocean Research
Kiel, Germany
wrath@geomar.de
ABSTRACT
Artificial neural networks (ANNs) are known to be powerful methods for many hard problems (e.g.
image classification, speech recognition or time series prediction). However, these models tend
to produce black-box results and are often difficult to interpret. Layer-wise relevance propagation
(LRP) is a widely used technique to understand how ANN models come to their conclusion and to
understand what a model has learned.
Here, we focus on Echo State Networks (ESNs) as a certain type of recurrent neural networks, also
known as reservoir computing. ESNs are easy to train and only require a small number of trainable
parameters, but are still black-box models. We show how LRP can be applied to ESNs in order to
open the black-box. We also show how ESNs can be used not only for time series prediction but
also for image classification: Our ESN model serves as a detector for El Niño Southern Oscillation
(ENSO) from sea surface temperature anomalies. ENSO is actually a well-known problem and has
been extensively discussed before. But here we use this simple problem to demonstrate how LRP can
significantly enhance the explainability of ESNs.
Keywords Reservoir Computing · Echo State Networks · Layer-wise Relevance Propagation · Explainable AI
1 Introduction
Machine learning (ML) provides powerful techniques in the field of artificial intelligence (AI) to discover meaningful
relationships in all kinds of data. Within machine learning, artificial neural networks (ANNs) in shallow and deep
architectures are found to be promising and very versatile. While these models considerably push the state-of-the-art
solutions of many hard problems, they tend to produce black-box results that are difficult to interpret even by ML
experts. Consequently, the question of enhancing the explainability of complex models ("explainable AI" or "xAI") has
gained a lot of attention in the AI/ML community and stimulated a large amount of fundamental research [1], [2].
In its basic form, layers of perceptrons [3] are stacked on top of each other to create a multilayer perceptron (MLP) [4]. These models are usually trained using some form of stochastic gradient descent (SGD) [5]. The aim is to minimize some objective or loss function. More sophisticated architectures make use of, e.g., convolutional neural networks (CNNs) [6] or long short-term memory (LSTM) [7] units to have recurrence in time in so-called recurrent neural networks (RNNs).
This work was supported by the Helmholtz School for Marine Data Science (MarDATA) funded by the Helmholtz Association
(Grant HIDSS-0005).
Citation: Landt-Hayen, M., Kröger, P., Claus, M., and Rath, W.: "Layer-wise Relevance Propagation for Echo State Networks applied to Earth System Variability", In Proceedings of the 3rd International Conference on Machine Learning Techniques (MLTEC 2022), Zurich, Switzerland, vol. 12, no. 20, pp. 115-130 (2022).
arXiv:2210.09958v2 [cs.LG] 16 Nov 2022
In this paper, we focus on geospatial data, which typically feature non-linear relationships among observations. In this scenario, ANNs are good candidate models, since they are capable of handling complex and non-linear relations by learning from data and adjusting trainable weights and biases [8]. In recent years these methods have been used in various ways on geospatial data [9], [10], [11].
The problem with using ANNs on data of the Earth system is that we often only have relatively short time series to predict on or a small number of events to learn from. Sophisticated neural networks come with a large number of trainable parameters, and these models are prone to overfitting. Considerable expertise and effort are required to train these models and prevent them from getting stuck in local minima of the objective function. Well-known countermeasures include dropout, early stopping and regularization [12], [13], [14].
In this work we overcome these problems by using Echo State Networks (ESNs) [15]. ESNs are a certain type of RNNs and have been widely used for time series forecasting [16], [17]. In its basic form an ESN consists of an input and an output layer. In between we find a reservoir of sparsely connected units. The weights and biases connecting inputs to reservoir units, as well as the internal reservoir weights and biases, are randomly initialized. The input length determines the number of recurrent time steps inside the reservoir. We record the final reservoir states, and only the output weights and bias are trained. As opposed to other types of neural networks, this does not involve gradient descent but is done in a closed-form manner, by applying linear regression of the final reservoir states onto the desired target values to obtain the output weights and bias.
This makes ESN models extremely powerful since they require only a very small number of trainable parameters (the
output weights and bias). In addition to that, training an ESN is easy, fast and leads to stable and reproducible results.
This makes them especially suitable for applications in the domain of climate and ocean research.
But as long as ESNs remain black-boxes, there is only a low level of trust in the obtained results, and using these kinds of models is likely to be rejected by domain experts. This can be overcome by adopting techniques developed for image data in computer vision to climate data. Layer-wise relevance propagation (LRP) is a technique to trace the final prediction of a multilayered neural network back through its layers until reaching the input space [18], [19]. When applied to image classification, this reveals valuable insights into which input pixels have the highest relevance for the model to come to its conclusion.
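To make the idea concrete, the widely used LRP-ε rule for a single dense layer redistributes each output unit's relevance to the inputs in proportion to their contributions. This is a minimal sketch of that rule, not the implementation used in this paper; the function name and shapes are illustrative:

```python
import numpy as np

def lrp_epsilon_dense(x, W, b, R_out, eps=1e-6):
    """Propagate relevance R_out back through one dense layer y = W @ x + b.

    x: (D,) layer input, W: (M, D) weights, b: (M,) bias,
    R_out: (M,) relevance of the output units.
    Returns R_in: (D,) relevance redistributed onto the inputs.
    """
    z = W * x[np.newaxis, :]                   # contributions z_jk = W_jk * x_k, shape (M, D)
    s = z.sum(axis=1) + b                      # pre-activations, shape (M,)
    s = s + eps * np.where(s >= 0, 1.0, -1.0)  # epsilon stabiliser avoids division by ~0
    # Each output's relevance is split among inputs proportional to z_jk / s_j.
    R_in = (z / s[:, np.newaxis] * R_out[:, np.newaxis]).sum(axis=0)
    return R_in
```

For small eps and zero bias, the rule approximately conserves relevance: the sum of `R_in` matches the sum of `R_out`.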
Toms, Barnes and Ebert-Uphoff have shown in their work [20] that LRP can be successfully applied to MLPs used for classification of events related to a well-known mode of Earth system variability: the El Niño Southern Oscillation (ENSO).
This work is inspired by [20] and goes beyond their studies: We also pick the well-known ENSO problem [21]. ENSO has a strong zonal structure: It comes with anomalies in the sea surface temperature (SST) in the Tropical Pacific. This phenomenon is limited to a quite narrow range of latitudes and an extended region in terms of longitude. We use ESN models for image classification on SST anomaly fields. We then open the black-box and apply LRP to ESN models, which, to the best of our knowledge, has not been done before.
The SST anomaly fields used in this work are noisy. For this reason we focus on a special flavour of ESNs that uses a leaky reservoir, since leaky reservoirs have been found to be more powerful on noisy input data than standard ESNs [22]. With the help of our LRP application to ESNs, we find the leak rate used in the reservoir state transition to be a crucial parameter determining the memory of the reservoir. The leak rate needs to be chosen appropriately to enable ESN models to reach the desired high level of accuracy.
Our models yield competitive results compared to linear regression and MLP models used as baselines. However, ESN models require significantly fewer parameters and are hence less prone to overfitting. We even find our reservoirs to be robust against random permutation of the input fields, which destroys the zonal structure in the underlying ENSO anomalies. This opens the door to using ESNs on unsolved problems from the domain of climate and ocean science and to applying further techniques from the toolbox of xAI [23].
The rest of this work is structured as follows: In Section 2 we briefly introduce basic ESNs and focus on reservoir state
transition for leaky reservoirs. We then sketch an efficient way to use ESN models for image classification. Section 3
outlines the concept of LRP in general before we customize LRP for our base ESN models by unfolding the reservoir
recurrence. The classification of ENSO patterns and the application of LRP to ESN models is presented in Section
4. Our models are not only competitive classifiers but also reveal valuable insights into what the models have learned. We show robustness of our models on randomly permuted input samples and visualize how the leak rate determines the reservoir memory. Discussion and conclusion are found in Section 5, followed by technical details on the used ESN and baseline models in the Appendix.
Figure 1: Sketch of base ESN: An input and an output layer, in between we find the reservoir.
2 Echo State Networks
An ESN is a special type of RNN and comes with a strong theoretical background [15], [24], [25]. ESN models have
shown outstanding advantages over other types of RNNs that use gradient descent methods for training. We use in
this work a shallow ESN architecture consisting of an input and output layer. In between we find a single reservoir
of sparsely connected units. The weights connecting input layer and reservoir plus the input bias terms are randomly
initialized and kept fixed afterwards. We find some recurrence within the reservoir and reservoir weights and biases
are also randomly set and not trainable. Reservoir units are sparsely connected, with a sparsity usually in the range of 20-30%. Further constraints are put on the largest eigenvalue of the reservoir weight matrix $W_{res}$. This is required for the reservoir to be stable and show the so-called Echo State Property [26].
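The eigenvalue constraint is commonly enforced by rescaling the randomly initialized reservoir matrix to a target spectral radius below one. A minimal sketch assuming NumPy; the reservoir size, sparsity, seed and target radius of 0.9 are arbitrary example values:

```python
import numpy as np

rng = np.random.default_rng(42)
N, sparsity = 100, 0.2                 # reservoir size and connection density

# Randomly initialise a sparse reservoir weight matrix.
W_res = rng.uniform(-1.0, 1.0, size=(N, N))
W_res *= rng.random((N, N)) < sparsity  # zero out ~80% of the connections

# Rescale so the spectral radius (largest absolute eigenvalue) equals 0.9,
# a common heuristic to obtain the Echo State Property.
rho = np.max(np.abs(np.linalg.eigvals(W_res)))
W_res *= 0.9 / rho
```

Because eigenvalues scale linearly with the matrix, the rescaled `W_res` has spectral radius exactly 0.9.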
Only the output weights and bias are trained by solving a linear regression problem of final reservoir states onto desired
target outputs. A sketch of a base ESN model is shown in Figure 1.
In our ESN model, $u(t) \in \mathbb{R}^{D \times 1}$ denotes the input values at time $t$ with $D$ input features. Inputs are fed into the model for $T$ time steps, hence $t = 1..T$. Reservoir states at time $t = 1..T$ are denoted by $x(t) \in \mathbb{R}^{N \times 1}$; the final reservoir states are obtained as $x(T)$. The final model output $y(T) \in \mathbb{R}^{M \times 1}$ at time $T$ has $M$ output values.
We then find input weights $W_{in} \in \mathbb{R}^{N \times D}$, connecting the $D$ input units to the $N$ reservoir units. Reservoir weights are given by $W_{res} \in \mathbb{R}^{N \times N}$, and the output weights connecting the $N$ reservoir units to the $M$ output units read $W_{out} \in \mathbb{R}^{M \times N}$. In addition to the weight matrices we have bias vectors $b_{in} \in \mathbb{R}^{N \times 1}$, $b_{res} \in \mathbb{R}^{N \times 1}$ and $b_{out} \in \mathbb{R}^{M \times 1}$ for the input, reservoir and output units, respectively.
We use a leaky reservoir with leak rate $\alpha \in [0, 1]$, as discussed in [22]. The leak rate serves as a smoothing constant: The larger the leak rate, the faster the reservoir states react to new inputs. In other words, the leak rate can be understood as the inverse of the memory time scale of the ESN: The larger the leak rate, the faster the reservoir forgets previous time steps' inputs. The reservoir state transition is defined by Equation 1.

$$x(t) = (1 - \alpha)\, x(t-1) + \alpha \cdot \mathrm{act}\!\left[W_{in} u(t) + b_{in} + W_{res} x(t-1) + b_{res}\right] \quad (1)$$
Here $\mathrm{act}(.)$ is some activation function, e.g. sigmoid or tanh. From the initial reservoir states $x(t=1)$ we can then obtain further states $x(t)$ for $t = 2..T$ by keeping a fraction $(1 - \alpha)$ of the previous reservoir state $x(t-1)$. The current time step's input $W_{in} u(t) + b_{in}$ as well as the recurrence inside the reservoir $W_{res} x(t-1) + b_{res}$ are added after applying the activation function and multiplying with the leak rate $\alpha$. Reservoir states $x(t)$ are only defined for $t = 1..T$. This requires special treatment of $x(t=1)$, as outlined in Equation 2.

$$x(t=1) = \alpha \cdot \mathrm{act}\!\left[W_{in} u(t) + b_{in}\right] \quad (2)$$
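The state transition in Equations 1 and 2 translates directly into a short loop. A sketch assuming NumPy, with tanh standing in for the generic activation $\mathrm{act}(.)$; the function name is illustrative:

```python
import numpy as np

def reservoir_states(U, W_in, b_in, W_res, b_res, alpha, act=np.tanh):
    """Run inputs U of shape (T, D) through a leaky reservoir.

    Applies Equation 2 for t = 1 and Equation 1 for t = 2..T.
    Returns the final reservoir state x(T) of shape (N,).
    """
    T = U.shape[0]
    x = alpha * act(W_in @ U[0] + b_in)  # Eq. 2: initial state x(t=1)
    for t in range(1, T):
        # Eq. 1: keep a fraction (1 - alpha) of the previous state and add
        # the activated contribution of current input and reservoir recurrence.
        x = (1 - alpha) * x + alpha * act(W_in @ U[t] + b_in + W_res @ x + b_res)
    return x
```

For `alpha = 1` the reservoir has no memory of previous states beyond the recurrence term; smaller values blend in more of the past, which is the memory-time-scale interpretation given above.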
The model output $y(T)$ is derived as a linear combination of the output weights $W_{out}$ and bias $b_{out}$ with the final reservoir states $x(T)$, as shown in Equation 3.

$$y(T) = W_{out}\, x(T) + b_{out} \quad (3)$$

This is a linear problem that can be solved in a closed-form manner with multi-linear regression, minimizing the mean squared error to obtain the trained output weights and bias.