
addressing the aforementioned areas, as well as a method for detecting events through SHAP. Moreover, we provide explicit SHAP values for the broadly used time series models AR, MA, ARMA, VARMA, and VARMAX. Our contributions, in the structure of the paper, are:
(1) Proof of suitability for the application of KernelSHAP in the context of time series data. Our proof builds on the approximation of SHAP with linear models in KernelSHAP, extending it to the time series domain through VAR models, whose calibration also approximates SHAP. We call this alteration VARSHAP.
(2) Explicit SHAP values for widely used time series models: autoregressive, moving average and vector models of the same with exogenous variables.
(3) We present Time Consistent SHAP Values, a new feature importance technique for the time series domain, which leverages the temporal component of the problem in order to cut down the sample space involved in the KernelSHAP computation.
(4) An aggregation technique which is able to capture surges in feature importance across time steps, which we call event detection.
1.1 Related Work
Shapley Additive Explanations (SHAP values) [LL17] are a broadly used feature importance technique that leverages an analogy between value attribution in co-operative game theory and feature importance assignment in machine learning models. SHAP operates by assessing the marginal contribution of a chosen feature to all possible combinations of features which do not contain the feature of interest. However, doing so is computationally expensive, prompting the development of approximation algorithms. An example of this is KernelSHAP [LL17], which is shown to converge to SHAP values as the sample space approaches the set of all possible coalitions.
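As a minimal illustration (not part of this paper's method), the open-source `shap` Python package exposes KernelSHAP through `shap.KernelExplainer`. The sketch below uses a placeholder linear predictor and random background data to show the typical model-agnostic workflow; the model, data, and sample count are all assumptions for the example.

```python
import numpy as np
import shap  # assumes the open-source `shap` package is installed

# Hypothetical black-box model: takes an (n_samples, n_features) array
# and returns one prediction per row. Any callable with this signature works.
def predict_fn(X: np.ndarray) -> np.ndarray:
    return X @ np.array([0.5, -1.0, 2.0]) + 0.1

background = np.random.randn(50, 3)            # background data used to mask out features
explainer = shap.KernelExplainer(predict_fn, background)

x = np.random.randn(1, 3)                      # instance to explain
shap_values = explainer.shap_values(x, nsamples=200)  # number of sampled coalitions
print(shap_values)
```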
Gradient-based methods are also popular feature importance techniques that measure the sensitivity of the output to perturbations of the input. We may use vanilla gradients for a particular sample [SVZ13], or use regularization techniques such as in SmoothGrad [STK+17] or integrated gradients [STY17]. These methods tend to be less expensive than computing SHAP values. [BHPB21] applies a range of these techniques, as well as LIME explanations [RSG16], in the context of financial time series.
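For concreteness, the snippet below is a minimal sketch of vanilla gradient (saliency) attribution for a generic differentiable PyTorch model; the model architecture and input size are placeholders, not anything prescribed by the cited works.

```python
import torch
import torch.nn as nn

# Placeholder model: any differentiable nn.Module mapping inputs to a scalar output.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))

x = torch.randn(1, 8, requires_grad=True)  # sample to explain

# Vanilla gradient: sensitivity of the output to each input component.
output = model(x)
output.sum().backward()
saliency = x.grad.abs()   # per-dimension importance proxy
print(saliency)
```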
Indeed, much research has gone into speeding up the computation of SHAP values. For example, FastSHAP uses a deep neural network to approximate the SHAP imputations [JSC+21]. While this approach does not maintain the desirable theoretical guarantees of SHAP, the technique is fast and generally accurate. However, since this work relies on neural networks, it raises the potential challenge of having to explain the explanation.
With respect to Recurrent Neural Networks in particular, many of the above model-agnostic techniques apply. However, there are certain methods that are specific to the time series domain and to certain model architectures. [MLY18] leverage a decomposition of the LSTM function to compute the relevance of a particular feature in a given context. TimeSHAP [BSC+21], the closest work to our paper, is an algorithm that extends KernelSHAP to the time series domain by selectively masking either features or time steps.
There are other ways of explaining time series predictions. By computing neural activations for a range of samples, [KJFF15] finds that certain neurons of neural networks act as symbols for events occurring in the data. [AMMS17] apply Layerwise Relevance Propagation (LRP) [BBM+15] to recurrent neural networks.
Finally, this paper makes use of traditional time series modelling techniques, including autoregressive models (AR), moving average models (MA), autoregressive moving average models (ARMA) and vector autoregressive moving average models with exogenous variables (VARMAX). We point to [SSS00] for a general introduction to the topic and to [Mil16] for specifics on the VARMAX model and how to calibrate or train these models.
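For readers who want to experiment, the following is a minimal sketch (not taken from the paper) of calibrating a VARMAX model with the `statsmodels` package on synthetic data; the series, model orders, and exogenous input are placeholders chosen only for illustration.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.varmax import VARMAX

rng = np.random.default_rng(0)

# Synthetic two-variable endogenous series and one exogenous series.
endog = pd.DataFrame(rng.standard_normal((200, 2)), columns=["y1", "y2"])
exog = pd.DataFrame(rng.standard_normal((200, 1)), columns=["x1"])

# VARMAX(p=1, q=1): vector ARMA with exogenous regressors.
model = VARMAX(endog, exog=exog, order=(1, 1))
results = model.fit(disp=False)   # maximum likelihood calibration
print(results.summary())
```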
2 SHAP FOR TIME SERIES
2.1 SHAP and KernelSHAP
Let the supervised learning problem on time series be defined by $\mathcal{D} = (\mathcal{X}, \mathcal{Y})$, the data of the problem, with $\mathcal{X} = \{X\}_{j \in I} \subset \mathbb{R}^{N \times W}$ and $\mathcal{Y} = \{y\}_{j \in I} \subset \mathbb{R}^{M}$, where $X$ is a matrix of $W$ column vectors (each column represents the $N$ features), one for each step in the lookback window, $I$ is an indexing set, and $|I|$ is the number of samples in the dataset. $W$ is the size of the window, and $N$ is the number of features shown to the network at each time step $w \in [W]$, where $[W]$ denotes $\{1, \dots, W\}$. Finally, $M$ is the dimensionality of the output space.
A function approximating the relationship described by $\mathcal{D}$ and parametrised by $\theta \in \Theta$, a parameter space, is of type $f_\theta : \mathbb{R}^{N \times W} \to \mathbb{R}^{M}$. In particular, we let $f_\theta$ be a recurrent neural network, such as an LSTM [HS97].
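As a concrete, purely hypothetical instantiation of $f_\theta$, the sketch below defines a small PyTorch LSTM that consumes a window of $W$ time steps with $N$ features each and outputs an $M$-dimensional prediction; all sizes and the architecture are placeholder assumptions, not the model used in the paper's experiments.

```python
import torch
import torch.nn as nn

class WindowLSTM(nn.Module):
    """Placeholder f_theta: maps a lookback window in R^{N x W} to R^M."""
    def __init__(self, n_features: int, hidden: int, m_outputs: int):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, m_outputs)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, W, N) -- W lookback steps, N features per step.
        _, (h_n, _) = self.lstm(x)
        return self.head(h_n[-1])    # predict from the final hidden state

N, W, M = 4, 10, 1
f_theta = WindowLSTM(n_features=N, hidden=32, m_outputs=M)
y_hat = f_theta(torch.randn(8, W, N))   # batch of 8 windows
print(y_hat.shape)                      # torch.Size([8, 1])
```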
The formula for the SHAP value [LL17] of feature $i \in C$, where $C$ is the collection of all features, is given by:
$$\phi_v(i) = \sum_{S \in \mathcal{P}(C \setminus \{i\})} \frac{(|C| - |S| - 1)! \, |S|!}{|C|!} \, \Delta_v(S, i),$$
where $\mathcal{P}(C)$ is the powerset of the set of all features, and, for a value function $v : \mathcal{P}(C) \to \mathbb{R}$, the marginal contribution $\Delta_v(S, i)$ of a feature $i$ to a coalition $S \subseteq C \setminus \{i\}$ is given by
$$\Delta_v(S, i) = v(S \cup \{i\}) - v(S).$$
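To make the combinatorics concrete, the following sketch (an illustration, not the paper's code) computes exact Shapley values for a toy value function by enumerating every coalition; this is only feasible for a handful of features, which is precisely what motivates approximations such as KernelSHAP.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value_fn):
    """Exact Shapley values by enumerating all coalitions of `features`."""
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for size in range(n):
            for S in combinations(others, size):
                # Shapley weight (|S|)! (n - |S| - 1)! / n! times the marginal contribution.
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                total += weight * (value_fn(set(S) | {i}) - value_fn(set(S)))
        phi[i] = total
    return phi

# Toy value function: a coalition's worth is the sum of per-feature payoffs.
payoff = {"a": 1.0, "b": 2.0, "c": -0.5}
print(shapley_values(list(payoff), lambda S: sum(payoff[f] for f in S)))
# -> each feature recovers its own payoff, as expected for an additive game
```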
Even for small values of $|C|$, $|\mathcal{P}(C)| = 2^{|C|}$ is large, implying that the SHAP values cannot be easily computed. KernelSHAP, again presented in [LL17], is a commonly used approximation of SHAP, which provably converges to SHAP values as the number of perturbed input features approaches $|\mathcal{P}(C)|$. More precisely, we define KernelSHAP as the SHAP values of the linear model $g$ given by the minimization of
$$\min_{g \in LM} \sum_{z \in \mathcal{Z}} \big(f_\theta(h_x(z)) - g(z)\big)^2 \, \pi_x(z), \qquad (1)$$
where $h_x : \mathcal{Z} \to \mathbb{R}^d$ is a masking function on a $d$-dimensional $\{0,1\}$-vector $z$ belonging to a sample $\mathcal{Z} \subseteq \{0,1\}^d$, the collection of all such possible vectors, each representing a different coalition. In practice, this function maps a coalition to the masked data point $x^\star$, on which we compute the prediction $f_\theta(h_x(z))$. Finally, $\pi_x$ is the combinatorial kernel, from which the method gets its name, given