Feature Importance for Time Series Data: Improving KernelSHAP

Mattia Jacopo Villani
mattia.villani@jpmorgan.com
J.P. Morgan AI Research, London, UK
King’s College London, London, UK

Joshua Lockhart
joshua.lockhart@jpmorgan.com
J.P. Morgan AI Research, London, UK

Daniele Magazzeni
daniele.magazzeni@jpmorgan.com
J.P. Morgan AI Research, London, UK
ABSTRACT
Feature importance techniques have enjoyed widespread attention
in the explainable AI literature as a means of determining how
trained machine learning models make their predictions. We consider Shapley value based approaches to feature importance, applied
in the context of time series data. We present closed form solutions
for the SHAP values of a number of time series models, including
VARMAX. We also show how KernelSHAP can be applied to time
series tasks, and how the feature importances that come from this
technique can be combined to perform “event detection”. Finally,
we explore the use of Time Consistent Shapley values for feature
importance.
CCS CONCEPTS
• Explainable Artificial Intelligence → Feature Importance; Time Series; • Time Series Modelling → Shapley Values.
KEYWORDS
neural networks, explainable artificial intelligence, feature importance, Shapley Values, time series
ACM Reference Format:
Mattia Jacopo Villani, Joshua Lockhart, and Daniele Magazzeni. 2022. Feature Importance for Time Series Data: Improving KernelSHAP. In Proceedings of Workshop on Explainable AI in Finance (ICAIF ’22). ACM, New York, NY, USA, 8 pages. https://doi.org/XXXXXXX.XXXXXXX
1 INTRODUCTION
As deep learning models become more widely used, the importance of explaining their predictions increases. Efforts in Explainable Artificial Intelligence (XAI) research have led to the development of frameworks for interpreting model behaviour, including counterfactuals [WMR17], [KBBV20], local surrogates [RSG16] and feature importance [LL17], [SVZ13], [STK+17]. This growing range of techniques aims at producing information that can assist model-makers in establishing the soundness of their models. Indeed, the
idea of the consumer’s right to an explanation of an AI-driven decision is driving an increased focus on criteria for explainable artificial intelligence [GSC+19], [DBH18]. In general, explanations are key to ensuring that model decisions are compatible with ethical standards.
Beyond their broader social scope, producing explanations can help to mitigate model risk, by enabling a better understanding of the limitations of a model. In fact, XAI techniques are often employed to ‘debug’ models. Research in AI is therefore necessary to enable the safer deployment of models such as Artificial Neural Networks (ANNs), which offer high predictive performance on many tasks at the expense of interpretability.
Feature importance scores provide a first approximation of model behaviour, revealing which features were influential on the model’s predictions. This can either be in the context of a single prediction (falling within the scope of local explainability) or a statement of how the model behaves more generally: either for a given data set or for any data set (global explainability) [DBH18]. Many of these techniques are model agnostic, meaning that they are not tied to any particular model architecture. However, we will argue that these techniques can and should be refined for the particular model to which they are applied.
Here we examine the particular case of the time series domain. In this context, we explore how the characteristics of time series data affect the implementation of Shapley Additive Explanations (SHAP) [LL17]. As we explore later, there are several important constraints in deploying KernelSHAP in the time series domain. Firstly, KernelSHAP makes use of the coefficients of a linear model that has been fit to perturbations of the data. We ascertain that the results guaranteeing the convergence of these coefficients to the SHAP values are indeed preserved in the time series domain, where instead of a linear model we need to fit a Vector Autoregressive (VAR) model.
Secondly, time series models often take windows of data; in the presence of long input windows the computation of KernelSHAP becomes impossible. As we will see later, this is due to numerical underflow in the normalisation coefficient present in the calculations required for KernelSHAP. Indeed, in practice, it is common to find very long lookback windows. Thirdly, KernelSHAP assumes independence of features, which is emphatically an exception rather than the norm in the time series domain. It is common for time series to possess autocorrelations, meaning correlations between values attained by a given variable at different time steps.
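To see why long windows are problematic, consider the combinatorial weight used by KernelSHAP (the Shapley kernel of [LL17]). The sketch below is our own illustration, and the window sizes are placeholder choices rather than values from the paper: once the flattened input has a few thousand binary players, the binomial term in the denominator makes the weights of mid-sized coalitions underflow to zero in double precision.

```python
import math

def shapley_kernel_weight(d, s):
    """Shapley kernel weight of a coalition of size s among d players
    (Lundberg & Lee, 2017): (d - 1) / (C(d, s) * s * (d - s))."""
    return (d - 1) / (math.comb(d, s) * s * (d - s))

# A univariate series with a short lookback window is fine:
print(shapley_kernel_weight(d=20, s=10))        # ~1e-06, representable

# A multivariate window (say N = 10 features, W = 300 steps) gives
# d = 3000 binary players; mid-sized coalitions underflow to 0.0,
# so they effectively vanish from the weighted regression.
d = 10 * 300
print(shapley_kernel_weight(d=d, s=d // 2))     # 0.0 in double precision
```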
In this paper we present several extensions to the KernelSHAP algorithm for computing SHAP values in the time series domain,
addressing the aforementioned areas, as well as a method for detecting events through SHAP. Moreover, we provide explicit SHAP values for the broadly used time series models AR, MA, ARMA, VARMA and VARMAX. Our contributions, in the structure of the paper, are:
(1) Proof of suitability for the application of KernelSHAP in the context of time series data. Our proof builds on the approximation of SHAP with linear models in KernelSHAP, extending it to the time series domain through VAR models, whose calibration also approximates SHAP. We call this alteration VARSHAP.
(2) Explicit SHAP values for widely used time series models: autoregressive, moving average and vector models of the same with exogenous variables.
(3) We present Time Consistent SHAP Values, a new feature importance technique for the time series domain, which leverages the temporal component of the problem in order to cut down the sample space involved in the KernelSHAP computation.
(4) An aggregation technique which is able to capture surges in feature importance across time steps, which we call event detection.
1.1 Related Work
Shapley Additive Explanations (SHAP Values) [LL17] are a broadly used feature importance technique that leverages an analogy between value attribution in co-operative game theory and feature importance assignment in machine learning models. SHAP operates by assessing the marginal contribution of a chosen feature to all possible combinations of features which do not contain the feature of interest. However, doing so is computationally expensive, prompting the development of approximation algorithms. An example of this is KernelSHAP [LL17], which is shown to converge to the SHAP values as the sample space approaches the set of all possible coalitions.
Gradient based methods are also popular feature importance techniques that measure the sensitivity of the output to perturbations of the input. We may use vanilla gradients for a particular sample [SVZ13], or use regularization techniques such as in SmoothGrad [STK+17] or integrated gradients [STY17]. These methods tend to be less expensive than computing SHAP values. [BHPB21] applies a range of these techniques, as well as LIME explanations [RSG16], in the context of financial time series.
Indeed, much research has gone into speeding up the computation of SHAP values. For example, FastSHAP uses a deep neural network to approximate the SHAP imputations [JSC+21]. While this approach does not maintain the desirable theoretical guarantees of SHAP, the technique is fast and generally accurate. However, since this work relies on neural networks, it raises the potential challenge of having to explain the explanation.
In particular, with respect to Recurrent Neural Networks, many of the above model agnostic techniques apply. However, there are certain methods that are specific to the time series domain and to certain model architectures. [MLY18] leverage a decomposition of the LSTM function to compute the relevance of a particular feature in a given context. TimeSHAP [BSC+21], the closest work to our paper, is an algorithm that extends KernelSHAP to the time series domain by selectively masking either features or time steps.
There are other ways of explaining time series predictions. By computing neural activations for a range of samples, [KJFF15] finds that certain neurons of neural networks act as symbols for events occurring in the data. [AMMS17] apply Layerwise Relevance Propagation (LRP) [BBM+15] to recurrent neural networks.
Finally, this paper makes use of traditional time series modelling techniques, including autoregressive models (AR), moving average models (MA), autoregressive moving average models (ARMA) and vector autoregressive moving average models with exogenous variables (VARMAX). We point to [SSS00] for a general introduction to the topic and [Mil16] for specifics on the VARMAX model and how to calibrate or train these models.
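For readers who want a concrete starting point, the sketch below calibrates a small VARMAX model with the statsmodels package; the library, the synthetic data and the order (1, 1) are our own illustrative choices and are not prescribed by the paper.

```python
import numpy as np
from statsmodels.tsa.statespace.varmax import VARMAX

rng = np.random.default_rng(0)
T, n_endog, n_exog = 300, 2, 1
exog = rng.normal(size=(T, n_exog))                       # exogenous regressor
endog = np.cumsum(rng.normal(size=(T, n_endog)), axis=0)  # toy multivariate series
endog = np.diff(endog, axis=0)                            # difference towards stationarity
exog = exog[1:]                                           # align lengths after differencing

# Vector ARMA(1, 1) with exogenous variables, fit by maximum likelihood.
model = VARMAX(endog, exog=exog, order=(1, 1))
result = model.fit(disp=False)
print(result.params)

# Forecasting requires future values of the exogenous variables.
future_exog = rng.normal(size=(5, n_exog))
print(result.forecast(steps=5, exog=future_exog))
```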
2 SHAP FOR TIME SERIES
2.1 SHAP and KernelSHAP
Let the supervised learning problem on time series be defined by $\mathcal{D} = (\mathcal{X}, \mathcal{Y})$, the data of the problem, with $\mathcal{X} = \{X_j\}_{j \in I} \subset \mathbb{R}^{N \times W}$ and $\mathcal{Y} = \{y_j\}_{j \in I} \subset \mathbb{R}^{M}$, where $X$ is a matrix of $W$ column vectors (each column represents the $N$ features), one for each step in the lookback window, $I$ is an indexing set and $|I|$ is the number of samples in the dataset, $W$ is the size of the window, and $N$ is the number of features shown to the network at each time step $w \in [W]$, where $[W]$ denotes $\{1, \dots, W\}$. Finally, $M$ is the dimensionality of the output space.
A function approximating the relationship described by $\mathcal{D}$ and parametrised by $\theta \in \Theta$, a parameter space, is of type $f_\theta : \mathbb{R}^{N \times W} \to \mathbb{R}^{M}$. In particular, we let $f_\theta$ be a recurrent neural network, such as an LSTM [HS97].
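As a concrete, illustrative instance of such an $f_\theta$ (the architecture and sizes below are our own placeholder choices, not specified by the paper), a small PyTorch LSTM maps a window in $\mathbb{R}^{N \times W}$ to an output in $\mathbb{R}^{M}$:

```python
import torch
import torch.nn as nn

class WindowLSTM(nn.Module):
    """f_theta : R^{N x W} -> R^M, applied to a lookback window of W steps."""

    def __init__(self, n_features: int, output_dim: int, hidden_dim: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=hidden_dim,
                            batch_first=True)
        self.head = nn.Linear(hidden_dim, output_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, W, N): one N-dimensional feature vector per time step.
        _, (h_last, _) = self.lstm(x)        # h_last: (1, batch, hidden_dim)
        return self.head(h_last.squeeze(0))  # (batch, M)

# Example: N = 4 features, W = 20 lookback steps, M = 1 output.
f_theta = WindowLSTM(n_features=4, output_dim=1)
x = torch.randn(8, 20, 4)                    # a batch of 8 windows
print(f_theta(x).shape)                      # torch.Size([8, 1])
```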
The formula for the SHAP value [LL17] of feature $i \in C$, where $C$ is the collection of all features, is given by
$$\phi_v(i) = \sum_{S \subseteq C \setminus \{i\}} \frac{(|C| - |S| - 1)!\,|S|!}{|C|!}\,\Delta_v(S, i),$$
where $\mathcal{P}(C)$ is the powerset of the set of all features and, for a value function $v : \mathcal{P}(C) \to \mathbb{R}$, the marginal contribution $\Delta_v(S, i)$ of a feature $i$ to a coalition $S \subseteq C \setminus \{i\}$ is given by
$$\Delta_v(S, i) = v(S \cup \{i\}) - v(S).$$
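For intuition, the brute-force computation implied by this definition can be written in a few lines. The sketch below is our own illustration, with a toy value function that masks excluded features to zero (a choice made for the example, not one made by the paper); it enumerates every coalition, which is exactly why the $2^{|C|}$ cost motivates approximations such as KernelSHAP.

```python
from itertools import combinations
from math import factorial

def shapley_value(i, players, v):
    """Exact Shapley value of player i under a value function v : frozenset -> float."""
    others = [p for p in players if p != i]
    n = len(players)
    phi = 0.0
    for size in range(len(others) + 1):
        for subset in combinations(others, size):
            S = frozenset(subset)
            weight = factorial(n - len(S) - 1) * factorial(len(S)) / factorial(n)
            phi += weight * (v(S | {i}) - v(S))  # weighted marginal contribution
    return phi

# Toy value function: v(S) is the output of a fixed linear model with the
# features outside S masked to zero, so feature j should receive coeff_j * x_j.
x = {0: 1.0, 1: 2.0, 2: -1.0}
coeffs = {0: 0.5, 1: 1.5, 2: 2.0}
v = lambda S: sum(coeffs[j] * x[j] for j in S)

print([round(shapley_value(i, list(x), v), 3) for i in x])  # [0.5, 3.0, -2.0]
```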
Even for small values of $|C|$, $|\mathcal{P}(C)| = 2^{|C|}$ is large, implying that the SHAP values cannot be easily computed. KernelSHAP, also presented in [LL17], is a commonly used approximation of SHAP, which provably converges to the SHAP values as the number of perturbed inputs approaches $|\mathcal{P}(C)|$. More precisely, we define KernelSHAP as the SHAP values of the linear model $g$ given by the minimization
$$\min_{g \in \mathrm{LM}} \sum_{z \in \mathcal{Z}} \big(f_\theta(h_x(z)) - g(z)\big)^2 \, \pi_x(z), \qquad (1)$$
where $h_x : \mathcal{Z} \to \mathbb{R}^{d}$ is a masking function on a $d$-dimensional $\{0,1\}$-vector $z$ belonging to a sample $\mathcal{Z} \subseteq \{0,1\}^{d}$ of the collection of all such vectors, each representing a different coalition. In practice, this function maps a coalition to the masked data point $x^\star$, on which we compute the prediction $f_\theta(h_x(z))$. Finally, $\pi_x$ is the combinatorial kernel from which the method gets its name, given by
$$\pi_x(z) = \frac{d - 1}{\binom{d}{|z|}\,|z|\,(d - |z|)},$$
where $|z|$ denotes the number of non-zero entries of the coalition vector $z$.
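Putting the pieces together, the sketch below is our own minimal implementation of this estimator for a generic black-box $f$ over $d$ flattened features; the baseline imputation used as $h_x$, the uniform coalition sampling and the unconstrained intercept are simplifying choices made for the example, not prescriptions of the paper.

```python
import numpy as np
from math import comb

def kernel_shap(f, x, background, n_samples=2048, seed=0):
    """Approximate SHAP values of f at x via the weighted regression in Eq. (1).

    f          : callable mapping an array of shape (d,) to a scalar
    x          : the input to explain, shape (d,)
    background : baseline values used by the masking function h_x, shape (d,)
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]

    # Sample coalition vectors z in {0,1}^d. The all-zeros and all-ones coalitions
    # have infinite kernel weight in [LL17] and are normally enforced as constraints;
    # this sketch simply skips them and fits a free intercept instead.
    z = rng.integers(0, 2, size=(n_samples, d))
    sizes = z.sum(axis=1)
    z = z[(sizes > 0) & (sizes < d)]
    sizes = z.sum(axis=1)

    # h_x(z): keep feature j of x where z_j = 1, otherwise impute the background value.
    masked = np.where(z == 1, x, background)
    y = np.array([f(row) for row in masked])

    # Shapley kernel pi_x(z) = (d - 1) / (C(d, |z|) * |z| * (d - |z|)).
    w = np.array([(d - 1) / (comb(d, int(s)) * s * (d - s)) for s in sizes])

    # Weighted least squares: fit g(z) = phi_0 + sum_j phi_j z_j, minimizing Eq. (1).
    Z = np.column_stack([np.ones(len(z)), z])
    sqrt_w = np.sqrt(w)
    phi, *_ = np.linalg.lstsq(Z * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return phi[1:]  # phi[0] is the intercept (the base value)

# Sanity check on a linear model: the estimates should approach
# coeff_j * (x_j - background_j), i.e. [0.5, -2.0, 6.0, 0.0] here.
coeffs = np.array([0.5, -1.0, 2.0, 0.0])
f = lambda v: float(coeffs @ v)
print(kernel_shap(f, x=np.array([1.0, 2.0, 3.0, 4.0]), background=np.zeros(4)))
```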