Feature Importance for Time Series Data: Improving
KernelSHAP
Mattia Jacopo Villani
mattia.villani@jpmorgan.com
J.P. Morgan AI Research
London, UK
King’s College London
London, UK
Joshua Lockhart
joshua.lockhart@jpmorgan.com
J.P. Morgan AI Research
London, UK
Daniele Magazzeni
daniele.magazzeni@jpmorgan.com
J.P. Morgan AI Research
London, UK
ABSTRACT
Feature importance techniques have enjoyed widespread attention in the explainable AI literature as a means of determining how trained machine learning models make their predictions. We consider Shapley value based approaches to feature importance, applied in the context of time series data. We present closed form solutions for the SHAP values of a number of time series models, including VARMAX. We also show how KernelSHAP can be applied to time series tasks, and how the feature importances that come from this technique can be combined to perform "event detection". Finally, we explore the use of Time Consistent Shapley values for feature importance.
CCS CONCEPTS
• Explainable Artificial Intelligence → Feature Importance; Time Series; Time Series Modelling; Shapley Values.
KEYWORDS
neural networks, explainable artificial intelligence, feature importance, Shapley Values, time series
ACM Reference Format:
Mattia Jacopo Villani, Joshua Lockhart, and Daniele Magazzeni. 2022. Feature Importance for Time Series Data: Improving KernelSHAP. In Proceedings of Workshop on Explainable AI in Finance (ICAIF '22). ACM, New York, NY, USA, 8 pages. https://doi.org/XXXXXXX.XXXXXXX
1 INTRODUCTION
As deep learning models become more widely used, the importance of explaining their predictions increases. Efforts in Explainable Artificial Intelligence (XAI) research have led to the development of frameworks for interpreting model behaviour, including counterfactuals [WMR17], [KBBV20], local surrogates [RSG16] and feature importance [LL17], [SVZ13], [STK+17]. These techniques all aim to produce information that can assist model-makers in establishing the soundness of their models.
Indeed, the idea of the consumer's right to an explanation of an AI-driven decision is driving an increased focus on criteria for explainable artificial intelligence [GSC+19], [DBH18]. In general, explanations are key to ensuring that model decisions are compatible with ethical standards.
Beyond their broader social scope, producing explanations can help to mitigate model risk, by enabling a better understanding of the limitations of a model. In fact, XAI techniques are often employed to 'debug' models. Research in this area is therefore necessary to enable the safer deployment of models such as Artificial Neural Networks (ANNs), which offer high predictive performance on many tasks at the expense of interpretability.
Feature importance scores provide a first approximation of model behaviour, revealing which features were influential on the model predictions. This can either be in the context of a single prediction (falling within the scope of local explainability) or a statement of how the model behaves more generally: either for a given data set or for any data set (global explainability) [DBH18]. Many of these techniques are model agnostic, meaning that they are not tied to any particular model architecture. However, we will argue that these techniques can and should be refined to the particular model on which they are being implemented.
Here we examine the particular case of the time series domain, exploring how its characteristics affect the implementation of Shapley Additive Explanations (SHAP) [LL17]. As we explore later, there are several important constraints in deploying KernelSHAP in the time series domain. Firstly, KernelSHAP makes use of the coefficients of a linear model that has been fit to perturbations of the data. We ascertain that the results guaranteeing the convergence of these coefficients to the SHAP values are indeed preserved in the time series domain, where instead of a linear model we need to fit a Vector Autoregressive (VAR) model.
Secondly, time series models often take windows of data as input; in the presence of long input windows the computation of KernelSHAP becomes numerically infeasible. As we will see later, this is due to numerical underflow in the normalisation coefficient present in the calculations required for KernelSHAP, and in practice it is common to find very long lookback windows. Thirdly, KernelSHAP assumes independence of features, which is emphatically an exception rather than the norm in the time series domain. It is common for time series to possess autocorrelations, meaning correlations between values attained by a given variable at different time steps.
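A minimal sketch of the underflow issue, assuming the standard Shapley kernel weight used by KernelSHAP [LL17] (defined in Section 2.1): once the number of inputs $d = N \times W$ reaches the low thousands, the weight assigned to mid-sized coalitions falls below the smallest representable double-precision number and rounds to zero.

```python
import math

def shap_kernel_weight(d: int, s: int) -> float:
    """Standard Shapley kernel weight for a coalition of size s out of d inputs."""
    return (d - 1) / (math.comb(d, s) * s * (d - s))

# d = N * W grows quickly with the lookback window; mid-sized coalitions
# dominate the sample space but receive vanishing weight.
for d in (20, 200, 2000):
    print(d, shap_kernel_weight(d, d // 2))
# For d = 2000 the weight underflows to exactly 0.0 in double precision.
```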
In this paper we present several extensions to the KernelSHAP algorithm for computing SHAP values in the time series domain,
addressing the aforementioned areas, as well as a method for detecting events through SHAP. Moreover, we provide explicit SHAP values for the broadly used time series models AR, MA, ARMA, VARMA, and VARMAX. Our contributions, following the structure of the paper, are:
(1) A proof of suitability for the application of KernelSHAP in the context of time series data. Our proof builds on the approximation of SHAP with linear models in KernelSHAP, extending it to the time series domain through VAR models, whose calibration also approximates SHAP. We call this alteration VARSHAP.
(2) Explicit SHAP values for widely used time series models: autoregressive, moving average, and vector models of the same with exogenous variables.
(3) Time Consistent SHAP Values, a new feature importance technique for the time series domain, which leverages the temporal component of the problem to cut down the sample space involved in the KernelSHAP computation.
(4) An aggregation technique able to capture surges in feature importance across time steps, which we call event detection.
1.1 Related Work
Shapley Additive Explanations (SHAP values) [LL17] are a broadly used feature importance technique that leverages an analogy between value attribution in co-operative game theory and feature importance assignment in machine learning models. SHAP operates by assessing the marginal contribution of a chosen feature to all possible combinations of features which do not contain the input of interest. However, doing so is computationally expensive, prompting the development of approximation algorithms. An example of this is KernelSHAP [LL17], which is shown to converge to the SHAP values as the sample space approaches the set of all possible coalitions.
Gradient based methods are also popular feature importance techniques that measure the sensitivity of the output to perturbations of the input. We may use vanilla gradients for a particular sample [SVZ13], or use regularization techniques such as SmoothGrad [STK+17] or integrated gradients [STY17]. These methods tend to be less expensive than computing SHAP values. [BHPB21] applies a range of these techniques, as well as LIME explanations [RSG16], in the context of financial time series.
Indeed, much research has gone into speeding up the computation of SHAP values. For example, FastSHAP uses a deep neural network to approximate the SHAP imputations [JSC+21]. While this approach does not maintain the desirable theoretical guarantees of SHAP, the technique is fast and generally accurate. However, since this work relies on neural networks, it raises the potential challenge of having to explain the explanation.
Many of the above model agnostic techniques apply to Recurrent Neural Networks in particular. However, there are certain methods that are specific to the time series domain and to certain model architectures. [MLY18] leverage a decomposition of the LSTM function to compute the relevance of a particular feature in a given context. TimeSHAP [BSC+21], the closest work to our paper, is an algorithm that extends KernelSHAP to the time series domain by selectively masking either features or time steps.
There are other ways of explaining time series predictions. By computing neural activations for a range of samples, [KJFF15] find that certain neurons of neural networks act as symbols for events occurring in the data. [AMMS17] apply Layerwise Relevance Propagation (LRP) [BBM+15] to recurrent neural networks.
Finally, this paper makes use of traditional time series modelling techniques, including autoregressive models (AR), moving average models (MA), autoregressive moving average models (ARMA) and vector autoregressive moving average models with exogenous variables (VARMAX). We point to [SSS00] for a general introduction to the topic and [Mil16] for specifics on the VARMAX model and how to calibrate or train these models.
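As an illustration of calibration in practice, the sketch below fits a VARMAX model by maximum likelihood. The use of statsmodels and the synthetic data are our own assumptions for the example, not part of the paper.

```python
import numpy as np
from statsmodels.tsa.statespace.varmax import VARMAX

rng = np.random.default_rng(0)
endog = rng.normal(size=(200, 2))   # two endogenous series
exog = rng.normal(size=(200, 1))    # one exogenous driver

# VARMA(1, 1) with exogenous regressors; toy data may trigger convergence warnings.
model = VARMAX(endog, exog=exog, order=(1, 1))
res = model.fit(disp=False)

print(res.params)                                           # calibrated coefficients
print(res.forecast(steps=5, exog=rng.normal(size=(5, 1))))  # out-of-sample forecast
```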
2 SHAP FOR TIME SERIES
2.1 SHAP and KernelSHAP
Let the supervised learning problem on time series be defined by $\mathcal{D} = (\mathcal{X}, \mathcal{Y})$, the data of the problem, with $\mathcal{X} = \{X_j\}_{j \in I} \subset \mathbb{R}^{N \times W}$ and $\mathcal{Y} = \{y_j\}_{j \in I} \subset \mathbb{R}^{M}$, where $X_j$ is a matrix of $W$ column vectors (each column represents the $N$ features), one for each step in the lookback window, $I$ is an indexing set and $|I|$ is the number of samples in the dataset, $W$ is the size of the window, and $N$ is the number of features shown to the network at each time step $w \in [W]$, with $[W]$ denoting $\{1, \ldots, W\}$. Finally, $M$ is the dimensionality of the output space.
A function approximating the relationship described by $\mathcal{D}$ and parametrised by $\theta \in \Theta$, a parameter space, is of type $f_\theta : \mathbb{R}^{N \times W} \to \mathbb{R}^{M}$. In particular, we let $f_\theta$ be a recurrent neural network, such as an LSTM [HS97].
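For concreteness, the sketch below (our own illustration rather than code from the paper; the NumPy-based windowing helper and array names are assumptions) cuts a multivariate series into lookback windows of shape $N \times W$ with next-step targets, matching the signature $f_\theta : \mathbb{R}^{N \times W} \to \mathbb{R}^{M}$.

```python
import numpy as np

def make_windows(series: np.ndarray, W: int):
    """series: (T, N) multivariate time series -> X: (samples, N, W), y: (samples, N)."""
    T, N = series.shape
    X, y = [], []
    for t in range(T - W):
        X.append(series[t:t + W].T)  # (N, W): rows are features, columns are time steps
        y.append(series[t + W])      # next-step target; here M = N
    return np.stack(X), np.stack(y)

rng = np.random.default_rng(0)
series = rng.normal(size=(500, 3))   # T = 500 steps, N = 3 features
X, y = make_windows(series, W=10)    # X: (490, 3, 10), y: (490, 3)
# f_theta consumes one window X[j] in R^{N x W} and returns a prediction in R^M.
```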
The formula for the SHAP value [LL17] of feature $i \in C$, where $C$ is the collection of all features, is given by

$$\phi_v(i) = \sum_{S \in \mathcal{P}(C \setminus \{i\})} \frac{(N - |S| - 1)!\,|S|!}{N!}\, \Delta_v(i, S),$$

where $N = |C|$, $\mathcal{P}(\cdot)$ denotes the powerset, and, for a value function $v : \mathcal{P}(C) \to \mathbb{R}$, the marginal contribution $\Delta_v(i, S)$ of a feature $i$ to a coalition $S \in \mathcal{P}(C \setminus \{i\})$ is given by

$$\Delta_v(i, S) = v(\{i\} \cup S) - v(S).$$
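To make the formula concrete, the sketch below (our own illustration; the additive value function is a hypothetical stand-in for $v$) computes the exact SHAP value of one player by enumerating every coalition $S \subseteq C \setminus \{i\}$ and accumulating the weighted marginal contributions. For realistic feature counts, this enumeration is exactly what becomes intractable.

```python
from itertools import combinations
from math import factorial

def shapley_value(v, C: set, i) -> float:
    """Exact SHAP value of player i for a value function v : frozenset -> float."""
    n = len(C)
    others = C - {i}
    phi = 0.0
    for size in range(len(others) + 1):
        for S in combinations(others, size):
            S = frozenset(S)
            weight = factorial(n - len(S) - 1) * factorial(len(S)) / factorial(n)
            phi += weight * (v(S | {i}) - v(S))  # weighted marginal contribution of i to S
    return phi

# Hypothetical additive game: the value of a coalition is the sum of its members.
v = lambda S: float(sum(S))
print(shapley_value(v, {1, 2, 3}, i=2))  # 2.0, the player's own additive contribution
```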
Even for small values of $|C|$, $|\mathcal{P}(C)| = 2^{|C|}$ is large, implying that the SHAP values cannot be easily computed. KernelSHAP, again presented in [LL17], is a commonly used approximation of SHAP which provably converges to the SHAP values as the number of perturbed input features approaches $|\mathcal{P}(C)|$. More precisely, we define KernelSHAP as the SHAP values of the linear model $g$ given by the minimization

$$\min_{g \in LM} \sum_{z \in \mathcal{Z}} \big(f_\theta(x(z)) - g(z)\big)^2\, \pi_x(z), \qquad (1)$$
where $x : \mathcal{Z} \to \mathbb{R}^d$ is a masking function on a $d$-dimensional $\{0,1\}$-vector $z$ belonging to a sample $\mathcal{Z} \subseteq \{0,1\}^d$, the collection of all possible such vectors, each representing a different coalition. In practice, this function maps a coalition to the masked data point $x(z)$, on which we compute the prediction $f_\theta(x(z))$. Finally, $\pi_x$ is the combinatorial kernel, from which the method gets its name, given by

$$\pi_x(z) = \frac{d - 1}{\binom{d}{|z|}\,|z|\,(d - |z|)},$$

where $|z|$ is the number of non-zero entries of $z$.
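As a rough illustration of the weighted regression in Equation (1), and not the authors' implementation, the sketch below estimates SHAP values for a single instance. The uniform coalition sampling, the background-value masking function, and the unconstrained weighted least-squares fit are simplifying assumptions.

```python
import numpy as np
from math import comb

def kernel_shap(f, x, background, n_samples=2048, rng=None):
    """Approximate SHAP values for one instance x (shape (d,)) of a model f."""
    if rng is None:
        rng = np.random.default_rng(0)
    d = len(x)
    Z = rng.integers(0, 2, size=(n_samples, d))  # sampled coalition vectors z
    sizes = Z.sum(axis=1)
    keep = (sizes > 0) & (sizes < d)             # kernel weight is infinite at the endpoints, so exclude them
    Z, sizes = Z[keep], sizes[keep]
    weights = (d - 1) / (np.array([comb(d, int(s)) for s in sizes]) * sizes * (d - sizes))
    masked = np.where(Z == 1, x, background)     # x(z): masked data points
    y = np.array([f(m) for m in masked])         # f_theta(x(z))
    A = np.hstack([np.ones((len(Z), 1)), Z])     # intercept + coalition indicators
    Aw = A * weights[:, None]                    # rows scaled by the kernel weights
    coef, *_ = np.linalg.lstsq(Aw.T @ A, Aw.T @ y, rcond=None)
    return coef[1:]                              # slope coefficients approximate the SHAP values

# Toy additive model: the estimates should recover each feature's contribution.
f = lambda u: 2.0 * u[0] - 1.0 * u[1] + 0.5 * u[2]
print(kernel_shap(f, x=np.ones(3), background=np.zeros(3)))  # approx [2.0, -1.0, 0.5]
```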