Conditional Feature Importance for Mixed Data
Kristin Blesch1,2*, David S. Watson3 and Marvin N. Wright1,2,4
1* Leibniz Institute for Prevention Research & Epidemiology – BIPS, Bremen, Germany.
2Faculty of Mathematics and Computer Science, University of Bremen, Bremen, Germany.
3Department of Informatics, King’s College London, London, United Kingdom.
4Department of Public Health, University of Copenhagen, Copenhagen, Denmark.
*Corresponding author. E-mail: blesch@leibniz-bips.de;
Contributing authors: david.watson@kcl.ac.uk; wright@leibniz-bips.de
Abstract
Despite the popularity of feature importance (FI) measures in interpretable machine learning, the statistical adequacy of these methods is rarely discussed. From a statistical perspective, a major distinction is between analyzing a variable's importance before and after adjusting for covariates, i.e., between marginal and conditional measures. Our work draws attention to this rarely acknowledged yet crucial distinction and showcases its implications. We find that few methods are available for testing conditional FI, and practitioners have hitherto been severely restricted in method application due to mismatched data requirements. Most real-world data exhibit complex feature dependencies and incorporate both continuous and categorical features (i.e., mixed data). Both properties are often neglected by conditional FI measures. To fill this gap, we propose to combine the conditional predictive impact (CPI) framework with sequential knockoff sampling. The CPI enables conditional FI measurement that controls for any feature dependencies by sampling valid knockoffs, i.e., synthetic data with statistical properties similar to those of the data to be analyzed. Sequential knockoffs were deliberately designed to handle mixed data and thus allow us to extend the CPI approach to such datasets. We demonstrate through numerous simulations and a real-world example that our proposed workflow controls type I error, achieves high power, and is in line with results given by other conditional FI measures, whereas marginal FI metrics can result in misleading interpretations. Our findings highlight the necessity of developing statistically adequate, specialized methods for mixed data.
Keywords: Interpretable Machine Learning, Feature Importance, Knockoffs,
Explainable Artificial Intelligence
1 Introduction
Interpretable machine learning is on the rise as practitioners become interested in not only achieving high prediction accuracy in supervised learning
tasks, but also understanding why certain predictions were made. Evaluating
the importance of input variables (features) to the target prediction plays a
crucial role in facilitating such endeavours. Several feature importance (FI)
measures have been proposed by the machine learning community, but differing
conceptualizations are spread across the literature.
We identify at least five dichotomies that orient FI methods: (1) global vs. local; (2) model-agnostic vs. model-specific; (3) testing vs. scoring; (4) methods that do and do not accommodate mixed tabular data; and (5) conditional vs. marginal measures. This defines a grid with 2^5 = 32 cells that helps categorize FI measures. For example, the popular SHAP algorithm (Lundberg and Lee, 2017) produces local, model-agnostic FI scores that can accommodate mixed data and measure marginal FI. We emphasize that there is no "ideal" configuration of these five options; each is the right answer to a different question that is irreducibly context-dependent. However, this grid helps identify a notable lacuna: there are few global, model-agnostic FI methods that accommodate mixed data with error control for conditional FI measurement.
Explaining the dichotomies in more detail, local FI measures (Lundberg and Lee, 2017; Ribeiro et al., 2016) are optimized for a particular point or region of the feature space, e.g. a single observation, while global FI scores (Fisher et al., 2019; Friedman, 2001) measure a variable's overall importance. Model-specific measures (Breiman, 2001; Kursa and Rudnicki, 2010; Shrikumar et al., 2017) exploit the properties of a particular function class for more efficient or precise FI calculation, while model-agnostic measures (Apley and Zhu, 2020; Ribeiro et al., 2018) treat the underlying model as a black box. Testing methods include some inference procedure for error control (Lei et al., 2018), while scoring methods (Covert et al., 2020) do not. Some methods are proposed with limited applicability to certain data types, e.g., only continuous inputs (Watson and Wright, 2021), while others are more flexible (Molnar et al., 2023). We discuss a selection of FI methods briefly in Section 2, but refer readers to review papers on FI interpretability methods, e.g. Linardatos et al. (2021), for a wider discussion of the topic.
Through the lens of statistics, the division (5), conditional vs. marginal measures, is particularly important yet insufficiently acknowledged in both the literature and practice (Apley and Zhu, 2020; Hooker et al., 2021; Molnar et al., 2023; Watson and Wright, 2021). The complementary concepts become evident when relating the statistical conception of independence testing to the machine learning view of FI measurement. We can think of the marginal null hypothesis as testing whether the input feature $X_j$ is jointly independent of the remaining covariates $X_{-j}$ and the target variable $Y$:

$$H_0^M : X_j \perp \{Y, X_{-j}\} \quad (1)$$
On the other hand, testing against (2) accounts for the covariates $X_{-j}$ and hence corresponds to conditional FI:

$$H_0^C : X_j \perp Y \mid X_{-j} \quad (2)$$

These tests clearly target different objectives. In this setup, $H_0^M$ entails $H_0^C$, but not the other way around. However, this strength comes with a certain loss of specificity, because rejecting $H_0^M$ leaves it unclear whether $X_j$ is correlated with $Y$, with $X_{-j}$, or with both.
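As a toy illustration of the two null hypotheses (not part of the paper's methodology), the following sketch simulates a linear-Gaussian confounding structure in which the marginal null (1) is clearly rejected while the conditional null (2) is not. Partial correlation is used as the conditional test, which is valid only under the Gaussian assumption made here.

```python
# Toy illustration (not the paper's CPI procedure): contrast the marginal null
# H0^M: X_j independent of {Y, X_-j} with the conditional null
# H0^C: X_j independent of Y given X_-j under a confounding structure.
# Partial correlation serves as the conditional test; this is only valid
# because the simulated data are linear-Gaussian.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 2000
c = rng.normal(size=n)        # confounder, playing the role of X_-j
x = c + rng.normal(size=n)    # feature of interest X_j, driven by c
y = c + rng.normal(size=n)    # target Y, driven by c but not by x

# Marginal view: x and y are correlated through c, so H0^M is rejected.
r_marg, p_marg = stats.pearsonr(x, y)

# Conditional view: regress c out of both x and y, then test the residuals
# (a partial correlation test of H0^C under the linear-Gaussian assumption).
res_x = x - np.polyval(np.polyfit(c, x, 1), c)
res_y = y - np.polyval(np.polyfit(c, y, 1), c)
r_cond, p_cond = stats.pearsonr(res_x, res_y)

print(f"marginal:    r = {r_marg:.2f}, p = {p_marg:.1e}")  # strong and significant
print(f"conditional: r = {r_cond:.2f}, p = {p_cond:.2f}")  # near zero, typically not significant
```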
The relationship between FI and independence testing sheds light on another aspect, which may even be considered another dichotomy: does the FI measure aim to investigate model behaviour or the underlying data structure (Chen et al., 2020)? For example, conditional independence tests that are part of some conditional FI measures (Watson and Wright, 2021) may be used for causal structure learning, which is often based on repeated conditional independence testing (Glymour et al., 2019). Therefore, conditional FI measures can help explain the underlying data structure, whereas marginal FI measures differentiate between the variables the predictive model relies on, which can be used to evaluate the fairness of a model. This does not preclude practitioners from using marginal and conditional FI measures in conjunction, and since marginal measures are often faster to compute, they might be preferable for quick assessments in large pipelines with many iterations. However, practitioners must be careful to interpret these measures properly and not infer a conditional signal from a marginal test.
In Fig. 1, we illustrate the difference between marginal (permutation feature importance, PFI; Fisher et al., 2019; Breiman, 2001) and conditional (conditional predictive impact with Gaussian knockoffs, CPIgauss; Watson and Wright, 2021) FI measures. In this example, the confounding variable C is a common cause of both X and Y. This causal structure induces a spurious correlation between X and Y, leading the marginal FI measure to attribute nonzero importance to both C and X in predicting Y. On the contrary, the conditional FI measure attributes nonzero FI only to C, since X has no additional predictive value for Y above C.
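The sketch below is a minimal, hedged reconstruction (not the authors' experiment) of a confounded data-generating process like the one behind Fig. 1. scikit-learn's permutation_importance stands in for marginal PFI, and refitting without X (a leave-one-covariate-out check, not the CPI) stands in for the conditional view.

```python
# Minimal sketch (not the authors' experiment) of a confounded setup like the
# one behind Fig. 1: C is a common cause of X and Y. scikit-learn's
# permutation_importance stands in for marginal PFI; refitting without X
# (leave-one-covariate-out, not the CPI) stands in for a conditional check.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 2000
C = rng.normal(size=n)
X = C + rng.normal(scale=0.5, size=n)   # X is a noisy copy of C
Y = C + rng.normal(scale=0.5, size=n)   # Y depends on C only

features = np.column_stack([C, X])
F_tr, F_te, y_tr, y_te = train_test_split(features, Y, random_state=1)

rf = RandomForestRegressor(n_estimators=200, random_state=1).fit(F_tr, y_tr)
pfi = permutation_importance(rf, F_te, y_te, n_repeats=20, random_state=1)
print("marginal PFI for C, X:", np.round(pfi.importances_mean, 3))  # X typically > 0 as well

# Conditional stand-in: dropping X barely changes held-out error,
# because X carries no information about Y beyond C.
rf_no_x = RandomForestRegressor(n_estimators=200, random_state=1).fit(F_tr[:, [0]], y_tr)
print("MSE with X:   ", round(mean_squared_error(y_te, rf.predict(F_te)), 3))
print("MSE without X:", round(mean_squared_error(y_te, rf_no_x.predict(F_te[:, [0]])), 3))
```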
This paper explores global, model-agnostic FI methods that accommodate mixed data with error control for conditional FI measurement. This is not a niche problem: mixed tabular data is the norm in many important areas such as healthcare, economics, and industry, and inference procedures are essential for decision making in high-risk domains to minimize costly errors. With the proliferation of machine learning algorithms, model-agnostic approaches can help standardize FI tasks without recalibrating to a particular function class for each new application. Conditional, global measures are valuable when practitioners seek a mechanistic understanding that takes the dependence structure of the data into account and goes beyond individual model outputs.
Fig. 1 Boxplots contrasting marginal and conditional FI metrics for a prediction of Y from C and X (N = 200) with a random forest prediction model across 1,000 replicates. The conditional FI measure attributes no importance to X, whereas the marginal measure attributes non-zero importance to X because, due to the correlation between X and Y induced by C, X is predictive of Y.

Even though the empirical relevance of this kind of FI measurement is evident, specialized methods are lacking. Some FI methods have yet to be evaluated in mixed data settings (Covert et al., 2020; Molnar et al., 2023; Lei et al., 2018), while others are currently inapplicable (Watson and Wright, 2021). The consequences of neglecting the special nature of mixed data for conditional FI measurement remain unexplored, and practitioners therefore currently have no guidance on how to proceed with conditional FI measurement in such cases, which is a severe limitation in real-world applications.
We propose to combine the conditional predictive impact (CPI) testing framework proposed by Watson and Wright (2021) with sequential knockoffs (Kormaksson et al., 2021) in order to enable conditional, global, model-agnostic FI testing for mixed data. CPI is a flexible, model-agnostic tool that relies on so-called knockoffs (Candès et al., 2018). In short, knockoffs are synthetic variables that carry over the major statistical properties of the original variables, such as the correlation structure among covariates. While Watson and Wright (2021) claim that the CPI should in principle work with any valid set of knockoffs, it has thus far only been applied and evaluated with Gaussian knockoffs (Candès et al., 2018). This currently restricts practitioners to either using the CPI with continuous variables only or disregarding the special characteristics of mixed data. We analyse the consequences of such a disregard when using CPI with Gaussian knockoffs (Candès et al., 2018), denoted CPIgauss, or with deep knockoffs (Romano et al., 2020), denoted CPIdeep, and propose a specialized solution strategy to tackle the mixed data case: using sequential knockoffs (Kormaksson et al., 2021), a knockoff sampling algorithm explicitly developed for mixed data, within the CPI framework (CPIseq).
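The following is a hedged sketch of how a CPI-style test could be assembled, assuming the general recipe described by Watson and Wright (2021): substitute a knockoff copy for the feature of interest on held-out data, record the per-observation increase in loss, and test whether the mean increase is positive. The knockoff column below is a conditional-Gaussian resample, a stand-in that only makes sense for this Gaussian toy; it is neither the Gaussian knockoff construction of Candès et al. (2018) nor the sequential algorithm of Kormaksson et al. (2021).

```python
# Hedged sketch of a CPI-style test, assuming the recipe of Watson and Wright
# (2021): replace the feature of interest with a knockoff copy on held-out
# data, record the per-observation increase in loss, and run a one-sided
# paired t-test on that increase. The "knockoff" here is a conditional-Gaussian
# resample, a stand-in valid only for this Gaussian toy, not a general sampler.
import numpy as np
from scipy import stats
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

def cpi_feature(model, X_test, y_test, j, knockoff_col):
    """Mean loss increase and one-sided p-value when column j is 'knocked off'."""
    loss_orig = (y_test - model.predict(X_test)) ** 2       # per-sample squared error
    X_ko = X_test.copy()
    X_ko[:, j] = knockoff_col
    loss_ko = (y_test - model.predict(X_ko)) ** 2
    delta = loss_ko - loss_orig                              # CPI samples
    t, p_two = stats.ttest_1samp(delta, 0.0)
    p_one = p_two / 2 if t > 0 else 1 - p_two / 2            # H1: mean(delta) > 0
    return delta.mean(), p_one

rng = np.random.default_rng(2)
n = 2000
C = rng.normal(size=n)
X1 = C + rng.normal(scale=0.5, size=n)   # correlated with Y only through C
Y = C + rng.normal(scale=0.5, size=n)
Xmat = np.column_stack([C, X1])
X_tr, X_te, y_tr, y_te = train_test_split(Xmat, Y, random_state=2)
rf = RandomForestRegressor(n_estimators=200, random_state=2).fit(X_tr, y_tr)

# Stand-in "knockoff" for X1: a draw from an estimated P(X1 | C) on the test set.
cond = LinearRegression().fit(X_te[:, [0]], X_te[:, 1])
resid_sd = np.std(X_te[:, 1] - cond.predict(X_te[:, [0]]))
x1_ko = cond.predict(X_te[:, [0]]) + rng.normal(scale=resid_sd, size=len(y_te))

cpi, p = cpi_feature(rf, X_te, y_te, j=1, knockoff_col=x1_ko)
print(f"CPI(X1) = {cpi:.3f}, one-sided p = {p:.2f}")  # expected: near zero, large p
```

In the proposed CPIseq workflow, the stand-in sampler above would be replaced by sequential knockoffs, which handle continuous and categorical columns alike.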
The paper is structured as follows. We present relevant methodology and FI measures in Section 2. Section 2.2 reviews several knockoff sampling algorithms, demonstrating the need for specialized procedures for mixed data and motivating our proposed solution, CPIseq. Through simulation studies in Sections 3.1 and 3.2, we evaluate our newly proposed workflow in more depth and compare it to other methods. Finally, we illustrate the application of the method to a real-world dataset in Section 3.3 before concluding and discussing our findings in Section 4.
2 Methods
With a focus on the measurement of model-agnostic, global, conditional FI, this section presents related measures proposed in previous literature and discusses their applicability to mixed data. We acknowledge that methods from the statistical literature on conditional independence testing (Shah and Peters, 2020; Williamson et al., 2021) might also be utilized for conditional FI measurement; however, a full comparison of such methods is beyond the scope of this paper. Further, it is worth clarifying at this point that we understand FI here as a concept tied to a variable's effect on predictive performance in a supervised learning task.
2.1 Feature Importance Measures
Conditional subgroup approach (CS)
A global, model-agnostic FI measure that acknowledges the crucial distinction between conditional and marginal measures of importance is the conditional subgroup (CS) approach proposed by Molnar et al. (2023). CS partitions the data into interpretable subgroups, i.e. groups whose feature distributions are homogeneous within but heterogeneous between groups. The method is promising, as it explicitly specifies the conditioning between subgroups and further allows for an unconditional interpretation within subgroups. This means the method provides both a global conditional and a within-group unconditional interpretation, which sheds light on feature dependence structures.
To determine FI, CS evaluates the change in loss when the variable of interest is permuted within subgroups, which reduces extrapolation into low-density regions of the feature space, thereby mitigating a common problem with permutation-based approaches (Hooker et al., 2021). To decide on a suitable partition, the authors suggest determining subgroups via transformation trees. Using a pre-specified loss function, the average increase in loss across multiple permutations, relative to the original ordering of the variable, is reported.
CS is not affected by mixed data other than through the choice of an appropriate prediction algorithm, which is why the method is expected to work equally well with mixed data. However, for this approach to work, researchers must assume that the data are separable into subgroups. Further, for testing FI, the method would need to rely on computationally expensive permutation tests, as no inherent testing procedure is provided.
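As a rough illustration of the within-subgroup permutation scheme (not the authors' implementation), the sketch below uses an ordinary regression tree for the feature of interest given the remaining features as a stand-in for the transformation trees suggested by Molnar et al. (2023), permutes the feature within each resulting leaf, and reports the average increase in loss.

```python
# Rough illustration (not the authors' implementation) of the within-subgroup
# permutation idea behind CS. A plain regression tree for X_j given the other
# features stands in for the transformation trees of Molnar et al. (2023);
# its leaves define the subgroups, and X_j is permuted within each leaf.
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

def cs_importance(model, X_test, y_test, j, n_repeats=10, seed=0):
    """Average increase in test MSE when column j is permuted within subgroups."""
    rng = np.random.default_rng(seed)
    base = mean_squared_error(y_test, model.predict(X_test))

    # Subgroups: leaves of a shallow tree predicting X_j from the other features.
    others = np.delete(X_test, j, axis=1)
    grouper = DecisionTreeRegressor(max_depth=3, random_state=seed)
    leaves = grouper.fit(others, X_test[:, j]).apply(others)

    increases = []
    for _ in range(n_repeats):
        X_perm = X_test.copy()
        for leaf in np.unique(leaves):
            idx = np.where(leaves == leaf)[0]
            X_perm[idx, j] = X_perm[rng.permutation(idx), j]  # permute within the leaf
        increases.append(mean_squared_error(y_test, model.predict(X_perm)) - base)
    return float(np.mean(increases))
```

This sketch only scores importance; as noted above, turning it into a test would require an additional resampling procedure such as a permutation test.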