the objective function of neural networks being highly non-convex, the resulting model may end up
in different local minima. Although such local minima yield models with similar generalization
performance (as measured by validation loss), they produce different explanations, or feature
importance scores, for the same data. This instability in feature attribution is aggravated when the
dataset contains low-signal-amplitude features or highly correlated features, both of which are quite
common in real-world data analysis. More importantly, this instability in feature importance scores
directly impacts the stability of the selected features.
In this work, we first demonstrate this instability in feature importance and feature selection on
standard benchmark datasets with standard interpretability metrics. We also provide evidence of how
data properties such as signal strength and correlation aggravate the instability. We then demonstrate
how simple averaging of feature importance scores from models at different training epochs helps
address this instability. Motivated by the effectiveness of such averaging, we propose a framework for
stabilizing the feature importance and feature selection of deep neural networks. Our proposed
framework first performs hyperparameter optimization of the deep learning models. Then, instead of
the conventional practice of selecting a single best model, we identify multiple good models and
ensemble their feature importance scores, which, as we show later, helps select robust features. To
determine good models, we consider two strategies: first, we propose using the top-performing models
as determined by cross-validation (CV) loss; second, we propose statistical leveraging to find the
models most influential for feature importance. For feature selection, we adopt the knockoff
framework, as it selects features with statistical guarantees. Across a range of experiments in
simulation settings and real-world feature selection problems, we find that the existing approach of
selecting features from the single best model across hyperparameter settings and epochs does not
necessarily result in stable or improved feature selection. Instead, the presented framework achieves
stable and improved feature selection. Overall, our contributions are as follows:
• We demonstrate the instability in DNN interpretations for widely used interpretability metrics
(Grad, DeepLIFT, and LIME) across two benchmark datasets (MNIST and CIFAR-10).
• We propose a framework that ensembles feature importance scores obtained along the training
path of a deep neural network to stabilize its feature importance scores (see the sketch following
this list).
• We demonstrate the applicability of such an ensemble to the task of feature selection with
knockoff inference.
• Across simulation studies and three real-data feature selection applications, we demonstrate
the efficacy of the proposed framework in improving both the stability and the power of feature
selection for deep learning models.
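To make the core averaging step concrete, below is a minimal PyTorch sketch. The function names and the use of a simple gradient-based attribution (Grad) are illustrative choices, not the paper's exact implementation; any attribution method such as DeepLIFT or LIME could be substituted, and `checkpoint_paths` is assumed to hold state dicts saved at different training epochs, e.g., the top performers under CV loss.

```python
import torch
import torch.nn.functional as F

def saliency_scores(model, x, y):
    """Per-feature importance as the mean absolute input gradient (Grad)."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return x.grad.abs().mean(dim=0)  # average over the batch: one score per feature

def ensembled_importance(model, checkpoint_paths, x, y):
    """Average importance scores over checkpoints saved along the training path."""
    scores = []
    for path in checkpoint_paths:
        model.load_state_dict(torch.load(path))  # one architecture, many weight settings
        model.eval()
        scores.append(saliency_scores(model, x, y))
    return torch.stack(scores).mean(dim=0)  # ensemble by simple averaging
```

The same averaging applies unchanged whether the checkpoints are the top-performing models under CV loss or the influential models identified via statistical leveraging.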
2 Related works
Recent works have carefully studied the fragility of neural network interpretations Ghorbani et al.
[2019], Slack et al. [2020]. These works demonstrate that explanation approaches are fragile to
adversarial perturbations: perceptually indistinguishable inputs can receive very different
interpretations despite being assigned the same predicted label. Although our work also concerns
the instability of neural network interpretations, unlike these works, we study this problem without
relying on adversarial inputs. Further, we focus on the impact of this instability on the downstream
application of feature selection, which has not been considered before.
The primary strategy in our framework, i.e., ensembling feature importance scores from models at
different training stages, has some similarities with recent works on deep learning generalization
Li et al. [2022], Izmailov et al. [2018]. These works study averaging the model's weights as a way
to improve generalization. However, unlike these works, we form an ensemble of feature importance
scores obtained from the individual model weights at different stages of deep learning training,
and most importantly, unlike all previous works, we use such an ensemble to improve the stability
and power of feature selection.
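Our framework couples this ensemble with knockoff inference. For concreteness, below is a minimal NumPy sketch of the knockoff+ selection rule of Barber and Candès [2015]; `W` is assumed to hold one knockoff statistic per feature, e.g., a contrast between the ensembled importance of a feature and that of its knockoff copy (that construction is an illustrative assumption, not necessarily the exact statistic used here).

```python
import numpy as np

def knockoff_select(W, q=0.1):
    """Features selected by the knockoff+ filter at target FDR level q.

    W: one statistic per feature; large positive values favour the original
       feature over its knockoff copy.
    """
    # Candidate thresholds are the nonzero magnitudes of the statistics.
    for t in np.sort(np.abs(W[W != 0])):
        # Conservative estimate of the false discovery proportion at threshold t.
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return np.where(W >= t)[0]  # smallest threshold meeting the target
    return np.array([], dtype=int)  # no threshold achieves FDR q: select nothing
```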
Feature selection (or variable selection) has been extensively studied in machine learning and statistics
Saeys et al. [2007], Mares et al. [2016]. Selecting features while controlling false discoveries is
an attractive property, and several feature selection methods provide such statistical
guarantees Meinshausen and Bühlmann [2010], Barber and Candès [2015]. Although our framework