learning models have been developed to optimize top tagging [19, 20, 21, 22, 23, 24, 25,
26, 27, 28, 29, 30, 31, 32, 33]. A comprehensive review and comparison of many of these
models is given in Ref. [34]. Some of these models exploit the capacity of DNNs to
approximate arbitrary non-linear functions [35] and their great success on problems
in computer vision, while others have been inspired by underlying physics
information such as the jet clustering history [22], physical symmetries [23], and physics-
inspired feature engineering [27]. These efforts have motivated novel model architectures
and feature engineering by creating or augmenting input feature spaces with physically
meaningful quantities [27, 36, 37].
The rich history of physics-inspired model development makes the problem of top
tagging an excellent playground to better understand the modern XAI tools themselves.
This allows us to traverse a rare two-way bridge in exploring the relationship between
data and models: our physics knowledge helps us better understand and refine the inner
workings of modern XAI tools, while those improved tools allow us to take a deeper look
at the models, paving the way for analyzing and reoptimizing them. As has been pointed
out in Ref. [38], such insights into the explainability of DNN-based models are important
for validating them and making them reliable and reusable.
Additionally, the broader scope of uncertainty quantification for ML models relies on
developing robust explanations [39]; in HEP, problems like top tagging will require a
dedicated understanding of how robust as well as interpretable these models are [40].
Yet another remarkable application of interpretability is to understand how a
model conveys information and, in doing so, which parts of a DNN engage most actively
in the forward propagation of information. Such studies could be useful to understand
and reoptimize model complexity. Given that DNNs have shown remarkable success in jet
and event classification, recent work has placed emphasis on developing DNN-enabled
FPGAs for trigger-level applications at the LHC [41, 42, 43]. As the resource consumption
and latency of FPGAs directly depend on the size of the network to be implemented,
simpler networks are considerably easier to embed on these devices. Hence, methods that
allow interpreting a network's response patterns and provide critical insights for
model optimization without compromising performance can greatly benefit these budding
areas of ML application, especially online event selection and jet tagging at current
and future high-energy colliders.
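As a minimal illustration of the kind of activity study mentioned above, the sketch below records how strongly each hidden unit of a small MLP responds during forward propagation and flags units that stay nearly inactive. The toy network, its layer sizes, the random input batch, and the inactivity threshold are purely illustrative assumptions of ours, not the taggers or methods studied in this paper.

```python
import torch
import torch.nn as nn

# Toy MLP standing in for an MLP-based top tagger; the input dimension and
# hidden-layer sizes here are illustrative only.
tagger = nn.Sequential(
    nn.Linear(60, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Sigmoid(),
)

activity = {}  # accumulated mean |activation| per hidden unit, keyed by layer name

def make_hook(name):
    def hook(module, inputs, output):
        # Average absolute post-activation value of each unit over the batch.
        activity[name] = activity.get(name, 0.0) + output.detach().abs().mean(dim=0)
    return hook

# Attach forward hooks to the ReLU layers so we observe post-activation activity.
for idx, layer in enumerate(tagger):
    if isinstance(layer, nn.ReLU):
        layer.register_forward_hook(make_hook(f"relu_{idx}"))

# Random stand-in for a batch of jet feature vectors.
jets = torch.randn(512, 60)
with torch.no_grad():
    tagger(jets)

# Units whose average activation stays below a small threshold contribute little
# to forward propagation and hint at excess model capacity.
for name, per_unit in activity.items():
    quiet = (per_unit < 1e-3).float().mean().item()
    print(f"{name}: {quiet:.1%} of units nearly inactive on this batch")
```

A fraction of persistently quiet units in a trained network would suggest that a smaller architecture, more amenable to FPGA deployment, could achieve comparable performance.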
Application of state-of-the-art explainability techniques to interpret jet tagger
models has recently received increasing attention [37, 44, 45, 46] and has been shown
to successfully identify feature importance for models like the Interaction
Network [47]. In this paper, we study the interpretability of a subset of existing ML-
based top tagging models. The models we have chosen use multi-layer perceptrons
(MLPs) as the underlying neural architecture. Choosing a simpler neural architecture
allows us to elucidate the applicability and limitations of existing XAI methods and to
develop new tools to examine them, without confounding these efforts with the complexity
of larger models or unorthodox data structures. To compare our results for different