Concise and interpretable multi-label rule sets
Martino Ciaperoni
Department of Computer Science
Aalto University
Espoo, Finland
martino.ciaperoni@aalto.fi
Han Xiao
Department of Computer Science
Aalto University
Espoo, Finland
han.xiao@aalto.fi
Aristides Gionis
Division of Theoretical Computer Science
KTH Royal Institute of Technology
Stockholm, Sweden
argioni@kth.se
Abstract—Multi-label classification is becoming increasingly
ubiquitous, but not much attention has been paid to interpretability.
In this paper, we develop a multi-label classifier that can be
represented as a concise set of simple “if-then” rules, and thus
offers better interpretability compared to black-box models.
Notably, our method is able to find a small set of relevant patterns
that lead to accurate multi-label classification, while existing
rule-based classifiers are myopic and wasteful in searching for rules,
requiring a large number of rules to achieve high accuracy. In
particular, we formulate the problem of choosing multi-label
rules to maximize a target function that considers not only
discrimination ability with respect to labels, but also diversity.
Accounting for diversity helps to avoid redundancy, and thus to
control the number of rules in the solution set. To tackle this
maximization problem we propose a 2-approximation algorithm,
which relies on a novel technique to sample high-quality rules.
In addition to our theoretical analysis, we provide a thorough
experimental evaluation, which indicates that our approach offers
a trade-off between predictive performance and interpretability
that is unmatched in previous work.
I. INTRODUCTION
Machine-learning algorithms are nowadays being used in
almost every domain. While such algorithms are known to
perform well in many tasks, they are often used as “black
boxes,” i.e., the decision processes involved are too complex
for humans to interpret. The lack of interpretability limits
considerably the level of trust humans put in machine-learning
algorithms and thus, poses a barrier for the wide adoption of
machine-learning techniques in the real world. In an attempt
to overcome this barrier, interpretable and explainable machine
learning have recently emerged as increasingly prominent topics.
In the standard classification setting, the goal is to learn a
classifier that accurately maps data points to two or more
mutually exclusive classes.
In this paper, we focus on a different setting, namely, multi-
label classification. In contrast to the standard setting, in multi-
label classification, a point can be associated with more than
one class at the same time. Though multi-label classification has
been extensively studied, the main focus is still on improving
predictive performance [24]. Significantly less attention has
been paid to interpretability aspects.
Classification rules, due to their simple structure, are gaining
popularity in the interpretable multi-label classification literature.
In rule-based approaches, the goal is to learn a set of rules
that captures the most prominent patterns between features
and labels in the data. A rule usually takes the form “{set
of predicates} → {set of labels}.” For a given data point,
a rule predicts the associated labels to be present if
all the predicates in the rule evaluate to true. Due to the
structural simplicity of rules, classifiers based on a set of rules
are generally considered more interpretable than other types
of classifiers, such as neural networks or even decision trees.
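The rule semantics described above can be sketched in a few lines of code. The class names and toy data below are our own for illustration; they are not taken from the paper.

```python
# Sketch of a multi-label rule "{set of predicates} -> {set of labels}".
# A point is modeled as the set of features (predicates) that hold for it.

class Rule:
    def __init__(self, predicates, labels):
        self.predicates = set(predicates)  # conditions that must all hold
        self.labels = set(labels)          # labels predicted when the rule fires

    def fires(self, point):
        # The rule applies only if *all* its predicates evaluate to true.
        return self.predicates <= point

def predict(rules, point):
    # A point receives the union of labels of all rules that fire on it.
    labels = set()
    for rule in rules:
        if rule.fires(point):
            labels |= rule.labels
    return labels

rules = [Rule({"learning", "kernel"}, {"machine-learning"}),
         Rule({"graph"}, {"network-science"})]
predict(rules, {"learning", "kernel", "svm"})  # -> {"machine-learning"}
```

Note that, unlike a decision tree, the rules need not partition the feature space: several rules may fire on the same point, each contributing its labels.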
The research question that we bring forward is whether
we can design rule-based multi-label classification methods
that are both accurate and interpretable. BOOMER [19], [20],
a recently-proposed rule-based classifier based on gradient
boosting, gives promising results in accuracy. However, despite
being a rule-based approach, its interpretability is limited due
to producing a set of rules that is both too large and redundant.
In this work, we propose CORSET, a rule-based method that
significantly improves over the state-of-the-art BOOMER. The
improvement is due to (1) reducing rule redundancy, which is
achieved by incorporating a term in our objective that penalizes
for rule overlap, and (2) explicitly limiting the complexity
of rules via a suite of novel sampling schemes. As a result,
our method produces a concise set of interpretable rules. An
illustration of the concept of our approach is given in Fig. 1.
Example. To illustrate the improvement of CORSET over
BOOMER, we consider as an example the bibtex dataset, where
each data point represents a scientific article, with bag-of-words
as features and topics as labels. We first consider predictive
performance as a function of the number of rules. In Fig. 2,
we show the (micro-averaged) balanced F1 scores, a popular
measure for multi-label classification used throughout this
paper, for both CORSET and BOOMER. Due to the conciseness
of its learned rules, CORSET achieves a score close to 0.36 with
about 100 rules, whereas BOOMER needs over 800 rules to
achieve similar performance. Note that CORSET’s performance
starts to drop after about 100 rules, as there are no more
good rules to learn. The drop indicates overfitting, which can
be addressed by standard methods, e.g., cross-validation. In
addition, Fig. 3 demonstrates the conciseness of the rules
found by CORSET vs. the ones by BOOMER. Here, we show
a subset of rules as a bipartite graph, where nodes at the
top represent labels and nodes at the bottom represent the
predicates (features). Rules are represented by colors and two
nodes are connected if they are part of the same rule. CORSET
uses fewer rules than BOOMER and rules tend to contain fewer
predicates, resulting in a sparser graph.
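For concreteness, the plain micro-averaged F1 score reported in comparisons like Fig. 2 can be computed as follows (the paper uses a balanced variant; this sketch and its toy data show only the standard micro-averaged form).

```python
# Micro-averaged F1 for multi-label predictions: pool true positives,
# false positives, and false negatives over all points before averaging.

def micro_f1(true_sets, pred_sets):
    tp = fp = fn = 0
    for true, pred in zip(true_sets, pred_sets):
        tp += len(true & pred)   # correctly predicted labels
        fp += len(pred - true)   # predicted but absent
        fn += len(true - pred)   # present but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

micro_f1([{"a", "b"}, {"c"}], [{"a"}, {"c", "d"}])  # -> 2/3
```

Micro-averaging weights every label occurrence equally, so frequent labels dominate the score; this is why it is a common choice for datasets such as bibtex, whose label frequencies are highly skewed.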
arXiv:2210.01533v2 [cs.LG] 7 Nov 2022