Measuring Interpretability of Neural Policies of
Robots with Disentangled Representation
Tsun-Hsuan Wang, Wei Xiao, Tim Seyde, Ramin Hasani, Daniela Rus
Massachusetts Institute of Technology (MIT)
Abstract: The advancement of robots, particularly those functioning in complex
human-centric environments, relies on control solutions that are driven by ma-
chine learning. Understanding how learning-based controllers make decisions is
crucial since robots are often safety-critical systems. This urges a formal and
quantitative understanding of the explanatory factors in the interpretability of
robot learning. In this paper, we aim to study interpretability of compact neu-
ral policies through the lens of disentangled representation. We leverage decision
trees to obtain factors of variation [1] for disentanglement in robot learning; these
encapsulate skills, behaviors, or strategies toward solving tasks. To assess how
well networks uncover the underlying task dynamics, we introduce interpretability
metrics that measure disentanglement of learned neural dynamics from a concen-
tration of decisions, mutual information and modularity perspective. We showcase
the effectiveness of the connection between interpretability and disentanglement
consistently across extensive experimental analysis.
Keywords: Interpretability, Disentangled Representation, Neural Policy
1 Introduction
Figure 1: Understand robot behaviors by extracting logic programs as factors of variation to measure interpretability with disentanglement.
Interpretability of learning-based
robot control is important for
safety-critical applications as it af-
fords human comprehension of
how the system processes inputs
and decides actions. In general,
achieving interpretability is difficult for learning-based robot control. Robot learning models make decisions without being explicitly programmed to perform the task and are often very large, which makes it practically impossible to synthesize and explain their reasoning processes. This lack of transparency, often referred to as the "black box" problem, makes it hard to interpret the workings of learning-based robot control systems. Understanding why a particular decision was made, or predicting how the system will behave in future scenarios, remains a challenge, yet it is critical for physical deployments.
Through the lens of representation learning, we assume that neural networks capture a set of pro-
cesses that exist in the data distribution; for robots, they manifest learned skills, behaviors, or strate-
gies, which are critical to understand the decision-making of a policy. However, while these factors
of variation [1] (e.g., color or shape representations) are actively studied in unsupervised learning
for disentangled representation, in robot learning, they are less well-defined and pose unique chal-
lenges due to the intertwined correspondence of neural activities with emergent behaviors unknown
a priori. In the present study, we aim to (i) provide a useful definition of factors of variation for
policy learning, and (ii) explore how to uncover dynamics and factors of variation quantitatively as a
measure of interpretability in compact neural networks for closed-loop end-to-end control applica-
tions. In this space, an entanglement corresponding to multiple neurons responsible for an emergent
behavior can obstruct the interpretation of neuron response even with a small number of neurons
[2,3,4,5]. To this end, the disentanglement of learned representations [6,7,8] in compact neural
networks is essential for deriving explanations and interpretations for neural policies.
We posit that each neuron should learn an abstraction (factor of variation) related to a specific strat-
egy required for solving a sub-component of a task. For example, in locomotion, one neuron may
capture periodic gait, where the numerical value of the neuron response may be aligned with dif-
ferent phases of the gait cycle; another neuron may account for recovery from slipping. Retrieving
the abstraction learned by a neuron is, however, non-trivial. Directly observing the neuron response
along with sensory information provided as input to the policy can be extremely inefficient and
tedious for identifying behaviors and interpreting decision-making.
In this work, our objective is to formulate an abstraction that represents the decision-making of a
parametric policy to quantify the interpretability of learned behaviors, specifically from the perspec-
tive of disentangled representations. To this end, we make the following key contributions:
• Provide a practical definition of factors of variation for robot learning by programmatically extracting decision trees from neural policies, in the form of logic programs grounded in world states.
• Introduce a novel set of quantitative metrics of interpretability to assess how well policies uncover
task structures and their factors of variation by measuring the disentanglement of learned neural
dynamics from a concentration of decisions, mutual information, and modularity perspective.
• Experiment in a series of end-to-end policy learning tasks that (a) showcase the effectiveness of
leveraging disentanglement to measure interpretability, (b) demonstrate policy behaviors extracted
from neural responses, (c) unveil interpretable models through the lens of disentanglement.
2 Related Work
Compact neural networks. Compact neural networks are ideal in resource-constrained situations such as robotics and are by nature easier to interpret due to a smaller number of neurons [2,4]. Compact networks can be obtained by pruning [9] or by compressing neural networks during end-to-end training [10]. Regularization has also been used to generate compact neural networks [11]. Compact representations of features can be learned using discriminative masking [12]. Neural Ordinary Differential Equations have also been used for learning compact networks [13,4]. In this work, we formally study
the interpretability of compact neural policies through the lens of disentangled representation.
Interpretable neural networks. An interpretable neural network could be constructed from a phys-
ically comprehensible perspective [14,15]. Knowledge representation has been used to obtain interpretable Convolutional Neural Networks [16]. An active line of research focuses on dissecting and analyzing
trained neural networks in a generic yet post-hoc manner [17,18,19,20]. Another active line of
research is to study disentangled explanatory factors in learned representation [8]. A better repre-
sentation should contain information in a compact and interpretable structure [1,21]. Unlike prior
works that study disentanglement based on factors of variation such as object types, there is no notion of ground-truth factors in robot learning; thus, we propose to use decision trees to construct pseudo-ground-truth factors that capture emergent behaviors of robots for interpretability analysis.
Interpretability in policy learning. Explainable AI has recently been extended to policy learning such as reinforcement learning [22] or human-AI shared control settings [23]. One line of research analyzes multi-step trajectories from the perspective of options or compositional skills [24,25,26,27,28,29]. A more fine-grained, single-step alternative is to distill policies via imitation learning into interpretable models like decision trees [30]. Another line of work directly embeds the decision tree framework into the learning-based model to strike a balance between expressiveness and interpretability [31,32,33]. Explanations of policy behaviors can also be obtained by searching for abstract states with value functions [34] or feature importance [35]. In this work, we aim to offer a new perspective of disentangled representation to measure interpretability in robot policy learning.
3 Method
Algorithm 1 Extract Abstraction via Decision Tree
Data: Trajectories rolled out from a compact neural policy, $\mathcal{D}_{dt} = \{(o_0, s_0, a_0, z_0, \ldots)_j\}_{j=1}^N$
Result: Interpreters of neuron response $\{f^i_S\}_{i \in I}$
for $i \in I$ do
    Train a decision tree $T_{\theta^i}$ from states $\{s_t\}$ to neural responses $\{z^i_t\}$.
    Collect dataset $\mathcal{D}_{dp}$ with neuron responses $\{z^i_t\}$ and decision paths $\{\mathcal{P}^i_{s_t}\}$.
    Train neuron response classifier $q_{\phi^i}: \mathbb{R} \to \{\mathcal{P}\}$ with $\mathcal{D}_{dp}$.
    Obtain decision path parser $r^i: \{\mathcal{P}\} \to \mathcal{L}$ by tracing out $\{\mathcal{P}^i_k\}_{k=1}^{K^i}$ in $T_{\theta^i}$.
    Construct the mapping $f^i_S = r^i \circ q_{\phi^i}$.
end for
In this section, we describe how to obtain factors of variation by predicting logic programs from neuron responses that reflect the learned behaviors of the policy (Section 3.1), followed by a set of quantitative measures of interpretability through the lens of disentanglement (Section 3.2).
3.1 Extracting Abstraction via Decision Tree
Our goal is to formulate a logic program that represents the decision-making of a parametric policy
to serve as an abstraction of learned behaviors, summarized in Algorithm 1. First, we describe a
decision process as a tuple $\{\mathcal{O}, \mathcal{S}, \mathcal{A}, P_a, h\}$, where at a time instance $t$, $o_t \in \mathcal{O}$ is the observation, $s_t \in \mathcal{S}$ is the state, $a_t \in \mathcal{A}$ is the action, $P_a: \mathcal{S} \times \mathcal{A} \times \mathcal{S} \to [0,1]$ is the (Markovian) transition probability from current state $s_t$ to next state $s_{t+1}$ under action $a_t$, and $h: \mathcal{S} \to \mathcal{O}$ is the observation model. We define a neural policy as $\pi: \mathcal{O} \to \mathcal{A}$ and the response of neuron $i \in I$ as $\{z^i_t \in \mathbb{R}\}_{i \in I}$, where $I$ refers to a set of neurons to be interpreted. For each neuron $i$, we aim to construct a mapping that infers a logic program from the neuron response, $f^i_S: \mathbb{R} \to \mathcal{L}$, where $\mathcal{L}$ is a set of logic programs grounded on environment states $\mathcal{S}$. Note that $f^i_S$ does not take the state as an input, as underlying states may be inaccessible during robot deployment. In the following discussion, we heavily use the notation $\mathcal{P}^i_*$ for the decision path associated with the $i$'th neuron, where the subscript $*$ refers to the dependency on state if given with parentheses (like $(s_t)$) and otherwise indexes based on the context.
From states to neuron responses. Decision trees are non-parametric supervised learning algo-
rithms for classification and regression. Throughout training, they develop a set of decision rules
based on thresholding one or a subset of input dimensions. The relation across rules is described by
a tree structure with the root node as the starting point of the decision-making process and the leaf
nodes as the predictions. The property of decision trees to convert data for decision making to a set
of propositions is a natural fit for state-grounded logic programs. Given a trained neural policy $\pi$, we collect a set of rollout trajectories $\mathcal{D}_{dt} = \{\tau_j\}_{j=1}^N$, where $\tau_j = (o_0, s_0, a_0, z_0, o_1, \ldots)$. We first train a decision tree $T_{\theta^i}$ to predict the $i$th neuron response from states,

$$\theta^{i*} = \arg\min_{\theta^i} \sum_{(s_t, z^i_t) \in \mathcal{D}_{dt}} \mathcal{L}_{dt}(\hat{z}^i_t, z^i_t), \quad \text{where } \hat{z}^i_t = T_{\theta^i}(s_t) \qquad (1)$$

where $\mathcal{L}_{dt}$ represents the underlying classification or regression criterion. The decision tree $T_{\theta^i}$ describes relations between the neuron responses and the relevant states as logical expressions. During inference, starting from the root node, relevant state dimensions will be checked by the decision rule in the current node and directed to the relevant lower layer, finally arriving at one of the leaf nodes and providing information to regress the neuron response. Each inference traces out a route from the root node to a leaf node. This route is called a decision path. A decision path consists of a sequence of decision rules defined by the nodes visited by the path, which combine to form a logic program,

$$\bigwedge_{n \in \mathcal{P}^i_{(s_t)},\, j = g(n)} \left( s^j_t \le c_n \right) \;\longleftrightarrow\; \text{Behavior extracted from } \hat{z}^i_t \text{ via } T_{\theta^i} \qquad (2)$$

where $\wedge$ is the logical AND, $\mathcal{P}^i_{(s_t)}$ is the decision path of the tree $T_{\theta^i}$ that takes $s_t$ as input, $g$ gives the state dimension used in the decision rule of node $n$ (assume each node uses one feature for notation simplicity), and $c_n$ is the threshold at node $n$.
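To make the construction in Eqs. (1)–(2) concrete, below is a minimal Python sketch (not the authors' code) that fits a per-neuron decision tree with scikit-learn and converts the decision path of a given state into a conjunction of threshold rules. Variable names such as `states`, `neuron_resp`, and `state_names` are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_neuron_tree(states, neuron_resp, max_depth=3):
    """Eq. (1): regress the i-th neuron response from ground-truth states."""
    tree = DecisionTreeRegressor(max_depth=max_depth)
    tree.fit(states, neuron_resp)
    return tree

def decision_path_program(tree, state, state_names):
    """Eq. (2): trace the decision path of `state` and return it as a logic program
    (a conjunction of single-feature threshold predicates)."""
    t = tree.tree_
    node_ids = tree.decision_path(state.reshape(1, -1)).indices  # nodes from root to leaf
    clauses = []
    for n in node_ids:
        if t.children_left[n] == t.children_right[n]:  # leaf node: no decision rule
            continue
        j, c = t.feature[n], t.threshold[n]             # feature g(n), threshold c_n
        op = "<=" if state[j] <= c else ">"
        clauses.append(f"{state_names[j]} {op} {c:.2f}")
    return " AND ".join(clauses)

# Hypothetical usage on rollout data D_dt:
# tree_i = fit_neuron_tree(states, z_i)
# print(decision_path_program(tree_i, states[0], ["theta", "theta_dot"]))
```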
From neuron responses to decision paths. So far, we recover a correspondence between the neuron response $z_t$ and the state-grounded program based on decision paths $\mathcal{P}^i_{(s_t)}$; however, this is not sufficient for deployment since the decision tree $T_{\theta^i}$ requires as input the ground-truth state and not the observable data to the policy (like $o_t, z_t$). To address this, we find an inverse of $T_{\theta^i}$ with neuron responses as inputs and pre-extracted decision paths as classification targets. Based on the inference process of $T_{\theta^i}$, we can calculate the numerical range of neuron responses associated with a certain decision path $\mathcal{P}^i_{(s_t)}$ from the predicted $\hat{z}_t$ and then construct the pairs of $z_t$ and $\mathcal{P}^i_{s_t}$. We collect another dataset $\mathcal{D}_{dp}$ and train a classifier $q_{\phi^i}$ to predict decision paths from neuron responses,

$$\phi^{i*} = \arg\min_{\phi^i} \sum_{(z^i_t, \mathcal{P}^i_{(s_t)}) \in \mathcal{D}_{dp}} \mathcal{L}_{dp}\!\left( q_{\phi^i}(z^i_t), \mathcal{P}^i_{(s_t)} \right) \qquad (3)$$

where $\mathcal{L}_{dp}$ is a classification criterion. While $\mathcal{P}^i_{(s_t)}$ is state-dependent, there exists a finite set of decision paths $\{\mathcal{P}^i_k\}_{k=1}^{K^i}$ given the generating decision tree. We define the mapping from the decision tree to the logic program as $r: \{\mathcal{P}\} \to \mathcal{L}$, which can be obtained by tracing out the path as described above. Overall, the desired mapping is readily constructed as $f^i_S = r^i \circ q_{\phi^i}$.
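A minimal sketch of the inverse step in Eq. (3), assuming the decision paths extracted above have been enumerated and indexed $0, \ldots, K^i-1$: a small classifier maps a scalar neuron response to a path index, and composing it with a path-to-program lookup gives $f^i_S = r^i \circ q_{\phi^i}$. The class below is illustrative, not the paper's implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class NeuronInterpreter:
    """f_S^i = r^i o q_{phi^i}: neuron response -> decision-path index -> logic program."""

    def __init__(self, programs):
        # programs[k] is the logic-program string of decision path P_k^i (the parser r^i)
        self.programs = programs
        self.clf = DecisionTreeClassifier(max_depth=4)  # q_{phi^i}, trained via Eq. (3)

    def fit(self, neuron_resp, path_ids):
        # neuron_resp: (T,) responses z_t^i; path_ids: (T,) index of P^i_(s_t)
        self.clf.fit(np.asarray(neuron_resp).reshape(-1, 1), path_ids)
        return self

    def explain(self, z):
        # At deployment only the neuron response is observed, not the underlying state.
        k = int(self.clf.predict(np.array([[z]]))[0])
        return self.programs[k]

# Hypothetical usage:
# interp = NeuronInterpreter(programs).fit(z_i, path_ids)
# print(interp.explain(z_i[0]))
```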
3.2 Quantitative Measures of Interpretability
Programmatically extracting decision trees for constructing a mapping from the neuron response to
a logic program offers a representation that facilitates the interpretability of compact neural poli-
cies. Furthermore, building on the computational aspect of our approach, we can quantify the inter-
pretability of a policy with respect to several metrics through the lens of disentanglement.
A. Neuron-Response Variance. Given decision paths $\{\mathcal{P}^i_k\}_{k=1}^{K^i}$ associated with a tree $T_{\theta^i}$ at the $i$th neuron, we compute the normalized variance of the neuron response averaged across decision paths,

$$\frac{1}{|I|} \sum_{i \in I} \frac{1}{K^i} \sum_{k=1}^{K^i} \underset{(s_t, z^i_t) \in \mathcal{D}_{dt},\; t \in \{u \mid \mathcal{P}^i_{(s_u)} = \mathcal{P}^i_k\}}{\mathrm{Var}} \left[ \frac{z^i_t}{Z^i} \right] \qquad (4)$$

where $Z^i$ is a normalization factor that depends on the range of response of the $i$th neuron. The set $\{u \mid \mathcal{P}^i_{(s_u)} = \mathcal{P}^i_k\}$ contains all time steps that exhibit the same behavior as entailed by $\mathcal{P}^i_k$. For example, suppose we have a trajectory consisting of behaviors including walking and running, and that walking is depicted as $\mathcal{P}^i_k$; the set then refers to all time steps of walking. This metric captures the concentration of the neuron response that corresponds to the same strategy represented by the logic program defined by $T_{\theta^i}$. In practice, we discretize all neuron responses into $N$ bins, compute the index of the bin to which a value belongs, divide the index by $N$, and compute their variance.
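A minimal sketch of the variance metric in Eq. (4) under the discretization just described, assuming responses and decision-path assignments are stored per neuron as NumPy arrays; this is not the authors' implementation.

```python
import numpy as np

def neuron_response_variance(responses, path_ids, num_bins=20):
    """Eq. (4): average, over neurons and decision paths, of the variance of the
    (discretized, normalized) neuron response within each decision path.
    responses: dict {i: (T,) array of z_t^i}; path_ids: dict {i: (T,) array of path index}."""
    per_neuron = []
    for i, z in responses.items():
        # Discretize to bin indices and map to [0, 1] as the normalization Z^i.
        edges = np.linspace(z.min(), z.max(), num_bins + 1)
        z_norm = np.clip(np.digitize(z, edges[1:-1]), 0, num_bins - 1) / num_bins
        per_path = [np.var(z_norm[path_ids[i] == k]) for k in np.unique(path_ids[i])]
        per_neuron.append(np.mean(per_path))
    return float(np.mean(per_neuron))  # lower is better (responses more concentrated per path)
```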
B. Mutual Information Gap. Inspired by [21,8], we integrate the notion of mutual information
in our framework to extend disentanglement measures for unsupervised learning to policy learning.
Specifically, while previous literature assumes known ground-truth factors for disentanglement such
as object types, viewing angles, etc., there is no straightforward equivalence in neural policies since
the emergent behaviors or strategies are unknown a priori. To this end, we propose to leverage the
decision path sets to construct pseudo-ground-truth factors $M_{dp} = \bigcup_{i \in I} \{\mathcal{P}^i_k\}_{k=1}^{K^i} = \{\mathcal{P}_k\}_{k=1}^{K}$. Note that there may be a correlation across decision paths, i.e., $P(\mathcal{P}^i, \mathcal{P}^j) \neq P(\mathcal{P}^i)P(\mathcal{P}^j)$ for $i \neq j$. For example, one decision path corresponding to a logic program of the robot moving forward at high speed has a correlation to another decision path for moving forward at low speed. This may occur because a neuron of a policy can learn arbitrary behaviors. However, this leads to a non-orthogonal ground-truth factor set and can be undesirable, since high correlations of a neuron to multiple ground-truth factors (e.g., $I[z^i; \mathcal{P}_i]$ and $I[z^i; \mathcal{P}_j]$ are large) can result not only from entanglement of the neuron but also from the correlation between factors (e.g., $I[\mathcal{P}_i; \mathcal{P}_j]$ is large). Hence, this urges the need to calibrate mutual information for computing disentanglement measures. We start by adapting the Mutual Information Gap (MIG) [21] to our framework:

$$\frac{1}{K} \sum_{k=1}^{K} \frac{1}{H[\mathcal{P}_k]} \left( I[z^{i^*}; \mathcal{P}_k] - \max_{j \neq i^*} \left( I[z^j; \mathcal{P}_k] - I[z^j; \mathcal{P}_k; \mathcal{P}_{k_j}] \right) \right) \qquad (5)$$

where $H$ is entropy, $I$ is interaction information that can take an arbitrary number of variables (with 2 being mutual information), $i^* = \arg\max_i I[z^i; \mathcal{P}_k]$, and $k_j = \arg\max_l I[z^j; \mathcal{P}_l]$. Intuitively,
Table 1: Quantitative results of classical control.

| Network Architecture | Variance ↓ | MI-Gap ↑ | Modularity ↑ | Explanation Size (Vertical) ↓ | Explanation Size (Horizontal) ↓ | Cognitive Chunks ↓ |
|---|---|---|---|---|---|---|
| FCs | 0.0242 ± 0.005 | 0.3008 ± 0.025 | 0.9412 ± 0.014 | 5.00 ± 0.46 | 1.91 ± 0.14 | 1.65 ± 0.28 |
| GRU | 0.0329 ± 0.004 | 0.2764 ± 0.062 | 0.9096 ± 0.022 | 4.90 ± 0.80 | 1.96 ± 0.17 | 1.65 ± 0.25 |
| LSTM | 0.0216 ± 0.003 | 0.2303 ± 0.024 | 0.9355 ± 0.008 | 4.75 ± 0.39 | 2.02 ± 0.12 | 1.90 ± 0.14 |
| ODE-RNN | 0.0287 ± 0.007 | 0.3062 ± 0.041 | 0.9376 ± 0.017 | 4.90 ± 0.38 | 1.93 ± 0.15 | 1.80 ± 0.27 |
| CfC | 0.0272 ± 0.004 | 0.2892 ± 0.111 | 0.9067 ± 0.039 | 4.70 ± 0.65 | 1.82 ± 0.33 | 1.50 ± 0.47 |
| NCP | 0.0240 ± 0.008 | 0.3653 ± 0.052 | 0.9551 ± 0.019 | 3.45 ± 0.83 | 1.51 ± 0.33 | 1.30 ± 0.32 |
Table 2: Alignment between disentanglement and explanation quality in classical control.

| Re-signed Rank Correlation ↑ | Explanation Size (Vertical) | Explanation Size (Horizontal) | Cognitive Chunks |
|---|---|---|---|
| Variance | -0.146 | 0.002 | 0.040 |
| MI-Gap | 0.427 | 0.505 | 0.449 |
| Modularity | -0.114 | 0.156 | 0.032 |
Figure 2: In classical control (Pendulum): (a) Phase portrait with empirically measured closed-loop dynamics and neuron response; each arrow and colored dot are the results averaged around the binned state space. (b) Emergent strategies from logic programs. (c) Decision tree extracted for command neuron 3 in NCP.
this measures the normalized difference between the highest and the second-highest mutual information of each decision path with individual neuron activations, i.e., how discriminative the correlation between the neuron response is with one decision path as opposed to the others. For example, neuron responses correlated to multiple factors of variation will have lower MIG than those correlated to only one. The last term $I[z^j; \mathcal{P}_k; \mathcal{P}_{k_j}]$ is for calibration and captures the inherent correlation between $z^j$ and $\mathcal{P}_k$ resulting from a potentially nonzero $I[\mathcal{P}_k; \mathcal{P}_{k_j}]$, with $\mathcal{P}_{k_j}$ being a proxy random variable of $z^j$ in the ground-truth factor set. We show how to compute $I[z^j; \mathcal{P}_k] - I[z^j; \mathcal{P}_k; \mathcal{P}_{k_j}]$ in Appendix Section C.
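A sketch of the calibrated MIG of Eq. (5) using discrete (binned) mutual information. It assumes the identity $I[z; \mathcal{P}_k; \mathcal{P}_{k_j}] = I[z; \mathcal{P}_k] - I[z; \mathcal{P}_k \mid \mathcal{P}_{k_j}]$ for the interaction-information term, which is one standard sign convention and our assumption about how the calibration is computed (the paper's exact procedure is in its Appendix C); inputs are assumed to be NumPy arrays.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def _entropy(x):
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

def _cond_mi(a, b, c):
    """I[a; b | c] for discrete variables, averaged over the values of c."""
    return sum((c == v).mean() * mutual_info_score(a[c == v], b[c == v]) for v in np.unique(c))

def mutual_information_gap(z_bins, paths):
    """Eq. (5). z_bins: (num_neurons, T) discretized neuron responses.
    paths: (K, T) 0/1 indicators of pseudo-ground-truth decision paths."""
    num_neurons, K = len(z_bins), len(paths)
    # mi[i, k] = I[z_i; P_k] (discrete mutual information)
    mi = np.array([[mutual_info_score(z_bins[i], paths[k]) for k in range(K)]
                   for i in range(num_neurons)])
    k_best = mi.argmax(axis=1)  # k_j: best-matching path for each neuron j
    gaps = []
    for k in range(K):
        i_star = int(mi[:, k].argmax())
        # Calibrated rival term: I[z_j; P_k] - I[z_j; P_k; P_{k_j}] = I[z_j; P_k | P_{k_j}]
        rivals = [_cond_mi(z_bins[j], paths[k], paths[k_best[j]])
                  for j in range(num_neurons) if j != i_star]
        gap = mi[i_star, k] - (max(rivals) if rivals else 0.0)
        gaps.append(gap / max(_entropy(paths[k]), 1e-8))
    return float(np.mean(gaps))
```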
C. Modularity. We compute modularity scores from [36] with the same calibration term,

$$\frac{1}{|I|} \sum_{i \in I} \left( 1 - \frac{\sum_{k \neq k^*} \left( I[z^i; \mathcal{P}_k] - I[z^i; \mathcal{P}_k; \mathcal{P}_{k^*}] \right)^2}{(K-1)\, I[z^i; \mathcal{P}_{k^*}]^2} \right), \qquad (6)$$

where $k^* = \arg\max_l I[z^i; \mathcal{P}_l]$. For an ideally modular representation, each neuron has high mutual information with a single factor of variation and low mutual information with all the others. Supposing each neuron $i$ has a best "match" with a decision path (ground-truth factor) $k^*$, the non-modularity of that neuron is computed as the normalized variance of mutual information between its neuron response and all non-matched decision paths $\{\mathcal{P}_k\}_{k \neq k^*}$. In practice, we discretize neuron responses into $N$ bins to compute discrete mutual information.
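Similarly, a sketch of the calibrated modularity score in Eq. (6), repeating the conditional-MI helper from the MIG sketch for self-containment and again assuming the calibration $I[z; \mathcal{P}_k] - I[z; \mathcal{P}_k; \mathcal{P}_{k^*}] = I[z; \mathcal{P}_k \mid \mathcal{P}_{k^*}]$.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def _cond_mi(a, b, c):
    """I[a; b | c] for discrete variables (same helper as in the MIG sketch)."""
    return sum((c == v).mean() * mutual_info_score(a[c == v], b[c == v]) for v in np.unique(c))

def modularity_score(z_bins, paths):
    """Eq. (6). z_bins: (num_neurons, T) discretized neuron responses.
    paths: (K, T) 0/1 indicators of decision paths."""
    num_neurons, K = len(z_bins), len(paths)
    scores = []
    for i in range(num_neurons):
        mi = np.array([mutual_info_score(z_bins[i], paths[k]) for k in range(K)])
        k_star = int(mi.argmax())
        # Calibrated off-target MI: I[z_i; P_k] - I[z_i; P_k; P_{k*}] = I[z_i; P_k | P_{k*}]
        off = np.array([_cond_mi(z_bins[i], paths[k], paths[k_star])
                        for k in range(K) if k != k_star])
        denom = (K - 1) * max(mi[k_star] ** 2, 1e-12)
        scores.append(1.0 - float(np.sum(off ** 2)) / denom)
    return float(np.mean(scores))
```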
4 Experiments
We conduct a series of experiments in various policy-learning tasks to answer the following: (i) How effective is disentanglement for measuring the interpretability of policies? (ii) What can we extract from neural responses? (iii) Which architecture is more interpretable through the lens of disentanglement?
4.1 Setup
Network architecture. We construct compact neural networks for each end-to-end learning-to-control task. For all tasks, our networks are constructed with the following priors: (i) each baseline network is supplied with a perception backbone (e.g., a convolutional neural network); (ii) we construct policies based on different compact architectures that take in feature vectors from the perception backbone and output control with comparable cell counts (instead of actual network size in memory, as we assess interpretability metrics down to the cell level). The perception backbone is followed by a neural controller designed with compact feed-forward and recurrent network architectures, including fully-connected networks (FCs), gated recurrent units (GRU) [37], and long short-term memory (LSTM) [38]. Additionally, we include advanced continuous-time baselines based on ordinary differential equations, such as ODE-RNN [39], closed-form continuous-time neural models (CfCs) [40], and neural circuit policies (NCPs) [4]. We interpret the dynamics of the neurons in the last
layer before the output in FCs, the command-neuron layer of NCPs, and the recurrent state of the
rest. We then extract logic programs and measure interpretability with the proposed metrics.
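As an illustration of this setup (a sketch, not the paper's training code), a compact recurrent controller can be wrapped so that the interpreted neurons, here the recurrent state of a GRU, are recorded at every control step; the layer sizes and names below are assumptions.

```python
import torch
import torch.nn as nn

class CompactGRUPolicy(nn.Module):
    """Perception backbone -> compact GRU controller; exposes the recurrent state
    (the neurons we interpret) alongside each action."""

    def __init__(self, feat_dim=32, cells=16, act_dim=2):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU())  # stand-in perception
        self.rnn = nn.GRUCell(64, cells)
        self.head = nn.Linear(cells, act_dim)

    def forward(self, obs, hidden):
        feat = self.backbone(obs)
        hidden = self.rnn(feat, hidden)   # z_t: neuron responses to interpret
        return self.head(hidden), hidden

# During a rollout, (s_t, z_t) pairs are appended to D_dt for Algorithm 1:
# action, hidden = policy(obs, hidden); dataset.append((state, hidden.detach().numpy()))
```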