Measuring Interpretability of Neural Policies of
Robots with Disentangled Representation
Tsun-Hsuan Wang, Wei Xiao, Tim Seyde, Ramin Hasani, Daniela Rus
Massachusetts Institute of Technology (MIT)
Abstract: The advancement of robots, particularly those functioning in complex
human-centric environments, relies on control solutions that are driven by ma-
chine learning. Understanding how learning-based controllers make decisions is
crucial since robots are often safety-critical systems. This urges a formal and
quantitative understanding of the explanatory factors in the interpretability of
robot learning. In this paper, we aim to study interpretability of compact neu-
ral policies through the lens of disentangled representation. We leverage decision
trees to obtain factors of variation [1] for disentanglement in robot learning; these
encapsulate skills, behaviors, or strategies toward solving tasks. To assess how
well networks uncover the underlying task dynamics, we introduce interpretability
metrics that measure disentanglement of learned neural dynamics from a concen-
tration of decisions, mutual information and modularity perspective. We showcase
the effectiveness of the connection between interpretability and disentanglement
consistently across extensive experimental analysis.
Keywords: Interpretability, Disentangled Representation, Neural Policy
1 Introduction
Figure 1: Understand robot behaviors by extracting logic programs as factors of variation to measure interpretability with disentanglement.
Interpretability of learning-based
robot control is important for
safety-critical applications as it af-
fords human comprehension of
how the system processes inputs
and decides actions. In general,
achieving interpretability is difficult for learning-based robot control. Robot learning models make decisions without being explicitly programmed to perform the task and are often very large, which makes it practically impossible to synthesize and explain their reasoning processes. This lack of transparency, often referred to as the "black box" problem, makes it hard to interpret the workings of learning-based robot control systems. Understanding why a particular decision was made, or predicting how the system will behave in future scenarios, remains a challenge, yet it is critical for physical deployments.
Through the lens of representation learning, we assume that neural networks capture a set of pro-
cesses that exist in the data distribution; for robots, they manifest learned skills, behaviors, or strate-
gies, which are critical to understand the decision-making of a policy. However, while these factors
of variation [1] (e.g., color or shape representations) are actively studied in unsupervised learning
for disentangled representation, in robot learning, they are less well-defined and pose unique chal-
lenges due to the intertwined correspondence of neural activities with emergent behaviors unknown
a priori. In the present study, we aim to (i) provide a useful definition of factors of variation for
policy learning, and (ii) explore how to uncover dynamics and factors of variation quantitatively as a
measure of interpretability in compact neural networks for closed-loop end-to-end control applica-
tions. In this space, an entanglement corresponding to multiple neurons responsible for an emergent
behavior can obstruct the interpretation of neuron response even with a small number of neurons
[2,3,4,5]. To this end, the disentanglement of learned representations [6,7,8] in compact neural
networks is essential for deriving explanations and interpretations for neural policies.
We posit that each neuron should learn an abstraction (factor of variation) related to a specific strat-
egy required for solving a sub-component of a task. For example, in locomotion, one neuron may
capture periodic gait, where the numerical value of the neuron response may be aligned with dif-
ferent phases of the gait cycle; another neuron may account for recovery from slipping. Retrieving
the abstraction learned by a neuron is, however, non-trivial. Directly observing the neuron response
along with sensory information provided as input to the policy can be extremely inefficient and
tedious for identifying behaviors and interpreting decision-making.
In this work, our objective is to formulate an abstraction that represents the decision-making of a
parametric policy to quantify the interpretability of learned behaviors, specifically from the perspec-
tive of disentangled representations. To this end, we make the following key contributions:
• Provide a practical definition of factors of variation for robot learning by programmatically extracting decision trees from neural policies, in the form of logic programs grounded in world states.
• Introduce a novel set of quantitative metrics of interpretability to assess how well policies uncover
task structures and their factors of variation by measuring the disentanglement of learned neural
dynamics from a concentration of decisions, mutual information, and modularity perspective.
• Experiment in a series of end-to-end policy learning tasks that (a) showcase the effectiveness of
leveraging disentanglement to measure interpretability, (b) demonstrate policy behaviors extracted
from neural responses, (c) unveil interpretable models through the lens of disentanglement.
2 Related Work
Compact neural networks. Compact neural networks are ideal in resource-constrained situations such as robotics and are by nature easier to interpret due to a smaller number of neurons [2,4]. Compact networks can be obtained by pruning [9] or by compressing neural networks during end-to-end training [10]. Regularization has also been used to generate compact neural networks [11]. Compact representations of features can be learned using discriminative masking [12]. Neural Ordinary Differential Equations have also been used for learning compact networks [13,4]. In this work, we formally study
the interpretability of compact neural policies through the lens of disentangled representation.
Interpretable neural networks. An interpretable neural network could be constructed from a phys-
ically comprehensible perspective [14,15]. Knowledge representation has been used to obtain interpretable Convolutional Neural Networks [16]. An active line of research focuses on dissecting and analyzing
trained neural networks in a generic yet post-hoc manner [17,18,19,20]. Another active line of
research is to study disentangled explanatory factors in learned representation [8]. A better repre-
sentation should contain information in a compact and interpretable structure [1,21]. Unlike prior
works that study disentanglement based on factors of variation such as object types, there is no notion of ground-truth factors in robot learning; thus, we propose to use decision trees to construct pseudo-ground-truth factors that capture emergent behaviors of robots for interpretability analysis.
Interpretability in policy learning. Explainable AI has recently been extended to policy learning such as reinforcement learning [22] or human-AI shared control settings [23]. One line of research analyzes multi-step trajectories from the perspective of options or compositional skills [24,25,26,27,28,29]. A more fine-grained, single-step alternative is to distill policies via imitation learning into interpretable models like decision trees [30]. Another line of work directly embeds the decision tree framework into the learning-based model to strike a balance between expressiveness and interpretability [31,32,33]. Explanations of policy behaviors can also be obtained by searching for abstract states with value functions [34] or feature importance [35]. In this work, we aim to offer a new perspective of disentangled representation to measure interpretability in robot policy learning.
3 Method
Algorithm 1 Extract Abstraction via Decision Tree
Data: Trajectories rolled out from a compact neural policy, $\mathcal{D}_{dt} = \{(o_0, s_0, a_0, z_0, \ldots)_j\}_{j=1}^N$
Result: Interpreters of neuron response $\{f^i_S\}_{i \in I}$
for $i \in I$ do
    Train a decision tree $T_{\theta^i}$ from states $\{s_t\}$ to neural responses $\{z^i_t\}$.
    Collect dataset $\mathcal{D}_{dp}$ with neuron responses $\{z^i_t\}$ and decision paths $\{\mathcal{P}^i_{s_t}\}$.
    Train neuron response classifier $q_{\phi^i}: \mathbb{R} \to \{\mathcal{P}\}$ with $\mathcal{D}_{dp}$.
    Obtain decision path parser $r^i: \{\mathcal{P}\} \to \mathcal{L}$ by tracing out $\{\mathcal{P}^i_k\}_{k=1}^{K^i}$ in $T_{\theta^i}$.
    Construct the mapping $f^i_S = r^i \circ q_{\phi^i}$.
end for
In this section, we describe how to obtain factors of variation by predicting logic programs from neuron responses that reflect the learned behaviors of the policy (Section 3.1), followed by a set of quantitative measures of interpretability through the lens of disentanglement (Section 3.2).
3.1 Extracting Abstraction via Decision Tree
Our goal is to formulate a logic program that represents the decision-making of a parametric policy
to serve as an abstraction of learned behaviors, summarized in Algorithm 1. First, we describe a
decision process as a tuple $\{\mathcal{O}, \mathcal{S}, \mathcal{A}, P_a, h\}$, where at a time instance $t$, $o_t \in \mathcal{O}$ is the observation, $s_t \in \mathcal{S}$ is the state, $a_t \in \mathcal{A}$ is the action, $P_a: \mathcal{S} \times \mathcal{A} \times \mathcal{S} \to [0,1]$ is the (Markovian) transition probability from current state $s_t$ to next state $s_{t+1}$ under action $a_t$, and $h: \mathcal{S} \to \mathcal{O}$ is the observation model. We define a neural policy as $\pi: \mathcal{O} \to \mathcal{A}$ and the response of neuron $i \in I$ as $\{z^i_t \in \mathbb{R}\}_{i \in I}$, where $I$ refers to a set of neurons to be interpreted. For each neuron $i$, we aim to construct a mapping that infers a logic program from the neuron response, $f^i_S: \mathbb{R} \to \mathcal{L}$, where $\mathcal{L}$ is a set of logic programs grounded on environment states $\mathcal{S}$. Note that $f^i_S$ does not take the state as an input, as underlying states may be inaccessible during robot deployment. In the following discussion, we heavily use the notation $\mathcal{P}^i_*$ for the decision path associated with the $i$'th neuron, where the subscript $*$ refers to the dependency on state if given with parentheses (like $(s_t)$) and otherwise indexes based on the context.
From states to neuron responses. Decision trees are non-parametric supervised learning algo-
rithms for classification and regression. Throughout training, they develop a set of decision rules
based on thresholding one or a subset of input dimensions. The relation across rules is described by
a tree structure with the root node as the starting point of the decision-making process and the leaf
nodes as the predictions. The property of decision trees to convert data for decision making to a set
of propositions is a natural fit for state-grounded logic programs. Given a trained neural policy $\pi$, we collect a set of rollout trajectories $\mathcal{D}_{dt} = \{\tau_j\}_{j=1}^N$, where $\tau_j = (o_0, s_0, a_0, z_0, o_1, \ldots)$. We first train a decision tree $T_{\theta^i}$ to predict the $i$th neuron response from states,

$$\theta^{i*} = \arg\min_{\theta^i} \sum_{(s_t, z^i_t) \in \mathcal{D}_{dt}} \mathcal{L}_{dt}(\hat{z}^i_t, z^i_t), \quad \text{where } \hat{z}^i_t = T_{\theta^i}(s_t) \qquad (1)$$

where $\mathcal{L}_{dt}$ represents the underlying classification or regression criterion. The decision tree $T_{\theta^i}$ describes relations between the neuron responses and the relevant states as logical expressions. During inference, starting from the root node, relevant state dimensions will be checked by the decision rule in the current node and directed to the relevant lower layer, finally arriving at one of the leaf nodes and providing information to regress the neuron response. Each inference traces out a route from the root node to a leaf node. This route is called a decision path. A decision path consists of a sequence of decision rules defined by the nodes visited by the path, which combine to form a logic program,

$$\bigwedge_{n \in \mathcal{P}^i_{(s_t)},\, j = g(n)} \left( s^j_t \le c_n \right) \;\longleftrightarrow\; \text{Behavior extracted from } \hat{z}^i_t \text{ via } T_{\theta^i} \qquad (2)$$

where $\wedge$ is the logical AND, $\mathcal{P}^i_{(s_t)}$ is the decision path of the tree $T_{\theta^i}$ that takes $s_t$ as input, $g$ gives the state dimension used in the decision rule of node $n$ (assume each node uses one feature for notation simplicity), and $c_n$ is the threshold at node $n$.
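To make the construction in Eqs. (1)–(2) concrete, below is a minimal Python sketch (not the authors' code) that fits a per-neuron decision tree with scikit-learn and converts the decision path of a given state into a conjunction of threshold rules. Variable names such as `states`, `neuron_resp`, and `state_names` are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_neuron_tree(states, neuron_resp, max_depth=3):
    """Eq. (1): regress the i-th neuron response from ground-truth states."""
    tree = DecisionTreeRegressor(max_depth=max_depth)
    tree.fit(states, neuron_resp)
    return tree

def decision_path_program(tree, state, state_names):
    """Eq. (2): trace the decision path of `state` and return it as a logic program
    (a conjunction of single-feature threshold predicates)."""
    t = tree.tree_
    node_ids = tree.decision_path(state.reshape(1, -1)).indices  # nodes from root to leaf
    clauses = []
    for n in node_ids:
        if t.children_left[n] == t.children_right[n]:  # leaf node: no decision rule
            continue
        j, c = t.feature[n], t.threshold[n]             # feature g(n), threshold c_n
        op = "<=" if state[j] <= c else ">"
        clauses.append(f"{state_names[j]} {op} {c:.2f}")
    return " AND ".join(clauses)

# Hypothetical usage on rollout data D_dt:
# tree_i = fit_neuron_tree(states, z_i)
# print(decision_path_program(tree_i, states[0], ["theta", "theta_dot"]))
```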
From neuron responses to decision paths. So far, we recover a correspondence between the neuron response $z_t$ and the state-grounded program based on decision paths $\mathcal{P}^i_{(s_t)}$; however, this is not sufficient for deployment since the decision tree $T_{\theta^i}$ requires as input the ground-truth state and not the observable data to the policy (like $o_t, z_t$). To address this, we find an inverse of $T_{\theta^i}$ with neuron responses as inputs and pre-extracted decision paths as classification targets. Based on the inference process of $T_{\theta^i}$, we can calculate the numerical range of neuron responses associated with a certain decision path $\mathcal{P}^i_{(s_t)}$ from the predicted $\hat{z}_t$ and then construct the pairs of $z_t$ and $\mathcal{P}^i_{s_t}$. We collect another dataset $\mathcal{D}_{dp}$ and train a classifier $q_{\phi^i}$ to predict decision paths from neuron responses,

$$\phi^{i*} = \arg\min_{\phi^i} \sum_{(z^i_t, \mathcal{P}^i_{(s_t)}) \in \mathcal{D}_{dp}} \mathcal{L}_{dp}\!\left( q_{\phi^i}(z^i_t), \mathcal{P}^i_{(s_t)} \right) \qquad (3)$$

where $\mathcal{L}_{dp}$ is a classification criterion. While $\mathcal{P}^i_{(s_t)}$ is state-dependent, there exists a finite set of decision paths $\{\mathcal{P}^i_k\}_{k=1}^{K^i}$ given the generating decision tree. We define the mapping from the decision tree to the logic program as $r: \{\mathcal{P}\} \to \mathcal{L}$, which can be obtained by tracing out the path as described above. Overall, the desired mapping is readily constructed as $f^i_S = r^i \circ q_{\phi^i}$.
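A minimal sketch of the inverse step in Eq. (3), assuming the decision paths extracted above have been enumerated and indexed $0, \ldots, K^i-1$: a small classifier maps a scalar neuron response to a path index, and composing it with a path-to-program lookup gives $f^i_S = r^i \circ q_{\phi^i}$. The class below is illustrative, not the paper's implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class NeuronInterpreter:
    """f_S^i = r^i o q_{phi^i}: neuron response -> decision-path index -> logic program."""

    def __init__(self, programs):
        # programs[k] is the logic-program string of decision path P_k^i (the parser r^i)
        self.programs = programs
        self.clf = DecisionTreeClassifier(max_depth=4)  # q_{phi^i}, trained via Eq. (3)

    def fit(self, neuron_resp, path_ids):
        # neuron_resp: (T,) responses z_t^i; path_ids: (T,) index of P^i_(s_t)
        self.clf.fit(np.asarray(neuron_resp).reshape(-1, 1), path_ids)
        return self

    def explain(self, z):
        # At deployment only the neuron response is observed, not the underlying state.
        k = int(self.clf.predict(np.array([[z]]))[0])
        return self.programs[k]

# Hypothetical usage:
# interp = NeuronInterpreter(programs).fit(z_i, path_ids)
# print(interp.explain(z_i[0]))
```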
3.2 Quantitative Measures of Interpretability
Programmatically extracting decision trees for constructing a mapping from the neuron response to
a logic program offers a representation that facilitates the interpretability of compact neural poli-
cies. Furthermore, building on the computational aspect of our approach, we can quantify the inter-
pretability of a policy with respect to several metrics through the lens of disentanglement.
A. Neuron-Response Variance. Given decision paths $\{\mathcal{P}^i_k\}_{k=1}^{K^i}$ associated with a tree $T_{\theta^i}$ at the $i$th neuron, we compute the normalized variance of the neuron response averaged across decision paths,

$$\frac{1}{|I|} \sum_{i \in I} \frac{1}{K^i} \sum_{k=1}^{K^i} \underset{(s_t, z^i_t) \in \mathcal{D}_{dt},\; t \in \{u \mid \mathcal{P}^i_{(s_u)} = \mathcal{P}^i_k\}}{\mathrm{Var}} \left[ \frac{z^i_t}{Z^i} \right] \qquad (4)$$

where $Z^i$ is a normalization factor that depends on the range of response of the $i$th neuron. The set $\{u \mid \mathcal{P}^i_{(s_u)} = \mathcal{P}^i_k\}$ contains all time steps that exhibit the same behavior as entailed by $\mathcal{P}^i_k$. For example, suppose we have a trajectory consisting of behaviors including walking and running, and that walking is depicted as $\mathcal{P}^i_k$; the set then refers to all time steps of walking. This metric captures the concentration of the neuron response that corresponds to the same strategy represented by the logic program defined by $T_{\theta^i}$. In practice, we discretize all neuron responses into $N$ bins, compute the index of the bin to which a value belongs, divide the index by $N$, and compute their variance.
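A minimal sketch of the variance metric in Eq. (4) under the discretization just described, assuming responses and decision-path assignments are stored per neuron as NumPy arrays; this is not the authors' implementation.

```python
import numpy as np

def neuron_response_variance(responses, path_ids, num_bins=20):
    """Eq. (4): average, over neurons and decision paths, of the variance of the
    (discretized, normalized) neuron response within each decision path.
    responses: dict {i: (T,) array of z_t^i}; path_ids: dict {i: (T,) array of path index}."""
    per_neuron = []
    for i, z in responses.items():
        # Discretize to bin indices and map to [0, 1] as the normalization Z^i.
        edges = np.linspace(z.min(), z.max(), num_bins + 1)
        z_norm = np.clip(np.digitize(z, edges[1:-1]), 0, num_bins - 1) / num_bins
        per_path = [np.var(z_norm[path_ids[i] == k]) for k in np.unique(path_ids[i])]
        per_neuron.append(np.mean(per_path))
    return float(np.mean(per_neuron))  # lower is better (responses more concentrated per path)
```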
B. Mutual Information Gap. Inspired by [21,8], we integrate the notion of mutual information
in our framework to extend disentanglement measures for unsupervised learning to policy learning.
Specifically, while previous literature assumes known ground-truth factors for disentanglement such
as object types, viewing angles, etc., there is no straightforward equivalence in neural policies since
the emergent behaviors or strategies are unknown a priori. To this end, we propose to leverage the
decision path sets to construct pseudo-ground-truth factors $M_{dp} = \bigcup_{i \in I} \{\mathcal{P}^i_k\}_{k=1}^{K^i} = \{\mathcal{P}_k\}_{k=1}^{K}$. Note that there may be a correlation across decision paths, i.e., $P(\mathcal{P}^i, \mathcal{P}^j) \neq P(\mathcal{P}^i)P(\mathcal{P}^j)$ for $i \neq j$. For example, one decision path corresponding to a logic program of the robot moving forward at high speed has a correlation to another decision path for moving forward at low speed. This may occur because a neuron of a policy can learn arbitrary behaviors. However, this leads to a non-orthogonal ground-truth factor set and can be undesirable, since high correlations of a neuron to multiple ground-truth factors (e.g., $I[z^i; \mathcal{P}_i]$ and $I[z^i; \mathcal{P}_j]$ are large) can result not only from entanglement of the neuron but also from the correlation between factors (e.g., $I[\mathcal{P}_i; \mathcal{P}_j]$ is large). Hence, this urges the need to calibrate mutual information for computing disentanglement measures. We start by adapting the Mutual Information Gap (MIG) [21] to our framework:

$$\frac{1}{K} \sum_{k=1}^{K} \frac{1}{H[\mathcal{P}_k]} \left( I[z^{i^*}; \mathcal{P}_k] - \max_{j \neq i^*} \left( I[z^j; \mathcal{P}_k] - I[z^j; \mathcal{P}_k; \mathcal{P}_{k_j}] \right) \right) \qquad (5)$$

where $H$ is entropy, $I$ is interaction information that can take an arbitrary number of variables (with 2 being mutual information), $i^* = \arg\max_i I[z^i; \mathcal{P}_k]$, and $k_j = \arg\max_l I[z^j; \mathcal{P}_l]$. Intuitively,
Table 1: Quantitative results of classical control.

| Network Architecture | Variance ↓ | MI-Gap ↑ | Modularity ↑ | Explanation Size (Vertical) ↓ | Explanation Size (Horizontal) ↓ | Cognitive Chunks ↓ |
|---|---|---|---|---|---|---|
| FCs | 0.0242 ± 0.005 | 0.3008 ± 0.025 | 0.9412 ± 0.014 | 5.00 ± 0.46 | 1.91 ± 0.14 | 1.65 ± 0.28 |
| GRU | 0.0329 ± 0.004 | 0.2764 ± 0.062 | 0.9096 ± 0.022 | 4.90 ± 0.80 | 1.96 ± 0.17 | 1.65 ± 0.25 |
| LSTM | 0.0216 ± 0.003 | 0.2303 ± 0.024 | 0.9355 ± 0.008 | 4.75 ± 0.39 | 2.02 ± 0.12 | 1.90 ± 0.14 |
| ODE-RNN | 0.0287 ± 0.007 | 0.3062 ± 0.041 | 0.9376 ± 0.017 | 4.90 ± 0.38 | 1.93 ± 0.15 | 1.80 ± 0.27 |
| CfC | 0.0272 ± 0.004 | 0.2892 ± 0.111 | 0.9067 ± 0.039 | 4.70 ± 0.65 | 1.82 ± 0.33 | 1.50 ± 0.47 |
| NCP | 0.0240 ± 0.008 | 0.3653 ± 0.052 | 0.9551 ± 0.019 | 3.45 ± 0.83 | 1.51 ± 0.33 | 1.30 ± 0.32 |
Table 2: Alignment between disentanglement and explanation quality in classical control.

| Re-signed Rank Correlation ↑ | Explanation Size (Vertical) | Explanation Size (Horizontal) | Cognitive Chunks |
|---|---|---|---|
| Variance | -0.146 | 0.002 | 0.040 |
| MI-Gap | 0.427 | 0.505 | 0.449 |
| Modularity | -0.114 | 0.156 | 0.032 |
Figure 2: In classical control (Pendulum): (a) Phase portrait with empirically measured closed-loop dynamics and neuron response; each arrow and colored dot are the results averaged around the binned state space. (b) Emergent strategies from logic programs. (c) Decision tree extracted for command neuron 3 in NCP.
this measures the normalized difference between the highest and the second-highest mutual information of each decision path with individual neuron activations, i.e., how discriminative the correlation between the neuron response is with one decision path as opposed to the others. For example, neuron responses correlated to multiple factors of variation will have lower MIG than those correlated to only one. The last term $I[z^j; \mathcal{P}_k; \mathcal{P}_{k_j}]$ is for calibration and captures the inherent correlation between $z^j$ and $\mathcal{P}_k$ resulting from a potentially nonzero $I[\mathcal{P}_k; \mathcal{P}_{k_j}]$, with $\mathcal{P}_{k_j}$ being a proxy random variable of $z^j$ in the ground-truth factor set. We show how to compute $I[z^j; \mathcal{P}_k] - I[z^j; \mathcal{P}_k; \mathcal{P}_{k_j}]$ in Appendix Section C.
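A sketch of the calibrated MIG of Eq. (5) using discrete (binned) mutual information. It assumes the identity $I[z; \mathcal{P}_k; \mathcal{P}_{k_j}] = I[z; \mathcal{P}_k] - I[z; \mathcal{P}_k \mid \mathcal{P}_{k_j}]$ for the interaction-information term, which is one standard sign convention and our assumption about how the calibration is computed (the paper's exact procedure is in its Appendix C); inputs are assumed to be NumPy arrays.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def _entropy(x):
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

def _cond_mi(a, b, c):
    """I[a; b | c] for discrete variables, averaged over the values of c."""
    return sum((c == v).mean() * mutual_info_score(a[c == v], b[c == v]) for v in np.unique(c))

def mutual_information_gap(z_bins, paths):
    """Eq. (5). z_bins: (num_neurons, T) discretized neuron responses.
    paths: (K, T) 0/1 indicators of pseudo-ground-truth decision paths."""
    num_neurons, K = len(z_bins), len(paths)
    # mi[i, k] = I[z_i; P_k] (discrete mutual information)
    mi = np.array([[mutual_info_score(z_bins[i], paths[k]) for k in range(K)]
                   for i in range(num_neurons)])
    k_best = mi.argmax(axis=1)  # k_j: best-matching path for each neuron j
    gaps = []
    for k in range(K):
        i_star = int(mi[:, k].argmax())
        # Calibrated rival term: I[z_j; P_k] - I[z_j; P_k; P_{k_j}] = I[z_j; P_k | P_{k_j}]
        rivals = [_cond_mi(z_bins[j], paths[k], paths[k_best[j]])
                  for j in range(num_neurons) if j != i_star]
        gap = mi[i_star, k] - (max(rivals) if rivals else 0.0)
        gaps.append(gap / max(_entropy(paths[k]), 1e-8))
    return float(np.mean(gaps))
```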
C. Modularity. We compute modularity scores from [36] with the same calibration term,

$$\frac{1}{|I|} \sum_{i \in I} \left( 1 - \frac{\sum_{k \neq k^*} \left( I[z^i; \mathcal{P}_k] - I[z^i; \mathcal{P}_k; \mathcal{P}_{k^*}] \right)^2}{(K-1)\, I[z^i; \mathcal{P}_{k^*}]^2} \right), \qquad (6)$$

where $k^* = \arg\max_l I[z^i; \mathcal{P}_l]$. For an ideally modular representation, each neuron has high mutual information with a single factor of variation and low mutual information with all the others. Supposing each neuron $i$ has a best "match" with a decision path (ground-truth factor) $k^*$, the non-modularity of that neuron is computed as the normalized variance of mutual information between its neuron response and all non-matched decision paths $\{\mathcal{P}_k\}_{k \neq k^*}$. In practice, we discretize neuron responses into $N$ bins to compute discrete mutual information.
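Similarly, a sketch of the calibrated modularity score in Eq. (6), repeating the conditional-MI helper from the MIG sketch for self-containment and again assuming the calibration $I[z; \mathcal{P}_k] - I[z; \mathcal{P}_k; \mathcal{P}_{k^*}] = I[z; \mathcal{P}_k \mid \mathcal{P}_{k^*}]$.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def _cond_mi(a, b, c):
    """I[a; b | c] for discrete variables (same helper as in the MIG sketch)."""
    return sum((c == v).mean() * mutual_info_score(a[c == v], b[c == v]) for v in np.unique(c))

def modularity_score(z_bins, paths):
    """Eq. (6). z_bins: (num_neurons, T) discretized neuron responses.
    paths: (K, T) 0/1 indicators of decision paths."""
    num_neurons, K = len(z_bins), len(paths)
    scores = []
    for i in range(num_neurons):
        mi = np.array([mutual_info_score(z_bins[i], paths[k]) for k in range(K)])
        k_star = int(mi.argmax())
        # Calibrated off-target MI: I[z_i; P_k] - I[z_i; P_k; P_{k*}] = I[z_i; P_k | P_{k*}]
        off = np.array([_cond_mi(z_bins[i], paths[k], paths[k_star])
                        for k in range(K) if k != k_star])
        denom = (K - 1) * max(mi[k_star] ** 2, 1e-12)
        scores.append(1.0 - float(np.sum(off ** 2)) / denom)
    return float(np.mean(scores))
```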
4 Experiments
We conduct a series of experiments in various policy-learning tasks to answer the following: (i) How effective is disentanglement for measuring the interpretability of policies? (ii) What can we extract from neural responses? (iii) Which architecture is more interpretable through the lens of disentanglement?
4.1 Setup
Network architecture. We construct compact neural networks for each end-to-end learning-to-control task. For all tasks, our networks are constructed with the following priors: (i) each baseline network is supplied with a perception backbone (e.g., a convolutional neural network); (ii) we construct policies based on different compact architectures that take in feature vectors from the perception backbone and output control with comparable cell counts (instead of actual network size in memory, as we assess interpretability metrics down to the cell level). The perception backbone is followed by a neural controller designed with compact feed-forward and recurrent network architectures, including fully-connected networks (FCs), gated recurrent units (GRU) [37], and long short-term memory (LSTM) [38]. Additionally, we include advanced continuous-time baselines based on ordinary differential equations, such as ODE-RNN [39], closed-form continuous-time neural models (CfCs) [40], and neural circuit policies (NCPs) [4]. We interpret the dynamics of the neurons in the last
layer before the output in FCs, the command-neuron layer of NCPs, and the recurrent state of the
rest. We then extract logic programs and measure interpretability with the proposed metrics.
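As an illustration of this setup (a sketch, not the paper's training code), a compact recurrent controller can be wrapped so that the interpreted neurons, here the recurrent state of a GRU, are recorded at every control step; the layer sizes and names below are assumptions.

```python
import torch
import torch.nn as nn

class CompactGRUPolicy(nn.Module):
    """Perception backbone -> compact GRU controller; exposes the recurrent state
    (the neurons we interpret) alongside each action."""

    def __init__(self, feat_dim=32, cells=16, act_dim=2):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU())  # stand-in perception
        self.rnn = nn.GRUCell(64, cells)
        self.head = nn.Linear(cells, act_dim)

    def forward(self, obs, hidden):
        feat = self.backbone(obs)
        hidden = self.rnn(feat, hidden)   # z_t: neuron responses to interpret
        return self.head(hidden), hidden

# During a rollout, (s_t, z_t) pairs are appended to D_dt for Algorithm 1:
# action, hidden = policy(obs, hidden); dataset.append((state, hidden.detach().numpy()))
```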