
reward function as well as stochastic dynamics, and it uses measures of dispersion such as the standard deviation of Q-values to drive the refinement process. The main contributions of this paper are mechanisms for building conditional abstraction trees that help compute and represent such abstractions, and a procedure that interleaves RL episodes with phases of abstraction and refinement. Although this process could be adapted to numerous RL algorithms, in this paper we focus on developing and investigating it with Q-learning.
Figure 1: Consider a classic taxi world with two passengers and a building as the drop-off location, where the green area is impassable (left). Meaningful conditional abstractions can be constructed, for example, for situations where both passengers are at their pickup locations (middle) or one passenger has already been picked up (right).
The presented approach, dynamic abstractions for RL (DAR+RL), can be thought of as a dynamic abstraction scheme because refinement is tied to the dispersion of Q-values under the agent's evolving policy during learning. It provides adjustable degrees of compression (Abel et al., 2016): the aggressiveness of abstraction can be controlled by adjusting how variation in the dispersion of Q-values is defined. Extensive empirical evaluation on multiple domains and problems shows that this approach automatically learns abstract representations that effectively draw out similarities across the state space and yield substantial gains in sample efficiency. Comparative evaluation shows that Q-learning-based RL agents enhanced with our approach outperform state-of-the-art RL approaches in both discrete and continuous domains while learning meaningful abstract representations.
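As an illustration of how such a dispersion measure could drive refinement, the following minimal sketch (not the implementation presented in Sec. 4; abstract_state, q_table, and refine_threshold are hypothetical names) flags an abstract state for splitting when, for some action, the standard deviation of the Q-values of the concrete states it aggregates exceeds a threshold.

```python
import numpy as np

def should_refine(abstract_state, q_table, actions, refine_threshold):
    """Return True if the Q-values of the concrete states grouped under this
    abstract state disagree enough, for any action, to warrant splitting it."""
    for a in actions:
        q_values = [q_table[(s, a)] for s in abstract_state.concrete_states]
        # Standard deviation of Q-values serves as the dispersion measure; a
        # larger threshold tolerates more variation and keeps the abstraction
        # coarser (more aggressive compression).
        if np.std(q_values) > refine_threshold:
            return True
    return False
```

A larger refine_threshold thus corresponds to a coarser abstraction, which is the sense in which the degree of compression is adjustable.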
The rest of this paper is organized as follows. Sec. 2 summarizes related work, followed by a discussion of the necessary background in Sec. 3. Sec. 4 presents our dynamic abstraction learning method for sample-efficient RL. Empirical evaluations are presented in Sec. 5, followed by conclusions in Sec. 6.
2 RELATED WORK
Offline State Abstraction. Most early studies focus on action-specific (Dietterich, 1999) and
option-specific (Jonsson & Barto, 2000) state abstraction. Givan et al. (2003) introduced a notion of state equivalence that can reduce the size of the state space: two states can be aggregated into one abstract state if applying the same action in each leads to equivalent states with similar rewards. Later, Ravindran & Barto (2004) relaxed this definition of state equivalence by allowing the actions to differ as long as there is a valid mapping between them. Offline state abstraction has
further been studied for generalization and transfer in RL (Karia & Srivastava, 2022) and planning
(Srivastava et al., 2012).
Graph-Theoretic State Abstraction. Mannor et al. (2004) developed a graph-theoretic state abstraction approach that utilizes the topological similarities of a state transition graph (STG) to aggregate states in an online manner. Mannor et al.'s definition of state abstraction follows Givan et al.'s notion of equivalent states, except that the partial STG is updated iteratively to find the abstractions. A comparable method proposed by Chiu & Soo (2010) carries out spectral graph analysis on the STG to decompose it into multiple sub-graphs. However, most graph-theoretic analyses of STGs, such as computing the eigenvectors in Chiu & Soo's work, can become infeasible for problems with large state spaces.
Monte-Carlo Tree Search (MCTS). MCTS approaches offer viable and tractable algorithms for Markovian decision problems with large state spaces (Kocsis & Szepesvári, 2006). Jiang et al. (2014) demonstrated that proper abstraction effectively enhances the performance of MCTS algorithms. However, their clustering-based state abstraction approach is limited to the states enumerated within the partially expanded tree, which makes it ineffectual when limited samples are available to the planning/learning agent. Anand et al. (2015) advanced Jiang et al.'s method by comprehensively aggregating states and state-action pairs with the aim of uncovering more symmetries in the domain. Owing to their novel state-action pair abstraction, which extends Givan et al.'s and Ravindran & Barto's notions of abstraction, Anand et al.'s method results in higher-quality policies compared to other approaches