
be identified by the agents. In this work, we explore how to define and detect task delineations to
enhance robots’ learning capabilities in task-agnostic CRL.
Figure 1: This example illustrates how group symmetry enhances adaptability. The robot is instructed to close drawers situated in two distinct locations with top-down images as inputs. Considering the symmetry of the drawers’ locations around the robot’s position, the optimal control policies are equivalent but mirrored. (Diagram labels: Equivariant Policy Network; Reflect Task Configuration; Reflect Action.)
Our key insight is that robotic control tasks typically preserve certain desirable structures, such as group symmetries. Existing CRL approaches typically delineate task boundaries based on statistical measures, such as maximum a posteriori estimates and likelihoods [7,8]. However, these measures overlook the geometric information inherent in task representations, which naturally emerges in robotic control tasks, as demonstrated in Figure 1. Consider the drawer-closing example: conventional CRL methods using image inputs would treat each mirrored configuration as a new task and learn it from scratch. Yet we, as humans, understand that the mirrored task configuration can be solved simply by reflecting the actions correspondingly. Learning the mirrored task from scratch hampers positive task interference and limits the agent’s adaptivity. To address this issue, our goal is to exploit the geometric similarity among tasks in the task-agnostic CRL setting to facilitate rapid adaptation to unseen but geometrically equivalent tasks.
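The mirrored-drawer argument can be made concrete with a small sketch. It is not the equivariant network used in this work: it takes a hypothetical, deliberately asymmetric policy (`base_policy`, invented here for illustration) and averages it over the two-element reflection group, which is one simple way to obtain a policy whose action is mirrored whenever the input image is mirrored. The image layout (height by width) and the planar action (dx, dy) are assumptions.

```python
import numpy as np

def reflect_image(img):
    """Mirror a top-down image of shape (H, W) left to right."""
    return img[:, ::-1]

def reflect_action(action):
    """Mirror a planar action (dx, dy) by negating its x-component."""
    return action * np.array([-1.0, 1.0])

def base_policy(img):
    """Hypothetical non-equivariant policy: move toward the intensity
    centroid, plus a constant x-bias that breaks the mirror symmetry."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    total = img.sum() + 1e-8
    cx = (img * xs).sum() / total - (w - 1) / 2.0
    cy = (img * ys).sum() / total - (h - 1) / 2.0
    return np.array([cx + 0.3, cy])

def symmetrized_policy(img):
    """Average the policy over the reflection group {identity, mirror}.
    Because mirroring twice is the identity, the result satisfies
    policy(mirror(img)) == mirror(policy(img)) for any base policy."""
    direct = base_policy(img)
    mirrored = reflect_action(base_policy(reflect_image(img)))
    return 0.5 * (direct + mirrored)

# Demo on a random image: the symmetrized policy commutes with reflection.
demo_img = np.random.default_rng(0).random((8, 8))
lhs = symmetrized_policy(reflect_image(demo_img))
rhs = reflect_action(symmetrized_policy(demo_img))
assert np.allclose(lhs, rhs)
```

With such a policy, experience gathered on one drawer location transfers to the mirrored location without further training, which is the zero-shot behavior the paragraph above describes.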
In this work, we propose COVERS, a task-agnostic vision-based CRL algorithm that achieves strong sample
efficiency and generalization capability by encoding group symmetries in the state and action spaces.
We define a task group as the set that contains equivalent tasks under the same group operation, such
as rotations and reflections. We state our main contributions as follows:
1. COVERS grows a PPO-based [10] policy with an equivariant feature extractor for each task
group, instead of a single task, to solve unseen tasks in seen groups in a zero-shot manner.
2. COVERS utilizes a novel unsupervised task grouping mechanism, which automatically
detects group boundaries based on 1-Wasserstein distance in the invariant feature space.
3. In non-stationary table-top manipulation environments, COVERS outperforms
baselines in terms of average rewards and success rates. Moreover, we show that (a) the
group symmetric information from the equivariant feature extractor promotes adaptivity
by maximizing the positive interference within each group, and (b) the task grouping
mechanism recovers the ground-truth group indices, which helps minimize the negative
interference among different groups.
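The grouping idea in the second contribution can be sketched as follows, under loud assumptions. The source states only that group boundaries are detected with the 1-Wasserstein distance in the invariant feature space; the per-dimension averaging, the fixed `threshold`, the equal sample sizes, and the function names below are all hypothetical choices made for illustration, not the procedure used by COVERS.

```python
import numpy as np

def w1_1d(x, y):
    """Exact 1-Wasserstein distance between two equal-size 1-D samples:
    the mean absolute difference of their sorted values."""
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))

def feature_distance(a, b):
    """Assumption: average the 1-D distance over feature dimensions.
    Both inputs have shape (num_samples, feature_dim)."""
    return float(np.mean([w1_1d(a[:, d], b[:, d]) for d in range(a.shape[1])]))

def assign_group(new_feats, group_feats, threshold):
    """Return the index of the closest stored group if it is within
    `threshold`; otherwise store the features as a new group."""
    dists = [feature_distance(new_feats, g) for g in group_feats]
    if dists and min(dists) <= threshold:
        return int(np.argmin(dists))
    group_feats.append(new_feats)
    return len(group_feats) - 1

# Demo: two batches from the same distribution share a group,
# while a shifted batch opens a new one.
rng = np.random.default_rng(0)
groups = []
first = assign_group(rng.normal(0.0, 1.0, (256, 4)), groups, threshold=0.5)
same = assign_group(rng.normal(0.0, 1.0, (256, 4)), groups, threshold=0.5)
other = assign_group(rng.normal(3.0, 1.0, (256, 4)), groups, threshold=0.5)
assert (first, same, other) == (0, 0, 1)
```

Computing the distance on invariant features matters here: mirrored or rotated versions of the same task map to similar feature distributions, so they fall into one group and reuse one policy.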
2 Related Work
Task-Agnostic CRL. CRL has been a long-standing problem that aims to train RL agents adaptable
to non-stationary environments with evolving world models [11,12,13,14,15,5,16,17,18,19]. In
task-agnostic CRL, where task identities are not revealed, existing methods have addressed the
problem through a range of techniques. These include hierarchical task modeling with stochastic
processes [7,8], meta-learning [3,20], online system identification [21], learning a representation
from experience [9,22], and experience replay [14,23]. Considering that in realistic situations, the
new task may not belong to the same task distribution as past tasks, we develop an ensemble model of
policy networks capable of handling diverse unseen tasks, rather than relying on a single network to
model dynamics or latent representations. Moreover, prior work often depends on data
distribution-wise similarity or distances between latent variables, implicitly modeling task relationships. In
contrast, we aim to introduce beneficial inductive bias explicitly by developing policy networks
with equivariant feature extractors to capture the geometric structures of tasks.