unsafe in the real world. It also takes considerable time to build up enough experience to train the policy to function reliably in a dynamic environment with moving targets and obstacles. Therefore, we develop a Gazebo simulation that closely mirrors our real-robot setup and initially train the robot in simulation. Afterwards, the learned policy is deployed directly on the real robot. We extensively evaluate the performance of our approach in both simulation and real-robot experiments on three tasks with increasing levels of difficulty. Experimental results show that the proposed method produces throws that are more accurate than baseline alternatives. In summary, our key contributions are threefold:
• To the best of our knowledge, we are the first to address object tossing while obstacles are present in the environment and the target basket is moving.
• Although trained only on simulation data, the proposed approach can be applied directly to a real robot. Furthermore, it shows impressive generalization to new target locations and unseen objects.
• Our experiments show that the trained policy achieves above 80% throwing accuracy on the most difficult task (i.e., throwing an object into the basket while an obstacle obstructs the path) in both simulation and real-robot environments.
II. RELATED WORK
The robotics community has long been interested in giving
service robots the ability to throw objects [2], [3], [4], [5],
[6]. Throwing formulations were mostly based on analytical models in the late 1990s and early 2000s [7], whereas they have increasingly moved toward learning approaches today [8], [4]. In the following subsections, we briefly review these two families of approaches.
A. Analytical Approaches
Earlier throwing systems relied on handcrafted or mechanical analysis and then optimized control parameters to execute a throw such that the projectile (typically a ball) lands at a target location. As we previously highlighted, precisely modeling the dynamics is difficult because it requires knowledge of the physical characteristics of the object, gripper, and environment, which are hard to quantify [7]. For instance, Gai et al. derived an analytical approach for throwing a ball using a manipulator with a single flexible link through Hamilton's principle [3]. This is an example of tuning for a single object, a ball in this case. In another work, Hu et al. [2] discussed a stereo vision system for throwing a ball into a basket; they computed the ball-throwing transformation for a specific ball object based on a cubic polynomial. In [9], an analytical approach is used to predict the end-effector velocity (magnitude and direction) as well as the movement duration for an underhand throwing task performed by a humanoid robot. Such approaches work to some extent for specific scenarios but have difficulty generalizing over changing dynamics and various objects.
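To make the flavor of such analytical formulations concrete, the short Python sketch below computes the release speed that a drag-free point-mass model prescribes for a given release angle and target distance; it is our own illustration and not taken from any of the cited works.

import math

G = 9.81  # gravitational acceleration (m/s^2)

def release_speed(target_dist, release_angle_deg, release_height=0.0):
    # Release speed (m/s) for a drag-free point mass launched at
    # release_angle_deg so that it lands target_dist meters away,
    # starting release_height meters above the landing plane.
    theta = math.radians(release_angle_deg)
    # From x(t) = v*cos(theta)*t and y(t) = h + v*sin(theta)*t - 0.5*G*t^2 = 0 at x = d.
    denom = target_dist * math.tan(theta) + release_height
    if denom <= 0:
        raise ValueError("target not reachable at this release angle")
    return target_dist * math.sqrt(G / (2 * math.cos(theta) ** 2 * denom))

# Example: land 1.5 m away with a 45-degree release, 0.2 m above the basket rim.
print(round(release_speed(1.5, 45.0, 0.2), 2))  # ~3.6 m/s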
B. Learning Approaches
Unlike analytical approaches for throwing, learning-based
methods enable robots to learn/optimize the main task di-
rectly through success or failure signals. In general, learning-
based throwing approaches demonstrate better performance
than analytical methods [10], [11]. In [10], a deep predictive policy training architecture (DPPT) is presented to teach a PR2 robot object-grasping and ball-throwing tasks. The authors showed that DPPT is successful on both simulated and real robots.
In another work, Kober et al. [11] introduced an RL-based method for a dart-throwing task based on a kernelized version of reward-weighted regression. In both of these works, the properties of the object (ball and dart) are known a priori. In contrast to both of these approaches, we do not
make assumptions about the physical properties of objects
that are thrown.
In some other works, researchers tried to combine the
potential of analytical and learning approaches for robotic
throwing tasks. In particular, an analytical model is used to approximate the initial control parameters, and a learning-based model estimates residual corrections that refine them. Such approaches are called residual physics (a minimal sketch of this idea is given at the end of this subsection). For instance, [4] proposed TossingBot, an end-to-end self-supervised learning method for learning to throw arbitrary objects with residual physics. Similar to our work, their approach was able to throw an object into a basket.
Unlike our approach, they used an analytical approach for
estimating initial control parameters, and then used an end-
to-end formulation for learning residual velocity for throwing
motion primitives. We formulate the throwing task as an RL
problem that modulates the parameters of a kernel motion
generator. In contrast to the reviewed works, our formulation allows the robot to throw the object into a moving basket while avoiding obstacles present in the scene, whereas all of the reviewed works consider the throwing task in an obstacle-free environment where the target is static and known in advance.
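The sketch below illustrates the residual-physics recipe in its simplest form; it is our own illustrative placeholder (the ballistic initializer and the linear corrector are assumptions), not the implementation of TossingBot or of our method.

import math

G = 9.81  # gravitational acceleration (m/s^2)

def analytical_release_speed(target_dist, angle_deg=45.0):
    # Drag-free ballistic estimate of the release speed for a horizontal
    # target distance (release and landing heights assumed equal).
    theta = math.radians(angle_deg)
    return math.sqrt(G * target_dist / math.sin(2 * theta))

def learned_residual(features, weights, bias):
    # Tiny learned corrector: a linear map from task features to a velocity
    # residual; in practice this would be a trained neural network.
    return sum(w * f for w, f in zip(weights, features)) + bias

def throw_velocity(target_dist, object_features, weights, bias):
    # Residual physics: analytical initial estimate plus a learned residual.
    v_init = analytical_release_speed(target_dist)
    return v_init + learned_residual([target_dist] + object_features, weights, bias)

# Hypothetical usage; the weights and bias would come from training on throw outcomes.
print(throw_velocity(1.5, object_features=[0.2, 0.05], weights=[0.1, -0.3, 0.4], bias=0.02))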
III. METHOD
In this section, the preliminaries are briefly reviewed, fol-
lowed by a discussion of how we formulate object throwing
as an RL problem. The perception pipeline that builds the world model at each time step is the subject of the last subsection.
A. Preliminaries
Markov Decision Process (MDP): An MDP can be described as a tuple containing four basic elements, $(s_t, a_t, p(s_{t+1}|s_t, a_t), r(s_{t+1}|s_t, a_t))$, where $s_t$ and $a_t$ are the continuous state and action at time step $t$, respectively. $p(s_{t+1}|s_t, a_t)$ denotes the transition probability of reaching the next state $s_{t+1}$ given the current state $s_t$ and action $a_t$, and $r(s_{t+1}|s_t, a_t)$ denotes the immediate reward received from the environment after the state transition.
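For concreteness, the following Python sketch shows how such $(s_t, a_t, r, s_{t+1})$ tuples arise from a generic agent-environment interaction loop; the toy environment and policy are our own placeholders, not the simulator or policy used in this work.

import random

class ToyEnv:
    # Placeholder environment: step() samples s_{t+1} from p(.|s_t, a_t)
    # and returns the immediate reward r(s_{t+1}|s_t, a_t).
    def reset(self):
        return 0.0  # initial state s_0
    def step(self, state, action):
        next_state = state + action + random.gauss(0.0, 0.1)  # stochastic transition
        reward = -abs(next_state)  # e.g., penalize distance from the origin
        return next_state, reward

def rollout(env, policy, horizon=50):
    # Collect one episode of (s_t, a_t, r_{t+1}, s_{t+1}) transitions.
    transitions, state = [], env.reset()
    for _ in range(horizon):
        action = policy(state)
        next_state, reward = env.step(state, action)
        transitions.append((state, action, reward, next_state))
        state = next_state
    return transitions

# Hypothetical usage with a trivial proportional policy.
episode = rollout(ToyEnv(), policy=lambda s: -0.5 * s)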
Off-policy RL: In online RL, an agent continuously interacts with the environment to accumulate experiences for learning the optimal policy $\pi^*$. The agent seeks to maximize the expected future return $R_t = \mathbb{E}\left[\sum_{i=t}^{\infty} \gamma^{i-t} r_{i+1}\right]$, with a discount factor $\gamma \in [0,1]$ weighting the importance of future rewards.
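As a small illustration (our own code, not part of the paper's implementation), the helper below computes this discounted return $R_t$ for every step of a finite episode of recorded rewards.

def discounted_returns(rewards, gamma=0.99):
    # Compute R_t = sum_{i=t} gamma^(i-t) * r_{i+1} for each step t of a
    # finite episode by accumulating over the reward list backwards.
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return list(reversed(returns))

# Example: a three-step episode where only the final transition is rewarded.
print(discounted_returns([0.0, 0.0, 1.0], gamma=0.9))  # approximately [0.81, 0.9, 1.0]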