
Figure 2: Inputs $(q_0, R_0, p_0, \Pi_0), (q_1, R_1, p_1, \Pi_1), \ldots$ are fed through a recurrent layer of Lie T2 steps $\phi^{\theta,\mathrm{Lie\,T2}}_{h}$; the predicted states $(\hat q_1, \hat R_1, \hat p_1, \hat\Pi_1), (\hat q_2, \hat R_2, \hat p_2, \hat\Pi_2), \ldots$ are compared with the observations, and the prediction error is used as a loss $\mathcal{L}(\theta)$ on $\theta$.
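As a rough sketch of the training loop that Figure 2 depicts (a minimal sketch in JAX: `rollout_loss`, `step_fn`, and the flattened state array are illustrative assumptions, not the authors' implementation):

```python
import jax
import jax.numpy as jnp

def rollout_loss(params, step_fn, states_obs, dt):
    """Sketch of the Figure 2 loss: roll the learned recurrent step forward
    from the first observed state and penalize the prediction error.

    states_obs: observed states, flattened to an array of shape (T, d).
    step_fn:    hypothetical learned one-interval integrator step
                (params, state, dt) -> state, i.e. the phi^{theta, Lie T2} block.
    """
    def advance(state, obs_next):
        pred = step_fn(params, state, dt)      # predict the next observation
        err = jnp.sum((pred - obs_next) ** 2)  # squared prediction error
        return pred, err                       # carry the prediction forward

    _, errs = jax.lax.scan(advance, states_obs[0], states_obs[1:])
    return jnp.mean(errs)                      # L(theta)
```

In practice the state is the tuple $(q, R, p, \Pi)$ rather than a flat array; gradients of this loss with respect to `params` can then be taken with `jax.grad` and passed to any optimizer.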
through a $1/r$ gravitational potential. However, planets are not point masses, and their rotations matter because they shape planetary climates [28, 29] and even feed back into their orbits [30]. This already starts to alter $V$ even if one only considers classical gravity. For example, the gravitational
potential $V$ for interacting bodies of finite sizes should be $V(q,R) = \sum_{i<j} V_{i,j}$, where
$$
V_{i,j}(q,R) = \int_{B_i}\!\int_{B_j} \frac{-G\,\rho(x_i)\,\rho(x_j)}{\|q_i + R_i x_i - q_j - R_j x_j\|}\,\mathrm{d}x_i\,\mathrm{d}x_j
= \underbrace{\frac{-G\, m_i m_j}{\|q_i - q_j\|}}_{V_{i,j,\mathrm{point}}}
+ \underbrace{\mathcal{O}\!\left(\frac{1}{\|q_i - q_j\|^{2}}\right)}_{V_{i,j,\mathrm{resid}}}. \qquad (2)
$$
Working with the full potential is complicated since $B_i$ is not known and the integral is not analytically known. Can we directly learn $V_{\mathrm{resid}}$ from time-series data?
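For concreteness, a minimal sketch of the split in Eq. (2): the point-mass part has the closed form above, while the residual is left as a learnable function of $(q, R)$. The plain MLP here is only illustrative of what "directly learning $V_{\mathrm{resid}}$" could look like; it is not the architecture specified later.

```python
import jax.numpy as jnp

G = 6.674e-11  # gravitational constant (SI units)

def v_point(q, m):
    """Closed-form point-mass part of Eq. (2): sum_{i<j} -G m_i m_j / ||q_i - q_j||.
    q: (N, 3) body positions, m: (N,) masses."""
    diff = q[:, None, :] - q[None, :, :]          # (N, N, 3) pairwise differences
    dist = jnp.linalg.norm(diff, axis=-1)         # (N, N) pairwise distances
    i, j = jnp.triu_indices(q.shape[0], k=1)      # unordered pairs i < j
    return jnp.sum(-G * m[i] * m[j] / dist[i, j])

def v_resid_mlp(params, q, R):
    """Hypothetical learnable residual V_resid(q, R): a plain MLP on the
    concatenated coordinates, with no pairwise structure imposed.
    params: list of (W, b) weight/bias pairs."""
    x = jnp.concatenate([q.ravel(), R.ravel()])
    for W, b in params[:-1]:
        x = jnp.tanh(W @ x + b)
    W, b = params[-1]
    return (W @ x + b).squeeze()                  # scalar correction

def v_total(params, q, R, m):
    return v_point(q, m) + v_resid_mlp(params, q, R)
```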
Classical gravity (i.e. Newtonian physics) is not the only driver of planetary motion: tidal forces and general relativity (GR) matter too. The former provide a dissipation mechanism and play critical roles in altering planetary orbits [31, 32]; the latter needs little introduction, having been demonstrated by, e.g., Mercury's precessing orbit [33]. Tidal forces depend on celestial bodies' rotations [34] and thus are a function of both $q$ and $R$. GR's effects cannot be fully characterized with the classical coordinates $q, R, p, \Pi$, but post-Newtonian approximations based purely on these coordinates are popular [35]. Can we learn both purely from data if we had no theory for either?
In addition to the scientific questions, there are also significant machine learning challenges:
Multiscale dynamics. The rigid-body correction ($V_{\mathrm{resid}}$), tidal forces, and the GR correction are all much weaker than point-mass gravity. Consequently, their effects do not manifest until long times. Thus, one challenge for learning them is that the dynamical system exhibits different behaviors over multiple timescales. It is reasonable to require long time-series data for the small effects to be learnable; meanwhile, when observations are expensive to make, the observation time step $\Delta t$ can be much longer than the smallest timescales. Can we still learn the physics in this case? We will leverage a symplectic integrator and its mild error growth over long times [7, 26] to provide a positive answer.
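To illustrate the cited property [7, 26] that motivates this choice, here is a toy comparison on a harmonic oscillator (a hypothetical illustration, unrelated to the planetary system): a symplectic leapfrog step keeps the energy error bounded over long horizons, whereas forward Euler's error grows with time.

```python
def energy(q, p):
    # toy Hamiltonian H = p^2/2 + q^2/2 (a 1D harmonic oscillator)
    return 0.5 * p * p + 0.5 * q * q

def leapfrog_step(q, p, h):
    # symplectic (Strang-split) step: half kick, full drift, half kick
    p -= 0.5 * h * q
    q += h * p
    p -= 0.5 * h * q
    return q, p

def euler_step(q, p, h):
    # non-symplectic forward Euler step
    return q + h * p, p - h * q

q1, p1 = 1.0, 0.0            # symplectic trajectory
q2, p2 = 1.0, 0.0            # non-symplectic trajectory
h, n_steps = 0.01, 50_000    # a long integration horizon
for _ in range(n_steps):
    q1, p1 = leapfrog_step(q1, p1, h)
    q2, p2 = euler_step(q2, p2, h)
print(abs(energy(q1, p1) - 0.5))  # bounded, O(h^2): small even after long times
print(abs(energy(q2, p2) - 0.5))  # grows with time (roughly by a factor e^{n h^2})
```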
Respecting the Lie group manifold. However, even having a symplectic integrator is not enough, because the position variable of the latent dynamics (i.e. the truth) stays on $SE(3)^{\otimes N}$. If the integrated solution falls off this manifold so that $R^\top R = I$ no longer holds, it is not only incorrect but likely misleading for the learning of $V(q,R)$. Popular integrators such as forward Euler, Runge-Kutta 4 (RK4), and leapfrog [1, 6, 36] unfortunately do not maintain this manifold structure.
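As a small numerical illustration of this point (a toy check, independent of the paper's experiments): integrating the rotation kinematics $\dot R = R\,\widehat\omega$ with classical RK4 lets $R$ drift off $SO(3)$, whereas a Lie-group method that updates by matrix exponentials would preserve $R^\top R = I$ to machine precision.

```python
import jax.numpy as jnp

def hat(w):
    # skew-symmetric matrix such that hat(w) @ v == cross(w, v)
    return jnp.array([[0.0, -w[2], w[1]],
                      [w[2], 0.0, -w[0]],
                      [-w[1], w[0], 0.0]])

def rk4_step(R, w, h):
    # classical RK4 on dR/dt = R @ hat(w), with a fixed body angular velocity w
    f = lambda A: A @ hat(w)
    k1 = f(R)
    k2 = f(R + 0.5 * h * k1)
    k3 = f(R + 0.5 * h * k2)
    k4 = f(R + h * k3)
    return R + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

R = jnp.eye(3)
w = jnp.array([0.3, 1.0, -0.5])  # arbitrary angular velocity for the toy check
h = 0.05
for _ in range(2000):
    R = rk4_step(R, w, h)
print(jnp.linalg.norm(R.T @ R - jnp.eye(3)))  # nonzero and growing: R has left SO(3)
```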
2.2 Learning with Lie Symplectic RNNs
Our method can be viewed as a Lie-group generalization of the seminal SRNN [6], where a good integrator, one that is both symplectic and Lie-group preserving, is employed as the recurrent block.
Lie T2: A Symplectic Lie-Group Preserving Integrator.
To construct an integrator that achieves both properties, we borrow from [20] the idea of Lie-group- and symplecticity-preserving splitting, and split our Hamiltonian as $H = H_{\mathrm{KE}} + H_{\mathrm{PE}} + H_{\mathrm{asym}}$, which contains the axial-symmetric kinetic energy, potential energy, and asymmetric kinetic-energy correction terms. This enables computing the exact integrators $\phi^{[\mathrm{KE}]}_{t}$, $\phi^{[\mathrm{PE}]}_{t}$, and $\phi^{[\mathrm{asym}]}_{t}$ (see App B for details). We then construct a 2nd-order symplectic integrator Lie T2 by applying the Strang composition scheme. To account for non-conservative forces, the corresponding non-conservative momentum update $\phi^{[\mathrm{force}]}: (p,\Pi) \leftarrow F(q,R,p,\Pi)$ is inserted in the middle of the composition [20]. This gives $\phi^{[\mathrm{Lie\,T2}]}_{h}$ for stepsize $h$ as
$$
\phi^{[\mathrm{Lie\,T2}]}_{h} := \phi^{[\mathrm{KE}]}_{h/2} \circ \phi^{[\mathrm{PE}]}_{h/2} \circ \phi^{[\mathrm{asym}]}_{h/2} \circ \phi^{[\mathrm{force}]}_{h} \circ \phi^{[\mathrm{asym}]}_{h/2} \circ \phi^{[\mathrm{PE}]}_{h/2} \circ \phi^{[\mathrm{KE}]}_{h/2} \qquad (3)
$$
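Assuming the exact sub-flows from App B are available as callables, Eq. (3) translates directly into a composition of maps; a minimal sketch (the `flows` dictionary and its keys are illustrative names):

```python
def lie_t2_step(state, h, flows):
    """One step of the Strang-composed integrator in Eq. (3).

    state: the tuple (q, R, p, Pi).
    flows: dict of split flow maps, each (state, t) -> state:
           'KE'    axially symmetric kinetic-energy flow,
           'PE'    potential-energy flow,
           'asym'  asymmetric kinetic-energy correction flow,
           'force' non-conservative momentum update (p, Pi) <- F(q, R, p, Pi).
    """
    s = flows["KE"](state, 0.5 * h)
    s = flows["PE"](s, 0.5 * h)
    s = flows["asym"](s, 0.5 * h)
    s = flows["force"](s, h)      # non-conservative update in the middle
    s = flows["asym"](s, 0.5 * h)
    s = flows["PE"](s, 0.5 * h)
    s = flows["KE"](s, 0.5 * h)
    return s
```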
A Recurrent Architecture for Nonlinear Regression. Given the simplicity of $V_{\mathrm{point}}$, we assume it is known and learn $V_{\mathrm{resid}}$ and $F$ with multi-layer perceptrons (MLPs) $V^{\theta}_{\mathrm{resid}}$ and $F^{\theta}$, without assuming any pairwise structure (see App C for discussion). We then use $\phi^{\theta,\mathrm{Lie\,T2}}$ to integrate the dynamics forward, where $\theta$ denotes the dependence on the networks. However, when the temporal spacing between observations $\Delta t$ is large, using a single $\phi^{\theta,\mathrm{Lie\,T2}}_{\Delta t}$ will result in large errors for the fast