
2. Methodologies
2.1. ODE-Net for crowd dynamics
We start by introducing the ODE-Net from a deep neural
network perspective. Traditional deep neural networks, such
as residual networks, build complicated transformations by
composing a sequence of transformations to a hidden state:
$$\frac{\mathbf{z}_{t+\delta t} - \mathbf{z}_t}{\delta t} = h_t(\mathbf{z}_t), \quad \delta t = 1, \tag{2.1}$$
where $h_t(\mathbf{z}_t)$ is a function parameterized by a neural
network. These iterative updates can be interpreted as an Euler
discretization of a continuous transformation. In contrast to
traditional deep neural networks, where $\delta t = 1$ is fixed,
ODE-Net [14] introduced a continuous version in which $\delta t \to 0$.
As a result, Eq. (2.1) becomes
$$\frac{d\mathbf{z}(t)}{dt} = h(\mathbf{z}(t), t). \tag{2.2}$$
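The relation between the discrete update (2.1) and the continuous model (2.2) can be illustrated numerically. The sketch below is not part of the original method: it assumes a toy scalar field $h(z, t) = -z$, chosen only because its exact solution is known, and the function name `euler` is hypothetical. With $\delta t = 1$ each step is a residual-style update $z + h(z)$; as $\delta t$ shrinks, the result approaches the solution of the ODE.

```python
import math

def h(z, t):
    # Toy vector field, assumed for illustration only (not a learned network).
    return -z

def euler(z0, t0, t1, n_steps):
    """Forward Euler: z <- z + dt * h(z, t), i.e. Eq. (2.1) with step size dt."""
    dt = (t1 - t0) / n_steps
    z, t = z0, t0
    for _ in range(n_steps):
        z = z + dt * h(z, t)
        t += dt
    return z

exact = math.exp(-1.0)              # solution of dz/dt = -z at t = 1, z(0) = 1
coarse = euler(1.0, 0.0, 1.0, 1)    # a single residual-style step, dt = 1
fine = euler(1.0, 0.0, 1.0, 1000)   # dt = 0.001, close to the continuous limit
print(coarse, fine, exact)
```

The single coarse step is far from the exact value, while the fine discretization agrees with it to about three decimal places, which is the sense in which Eq. (2.2) is the limit of Eq. (2.1).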
In this continuous framework, training the network amounts to
learning the function $h(\mathbf{z}, t)$; we discuss next how this
function is learned.
First, we assume that the function $h(\mathbf{z}, t)$ is represented
by a neural network $h_\theta(\mathbf{z}, t)$ parameterized by $\theta$,
and that we have observed data at $t_0$ and $t_1$, denoted by
$\hat{\mathbf{z}}(t_0)$ and $\hat{\mathbf{z}}(t_1)$ respectively.
Starting from the input layer $\hat{\mathbf{z}}(t_0)$, the output
layer $\mathbf{z}(t_1)$ can be defined by the solution to this ODE
initial value problem at time $t_1$:
$$\mathbf{z}(t_1) = \hat{\mathbf{z}}(t_0) + \int_{t_0}^{t_1} h_\theta(\mathbf{z}(t), t)\, dt, \tag{2.3}$$
and the time from $t_0$ to $t_1$ is referred to as the integration
time of the data point. Eq. (2.3) can be computed using an
off-the-shelf differential equation solver, and we write it as
$$\mathbf{z}(t_1) = \mathrm{ODESolve}(\hat{\mathbf{z}}(t_0), h_\theta, t_0, t_1). \tag{2.4}$$
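As a rough illustration of what an ODESolve call does, the sketch below implements a fixed-step fourth-order Runge-Kutta integrator. It is only a stand-in under stated assumptions: the text does not specify which solver is used, and the names `odesolve` and `rotation`, as well as the test field itself, are hypothetical.

```python
import numpy as np

def odesolve(z0, h, t0, t1, n_steps=100):
    """Fixed-step RK4 integration of dz/dt = h(z, t) from t0 to t1."""
    z = np.asarray(z0, dtype=float)
    dt = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        k1 = h(z, t)
        k2 = h(z + 0.5 * dt * k1, t + 0.5 * dt)
        k3 = h(z + 0.5 * dt * k2, t + 0.5 * dt)
        k4 = h(z + dt * k3, t + dt)
        z = z + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        t += dt
    return z

def rotation(z, t):
    # Hypothetical test field whose exact flow is a rotation of the plane.
    return np.array([-z[1], z[0]])

# Integrating over a quarter period should map (1, 0) to approximately (0, 1).
z1 = odesolve([1.0, 0.0], rotation, 0.0, np.pi / 2)
print(z1)
```

In practice an adaptive solver would normally replace the fixed step count; the fixed-step version is used here only to keep the example short.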
The network parameters $\theta$ are computed by iteratively
minimizing a prescribed loss function
$L(\hat{\mathbf{z}}(t_1), \mathbf{z}(t_1))$, which measures the
difference between the observed data $\hat{\mathbf{z}}(t_1)$ and
the model prediction $\mathbf{z}(t_1)$. An interesting feature of this
method is that the gradient of the loss function with respect
to $\theta$ can be computed using the adjoint sensitivity method,
which is more memory efficient than directly backpropagating
through the integrator [14].
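For reference, the adjoint sensitivity method as formulated in [14] can be summarized as follows. The adjoint state $\mathbf{a}(t)$ is notation borrowed from [14] and is not otherwise used in this paper.

```latex
\mathbf{a}(t) = \frac{\partial L}{\partial \mathbf{z}(t)}, \qquad
\frac{d\mathbf{a}(t)}{dt} = -\mathbf{a}(t)^{T}\,
  \frac{\partial h_\theta(\mathbf{z}(t), t)}{\partial \mathbf{z}}, \qquad
\frac{dL}{d\theta} = -\int_{t_1}^{t_0} \mathbf{a}(t)^{T}\,
  \frac{\partial h_\theta(\mathbf{z}(t), t)}{\partial \theta}\, dt .
```

The adjoint equation is integrated backward in time from $t_1$ to $t_0$ together with $\mathbf{z}(t)$, so the intermediate states of the forward solve do not need to be stored, which is the source of the memory saving.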
As discussed earlier, ODE-Net allows us to construct a
continuous-time model for the crowd dynamics. Namely, let
$\mathbf{z}(t)$ represent the state of the crowd at time $t$, so that
Eq. (2.2) becomes the governing equation of the crowd dynamics.
Given observations $\hat{\mathbf{z}}(t)$ of the crowd flow, we can use
the training process described above to learn the function
$h(\mathbf{z}, t)$ (or, more precisely, its neural network
representation $h_\theta(\mathbf{z}, t)$).
Though the application of ODE-Net to crowd dynamics is
conceptually straightforward, the implementation is highly
challenging. When applied to crowd dynamics, $\mathbf{z}$ represents
the state of motion of the entire crowd, which may consist of
a large number of particles (i.e., pedestrians; throughout the
paper we use these two terms interchangeably). It follows that
$\mathbf{z}$ can be very high dimensional, since its dimensionality
is proportional to the size of the crowd. In this case, learning
the high-dimensional function $h(\mathbf{z}, t)$ can be prohibitively
difficult: it may require a massive amount of training data,
which may not be available in practice, and the computational
cost of training such a complex model can be exceedingly high.
In addition, the formulation described above requires the
dimensionality of $\mathbf{z}$ to be fixed, which is often
unrealistic: in most situations people enter or leave the scene
of interest, so the dimensionality of $\mathbf{z}$ varies over time.
More importantly, because the dimensionality of $\mathbf{z}$ is fixed,
a model learned from the data can only be used to predict systems
with the same number of particles, which seriously limits the
usefulness of the method. To address these issues, we propose to
incorporate the social force (SF) concept into the ODE-Net method,
as detailed in Section 2.2.
2.2. Social-force based ODE-Net
Suppose that we consider a crowd of $N$ particles, so that the state
variable can be written as $\mathbf{z} = (z_1, ..., z_N)^T$, where $z_n$
represents the state of motion of particle $n$ for each
$n = 1, ..., N$. In particular, we have $z_n = (x_n, v_n)$, where $x_n$
and $v_n$ are respectively the position and the velocity of particle
$n$. We also introduce the notations $\mathbf{x} = (x_1, ..., x_N)^T$
and $\mathbf{v} = (v_1, ..., v_N)^T$. Now, according to Newton's
second law, model (2.2) can be rewritten as
$$\begin{bmatrix} \dot{\mathbf{x}} \\ \dot{\mathbf{v}} \end{bmatrix} = \begin{bmatrix} \mathbf{v} \\ \mathbf{M}^{-1}\mathbf{f} \end{bmatrix}, \tag{2.5}$$
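where
$$\mathbf{f}(\mathbf{x}, \mathbf{v}) = \begin{bmatrix} f_1(\mathbf{x}, \mathbf{v}) \\ \vdots \\ f_N(\mathbf{x}, \mathbf{v}) \end{bmatrix}$$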
with $f_n(\mathbf{x}, \mathbf{v})$ being the force applied to particle
$n$ and $\mathbf{M} = \mathrm{diag}[m_1, ..., m_N]$ with $m_n$ being the
“mass” of particle $n$. With formulation (2.5), the original ODE-Net
problem is turned into learning the force function
$\mathbf{f}(\mathbf{x}, \mathbf{v})$ and estimating the mass matrix
$\mathbf{M}$, where learning $\mathbf{f}(\mathbf{x}, \mathbf{v})$ is by
far the more challenging task.
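The structure of Eq. (2.5) can be sketched in a few lines. The example below is an illustration under assumptions that are not stated in the text: positions and velocities are taken to be two-dimensional, and `crowd_rhs` and `toy_force` are hypothetical names, with `toy_force` serving as a placeholder for the force function that would be learned.

```python
import numpy as np

def crowd_rhs(x, v, force, masses):
    """Right-hand side of Eq. (2.5): returns (dx/dt, dv/dt).

    x, v   : arrays of shape (N, 2), positions and velocities (2-D assumed)
    force  : callable (x, v) -> array of shape (N, 2), stands in for f(x, v)
    masses : array of shape (N,), the diagonal entries of M
    """
    dx = v
    dv = force(x, v) / masses[:, None]  # M^{-1} f, applied per pedestrian
    return dx, dv

def toy_force(x, v):
    # Placeholder force (an assumption): relax every velocity towards (1, 0).
    return np.array([1.0, 0.0]) - v

x = np.zeros((3, 2))
v = np.zeros((3, 2))
masses = np.array([1.0, 2.0, 4.0])
dx, dv = crowd_rhs(x, v, toy_force, masses)
print(dv)
```

With identical forces, a larger $m_n$ yields a smaller acceleration, which matches the role of the "mass" as a measure of how hard it is to change a pedestrian's velocity.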
It is important to note that in such problems $\mathbf{f}$ and
$\mathbf{M}$ should not be understood as the usual physical forces and
masses respectively. Rather, following the assumption of the
social force model [19], $\mathbf{f}$ represents the
socio-psychological forces driven by personal motivations and
environmental constraints, and the mass matrix $\mathbf{M}$
characterizes how easy or difficult it is to change the velocity of
each pedestrian. At this point the force field
$\mathbf{f}(\mathbf{x}, \mathbf{v})$ is still a high-dimensional
function for large crowd size $N$, and further simplification is
needed to make the learning problem feasible.
We now introduce further assumptions to simplify the
force function. First we assume that the total force applied
to each particle/pedestrian consists of two parts:
$$f_n = f_n^{\mathrm{mot}} + f_n^{\mathrm{int}}, \tag{2.6}$$
where $f_n^{\mathrm{mot}}$ is the force generated by the personal
motivation to reach a certain desired state of motion, and
$f_n^{\mathrm{int}}$ is the force caused by the interactions with other
particles and the environment (e.g., obstacles). The total
interaction force is further written as
$$f_n^{\mathrm{int}} = \sum_{j (\neq n) = 1}^{N} f_{nj}^{p} + \sum_{w=1}^{W} f_{nw}^{o}, \tag{2.7}$$
where $f_{nj}^{p}$ is the interaction force between pedestrians $n$
and $j$, and $f_{nw}^{o}$ is the interaction force between pedestrian
$n$ and the $w$-th obstacle (assuming there are $W$ obstacles in
total). We now need to deal with both the motivation and the
interaction forces.
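The decomposition in Eq. (2.7) can be sketched as a pair of sums. The code below is a minimal illustration, not the paper's model: the arguments passed to the pairwise terms, the exponential repulsion used for `f_ped` and `f_obs`, and all function names are assumptions made for the example.

```python
import numpy as np

def interaction_forces(x, v, obstacles, f_ped, f_obs):
    """Eq. (2.7): for each pedestrian n, sum f_ped over j != n
    and f_obs over all obstacles."""
    N = x.shape[0]
    f_int = np.zeros_like(x)
    for n in range(N):
        for j in range(N):
            if j != n:
                f_int[n] += f_ped(x[n], v[n], x[j], v[j])
        for o in obstacles:
            f_int[n] += f_obs(x[n], v[n], o)
    return f_int

def repulsion(d):
    # Placeholder force (an assumption): decays exponentially with distance.
    r = np.linalg.norm(d)
    return np.exp(-r) * d / r

def f_ped(xn, vn, xj, vj):
    return repulsion(xn - xj)

def f_obs(xn, vn, o):
    return repulsion(xn - o)

obstacles = [np.array([0.0, 5.0])]
for N in (2, 5):  # the same pairwise rule is reused for any crowd size
    x = np.stack([np.arange(N, dtype=float), np.zeros(N)], axis=1)
    v = np.zeros((N, 2))
    f = interaction_forces(x, v, obstacles, f_ped, f_obs)
    print(N, f.shape)
```

Because the same pairwise functions are evaluated for every pair, the function works unchanged when the number of pedestrians varies, which is the property the fixed-dimensional formulation of Section 2.1 lacks.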
We first assume that the personal motivation force depends
on the particle’s state of motion:
$$f_n^{\mathrm{mot}} = f_\theta^{\mathrm{mot}}(x_n, v_n, d), \tag{2.8}$$
where $d$ represents some environmental factors that also affect
the motivation force, and $f_\theta^{\mathrm{mot}}(\cdot)$ is an
artificial neural network parameterized by $\theta$. Next we consider
the interaction force $f^{\mathrm{int}}$. To this end, it is common to
assume that pedestrians psychologically tend to keep a distance between