
following, we consider only swish activation functions [53]. They correspond to a non-monotonic version of the classical ReLU function max(0, x), with σ(x) = x/(1 + e^{-x}). Other choices are of course possible, such as ReLU or tanh, but in practice the swish nonlinearity gives very good results and, at the same time, smoother representations. The last layer N_out is in general linear. The depth L and c are important parameters. The expressivity of N improves as L and c increase, but at the same time the network becomes harder to train, i.e. finding near-optimal values of θ becomes more difficult.
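For concreteness, here is a minimal sketch of such a feedforward parametrization with swish activations, written in plain NumPy; the function names, layer sizes, initialization, and the reading of c as the hidden-layer width are illustrative assumptions rather than the implementation used in this work.

```python
import numpy as np

def swish(x):
    # swish nonlinearity: sigma(x) = x / (1 + exp(-x)),
    # a smooth, non-monotonic variant of ReLU(x) = max(0, x)
    return x / (1.0 + np.exp(-x))

def init_mlp(sizes, rng):
    # one (weights, bias) pair per layer; sizes = [d_in, c, ..., c, d_out],
    # where the number of hidden layers plays the role of the depth L and
    # c is taken here as the hidden-layer width (an assumption on notation)
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, z):
    # hidden layers use swish; the last layer N_out is linear
    for W, b in params[:-1]:
        z = swish(z @ W + b)
    W, b = params[-1]
    return z @ W + b

rng = np.random.default_rng(0)
params = init_mlp([2, 32, 32, 1], rng)       # input (t, x), scalar output
print(mlp(params, np.array([[0.5, 0.1]])))   # evaluate N(t, x; theta)
```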
As stated above, we now replace the main and auxiliary fields (u, v, w) in (11) by their NN parametrizations u → N_u, v → N_v and w → N_w, with parameters θ_u, θ_v, θ_w. The cost functional is then minimized with respect to these parameters: we have thus performed a nonlinear projection onto the neural network space. This is indeed a very general methodology in machine learning. The cost functional is written as
\[
C[\theta_u, \theta_v, \theta_w] = \gamma_g\, C_g[\theta_v, \theta_w] + \gamma_{\mathrm{arc}}\, C_{\mathrm{arc}}[\theta_u] + C_{\mathrm{constraints}} + \big(\gamma_{\mathrm{bcs}}\, C_{\mathrm{BCs}}\big). \tag{12}
\]
The first term corresponds to the geometrical action in its continuous form:
\[
C_g[\theta_v, \theta_w] = \int_0^1 \Big( \|N_v\|_\chi\, \|N_w\|_\chi - \langle N_v, N_w \rangle_\chi \Big)\, d\tau,
\qquad
\|N\|_\chi \equiv \left( \int_D N(t, x)\, (\chi N)(t, x)\, dx \right)^{1/2}. \tag{13}
\]
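For concreteness, a possible discretization of (13) is sketched below: the τ-integral is replaced by a simple quadrature on a uniform grid and χ by a symmetric positive-definite matrix acting on grid values, so that the discrete ⟨·,·⟩_χ remains a genuine inner product. The function names, grid sizes and the choice χ = identity in the toy example are assumptions made for illustration, not the discretization used here.

```python
import numpy as np

def inner_chi(f, g, chi, dx):
    # discrete counterpart of <f, g>_chi = int_D f(x) (chi g)(x) dx;
    # chi is a symmetric positive-definite matrix on grid values
    return dx * f @ (chi @ g)

def norm_chi(f, chi, dx):
    return np.sqrt(inner_chi(f, f, chi, dx))

def geometric_action(Nv, Nw, chi, dx, dtau):
    # Nv, Nw: arrays of shape (n_tau, n_x) holding the fields on a (tau, x)
    # grid.  Discrete version of (13); each integrand term
    # ||Nv|| ||Nw|| - <Nv, Nw> is >= 0 by the Cauchy-Schwarz inequality.
    integrand = np.array([
        norm_chi(v, chi, dx) * norm_chi(w, chi, dx) - inner_chi(v, w, chi, dx)
        for v, w in zip(Nv, Nw)
    ])
    return dtau * integrand.sum()

# toy example with chi = identity, i.e. the plain L2 -> l2 case
n_tau, n_x = 16, 32
dx, dtau = 1.0 / n_x, 1.0 / n_tau
rng = np.random.default_rng(1)
Nv = rng.normal(size=(n_tau, n_x))
Nw = rng.normal(size=(n_tau, n_x))
print(geometric_action(Nv, Nw, np.eye(n_x), dx, dtau))  # always >= 0
```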
An important issue is to ensure that when ||·||_χ is discretized, one still retains the properties of a norm, e.g. positive definiteness and the Cauchy–Schwarz inequality. This is in general trivial (e.g. L² → ℓ²), but it can be more delicate if χ is complicated: a nontrivial case is discussed in subsection 3.6. We now describe the other functionals in the next subsections.
3.2. Boundary value ansatz
The cost functional (12) must take into account the boundary value (BV) problem, namely that N_u(0, x) = a(x) and N_u(1, x) = b(x), where a, b are given. There are two strategies: either imposing these constraints explicitly by penalisation, or considering some ansatz for N_u which automatically includes them. The second choice means that one uses the general ansatz
\[
U(t, x; \theta_u) = \Lambda_a(t)\, a(x) + \Lambda_b(t)\, b(x) + \Lambda_u(t)\, N_u(t, x; \theta_u), \tag{14}
\]
where Λ_a(0) = 1, Λ_a(1) = 0, Λ_b(0) = 0, Λ_b(1) = 1 and Λ_u(0) = Λ_u(1) = 0. In addition, the zeros of the functions Λ must be only those required at the boundaries τ = 0 and τ = 1.
We then replace N_u in (12) by U whenever it is explicitly needed. In practice, we use Λ_a(t) = 1 − t, Λ_b(t) = t and Λ_u(t) = t(1 − t), mimicking a double Taylor expansion. We do not claim this choice is optimal, as many others would work, but it gives very good results in all the situations we have encountered. It is preferred over the first approach (BV penalisation) when it is difficult for the system to relax onto a and b.
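As an illustration, a minimal sketch of the ansatz (14) with the choice Λ_a(t) = 1 − t, Λ_b(t) = t, Λ_u(t) = t(1 − t) is given below; the callable Nu stands for any network parametrization (such as the MLP sketched earlier), and the boundary states a, b of the toy check are arbitrary.

```python
import numpy as np

def bv_ansatz(t, x, a, b, Nu):
    # ansatz (14): U(t, x) = Lambda_a(t) a(x) + Lambda_b(t) b(x)
    #                        + Lambda_u(t) Nu(t, x),
    # with Lambda_a = 1 - t, Lambda_b = t, Lambda_u = t (1 - t),
    # so that U(0, x) = a(x) and U(1, x) = b(x) hold exactly for any Nu
    return (1.0 - t) * a(x) + t * b(x) + t * (1.0 - t) * Nu(t, x)

# toy check of the boundary values
a = lambda x: np.sin(np.pi * x)        # left boundary state a(x)
b = lambda x: np.cos(np.pi * x)        # right boundary state b(x)
Nu = lambda t, x: np.ones_like(x)      # placeholder network output
x = np.linspace(0.0, 1.0, 11)
assert np.allclose(bv_ansatz(0.0, x, a, b, Nu), a(x))
assert np.allclose(bv_ansatz(1.0, x, a, b, Nu), b(x))
```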
3.3. Arclength condition
Although the geometrical action does not depend on the chosen time parametrization, due to its homogeneity, it is important in practice to restrict the problem. As a matter of fact, in the cost-functional landscape there is an infinity of possible solutions, each having its own parametrization. Fixing an arclength condition, say ||u̇|| = c for all s ∈ [0, 1], is just a matter of convenience. More importantly, it stabilizes the gradient search by preventing the NNs from drifting towards ill-conditioned parametrizations, especially in the context of PDEs. The use of a small penalisation parameter γ_arc ≪ 1 is enough to prevent the NNs from exploring extreme regions of the landscape. The penalty constraint, after straightforward algebra, is
\[
C_{\mathrm{arc}}[\theta_u] = \int_0^1 \|\dot U\|^2\, ds - \left( \int_0^1 \|\dot U\|\, ds \right)^2 \;\geq\; 0, \tag{15}
\]
where U takes the form (14); the nonnegativity in (15) follows from the Cauchy–Schwarz inequality applied to ||U̇|| and the constant function 1 on [0, 1], with equality exactly when ||U̇|| is constant. Note that the norm used here is user-defined rather than the actual χ or χ^{-1} norm.
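A possible discrete version of (15) is sketched below, with the user-defined norm taken to be a plain discrete l2 norm on the spatial grid and the s-derivative approximated by finite differences; names and grid choices are illustrative assumptions.

```python
import numpy as np

def trapz(y, dh):
    # trapezoidal rule on a uniform grid with spacing dh
    return dh * (0.5 * y[0] + y[1:-1].sum() + 0.5 * y[-1])

def arclength_penalty(U, ds, dx):
    # U: array of shape (n_s, n_x) holding the ansatz (14) on an (s, x) grid.
    # Discrete version of (15): int ||dU/ds||^2 ds - (int ||dU/ds|| ds)^2,
    # which is >= 0 by the Cauchy-Schwarz inequality and vanishes exactly
    # when the "speed" ||dU/ds|| is constant along the path.
    dUds = np.gradient(U, ds, axis=0)              # finite differences in s
    speed = np.sqrt(dx * np.sum(dUds**2, axis=1))  # ||dU/ds|| at each s
    return trapz(speed**2, ds) - trapz(speed, ds)**2

# toy check: ~0 for a constant-speed path, positive otherwise
n_s, n_x = 65, 33
s = np.linspace(0.0, 1.0, n_s)[:, None]
x = np.linspace(0.0, 1.0, n_x)[None, :]
ds, dx = 1.0 / (n_s - 1), 1.0 / (n_x - 1)
print(arclength_penalty(s * np.sin(np.pi * x), ds, dx))     # ~ 0
print(arclength_penalty(s**2 * np.sin(np.pi * x), ds, dx))  # > 0
```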