parameter set of DFTB. This approach to regularization is problematic because it may overly bias the
training towards the reference parameters and does not prevent non-physical behaviors such as oscillation
of a trained function around the smooth form of the reference function.39 A commonly used approach for
smoothing splines applies a penalty to the magnitude of the second derivative.40,41 However, for DFTBML,
such a smoothing penalty substantially degrades performance of the models because there is no reason to
expect the second derivative to have a limited magnitude.
We instead adapt an approach from Akshay et al.,42 which is motivated by the shape of the functions
in reference parameter sets, such as those of Auorg in Figure 2. For the Hamiltonian (H1) matrix elements,
the functions decay smoothly to zero and have an upward curvature. To enforce this behavior, we apply
a “convex” penalty that constrains the second derivative of the trained potentials, evaluated on a dense grid
of 500 points, to have a physically motivated sign. For overlaps (S), there can also be an inflection point
associated with nodes in the atomic orbitals (upper panels of Figure 2). We therefore extend the convex
penalty to allow a single inflection point, whose location is optimized during training. The results indicate
that, although inclusion of an inflection point improves model performance, the results are not sensitive to
its precise location (see Section S12.3 of the Supporting Information). The magnitude of the weighting factor
for these convex penalties does not require fine tuning beyond being large enough to prevent violations of the
constraints without being so large that it leads to numerical instabilities in gradient descent optimization.
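The sign constraint described above can be illustrated with a minimal sketch. The squared-hinge form of the penalty, the function name convex_penalty, and the choice of which side of the inflection point carries which sign are assumptions made for illustration; the text only specifies that the second derivative is evaluated on a grid of 500 points and penalized when its sign is not the physically motivated one.

```python
import numpy as np

def convex_penalty(second_deriv, grid, sign=1.0, inflection=None):
    """Sum of squared sign violations of the second derivative on a grid.

    sign=+1 asks for upward curvature (f'' >= 0); sign=-1 for downward.
    If `inflection` is given, the required sign is flipped for grid points
    below it, which allows a single inflection point (hypothetical choice
    of sides, for illustration only).
    """
    second_deriv = np.asarray(second_deriv, dtype=float)
    grid = np.asarray(grid, dtype=float)
    required = np.full_like(grid, sign)
    if inflection is not None:
        required[grid < inflection] = -sign
    # Entries are positive only where the curvature has the wrong sign.
    violation = np.maximum(0.0, -required * second_deriv)
    return float(np.sum(violation ** 2))

grid = np.linspace(1.0, 5.0, 500)   # dense grid of 500 points
curvature = np.exp(-grid)           # f'' of exp(-r): positive everywhere

no_violation = convex_penalty(curvature, grid)       # upward curvature satisfied
all_violation = convex_penalty(-curvature, grid)     # wrong sign at every point
```

In training, this term would be multiplied by a weighting factor and added to the loss. Because the penalty vanishes whenever the constraint is satisfied, the weight only needs to be large enough to suppress violations, which is consistent with the insensitivity to its magnitude noted above.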
The convex penalty successfully removes oscillatory behavior (middle column of Figure 2). However, the
resulting functions exhibit non-physical, piecewise-linear behavior, which is more pronounced in the overlap
integrals but also present in the Hamiltonian matrix elements (see inset in Figure 2).
To remove this piecewise-linear behavior, we apply a “smoothing” penalty to the third derivative, based
on the sum of squares of the third derivative evaluated on a grid of 500 points. Our use of a fifth-order spline
for H1 and S is motivated by the high order needed for the spline to have a continuous third derivative. The
magnitude of the penalty is adjusted to remove the piecewise-linear behavior while minimizing degradation
of the model performance (see Section S12.4 of the Supporting Information). The short-range repulsion (R)
does not exhibit piecewise-linear behavior, so a smoothing penalty is not applied and we use a third-order
spline for R.
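As a rough illustration of the smoothing penalty, the sketch below computes the sum of squares of the third derivative on a grid of 500 points and compares a smooth function with a piecewise-linear one. It is a sketch under stated assumptions: the third derivative is estimated by finite differences rather than taken from the fifth-order spline itself, and the name smoothing_penalty and the test functions are hypothetical.

```python
import numpy as np

def smoothing_penalty(values, spacing):
    """Sum of squared third derivatives on a uniform grid.

    Assumption: the third derivative is approximated by third-order finite
    differences; a spline-based model would evaluate it from the spline.
    """
    values = np.asarray(values, dtype=float)
    third = np.diff(values, n=3) / spacing ** 3
    return float(np.sum(third ** 2))

grid = np.linspace(1.0, 5.0, 500)   # grid of 500 points
h = grid[1] - grid[0]

smooth = np.exp(-grid)              # smoothly decaying function

# Piecewise-linear stand-in: linear interpolation between 9 coarse knots.
knots = np.linspace(1.0, 5.0, 9)
kinked = np.interp(grid, knots, np.exp(-knots))

penalty_smooth = smoothing_penalty(smooth, h)
penalty_kinked = smoothing_penalty(kinked, h)   # dominated by the kinks
```

The kinked curve is penalized far more heavily than the smooth one because each change in slope produces a large third difference, which is the behavior the penalty is meant to suppress.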
It is somewhat surprising, given the highly non-physical behavior observed without regularization, that
the effects of regularization on model performance are not more dramatic (Table 1). For near-transfer, the
performance of the unregularized model (4.97 kcal/mol) is a factor of two better than the Auorg reference
model (10.55 kcal/mol). This is despite the highly oscillatory behavior of the functions and the fact that the
test data and training data have molecules with disjoint empirical formulas. This suggests coupling between