to an identity transform, which cannot preserve topology4 for large organ displacements.
Deep learning image registration (DLIR) methods6,12,17,31 are often faster and more accurate
than iterative registration methods because they directly compute the diffeomorphic trans-
formation between images in a single step instead of solving a non-linear optimization to
align every image pair. DLIR methods, which typically use a stationary velocity field (SVF)
to compute the diffeomorphic transformation, reduce the computational requirements by
restricting the search space to a set of diffeomorphisms that lie within a Lie structure, but
this restriction also limits their flexibility in handling large and complex deformations24. Furthermore,
DLIR network optimization as well as iterative registration optimization typically focuses
on minimizing an energy function composed of global smoothness and a variational intensity
regularization, which is insufficient to handle abrupt and large differences in motion occur-
ring between organs, especially at the organ boundaries13.
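As an illustration of how an SVF is commonly turned into a deformation, the following is a minimal one-dimensional sketch of scaling-and-squaring integration, which is the usual way SVF-based methods approximate the exponential of a velocity field. It is not the implementation of any method cited here: the function names `interp_linear` and `integrate_svf`, the number of squaring steps, and the use of linear interpolation on a unit-spaced grid are all assumptions made for this example.

```python
import math


def interp_linear(values, x):
    """Linearly interpolate `values` (sampled at 0..n-1) at position x, clamped to the grid."""
    n = len(values)
    x = min(max(x, 0.0), n - 1.0)
    i0 = min(int(math.floor(x)), n - 2)
    t = x - i0
    return (1.0 - t) * values[i0] + t * values[i0 + 1]


def integrate_svf(velocity, num_steps=6):
    """Approximate the displacement u of phi = id + u = exp(v) by scaling and squaring.

    `velocity` is a hypothetical 1D stationary velocity field sampled on a unit grid.
    """
    # Scaling: start from the small displacement v / 2**num_steps.
    scale = 1.0 / (2 ** num_steps)
    disp = [v * scale for v in velocity]
    # Squaring: phi <- phi o phi, i.e. u(x) <- u(x) + u(x + u(x)).
    for _ in range(num_steps):
        disp = [u + interp_linear(disp, i + u) for i, u in enumerate(disp)]
    return disp


if __name__ == "__main__":
    n = 21
    # Smooth velocity that vanishes at both ends of the grid (illustrative values only).
    v = [2.0 * math.sin(math.pi * i / (n - 1)) for i in range(n)]
    u = integrate_svf(v)
    phi = [i + ui for i, ui in enumerate(u)]
    # In 1D, topology preservation means the mapped positions stay strictly ordered.
    print(all(b > a for a, b in zip(phi, phi[1:])))
```

Because every deformation produced this way is the exponential of a single velocity field, the reachable transformations are confined to that one-parameter family, which is the restriction on flexibility noted above.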
Compositional DIR strategies that extract the diffeomorphic transformation in stages, such as
those used for non-sliding and sliding organs13, adaptive anisotropic filtering of the incrementally
refined deformation vector field (DVF)25, and cascaded network formulations4,10,35,
are more robust than single-step methods for handling large deformations while still
retaining the SVF assumption, thus providing a computational speed-up compared to
time-dependent diffeomorphic registration methods. However, cascaded DLIR methods are
limited by the memory requirements and thus require sequential training of individual net-
works, which increases training time. Additionally, because the networks in the cascade
are trained one after another, there is no guarantee that the deformations modeled in the
prior steps will be retained in the future steps. The recurrent registration method (R2N2),
which computes local parameterized Gaussian deformations27, has demonstrated the ability to
model large anatomic deformations occurring in a respiratory cycle. However, the use of
local parametrization restricts the flexibility of this approach to handle large and contin-
uous deformations. R2N2 was shown to be less accurate than a progressive registration
method computing a continuous deformation flow field for quantifying longitudinal tumor
volume changes22. Our approach improves on these works by computing topology-preserving
(quantified by non-negative Jacobian determinant) diffeomorphic deformations and
multi-organ segmentations using a progressive joint registration-segmentation (ProRSeg) approach,
wherein deformation flow computed at a given step is conditioned on the prior step using a
3D convolutional long short-term memory network (CLSTM)28. Because the DVF is
modeled as a continuous and differentiable flow field, it is invertible, thus ensuring diffeomorphic
transformations. ProRSeg is optimized using multi-task learning of a registration and
a segmentation network, which allows it to leverage the implicit backpropagated errors from
the two networks. Multi-task networks have previously been shown to produce more accurate
normal tissue segmentation than individually trained DL networks7,12,17,31.
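The Jacobian determinant criterion mentioned above can be made concrete with a small sketch. The example below estimates the Jacobian determinant of a two-dimensional mapping phi(x, y) = (x + ux, y + uy) with central differences and counts grid points where it is negative, that is, where the deformation folds. The function names `jacobian_determinants` and `count_folded`, the unit grid spacing, the restriction to interior points, and the two-dimensional setting are assumptions of this sketch, not details taken from ProRSeg.

```python
def jacobian_determinants(ux, uy):
    """Jacobian determinant of phi(x, y) = (x + ux, y + uy) at interior grid points.

    `ux` and `uy` are displacement components stored as lists of rows,
    with x along columns and y along rows, on a unit-spaced grid.
    """
    rows, cols = len(ux), len(ux[0])
    dets = []
    for r in range(1, rows - 1):
        row = []
        for c in range(1, cols - 1):
            # Central differences of each displacement component.
            dux_dx = (ux[r][c + 1] - ux[r][c - 1]) / 2.0
            dux_dy = (ux[r + 1][c] - ux[r - 1][c]) / 2.0
            duy_dx = (uy[r][c + 1] - uy[r][c - 1]) / 2.0
            duy_dy = (uy[r + 1][c] - uy[r - 1][c]) / 2.0
            # det(I + grad u) for the 2x2 case.
            row.append((1.0 + dux_dx) * (1.0 + duy_dy) - dux_dy * duy_dx)
        dets.append(row)
    return dets


def count_folded(dets):
    """Count points with a negative determinant (locally orientation-reversing)."""
    return sum(1 for row in dets for d in row if d < 0.0)


if __name__ == "__main__":
    size = 5
    zeros = [[0.0] * size for _ in range(size)]
    # Identity mapping: no folding expected.
    print(count_folded(jacobian_determinants(zeros, zeros)))
    # Reflection along x (ux = -2x): every interior point is folded.
    flip = [[-2.0 * c for c in range(size)] for _ in range(size)]
    print(count_folded(jacobian_determinants(flip, zeros)))
```

A count of zero over the whole field is the kind of check that the non-negative Jacobian determinant criterion expresses; a three-dimensional version would use the 3x3 determinant in the same way.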
ProRSeg is most similar to a prior registration-segmentation method that we developed for
tracking lung tumors22 from cone-beam CT (CBCT) images. However, ProRSeg accounts for
both respiratory and large organ shape variations, while our prior work was only concerned
with tracking linearly shrinking tumors during radiation treatment. ProRSeg aligns images
with large deformations by computing a smooth interpolated sequence of dense deformation
flows using 3D CLSTM28 networks implemented in the encoder layers of the registration and
segmentation networks. The CLSTM explicitly enforces consistency between the individ-