our proposed framework achieves a registration accuracy of
1.83 ± 1.16 mm with a high success ratio of 90.1% on real
X-ray images, a 23.9% increase compared to reference
annotation-free techniques.
2. Related Work
We focus our discussion of related work on rigid 2D/3D
registration, covering both optimization-based and
learning-based registration algorithms. For unsupervised
domain adaptation, we broadly discuss methods applied to
medical imaging tasks.
Optimization-Based 2D/3D Registration The problem
of 2D/3D registration for interventional image fusion has
been extensively researched, with comprehensive reviews
of the techniques available [27, 29]. Due to the non-convex
nature of the 2D/3D registration problem, global
optimization [9, 13, 35] is required to reach an optimal
solution. However, the high computational cost of global
optimization-based techniques limits their interventional
applicability. Faster techniques based on local
optimization [46, 33, 30, 12, 42] rely on image similarity
measures, making them highly dependent on a good
initialization.
The Point-to-Plane Correspondence (PPC) constraint was
proposed [53, 52, 51] as a more robust alternative for
computing the 3D motion from the 2D misalignment visible
between the 2D image and the forward projection of the 3D
volume. PPC-based techniques significantly improve
registration accuracy and robustness compared to other
optimization-based techniques. Extensions of the PPC-based
technique have been proposed for the multi-view
scenario [40], and hybrid learning-based solutions improve
the robustness significantly [39]. Recently, a multi-level
optimization-based technique [25] with the normalized
gradient field as image similarity metric was proposed,
showing further improvement in registration accuracy.
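As a rough illustration of why similarity-driven local optimization depends on a good initialization, the following toy sketch registers a synthetic volume to a target projection by greedy hill climbing over integer in-plane translations. All names (`project`, `ncc`, `local_register`), the parallel-beam projection, and the search scheme are illustrative simplifications, not any cited method:

```python
import numpy as np

def project(volume, tx, ty):
    """Toy parallel-beam forward projection: integer in-plane shift of the
    volume followed by summation along the depth axis (a stand-in DRR)."""
    shifted = np.roll(np.roll(volume, tx, axis=0), ty, axis=1)
    return shifted.sum(axis=2)

def ncc(a, b):
    """Normalized cross-correlation, a common intensity similarity measure."""
    a = (a - a.mean()) / (a.std() + 1e-8)
    b = (b - b.mean()) / (b.std() + 1e-8)
    return (a * b).mean()

def local_register(volume, target, init=(0, 0), iters=50):
    """Greedy hill climbing over integer translations: repeatedly accept
    any neighboring pose that improves similarity, stop at a local optimum."""
    tx, ty = init
    best = ncc(project(volume, tx, ty), target)
    for _ in range(iters):
        moved = False
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            score = ncc(project(volume, tx + dx, ty + dy), target)
            if score > best:
                best, tx, ty, moved = score, tx + dx, ty + dy, True
        if not moved:
            break  # no neighbor improves the similarity: local optimum
    return tx, ty, best

vol = np.zeros((32, 32, 32))
vol[10:20, 12:22, 8:24] = 1.0                # simple synthetic "anatomy"
target = project(vol, 3, -2)                 # ground-truth pose: (3, -2)
tx, ty, score = local_register(vol, target)  # recovers (3, -2) from (0, 0)
```

With a small initial misalignment the similarity landscape is locally monotone and the search converges; a distant initialization would stall in a spurious local optimum, which is the limitation the PPC-based and global techniques above address.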
Learning-Based 2D/3D Registration Initially, learning-based
techniques were aimed at improving the computational
efficiency [32] and robustness [31, 26, 39, 14, 8] of
optimization-based techniques. DL-based techniques
significantly improve the robustness to initialization
and content mismatch [39, 26, 31]. End-to-end
DL-driven registration [20] has shown improved robust-
ness compared to other learning-based methods [39, 19,
41], while also matching the registration accuracy of the
optimization-based techniques [51], with significant
improvement in computational efficiency. Recently, fully
automatic DL-based registration methods that perform both
initialization and registration have been
proposed [6, 13, 11]. Comprehensive reviews of
learning-based medical image registration [15] and of the
impact of learning-based 2D/3D registration on
interventional applications [47] are available. Advances in
DL-based 2D/3D registration techniques have been driven by
supervised training [20, 31, 26]. Variations in imaging
protocol, device manufacturer, anatomy, and
intervention-specific settings alter the appearance of the
acquired images significantly, hindering the adoption of
DL-based 2D/3D registration techniques in interventional
scenarios. Attempts have been made to reduce the number of
annotated samples required using paired domain adaptation
techniques [58, 59]. Simulated X-ray projections generated
from the CT volume remove the need for paired annotated
data. However, this introduces a domain gap due to the
variations between real and simulated X-ray projections.
Unsupervised domain adaptation is required to minimize the
resulting drop in performance while not requiring annotated
datasets.
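The simulated X-ray projections in question are typically digitally reconstructed radiographs (DRRs) computed from the CT volume. A minimal sketch, assuming a parallel-beam geometry and a simple Hounsfield-unit-to-attenuation conversion (the function name and constants are illustrative, not taken from any cited work):

```python
import numpy as np

def simulate_xray(ct_hu, spacing_mm=1.0, mu_water=0.02):
    """Toy DRR: convert Hounsfield units to linear attenuation (1/mm),
    integrate along parallel rays (here, the depth axis), and apply the
    Beer-Lambert law to obtain the transmitted intensity I/I0."""
    mu = np.clip(mu_water * (1.0 + ct_hu / 1000.0), 0.0, None)
    line_integral = mu.sum(axis=2) * spacing_mm
    return np.exp(-line_integral)

# Synthetic CT: air everywhere (-1000 HU), a water block (0 HU) in the middle.
ct = np.full((8, 8, 8), -1000.0)
ct[2:6, 2:6, 2:6] = 0.0
drr = simulate_xray(ct)  # rays through air stay ~1.0; the block attenuates
```

Real renderers trace rays through the projective C-arm geometry rather than summing along an axis, and the residual gap between such simulations and real X-ray images is exactly the domain gap discussed next.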
Unsupervised Domain Adaptation The domain gap introduced
by the use of simulated data is bridged either
by improving the realism of the simulated data [54, 48] or
by performing unsupervised domain adaptation to minimize
the domain gap [1, 7, 61, 36, 49, 16, 45, 17]. The use of
simulated X-ray projections is increasing in training DL-
based solutions [11, 3, 57, 55, 60, 43] for various medi-
cal imaging applications. DeepDRR [48] aims to bridge
the domain gap by rendering realistic simulated X-ray
projections that are closer to real X-ray images. Domain
Randomization [45] was recently proposed to address the
domain gap in learning-based 2D/3D registration
networks [11]. Multiple different styles of simulated X-ray
images are used during training, allowing the network to
be robust to style variations encountered during inference.
However, a patient-specific retraining step is required on top
of domain randomization [11]. Unsupervised domain adap-
tation for CNN-based 6D pose regression of X-ray images
was proposed in [60]. However, the evaluation ignores the
use of surgical tools and content mismatch, which are
crucial for interventional applications. Generative
Adversarial Networks (GANs) [1, 57, 55, 60] and adversarial
feature
adaptation techniques [22, 5, 60, 50] have shown promis-
ing results for bridging the domain gap in medical imag-
ing tasks like segmentation [57], reconstruction [55], pose
regression [60], depth estimation [22] and multi-modality
learning [50, 5].
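To make the domain-randomization idea concrete, the sketch below perturbs the style of a simulated projection during training so that a downstream network never overfits to one rendering style. The function name, the chosen perturbations, and their ranges are illustrative assumptions, not the settings of [45] or [11]:

```python
import numpy as np

def randomize_style(img, rng):
    """Apply one random style perturbation: gamma, linear
    contrast/brightness, and additive Gaussian noise. Ranges are
    illustrative, not taken from any cited paper."""
    out = np.clip(img, 0.0, 1.0)
    out = out ** rng.uniform(0.5, 2.0)                            # gamma
    out = rng.uniform(0.7, 1.3) * out + rng.uniform(-0.1, 0.1)    # contrast
    out = out + rng.normal(0.0, rng.uniform(0.0, 0.05), out.shape)  # noise
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
sim = np.linspace(0.0, 1.0, 16).reshape(4, 4)  # stand-in simulated projection
# Each training sample is seen under a different randomized style.
batch = np.stack([randomize_style(sim, rng) for _ in range(8)])
```

Because every sample of `batch` carries a different appearance, the trained network must rely on content rather than style, which is how domain randomization buys robustness at inference time on real images.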
3. Methods
Our proposed method is targeted towards rigid 2D/3D
registration for interventional image fusion between X-ray
(2D) and CT (3D) images as illustrated in Figure 1. The
live X-ray image is acquired from the C-arm system dur-
ing the intervention. The initial overlay depicts the fusion