Self-Supervised 2D/3D Registration for X-Ray to CT Image Fusion
Srikrishna Jaganathan1,2, Maximilian Kukla2, Jian Wang2, Karthik Shetty1, Andreas Maier1
1FAU Erlangen-Nürnberg, Erlangen, Germany   2Siemens Healthineers AG, Forchheim, Germany
srikrishna.jaganathan@fau.de
Abstract
Deep Learning-based 2D/3D registration enables fast, robust, and accurate X-ray to CT image fusion when large annotated paired datasets are available for training. However, the need for paired CT volumes and X-ray images with ground-truth registration limits the applicability in interventional scenarios. An alternative is to use simulated X-ray projections from CT volumes, thus removing the need for paired annotated datasets. Deep Neural Networks trained exclusively on simulated X-ray projections can perform significantly worse on real X-ray images due to the domain gap. We propose a self-supervised 2D/3D registration framework combining simulated training with unsupervised feature and pixel space domain adaptation to overcome the domain gap and eliminate the need for paired annotated datasets. Our framework achieves a registration accuracy of 1.83 ± 1.16 mm with a high success ratio of 90.1% on real X-ray images, showing a 23.9% increase in success ratio compared to reference annotation-free algorithms.
1. Introduction
Image guidance for minimally invasive interventions is generally provided using live fluoroscopic X-ray imaging. The fusion of a preoperative Computed Tomography (CT) volume with the live fluoroscopic image enhances the information available during the intervention. Spatial alignment of the 3D volume to the current patient position is a prerequisite for accurate fusion with the fluoroscopic image. An optimal spatial alignment between the preoperative CT volume and the live fluoroscopic X-ray is estimated with 2D/3D registration. Traditionally, optimization-based techniques have been used for 2D/3D registration in the interventional setting, as they provide highly accurate registration [53, 29, 51]. However, optimization-based techniques are sensitive to initialization and to content mismatch between X-ray and CT images. Deep Learning (DL)-based 2D/3D registration techniques have been proposed to overcome these limitations by improving robustness significantly [26, 31, 32, 39], while still relying on optimization-based techniques as a subsequent refinement step to match the registration accuracy. Recently, end-to-end DL-driven solutions have been proposed that achieve a combination of high registration accuracy and high robustness with faster computation [20].
Despite the significant improvements in learning-based registration techniques, interventional application is still limited by the lack of generalizability of the learned networks across different anatomies, interventions, scanners, and protocol variations [47]. Collecting large-scale annotated datasets for all variations is prohibitive, since the training data must be paired and come with ground-truth registration. Either a large-scale annotated dataset covering all the different variations or an annotation-free unpaired training routine built on an existing DL-based technique would bring us one step closer to interventional application. We focus on the latter: removing the need for an annotated paired dataset would immediately allow us to train current state-of-the-art registration networks for the different variations.
We propose a self-supervised 2D/3D rigid registration framework to achieve annotation-free unpaired training with minimal performance drop on the real X-ray images encountered during interventional application. The annotation-free unpaired dataset is generated from forward projections of the CT volumes. Our framework consists of simulated training combined with unsupervised feature and pixel space domain adaptation. Our novel task-specific feature space domain adaptation is trained end-to-end with the registration network. We combine the recently proposed Barlow Twins [56], an adversarial feature discriminator [7, 22], and a DL-based registration network [20]. This allows the features to be robust to different style variations while also being optimal for the registration task. Our feature space adaptation adds no computational cost during inference. We additionally perform unsupervised style transfer of the real X-ray images to the simulated X-ray image style using Contrastive Unpaired Translation [36]. We apply the style transfer network during inference, thus allowing the registration network to operate on the fixed style already encountered during training. In combination,
our proposed framework achieves a registration accuracy of 1.83 ± 1.16 mm with a high success ratio of 90.1% on real X-ray images, showing a 23.9% increase compared to reference annotation-free techniques.
2. Related Work
We focus our discussion of related work on rigid 2D/3D registration, covering both optimization-based and learning-based algorithms. For unsupervised domain adaptation, we broadly discuss the methods applied to medical imaging tasks.
Optimization-Based 2D/3D Registration The problem of 2D/3D registration for interventional image fusion has been extensively researched, with comprehensive reviews of the techniques available [27, 29]. Due to the non-convex nature of the 2D/3D registration problem, global optimization [9, 13, 35] is required to reach an optimal solution. However, the high computational cost of global optimization-based techniques limits their interventional application. Faster techniques using local optimization [46, 33, 30, 12, 42] rely on image similarity measures, making them highly dependent on a good initialization. The Point-to-Plane Correspondence (PPC) constraint was proposed [53, 52, 51] as a more robust alternative for computing the 3D motion from the 2D misalignment visible between the 2D image and the forward projection of the 3D volume. PPC-based techniques significantly improve registration accuracy and robustness compared to other optimization-based techniques. Extensions of the PPC-based technique to the multi-view scenario [40] and hybrid learning-based solutions [39] improve the robustness further. Recently, a multi-level optimization-based technique [25] with a normalized gradient field as the image similarity metric was proposed, showing further improvement in registration accuracy.
Learning-Based 2D/3D Registration Initially, learning-based techniques were targeted at improving the computational efficiency [32] and robustness [31, 26, 39, 14, 8] of optimization-based techniques. DL-based techniques significantly improve the robustness to initialization and content mismatch [39, 26, 31]. End-to-end DL-driven registration [20] has shown improved robustness compared to other learning-based methods [39, 19, 41], while also matching the registration accuracy of optimization-based techniques [51] with a significant improvement in computational efficiency. Recently, fully automatic DL-based registration has been proposed [6, 13, 11], which can perform both initialization and registration. A comprehensive review of learning-based medical image registration [15] and an analysis of the impact of learning-based 2D/3D registration on interventional applications [47] are available. The advances in DL-based 2D/3D registration techniques have been propelled by supervised training [20, 31, 26]. Variations in imaging protocol, device manufacturer, anatomy, and intervention-specific settings alter the appearance of the acquired images significantly, preventing the adoption of DL-based 2D/3D registration techniques in interventional scenarios. Attempts have been made to reduce the number of annotated data samples required using paired domain adaptation techniques [58, 59]. Simulated X-ray projections generated from the CT volume remove the need for paired annotated data. However, this leads to a domain gap due to the variations between real and simulated X-ray projections. Unsupervised domain adaptation is required to minimize the resulting drop in performance while not requiring annotated datasets.
Unsupervised Domain Adaptation The domain gap introduced by the use of simulated data is bridged either by improving the realism of the simulated data [54, 48] or by performing unsupervised domain adaptation to minimize the domain gap [1, 7, 61, 36, 49, 16, 45, 17]. The use of simulated X-ray projections for training DL-based solutions [11, 3, 57, 55, 60, 43] is increasing across medical imaging applications. DeepDRR [48] aims to bridge the domain gap by rendering realistic simulated X-ray projections that are closer to real X-ray images. Domain randomization [45] was recently proposed to bridge the domain gap in learning-based 2D/3D registration networks [11]: multiple different styles of simulated X-ray images are used during training, allowing the network to be robust to the style variations encountered during inference. However, a patient-specific retraining step is required on top of domain randomization [11]. Unsupervised domain adaptation for CNN-based 6D pose regression of X-ray images was proposed in [60]. However, its evaluation ignores the presence of surgical tools and content mismatch, which is crucial for interventional application. Generative Adversarial Network (GAN)-based [1, 57, 55, 60] and adversarial feature adaptation techniques [22, 5, 60, 50] have shown promising results for bridging the domain gap in medical imaging tasks such as segmentation [57], reconstruction [55], pose regression [60], depth estimation [22], and multi-modality learning [50, 5].
3. Methods
Our proposed method targets rigid 2D/3D registration for interventional image fusion between X-ray (2D) and CT (3D) images, as illustrated in Figure 1. The live X-ray image is acquired from the C-arm system during the intervention. The initial overlay depicts the fusion of the 3D volume with the 2D image after performing initialization (either manually or automatically). The registered overlay depicts the overlay produced after performing 2D/3D registration, which spatially aligns the preoperative volume with the patient's position and orientation. We give a brief introduction to the 2D/3D registration problem and the registration framework on top of which we build our self-supervised method in Section 3.1. Next, we describe our proposed self-supervised registration technique and the different components used during training and inference in Section 3.2. Finally, we describe the training and inference procedures for our registration framework in Section 3.3.

Figure 1: Interventional image fusion for a C-arm system, showing the overlay before and after performing 2D/3D registration.
3.1. Background
Problem Formulation In 2D/3D registration, the goal is to find an optimal spatial transformation $T_{opt}$ of the volume $V$ from the observed X-ray projections $I_f^r$ such that, when the images are overlaid, there is minimal misalignment. The problem can be formulated as an optimization problem with an objective function $\mathcal{F}$ that minimizes the misalignment, as described in Eq. 1. The X-ray projection $I_f^r$, the preoperative volume $V$, and an initial registration estimate $T_{init}$ are given as input to the registration algorithm. Our focus is on recovering the registration matrix $T_{reg}$, which enables us to find the optimal transformation $T_{opt} = T_{reg} T_{init}$ that aligns the forward projected 3D volume $\mathcal{R}(V, T)$ with the X-ray projection $I_f^r$.

$$\operatorname*{arg\,min}_{T} \; \mathcal{F}\bigl(I_f^r, \mathcal{R}(V, T)\bigr) \tag{1}$$
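To make the optimization view concrete, the following is a minimal sketch (not from the paper's code) of an objective in the sense of Eq. 1. The names `ncc`, `objective`, and the `render` callable are illustrative stand-ins, with normalized cross-correlation assumed as one possible choice of $\mathcal{F}$.

```python
import numpy as np
from typing import Callable

def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized cross-correlation between two images (higher = more similar)."""
    a = (a - a.mean()) / (a.std() + 1e-8)
    b = (b - b.mean()) / (b.std() + 1e-8)
    return float((a * b).mean())

def objective(T: np.ndarray, volume: np.ndarray, x_ray: np.ndarray,
              render: Callable[[np.ndarray, np.ndarray], np.ndarray]) -> float:
    """F(I_f^r, R(V, T)): negated similarity, so lower means better alignment."""
    drr = render(volume, T)  # R(V, T): forward projection of the volume under T
    return -ncc(x_ray, drr)

# A registration algorithm searches for the T minimizing `objective`, starting
# from T_init; T_reg is the correction it finds, and T_opt = T_reg @ T_init.
```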
Forward Projection A common basis for comparison between the 2D and 3D images is established using the forward projector (rendering) $\mathcal{R}(V, T)$, which is used to generate a simulated X-ray projection $I_m$, also referred to as a Digitally Reconstructed Radiograph (DRR). The forward projection from a CT volume to render a DRR is depicted in Figure 2. The rendering is performed by analytically computing each detector pixel's attenuation response from a virtual X-ray source using ray tracing [48]. The rendering can be done on GPUs, allowing for real-time computation [4]. Arbitrary DRRs can be rendered from a CT volume for different viewing angles by altering the position and orientation $T$ of the virtual X-ray source and detector. The style of the rendering can be controlled by selecting the desired materials to be rendered from the CT volume, either using segmentation or a simple threshold. We show examples of the bone projection and realistic projection styles in Figure 2. The bone projection uses thresholding to render only the bones from the CT volume; the realistic projection renders all the materials present in the CT volume. Additionally, random style augmentations (Figure 2) of the projected DRR are obtained by adjusting contrast and brightness, inverting, and adding noise.

Figure 2: Rendering simulated X-ray projections from a 3D CT volume with different styles (bone projection, realistic projection, random style augmentation).
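To illustrate the idea, here is a toy forward projector and style augmentation. Real systems use perspective (cone-beam) ray casting on the GPU [4, 48]; this parallel-beam NumPy/SciPy version is a simplified sketch, and the HU threshold for the bone projection is an illustrative assumption, not a value from the paper.

```python
import numpy as np
from scipy.ndimage import rotate

def render_drr(volume: np.ndarray, angle_deg: float, bone_only: bool = False,
               hu_threshold: float = 300.0) -> np.ndarray:
    """Parallel-beam DRR: rotate the CT volume, then integrate along one axis."""
    vol = volume.copy()
    if bone_only:  # "bone projection": threshold out soft tissue
        vol[vol < hu_threshold] = 0.0
    vol = rotate(vol, angle_deg, axes=(0, 2), reshape=False, order=1)
    return vol.sum(axis=2)  # line integrals along the (parallel) ray direction

def random_style(drr: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Random style augmentation: contrast, brightness, inversion, noise."""
    out = (drr - drr.min()) / (np.ptp(drr) + 1e-8)       # normalize to [0, 1]
    out = out ** rng.uniform(0.5, 2.0)                   # contrast (gamma)
    out = np.clip(out + rng.uniform(-0.2, 0.2), 0, 1)    # brightness shift
    if rng.random() < 0.5:
        out = 1.0 - out                                  # intensity inversion
    return np.clip(out + rng.normal(0, 0.02, out.shape), 0, 1)  # additive noise
```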
PPC-Based Registration Framework The Point-to-Plane Correspondence (PPC)-based registration framework [53, 52, 51] constrains the global 3D motion $dv$ from the visible 2D misalignment using the PPC constraint described in Eq. 2. The framework requires as input the 3D volume $V$, the X-ray image $I_f^r$, and an initial registration estimate $T_{init}$. The contour points $w$ and their gradients $g$ are computed from $V$ using a 3D Canny edge detector [53]. The 3D motion $dv$ is estimated by solving the PPC constraint (Eq. 2), and the motion estimate $dv$ is applied iteratively until convergence [53]. During each motion estimation step, the previous registration estimate $T_{i-1}$ is used for rendering the DRR $I_m = \mathcal{R}(V, T_{i-1})$, with $T_0 = T_{init}$. Correspondences are estimated for a set of projected contour points $p$ between $I_m$ and $I_f^r$. The corresponding projected contour point $p'$ in $I_f^r$ is used to compute the 2D misalignment $dp = p' - p$. The plane normal $n$ in Eq. 2 can be computed from $w$, $g$, and $dp$.

$$\underbrace{W\,[\,n \times w,\; n\,]}_{A}\; dv \;=\; \underbrace{\operatorname{diag}(W)\, n^{T} w}_{b} \tag{2}$$

The 3D motion $dv$ is in axis-angle representation and can be directly converted to a 3D transformation matrix $T_i$.
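One iteration of this solve can be pictured as follows. This is a hedged sketch, not the reference implementation from [53]: it stacks the per-correspondence rows of Eq. 2 into a weighted least-squares system, solves for $dv = [\omega, t]$, and converts the axis-angle motion into a 4×4 transform $T_i$ to be composed with $T_{i-1}$. The exact weighting $W$, sign conventions, and row construction in the original framework may differ.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def ppc_step(w: np.ndarray, n: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """One PPC motion estimate.

    w: (K, 3) contour points, n: (K, 3) plane normals, weights: (K,) diag of W.
    Returns a 4x4 transform built from the axis-angle motion dv = [omega, t].
    """
    A = np.hstack([np.cross(n, w), n]) * weights[:, None]  # rows of W [n x w, n]
    b = weights * np.einsum('ij,ij->i', n, w)              # rows of diag(W) n^T w
    dv, *_ = np.linalg.lstsq(A, b, rcond=None)             # solve A dv = b
    T = np.eye(4)
    T[:3, :3] = Rotation.from_rotvec(dv[:3]).as_matrix()   # axis-angle -> rotation
    T[:3, 3] = dv[3:]
    return T  # applied on top of the previous estimate T_{i-1} each iteration
```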