Self-Supervised 2D/3D Registration for X-Ray to CT Image Fusion
Srikrishna Jaganathan1,2, Maximilian Kukla2, Jian Wang2, Karthik Shetty1, Andreas Maier1
1FAU Erlangen-Nürnberg, Erlangen, Germany   2Siemens Healthineers AG, Forchheim, Germany
srikrishna.jaganathan@fau.de
Abstract
Deep Learning-based 2D/3D registration enables fast, robust, and accurate X-ray to CT image fusion when large annotated paired datasets are available for training. However, the need for paired CT volumes and X-ray images with ground-truth registration limits the applicability in interventional scenarios. An alternative is to use simulated X-ray projections from CT volumes, thus removing the need for paired annotated datasets. Deep Neural Networks trained exclusively on simulated X-ray projections can perform significantly worse on real X-ray images due to the domain gap. We propose a self-supervised 2D/3D registration framework combining simulated training with unsupervised feature and pixel space domain adaptation to overcome the domain gap and eliminate the need for paired annotated datasets. Our framework achieves a registration accuracy of 1.83 ± 1.16 mm with a high success ratio of 90.1% on real X-ray images, showing a 23.9% increase in success ratio compared to reference annotation-free algorithms.
1. Introduction
Image guidance for minimally invasive interventions is generally provided using live fluoroscopic X-ray imaging. The fusion of a preoperative Computed Tomography (CT) volume with the live fluoroscopic image enhances the information available during the intervention. Spatial alignment of the 3D volume to the current patient position is a prerequisite for accurate fusion with the fluoroscopic image. An optimal spatial alignment between the preoperative CT volume and the live fluoroscopic X-ray is estimated with 2D/3D registration. Traditionally, optimization-based techniques have been used for 2D/3D registration in the interventional setting, as they provide highly accurate registration [53, 29, 51]. However, optimization-based techniques are sensitive to initialization and to content mismatch between X-ray and CT images. Deep Learning (DL)-based 2D/3D registration techniques have been proposed to overcome these limitations by improving robustness significantly [26, 31, 32, 39], while still relying on optimization-based techniques as a subsequent refinement step to match the registration accuracy. Recently, end-to-end DL-driven solutions have been proposed that achieve a combination of high registration accuracy and high robustness with faster computation [20].
Despite the significant improvements in learning-based registration techniques, interventional application is still limited by the lack of generalizability of the learned networks across different anatomies, interventions, scanners, and protocol variations [47]. Collecting large-scale annotated datasets for all variations is prohibitive, since the training data must be paired and come with ground-truth registration. Either a large-scale annotated dataset covering all the different variations or an annotation-free unpaired training routine built on an existing DL-based technique would bring us one step closer to interventional application. We focus on the latter: removing the need for an annotated paired dataset would immediately allow us to train current state-of-the-art registration networks for the different variations.
We propose a self-supervised 2D/3D rigid registration framework to achieve annotation-free unpaired training with minimal performance drop on the real X-ray images encountered during interventional application. The annotation-free unpaired dataset is generated from forward projections of the CT volumes. Our framework consists of simulated training combined with unsupervised feature and pixel space domain adaptation. Our novel task-specific feature space domain adaptation is trained end-to-end with the registration network. We combine the recently proposed Barlow Twins [56], an adversarial feature discriminator [7, 22], and a DL-based registration network [20]. This allows the features to be robust to different style variations while also being optimal for the registration task. Our feature space adaptation adds no computational cost during inference. We additionally perform unsupervised style transfer of the real X-ray images to the simulated X-ray image style using Contrastive Unpaired Translation [36]. We apply the style transfer network during inference, thus allowing the registration network to operate on the fixed style already encountered during training. In combination,
our proposed framework achieves a registration accuracy of 1.83 ± 1.16 mm with a high success ratio of 90.1% on real X-ray images, showing a 23.9% increase compared to reference annotation-free techniques.
2. Related Work
We focus our discussion of related work on rigid 2D/3D registration, covering both optimization-based and learning-based algorithms. For unsupervised domain adaptation, we broadly discuss the methods applied to medical imaging tasks.
Optimization-Based 2D/3D Registration The problem of 2D/3D registration for interventional image fusion has been extensively researched, with comprehensive reviews of the techniques available [27, 29]. Due to the non-convex nature of the 2D/3D registration problem, global optimization [9, 13, 35] is required to reach an optimal solution. However, the high computational cost of global optimization-based techniques limits their interventional application. Faster techniques using local optimization [46, 33, 30, 12, 42] rely on image similarity measures, making them highly dependent on a good initialization. The Point-to-Plane Correspondence (PPC) constraint was proposed [53, 52, 51] as a more robust alternative for computing the 3D motion from the 2D misalignment visible between the 2D image and the forward projection of the 3D volume. PPC-based techniques significantly improve registration accuracy and robustness compared to other optimization-based techniques. Extensions of the PPC-based technique to the multi-view scenario [40] and hybrid learning-based solutions [39] improve the robustness further. Recently, a multi-level optimization-based technique [25] with a normalized gradient field as the image similarity metric was proposed, showing further improvement in registration accuracy.
Learning-Based 2D/3D Registration Initially, learning-based techniques were targeted at improving the computational efficiency [32] and robustness [31, 26, 39, 14, 8] of optimization-based techniques. DL-based techniques significantly improve the robustness to initialization and content mismatch [39, 26, 31]. End-to-end DL-driven registration [20] has shown improved robustness compared to other learning-based methods [39, 19, 41], while also matching the registration accuracy of optimization-based techniques [51] with a significant improvement in computational efficiency. Recently, fully automatic DL-based registration has been proposed [6, 13, 11], which can perform both initialization and registration. A comprehensive review of learning-based medical image registration [15] and an analysis of the impact of learning-based 2D/3D registration on interventional applications [47] are available. The advances in DL-based 2D/3D registration techniques have been propelled by supervised training [20, 31, 26]. Variations in imaging protocol, device manufacturer, anatomy, and intervention-specific settings alter the appearance of the acquired images significantly, preventing the adoption of DL-based 2D/3D registration techniques in interventional scenarios. Attempts have been made to reduce the number of annotated data samples required using paired domain adaptation techniques [58, 59]. Simulated X-ray projections generated from the CT volume remove the need for paired annotated data. However, this leads to a domain gap due to the variations between real and simulated X-ray projections. Unsupervised domain adaptation is required to minimize the resulting drop in performance while not requiring annotated datasets.
Unsupervised Domain Adaptation The domain gap introduced by the use of simulated data is bridged either by improving the realism of the simulated data [54, 48] or by performing unsupervised domain adaptation to minimize the domain gap [1, 7, 61, 36, 49, 16, 45, 17]. The use of simulated X-ray projections for training DL-based solutions [11, 3, 57, 55, 60, 43] is increasing across medical imaging applications. DeepDRR [48] aims to bridge the domain gap by rendering realistic simulated X-ray projections that are closer to real X-ray images. Domain randomization [45] was recently proposed to bridge the domain gap in learning-based 2D/3D registration networks [11]: multiple different styles of simulated X-ray images are used during training, allowing the network to be robust to the style variations encountered during inference. However, a patient-specific retraining step is required on top of domain randomization [11]. Unsupervised domain adaptation for CNN-based 6D pose regression of X-ray images was proposed in [60]. However, its evaluation ignores the presence of surgical tools and content mismatch, which is crucial for interventional application. Generative Adversarial Network (GAN)-based [1, 57, 55, 60] and adversarial feature adaptation techniques [22, 5, 60, 50] have shown promising results for bridging the domain gap in medical imaging tasks such as segmentation [57], reconstruction [55], pose regression [60], depth estimation [22], and multi-modality learning [50, 5].
3. Methods
Our proposed method targets rigid 2D/3D registration for interventional image fusion between X-ray (2D) and CT (3D) images, as illustrated in Figure 1. The live X-ray image is acquired from the C-arm system during the intervention. The initial overlay depicts the fusion of the 3D volume with the 2D image after performing initialization (either manually or automatically). The registered overlay depicts the overlay produced after performing 2D/3D registration, which spatially aligns the preoperative volume with the patient's position and orientation. We give a brief introduction to the 2D/3D registration problem and the registration framework on top of which we build our self-supervised method in Section 3.1. Next, we describe our proposed self-supervised registration technique and the different components used during training and inference in Section 3.2. Finally, we describe the training and inference procedures for our registration framework in Section 3.3.

Figure 1: Interventional image fusion for a C-arm system, showing the overlay before and after performing 2D/3D registration.
3.1. Background
Problem Formulation In 2D/3D registration, the goal is to find an optimal spatial transformation $T_{opt}$ of the volume $V$ from the observed X-ray projections $I_f^r$ such that, when the images are overlaid, there is minimal misalignment. The problem can be formulated as an optimization problem with an objective function $\mathcal{F}$ that minimizes the misalignment, as described in Eq. 1. The X-ray projection $I_f^r$, the preoperative volume $V$, and an initial registration estimate $T_{init}$ are given as input to the registration algorithm. Our focus is on recovering the registration matrix $T_{reg}$, which enables us to find the optimal transformation $T_{opt} = T_{reg} T_{init}$ that aligns the forward projected 3D volume $\mathcal{R}(V, T)$ with the X-ray projection $I_f^r$.

$$\operatorname*{arg\,min}_{T} \; \mathcal{F}\bigl(I_f^r, \mathcal{R}(V, T)\bigr) \tag{1}$$
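To make the optimization view concrete, the following is a minimal sketch (not from the paper's code) of an objective in the sense of Eq. 1. The names `ncc`, `objective`, and the `render` callable are illustrative stand-ins, with normalized cross-correlation assumed as one possible choice of $\mathcal{F}$.

```python
import numpy as np
from typing import Callable

def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized cross-correlation between two images (higher = more similar)."""
    a = (a - a.mean()) / (a.std() + 1e-8)
    b = (b - b.mean()) / (b.std() + 1e-8)
    return float((a * b).mean())

def objective(T: np.ndarray, volume: np.ndarray, x_ray: np.ndarray,
              render: Callable[[np.ndarray, np.ndarray], np.ndarray]) -> float:
    """F(I_f^r, R(V, T)): negated similarity, so lower means better alignment."""
    drr = render(volume, T)  # R(V, T): forward projection of the volume under T
    return -ncc(x_ray, drr)

# A registration algorithm searches for the T minimizing `objective`, starting
# from T_init; T_reg is the correction it finds, and T_opt = T_reg @ T_init.
```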
Forward Projection A common basis for comparison between the 2D and 3D images is established using the forward projector (rendering) $\mathcal{R}(V, T)$, which is used to generate a simulated X-ray projection $I_m$, also referred to as a Digitally Reconstructed Radiograph (DRR). The forward projection from a CT volume to render a DRR is depicted in Figure 2. The rendering is performed by analytically computing each detector pixel's attenuation response from a virtual X-ray source using ray tracing [48]. The rendering can be done on GPUs, allowing for real-time computation [4]. Arbitrary DRRs can be rendered from a CT volume for different viewing angles by altering the position and orientation $T$ of the virtual X-ray source and detector. The style of the rendering can be controlled by selecting the desired materials to be rendered from the CT volume, either using segmentation or a simple threshold. We show examples of the bone projection and realistic projection styles in Figure 2. The bone projection uses thresholding to render only the bones from the CT volume; the realistic projection renders all the materials present in the CT volume. Additionally, random style augmentations (Figure 2) of the projected DRR are obtained by adjusting contrast and brightness, inverting, and adding noise.

Figure 2: Rendering simulated X-ray projections from a 3D CT volume with different styles (bone projection, realistic projection, random style augmentation).
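To illustrate the idea, here is a toy forward projector and style augmentation. Real systems use perspective (cone-beam) ray casting on the GPU [4, 48]; this parallel-beam NumPy/SciPy version is a simplified sketch, and the HU threshold for the bone projection is an illustrative assumption, not a value from the paper.

```python
import numpy as np
from scipy.ndimage import rotate

def render_drr(volume: np.ndarray, angle_deg: float, bone_only: bool = False,
               hu_threshold: float = 300.0) -> np.ndarray:
    """Parallel-beam DRR: rotate the CT volume, then integrate along one axis."""
    vol = volume.copy()
    if bone_only:  # "bone projection": threshold out soft tissue
        vol[vol < hu_threshold] = 0.0
    vol = rotate(vol, angle_deg, axes=(0, 2), reshape=False, order=1)
    return vol.sum(axis=2)  # line integrals along the (parallel) ray direction

def random_style(drr: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Random style augmentation: contrast, brightness, inversion, noise."""
    out = (drr - drr.min()) / (np.ptp(drr) + 1e-8)       # normalize to [0, 1]
    out = out ** rng.uniform(0.5, 2.0)                   # contrast (gamma)
    out = np.clip(out + rng.uniform(-0.2, 0.2), 0, 1)    # brightness shift
    if rng.random() < 0.5:
        out = 1.0 - out                                  # intensity inversion
    return np.clip(out + rng.normal(0, 0.02, out.shape), 0, 1)  # additive noise
```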
PPC-Based Registration Framework The Point-to-Plane Correspondence (PPC)-based registration framework [53, 52, 51] constrains the global 3D motion $dv$ from the visible 2D misalignment using the PPC constraint described in Eq. 2. The framework requires as input the 3D volume $V$, the X-ray image $I_f^r$, and an initial registration estimate $T_{init}$. The contour points $w$ and their gradients $g$ are computed from $V$ using a 3D Canny edge detector [53]. The 3D motion $dv$ is estimated by solving the PPC constraint (Eq. 2), and the motion estimate $dv$ is applied iteratively until convergence [53]. During each motion estimation step, the previous registration estimate $T_{i-1}$ is used for rendering the DRR $I_m = \mathcal{R}(V, T_{i-1})$, with $T_0 = T_{init}$. Correspondences are estimated for a set of projected contour points $p$ between $I_m$ and $I_f^r$. The corresponding projected contour point $p'$ in $I_f^r$ is used to compute the 2D misalignment $dp = p' - p$. The plane normal $n$ in Eq. 2 can be computed from $w$, $g$, and $dp$.

$$\underbrace{W\,[\,n \times w,\; n\,]}_{A}\; dv \;=\; \underbrace{\operatorname{diag}(W)\, n^{T} w}_{b} \tag{2}$$

The 3D motion $dv$ is in axis-angle representation and can be directly converted to a 3D transformation matrix $T_i$.
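One iteration of this solve can be pictured as follows. This is a hedged sketch, not the reference implementation from [53]: it stacks the per-correspondence rows of Eq. 2 into a weighted least-squares system, solves for $dv = [\omega, t]$, and converts the axis-angle motion into a 4×4 transform $T_i$ to be composed with $T_{i-1}$. The exact weighting $W$, sign conventions, and row construction in the original framework may differ.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def ppc_step(w: np.ndarray, n: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """One PPC motion estimate.

    w: (K, 3) contour points, n: (K, 3) plane normals, weights: (K,) diag of W.
    Returns a 4x4 transform built from the axis-angle motion dv = [omega, t].
    """
    A = np.hstack([np.cross(n, w), n]) * weights[:, None]  # rows of W [n x w, n]
    b = weights * np.einsum('ij,ij->i', n, w)              # rows of diag(W) n^T w
    dv, *_ = np.linalg.lstsq(A, b, rcond=None)             # solve A dv = b
    T = np.eye(4)
    T[:3, :3] = Rotation.from_rotvec(dv[:3]).as_matrix()   # axis-angle -> rotation
    T[:3, 3] = dv[3:]
    return T  # applied on top of the previous estimate T_{i-1} each iteration
```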