FAS-UNet: A Novel FAS-driven Unet to Learn
Variational Image Segmentation
Hui Zhu, Shi Shu, and Jianping Zhang
Abstract—Solving variational image segmentation problems
with hidden physics is often expensive and requires different
algorithms and manual tuning of model parameters. The deep
learning methods based on the U-Net structure have obtained
outstanding performances in many different medical image
segmentation tasks, but designing such networks requires a lot of
parameters and training data, not always available for practical
problems. In this paper, inspired by traditional multi-phase con-
vexity Mumford-Shah variational model and full approximation
scheme (FAS) solving the nonlinear systems, we propose a novel
variational-model-informed network (denoted as FAS-Unet) that
exploits the model and algorithm priors to extract the multi-scale
features. The proposed model-informed network integrates image
data and mathematical models, and implements them through
learning a few convolution kernels. Based on the variational
theory and FAS algorithm, we first design a feature extraction
sub-network (FAS-Solution module) to solve the model-driven
nonlinear systems, where a skip-connection is employed to fuse
the multi-scale features. Secondly, we further design a convolution
block to fuse the extracted features from the previous stage,
resulting in the final segmentation probability. Experimental
results on three different medical image segmentation tasks
show that the proposed FAS-Unet is very competitive with other
state-of-the-art methods in qualitative, quantitative and model
complexity evaluations. Moreover, it may also be possible to train
specialized network architectures that automatically satisfy some
of the mathematical and physical laws in other image problems
for better accuracy, faster training and improved generalization.
The code is available at https://github.com/zhuhui100/FASUNet.
Index Terms—Model-informed deep learning; Interpretable
network; Variational image segmentation; Full approximation
scheme.
I. INTRODUCTION
Image segmentation is one of the most important problems
in computer vision and is also a difficult problem in
the medical imaging community [1]–[3]. It has been widely
used in many medical image processing fields such as the
identification of cardiovascular diseases [4], the measurement
of bone and tissue [5], and the extraction of suspicious lesions
to aid radiologists. Therefore, image segmentation has a vital
role in promoting medical image analysis and applications as
a powerful image processing tool [5], [6].
This work was supported by the National Natural Science Foundation of
China (NSFC) under Grants 11971414, 11771369, also partly by grants from
Natural Science Foundation of Hunan Province under Grants 2018JJ2375,
2018XK2304, and 2018WK4006. (Corresponding author: Jianping Zhang).
H. Zhu is with the School of Mathematics and Computational Science,
Xiangtan University, and Key Laboratory of Intelligent Computing & Information
Processing of Ministry of Education (201931000089@smail.xtu.edu.cn).
S. Shu is with the School of Mathematics and Computational Science,
Xiangtan University, and Hunan Key Laboratory for Computation
and Simulation in Science and Engineering, Xiangtan, 411105, China
(shushi@xtu.edu.cn).
J. Zhang is with the School of Mathematics and Computational Science,
Xiangtan University, and Hunan National Applied Mathematics Center
(jpzhang@xtu.edu.cn).
Deep learning (DL) has achieved great success in the field
of medical image segmentation [5], [7], [8]. One of the most
important reasons is that the convolutional neural networks
(CNNs) can effectively extract image features. Therefore,
much work at present involves designing a network architecture
with strong feature extraction ability, and many well-known
CNN architectures have been proposed such as UNet [9],
V-Net [10], UNet++ [11], 3D UNet [12], Y-Net [13], Res-
UNet [14], KiU-Net [15], DenseUNet [16], and nnU-Net [17].
More and more studies based on data-driven methods have
been reported for medical image segmentation. Although UNet
and its variants have achieved considerably impressive per-
formance in many medical image segmentation datasets, they
still suffer from two limitations. One is that most researchers
have introduced more parameters to improve the performance
of medical image segmentation, but have tended to ignore the
model’s memory and computational
overhead, which makes it difficult to deploy these algorithms
in industrial applications [18]. The other disadvantage is that
these variants only design many suitable architectures through
the researcher’s experience or experiments, but do not focus on
the mathematical theoretical guidance of network architectures
such as explainability, generalizability, etc., which limits the
application of these models and the improvement of task-
driven medical image segmentation methods [19], [20].
arXiv:2210.15164v2 [cs.CV] 6 Nov 2022
Fig. 1. Classical variational image segmentation and model-inspired learning method. (a) The first stage solves the nonlinear differential equations using
a classical iterative method, and then the second stage thresholds the smooth solution from the first stage to extract objects. (b) The first stage learns the solution
mapping $T_K(f;\theta_1)$ by optimizing the convolution kernels $\theta_1$ to extract image features. The second stage learns the feature fusion and segmentation thresholding
parameters.
Recently, many works on image recognition and image
reconstruction have been focusing on the interpretability of
the network architecture. Inspired by some mathematical
viewpoints, many related unrolled networks have been designed
and successfully applied. He et al. [21] proposed the deep
residual learning framework, which utilizes an identity map
to facilitate training; it is well known that it is very similar
to the iterative methods for solving ordinary differential equations
(ODEs) and also achieves promising performance on image
recognition. Larsson et al. employed the fractal idea to
design a self-similar FractalNet [22], whose architecture was
also found to be similar to the Runge–Kutta (RK) scheme
in numerical calculations. According to the nature of
polynomials, Zhang et al. designed PolyNet [23] by improving
ResNet to strengthen the expressive ability of the network,
and Gomez et al. [24] proposed RevNet by using some ideas
of the dynamic system. Chen et al. [25] analyzed the process
of solving ODEs, then proposed Neural ODE, which further
shows that mathematics and neural networks have a strong
relationship. Meanwhile, He et al. designed a network
architecture for the super-resolution task based on the forward Euler
and RK methods of solving ODEs [18] and achieved good
performance. Sun et al. [26] designed ADMM-Net through the
alternating direction method to learn an image reconstruction
problem. Inspired by a multigrid algorithm for solving inverse
problems, He et al. [27] proposed a learnable classification
network denoted as MgNet to extract image features $u$,
which uses a few parameters to achieve good performance
on the CIFAR datasets. Alt et al. [28] analyzed the behavior
and mathematical foundations of UNet, and interpreted them
as approximations of continuous nonlinear partial differential
equations (PDEs) by using full approximation schemes
(FASs). Experimental evaluations showed that the proposed
architectures for the denoising and inpainting tasks save half
of the trainable parameters and can thus outperform standard
ones with the same model complexity.
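The similarity between a residual block and one step of an iterative ODE solver can be made concrete with a minimal sketch. The function names and the scalar test equation dx/dt = -x are assumptions chosen only for illustration; they are not taken from [21] or [25]. Each forward Euler step has the form x + h f(x), which is the identity path plus a residual branch.

```python
import math

def residual_block(x, h, f):
    """One residual update: identity (skip) path plus a scaled branch f(x)."""
    return x + h * f(x)

def forward_euler(x0, f, h, steps):
    """Integrate dx/dt = f(x); every step has the form of a residual block."""
    x = x0
    for _ in range(steps):
        x = residual_block(x, h, f)
    return x

def decay(x):
    """Hypothetical branch function f(x) = -x, with exact solution x0 * exp(-t)."""
    return -x

# Stacking 100 "blocks" with step size 0.01 integrates the ODE up to t = 1.
approx = forward_euler(1.0, decay, h=0.01, steps=100)
exact = math.exp(-1.0)
print(approx, exact)
```

In this reading, the depth of the network plays the role of the number of time steps, and a learned branch would replace the fixed function `decay`.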
Unfortunately, only a few studies based on model-driven
techniques have been reported for the segmentation task.
In this paper, we mainly focus on the explainable DL frame-
work combining the advantages of the FAS and UNet for
medical image segmentation.
A. Problem
H. Helmholtz proposed that the ill-posed problem of
producing reliable perception from fuzzy signals can be
solved through the process of “unconscious inference” (the
Helmholtz Hypothesis) [29]. This theory implies that hu-
man vision is incomplete and that details are inferred by
the unconscious mind to create a complete image. That is,
our perception system can also integrate the fuzzy evidence
received from the senses into the situation based on its own
environmental model.
Let $p(u \mid f; \alpha)$ be a probabilistic distribution for feature
representations $u$ of the source image $f$. The prior probability
of $u$ can be modeled as the multivariate normal distribution.
In general, $u$ can be extracted from a given image $f$ by
optimizing the maximum a posteriori (MAP) estimation as
$$\arg\max_{u} \log p(u \mid f; \alpha), \tag{1}$$
where $\alpha$ is the environmental parameter in classical “unconscious
inference” or the inverse problem, and this problem
leads to the nonlinear system defined by
$$F(u; \alpha) = b, \tag{2}$$
where the nonlinear operator $F(\cdot; \alpha)$ is employed to generate
the image $b$, e.g., $b = \mathcal{A}^T f$ is a deconvoluted image of $f$ in
the image deblurring problem with a convolution operator $\mathcal{A}$.
We consider that image segmentation refers to a composite
process of feature extraction (6) and feature fusion segmentation.
Here, the fusing process for feature $u$ is defined by
$$s = S(u; \beta), \tag{3}$$
where $S(\cdot; \beta)$ denotes a fusing segmentation with a fixed
conscious parameter $\beta$, and $s$ denotes the segmentation results or
probability maps.
Such strongly interpretable segmentation models [30]–[32]
are so general that, depending on the amount of well-
predefined sparsity priors of the input image, they have the
advantages of theoretical support and strong convergence.
The total flowchart of classical variational segmentation can
be summarized as shown in Figure 1a. However, these models not only
require expensive computations, but also have to face the
problem of selecting suitable regularizers $\varphi(\cdot)$ and
model parameters $(\alpha, \beta)$. Consequently, some reconstructed
results are unsatisfactory.
It is well known that the solution $u$ usually has the multiscale
property, so a natural idea is to exploit the multi-layer
convolution and multigrid architecture, which can describe
multiscale features to learn $u$. Based on the above facts,
we propose a two-stage segmentation framework for learning
feature $u$ in Stage 1 and segmentation $s$ in Stage 2, which is
shown in Figure 1b.
B. Contributions
In this work, we focus on analyzing the feature extraction
inverse problem (2) and the feature fusion segmentation (3)
to design an explainable deep learning network. It is well
known that the unrolled iterations of the classical solution
algorithm can be considered as the layers of a neural network,
so we propose a novel FAS-driven UNet (FAS-UNet), which
integrates image data and a multiscale algorithm for solving
the nonlinear inverse problem (7). The major differences with
our approach are that MgNet is not a U-shaped architecture
and is only used for image classification, which leads to the
output result not being able to be converted to the segmen-
tation prediction of the input image. Besides, the proposed
network was inspired by the traditional multiphase convexity
Mumford–Shah variational model [30] and FAS algorithm for
solving nonlinear systems [33], which exploits the model and
algorithm priors’ information to extract the image features.
Indeed, the goal of our work is to show that, under some
assumptions about the operators, it is possible to interpret
the smoothing operations of the FAS and image geometric
extracting operations of the variational model as the layers
of a CNN, which in turn, provide fairly specialized network
architectures that allow us to solve the standard nonlinear
system (7) for a specific choice of the parameters involved.
Our main contributions are summarized as follows:
1. We propose a novel variational-model-informed two-
stage image segmentation network (FAS-UNet), where
an explainable and lightweight sub-network for feature
extraction is designed by combining the traditional mul-
tiphase convexity Mumford–Shah variational model and
FAS algorithm for solving nonlinear systems. To the best
of our knowledge, it is the first unrolled architecture
designed based on model and algorithm priors in the
image segmentation community.
2. The proposed model-informed network integrates image
data and mathematical models, and it provides a helpful
viewpoint for designing the image segmentation network
architecture.
3. The proposed architecture can be trained from additional
model information obtained by enforcing some of the
mathematical and physical laws for better accuracy,
faster training, and improved generalization. Extensive
experimental results show that it performs better than
the other state-of-the-art methods.
The rest of the paper is organized as follows. The novel
FAS-UNet framework for solving nonlinear inverse problems
by analyzing variational segmentation theory and the FAS
algorithm is proposed in Section II. We show experimental
results in Section III. Finally, we conclude this work in
Section IV.
II. VARIATIONAL SEGMENTATION VIA THE
CNN FRAMEWORK
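The goal of image segmentation is to partition a given image
$f: \Omega \to \mathbb{R}$ into $r$ regions $\{\Omega_i\}_{i=1}^{r}$ that contain distinct objects
and satisfy $\Omega_i \cap \Omega_j = \emptyset$, $j \neq i$, and $\bigcup_{i=1}^{r} \Omega_i = \Omega$, where the
image domain $\Omega$ is a bounded and open subset of $\mathbb{R}^2$. Assume
that $\Gamma = \bigcup_i \partial\Omega_i$ is the union of boundaries of $\Omega_i$, with $|\Gamma|$ denoting
the arc length of curve $\Gamma$.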
A. Multiphase Variational Image Segmentation
As mentioned, various ways of variational image segmen-
tation have been proposed. Below, we review a few of them.
1) Variational Image Segmentation: The Mumford–Shah
(M-S) model is a well-known variational image segmentation
method proposed by Mumford and Shah [34], which can be
defined as follows:
$$\min_{u,\Gamma} \; \tau_1 \int_{\Omega} (f - u)^2 \, dx + \tau_2 \int_{\Omega \setminus \Gamma} |\nabla u|^2 \, dx + |\Gamma|,$$
where $\tau_1$ and $\tau_2$ are the weight parameters. The first term
requires that $u: \Omega \to \mathbb{R}$ approximates $f$, the second term that
$u$ does not vary much on each $\Omega_i$, and the third term that the
boundary $\Gamma$ is as short as possible. This shows that $u$ is a
piecewise smooth approximation of $f$.
In particular, Chan and Vese considered the special case
of the M-S model where the function $u$ is chosen to be a
piecewise constant function; thus, the minimization for two-phase
segmentation is given as
$$\min_{\Gamma, c_1, c_2} \; \lambda_1 \int_{\mathrm{inside}(\Gamma)} |f - c_1|^2 \, dx + \lambda_2 \int_{\mathrm{outside}(\Gamma)} |f - c_2|^2 \, dx + |\Gamma|,$$
where $c_1$ and $c_2$ are the average image intensities inside and
outside of boundary $\Gamma$, respectively, and $\lambda_1$ and $\lambda_2$ are the
weight parameters.
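To make the two-phase piecewise constant energy concrete, the following minimal sketch evaluates it on a 1D signal for a fixed contour. The discretization is an assumption made for illustration: the contour is represented by a boolean mask, and the length term $|\Gamma|$ is counted as the number of label changes between neighbouring samples. The function name and test signal are hypothetical.

```python
def chan_vese_energy(f, inside, lam1=1.0, lam2=1.0):
    """Discrete two-phase piecewise constant energy on a 1D signal.

    f      : list of intensities
    inside : list of booleans, True where the sample lies inside the contour
    Returns (energy, c1, c2).
    """
    in_vals = [v for v, m in zip(f, inside) if m]
    out_vals = [v for v, m in zip(f, inside) if not m]
    # For a fixed contour, the optimal constants are the region means.
    c1 = sum(in_vals) / len(in_vals) if in_vals else 0.0
    c2 = sum(out_vals) / len(out_vals) if out_vals else 0.0
    fit_in = sum((v - c1) ** 2 for v in in_vals)
    fit_out = sum((v - c2) ** 2 for v in out_vals)
    # 1D stand-in for the contour length: number of label changes.
    length = sum(1 for a, b in zip(inside, inside[1:]) if a != b)
    return lam1 * fit_in + lam2 * fit_out + length, c1, c2

signal = [0.1, 0.0, 0.2, 0.9, 1.0, 0.8, 0.1, 0.0]
good = [False, False, False, True, True, True, False, False]
bad = [False, True, False, True, False, True, False, True]

e_good, c1, c2 = chan_vese_energy(signal, good)
e_bad, _, _ = chan_vese_energy(signal, bad)
print(e_good, e_bad, c1, c2)
```

The mask that follows the bright plateau yields a lower energy than the alternating mask, because it fits both regions well and has a much shorter boundary.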
Sometimes, the given image is degraded by noise and a
problem-related blur operator $\mathcal{A}$. Therefore, Cai et al. [30]
extended the two-stage image segmentation strategy using a
convex variant of the Mumford–Shah model as
$$\min_{u \in W^{1,2}(\Omega)} \int_{\Omega} \left( \kappa_1 (f - \mathcal{A}u)^2 + \kappa_2 |\nabla u|^2 + |\nabla u| \right) dx, \tag{4}$$
where $\kappa_1$ and $\kappa_2$ are positive parameters, and the existence
and uniqueness of $u$ were analyzed in their work.
We assume the image features $u = (u_1, \ldots, u_d)^T: \Omega \to \mathbb{R}^d$,
where $u_i: \Omega_i \to \mathbb{R}$ is a smooth mapping defined on
the tissue or lesion $\Omega_i$. In this work, we extend the above
model (4) to the multiphase case, which can deal with $d$-phase
segmentation (multiple objects) and refers to a two-stage
composite process of feature extraction (6) and feature fusion
segmentation (3).
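2) Feature Extraction: The first stage is to extract image
features $u$ by maximizing a posterior probabilistic distribution
(6) for feature representations $u$ of a given image $f$ as
$$\begin{aligned}
\arg\max_{u} p(u \mid f; \alpha) &= \arg\max_{u} \log p(u \mid f; \alpha) \\
&= \arg\max_{u} \log \frac{p(f \mid u; \alpha)\, p(u; \alpha)}{p(f)} \\
&= \arg\max_{u} \log p(f \mid u; \alpha)\, p(u; \alpha),
\end{aligned} \tag{5}$$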
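where $\alpha$ is the environmental parameter in classical “unconscious
inference” or the inverse problem. In particular, the likelihood
probability $p(f \mid u; \alpha)$ and the prior probability $p(u; \alpha)$
can be modeled as normal distributions, respectively, denoted
by
$$p(f \mid u; \alpha) \propto e^{-\frac{1}{2\sigma^2} \int_{\Omega} (\mathcal{A}u - f)^2 \, dx} = e^{-\gamma \int_{\Omega} (\mathcal{A}u - f)^2 \, dx},$$
$$p(u; \alpha) \propto e^{-\lambda \int_{\Omega} \varphi(u) \, dx};$$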
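thus, the first stage is to find a smooth approximation $u$ by
minimizing the multiphase generalization (TS-MCMS) of
(4), which can be rewritten as
$$\min_{u \in W^{1,2}(\Omega)} \int_{\Omega} (f - \mathcal{A}u)^2 \, dx + \mu \int_{\Omega} \varphi(u) \, dx, \tag{6}$$
where $\mathcal{A}: \mathbb{R}^d \to \mathbb{R}$ is a convolutional blur operator, $\varphi(u) = \nu |\nabla u|^2 + |\nabla u|$
is a geometric prior of $u$, and $\mu = \lambda / \gamma$. Hence,
this leads to the nonlinear system
$$F(u; \alpha) := \mathcal{A}^T \mathcal{A} u - \mu \nabla \cdot (\varphi'(u)) = b, \tag{7}$$
where $b = \mathcal{A}^T f$ and $\alpha$ collects the model parameters, including $\mathcal{A}$, $\mu$, and $\nu$.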
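As a rough illustration of how a minimizer of (6) can be approached numerically, the sketch below runs plain gradient descent on a 1D discretization. Several simplifications are assumptions and not part of the paper: the blur operator is the identity, there is a single phase, $|\nabla u|$ is replaced by the smooth surrogate $\sqrt{d^2 + \epsilon}$ with $d$ a forward difference, and gradient descent is used instead of the FAS solver. Setting the discrete gradient to zero gives a discrete counterpart of the nonlinear system (7), up to a constant factor.

```python
import math

def energy(u, f, mu=0.5, nu=0.5, eps=1e-2):
    """1D discrete version of (6) with identity blur and smoothed |du|."""
    fit = sum((fi - ui) ** 2 for fi, ui in zip(f, u))
    reg = 0.0
    for a, b in zip(u, u[1:]):
        d = b - a
        reg += nu * d * d + math.sqrt(d * d + eps)
    return fit + mu * reg

def gradient(u, f, mu=0.5, nu=0.5, eps=1e-2):
    """Exact gradient of the discrete energy with respect to u."""
    g = [2.0 * (ui - fi) for fi, ui in zip(f, u)]
    for i in range(len(u) - 1):
        d = u[i + 1] - u[i]
        # Derivative of nu * d^2 + sqrt(d^2 + eps) with respect to d.
        w = mu * (2.0 * nu * d + d / math.sqrt(d * d + eps))
        g[i] -= w
        g[i + 1] += w
    return g

def smooth_by_gradient_descent(f, mu=0.5, nu=0.5, eps=1e-2, step=0.02, iters=200):
    """Fixed-step gradient descent started from the data f."""
    u = list(f)
    for _ in range(iters):
        g = gradient(u, f, mu, nu, eps)
        u = [ui - step * gi for ui, gi in zip(u, g)]
    return u

noisy = [0.0, 0.1, -0.1, 0.05, 1.0, 0.9, 1.1, 1.0]
denoised = smooth_by_gradient_descent(noisy)
print(energy(noisy, noisy), energy(denoised, noisy))
```

The step size is chosen conservatively for these particular parameter values; other values of `mu`, `nu`, or `eps` would need a different step.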
3) Feature Fusion Segmentation: Once the features $u$ are
obtained, the segmentation is performed by fusing $u$ properly
in the second stage; for example, many novel image segmentation
methods [30]–[32] have been proposed based on
thresholding the smooth solution $u$. Then, the fusing process
for feature $u$ is finished in (3).
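The thresholding variant of the second stage can be sketched in a few lines. The function name, the sample values, and the two thresholds below are hypothetical; in the classical two-stage methods the thresholds are chosen by the user or by a clustering step, which is not shown here.

```python
def threshold_segmentation(u, thresholds):
    """Assign each sample of a smooth solution u to a phase.

    thresholds : increasing list of K-1 values, giving K phases labelled 0..K-1.
    """
    labels = []
    for value in u:
        # The label is the number of thresholds the value reaches or exceeds.
        labels.append(sum(1 for t in thresholds if value >= t))
    return labels

u = [0.05, 0.10, 0.45, 0.55, 0.90, 0.95]
labels = threshold_segmentation(u, [0.3, 0.7])
print(labels)
```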
The model-driven methods introduce prior knowledge re-
garding many desirable mathematical properties of the un-
derlying anatomical structure, such as phase field theory,
Γ-approximation, smoothness, and sparseness. The informed
priors may help to render the segmentation method more
robust and stable. However, these model-inspired methods
generally solve the optimization problem in the image domain,
while the numerical minimization method for the feature
representations uis very slow because the regularization of
the TV-norm, the high dimensionality of u, as well as the
nonlinear relationship between the images and the parameters
pose significant computational challenges. Furthermore, it is
challenging to introduce priors flexibly under different clinical
scenes. These limitations make it hard for purely model-based
segmentation to obtain the solutions efficiently and flexibly.
The goal of this work was to learn powerful solvers of (7)
and (3) to aggregate a variety of mechanisms to address the
medical image segmentation problem efficiently.
B. Proposed Learnable Framework of TS-MCMS Algorithm
We summarized the two-stage algorithm to formulate med-
ical image segmentation based on the TS-MCMS model,
inspired by the CNN architectures of unrolled iterations, and
we propose a learnable framework with two CNN modules on
multiscale feature spaces, FAS-UNet (see Figure 1b), aimed at
learning the nonlinear inverse operators of (7) and (3) in the
context of the variational inverse problem to segment a given
image f.
It is already well known that the unrolled iterations of
many classical algorithms can be considered as the layers of
a neural network [22]–[26]. In this part, we are not interested
in designing another approach for inferring the classes in
MgNet [27], but rather, we aim at extracting the features of a
given image f.
Inspired by the variational segmentation model (6), one
of the key ideas in the proposed architecture is that we
split our framework into a solution module $T_K(f;\theta_1)$ and
a feature fusion module $S_K(u;\theta_2)$, where $T_K(f;\theta_1)$ is the
feature extraction part of the framework (in the multi-stage
case) and $S_K(\cdot;\theta_2)$ is the stage fusion part to be learned.
Therefore, how to design the effective function maps $T_K$ for
approximately solving (7) and $S_K$ for approximating (3) is an
important problem.
This work applies a nonlinear multigrid method to design
FAS-UNet for explainable medical image segmentation by
learning the two following modules:
$$\begin{cases} u = T_K(f;\theta_1), \\ s = S_K(u;\theta_2), \end{cases} \tag{8}$$
where $f$ is an input image, $u$ denotes the feature maps, and $s$ is
the prediction for the truth partitions, leading to the overall
approximation function as
$$s = S_K(T_K(f;\theta_1); \theta_2), \tag{9}$$
where $\theta_1$ and $\theta_2$ are parameters to be learned in the
proposed explainable FAS-UNet architecture.
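The composition in (9) can be sketched structurally as follows. This is not the FAS-UNet architecture: the stand-in for $T_K$ is $K$ repeated 1D filtering steps with a single kernel, and the stand-in for $S_K$ is a per-sample softmax over affine class scores. The kernel, the score parameters, and all function names are hypothetical values chosen by hand rather than learned.

```python
import math

def conv1d(x, kernel):
    """Same-size 1D filtering with replicated borders (odd-length kernel)."""
    r = len(kernel) // 2
    n = len(x)
    return [sum(w * x[min(max(i + k - r, 0), n - 1)] for k, w in enumerate(kernel))
            for i in range(n)]

def T_K(f, theta1, K=3):
    """Stage 1 stand-in: K unrolled filtering steps with kernel theta1."""
    u = list(f)
    for _ in range(K):
        u = conv1d(u, theta1)
    return u

def S_K(u, theta2):
    """Stage 2 stand-in: softmax over affine class scores for each sample."""
    probs = []
    for value in u:
        scores = [w * value + b for w, b in theta2]
        top = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(sc - top) for sc in scores]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs

def segment(f, theta1, theta2, K=3):
    """Overall map s = S_K(T_K(f; theta1); theta2)."""
    return S_K(T_K(f, theta1, K), theta2)

f = [0.0, 0.1, 0.0, 1.0, 0.9, 1.0]
theta1 = [0.25, 0.5, 0.25]            # hypothetical smoothing kernel
theta2 = [(-8.0, 4.0), (8.0, -4.0)]   # hypothetical (weight, bias) per class
s = segment(f, theta1, theta2)
labels = [p.index(max(p)) for p in s]
print(labels)
```

In a trained network both parameter sets would be fitted jointly by minimizing a loss between $s$ and the ground-truth partition.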
To understand the approximation ability of the proposed
modules $T_K(f;\theta_1)$ and $S_K(u;\theta_2)$ generated by the FAS-UNet
architecture, we refer the readers to D. Zhou’s work [35],
which answers an open question in CNN learning theory about
how deep CNNs can be used to approximate any continuous
function to an arbitrary accuracy when the depth of the neural
network is large enough.
C. FAS-Module for Feature Extraction
In this part, we discuss how the multigrid method can be
used to solve nonlinear problems. The Helmholtz Hypothe-
sis [29] demonstrates that the extracted features can also be
represented by solving the equation:
$$F(u, \alpha) = b := \mathcal{A}^T f, \tag{10}$$
subject to
$$\min_{u} \| S_K(u;\theta_2) - y \|,$$
where $F$ denotes the transformation of combining feature $u$
with a deblurred image $b = \mathcal{A}^T f$, $u$ denotes the unknown features,
and $y$ is the ground-truth of image $f$. Our starting point is the
traditional FAS algorithm for solving (10).
1) The Full Approximation Scheme: The multigrid method
is usually used to solve nonlinear algebraic systems (10).
For simplicity, the parameter $\alpha$ in $F(u, \alpha)$ is omitted when
only the classical FAS algorithm is involved, i.e.,
$$F(u) = b. \tag{11}$$
The multigrid ingredients including the error smoothing
and the coarse grid correction ideas are not restricted to the
linear situation, but can be immediately used for the nonlinear
problem itself, which leads to the so-called FAS algorithm.
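The interplay of error smoothing and coarse grid correction in the nonlinear setting can be sketched on a small model problem. Everything specific below is an assumption made for illustration and is not the operator used in FAS-UNet: the 1D test operator $F(u)_i = (2u_i - u_{i-1} - u_{i+1})/h^2 + u_i^3$ with zero boundary values, the nonlinear Gauss–Seidel smoother, injection and full-weighting restriction, and linear interpolation.

```python
import math

def apply_F(u, h):
    """Test operator F(u)_i = (2u_i - u_{i-1} - u_{i+1}) / h^2 + u_i^3."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append((2.0 * u[i] - left - right) / (h * h) + u[i] ** 3)
    return out

def smooth(u, b, h, sweeps):
    """Nonlinear Gauss-Seidel: one scalar Newton step per unknown and sweep."""
    u = list(u)
    n = len(u)
    for _ in range(sweeps):
        for i in range(n):
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            r = (2.0 * u[i] - left - right) / (h * h) + u[i] ** 3 - b[i]
            u[i] -= r / (2.0 / (h * h) + 3.0 * u[i] ** 2)
    return u

def restrict_weighted(v):
    """Full-weighting restriction (used for the residual)."""
    return [0.25 * v[2 * j] + 0.5 * v[2 * j + 1] + 0.25 * v[2 * j + 2]
            for j in range((len(v) - 1) // 2)]

def restrict_inject(v):
    """Injection (used for the current approximation)."""
    return [v[2 * j + 1] for j in range((len(v) - 1) // 2)]

def prolong(e, n_fine):
    """Linear interpolation of a coarse grid correction to the fine grid."""
    m = len(e)
    out = [0.0] * n_fine
    for j in range(m):
        out[2 * j + 1] = e[j]
    for i in range(0, n_fine, 2):
        k = i // 2
        left = e[k - 1] if k >= 1 else 0.0
        right = e[k] if k < m else 0.0
        out[i] = 0.5 * (left + right)
    return out

def fas_two_grid(u, b, h, pre=2, post=2, coarse_sweeps=50):
    """One fine-coarse-fine cycle of the full approximation scheme."""
    u = smooth(u, b, h, pre)                                  # pre-smoothing
    res = [bi - Fi for bi, Fi in zip(b, apply_F(u, h))]       # fine residual
    u_c0 = restrict_inject(u)
    # The coarse problem is posed for the full approximation, not the error.
    b_c = [Fi + ri for Fi, ri in zip(apply_F(u_c0, 2.0 * h), restrict_weighted(res))]
    u_c = smooth(u_c0, b_c, 2.0 * h, coarse_sweeps)           # approximate coarse solve
    corr = prolong([a - c for a, c in zip(u_c, u_c0)], len(u))
    u = [ui + ci for ui, ci in zip(u, corr)]                  # coarse grid correction
    return smooth(u, b, h, post)                              # post-smoothing

def residual_norm(u, b, h):
    return max(abs(bi - Fi) for bi, Fi in zip(b, apply_F(u, h)))

n = 15
h = 1.0 / (n + 1)
exact = [math.sin(math.pi * (i + 1) * h) for i in range(n)]
b = apply_F(exact, h)          # manufactured right-hand side
u = [0.0] * n
r0 = residual_norm(u, b, h)
for _ in range(5):
    u = fas_two_grid(u, b, h)
r5 = residual_norm(u, b, h)
print(r0, r5)
```

The key difference from the linear case is the coarse right-hand side, which combines the coarse operator applied to the restricted approximation with the restricted fine residual, so that the coarse grid works with the full solution rather than with the error alone.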
The fundamental idea of the nonlinear multigrid is the same
as in the linear case, and the FAS method can be recursively
defined on the basis of a two-grid method. We start with the
description of one fine–coarse cycle (finer grid layer $\ell$ and
coarser grid layer $\ell + 1$) of the nonlinear two-grid method for
solving (11). To proceed, let the fine grid equation be written
as
$$F^{\ell}(u^{\ell}) = b^{\ell}.$$
Firstly, we compute an approximation $\bar{u}^{\ell} := u^{\ell}_{m}$ of the
fine grid problem by applying $m$ pre-smoothing steps to $u^{\ell}$
as follows