
FAS-UNet: A Novel FAS-driven Unet to Learn

Variational Image Segmentation

Hui Zhu, Shi Shu, and Jianping Zhang

Abstract—Solving variational image segmentation problems

with hidden physics is often expensive and requires different

algorithms and manual tuning of model parameters. The deep

learning methods based on the U-Net structure have obtained

outstanding performances in many different medical image

segmentation tasks, but designing such networks requires a lot of

parameters and training data, not always available for practical

problems. In this paper, inspired by the traditional multi-phase

convexity Mumford-Shah variational model and the full approximation

scheme (FAS) for solving nonlinear systems, we propose a novel

variational-model-informed network (denoted as FAS-Unet) that

exploits the model and algorithm priors to extract the multi-scale

features. The proposed model-informed network integrates image

data and mathematical models, and implements them through

learning a few convolution kernels. Based on the variational

theory and FAS algorithm, we ﬁrst design a feature extraction

sub-network (FAS-Solution module) to solve the model-driven

nonlinear systems, where a skip-connection is employed to fuse

the multi-scale features. Secondly, we further design a convolution

block to fuse the extracted features from the previous stage,

resulting in the final segmentation probability. Experimental

results on three different medical image segmentation tasks

show that the proposed FAS-Unet is very competitive with other

state-of-the-art methods in qualitative, quantitative and model

complexity evaluations. Moreover, it may also be possible to train

specialized network architectures that automatically satisfy some

of the mathematical and physical laws in other image problems

for better accuracy, faster training and improved generalization.

The code is available at https://github.com/zhuhui100/FASUNet.

Index Terms—Model-informed deep learning; Interpretable

network; Variational image segmentation; Full approximation

scheme.

I. INTRODUCTION

Image segmentation is one of the most important prob-

lems in computer vision and also is a difﬁcult problem in

the medical imaging community [1]–[3]. It has been widely

used in many medical image processing ﬁelds such as the

identification of cardiovascular diseases [4], the measurement

of bone and tissue [5], and the extraction of suspicious lesions

to aid radiologists. Therefore, image segmentation has a vital

role in promoting medical image analysis and applications as

a powerful image processing tool [5], [6].

This work was supported by the National Natural Science Foundation of China (NSFC) under Grants 11971414 and 11771369, and partly by the Natural Science Foundation of Hunan Province under Grants 2018JJ2375, 2018XK2304, and 2018WK4006. (Corresponding author: Jianping Zhang.)

H. Zhu is with the School of Mathematics and Computational Science, Xiangtan University, and Key Laboratory of Intelligent Computing & Information Processing of Ministry of Education (201931000089@smail.xtu.edu.cn).

S. Shu is with the School of Mathematics and Computational Science, Xiangtan University, and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan, 411105, China (shushi@xtu.edu.cn).

J. Zhang is with the School of Mathematics and Computational Science, Xiangtan University, and Hunan National Applied Mathematics Center (jpzhang@xtu.edu.cn).

Deep learning (DL) has achieved great success in the ﬁeld

of medical image segmentation [5], [7], [8]. One of the most

important reasons is that the convolutional neural networks

(CNNs) can effectively extract image features. Therefore,

much work at present involves designing a network architecture

with strong feature extraction ability, and many well-known

CNN architectures have been proposed such as UNet [9],

V-Net [10], UNet++ [11], 3D UNet [12], Y-Net [13], Res-

UNet [14], KiU-Net [15], DenseUNet [16], and nnU-Net [17].

More and more studies based on data-driven methods have

been reported for medical image segmentation. Although UNet

and its variants have achieved considerably impressive per-

formance in many medical image segmentation datasets, they

still suffer from two limitations. One is that most researchers

have introduced more parameters to improve the performance

of medical image segmentation, but have tended to ignore the

technical branch of the model’s memory and computational

overhead, which makes it difﬁcult to popularize the algorithm

to industry applications [18]. The other disadvantage is that

these variants only design many suitable architectures through

the researcher’s experience or experiments, but do not focus on

the mathematical theoretical guidance of network architectures

such as explainability, generalizability, etc., which limits the

application of these models and the improvement of task-

driven medical image segmentation methods [19], [20].

Recently, many works on image recognition and image

reconstruction have been focusing on the interpretability of

the network architecture. Inspired by some mathematical

viewpoints, many related unroll networks have been designed

and successfully applied. He et al. [21] proposed the deep

residual learning framework, which utilizes an identity map

to facilitate training; it is well known that it is very similar

to the iterative method solving ordinary differential equations

(ODEs) and also achieves promising performance on image

recognition. Larsson et al. employed the fractal idea to

design a self-similar FractalNet [22], also discovering that

its architecture is similar to the Runge–Kutta (RK) scheme

in numerical calculations. According to the nature of poly-

nomials, Zhang et al. designed PolyNet [23] by improving

ResNet to strengthen the expressive ability of the network,

and Gomez et al. [24] proposed RevNet by using some ideas

of the dynamic system. Chen et al. [25] analyzed the process

of solving ODEs, then proposed Neural ODE, which further

shows that mathematics and neural networks have a strong

relationship.

arXiv:2210.15164v2 [cs.CV] 6 Nov 2022

Fig. 1. Classical variational image segmentation and model-inspired learning method. (a) The first stage solves the nonlinear differential equations using a classical iterative method, and the second stage thresholds the smooth solution of the first stage to extract objects. (b) The first stage learns the solution mapping $T_K(f;\theta_1)$ by optimizing the convolution kernels $\theta_1$ to extract image features; the second stage learns the feature fusion and segmentation thresholding parameters.

Meanwhile, He et al. designed a network architecture

for the super-resolution task based on the forward Euler

and RK methods of solving ODEs [18] and achieved good

performance. Sun et al. [26] designed ADMM-Net through the

alternating direction method to learn an image reconstruction

problem. Inspired by a multigrid algorithm for solving inverse

problems, He et al. [27] proposed a learnable classiﬁcation

network denoted as MgNet to extract image features u,

which uses a few parameters to achieve good performance

on the CIFAR datasets. Alt et al. [28] analyzed the behavior

and mathematical foundations of UNet, and interpreted them

as approximations of continuous nonlinear partial differen-

tial equations (PDEs) by using full approximation schemes

(FASs). Experimental evaluations showed that the proposed

architectures for the denoising and inpainting tasks save half

of the trainable parameters and can thus outperform standard

ones with the same model complexity.

Unfortunately, only a few studies based on model-driven

techniques have been reported for the segmentation task.

In this paper, we mainly focus on the explainable DL frame-

work combining the advantages of the FAS and UNet for

medical image segmentation.

A. Problem

H. Helmholtz proposed that the ill-posed problem of

producing reliable perception from fuzzy signals can be

solved through the process of “unconscious inference” (the

Helmholtz Hypothesis) [29]. This theory implies that hu-

man vision is incomplete and that details are inferred by

the unconscious mind to create a complete image. That is,

our perception system can also integrate the fuzzy evidence

received from the senses into the situation based on its own

environmental model.

Let $p(u \mid f;\alpha)$ be a probability distribution for the feature representations $u$ of the source image $f$. The prior probability of $u$ can be modeled as a multivariate normal distribution. In general, $u$ can be extracted from a given image $f$ by maximum a posteriori (MAP) estimation:

$$\arg\max_{u} \log p(u \mid f;\alpha), \tag{1}$$

where $\alpha$ is the environmental parameter in classical “unconscious inference” or the inverse problem. This problem leads to the nonlinear system defined by

$$F(u;\alpha) = b, \tag{2}$$

where the nonlinear operator $F(\cdot;\alpha)$ is employed to generate the image $b$; e.g., $b = A^{T} f$ is a deconvoluted image of $f$ in the image deblurring problem with a convolution operator $A$.

We consider image segmentation to be a composite process of feature extraction (6) and feature fusion segmentation. Here, the fusing process for the features $u$ is defined by

$$s = S(u;\beta), \tag{3}$$

where $S(\cdot;\beta)$ denotes a fusing segmentation with a fixed conscious parameter $\beta$, and $s$ is the segmentation result or probability maps.

Such strongly interpretable segmentation models [30]–[32]

are so general that, depending on the amount of well-

predeﬁned sparsity priors of the input image, they have the

advantages of theoretical support and strong convergence.

The total ﬂowchart of classical variational segmentation can

be summarized as shown in Figure 1a. However, they usually

require expensive computations and also face the problem

of selecting suitable regularizers $\varphi(\cdot)$ and

model parameters $(\alpha, \beta)$. Consequently, some reconstructed

results are unsatisfactory.

It is well known that the solution $u$ usually has the multiscale property, so a natural idea is to exploit the multi-layer convolution and multigrid architecture, which can describe multiscale features, to learn $u$. Based on the above facts, we propose a two-stage segmentation framework that learns the features $u$ in Stage 1 and the segmentation $s$ in Stage 2, as shown in Figure 1b.

B. Contributions

In this work, we focus on analyzing the feature extraction

inverse problem (2) and the feature fusion segmentation (3)

to design an explainable deep learning network. It is well

known that the unrolled iterations of the classical solution

algorithm can be considered as the layers of a neural network,

so we propose a novel FAS-driven UNet (FAS-UNet), which

integrates image data and a multiscale algorithm for solving

the nonlinear inverse problem (7). The major differences between

MgNet [27] and our approach are that MgNet is not a U-shaped architecture

and is only used for image classification, so its

output cannot be converted into a segmentation

prediction of the input image. Besides, the proposed

network was inspired by the traditional multiphase convexity

Mumford–Shah variational model [30] and FAS algorithm for

solving nonlinear systems [33], which exploits the model and

algorithm priors’ information to extract the image features.

Indeed, the goal of our work is to show that, under some

assumptions about the operators, it is possible to interpret

the smoothing operations of the FAS and image geometric

extracting operations of the variational model as the layers

of a CNN, which in turn, provide fairly specialized network

architectures that allow us to solve the standard nonlinear

system (7) for a speciﬁc choice of the parameters involved.

Our main contributions are summarized as follows:

1. We propose a novel variational-model-informed two-

stage image segmentation network (FAS-UNet), where

an explainable and lightweight sub-network for feature

extraction is designed by combining the traditional mul-

tiphase convexity Mumford–Shah variational model and

FAS algorithm for solving nonlinear systems. To the best

of our knowledge, it is the ﬁrst unrolled architecture

designed based on model and algorithm priors in the

image segmentation community.

2. The proposed model-informed network integrates image

data and mathematical models, and it provides a helpful

viewpoint for designing the image segmentation network

architecture.

3. The proposed architecture can be trained from additional

model information obtained by enforcing some of the

mathematical and physical laws for better accuracy,

faster training, and improved generalization. Extensive

experimental results show that it performs better than

the other state-of-the-art methods.

The rest of the paper is organized as follows. The novel

FAS-UNet framework for solving nonlinear inverse problems

by analyzing variational segmentation theory and the FAS

algorithm is proposed in Section II. We show experimental

results in Section III. Finally, we conclude this work in

Section IV.

II. VARIATIONAL SEGMENTATION VIA THE

CNN FRAMEWORK

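The goal of image segmentation is to partition a given image $f: \Omega \to \mathbb{R}$ into $r$ regions $\{\Omega_i\}_{i=1}^{r}$ that contain distinct objects and satisfy $\Omega_i \cap \Omega_j = \emptyset$ for $j \neq i$ and $\bigcup_{i=1}^{r} \Omega_i = \Omega$, where the image domain $\Omega$ is a bounded and open subset of $\mathbb{R}^2$. Let $\Gamma = \bigcup_i \partial\Omega_i$ be the union of the boundaries of the $\Omega_i$, and let $|\Gamma|$ denote the arc length of the curve $\Gamma$.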

A. Multiphase Variational Image Segmentation

As mentioned, various ways of variational image segmen-

tation have been proposed. Below, we review a few of them.

1) Variational Image Segmentation: The Mumford–Shah (M-S) model is a well-known variational image segmentation method proposed by Mumford and Shah [34], which can be defined as follows:

$$\min_{u,\Gamma} \; \tau_1 \int_{\Omega} (f-u)^2 \, dx + \tau_2 \int_{\Omega \setminus \Gamma} |\nabla u|^2 \, dx + |\Gamma|,$$

where $\tau_1$ and $\tau_2$ are the weight parameters. The first term requires that $u: \Omega \to \mathbb{R}$ approximates $f$, the second term that $u$ does not vary much on each $\Omega_i$, and the third term that the boundary $\Gamma$ is as short as possible. This shows that $u$ is a piecewise smooth approximation of $f$.

In particular, Chan and Vese considered the special case of the M-S model where the function $u$ is chosen to be a piecewise constant function; thus, the minimization for two-phase segmentation is given as

$$\min_{\Gamma, c_1, c_2} \; \lambda_1 \int_{\mathrm{inside}(\Gamma)} |f-c_1|^2 \, dx + \lambda_2 \int_{\mathrm{outside}(\Gamma)} |f-c_2|^2 \, dx + |\Gamma|,$$

where $c_1$ and $c_2$ are the average image intensities inside and outside of the boundary $\Gamma$, respectively, and $\lambda_1$ and $\lambda_2$ are the weight parameters.
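As an illustration of the two-phase piecewise constant energy above, the following minimal sketch evaluates its two data terms for a fixed region on a small discrete image. The length term $|\Gamma|$ is deliberately omitted, and the function name `chan_vese_data_term`, the binary mask representation of the region inside $\Gamma$, and the toy image are assumptions made here for illustration; they are not taken from the paper or its code.

```python
def chan_vese_data_term(image, inside_mask, lam1=1.0, lam2=1.0):
    """Return (c1, c2, energy) for a fixed region, without the length term.

    image       : list of rows of pixel intensities (hypothetical toy input)
    inside_mask : same shape, truthy where a pixel lies inside the boundary
    Both regions are assumed to be non-empty.
    """
    inside = [p for row, mrow in zip(image, inside_mask)
              for p, m in zip(row, mrow) if m]
    outside = [p for row, mrow in zip(image, inside_mask)
               for p, m in zip(row, mrow) if not m]
    # For a fixed region, the region means are the average intensities c1, c2.
    c1 = sum(inside) / len(inside)
    c2 = sum(outside) / len(outside)
    energy = (lam1 * sum((p - c1) ** 2 for p in inside)
              + lam2 * sum((p - c2) ** 2 for p in outside))
    return c1, c2, energy


image = [[0, 0, 1, 1],
         [0, 0, 1, 1]]
good_mask = [[0, 0, 1, 1],
             [0, 0, 1, 1]]   # matches the bright object exactly
bad_mask = [[0, 1, 1, 0],
            [0, 1, 1, 0]]    # cuts through both object and background

good_c1, good_c2, good_energy = chan_vese_data_term(image, good_mask)
bad_c1, bad_c2, bad_energy = chan_vese_data_term(image, bad_mask)
```

A mask that matches the object gives a lower data term than one that mixes object and background pixels, which is the behavior the minimization relies on.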

Sometimes, the given image is degraded by noise and a problem-related blur operator $A$. Therefore, Cai et al. [30] extended the two-stage image segmentation strategy using a convex variant of the Mumford–Shah model as

$$\min_{u \in W^{1,2}(\Omega)} \int_{\Omega} \left( \kappa_1 (f - Au)^2 + \kappa_2 |\nabla u|^2 + |\nabla u| \right) dx, \tag{4}$$

where $\kappa_1$ and $\kappa_2$ are positive parameters, and the existence and uniqueness of $u$ were analyzed in their work.

We assume the image features $u = (u_1, \ldots, u_d)^{T}: \Omega \to \mathbb{R}^d$, where $u_i: \Omega_i \to \mathbb{R}$ is a smooth mapping defined on the tissue or lesion $\Omega_i$. In this work, we extend the above model (4) to the multiphase case, which can deal with $d$-phase segmentation (multiple objects). The resulting method is a two-stage composite process of feature extraction (6) and feature fusion segmentation (3).

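2) Feature Extraction: The first stage is to extract image features $u$ by maximizing a posterior probability distribution (6) for the feature representations $u$ of a given image $f$ as

$$\begin{aligned} \arg\max_{u} p(u \mid f;\alpha) &= \arg\max_{u} \log p(u \mid f;\alpha) \\ &= \arg\max_{u} \log \frac{p(f \mid u;\alpha)\, p(u;\alpha)}{p(f)} \\ &= \arg\max_{u} \log \bigl( p(f \mid u;\alpha)\, p(u;\alpha) \bigr), \end{aligned} \tag{5}$$

where $\alpha$ is the environmental parameter in classical “unconscious inference” or the inverse problem. In particular, the likelihood probability $p(f \mid u;\alpha)$ and the prior probability $p(u;\alpha)$ can be modeled as normal distributions, respectively, denoted

$$p(f \mid u;\alpha) \propto e^{-\frac{1}{2\sigma^2} \int_{\Omega} (Au-f)^2 \, d\Omega} = e^{-\gamma \int_{\Omega} (Au-f)^2 \, d\Omega}, \qquad p(u;\alpha) \propto e^{-\lambda \int_{\Omega} \varphi(\nabla u) \, d\Omega};$$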

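thus, the first stage is to find a smooth approximation $u$ by minimizing the multiphase generalization (TS-MCMS) of (4), which can be rewritten as

$$\min_{u \in W^{1,2}(\Omega)} \int_{\Omega} (f - Au)^2 \, dx + \mu \int_{\Omega} \varphi(\nabla u) \, dx, \tag{6}$$

where $A: \mathbb{R}^d \to \mathbb{R}$ is a convolutional blur operator, $\varphi(\nabla u) = \nu |\nabla u|^2 + |\nabla u|$ is a geometric prior of $u$, and $\mu = \lambda / \gamma$. Hence, this leads to the nonlinear system

$$F(u;\alpha) := A^{T} A u - \mu \nabla \cdot \bigl( \varphi'(\nabla u) \bigr) = b, \tag{7}$$

where $b = A^{T} f$ and $\alpha = (A, \nabla, \mu, \nu)$.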
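The step from the MAP problem (5) to the minimization (6) can be made explicit. The short derivation below uses only the two densities stated above; the additive constant $C$, which collects the normalization factors and does not depend on $u$, is notation introduced here for illustration.

```latex
\begin{aligned}
-\log\bigl( p(f \mid u;\alpha)\, p(u;\alpha) \bigr)
  &= \gamma \int_{\Omega} (Au - f)^2 \, d\Omega
   + \lambda \int_{\Omega} \varphi(\nabla u) \, d\Omega + C, \\
\arg\max_{u} \log\bigl( p(f \mid u;\alpha)\, p(u;\alpha) \bigr)
  &= \arg\min_{u} \int_{\Omega} (f - Au)^2 \, dx
   + \frac{\lambda}{\gamma} \int_{\Omega} \varphi(\nabla u) \, dx .
\end{aligned}
```

Dropping $C$ and dividing by $\gamma > 0$ does not change the minimizer, which gives (6) with $\mu = \lambda / \gamma$.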

3) Feature Fusion Segmentation: Once the features $u$ are obtained, the segmentation is performed by fusing $u$ properly in the second stage; for example, many image segmentation methods [30]–[32] have been proposed based on thresholding the smooth solution $u$. The fusing process for the features $u$ is then given by (3).
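The classical second stage, thresholding a smooth solution $u$ into phases, can be sketched as follows. This is a minimal illustration only: the function name `threshold_segmentation`, the fixed threshold values, and the toy array are assumptions made here, and the cited methods [30]–[32] may choose their thresholds differently (for example, by clustering).

```python
from bisect import bisect_right


def threshold_segmentation(u, thresholds):
    """Map each value of a smooth solution u to a phase label.

    u          : list of rows of floats (hypothetical smooth solution)
    thresholds : increasing list of d-1 cut points, giving d phases
    A value v receives label k when exactly k thresholds are <= v.
    """
    return [[bisect_right(thresholds, v) for v in row] for row in u]


u = [[0.1, 0.4],
     [0.6, 0.9]]
labels = threshold_segmentation(u, thresholds=[0.33, 0.66])
```

With two thresholds the pixels fall into three phases (labels 0, 1, and 2), which corresponds to the hand-tuned fusion step that the learned module is meant to replace.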

The model-driven methods introduce prior knowledge re-

garding many desirable mathematical properties of the un-

derlying anatomical structure, such as phase ﬁeld theory,

Γ-approximation, smoothness, and sparseness. The informed

priors may help to render the segmentation method more

robust and stable. However, these model-inspired methods

generally solve the optimization problem in the image domain,

while the numerical minimization method for the feature

representations $u$ is very slow because the TV-norm

regularization, the high dimensionality of $u$, and the

nonlinear relationship between the images and the parameters

pose significant computational challenges. Furthermore, it is

challenging to introduce priors ﬂexibly under different clinical

scenes. These limitations make it hard for purely model-based

segmentation to obtain the solutions efﬁciently and ﬂexibly.

The goal of this work is to learn powerful solvers of (7)

and (3) to aggregate a variety of mechanisms to address the

medical image segmentation problem efﬁciently.

B. Proposed Learnable Framework of TS-MCMS Algorithm

We summarize the two-stage algorithm that formulates medical image segmentation based on the TS-MCMS model. Inspired by CNN architectures built from unrolled iterations, we propose a learnable framework with two CNN modules on multiscale feature spaces, FAS-UNet (see Figure 1b), aimed at learning the nonlinear inverse operators of (7) and (3) in the context of the variational inverse problem to segment a given image $f$.

It is already well known that the unrolled iterations of

many classical algorithms can be considered as the layers of

a neural network [22]–[26]. In this part, we are not interested

in designing another approach for inferring the classes in

MgNet [27], but rather, we aim at extracting the features of a

given image f.

Inspired by the variational segmentation model (6), one of the key ideas in the proposed architecture is that we split our framework into a solution module $T_K(f;\theta_1)$ and a feature fusion module $S_K(u;\theta_2)$, where $T_K(f;\theta_1)$ is the feature extraction part of the framework (in the multi-stage case) and $S_K(\cdot;\theta_2)$ is the stage fusion part to be learned. Therefore, how to design effective function maps $T_K$ for approximately solving (7) and $S_K$ for approximating (3) is an important problem.

This work applies a nonlinear multigrid method to design

FAS-UNet for explainable medical image segmentation by

learning the two following modules:

$$\begin{cases} u = T_K(f;\theta_1), \\ s = S_K(u;\theta_2), \end{cases} \tag{8}$$

where $f$ is an input image, $u$ is the feature maps, and $s$ is the prediction for the truth partitions, leading to the overall approximation function

$$s = S_K\bigl(T_K(f;\theta_1);\theta_2\bigr), \tag{9}$$

where $\theta_1$ and $\theta_2$ are parameters to be learned in the proposed explainable FAS-UNet architecture.
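The composition in (9) can be sketched on a 1D signal. The two functions below are toy stand-ins, not the paper's modules: `solution_module` plays the role of $T_K$ by repeating a 3-tap convolution whose weights stand in for $\theta_1$, and `fusion_module` plays the role of $S_K$ with a pointwise logistic map whose weight and bias stand in for $\theta_2$. All names, kernel values, and the choice of a sigmoid are assumptions made here for illustration.

```python
import math


def solution_module(f, theta1, num_steps):
    """Toy stand-in for T_K: apply a 3-tap convolution num_steps times.

    Edge values are replicated so the output keeps the input length.
    """
    u = list(f)
    for _ in range(num_steps):
        padded = [u[0]] + u + [u[-1]]
        u = [sum(w * padded[i + k] for k, w in enumerate(theta1))
             for i in range(len(u))]
    return u


def fusion_module(u, theta2):
    """Toy stand-in for S_K: pointwise logistic map to a probability."""
    weight, bias = theta2
    return [1.0 / (1.0 + math.exp(-(weight * v + bias))) for v in u]


def segment(f, theta1, theta2, num_steps=3):
    """Overall map s = S_K(T_K(f; theta1); theta2), as in (9)."""
    return fusion_module(solution_module(f, theta1, num_steps), theta2)


f = [0.0, 0.1, 0.0, 0.9, 1.0, 0.9]
probabilities = segment(f, theta1=[0.25, 0.5, 0.25], theta2=(10.0, -5.0))
labels = [int(p > 0.5) for p in probabilities]
```

In a trained network, both parameter sets would be fitted jointly against ground-truth partitions rather than fixed by hand as in this sketch.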

To understand the approximation ability of the proposed

modules $T_K(f;\theta_1)$ and $S_K(u;\theta_2)$ generated by the FAS-UNet

architecture, we refer the readers to D. Zhou’s work [35],

which answers an open question in CNN learning theory about

how deep CNN can be used to approximate any continuous

function to an arbitrary accuracy when the depth of the neural

network is large enough.

C. FAS-Module for Feature Extraction

In this part, we discuss how the multigrid method can be used to solve nonlinear problems. The Helmholtz Hypothesis [29] demonstrates that the extracted features can also be represented by solving the equation

$$F(u,\alpha) = b := A^{T} f, \tag{10}$$

subject to

$$\min_{u} \| S_K(u;\theta_2) - y \|,$$

where $F$ denotes the transformation combining the features $u$ with a deblurred image $b = A^{T} f$, $u$ is the unknown features, and $y$ is the ground truth of the image $f$. Our starting point is the traditional FAS algorithm for solving (10).

1) The Full Approximation Scheme: The multigrid method is usually used to solve nonlinear algebraic systems such as (10). For simplicity, the parameter $\alpha$ in $F(u,\alpha)$ is omitted when only the classical FAS algorithm is involved, i.e.,

$$F(u) = b. \tag{11}$$

The multigrid ingredients including the error smoothing

and the coarse grid correction ideas are not restricted to the

linear situation, but can be immediately used for the nonlinear

problem itself, which leads to the so-called FAS algorithm.
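To make the two ingredients concrete, the following is a minimal textbook-style FAS two-grid sketch on a toy problem. It is not the paper's network and not its discretization. The assumptions made here are: the 1D model equation $-u'' + u^3 = b$ with zero boundary values, a nonlinear Gauss–Seidel smoother with one Newton step per point, injection for the solution, full weighting for the residual, and linear interpolation for the correction. All function names are hypothetical.

```python
import math


def apply_F(u, h):
    """F(u)_i = (2u_i - u_{i-1} - u_{i+1}) / h^2 + u_i^3, zero boundary."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append((2.0 * u[i] - left - right) / h**2 + u[i] ** 3)
    return out


def smooth(u, b, h, sweeps):
    """Error smoothing: nonlinear Gauss-Seidel, one Newton step per point."""
    u = list(u)
    n = len(u)
    for _ in range(sweeps):
        for i in range(n):
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            res = (2.0 * u[i] - left - right) / h**2 + u[i] ** 3 - b[i]
            u[i] -= res / (2.0 / h**2 + 3.0 * u[i] ** 2)
    return u


def restrict_injection(v):
    """Coarse point j coincides with fine point 2j + 1."""
    return [v[2 * j + 1] for j in range(len(v) // 2)]


def restrict_full_weighting(r):
    return [(r[2 * j] + 2.0 * r[2 * j + 1] + r[2 * j + 2]) / 4.0
            for j in range(len(r) // 2)]


def prolong(e):
    """Linear interpolation of a coarse correction to the fine grid."""
    m = len(e)
    fine = [0.0] * (2 * m + 1)
    for j in range(m):
        fine[2 * j + 1] = e[j]
    for k in range(m + 1):
        left = e[k - 1] if k > 0 else 0.0
        right = e[k] if k < m else 0.0
        fine[2 * k] = 0.5 * (left + right)
    return fine


def fas_two_grid(u, b, h, pre=2, post=2, coarse_sweeps=200):
    u = smooth(u, b, h, pre)                                  # pre-smoothing
    r = [bi - fi for bi, fi in zip(b, apply_F(u, h))]         # fine residual
    u_c = restrict_injection(u)
    # FAS coarse right-hand side: F_H(restricted u) + restricted residual.
    b_c = [fi + ri for fi, ri in
           zip(apply_F(u_c, 2.0 * h), restrict_full_weighting(r))]
    v_c = smooth(u_c, b_c, 2.0 * h, coarse_sweeps)            # coarse solve
    correction = prolong([v - w for v, w in zip(v_c, u_c)])
    u = [ui + ci for ui, ci in zip(u, correction)]            # correction
    return smooth(u, b, h, post)                              # post-smoothing


def residual_norm(u, b, h):
    return max(abs(bi - fi) for bi, fi in zip(b, apply_F(u, h)))


n = 15                       # fine interior points; the coarse grid has 7
h = 1.0 / (n + 1)
u_exact = [math.sin(math.pi * (i + 1) * h) for i in range(n)]
b = apply_F(u_exact, h)      # manufactured right-hand side
u = [0.0] * n
initial_residual = residual_norm(u, b, h)
for _ in range(10):
    u = fas_two_grid(u, b, h)
final_residual = residual_norm(u, b, h)
```

Unlike the linear correction scheme, the coarse problem here is posed for the full approximation `v_c` rather than for the error alone, which is what allows the nonlinear operator to be reused unchanged on the coarse grid.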

The fundamental idea of the nonlinear multigrid is the same as in the linear case, and the FAS method can be recursively defined on the basis of a two-grid method. We start with the description of one fine–coarse cycle (finer grid layer $\ell$ and coarser grid layer $\ell + 1$) of the nonlinear two-grid method for solving (11). To proceed, let the fine grid equation be written as

$$F^{\ell}(u^{\ell}) = b^{\ell}.$$

Firstly, we compute an approximation $\bar{u}^{\ell} := u^{\ell}_{m}$ of the fine grid problem by applying $m$ pre-smoothing steps to $u^{\ell}$ as follows
