Uniformly convex neural networks and non-stationary iterated
network Tikhonov (iNETT) method
Davide Bianchi, Guanghao Lai, and Wenbin Li*
School of Science, Harbin Institute of Technology, Shenzhen, Shenzhen 518055, China.
Email: bianchi@hit.edu.cn, 21s058002@stu.hit.edu.cn, liwenbin@hit.edu.cn
*Corresponding author: Wenbin Li
Abstract
We propose a non-stationary iterated network Tikhonov (iNETT) method for the solution of ill-posed inverse problems. The iNETT employs deep neural networks to build a data-driven regularizer, and it avoids the difficult task of estimating the optimal regularization parameter. To achieve the theoretical convergence of iNETT, we introduce uniformly convex neural networks to build the data-driven regularizer. Rigorous theories and detailed algorithms are proposed for the construction of convex and uniformly convex neural networks. In particular, given a general neural network architecture, we prescribe sufficient conditions to achieve a trained neural network which is component-wise convex or uniformly convex; moreover, we provide concrete examples of realizing convexity and uniform convexity in the modern U-net architecture. With the tools of convex and uniformly convex neural networks, the iNETT algorithm is developed and a rigorous convergence analysis is provided. Lastly, we show applications of the iNETT algorithm in 2D computerized tomography, where numerical examples illustrate the efficacy of the proposed algorithm.
Keywords: iterated network Tikhonov; uniformly convex neural networks; data-driven regularizer; U-net; regularization of inverse problems.
MSC2020: 47A52; 65F22; 68T07.
1 Introduction
Consider the discretized form of an ill-posed linear problem,
$$F\mathbf{x} = \mathbf{y}, \tag{1.1}$$
where $X$ and $Y$ denote finite dimensional normed spaces, i.e. $X = (\mathbb{R}^N, \|\cdot\|_X)$ and $Y = (\mathbb{R}^M, \|\cdot\|_Y)$, and $F : X \to Y$ is the discretization of an ill-posed linear operator $\mathcal{F}$. The inverse problem aims to recover $\mathbf{x}$ from the observed data $\mathbf{y}^\delta$, contaminated by unknown error with bounded norm,
$$\mathbf{y}^\delta = \mathbf{y} + \boldsymbol{\eta}, \quad \text{where } \|\boldsymbol{\eta}\|_Y \le \delta.$$
As $\mathbf{y}^\delta$ is not necessarily in the range of $F$, i.e. $\mathbf{y}^\delta \notin \operatorname{Rg}(F)$, we consider a variational approach to solve the inverse problem,
$$\mathbf{x}^\delta := \operatorname*{argmin}_{\mathbf{x} \in X} \|F\mathbf{x} - \mathbf{y}^\delta\|_Y^2. \tag{1.2}$$
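To illustrate the instability that motivates regularization, the short NumPy sketch below (our own illustration; the matrix $F$, its singular value decay, and the noise level are arbitrary choices, not taken from the paper) compares the unregularized least-squares solution of (1.2) computed from exact data and from slightly perturbed data.

```python
import numpy as np

rng = np.random.default_rng(0)

# A hypothetical, severely ill-conditioned forward matrix F (fast singular value decay).
N = 50
U, _ = np.linalg.qr(rng.standard_normal((N, N)))
V, _ = np.linalg.qr(rng.standard_normal((N, N)))
s = 10.0 ** (-np.linspace(0, 8, N))             # singular values from 1 down to 1e-8
F = U @ np.diag(s) @ V.T

x_true = np.sin(np.linspace(0, 3, N))           # ground-truth solution
y = F @ x_true                                  # exact data
y_delta = y + 1e-6 * rng.standard_normal(N)     # noisy data with small perturbation

# Unregularized least squares, x = argmin ||F x - y||^2, solved via the pseudo-inverse.
x_from_exact = np.linalg.pinv(F) @ y
x_from_noisy = np.linalg.pinv(F) @ y_delta

print("error with exact data:", np.linalg.norm(x_from_exact - x_true))
print("error with noisy data:", np.linalg.norm(x_from_noisy - x_true))  # noise is amplified
```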
For the ill-posed inverse problem, regularization techniques should be introduced when solving equation (1.2). The regularization aims to provide prior knowledge and improve the stability of the solution. For example, a typical choice of regularization in imaging is the $\ell^p$-norm of $\mathbf{x}$, with $p \ge 1$, which can be weighted by the Laplacian operator with Dirichlet or Neumann boundary conditions [3,23]. Recently, deep learning approaches have been introduced to develop data-driven regularization terms in the solution of inverse problems. In [37], the authors propose a network Tikhonov (NETT) approach, which combines deep neural networks with a Tikhonov regularization strategy. The general form of NETT can be summarized as follows,
$$\mathbf{x}^\delta_\alpha := \operatorname*{argmin}_{\mathbf{x} \in \operatorname{dom}(F)\cap\operatorname{dom}(\Phi_\Theta)} \left\{ \mathcal{A}(F\mathbf{x}, \mathbf{y}^\delta) + \alpha\, \psi\!\left(\Phi_\Theta(\mathbf{x})\right) \right\}, \tag{NETT}$$
where $\mathcal{A}(F\mathbf{x}, \mathbf{y}^\delta) \ge 0$ is the data-fidelity term, which measures the misfit between the approximated and the measured data, $\alpha > 0$ is a regularization parameter, and $\psi(\Phi_\Theta(\mathbf{x}))$ is the regularization (or penalty) term, which includes a neural network architecture $\Phi_\Theta$. In particular, $\psi$ is a nonnegative functional, and the neural network $\Phi_\Theta$ is trained to penalize artifacts in the recovered solution. By training $\Phi_\Theta$ in an appropriate way, the neural-network based regularization term is able to capture the features of solution errors due to data noise and the inexact iterative scheme, so that it can penalize the artifacts of solutions in an adaptive manner. This data-driven regularization strategy shows many advantages in solving inverse problems, and related studies can be found in [4,6,43] and [1] as well.
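As a rough illustration of the shape of a NETT-type objective, the following Python sketch (assuming NumPy and SciPy are available) assembles a data-fidelity term plus $\alpha\,\psi(\Phi_\Theta(\mathbf{x}))$ with a tiny fixed stand-in network and $\psi = \|\cdot\|_2^2$, and hands the result to a generic optimizer. All sizes, weights, and the choice of optimizer are illustrative assumptions; this does not reproduce the construction or training of [37].

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Hypothetical problem data.
N, M, H = 20, 15, 8
F = rng.standard_normal((M, N))
y_delta = rng.standard_normal(M)
alpha = 0.1

# A tiny stand-in "network" Phi_Theta: one hidden ReLU layer with fixed random weights.
W1, b1 = rng.standard_normal((H, N)), rng.standard_normal(H)
W2, b2 = rng.standard_normal((1, H)), rng.standard_normal(1)
phi_theta = lambda x: W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

def nett_objective(x):
    fidelity = np.sum((F @ x - y_delta) ** 2)   # A(Fx, y^delta) = ||Fx - y^delta||_2^2
    penalty = np.sum(phi_theta(x) ** 2)         # psi(z) = ||z||_2^2
    return fidelity + alpha * penalty

# Minimize with an off-the-shelf quasi-Newton method (crude, but enough to show the idea).
result = minimize(nett_objective, x0=np.zeros(N), method="L-BFGS-B")
print(result.fun)
```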
Motivated by NETT, we propose an iterated network Tikhonov (iNETT) method which combines the data-driven regularization strategy with an iterated Tikhonov method. In a Tikhonov-like method such as NETT, the regularization parameter $\alpha$ plays an important role, since it controls the trade-off between the data-fidelity term and the regularization term. The value of $\alpha$ relies on the noise level $\delta$, and it affects the proximity of the recovered solution to the minimizer of the data-fidelity term. Poor choices of $\alpha$ can lead to very poor solutions, and it is well known that an accurate estimate of the optimal $\alpha$ is difficult to achieve and typically relies on heuristic assumptions (e.g., [26,28,46]). As a result, a natural strategy is to consider an iterated Tikhonov method with non-stationary values of $\alpha$ in the iteration. The non-stationary iterated Tikhonov method is able to avoid exhaustive tuning of the regularization parameter, and it achieves better convergence rates in many applications. For example, we refer the reader to [12,14,16,18,20,22,27] for applications of iterated Tikhonov in Hilbert spaces, and to [9,34,35,45] in Banach spaces.
Combining the strategy of a neural-network based regularizer with the non-stationary iterated Tikhonov method, the iNETT method has the following general form,
$$
\begin{cases}
\mathbf{x}^\delta_n := \operatorname*{argmin}\limits_{\mathbf{x}\in X} \left\{ \dfrac{1}{r}\,\|F\mathbf{x} - \mathbf{y}^\delta\|_Y^r + \alpha_n\, B^{\mathcal{R}}_{\xi^\delta_{n-1}}\!\left(\mathbf{x}, \mathbf{x}^\delta_{n-1}\right) \right\},\\[1ex]
\xi^\delta_n := \xi^\delta_{n-1} - \dfrac{1}{\alpha_n}\, F^T J_r\!\left(F\mathbf{x}^\delta_n - \mathbf{y}^\delta\right),\\[1ex]
\mathbf{x}_0 \in X, \qquad \xi_0 \in \partial\mathcal{R}(\mathbf{x}_0),
\end{cases}
\tag{iNETT}
$$
where $\mathcal{R} := \Phi^{uc}_\Theta : X \to (\mathbb{R}, |\cdot|)$ is a uniformly convex neural network, $B^{\mathcal{R}}_{\xi^\delta_{n-1}}(\cdot,\cdot)$ is the Bregman distance induced by $\mathcal{R}$ in the direction $\xi^\delta_{n-1} \in \partial\mathcal{R}(\mathbf{x}^\delta_{n-1})$, $J_r$ denotes the duality map for $r \in (1,\infty)$, and $\{\alpha_n\}_n$ is a sequence of positive real numbers. The value of $\alpha_n$ controls the amount of regularization, and it plays the role of the regularization parameter. By taking a decreasing sequence $\{\alpha_n\}_n$ and considering the standard discrepancy principle as the stopping rule, the iNETT algorithm can automatically determine the amount of regularization. We will provide the details of iNETT in Section 5, including a rigorous convergence analysis and implementation details.
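To make the structure of the iteration concrete, the following NumPy sketch instantiates iNETT in the simplest Hilbert-space setting: $r = 2$ and the quadratic regularizer $\mathcal{R}(\mathbf{x}) = \tfrac{1}{2}\|\mathbf{x}\|_2^2$, for which $J_2$ is the identity, the Bregman distance reduces to $\tfrac{1}{2}\|\mathbf{x} - \mathbf{x}^\delta_{n-1}\|_2^2$, and each step is a linear solve. The geometric decay of $\alpha_n$ and the discrepancy constant $\tau$ are illustrative choices; the uniformly convex network regularizer of the actual method is constructed in Sections 3 and 4 and employed in Section 5.

```python
import numpy as np

def inett_quadratic(F, y_delta, delta, x0, alpha0=1.0, q=0.5, tau=1.5, max_iter=50):
    """Non-stationary iterated Tikhonov with R(x) = 0.5*||x||_2^2 (so J_2 = Id and the
    Bregman distance is 0.5*||x - x_{n-1}||^2). Stops by the discrepancy principle."""
    x = x0.copy()
    xi = x0.copy()                      # xi_0, a subgradient of R at x_0 (equal to x_0 here)
    alpha = alpha0
    for _ in range(max_iter):
        if np.linalg.norm(F @ x - y_delta) <= tau * delta:
            break                       # discrepancy principle: ||F x_n - y_delta|| <= tau*delta
        # x_n = argmin 0.5*||F x - y_delta||^2 + alpha_n * 0.5*||x - xi_{n-1}||^2,
        # whose normal equations read (F^T F + alpha_n I) x_n = F^T y_delta + alpha_n xi_{n-1}.
        A = F.T @ F + alpha * np.eye(F.shape[1])
        x = np.linalg.solve(A, F.T @ y_delta + alpha * xi)
        # xi_n = xi_{n-1} - (1/alpha_n) F^T J_2(F x_n - y_delta); here this equals x_n itself.
        xi = xi - (F.T @ (F @ x - y_delta)) / alpha
        alpha *= q                      # non-stationary, geometrically decreasing alpha_n
    return x
```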
In the formula of iNETT, the neural network $\Phi^{uc}_\Theta$ is employed to build the regularization term, and it is required to be uniformly convex. The property of uniform convexity is demanded by the convergence analysis of the iterated Tikhonov method [35]. As a result, another important aspect of the paper is the modeling of convex and uniformly convex neural networks. In Section 2, we provide an exact mathematical modeling of the general architecture of neural networks. Our modeling can express modern convolutional neural networks, including operations like skip connections and concatenations. In Section 3, we propose rigorous theories for convex and uniformly convex neural networks. Given a general neural network $\Phi_\Theta : X \to Z$, we prescribe sufficient conditions to obtain a related neural network which is component-wise convex or uniformly convex. The main idea comes from recent works on convex neural networks, e.g. [2,42,51,53], but we largely extend them to build modern architectures which can embrace state-of-the-art neural networks. In Section 4, we provide particular examples of convex and uniformly convex U-net architectures. The U-net is a convolutional neural network widely used in image processing and related imaging science [48]. We give rigorous formulas for the U-net architecture, and explain how to obtain convex and uniformly convex U-net architectures according to the general theories proposed in Section 3. In Section 5 and Section 6, we provide implementation details as we employ the convex U-net to build a uniformly convex regularizer for the iNETT algorithm. The proposed method is successfully applied to computerized tomography in Section 6. The tool of convex and uniformly convex neural networks is actually a by-product of designing the iNETT algorithm, but it seems even more interesting than the algorithm itself, and we expect convex neural networks to find many interesting applications in future studies.
2 Notation and setting
We collect here most of the notations and definitions that we will use throughout this work. As main references, the reader can look at [47,50,56]. First of all, let us fix $X := (\mathbb{R}^N, \|\cdot\|_X)$ and $Y := (\mathbb{R}^M, \|\cdot\|_Y)$, where $N, M \in \mathbb{N}$, and $\|\cdot\|_X$ and $\|\cdot\|_Y$ are some norms on $\mathbb{R}^N$ and $\mathbb{R}^M$, respectively. In the case of the standard $\ell^p$ spaces, with $p \ge 1$, we will indicate the corresponding norm with the usual notation $\|\cdot\|_p$.

We will indicate in bold any finite dimensional (column) vector, e.g. $\mathbf{x} := (x_1, \dots, x_N)^T \in \mathbb{R}^N$, where $T$ denotes the transpose operation, and we will use the notation $\mathbf{x} \le \hat{\mathbf{x}}$ to mean that $x_i \le \hat{x}_i$ for every $i = 1, \dots, N$. With abuse of language, given a real-valued function $\sigma : \mathbb{R}^N \to \mathbb{R}$, we will say that $\sigma$ is monotone nondecreasing if $\sigma(\mathbf{x}) \le \sigma(\hat{\mathbf{x}})$ for every $\mathbf{x} \le \hat{\mathbf{x}}$. In the case of a function with multivariate output, $\sigma : \mathbb{R}^N \to \mathbb{R}^D$, we will indicate with $\sigma_d : \mathbb{R}^N \to \mathbb{R}$ its components, for $d = 1, \dots, D$.
For a fixed $\mathbf{z} \in Z := (\mathbb{R}^D, \|\cdot\|_Z)$, we indicate with $C(\mathbf{z}, \cdot) : X \to Z \times X$ the "concatenation" operator, that is,
$$C(\mathbf{z}, \mathbf{x}) := (z_1, \dots, z_D, x_1, \dots, x_N)^T. \tag{2.1}$$
Fix now a matrix $F : X \to Y$, which is the discretization of an ill-posed linear operator between Banach spaces. We will assume that, given the unperturbed and observed data $\mathbf{y} \in Y$ and $\mathbf{y}^\delta \in Y$, respectively, then
$$\mathbf{y} \in \operatorname{Rg}(F), \quad \text{that is, } F\mathbf{x} = \mathbf{y} \text{ is solvable}, \tag{H0}$$
and
$$\mathbf{y}^\delta = \mathbf{y} + \boldsymbol{\eta}, \quad \text{where } \|\boldsymbol{\eta}\|_Y \le \delta.$$
We recall that a Banach space $Y$ is uniformly smooth if its modulus of smoothness
$$\rho(\tau) := \sup\left\{ \frac{\|\mathbf{y} + \tau\hat{\mathbf{y}}\| + \|\mathbf{y} - \tau\hat{\mathbf{y}}\|}{2} - 1 \;\Big|\; \|\mathbf{y}\| = \|\hat{\mathbf{y}}\| = 1 \right\}, \qquad \tau > 0,$$
satisfies $\lim_{\tau \to 0^+} \rho(\tau)/\tau = 0$. Examples of uniformly smooth spaces are all the $\ell^p$-spaces for $p \in (1, \infty)$.
Given an extended real-valued function $\mathcal{R} : \operatorname{dom}(\mathcal{R}) \subseteq X \to (-\infty, +\infty]$, $\mathcal{R}$ is uniformly convex if there exists a nonnegative map $h : [0, +\infty) \to [0, +\infty]$ such that $h(s) = 0$ if and only if $s = 0$, and
$$\mathcal{R}\bigl(t\mathbf{x} + (1-t)\hat{\mathbf{x}}\bigr) + t(1-t)\, h\bigl(\|\mathbf{x} - \hat{\mathbf{x}}\|_X\bigr) \le t\,\mathcal{R}(\mathbf{x}) + (1-t)\,\mathcal{R}(\hat{\mathbf{x}}), \quad \forall\, t \in [0,1] \text{ and } \mathbf{x}, \hat{\mathbf{x}} \in \operatorname{dom}(\mathcal{R}).$$
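For instance (a standard example, not taken from this paper), $\mathcal{R}(\mathbf{x}) = \tfrac{1}{2}\|\mathbf{x}\|_2^2$ is uniformly convex: expanding the squares gives, for all $t \in [0,1]$ and $\mathbf{x}, \hat{\mathbf{x}} \in \mathbb{R}^N$,
$$t\,\mathcal{R}(\mathbf{x}) + (1-t)\,\mathcal{R}(\hat{\mathbf{x}}) - \mathcal{R}\bigl(t\mathbf{x} + (1-t)\hat{\mathbf{x}}\bigr) = \tfrac{1}{2}\,t(1-t)\,\|\mathbf{x} - \hat{\mathbf{x}}\|_2^2,$$
so the inequality above holds (with equality) for $h(s) = s^2/2$.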
Finally, we recall that $\mathcal{R}$ is coercive if it is bounded below on bounded sets and
$$\liminf_{\|\mathbf{x}\|_X \to \infty} \frac{\mathcal{R}(\mathbf{x})}{\|\mathbf{x}\|_X} = \infty.$$
2.1 Bregman distance
Given a convex function
$$\mathcal{R} : \operatorname{dom}(\mathcal{R}) \subseteq X \to (-\infty, +\infty],$$
$\mathcal{R}$ is called proper if $\operatorname{dom}(\mathcal{R}) := \{\mathbf{x} \in X : \mathcal{R}(\mathbf{x}) < +\infty\} \neq \emptyset$. For every $\hat{\mathbf{x}} \in \operatorname{dom}(\mathcal{R})$, a subgradient of $\mathcal{R}$ at $\hat{\mathbf{x}}$ is an element $\xi$ of the dual space $X^*$ such that
$$\mathcal{R}(\mathbf{x}) - \mathcal{R}(\hat{\mathbf{x}}) - \langle \xi, \mathbf{x} - \hat{\mathbf{x}} \rangle \ge 0 \quad \forall\, \mathbf{x} \in X,$$
where the bracket is the evaluation of $\xi$ at $\mathbf{x} - \hat{\mathbf{x}}$. Clearly, since $X$ is finite dimensional, then $X$ is reflexive and $\langle \cdot, \cdot \rangle$ is the standard inner product, that is,
$$\langle \xi, \mathbf{x} - \hat{\mathbf{x}} \rangle = \sum_{i=1}^{N} \xi_i (x_i - \hat{x}_i).$$
The collection of all subgradients of $\mathcal{R}$ at $\hat{\mathbf{x}}$ is denoted by $\partial\mathcal{R}(\hat{\mathbf{x}})$. The subdifferential of $\mathcal{R}$ is the multi-valued map $\partial\mathcal{R} : \operatorname{dom}(\partial\mathcal{R}) \subseteq X \to 2^{X^*}$ such that
$$\operatorname{dom}(\partial\mathcal{R}) := \{\hat{\mathbf{x}} \in \operatorname{dom}(\mathcal{R}) : \partial\mathcal{R}(\hat{\mathbf{x}}) \neq \emptyset\}, \qquad \hat{\mathbf{x}} \mapsto \partial\mathcal{R}(\hat{\mathbf{x}}).$$
Let us recall that if $\operatorname{dom}(\mathcal{R}) = X$, then $\operatorname{dom}(\partial\mathcal{R}) = X$ (e.g. [50, Lemma 3.16]).
Finally, for every $\hat{\mathbf{x}} \in \operatorname{dom}(\partial\mathcal{R})$ and $\xi \in \partial\mathcal{R}(\hat{\mathbf{x}})$, the Bregman distance $B^{\mathcal{R}}_{\xi}(\cdot, \hat{\mathbf{x}}) : X \to [0, +\infty)$ induced by $\mathcal{R}$ at $\hat{\mathbf{x}}$ in the direction $\xi$ is defined by
$$B^{\mathcal{R}}_{\xi}(\mathbf{x}, \hat{\mathbf{x}}) := \mathcal{R}(\mathbf{x}) - \mathcal{R}(\hat{\mathbf{x}}) - \langle \xi, \mathbf{x} - \hat{\mathbf{x}} \rangle.$$
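As a concrete illustration (our own, with hypothetical helper names), for $\mathcal{R}(\mathbf{x}) = \tfrac{1}{2}\|\mathbf{x}\|_2^2$ the unique subgradient at $\hat{\mathbf{x}}$ is $\xi = \hat{\mathbf{x}}$ and the Bregman distance reduces to $\tfrac{1}{2}\|\mathbf{x} - \hat{\mathbf{x}}\|_2^2$; the short sketch below checks this numerically.

```python
import numpy as np

def bregman_distance(R, xi, x, x_hat):
    """B^R_xi(x, x_hat) = R(x) - R(x_hat) - <xi, x - x_hat>."""
    return R(x) - R(x_hat) - xi @ (x - x_hat)

R = lambda x: 0.5 * np.sum(x ** 2)      # R(x) = 0.5*||x||_2^2, with gradient x

rng = np.random.default_rng(0)
x, x_hat = rng.standard_normal(5), rng.standard_normal(5)
xi = x_hat                              # the (unique) subgradient of R at x_hat

lhs = bregman_distance(R, xi, x, x_hat)
rhs = 0.5 * np.sum((x - x_hat) ** 2)
print(abs(lhs - rhs) < 1e-12)           # True: B reduces to half the squared Euclidean distance
```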
Remark 2.1. It is straightforward to check that if $\mathcal{R}$ is uniformly convex, then $B^{\mathcal{R}}_{\xi}(\cdot, \hat{\mathbf{x}})$ is uniformly convex too, for any fixed $\xi$ and $\hat{\mathbf{x}}$. Moreover, since $X$ is reflexive, then $B^{\mathcal{R}}_{\xi}(\cdot, \hat{\mathbf{x}})$ is coercive. See for example [57, Corollary 2.4].
We can now introduce the definition of a solution of the model problem (1.1) with respect to the Bregman distance from a reference initial guess.

Definition 2.1. Fix $\mathbf{x}_0 \in \operatorname{dom}(\partial\mathcal{R})$, $\xi_0 \in \partial\mathcal{R}(\mathbf{x}_0)$. An element $\mathbf{x}^\dagger \in \operatorname{dom}(\mathcal{R})$ is called a $B^{\mathcal{R}}_{\xi_0}$-minimizing solution of (1.1) if $F\mathbf{x}^\dagger = \mathbf{y}$ and
$$B^{\mathcal{R}}_{\xi_0}(\mathbf{x}^\dagger, \mathbf{x}_0) = \min\left\{ B^{\mathcal{R}}_{\xi_0}(\mathbf{x}, \mathbf{x}_0) : \mathbf{x} \in \operatorname{dom}(\mathcal{R}),\ F\mathbf{x} = \mathbf{y} \right\}.$$
As a last piece of notation, we introduce the duality map. For every fixed $r \in (1, \infty)$, the duality map $J_r : X \to 2^{X^*}$ is given by
$$J_r(\mathbf{x}) := \left\{ \xi \in X^* \;:\; \|\xi\|_{X^*} = \|\mathbf{x}\|_X^{r-1} \text{ and } \langle \xi, \mathbf{x} \rangle = \|\mathbf{x}\|_X^r \right\}.$$
In particular, $J_r$ is the subdifferential of the map $\mathbf{x} \mapsto \frac{\|\mathbf{x}\|_X^r}{r}$.
Remark 2.2. If $X$ is an $\ell^p$ space, then $J_r$ is single-valued, and for $r = 2$ it holds
$$J_2(\mathbf{x}) = \frac{\operatorname{sgn}(\mathbf{x})\,|\mathbf{x}|^{p-1}}{\|\mathbf{x}\|_p^{p-2}},$$
where $\operatorname{sgn}(\cdot)$ and $|\cdot|^{p-1}$ act component-wise. More generally, if a Banach space is uniformly smooth, then $J_r$ is single-valued.
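The following sketch (an illustration under the assumption $X = \ell^p(\mathbb{R}^N)$ with $p \in (1,\infty)$; the vector and exponent are arbitrary) evaluates the formula of Remark 2.2 and checks the two defining properties of $J_2$, where the dual norm is the $\ell^q$-norm with $1/p + 1/q = 1$.

```python
import numpy as np

def J2(x, p):
    """Duality map J_2 on l^p: component-wise sgn(x)*|x|^(p-1), divided by ||x||_p^(p-2)."""
    norm_p = np.sum(np.abs(x) ** p) ** (1.0 / p)
    return np.sign(x) * np.abs(x) ** (p - 1) / norm_p ** (p - 2)

p = 1.5
q = p / (p - 1)                                   # dual exponent, 1/p + 1/q = 1
x = np.array([1.0, -2.0, 0.5, 3.0])
xi = J2(x, p)

norm_p = np.sum(np.abs(x) ** p) ** (1.0 / p)
norm_q = np.sum(np.abs(xi) ** q) ** (1.0 / q)

print(np.isclose(xi @ x, norm_p ** 2))            # <xi, x> = ||x||_p^r with r = 2
print(np.isclose(norm_q, norm_p))                 # ||xi||_q = ||x||_p^(r-1) with r = 2
```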
In view of Remark 2.2 and for the well-posedness of the iNETT method (see Section 5), we will assume that
$$Y \text{ is uniformly smooth.} \tag{H1}$$
2.2 Neural networks
A neural network is a chain of compositions of affine operators and nonlinear operators. For an introduction to neural networks from an applied mathematical point of view, we refer to [31], whereas we refer to [8] for a focus on deep learning techniques for inverse problems. We present here the basic architecture upon which we will devise the neural networks to be implemented in iNETT. There are many possible choices for the linear and the nonlinear operators, and each of them generates a different neural network. We do not focus now on those choices, which will be made only later (see Sections 4, 5.3 and 6), in order to keep a more general setting here.
First, fix a set of parameters $\Theta := \{\mathbf{b}_k; A_{k,j_k}; W_k\}_{k=1}^L$, where $\mathbf{b}_k$ are vectors commonly called bias terms, and $A_{k,j_k}$ and $W_k$ are matrices. Second, fix a collection $\{\sigma_k\}_{k=1}^L$ of possibly nonlinear operators. Finally, define $\Phi_\Theta : X \to Z$ such that
$$
\begin{cases}
\Phi_\Theta(\mathbf{x}) := \mathbf{z} \in Z, \quad \text{where } \mathbf{z} = \mathbf{z}_{L+1},\\
\mathbf{z}_{k+1} = \sigma_k\!\left(\hat{\mathbf{b}}_k + W_k\,\hat{\mathbf{z}}_k\right) \in Z_{k+1} \quad \text{for } k = 1, \dots, L,\\
\hat{\mathbf{z}}_k = C_k(\mathbf{z}_k) := \mathbf{z}_k \ \text{ or } \ C(\mathbf{z}_{i_k}, \mathbf{z}_k) \ \text{ for } i_k \in \{1, \dots, k\},\\
\hat{\mathbf{b}}_k = \mathbf{b}_k + A_{k,j_k}\,\mathbf{z}_{j_k} \quad \text{for } j_k \in \{1, \dots, k\},\\
\mathbf{z}_1 := \mathbf{x} \in X,
\end{cases}
\tag{NN}
$$
where $Z_{L+1} = Z := (\mathbb{R}^D, \|\cdot\|_Z)$ and $Z_k := (\mathbb{R}^{D_k}, \|\cdot\|_{Z_k})$ are finite dimensional normed vector spaces, and $C(\cdot,\cdot)$ is the concatenation operator (2.1). The operators $C(\mathbf{z}_{i_k}, \cdot)$ and $A_{k,j_k}$ represent the skip connections, that is, some of the data from previous iterations are used in future iterations, skipping intermediate steps. See Figure 1 for a visual representation. The integer $L$ is referred to as the depth of the neural network, and $\sigma_k(\hat{\mathbf{b}}_k + W_k\hat{\mathbf{z}}_k)$ as the $k$-th layer. When $L > 2$, the neural network is commonly called a deep neural network.
The set $\Theta$ is the disjoint union of two subsets, the set of free parameters $\Theta_{\mathrm{free}}$ and the set of frozen parameters $\Theta_{\mathrm{frozen}}$, that is,
$$\Theta = \Theta_{\mathrm{free}} \sqcup \Theta_{\mathrm{frozen}}.$$
The set of free parameters $\Theta_{\mathrm{free}}$ is typically initialized to a starting set of values and then trained by minimizing a loss function over a training sample. Vice versa, the set of frozen parameters $\Theta_{\mathrm{frozen}}$ is fixed and unaffected by the training process. For example, some of the matrices $W_k$ can be fixed to be the identity matrix $I$, or the bias terms $\mathbf{b}_k$ to be the zero vector $\mathbf{0}$. $\Theta_{\mathrm{frozen}}$ can be empty, that is, all the parameters are trainable. About the specific training strategy we will employ, see Subsection 5.3.
In the case that $A_{k,j_k}$ is fixed to be the zero matrix and $\hat{\mathbf{z}}_k = \mathbf{z}_k$ for every $k$, we have a feedforward neural network. For examples of simple architectures of feedforward neural networks of convolutional type, see [19,41]. For examples of more involved neural networks described by (NN), see ResNet [29], DenseNet [32] and U-Net [48].
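To illustrate the recursion (NN), a minimal NumPy sketch follows: a three-layer toy network in which layer 2 concatenates $\mathbf{z}_1$ to its input (the operator $C$) and layer 3 adds a skip term $A_{3,1}\mathbf{z}_1$ to its bias. All dimensions, the random weights, and the choice of ReLU for $\sigma_k$ are arbitrary illustrative assumptions, not a network used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda v: np.maximum(v, 0.0)     # sigma_k: a monotone nondecreasing choice

N, D2, D3, D = 4, 6, 5, 3               # dimensions of X = Z_1, Z_2, Z_3 and Z = Z_4 (L = 3)

# Layer parameters Theta = {b_k; A_{k,j_k}; W_k}.
W1, b1 = rng.standard_normal((D2, N)), rng.standard_normal(D2)
W2, b2 = rng.standard_normal((D3, D2 + N)), rng.standard_normal(D3)   # acts on C(z_1, z_2)
W3, b3 = rng.standard_normal((D, D3)), rng.standard_normal(D)
A31 = rng.standard_normal((D, N))       # skip connection feeding z_1 into the bias of layer 3

def phi(x):
    z1 = x                                       # z_1 := x
    z2 = relu(b1 + W1 @ z1)                      # layer 1: no skip, hat{z}_1 = z_1
    z2_hat = np.concatenate([z1, z2])            # hat{z}_2 = C(z_1, z_2), concatenation skip
    z3 = relu(b2 + W2 @ z2_hat)                  # layer 2
    b3_hat = b3 + A31 @ z1                       # hat{b}_3 = b_3 + A_{3,1} z_1, additive skip
    z4 = relu(b3_hat + W3 @ z3)                  # layer 3; output z = z_{L+1}
    return z4

print(phi(rng.standard_normal(N)).shape)         # (3,)
```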