Uniformly convex neural networks and non-stationary iterated
network Tikhonov (iNETT) method
Davide Bianchi, Guanghao Lai, and Wenbin Li*
School of Science, Harbin Institute of Technology, Shenzhen, Shenzhen 518055, China.
Email: bianchi@hit.edu.cn, 21s058002@stu.hit.edu.cn, liwenbin@hit.edu.cn
*Corresponding author: Wenbin Li
Abstract
We propose a non-stationary iterated network Tikhonov (iNETT) method for the solution of ill-posed inverse problems. The iNETT employs deep neural networks to build a data-driven regularizer, and it avoids the difficult task of estimating the optimal regularization parameter. To achieve the theoretical convergence of iNETT, we introduce uniformly convex neural networks to build the data-driven regularizer. Rigorous theories and detailed algorithms are proposed for the construction of convex and uniformly convex neural networks. In particular, given a general neural network architecture, we prescribe sufficient conditions to achieve a trained neural network which is component-wise convex or uniformly convex; moreover, we provide concrete examples of realizing convexity and uniform convexity in the modern U-net architecture. With the tools of convex and uniformly convex neural networks, the iNETT algorithm is developed and a rigorous convergence analysis is provided. Lastly, we show applications of the iNETT algorithm in 2D computerized tomography, where numerical examples illustrate the efficacy of the proposed algorithm.
Keywords: iterated network Tikhonov; uniformly convex neural networks; data-driven regularizer; U-net; regularization of inverse problems.
MSC2020: 47A52; 65F22; 68T07.
1 Introduction
Consider the discretized form of an ill-posed linear problem,
$$F\mathbf{x} = \mathbf{y}, \tag{1.1}$$
where $X$ and $Y$ denote finite dimensional normed spaces, i.e. $X = (\mathbb{R}^N, \|\cdot\|_X)$ and $Y = (\mathbb{R}^M, \|\cdot\|_Y)$, and $F : X \to Y$ is the discretization of an ill-posed linear operator $\mathcal{F}$. The inverse problem aims to recover $\mathbf{x}$ from the observed data $\mathbf{y}^\delta$, contaminated by unknown error with bounded norm,
$$\mathbf{y}^\delta = \mathbf{y} + \boldsymbol{\eta}, \quad \text{where } \|\boldsymbol{\eta}\|_Y \le \delta.$$
As $\mathbf{y}^\delta$ is not necessarily in the range of $F$, i.e. $\mathbf{y}^\delta \notin \operatorname{Rg}(F)$, we consider a variational approach to solve the inverse problem,
$$\mathbf{x}^\delta := \operatorname*{argmin}_{\mathbf{x} \in X} \|F\mathbf{x} - \mathbf{y}^\delta\|_Y^2. \tag{1.2}$$
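To illustrate the instability that motivates regularization, the short NumPy sketch below (our own illustration; the matrix $F$, its singular value decay, and the noise level are arbitrary choices, not taken from the paper) compares the unregularized least-squares solution of (1.2) computed from exact data and from slightly perturbed data.

```python
import numpy as np

rng = np.random.default_rng(0)

# A hypothetical, severely ill-conditioned forward matrix F (fast singular value decay).
N = 50
U, _ = np.linalg.qr(rng.standard_normal((N, N)))
V, _ = np.linalg.qr(rng.standard_normal((N, N)))
s = 10.0 ** (-np.linspace(0, 8, N))             # singular values from 1 down to 1e-8
F = U @ np.diag(s) @ V.T

x_true = np.sin(np.linspace(0, 3, N))           # ground-truth solution
y = F @ x_true                                  # exact data
y_delta = y + 1e-6 * rng.standard_normal(N)     # noisy data with small perturbation

# Unregularized least squares, x = argmin ||F x - y||^2, solved via the pseudo-inverse.
x_from_exact = np.linalg.pinv(F) @ y
x_from_noisy = np.linalg.pinv(F) @ y_delta

print("error with exact data:", np.linalg.norm(x_from_exact - x_true))
print("error with noisy data:", np.linalg.norm(x_from_noisy - x_true))  # noise is amplified
```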
For the ill-posed inverse problem, regularization techniques should be introduced when solving equation (1.2). The regularization aims to provide prior knowledge and improve the stability of the solution. For example, a typical choice of regularization in imaging is the $\ell^p$-norm of $\mathbf{x}$, with $p \ge 1$, which can be weighted by the Laplacian operator with Dirichlet or Neumann boundary conditions [3,23]. Recently, deep learning approaches have been introduced to develop data-driven regularization terms in the solution of inverse problems. In [37], the authors propose a network Tikhonov (NETT) approach, which combines deep neural networks with a Tikhonov regularization strategy. The general form of NETT can be summarized as follows,
$$\mathbf{x}^\delta_\alpha := \operatorname*{argmin}_{\mathbf{x} \in \operatorname{dom}(F)\cap\operatorname{dom}(\Phi_\Theta)} \left\{ \mathcal{A}(F\mathbf{x}, \mathbf{y}^\delta) + \alpha\, \psi\!\left(\Phi_\Theta(\mathbf{x})\right) \right\}, \tag{NETT}$$
where $\mathcal{A}(F\mathbf{x}, \mathbf{y}^\delta) \ge 0$ is the data-fidelity term, which measures the misfit between the approximated and the measured data, $\alpha > 0$ is a regularization parameter, and $\psi(\Phi_\Theta(\mathbf{x}))$ is the regularization (or penalty) term, which includes a neural network architecture $\Phi_\Theta$. In particular, $\psi$ is a nonnegative functional, and the neural network $\Phi_\Theta$ is trained to penalize artifacts in the recovered solution. By training $\Phi_\Theta$ in an appropriate way, the neural-network based regularization term is able to capture the features of solution errors due to data noise and the inexact iterative scheme, so that it can penalize the artifacts of solutions in an adaptive manner. This data-driven regularization strategy shows many advantages in solving inverse problems, and related studies can be found in [4,6,43] and [1] as well.
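As a rough illustration of the shape of a NETT-type objective, the following Python sketch (assuming NumPy and SciPy are available) assembles a data-fidelity term plus $\alpha\,\psi(\Phi_\Theta(\mathbf{x}))$ with a tiny fixed stand-in network and $\psi = \|\cdot\|_2^2$, and hands the result to a generic optimizer. All sizes, weights, and the choice of optimizer are illustrative assumptions; this does not reproduce the construction or training of [37].

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Hypothetical problem data.
N, M, H = 20, 15, 8
F = rng.standard_normal((M, N))
y_delta = rng.standard_normal(M)
alpha = 0.1

# A tiny stand-in "network" Phi_Theta: one hidden ReLU layer with fixed random weights.
W1, b1 = rng.standard_normal((H, N)), rng.standard_normal(H)
W2, b2 = rng.standard_normal((1, H)), rng.standard_normal(1)
phi_theta = lambda x: W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

def nett_objective(x):
    fidelity = np.sum((F @ x - y_delta) ** 2)   # A(Fx, y^delta) = ||Fx - y^delta||_2^2
    penalty = np.sum(phi_theta(x) ** 2)         # psi(z) = ||z||_2^2
    return fidelity + alpha * penalty

# Minimize with an off-the-shelf quasi-Newton method (crude, but enough to show the idea).
result = minimize(nett_objective, x0=np.zeros(N), method="L-BFGS-B")
print(result.fun)
```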
Motivated by NETT, we propose an iterated network Tikhonov (iNETT) method which combines the data-driven regularization strategy with an iterated Tikhonov method. In a Tikhonov-like method such as NETT, the regularization parameter $\alpha$ plays an important role, since it controls the trade-off between the data-fidelity term and the regularization term. The value of $\alpha$ relies on the noise level $\delta$, and it affects the proximity of the recovered solution to the minimizer of the data-fidelity term. Poor choices of $\alpha$ can lead to very poor solutions, and it is well known that an accurate estimate of the optimal $\alpha$ is difficult to achieve and typically relies on heuristic assumptions (e.g., [26,28,46]). As a result, a natural strategy is to consider an iterated Tikhonov method with non-stationary values of $\alpha$ in the iteration. The non-stationary iterated Tikhonov method is able to avoid exhaustive tuning of the regularization parameter, and it achieves better convergence rates in many applications. For example, we refer the reader to [12,14,16,18,20,22,27] for applications of iterated Tikhonov in Hilbert spaces, and to [9,34,35,45] in Banach spaces.
Combining the strategy of a neural-network based regularizer with the non-stationary iterated Tikhonov method, the iNETT method has the following general form,
$$
\begin{cases}
\mathbf{x}^\delta_n := \operatorname*{argmin}\limits_{\mathbf{x}\in X} \left\{ \dfrac{1}{r}\,\|F\mathbf{x} - \mathbf{y}^\delta\|_Y^r + \alpha_n\, B^{\mathcal{R}}_{\xi^\delta_{n-1}}\!\left(\mathbf{x}, \mathbf{x}^\delta_{n-1}\right) \right\},\\[1ex]
\xi^\delta_n := \xi^\delta_{n-1} - \dfrac{1}{\alpha_n}\, F^T J_r\!\left(F\mathbf{x}^\delta_n - \mathbf{y}^\delta\right),\\[1ex]
\mathbf{x}_0 \in X, \qquad \xi_0 \in \partial\mathcal{R}(\mathbf{x}_0),
\end{cases}
\tag{iNETT}
$$
where $\mathcal{R} := \Phi^{uc}_\Theta : X \to (\mathbb{R}, |\cdot|)$ is a uniformly convex neural network, $B^{\mathcal{R}}_{\xi^\delta_{n-1}}(\cdot,\cdot)$ is the Bregman distance induced by $\mathcal{R}$ in the direction $\xi^\delta_{n-1} \in \partial\mathcal{R}(\mathbf{x}^\delta_{n-1})$, $J_r$ denotes the duality map for $r \in (1,\infty)$, and $\{\alpha_n\}_n$ is a sequence of positive real numbers. The value of $\alpha_n$ controls the amount of regularization, and it plays the role of the regularization parameter. By taking a decreasing sequence $\{\alpha_n\}_n$ and considering the standard discrepancy principle as the stopping rule, the iNETT algorithm can automatically determine the amount of regularization. We will provide the details of iNETT in Section 5, including a rigorous convergence analysis and implementation details.
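To make the structure of the iteration concrete, the following NumPy sketch instantiates iNETT in the simplest Hilbert-space setting: $r = 2$ and the quadratic regularizer $\mathcal{R}(\mathbf{x}) = \tfrac{1}{2}\|\mathbf{x}\|_2^2$, for which $J_2$ is the identity, the Bregman distance reduces to $\tfrac{1}{2}\|\mathbf{x} - \mathbf{x}^\delta_{n-1}\|_2^2$, and each step is a linear solve. The geometric decay of $\alpha_n$ and the discrepancy constant $\tau$ are illustrative choices; the uniformly convex network regularizer of the actual method is constructed in Sections 3 and 4 and employed in Section 5.

```python
import numpy as np

def inett_quadratic(F, y_delta, delta, x0, alpha0=1.0, q=0.5, tau=1.5, max_iter=50):
    """Non-stationary iterated Tikhonov with R(x) = 0.5*||x||_2^2 (so J_2 = Id and the
    Bregman distance is 0.5*||x - x_{n-1}||^2). Stops by the discrepancy principle."""
    x = x0.copy()
    xi = x0.copy()                      # xi_0, a subgradient of R at x_0 (equal to x_0 here)
    alpha = alpha0
    for _ in range(max_iter):
        if np.linalg.norm(F @ x - y_delta) <= tau * delta:
            break                       # discrepancy principle: ||F x_n - y_delta|| <= tau*delta
        # x_n = argmin 0.5*||F x - y_delta||^2 + alpha_n * 0.5*||x - xi_{n-1}||^2,
        # whose normal equations read (F^T F + alpha_n I) x_n = F^T y_delta + alpha_n xi_{n-1}.
        A = F.T @ F + alpha * np.eye(F.shape[1])
        x = np.linalg.solve(A, F.T @ y_delta + alpha * xi)
        # xi_n = xi_{n-1} - (1/alpha_n) F^T J_2(F x_n - y_delta); here this equals x_n itself.
        xi = xi - (F.T @ (F @ x - y_delta)) / alpha
        alpha *= q                      # non-stationary, geometrically decreasing alpha_n
    return x
```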
In the formula of iNETT, the neural network $\Phi^{uc}_\Theta$ is employed to build the regularization term, and it is required to be uniformly convex. The property of uniform convexity is demanded by the convergence analysis of the iterated Tikhonov method [35]. As a result, another important aspect of the paper is the modeling of convex and uniformly convex neural networks. In Section 2, we provide an exact mathematical modeling of the general architecture of neural networks. Our modeling can express modern convolutional neural networks, including operations like skip connections and concatenations. In Section 3, we propose rigorous theories for convex and uniformly convex neural networks. Given a general neural network $\Phi_\Theta : X \to Z$, we prescribe sufficient conditions to obtain a related neural network which is component-wise convex or uniformly convex. The main idea comes from recent works on convex neural networks, e.g. [2,42,51,53], but we largely extend them to build modern architectures which can embrace state-of-the-art neural networks. In Section 4, we provide particular examples of convex and uniformly convex U-net architectures. The U-net is a convolutional neural network widely used in image processing and related imaging science [48]. We give rigorous formulas for the U-net architecture, and explain how to obtain convex and uniformly convex U-net architectures according to the general theories proposed in Section 3. In Section 5 and Section 6, we provide implementation details as we employ the convex U-net to build a uniformly convex regularizer for the iNETT algorithm. The proposed method is successfully applied to computerized tomography in Section 6. The tool of convex and uniformly convex neural networks is actually a by-product of designing the iNETT algorithm, but it seems even more interesting than the algorithm itself, and we expect convex neural networks to find many interesting applications in future studies.
2 Notation and setting
We collect here most of the notations and definitions that we will use throughout this work. As main references, the reader can look at [47,50,56]. First of all, let us fix $X := (\mathbb{R}^N, \|\cdot\|_X)$ and $Y := (\mathbb{R}^M, \|\cdot\|_Y)$, where $N, M \in \mathbb{N}$, and $\|\cdot\|_X$ and $\|\cdot\|_Y$ are some norms on $\mathbb{R}^N$ and $\mathbb{R}^M$, respectively. In the case of the standard $\ell^p$ spaces, with $p \ge 1$, we will indicate the corresponding norm with the usual notation $\|\cdot\|_p$.

We will indicate in bold any finite dimensional (column) vector, e.g. $\mathbf{x} := (x_1, \dots, x_N)^T \in \mathbb{R}^N$, where $T$ denotes the transpose operation, and we will use the notation $\mathbf{x} \le \hat{\mathbf{x}}$ to mean that $x_i \le \hat{x}_i$ for every $i = 1, \dots, N$. With abuse of language, given a real-valued function $\sigma : \mathbb{R}^N \to \mathbb{R}$, we will say that $\sigma$ is monotone nondecreasing if $\sigma(\mathbf{x}) \le \sigma(\hat{\mathbf{x}})$ for every $\mathbf{x} \le \hat{\mathbf{x}}$. In the case of a function with multivariate output, $\sigma : \mathbb{R}^N \to \mathbb{R}^D$, we will indicate with $\sigma_d : \mathbb{R}^N \to \mathbb{R}$ its components, for $d = 1, \dots, D$.
For a fixed $\mathbf{z} \in Z := (\mathbb{R}^D, \|\cdot\|_Z)$, we indicate with $C(\mathbf{z}, \cdot) : X \to Z \times X$ the "concatenation" operator, that is,
$$C(\mathbf{z}, \mathbf{x}) := (z_1, \dots, z_D, x_1, \dots, x_N)^T. \tag{2.1}$$
Fix now a matrix $F : X \to Y$, which is the discretization of an ill-posed linear operator between Banach spaces. We will assume that, given the unperturbed and observed data $\mathbf{y} \in Y$ and $\mathbf{y}^\delta \in Y$, respectively, then
$$\mathbf{y} \in \operatorname{Rg}(F), \quad \text{that is, } F\mathbf{x} = \mathbf{y} \text{ is solvable}, \tag{H0}$$
and
$$\mathbf{y}^\delta = \mathbf{y} + \boldsymbol{\eta}, \quad \text{where } \|\boldsymbol{\eta}\|_Y \le \delta.$$
We recall that a Banach space $Y$ is uniformly smooth if its modulus of smoothness
$$\rho(\tau) := \sup\left\{ \frac{\|\mathbf{y} + \tau\hat{\mathbf{y}}\| + \|\mathbf{y} - \tau\hat{\mathbf{y}}\|}{2} - 1 \;\Big|\; \|\mathbf{y}\| = \|\hat{\mathbf{y}}\| = 1 \right\}, \qquad \tau > 0,$$
satisfies $\lim_{\tau \to 0^+} \rho(\tau)/\tau = 0$. Examples of uniformly smooth spaces are all the $\ell^p$-spaces for $p \in (1, \infty)$.
Given an extended real-valued function $\mathcal{R} : \operatorname{dom}(\mathcal{R}) \subseteq X \to (-\infty, +\infty]$, $\mathcal{R}$ is uniformly convex if there exists a nonnegative map $h : [0, +\infty) \to [0, +\infty]$ such that $h(s) = 0$ if and only if $s = 0$, and
$$\mathcal{R}\bigl(t\mathbf{x} + (1-t)\hat{\mathbf{x}}\bigr) + t(1-t)\, h\bigl(\|\mathbf{x} - \hat{\mathbf{x}}\|_X\bigr) \le t\,\mathcal{R}(\mathbf{x}) + (1-t)\,\mathcal{R}(\hat{\mathbf{x}}), \quad \forall\, t \in [0,1] \text{ and } \mathbf{x}, \hat{\mathbf{x}} \in \operatorname{dom}(\mathcal{R}).$$
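For instance (a standard example, not taken from this paper), $\mathcal{R}(\mathbf{x}) = \tfrac{1}{2}\|\mathbf{x}\|_2^2$ is uniformly convex: expanding the squares gives, for all $t \in [0,1]$ and $\mathbf{x}, \hat{\mathbf{x}} \in \mathbb{R}^N$,
$$t\,\mathcal{R}(\mathbf{x}) + (1-t)\,\mathcal{R}(\hat{\mathbf{x}}) - \mathcal{R}\bigl(t\mathbf{x} + (1-t)\hat{\mathbf{x}}\bigr) = \tfrac{1}{2}\,t(1-t)\,\|\mathbf{x} - \hat{\mathbf{x}}\|_2^2,$$
so the inequality above holds (with equality) for $h(s) = s^2/2$.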
Finally, we recall that $\mathcal{R}$ is coercive if it is bounded below on bounded sets and
$$\liminf_{\|\mathbf{x}\|_X \to \infty} \frac{\mathcal{R}(\mathbf{x})}{\|\mathbf{x}\|_X} = \infty.$$
2.1 Bregman distance
Given a convex function
$$\mathcal{R} : \operatorname{dom}(\mathcal{R}) \subseteq X \to (-\infty, +\infty],$$
$\mathcal{R}$ is called proper if $\operatorname{dom}(\mathcal{R}) := \{\mathbf{x} \in X : \mathcal{R}(\mathbf{x}) < +\infty\} \neq \emptyset$. For every $\hat{\mathbf{x}} \in \operatorname{dom}(\mathcal{R})$, a subgradient of $\mathcal{R}$ at $\hat{\mathbf{x}}$ is an element $\xi$ of the dual space $X^*$ such that
$$\mathcal{R}(\mathbf{x}) - \mathcal{R}(\hat{\mathbf{x}}) - \langle \xi, \mathbf{x} - \hat{\mathbf{x}} \rangle \ge 0 \quad \forall\, \mathbf{x} \in X,$$
where the bracket is the evaluation of $\xi$ at $\mathbf{x} - \hat{\mathbf{x}}$. Clearly, since $X$ is finite dimensional, then $X$ is reflexive and $\langle \cdot, \cdot \rangle$ is the standard inner product, that is,
$$\langle \xi, \mathbf{x} - \hat{\mathbf{x}} \rangle = \sum_{i=1}^{N} \xi_i (x_i - \hat{x}_i).$$
The collection of all subgradients of $\mathcal{R}$ at $\hat{\mathbf{x}}$ is denoted by $\partial\mathcal{R}(\hat{\mathbf{x}})$. The subdifferential of $\mathcal{R}$ is the multi-valued map $\partial\mathcal{R} : \operatorname{dom}(\partial\mathcal{R}) \subseteq X \to 2^{X^*}$ such that
$$\operatorname{dom}(\partial\mathcal{R}) := \{\hat{\mathbf{x}} \in \operatorname{dom}(\mathcal{R}) : \partial\mathcal{R}(\hat{\mathbf{x}}) \neq \emptyset\}, \qquad \hat{\mathbf{x}} \mapsto \partial\mathcal{R}(\hat{\mathbf{x}}).$$
Let us recall that if $\operatorname{dom}(\mathcal{R}) = X$, then $\operatorname{dom}(\partial\mathcal{R}) = X$ (e.g. [50, Lemma 3.16]).
Finally, for every $\hat{\mathbf{x}} \in \operatorname{dom}(\partial\mathcal{R})$ and $\xi \in \partial\mathcal{R}(\hat{\mathbf{x}})$, the Bregman distance $B^{\mathcal{R}}_{\xi}(\cdot, \hat{\mathbf{x}}) : X \to [0, +\infty)$ induced by $\mathcal{R}$ at $\hat{\mathbf{x}}$ in the direction $\xi$ is defined by
$$B^{\mathcal{R}}_{\xi}(\mathbf{x}, \hat{\mathbf{x}}) := \mathcal{R}(\mathbf{x}) - \mathcal{R}(\hat{\mathbf{x}}) - \langle \xi, \mathbf{x} - \hat{\mathbf{x}} \rangle.$$
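As a concrete illustration (our own, with hypothetical helper names), for $\mathcal{R}(\mathbf{x}) = \tfrac{1}{2}\|\mathbf{x}\|_2^2$ the unique subgradient at $\hat{\mathbf{x}}$ is $\xi = \hat{\mathbf{x}}$ and the Bregman distance reduces to $\tfrac{1}{2}\|\mathbf{x} - \hat{\mathbf{x}}\|_2^2$; the short sketch below checks this numerically.

```python
import numpy as np

def bregman_distance(R, xi, x, x_hat):
    """B^R_xi(x, x_hat) = R(x) - R(x_hat) - <xi, x - x_hat>."""
    return R(x) - R(x_hat) - xi @ (x - x_hat)

R = lambda x: 0.5 * np.sum(x ** 2)      # R(x) = 0.5*||x||_2^2, with gradient x

rng = np.random.default_rng(0)
x, x_hat = rng.standard_normal(5), rng.standard_normal(5)
xi = x_hat                              # the (unique) subgradient of R at x_hat

lhs = bregman_distance(R, xi, x, x_hat)
rhs = 0.5 * np.sum((x - x_hat) ** 2)
print(abs(lhs - rhs) < 1e-12)           # True: B reduces to half the squared Euclidean distance
```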
Remark 2.1. It is straightforward to check that if $\mathcal{R}$ is uniformly convex, then $B^{\mathcal{R}}_{\xi}(\cdot, \hat{\mathbf{x}})$ is uniformly convex too, for any fixed $\xi$ and $\hat{\mathbf{x}}$. Moreover, since $X$ is reflexive, then $B^{\mathcal{R}}_{\xi}(\cdot, \hat{\mathbf{x}})$ is coercive. See for example [57, Corollary 2.4].
We can now introduce the definition of a solution of the model problem (1.1) with respect to the Bregman distance from a reference initial guess.

Definition 2.1. Fix $\mathbf{x}_0 \in \operatorname{dom}(\partial\mathcal{R})$, $\xi_0 \in \partial\mathcal{R}(\mathbf{x}_0)$. An element $\mathbf{x}^\dagger \in \operatorname{dom}(\mathcal{R})$ is called a $B^{\mathcal{R}}_{\xi_0}$-minimizing solution of (1.1) if $F\mathbf{x}^\dagger = \mathbf{y}$ and
$$B^{\mathcal{R}}_{\xi_0}(\mathbf{x}^\dagger, \mathbf{x}_0) = \min\left\{ B^{\mathcal{R}}_{\xi_0}(\mathbf{x}, \mathbf{x}_0) : \mathbf{x} \in \operatorname{dom}(\mathcal{R}),\ F\mathbf{x} = \mathbf{y} \right\}.$$
As a last piece of notation, we introduce the duality map. For every fixed $r \in (1, \infty)$, the duality map $J_r : X \to 2^{X^*}$ is given by
$$J_r(\mathbf{x}) := \left\{ \xi \in X^* \;:\; \|\xi\|_{X^*} = \|\mathbf{x}\|_X^{r-1} \text{ and } \langle \xi, \mathbf{x} \rangle = \|\mathbf{x}\|_X^r \right\}.$$
In particular, $J_r$ is the subdifferential of the map $\mathbf{x} \mapsto \frac{\|\mathbf{x}\|_X^r}{r}$.
Remark 2.2. If $X$ is an $\ell^p$ space, then $J_r$ is single-valued, and for $r = 2$ it holds
$$J_2(\mathbf{x}) = \frac{\operatorname{sgn}(\mathbf{x})\,|\mathbf{x}|^{p-1}}{\|\mathbf{x}\|_p^{p-2}},$$
where $\operatorname{sgn}(\cdot)$ and $|\cdot|^{p-1}$ act component-wise. More generally, if a Banach space is uniformly smooth, then $J_r$ is single-valued.
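The following sketch (an illustration under the assumption $X = \ell^p(\mathbb{R}^N)$ with $p \in (1,\infty)$; the vector and exponent are arbitrary) evaluates the formula of Remark 2.2 and checks the two defining properties of $J_2$, where the dual norm is the $\ell^q$-norm with $1/p + 1/q = 1$.

```python
import numpy as np

def J2(x, p):
    """Duality map J_2 on l^p: component-wise sgn(x)*|x|^(p-1), divided by ||x||_p^(p-2)."""
    norm_p = np.sum(np.abs(x) ** p) ** (1.0 / p)
    return np.sign(x) * np.abs(x) ** (p - 1) / norm_p ** (p - 2)

p = 1.5
q = p / (p - 1)                                   # dual exponent, 1/p + 1/q = 1
x = np.array([1.0, -2.0, 0.5, 3.0])
xi = J2(x, p)

norm_p = np.sum(np.abs(x) ** p) ** (1.0 / p)
norm_q = np.sum(np.abs(xi) ** q) ** (1.0 / q)

print(np.isclose(xi @ x, norm_p ** 2))            # <xi, x> = ||x||_p^r with r = 2
print(np.isclose(norm_q, norm_p))                 # ||xi||_q = ||x||_p^(r-1) with r = 2
```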
In view of Remark 2.2 and for the well-posedness of the iNETT method (see Section 5), we will assume that
$$Y \text{ is uniformly smooth.} \tag{H1}$$
2.2 Neural networks
A neural network is a chain of compositions of affine operators and nonlinear operators. For an introduction to neural networks from an applied mathematical point of view, we refer to [31], whereas we refer to [8] for a focus on deep learning techniques for inverse problems. We present here the basic architecture upon which we will devise the neural networks to be implemented in iNETT. There are many possible choices for the linear and the nonlinear operators, and each of them generates a different neural network. We do not focus now on those choices, which will be made only later (see Sections 4, 5.3 and 6), in order to keep a more general setting here.
First, fix a set of parameters $\Theta := \{\mathbf{b}_k; A_{k,j_k}; W_k\}_{k=1}^L$, where $\mathbf{b}_k$ are vectors commonly called bias terms, and $A_{k,j_k}$ and $W_k$ are matrices. Second, fix a collection $\{\sigma_k\}_{k=1}^L$ of possibly nonlinear operators. Finally, define $\Phi_\Theta : X \to Z$ such that
$$
\begin{cases}
\Phi_\Theta(\mathbf{x}) := \mathbf{z} \in Z, \quad \text{where } \mathbf{z} = \mathbf{z}_{L+1},\\
\mathbf{z}_{k+1} = \sigma_k\!\left(\hat{\mathbf{b}}_k + W_k\,\hat{\mathbf{z}}_k\right) \in Z_{k+1} \quad \text{for } k = 1, \dots, L,\\
\hat{\mathbf{z}}_k = C_k(\mathbf{z}_k) := \mathbf{z}_k \ \text{ or } \ C(\mathbf{z}_{i_k}, \mathbf{z}_k) \ \text{ for } i_k \in \{1, \dots, k\},\\
\hat{\mathbf{b}}_k = \mathbf{b}_k + A_{k,j_k}\,\mathbf{z}_{j_k} \quad \text{for } j_k \in \{1, \dots, k\},\\
\mathbf{z}_1 := \mathbf{x} \in X,
\end{cases}
\tag{NN}
$$
where $Z_{L+1} = Z := (\mathbb{R}^D, \|\cdot\|_Z)$ and $Z_k := (\mathbb{R}^{D_k}, \|\cdot\|_{Z_k})$ are finite dimensional normed vector spaces, and $C(\cdot,\cdot)$ is the concatenation operator (2.1). The operators $C(\mathbf{z}_{i_k}, \cdot)$ and $A_{k,j_k}$ represent the skip connections, that is, some of the data from previous iterations are used in future iterations, skipping intermediate steps. See Figure 1 for a visual representation. The integer $L$ is referred to as the depth of the neural network, and $\sigma_k(\hat{\mathbf{b}}_k + W_k\hat{\mathbf{z}}_k)$ as the $k$-th layer. When $L > 2$, the neural network is commonly called a deep neural network.
The set $\Theta$ is the disjoint union of two subsets, the set of free parameters $\Theta_{\mathrm{free}}$ and the set of frozen parameters $\Theta_{\mathrm{frozen}}$, that is,
$$\Theta = \Theta_{\mathrm{free}} \sqcup \Theta_{\mathrm{frozen}}.$$
The set of free parameters $\Theta_{\mathrm{free}}$ is typically initialized to a starting set of values and then trained by minimizing a loss function over a training sample. Vice versa, the set of frozen parameters $\Theta_{\mathrm{frozen}}$ is fixed and unaffected by the training process. For example, some of the matrices $W_k$ can be fixed to be the identity matrix $I$, or the bias terms $\mathbf{b}_k$ to be the zero vector $\mathbf{0}$. $\Theta_{\mathrm{frozen}}$ can be empty, that is, all the parameters are trainable. About the specific training strategy we will employ, see Subsection 5.3.
In the case that $A_{k,j_k}$ is fixed to be the zero matrix and $\hat{\mathbf{z}}_k = \mathbf{z}_k$ for every $k$, we have a feedforward neural network. For examples of simple architectures of feedforward neural networks of convolutional type, see [19,41]. For examples of more involved neural networks described by (NN), see ResNet [29], DenseNet [32] and U-Net [48].
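To illustrate the recursion (NN), a minimal NumPy sketch follows: a three-layer toy network in which layer 2 concatenates $\mathbf{z}_1$ to its input (the operator $C$) and layer 3 adds a skip term $A_{3,1}\mathbf{z}_1$ to its bias. All dimensions, the random weights, and the choice of ReLU for $\sigma_k$ are arbitrary illustrative assumptions, not a network used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda v: np.maximum(v, 0.0)     # sigma_k: a monotone nondecreasing choice

N, D2, D3, D = 4, 6, 5, 3               # dimensions of X = Z_1, Z_2, Z_3 and Z = Z_4 (L = 3)

# Layer parameters Theta = {b_k; A_{k,j_k}; W_k}.
W1, b1 = rng.standard_normal((D2, N)), rng.standard_normal(D2)
W2, b2 = rng.standard_normal((D3, D2 + N)), rng.standard_normal(D3)   # acts on C(z_1, z_2)
W3, b3 = rng.standard_normal((D, D3)), rng.standard_normal(D)
A31 = rng.standard_normal((D, N))       # skip connection feeding z_1 into the bias of layer 3

def phi(x):
    z1 = x                                       # z_1 := x
    z2 = relu(b1 + W1 @ z1)                      # layer 1: no skip, hat{z}_1 = z_1
    z2_hat = np.concatenate([z1, z2])            # hat{z}_2 = C(z_1, z_2), concatenation skip
    z3 = relu(b2 + W2 @ z2_hat)                  # layer 2
    b3_hat = b3 + A31 @ z1                       # hat{b}_3 = b_3 + A_{3,1} z_1, additive skip
    z4 = relu(b3_hat + W3 @ z3)                  # layer 3; output z = z_{L+1}
    return z4

print(phi(rng.standard_normal(N)).shape)         # (3,)
```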