Probabilistic Verification of Approximate
Algorithms with Unstructured Errors:
Application to Fully Inexact Generalized ADMM
Anis Hamadouche, Yun Wu, Andrew M. Wallace, and João F. C. Mota
Abstract—We analyse the convergence of an approximate, fully inexact, ADMM algorithm under additive, deterministic and probabilistic error models. We consider the generalized ADMM scheme that is derived from the generalized Lagrangian penalty with additive (smoothing) adaptive-metric quadratic proximal perturbations. We derive explicit deterministic and probabilistic convergence upper bounds for the lower-$C^2$ nonconvex case as well as the convex case under the Lipschitz continuity condition. We also present more practical conditions on the proximal errors under which convergence of the approximate ADMM to a suboptimal solution is guaranteed with high probability. We consider statistically and dynamically unstructured, conditionally mean-independent, bounded error sequences. We validate our results using both simulated and practical software and algorithmic computational perturbations. We apply the proposed algorithm to a synthetic LASSO problem and a robust regression problem with k-support norm regularization, and test our proposed bounds under different computational noise levels. Compared to classical convergence results, the adaptive probabilistic bounds are more accurate in predicting the distance from the optimal set and the parasitic residual error under different sources of inaccuracy.
Index Terms—Numerical Linear Algebra; Numerical Optimization; ADMM; Douglas–Rachford method; Approximate Computing.
I. INTRODUCTION
The Alternating Direction Method of Multipliers (ADMM) originally emerged from the early works of Peaceman and Rachford [2, 1, 3, 4], Glowinski and Marrocco [8], and Gabay and Mercier [10] on applying numerical procedures to solve partial differential equations arising in heat conduction and continuum mechanics problems. Owing to its simple implementation and its efficiency in solving large-scale optimization problems encountered in statistics and, more recently, in machine learning, ADMM has evolved into different forms and has been applied to more general nonconvex problems that cannot be solved using conventional methods. ADMM is also a popular method for online and distributed optimization [22]. ADMM can be viewed as a decomposition procedure that exploits the separability of the objective function to reduce the difficulty of the joint minimisation arising in the classical method of multipliers [5, 6, 7, 9], at the cost of an increased number of iterations [39].
Work supported by UK's EPSRC (EP/T026111/1, EP/S000631/1), and the MOD University Defence Research Collaboration.
Anis Hamadouche, Yun Wu, Andrew M. Wallace, and João F. C. Mota are with the School of Engineering & Physical Sciences, Heriot-Watt University, Edinburgh EH14 4AS, UK (e-mail: {ah225,y.wu,a.m.wallace,j.mota}@hw.ac.uk).
However, this procedure yields efficient local solutions to subproblems that are coordinated via constraints to find the global solution [10, 8]. The subproblems are typically easier to solve than the original problem and, in fact, often admit closed-form solutions. For instance, in the following LASSO problem
in the following LASSO problem
minimize
xRn,zRmf(x) := kAx bk2
2+kzk1,
subject to xz= 0 ,
(1)
when the data matrix $A$ has some circulant structure, as is common in signal/image processing, the subproblems can be solved via the fast Fourier transform at insignificant computational cost [21, 26, 30]. However, when the ADMM subproblems do not possess closed-form solutions, the solution is obtained (or approximated) iteratively, which can increase the computational cost and incur additional truncation (computational) errors. This motivates the need for new computationally-aware variants of the classical ADMM with inexpensive auxiliary subproblems.
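To make the circulant-structure remark concrete, here is a minimal sketch (ours, not from the paper) of the resulting $x$-update: when $A$ is circulant, the matrix $2A^\top A + \rho I$ is diagonalized by the DFT, so the subproblem reduces to two FFTs instead of a dense solve. The first column `a` and the right-hand side `rhs` are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
a = rng.standard_normal(n)        # first column of the circulant matrix A
rhs = rng.standard_normal(n)      # right-hand side 2 A^T b + rho (z - v)
rho = 1.0

# A circulant A has eigenvalues fft(a) in the Fourier basis, so
# 2 A^T A + rho I has eigenvalues 2 |fft(a)|^2 + rho.
eig_A = np.fft.fft(a)
eig_M = 2.0 * np.abs(eig_A) ** 2 + rho

# Solve (2 A^T A + rho I) x = rhs with two FFTs and a pointwise division.
x = np.fft.ifft(np.fft.fft(rhs) / eig_M).real
```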
Another feature that makes the ADMM technique practically appealing is its robustness to poorly selected algorithm parameters; i.e., the method is guaranteed to converge independently of the selected parameter. For instance, if the objective functions are strongly convex and have Lipschitz-continuous gradients, then the iterates produced by the ADMM algorithm converge linearly to the optimum in a certain distance metric no matter which parameter value is selected [39, 32, 41, 38, 40, 43, 46, 62]. For an extensive review of ADMM, the interested reader is referred to the survey papers [22, 36, 33].
Before we delve into the details of composite optimization, the ADMM technique, and proximal calculus, let us first recall some basic definitions from optimization theory.
A. Background
Definition 1 (Convex and concave functions). A function $f : \mathbb{R}^n \to \mathbb{R} \cup \{\pm\infty\}$ is convex if
\[
f(\beta x + (1-\beta)y) \le \beta f(x) + (1-\beta) f(y), \quad \forall x, y \in \mathbb{R}^n, \tag{2}
\]
with $\beta \in (0,1)$. If $f$ is convex, then $-f$ is concave.
Definition 2 (Proper function). A function is proper if its value is never $-\infty$ and it is finite somewhere.
Definition 3 (Lower semicontinuous function). A function $f : \mathbb{R}^n \to \mathbb{R} \cup \{\pm\infty\}$ is lower semicontinuous at a point $y \in \mathbb{R}^n$ if and only if
\[
\liminf_{x \to y} f(x) \ge f(y). \tag{3}
\]
Definition 4 (Closed function). A convex and proper function is closed if and only if it is lower semicontinuous.
Definition 5 (Saddle point). Let $\mathcal{L}(x, y) : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R} \cup \{\pm\infty\}$ be a convex-concave function. The tuple $(x^\star, y^\star)$ is said to be a saddle point of $\mathcal{L}$ if
\[
\mathcal{L}(x^\star, y) \le \mathcal{L}(x^\star, y^\star) \le \mathcal{L}(x, y^\star) \tag{4}
\]
for all $x \in \mathbb{R}^n$, $y \in \mathbb{R}^m$.
Definition 6 (Subgradient). Let $w \in \partial f(x)$; then for all $y \in \mathbb{R}^n$ we have
\[
f(y) \ge f(x) + \langle w, y - x \rangle. \tag{5}
\]
Definition 7 ($\varepsilon$-subgradient). Let $w \in \partial_\varepsilon f(x)$; then for all $y \in \mathbb{R}^n$ we have
\[
f(y) \ge f(x) + \langle w, y - x \rangle - \varepsilon. \tag{6}
\]
Definition 8 (Proximal operator). For all $y \in \mathbb{R}^n$ we have
\[
\mathrm{prox}_u(y) := \arg\min_z \; u(z) + \tfrac{1}{2}\|z - y\|_2^2. \tag{7}
\]
Definition 9 (Approximate proximal operator).
\[
\mathrm{prox}^{\varepsilon}_u(y) := \Big\{ x \in \mathbb{R}^n : u(x) + \tfrac{1}{2}\|x - y\|_2^2 \le \varepsilon + \inf_z \; u(z) + \tfrac{1}{2}\|z - y\|_2^2 \Big\}. \tag{8}
\]
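As a concrete illustration of Definitions 8 and 9 (a sketch of ours, not from the paper): for $u = t\|\cdot\|_1$ the proximal operator (7) is the closed-form soft-thresholding map, and membership in the $\varepsilon$-approximate prox set (8) can be checked by comparing Moreau objective values. The tolerance and perturbation below are arbitrary.

```python
import numpy as np

def prox_l1(y, t=1.0):
    # Exact prox of u(z) = t * ||z||_1 (Definition 8): soft-thresholding.
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

def in_approx_prox(x, y, t, eps):
    # Membership test for Definition 9: x lies in prox^eps_u(y) iff its
    # Moreau objective is within eps of the exact minimum value.
    moreau = lambda z: t * np.abs(z).sum() + 0.5 * np.sum((z - y) ** 2)
    return moreau(x) <= moreau(prox_l1(y, t)) + eps

y = np.array([1.5, -0.2, 0.7])
x_exact = prox_l1(y, t=0.5)
x_perturbed = x_exact + 1e-3   # e.g. a low-precision evaluation
print(in_approx_prox(x_perturbed, y, t=0.5, eps=1e-2))  # True
```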
Definition 10 (Lower-$C^2$ functions [19]). The function $f$ is lower-$C^2$ on an open set $V$ if for each $x \in V$ there is a neighbourhood $V_x$ of $x$ upon which a representation $f(x) = \max_{t \in T} f_t(x)$ holds for $x$ in $V_x$, where $T$ is a compact set and the functions $f_t$ are twice continuously differentiable jointly in both $x$ and $t$.
Definition 11 (Lower-$C^2$ functions [20]). The function $f$ is lower-$C^2$ on an open set $V$ if at any point $x$ in $V$, the sum of $f$ with a quadratic term is a convex function on an open neighbourhood $V'$ of $x$.
Definition 12 (Weighted norm). Given a PSD matrix $M$, we say that $\|x\|_M^2 = \langle x, Mx \rangle = \langle Mx, x \rangle$ is an $M$-weighted norm of $x$.
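Since the weighted norm of Definition 12 recurs throughout the generalized schemes below, here is a one-line sketch of it in code (ours, for illustration only):

```python
import numpy as np

def wnorm2(u, M):
    # ||u||_M^2 = <u, M u> (Definition 12); M is assumed symmetric PSD,
    # which is what makes <u, M u> = <M u, u> and the value nonnegative.
    return float(u @ (M @ u))
```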
B. Problem statement
The category of problems that ADMM solves is
\[
\underset{x\in\mathbb{R}^n,\; z\in\mathbb{R}^m}{\text{minimize}} \quad f(x,z) := g(x) + h(z), \qquad \text{subject to} \quad Ax + Bz = c, \tag{9}
\]
where $g$ and $h$ are lower-$C^2$, possibly convex and nondifferentiable functions, $A \in \mathbb{R}^{p \times n}$ and $B \in \mathbb{R}^{p \times m}$.
The augmented Lagrangian for problem (9) is constructed as follows:
\[
\mathcal{L}_\rho(x, z, y) = g(x) + h(z) + y^\top (Ax + Bz - c) + \tfrac{\rho}{2}\|Ax + Bz - c\|_2^2, \tag{10}
\]
where $y$ is the Lagrange multiplier associated with the linear constraint in (9), and $\rho$ is a positive scalar parameter. The ADMM iteration is obtained by minimizing the augmented Lagrangian (10) with respect to the $x$ and $z$ variables, and updating the dual variable $y$ as follows:
\[
\begin{aligned}
x^{k+1} &= \arg\min_x \; \mathcal{L}_\rho(x, z^k, y^k), && \text{(11a)}\\
z^{k+1} &= \arg\min_z \; \mathcal{L}_\rho(x^{k+1}, z, y^k), && \text{(11b)}\\
y^{k+1} &= y^k + \rho(Ax^{k+1} + Bz^{k+1} - c). && \text{(11c)}
\end{aligned}
\]
The equivalent scaled proximal ADMM is given by
\[
\begin{aligned}
x^{k+1} &= \arg\min_x \; g(x) + \tfrac{1}{2\lambda}\|Ax + Bz^k - c + v^k\|^2, && \text{(12a)}\\
z^{k+1} &= \arg\min_z \; h(z) + \tfrac{1}{2\lambda}\|Ax^{k+1} + Bz - c + v^k\|^2, && \text{(12b)}\\
v^{k+1} &= v^k + (Ax^{k+1} + Bz^{k+1} - c), && \text{(12c)}
\end{aligned}
\]
where $v^k = (1/\rho)y^k$ and $\lambda = 1/\rho$.
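For reference, here is a minimal sketch (ours, not the paper's implementation) of iteration (12) specialized to the LASSO instance (1), where the constraint is $x - z = 0$: the $x$-update is a prefactorized linear solve and the $z$-update is the soft-thresholding prox of the $\ell_1$ norm. All parameter values are illustrative.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def scaled_admm_lasso(A, b, lam=1.0, iters=200):
    # Iteration (12) for problem (1): g(x) = ||Ax - b||_2^2, h(z) = ||z||_1,
    # constraint x - z = 0, scaled dual v = y / rho, and lam = 1 / rho.
    n = A.shape[1]
    rho = 1.0 / lam
    # (12a) is a strongly convex quadratic whose minimizer solves
    # (2 A^T A + rho I) x = 2 A^T b + rho (z - v); factor the matrix once.
    M = cho_factor(2.0 * A.T @ A + rho * np.eye(n))
    x = np.zeros(n); z = np.zeros(n); v = np.zeros(n)
    for _ in range(iters):
        x = cho_solve(M, 2.0 * A.T @ b + rho * (z - v))            # (12a)
        z = np.sign(x + v) * np.maximum(np.abs(x + v) - lam, 0.0)  # (12b)
        v = v + (x - z)                                            # (12c)
    return x, z, v
```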
Consider the case where both ADMM subproblems (12a) and (12b) are solved inexactly and/or using iterates with additive perturbations $r_x^k$ and $r_z^k$; i.e., $\hat{x}^k = x^k + r_x^k$ and $\hat{z}^k = z^k + r_z^k$, where $x^k$ and $z^k$ are the exact iterates. Classical convergence guarantees cannot be easily extended to the inexact case since they completely ignore the presence and propagation of computational errors during the iterations. Therefore, a more rigorous treatment of such errors is needed to verify the convergence, reliability and portability of ADMM (and its variants) across different computational environments.
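One way to emulate this additive error model in experiments (our sketch; the uniform bound `eps` is an assumption standing in for, e.g., reduced-precision arithmetic) is to corrupt each exact update before it is reused:

```python
import numpy as np

rng = np.random.default_rng(1)

def perturb(u, eps):
    # Additive model: u_hat = u + r with a bounded, zero-mean error,
    # ||r||_inf <= eps, drawn independently of the iterate. Zero-mean noise
    # independent of the past is one realization of the conditionally
    # mean-independent bounded error sequences considered in this paper.
    return u + rng.uniform(-eps, eps, size=u.shape)

# Inside the loop of scaled_admm_lasso above one would write, for instance,
#   x = perturb(cho_solve(M, 2.0 * A.T @ b + rho * (z - v)), eps)
#   z = perturb(np.sign(x + v) * np.maximum(np.abs(x + v) - lam, 0.0), eps)
# so that the errors r_x^k, r_z^k propagate through subsequent iterations.
```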
Before we analyse the convergence of the inexact ADMM, let us introduce some other variants of the original scheme and focus our analysis on the most generalized form.
C. WL-ADMM approximation
The approximate WL-ADMM is formulated by approximating the Euclidean norm by some positive definite (PD) $L_k$-weighted norm $\|\cdot\|^2_{L_k}$ in the $x$ and $z$ updates as follows:
\[
\begin{aligned}
x^{k+1} &= \arg\min_x \; g(x) + \tfrac{1}{2\lambda}\|Ax + Bz^k - c + v^k\|^2_{L_k}, && \text{(13a)}\\
z^{k+1} &= \arg\min_z \; h(z) + \tfrac{1}{2\lambda}\|Ax^{k+1} + Bz - c + v^k\|^2_{L_k}, && \text{(13b)}\\
v^{k+1} &= v^k + (Ax^{k+1} + Bz^{k+1} - c), && \text{(13c)}
\end{aligned}
\]
where $v^k = (1/\rho)y^k$ and $\lambda = 1/\rho$.
D. WLM-ADMM approximation
Adding quadratic perturbations to (13a) and (13b) yields the approximate WLM-ADMM:
\[
\begin{aligned}
x^{k+1} &= \arg\min_x \; g(x) + \tfrac{1}{2\lambda}\|Ax + Bz^k - c + v^k\|^2_{L_k} + \tfrac{1}{2}\|x - x^k\|^2_{M_x^k}, && \text{(14a)}\\
z^{k+1} &= \arg\min_z \; h(z) + \tfrac{1}{2\lambda}\|Ax^{k+1} + Bz - c + v^k\|^2_{L_k} + \tfrac{1}{2}\|z - z^k\|^2_{M_z^k}, && \text{(14b)}\\
v^{k+1} &= v^k + (Ax^{k+1} + Bz^{k+1} - c), && \text{(14c)}
\end{aligned}
\]
where $M_x^k$ and $M_z^k$ are positive definite matrices. Note that in the case of a convex function $f$, adding quadratic (proximal) terms makes the objective functions strongly convex, which improves the condition number of the problem being solved at the expense of yielding approximate solutions.
Completing the square, (14) can be written as
\[
\begin{aligned}
x^{k+1} &= \arg\min_x \; g(x) + \tfrac{1}{2}\|\Lambda_{1k}^{1/2} x - \Lambda_{1k}^{-1/2}\Gamma_{1k}\|^2, && \text{(15a)}\\
z^{k+1} &= \arg\min_z \; h(z) + \tfrac{1}{2}\|\Lambda_{2k}^{1/2} z - \Lambda_{2k}^{-1/2}\Gamma_{2k}\|^2, && \text{(15b)}\\
v^{k+1} &= v^k + (Ax^{k+1} + Bz^{k+1} - c), && \text{(15c)}
\end{aligned}
\]
or, in generalized-norm form, as
\[
\begin{aligned}
x^{k+1} &= \arg\min_x \; g(x) + \tfrac{1}{2}\|x - \Lambda_{1k}^{-1}\Gamma_{1k}\|^2_{\Lambda_{1k}}, && \text{(16a)}\\
z^{k+1} &= \arg\min_z \; h(z) + \tfrac{1}{2}\|z - \Lambda_{2k}^{-1}\Gamma_{2k}\|^2_{\Lambda_{2k}}, && \text{(16b)}\\
v^{k+1} &= v^k + (Ax^{k+1} + Bz^{k+1} - c), && \text{(16c)}
\end{aligned}
\]
where
\[
\begin{aligned}
\Lambda_{1k} &= \tfrac{1}{\lambda} A^\top L_k A + M_x^k, && \text{(17a)}\\
\Gamma_{1k} &= M_x^k x^k - \tfrac{1}{\lambda} A^\top L_k (Bz^k - c + v^k), && \text{(17b)}\\
\Lambda_{2k} &= \tfrac{1}{\lambda} B^\top L_k B + M_z^k, && \text{(17c)}\\
\Gamma_{2k} &= M_z^k z^k - \tfrac{1}{\lambda} B^\top L_k (Ax^{k+1} - c + v^k). && \text{(17d)}
\end{aligned}
\]
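To make (16)–(17) concrete, here is a sketch (ours; the function name and arguments are hypothetical) that assembles $\Lambda_{1k}$ and $\Gamma_{1k}$ and performs the $x$-update in the smooth case:

```python
import numpy as np

def wlm_x_update(A, B, c, Lk, Mx, xk, zk, vk, lam):
    # Assemble the weights of (17a)-(17b).
    Lam1 = (A.T @ Lk @ A) / lam + Mx                        # (17a)
    Gam1 = Mx @ xk - (A.T @ Lk @ (B @ zk - c + vk)) / lam   # (17b)
    # For g = 0, (16a) reduces to the unconstrained quadratic minimizer
    # x^{k+1} = Lambda_1k^{-1} Gamma_1k; for nonsmooth g, (16a) is a prox
    # of g in the Lambda_1k-weighted norm and needs a dedicated solver.
    return np.linalg.solve(Lam1, Gam1)
```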
E. Related work
A new ADMM variant of (11) was proposed in [34] that yields simple proximal evaluations of the objective functions according to the following scheme:
\[
\begin{aligned}
\zeta_x^k &= \tfrac{1}{\rho}(s_x^{k-1} - x^k) \in \partial g(x^k), && \text{(18a)}\\
e_x(x^k, y^k) &= x^k - P_X\big[x^k - \rho(\zeta_x^k - A^\top y^k)\big], && \text{(18b)}\\
s_x^k &= x^k + \rho\zeta_x^k - \gamma\alpha_k e_x(x^k, y^k), && \text{(18c)}\\
x^{k+1} &= \arg\min_x \; g(x) + \tfrac{1}{2\rho}\|x - s_x^k\|^2, && \text{(18d)}\\
\zeta_z^k &= \tfrac{1}{\rho}(s_z^{k-1} - z^k) \in \partial h(z^k), && \text{(18e)}\\
e_z(z^k, y^k) &= z^k - P_Z\big[z^k - \rho(\zeta_z^k - B^\top y^k)\big], && \text{(18f)}\\
s_z^k &= z^k + \rho\zeta_z^k - \gamma\alpha_k e_z(z^k, y^k), && \text{(18g)}\\
z^{k+1} &= \arg\min_z \; h(z) + \tfrac{1}{2\rho}\|z - s_z^k\|^2, && \text{(18h)}\\
y^{k+1} &= y^k + \rho(Ax^{k+1} + Bz^{k+1} - c), && \text{(18i)}
\end{aligned}
\]
where $P_X$ and $P_Z$ denote the orthogonal projections onto the subspaces $X$ and $Z$, respectively. The stepsize sequence $\{\alpha_k\}_{k>0}$ is chosen with convergence guarantees. Note that (18d) and (18h) are proximal evaluations of $g$ and $h$ at $s_x^k$ and $s_z^k$, respectively.
1) The inexact proximal ADMM: For the method to be implementable, numerical optimisation algorithms must be able to handle approximate solutions. In error-resilient applications involving optimization problems that do not require highly accurate solutions, more computationally friendly inexact (approximate) ADMM schemes can be used instead of the original scheme.
Motivated by Rockafellar's proximal method of multipliers, which involves an augmented Lagrangian with an additional quadratic proximal term, [13, 29, 16] proposed the inexact semi-proximal ADMM with added proximal terms as follows:
\[
\begin{aligned}
x^{k+1} &\approx \arg\min_x \; \mathcal{L}_\rho(x, z^k, y^k) + \tfrac{1}{2}\|x - x^k\|^2_{M_x^k}, && \text{(19a)}\\
z^{k+1} &\approx \arg\min_z \; \mathcal{L}_\rho(x^{k+1}, z, y^k) + \tfrac{1}{2}\|z - z^k\|^2_{M_z^k}, && \text{(19b)}\\
y^{k+1} &= y^k + \alpha\rho(Ax^{k+1} + Bz^{k+1} - c), && \text{(19c)}
\end{aligned}
\]
with Fortin and Glowinski's relaxation factor $\alpha \in (0, (1+\sqrt{5})/2)$ proposed by [29]. This algorithm is shown to preserve the good features of the primal proximal method of multipliers, with the additional advantage that it leads to a decoupling of the constraints. $M_x^k$ and $M_z^k$ are selected to be PSD matrices in [29], while in [16] they are selected to have some positive minimum spectral radius. Note that the inexact semi-proximal ADMM reduces to the classical ADMM when $M_x^k = M_z^k = 0$, and to the proximal ADMM [13] when both proximal matrices are PD and $\alpha = 1$. Global convergence of the semi-proximal ADMM was proved in [39], and an iteration complexity (convergence rate) of $O(1/k)$ was derived in [28]. Faster rates can be achieved with varying $\alpha$ and a better selection of $\rho$, $M_x^k$ and $M_z^k$ [39]. For instance, [16, 39, 29, 37] considered indefinite self-adaptive $M_x^k$ and $M_z^k$, and an optimal self-adaptive scheme is presented in [60]. $M_x$ and $M_z$ are typically