A matrix-free ILU realization based on surrogates


A MATRIX-FREE ILU REALIZATION BASED ON SURROGATES

DANIEL DRZISGA†, ANDREAS WAGNER†∗, AND BARBARA WOHLMUTH†

Abstract.

Matrix-free techniques play an increasingly important role in large-scale simulations. Schur complement techniques and massively parallel multigrid solvers for second-order elliptic partial differential equations can significantly benefit from reduced memory traffic and consumption. The matrix-free approach often restricts solver components to purely local operations, for instance, to the most basic schemes like Jacobi- or Gauss–Seidel-Smoothers in multigrid methods. An incomplete LU (ILU) decomposition cannot be calculated from local information and is therefore not amenable to an on-the-fly computation, which is typically needed for matrix-free calculations. It generally requires the storage and factorization of a sparse matrix, which contradicts the low memory requirements in large-scale scenarios. In this work, we propose a matrix-free ILU realization. More precisely, we introduce a memory-efficient, matrix-free ILU(0)-Smoother component for low-order conforming finite elements on tetrahedral hybrid grids. Hybrid grids consist of an unstructured macro-mesh which is subdivided into a structured micro-mesh. The ILU(0) is used for degrees-of-freedom assigned to the interior of macro-tetrahedra. This ILU(0)-Smoother can be used for the efficient matrix-free evaluation of the Steklov–Poincaré operator from domain-decomposition methods. After introducing and formally defining our smoother, we first investigate its performance on refined macro-tetrahedra. Secondly, the ILU(0)-Smoother on the macro-tetrahedra is implemented via surrogate matrix polynomials in conjunction with a fast on-the-fly evaluation scheme, resulting in an efficient matrix-free algorithm. The polynomial coefficients are obtained by solving a least-squares problem on a small part of the factorized ILU(0) matrices to stay memory efficient. The convergence rates of this smoother with respect to the polynomial order are thoroughly studied.

Key words. ILU-Smoother, multigrid, hybrid grids, polynomial surrogates, matrix-free

AMS subject classifications. 65F55, 65N55

1. Introduction.

The incomplete LU(0)-factorization [34] (ILU) approximates an LU factorization by retaining the sparsity pattern of the original matrix. For strongly anisotropic problems in 2D, it is often used as a smoother within multigrid algorithms since its convergence rates are more stable than those of simpler smoothers like the Gauss–Seidel- or Jacobi-Smoothers [47, Sec. 7.8]. This property carries over to anisotropic 3D problems in which the coupling in one spatial direction is dominant, while other schemes have to be used if two of the spatial directions are dominant [26]. The related thresholded ILU-Smoother was recently used for p-multigrid in isogeometric analysis [41,42] and as a smoother for the wave equation [45].
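The defining property of ILU(0), namely that the factors keep the sparsity pattern of the original matrix, can be illustrated with a minimal Python sketch. It runs Gaussian elimination and skips every update that would fall outside the pattern. The function name `ilu0` and the dense NumPy storage are assumptions made for this illustration only; they are not the matrix-free realization developed in this article.

```python
import numpy as np

def ilu0(A):
    """Illustrative ILU(0): returns unit-lower L and upper U (hypothetical helper)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    pattern = A != 0          # sparsity pattern of the original matrix
    LU = A.copy()
    for i in range(1, n):
        for k in range(i):
            if not pattern[i, k]:
                continue      # entry outside the pattern: no multiplier is stored
            LU[i, k] /= LU[k, k]
            for j in range(k + 1, n):
                if pattern[i, j]:   # updates outside the pattern are dropped
                    LU[i, j] -= LU[i, k] * LU[k, j]
    L = np.tril(LU, -1) + np.eye(n)
    U = np.triu(LU)
    return L, U

# 5-point Laplacian on a 2x2 grid: exact LU would create fill-in at (1,2) and (2,1).
A = np.array([[4., -1., -1., 0.],
              [-1., 4., 0., -1.],
              [-1., 0., 4., -1.],
              [0., -1., -1., 4.]])
L, U = ilu0(A)
mask = A != 0
```

In this small example the product `L @ U` reproduces `A` on the entries in `mask` but differs from `A` elsewhere, which is exactly the approximation that makes the factorization incomplete.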

Besides its usage as a smoother, incomplete factorizations like the ILU are used as preconditioners [5,40], for instance in problems involving the incompressible Stokes equation [24] or in electromagnetic scattering [31]. An algorithm for a communication-avoiding ILU(0) preconditioner in the high-performance context was introduced in [22]. Algorithms for the efficient parallel assembly of thresholded ILU preconditioners can be found in [3], including adaptations to GPUs in [4,30].

Matrix-free methods are becoming increasingly prevalent within finite-element frameworks [27,29,46]. For instance, large scale mantle-convection simulations typically operate on scales on which storing the discretization matrices is not always feasible [6]. On the other hand, reducing the memory traffic by not requiring to load a matrix from memory has the potential to result in faster algorithms on today's hardware. This generates interest in adapting old matrix-based algorithms to the matrix-free context. For non-local factorization algorithms like the ILU, this poses a tremendous challenge as the matrix entries cannot be locally computed on-the-fly.

∗Corresponding author.

Funding: This work was partly supported by the German Research Foundation through grant WO671/11-1.

†Lehrstuhl für Numerische Mathematik, Fakultät für Mathematik (M2), Technische Universität München, Garching bei München (drzisga@ma.tum.de, wagneran@ma.tum.de, wohlmuth@ma.tum.de)

arXiv:2210.15280v1 [math.NA] 27 Oct 2022

For structured grids, several techniques exist to approximate matrices for an efficient evaluation, for instance stencil-scaling techniques that work for both scalar [7] and vectorial [19] equations. However, since the ILU approach relies on a matrix factorization which cannot be computed locally, these approaches are inapplicable.

In our work, we propose a matrix-free ILU realization on structured subgrids based on surrogates. Here, the discrete matrix, which usually approximates a continuous operator, is additionally approximated by surrogate polynomials [8–10,17,18]. These techniques can also be adapted for hybrid structured grids which are extensively used in [11–13,27] and consist of a coarse unstructured macro-grid which is subdivided into a fine structured micro-grid. The former gives the approach enough flexibility to represent relevant domains while the latter provides the computational advantages of structured grids.

In this work, we apply the surrogate methodology to our factorized ILU matrix. In the interior of the highly structured grids, we utilize an ILU factorization and approximate the resulting matrix by surrogate polynomials. This approximation is formed in a memory-efficient way such that the memory costs stay within sensible bounds. We therefore obtain an efficient solver in the interior of our structured grid.

To illustrate the potential of our approach, we provide two examples of how the matrix-free ILU can be used on hybrid grids: Our main application is the approximation of the Steklov–Poincaré operator for the Laplacian in a matrix-free way. This operator is a main ingredient of many non-overlapping domain-decomposition methods and therefore efficient algorithms for its evaluation are highly relevant, see [15,28,32,38,43] and references therein. It formally requires the exact inversion of an elliptic equation inside a subdomain for which a multigrid method can be efficiently applied. By using the ILU-factorization as a smoother within this inner multigrid, the inversion becomes robust with respect to distortions along one axis. In the supplementary material a second application is provided in which we extend the subgrid ILU-Smoother to a smoother on the global grid.

The article is structured as follows: In Section 2, we describe the problem, introduce the notation and present the Steklov–Poincaré operator. In Section 3, we introduce an ILU formulation that is amenable to a matrix-free algorithm. Next, we introduce a reordering strategy on our hybrid mesh to optimize its performance as a smoother inside single subdomains. Finally, we introduce the matrix-free surrogate ILU in Section 4 and compare its asymptotic convergence rates within a multigrid algorithm to the matrix-based ILU. We conclude with a short outlook and summary in Section 5.

2. Hybrid grids.

In this section, we describe our model problem in the context of a low-order conforming finite element discretization on hybrid grids. Hybrid grids combine the flexibility of unstructured grids with the computational advantages of structured grids [11–13,33]. In addition, they provide a natural domain partitioning that can be used to distribute the work to different nodes.

2.1. Preliminaries and notation.

In the weak form of a Poisson-type equation, $-\operatorname{div}(K \nabla u) = f$, on an open domain $\Omega \subseteq \mathbb{R}^3$ with homogeneous Dirichlet boundary conditions on $\Gamma_D \subseteq \partial\Omega$, natural boundary conditions on $\partial\Omega \setminus \Gamma_D$ and an inhomogeneous, bounded, symmetric, uniformly positive-definite diffusion tensor $K(\cdot) : \Omega \to \mathbb{R}^{3 \times 3}$, we obtain the bilinear form

\[ a(u, v) = \int_\Omega \nabla u(x)^\top K(x) \nabla v(x) \, dx, \qquad u, v \in V = \{ u \in H^1(\Omega) : u|_{\Gamma_D} = 0 \}. \]

This includes the special case of a bounded, uniformly positive scalar material parameter $\kappa : \Omega \to \mathbb{R}$ by setting $K = \kappa \, \mathrm{Id}_3$, where $\mathrm{Id}_3 \in \mathbb{R}^{3 \times 3}$ is the identity matrix. One application of the full diffusion tensor would be the pull-back of a blending function which maps a simple tetrahedral domain to a more complex domain, thereby providing a better approximation of the domain boundary. Given a load $f \in L^2(\Omega)$ which defines the linear form $F(v) = \int_\Omega f v$, we obtain the standard variational problem: Find $u \in V$ satisfying $a(u, v) = F(v)$ for all $v \in V$.

The typical approach in HHG [11–13] and HyTeG [27] is to discretize the domain $\Omega$ with a coarse, possibly unstructured, simplicial triangulation. This so-called macro-mesh consists of macro-vertices $\mathcal{V}_H$, macro-edges $\mathcal{E}_H$, macro-faces $\mathcal{F}_H$ and macro-tetrahedra $\mathcal{T}_H$. All macro-primitives are referred to as $\mathcal{P}_H = \mathcal{V}_H \cup \mathcal{E}_H \cup \mathcal{F}_H \cup \mathcal{T}_H$. Based on this initial grid, we construct a hierarchy of $L \in \mathbb{N}$ grids $\{ \mathcal{T}_{h_l},\ h_l = 2^{-l} H,\ l = 2, \ldots, L + 1 \}$ by successive global uniform refinement. The choice to start the multigrid hierarchy with $l = 2$ guarantees that each macro-element contains at least one interior element, which simplifies the notation in our algorithms. As is standard, each of these refinements is achieved by subdividing all elements in 3D into 8 sub-elements. For details of the refinement in 3D, we refer to [14]. Due to this refinement process, the element neighborhood at each vertex in the interior of a macro-element is always the same. The whole process of the hybrid grid mesh setup is schematically depicted in Figure 1.

Associated with $\mathcal{T}_{h_l}$ is the space $V_{h_l} \subseteq V$ of piecewise linear conforming finite elements

\[ V_{h_l} = \{ v \in V : v|_t \in P_1(t) \text{ for each } t \in \mathcal{T}_{h_l} \}. \]

Fig. 1: Hybrid-grid refinement procedure in 3D for a clipped tetrahedron in a cubic macro-mesh. The DoF belonging to the index sets $\mathcal{I}_t$ and $\mathcal{I}_{\partial t}$ are illustrated by different colors and shapes.

Let $\phi_i \in V_{h_l}$ and $\phi_j \in V_{h_l}$ be the scalar-valued linear nodal basis functions associated with the $i$-th and $j$-th mesh node. The set containing all our degrees-of-freedom (DoF) indices is referred to by $\mathcal{I}_{h_l}$. If the multigrid level is obvious from the context, we suppress the level dependence for a more compact notation. By $\sum_i u_i \phi_i$ and $\sum_i v_i \phi_i$ we denote linear combinations of the nodal basis functions with coefficients $u_i, v_i \in \mathbb{R}$. Defining the matrix $A_{ij} = a(\phi_i, \phi_j)$ and vector $f_i = F(\phi_i)$ results in the linear algebraic formulation of the discrete variational problem associated with the weak formulation: Find $u \in \mathbb{R}^{|\mathcal{I}|}$ satisfying $Au = f$.
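As a small illustration of how the entries $a(\phi_i, \phi_j)$ arise, the following Python sketch computes the contribution of a single tetrahedron to the stiffness matrix for piecewise linear basis functions. It assumes, for simplicity, that the diffusion tensor is constant on the element; the function name `p1_element_stiffness` is hypothetical and not taken from HHG or HyTeG.

```python
import numpy as np

def p1_element_stiffness(vertices, K):
    """Local 4x4 matrix with entries vol * grad(phi_i)^T K grad(phi_j).

    Assumption of this sketch: K is constant on the tetrahedron.
    """
    vertices = np.asarray(vertices, dtype=float)   # shape (4, 3)
    K = np.asarray(K, dtype=float)                 # shape (3, 3)
    # Columns of J are the edge vectors starting at the first vertex.
    J = (vertices[1:] - vertices[0]).T
    volume = abs(np.linalg.det(J)) / 6.0
    # Gradients of the four linear basis functions on the reference element.
    ref_grads = np.array([[-1.0, -1.0, -1.0],
                          [1.0, 0.0, 0.0],
                          [0.0, 1.0, 0.0],
                          [0.0, 0.0, 1.0]])
    # Row-wise physical gradients: (J^{-T} g)^T = g^T J^{-1}.
    grads = ref_grads @ np.linalg.inv(J)
    return volume * grads @ K @ grads.T

reference_tet = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
A_loc = p1_element_stiffness(reference_tet, np.eye(3))
```

Because the four basis functions sum to one on the element, every row of the local matrix sums to zero, and the matrix is symmetric whenever `K` is symmetric.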

Hybrid meshes impose a domain partitioning, which is also used for assigning the DoF in an HPC environment to computing nodes. This approach avoids communication between the DoF located inside the same macro-primitive, while for DoF on different macro-primitives communication is necessary. This has to be considered for an efficient evaluation of our operators since operations acting locally on the same primitive type do not require inter-node communication. To define these local operations, we have to introduce notation to localize our vectors and matrices: For arbitrary index sets $\tilde{\mathcal{I}} \subseteq \mathcal{I}$ we define restriction operators $R_{\tilde{\mathcal{I}}} : \mathbb{R}^{|\mathcal{I}|} \to \mathbb{R}^{|\tilde{\mathcal{I}}|}$ consisting of zeros and ones, which discard vector entries whose component is not present in the index set and just retain entries in $\tilde{\mathcal{I}}$. We also assume that the restriction operator retains the global DoF ordering. Given a macro-primitive $p \in \mathcal{P}_H$, we denote the set of all DoF which are located on the primitive by $\mathcal{I}_p \subseteq \mathcal{I}$ and its restriction operator by $R_{\mathcal{I}_p}$. For an arbitrary macro-tetrahedron $t \in \mathcal{T}_H$, which is adjacent to macro-vertices $v_i \in \mathcal{V}_H$, $1 \le i \le 4$, macro-edges $e_j \in \mathcal{E}_H$, $1 \le j \le 6$, and macro-faces $f_k \in \mathcal{F}_H$, $1 \le k \le 4$, we define the index set of its ghost layer as

\[ \mathcal{I}_{\partial t} = \Big( \bigcup_{i=1}^{4} \mathcal{I}_{v_i} \Big) \cup \Big( \bigcup_{j=1}^{6} \mathcal{I}_{e_j} \Big) \cup \Big( \bigcup_{k=1}^{4} \mathcal{I}_{f_k} \Big). \]

All these sets are illustrated in Figure 1.
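The restriction operators described above can be written down explicitly as matrices of zeros and ones. The following Python sketch builds such a matrix for a given index subset while keeping the global DoF ordering; the helper name `restriction_matrix` and the dense storage are assumptions made for illustration, since a matrix-free code would never assemble this operator.

```python
import numpy as np

def restriction_matrix(subset, n_total):
    """0/1 matrix that keeps the entries listed in `subset`, in global order."""
    rows = np.array(sorted(subset), dtype=int)
    R = np.zeros((len(rows), n_total))
    R[np.arange(len(rows)), rows] = 1.0
    return R

u = np.arange(6, dtype=float)          # global vector with entries 0, 1, ..., 5
R = restriction_matrix({4, 1, 3}, 6)
u_local = R @ u                        # entries of u at the indices 1, 3, 4
u_extended = R.T @ u_local             # zero extension back to the global vector
```

The transpose acts as the zero extension, so `R @ R.T` is the identity on the subset, a property that localized smoothers rely on when a local correction is added back to the global vector.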

Our surrogate ILU algorithm heavily relies on geometric properties associated with our DoF: Each micro-vertex in a macro-tetrahedron on level $l$ can be labeled by the logical grid coordinates

\[ G_L = \{ (x, y, z) \in \mathbb{Z}^3 : 0 \le x, y, z \text{ and } x + y + z < 2^l + 1 \}. \]

Similarly, we define the inner grid coordinates by

\[ \mathring{G}_L = \{ (x, y, z) \in \mathbb{Z}^3 : 1 \le x, y, z \text{ and } x + y + z < 2^l \}. \]

If we restrict the coordinates by setting $z$ to a fixed value, we obtain a face-layer $\{ (x, y) \in \mathbb{Z}^2 : 0 \le x, y \text{ and } x + y < N \}$ with a bound $N$ that depends on $z$.
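The coordinate sets above are easy to enumerate, which helps to check the counts of interior and boundary vertices. In the following Python sketch, `n` denotes the number of micro-edges along each macro-edge; this parameter and the function name `logical_coordinates` are introduced for this illustration and do not appear in the article.

```python
def logical_coordinates(n, inner=False):
    """Integer triples labelling micro-vertices of a uniformly refined tetrahedron.

    n     -- number of micro-edges per macro-edge (assumption of this sketch)
    inner -- if True, return only vertices strictly inside the tetrahedron
    """
    lo = 1 if inner else 0
    bound = n if inner else n + 1      # triples must satisfy x + y + z < bound
    return [(x, y, z)
            for z in range(lo, bound)
            for y in range(lo, bound)
            for x in range(lo, bound)
            if x + y + z < bound]

all_vertices = logical_coordinates(4)
inner_vertices = logical_coordinates(4, inner=True)
```

For `n = 4` the tetrahedron has 35 micro-vertices in total and exactly one interior vertex, `(1, 1, 1)`, whereas `n = 2` leaves no interior vertex at all.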

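For a vector $u|_{\mathcal{I}_t \cup \mathcal{I}_{\partial t}}$ on level $l$ restricted to a tetrahedron $t$, there is a one-to-one correspondence between DoF indices in $\mathcal{I}_t$ and inner logical grid coordinates $\mathring{G}_L$ which can be constructed as follows: Assume that $t$ is adjacent to the macro-vertices $v_i$ at coordinates $\tilde{p}_i \in \mathbb{R}^3$ for $1 \le i \le 4$. The tetrahedron is spanned by the edges $d_i = \tilde{p}_{i+1} - \tilde{p}_1$ at the base point $\tilde{p}_1$ for $1 \le i \le 3$ (see Figure 2 left). The point $\tilde{p}_{(x,y,z)}$, which is reached from the base point $\tilde{p}_1$ by adding the combination $d_1 \cdot x + d_2 \cdot y + d_3 \cdot z$ scaled to the micro-mesh width, for $(x, y, z) \in G_L$ belongs to a shape function $\phi_k \in V_{h_l}$ with $k \in \mathcal{I}_t \cup \mathcal{I}_{\partial t}$ such that $\phi_k(\tilde{p}_{(x,y,z)}) = 1$. This induces the mapping $\iota_t : \mathcal{I}_t \cup \mathcal{I}_{\partial t} \to G_L$ with $\iota_t(k) = (x, y, z)$. Thus, vector components $u_i$ of a vector $u|_{\mathcal{I}_t \cup \mathcal{I}_{\partial t}}$ with $i \in \mathcal{I}_t \cup \mathcal{I}_{\partial t}$ will also be referred to by $u_{(x,y,z)}$ for $\iota_t(i) = (x, y, z) \in G_L$, when the macro-tetrahedron $t$ is evident from the context.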

The mapping between logical grid coordinates and local DoF in $\mathcal{I}_t$ also allows us to specify an ordering of the DoF indices. This is crucial since the properties of the Gauss–Seidel-Smoother (GS-Smoother) or the ILU-Smoother strongly depend on this order. For $i, j \in \mathcal{I}_t$ with logical grid coordinates $(x_i, y_i, z_i) = \iota_t(i)$ and $(x_j, y_j, z_j) = \iota_t(j)$ we fix the ordering by

\[ i < j \implies (z_i < z_j) \lor (z_i = z_j \land y_i < y_j) \lor (z_i = z_j \land y_i = y_j \land x_i < x_j). \]

Consequently, the ordering strongly depends on the order of the adjacent macro-vertices which we used to construct $\iota_t$. In Section 3, we will use this by permuting the macro-vertices with a permutation $\pi$ to obtain good smoothing factors $\mu_t$ for the new DoF ordering.
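The ordering rule above is a lexicographic comparison in which the z-coordinate is the most significant digit and the x-coordinate the least significant one. A minimal Python sketch expresses this as a sort key; the names `dof_order_key` and `index_of` are hypothetical.

```python
def dof_order_key(coord):
    """Sort key realizing the ordering: z first, then y, then x."""
    x, y, z = coord
    return (z, y, x)

coords = [(2, 1, 1), (1, 1, 2), (1, 2, 1), (1, 1, 1)]
ordered = sorted(coords, key=dof_order_key)

# Local DoF index of every logical coordinate under this ordering.
index_of = {c: k for k, c in enumerate(ordered)}
```

Here `ordered` starts with `(1, 1, 1)` and ends with `(1, 1, 2)`: the x-direction is traversed fastest and the z-direction slowest. Relabelling the macro-vertices changes which physical direction plays the role of x, y or z and therefore changes the resulting DoF order.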

We will now introduce the stencil notation used for our surrogate ILU algorithm in Section 4. For this, we first define the stencil directions between logical coordinates as displacement vectors, i.e. $\{ x - y \mid x, y \in G_L \}$. The most common directions are named after the four cardinal directions, as well as the top and bottom directions, such that the x-axis runs from west to east, the y-axis from south to north, and the z-axis from bottom to top. For instance, the west direction $w$ corresponds to the displacement $(-1, 0, 0)$. All stencil directions are collected in the set

\[ \mathcal{D} = \{ w, s, se, bnw, bn, bc, be, c, e, n, nw, tse, ts, tc, tw \}. \]

Fig. 2: Left: Direction vectors in a one-to-one correspondence between DoF and coordinates. Right: Stencil directions and grid coordinates inside a structured tetrahedral grid.

Relying on the ordering defined above, the set of all lower stencil directions needed in our ILU is given by

\[ \mathcal{D}_l = \{ w, s, se, bnw, bn, bc, be \}. \]
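The set of lower directions follows mechanically from the DoF ordering: a direction is lower exactly when it points to a vertex that comes earlier in the ordering. The Python sketch below checks this. Only the displacement of the west direction is stated explicitly in the text; the remaining displacement vectors in `DIRECTIONS` are inferred from the naming convention and the coordinates shown in Figure 2, under the assumption that the top directions point towards increasing z.

```python
# Displacement vectors (dx, dy, dz); inferred from the direction names (assumption).
DIRECTIONS = {
    "w": (-1, 0, 0), "e": (1, 0, 0),
    "s": (0, -1, 0), "n": (0, 1, 0),
    "se": (1, -1, 0), "nw": (-1, 1, 0),
    "bc": (0, 0, -1), "bn": (0, 1, -1), "be": (1, 0, -1), "bnw": (-1, 1, -1),
    "tc": (0, 0, 1), "ts": (0, -1, 1), "tw": (-1, 0, 1), "tse": (1, -1, 1),
    "c": (0, 0, 0),
}

def is_lower(displacement):
    """True if the neighbour precedes the centre in the z-then-y-then-x ordering."""
    dx, dy, dz = displacement
    return (dz, dy, dx) < (0, 0, 0)

LOWER = {name for name, d in DIRECTIONS.items() if is_lower(d)}
```

With these displacements, `LOWER` contains the seven directions w, s, se, bnw, bn, bc and be, and the remaining seven non-central directions are their mirror images.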

Consider two indices $i \in \mathcal{I}_t$ and $j \in \mathcal{I}_t \cup \mathcal{I}_{\partial t}$ with coordinates $p_i = \iota_t(i)$ and $p_j = \iota_t(j)$. Due to the local support of the low-order conforming finite-element shape functions, we know that if $A_{ij} \neq 0$ there exists a $d \in \mathcal{D}$ such that $p_j = p_i + d$. We can define the stencil $(A^d_{p_i})_{d \in \mathcal{D}}$ by $A^d_{p_i} = A_{ij}$. The matrix-vector multiplication $v|_{\mathcal{I}_t} = (Au)|_{\mathcal{I}_t}$ on the macro-tetrahedron $t$ can therefore be written in terms of stencils as

\[ v_p = \sum_{d \in \mathcal{D}} A^d_p \, u_{p+d} \quad \text{for all } p \in \mathring{G}_L, \]

where we identified the DoFs with logical coordinates. Stencils and the associated grid coordinates are depicted in Figure 2 (right).
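The stencil form of the matrix-vector product can be sketched in a few lines of Python. The sketch stores vectors as dictionaries keyed by logical coordinates and asks a callback for the stencil weights at each interior point, which mimics an on-the-fly evaluation. The function `apply_stencil`, the toy weights in `toy_stencil` and the grid size `n` are all assumptions for illustration; the weights are not obtained from a finite-element assembly.

```python
def apply_stencil(stencil_at, u, inner_coords):
    """Compute v_p as the sum over d of A_p^d * u_{p+d} for all interior points p."""
    v = {}
    for p in inner_coords:
        total = 0.0
        for d, weight in stencil_at(p).items():
            q = (p[0] + d[0], p[1] + d[1], p[2] + d[2])
            total += weight * u[q]
        v[p] = total
    return v

def toy_stencil(p):
    """Hypothetical constant stencil using only the directions c, w, e, s, n, bc, tc."""
    return {(0, 0, 0): 6.0,
            (-1, 0, 0): -1.0, (1, 0, 0): -1.0,
            (0, -1, 0): -1.0, (0, 1, 0): -1.0,
            (0, 0, -1): -1.0, (0, 0, 1): -1.0}

n = 4  # micro-edges per macro-edge (assumption of this sketch)
grid = [(x, y, z) for z in range(n + 1) for y in range(n + 1)
        for x in range(n + 1) if x + y + z <= n]
inner = [p for p in grid if min(p) >= 1 and sum(p) < n]
u = {p: p[0] + 2 * p[1] + 3 * p[2] for p in grid}   # a linear function
v = apply_stencil(toy_stencil, u, inner)
```

Since the toy weights sum to zero and are symmetric about the centre, the linear function `u` is mapped to zero at the single interior point. Only interior points are updated, while the values on the ghost layer are read but never written.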

We mainly rely on a geometric multigrid algorithm which combines a so-called smoother with a coarse grid correction step into an optimal solver (see e.g. [23]). We now introduce smoothers which only act on the DoF of a single primitive. This is motivated by our hybrid mesh on which only operations between DoF located on the same primitive are cheap while everything else requires expensive inter-node communication. In our setting, for a given primitive $p \in \mathcal{P}_H$, a smoother acting on the DoF $\mathcal{I}_p$ located on the primitive can be described by applying a preconditioner matrix $C_p \in \mathbb{R}^{|\mathcal{I}_p| \times |\mathcal{I}_p|}$ inside a Richardson iteration with the appropriate restriction operators

\[ u \leftarrow u + R_p^\top C_p^{-1} R_p (f - Au), \]

where we denote the current right-hand-side vector by $f$ and the current estimate
