A matrix-free ILU realization based on surrogates

2025-04-30 0 0 4.97MB 27 页 10玖币
侵权投诉
A MATRIX-FREE ILU REALIZATION BASED ON SURROGATES
DANIEL DRZISGA, ANDREAS WAGNER† ∗,AND BARBARA WOHLMUTH
Abstract.
Matrix-free techniques play an increasingly important role in large-scale simulations.
Schur complement techniques and massively parallel multigrid solvers for second-order elliptic partial
differential equations can significantly benefit from reduced memory traffic and consumption. The
matrix-free approach often restricts solver components to purely local operations, for instance, to the
most basic schemes like Jacobi- or Gauss–Seidel-Smoothers in multigrid methods. An incomplete LU
(ILU) decomposition cannot be calculated from local information and is therefore not amenable to an
on-the-fly computation which is typically needed for matrix-free calculations. It generally requires the
storage and factorization of a sparse matrix which contradicts the low memory requirements in large
scale scenarios. In this work, we propose a matrix-free ILU realization. More precisely, we introduce
a memory-efficient, matrix-free ILU(0)-Smoother component for low-order conforming finite-elements
on tetrahedral hybrid grids. Hybrid-grids consist of an unstructured macro-mesh which is subdivided
into a structured micro-mesh. The ILU(0) is used for degrees-of-freedom assigned to the interior
of macro-tetrahedra. This ILU(0)-Smoother can be used for the efficient matrix-free evaluation of
the Steklov–Poincar´e operator from domain-decomposition methods. After introducing and formally
defining our smoother, we investigate its performance on refined macro-tetrahedra. Secondly, the
ILU(0)-Smoother on the macro-tetrahedrons is implemented via surrogate matrix polynomials in
conjunction with a fast on-the-fly evaluation scheme resulting in an efficient matrix-free algorithm.
The polynomial coefficients are obtained by solving a least-squares problem on a small part of the
factorized ILU(0) matrices to stay memory efficient. The convergence rates of this smoother with
respect to the polynomial order are thoroughly studied.
Key words. ILU-Smoother, multigrid, hybrid grids, polynomial surrogates, matrix-free
AMS subject classifications. 65F55, 65N55
1. Introduction.
The incomplete LU(0)-factorization [34] (ILU) approximates
an LU factorization by retaining the sparsity pattern of the original matrix. For strongly
anisotropic problems in 2D, it is often used as a smoother within multigrid algorithms
since its convergence rates are more stable than the ones of simpler smoothers like
the Gauss–Seidel- or Jacobi-Smoothers [47, Sec. 7.8]. This property carries on to
anisotropic 3D problems in which the coupling in one spatial direction is dominant while
other schemes have to be used if two of the spatial directions are dominant [26]. The
related thresholded ILU-Smoother was recently used for p-multigrid in isogeometric
analysis [41,42] or as a smoother for the wave equation [45].
Besides its usage as a smoother, incomplete factorizations like the ILU are used
as preconditioners [5,40], for instance in problems involving the incompressible Stokes
equation [24] or in electromagnetic scattering [31]. An algorithm for a communication
avoiding ILU(0) preconditioner in the high-performance context was introduced in [22].
Algorithms for the efficient parallel assembly of thresholded ILU preconditioners can
be found in [3] including adaptions to GPUs in [4,30].
Matrix-free methods are becoming increasingly prevalent within finite-element
frameworks [27,29,46]. For instance, large scale mantle-convection simulations typically
operate on scales on which storing the discretization matrices is not always feasible [6].
On the other hand, reducing the memory traffic by not requiring to load a matrix from
memory has the potential to result in faster algorithms on today’s hardware. This
Corresponding author.
Funding:
This work was partly supported by the German Research Foundation through grant
WO671/11-1.
Lehrstuhl f¨ur Numerische Mathematik, Fakult¨at f¨ur Mathematik (M2), Technische Universit¨at
unchen, Garching bei M¨unchen (drzisga@ma.tum.de,wagneran@ma.tum.de,wohlmuth@ma.tum.de)
1
arXiv:2210.15280v1 [math.NA] 27 Oct 2022
2DANIEL DRZISGA, ANDREAS WAGNER, BARBARA WOHLMUTH
generates interest in adapting old matrix-based algorithms to the matrix-free context.
For non-local factorization algorithms like the ILU, this poses a tremendous challenge
as the matrix entries cannot be locally computed on-the-fly.
For structured grids, several techniques exist to approximate matrices for an
efficient evaluation. For instance, stencil-scaling techniques that work for both scalar [7]
and vectorial [19] equations. However, since the ILU-approach relies on a matrix
factorization which cannot be computed locally, these approaches are inapplicable.
In our work, we propose a matrix-free ILU realization on structured subgrids based
on surrogates. Here, the discrete matrix, which usually approximates a continuous
operator is additionally approximated by surrogate polynomials [8
10,17,18]. These
techniques can also be adapted for hybrid structured grids which are extensively used
in [11
13,27] and consist of a coarse unstructured macro-grid which is subdivided
into a fine structured micro-grid. The former gives the approach enough flexibility to
represent relevant domains while the latter provides the computational advantages of
structured grids.
In this work, we apply the surrogate methodology to our factorized ILU matrix.
In the interior of the highly structured grids, we utilize an ILU factorization and
approximate the resulting matrix by surrogate polynomials. This approximation is
formed in a memory-efficient way such that the memory costs stay within sensible
bounds. We therefore obtain an efficient solver in the interior of our structured grid.
To illustrate the potential of our approach, we provide two examples of how the
matrix-free ILU can be used on hybrid grids: Our main application is the approximation
of the Steklov–Poincar´e operator for the Laplacian in a matrix-free way. This operator
is a main ingredient of many non-overlapping domain-decomposition methods and
therefore efficient algorithms for its evaluation are highly relevant, see [15,28,32,38,43]
and references therein. It formally requires the exact inversion of an elliptic equation
inside a subdomain for which a multigrid method can be efficiently applied. By using
the ILU-factorization as a smoother within this inner multigrid, the inversion becomes
robust with respect to distortions along one axis. In the supplementary material a
second application is provided in which we extend the subgrid ILU-Smoother to a
smoother on the global grid.
The article is structured as follows: In Section 2, we describe the problem, introduce
the notation and present the Steklov–Poincar´e operator. In Section 3, we introduce an
ILU formulation that is amenable to a matrix-free algorithm. Next, we introduce a
reordering strategy on our hybrid mesh, to optimize its performance as a smoother
inside single subdomains. Finally, we introduce the matrix-free surrogate ILU in
Section 4and compare its asymptotic convergence rates within a multigrid algorithm
to the matrix-based ILU. We conclude with a short outlook and summary in Section 5.
2. Hybrid grids.
In this section, we describe our model problem in the context
of a low-order conforming finite element discretization on hybrid grids. Hybrid grids
combine the flexibility of unstructured grids with the computational advantages of
structured grids [11
13,33]. In addition, they provide a natural domain partitioning
that can be used to distribute the work to different nodes.
2.1. Preliminaries and notation.
In the weak form of a Poisson-type equation,
div
(
Ku
) =
f
, on an open domain Ω
R3
with homogeneous Dirichlet boundary
conditions on Γ
D
Ω, natural boundary conditions on
\
Γ
D
and an inhomogeneous,
bounded, symmetric, uniformly positive-definite diffusion tensor
K
(
x
) : Ω
R3×3
, we
A MATRIX-FREE ILU REALIZATION BASED ON SURROGATES 3
obtain the bilinear form
a(u, v) = Z
u(x)>K(x)v(x) dx, u, v V=uH1(Ω) : u|ΓD= 0.
This includes the special case of a bounded, uniformly-positive scalar material pa-
rameter
κ
: Ω
R
by setting
K
=
κId3
, where
Id3R3×3
is the identity matrix.
One application of the full diffusion tensor, would be the pull-back of a blending
function which maps a simple tetrahedral domain to a more complex domain, thereby
providing a better approximation of the domain boundary. Given a load
fL2
(Ω)
which defines the linear form
F
(
v
) =
Rf v
d
x
, we obtain the standard variational
problem: Find uVsatisfying a(u, v) = F(v) for vV.
The typical approach in HHG [11
13] and HyTeG [27] is to discretize the domain Ω
with a coarse, possibly unstructured, simplicial triangulation. This so-called macro-
mesh consists of macro-vertices
VH
, macro-edges
EH
, macro-faces
FH
and macro-
tetrahedra
TH
. All macro-primitives are referred to as
PH
=
VH∪ EH∪ FH∪ TH
.
Based on this initial grid, we construct a hierarchy of
LN
, grids
T
=
{Thl, hl
=
2
lH, l
= 2
,...L
+ 1
}
by successive global uniform refinement. The choice to start
in the multigrid hierarchy with
l
= 2 guarantees that each macro-element contains
at least one interior element, which simplifies the notation in our algorithms. As it
is standard, each of these refinements is achieved by subdividing all elements in 3D
into 8 sub-elements. For details of the refinement in 3D, we refer to [14]. Due to
this refinement process, the element neighborhood at each vertex in the interior of a
macro element is always the same. The whole process of the hybrid grid mesh setup is
schematically depicted in Figure 1.
Associated with
Thl
, is the space
VhlV
of piecewise linear conforming finite
elements
Vhl={vV:v|t∈ P1(t) for each t∈ Thl}.
}
Fig. 1: Hybrid-grid refinement procedure in 3D for a clipped tetrahedron in a cubic
macro-mesh. The DoF belonging to the index sets
Iv
,
Ie
,
If
,
It
and
It
are illustrated
by different colors and shapes.
Let
φiVhl
and
φjVhl
be the scalar-valued linear nodal basis functions
associated with the
i
-th and
j
-th mesh node. The set containing all our degrees-of-
freedom (DoF) indices is referred to by
Ihl
. If the multigrid level is obvious from the
context, we will try to suppress the level dependence
hl
for a more compact notation.
By
u
=
Piuiφi
and
v
=
Piviφi
we denote linear combinations of the nodal basis
function with coefficients
ui, viR
. Defining the matrix
Aij
=
a
(
φi, φj
) and vector
4DANIEL DRZISGA, ANDREAS WAGNER, BARBARA WOHLMUTH
fi
=
F
(
φi
) results in the linear algebraic formulation of the discrete variational problem
associated with the weak formulation: Find uR|I| satisfying Au=f.
Hybrid meshes impose a domain-partitioning, which is also used for assigning the
DoF in an HPC environment to computing nodes. This approach avoids communication
between the DoF located inside the same macro-primitive, while for DoF on different
macro-primitives communication is necessary. This has to be considered for an efficient
evaluation of our operators since operations acting locally on the same primitive type
do not require inter-node communication. To define these local operations, we have
to introduce notation to localize our vectors and matrices: For arbitrary index sets
I⊆ I
we define restriction operators
RI
:
R|I| R|I|
consisting of zeros and ones,
which discard vector entries whose component is not present in the index set and just
retain entries in
I
. We also assume that the restriction operator retains the global
DoF ordering. Given a macro-primitive
p∈ PH
, we denote the set of all DoF which
are located on the primitive by
Ip⊆ I
and its restriction operator by
Rp
=
RIp
. For
an arbitrary macro-tetrahedron
t∈ TH
, which is adjacent to macro-vertices
vi∈ VH
,
1
i
4, macro-edges
ej∈ EH
, 1
j
6 and macro-faces
fk∈ FH
, 1
k
4 we
define the index-set of its ghost-layer as
It
= (
4
i=1Ivi
)
(
6
j=1Iej
)
(
4
k=1Ifk
). All
these sets are illustrated in Figure 1.
Our surrogate ILU algorithm heavily relies on geometric properties associated
with our DoF: Each micro-vertex in a macro-tetrahedron on level
L
can be labeled by
the logical grid coordinates
GL
t
=
{
(
x, y, z
)
Z3
: 0
x, y, z and x
+
y
+
z <
2
L
+ 1
}
.
Similarly, we define the inner grid coordinates by
˚
GL
t
=
{
(
x, y, z
)
Z3
: 1
x, y, z and x
+
y
+
z <
2
L}
. If we restrict the coordinates by setting z to a fixed value, we
obtain a face-layer
GN
f
=
{
(
x, y
)
Z2
: 0
x, y and x
+
y < N}.
For a vector
u|ItIt
on level lrestricted to a tetrahedron t, there is a one-to-one correspondence between
DoF-indices in
It
and inner logical grid coordinates
˚
GL
t
which can be constructed as
follows: Assume that
t
is adjacent to the macro-vertices
vi
at coordinates
epiR3
for
1
i
4. The tetrahedron is spanned by the edges
di
=
epi+1 ep1
at the base point
ep1
for 1
i
3 (see Figure 2left). The point
ep(x,y,z)
= (
d1·x
+
d2·y
+
d3·z
)
/
(2
L
+ 1)
for (
x, y, z
)
GL
t
belongs to a shape function
φkVhl
with
k∈ It∪ It
such that
φk
(
ep(x,y,z)
) = 1. This induces the mapping
ιt
:
It∪ It GL
t
with
ιt
(
k
) = (
x, y, z
).
Thus, vector components
ui
of a vector
u|It∪It
with
ι∈ It∪ It
will also be referred
to by
u(x,y,z)
or
up
for
p
= (
x, y, z
)
GL
t
, when the macro-tetrahedron
t
is evident
from the context.
The mapping between logical grid coordinates and local DoF in
It
also allows us to
specify an ordering of the DoF indices. This is crucial since the properties of the Gauss–
Seidel-Smoother (GS-Smoother) or the ILU-Smoother strongly depend on this order.
For
i, j ∈ It
with logical grid coordinates (
xi, yi, zi
) =
ιt
(
i
) and (
xj, yj, zj
) =
ιt
(
j
)
,
we
fix the ordering by
i<j =(zi< zj)(zi=zjyi< yj)(zi=zjyi=yjxi< xj).
Consequently, the ordering strongly depends on the order of the adjacent macro-vertices
vi
which we used to construct
ιt
. In Section 3, we will use this by permutating
vi
with
a permutation πto obtain good smoothing factors µtfor the new DoF ordering.
We will now introduce the applied stencil notation for our surrogate-ILU-Algorithm
in Section 4. For this, we first define the stencil directions between logical coordinates
as displacement vectors, i.e.
{xy|x, y GL
t}
. The most common directions are
named after the four cardinal directions, as well as the top and bottom directions,
such that the x-axis runs from west to east, the y-axis from south to north, and the
A MATRIX-FREE ILU REALIZATION BASED ON SURROGATES 5
we
nw
se
s
n
bnw
tse
tc
bc
bn
be
tw
ts
(x,y,z)
(x-1, y, z) (x+1, y, z)
(x,y,z+1)
(x,y,z-1) (x+1,y,z-1)
(x+1,y,z-1)
(x-1,y+1,z) (x,y+1,z)
(x+1,y-1,z+1)
(x-1,y,z+1)
(x+1,y-1,z)
(x,y-1,z)
Fig. 2: Left: Direction vectors in a one-to-one correspondence between DoF and coor-
dinates. Right: Stencil directions and grid coordinates inside a structured tetrahedral
grid.
z-axis from top to bottom. For instance, the west direction
w
corresponds to the
displacement (1,0,0). All stencil directions are collected in the set
D={w, s, se, bnw, bn, bc, be, c, e, n, nw, tse, ts, tc, tw}.
Relying on the ordering defined above, the set of all lower stencil directions needed in
our ILU is given by
Dl={w, s, se, bnw, bn, bc, be}.
Consider two indices
i∈ It
and
j∈ It∪ It
with coordinates
pi
=
ιt
(
i
) and
pj
=
ιt
(
j
).
Due to the local support of the low order conforming finite-element shape functions,
we know that if
Aij 6
= 0 there exists a
e
d∈ D
such that
pj
=
pi
+
e
d
. We can define the
stencil (
Api
d
)
d∈D
by
Api
e
d
=
Aij
. The matrix-vector multiplication
v|It
= (
Au
)
|It
on
the macro-tetrahedron tcan therefore be written in terms of stencils as
vp=X
d∈D
Ap
dup+dfor all p˚
GL
t,
where we identified the DoFs with logical coordinates. Stencils and the associated grid
coordinates are depicted in Figure 2(right).
We mainly rely on a geometric multigrid algorithm which combines a so called
smoother with a coarse grid correction step to an optimal solver (see e.g. [23]). We now
introduce smoothers which only act on the DoF of a single primitive. This is motivated
by our hybrid mesh on which only operations between DoF located on the same
primitive are cheap while everything else requires expensive inter-node communication.
In our setting, for a given primitive
p∈ PH
, a smoother acting on the DoF
Ip
located
on the primitive can be described by applying a preconditioner matrix
CpR|Ip|×|Ip|
inside a Richardson iteration with the appropriate restriction operators
uu+RT
pC1
pRp(fAu),
where we denote the current right-hand-side vector by
f
and the current estimate
摘要:

AMATRIX-FREEILUREALIZATIONBASEDONSURROGATESDANIELDRZISGAy,ANDREASWAGNERy,ANDBARBARAWOHLMUTHyAbstract.Matrix-freetechniquesplayanincreasinglyimportantroleinlarge-scalesimulations.Schurcomplementtechniquesandmassivelyparallelmultigridsolversforsecond-orderellipticpartialdi erentialequationscansigni c...

展开>> 收起<<
A matrix-free ILU realization based on surrogates.pdf

共27页,预览5页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:27 页 大小:4.97MB 格式:PDF 时间:2025-04-30

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 27
客服
关注