
We point out that the need to solve a sequence of problems differs from solving
a single problem in several significant ways. First, solver set-up costs are typically
amortized over thousands of right-hand sides and are therefore largely irrelevant to our
cost concerns. Second, the solution is typically devoid of significant low wave-number
content because we solve only for a perturbed solution, $\delta u^m := u^m - \bar{u}$, where $\bar{u}$ is an
initial guess. If we take $\bar{u} = u^{m-1}$ then the initial residual $r_0 = b - A u^{m-1} = O(\Delta t)$.
This result is improved to $O(\Delta t^l)$ by projecting $u^m$ onto the space of prior solutions,
$\{u^{m-1}, \dots, u^{m-l}\}$ [8,16]. Finally, with an initially small residual, GMRES is likely to
converge in just a few iterations, which obviates the need for restarts and mitigates
the $O(k^2)$ complexity terms in a $k$-iteration GMRES solve. This latter observation
puts less pressure on requiring a symmetric preconditioner since one can retain the
full benefits of using Krylov subspace projection (KSP) without resorting to conjugate
gradient iteration. With these circumstances in mind, we will drop the superscript $m$
in the sequel.
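To illustrate the projection step, a minimal NumPy sketch is given below. It assumes $A$ is symmetric positive definite (as for the pressure Poisson problem), uses a simple Gram-Schmidt process in the $A$-inner product, and omits the stabilization details of [8,16]; the function and variable names are illustrative only.

def projected_initial_guess(A, b, prior_solutions, tol=1e-14):
    import numpy as np
    # Build an A-orthonormal basis of span{u^{m-1}, ..., u^{m-l}}
    # via Gram-Schmidt in the A-inner product (A assumed SPD).
    basis = []
    for u in prior_solutions:
        w = u.astype(float).copy()
        for v in basis:
            w -= (v @ (A @ w)) * v
        nrm = np.sqrt(w @ (A @ w))
        if nrm > tol:
            basis.append(w / nrm)
    # Since (u, v)_A = (b, v) whenever A u = b, the projection
    # coefficients are computable from the new right-hand side alone.
    u_bar = np.zeros_like(b, dtype=float)
    for v in basis:
        u_bar += (v @ b) * v
    return u_bar

With $\bar{u}$ in hand, one solves $A\,\delta u = b - A\bar{u}$ and sets $u^m = \bar{u} + \delta u$.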
We note that Chebyshev smoothers have received considerable attention recently. Kronbichler
and co-workers [20,12,11] have employed Chebyshev smoothing for discontinuous
Galerkin discretizations of the NS equations. Rudi and coworkers employ
algebraic multigrid (AMG) with Chebyshev smoothing [39]. Similarly, Chebyshev
smoothing is considered by Sundar and coworkers as a multigrid smoother for high-
order continuous finite element discretizations [43].
A major difference here is that we consider Chebyshev in conjunction with addi-
tive Schwarz methods (ASM) [45,13,24,35] and restrictive additive Schwarz (RAS)
[5] in place of point-Jacobi smoothing. The principal idea is to use ASM or RAS to
eliminate high wave number content. In the case of the spectral element method, local
Schwarz solves can be effected at a cost that is comparable to forward operator eval-
uation through the use of fast diagonalization [25,15,24]. Another critical aspect of
the current context is that many of our applications are targeting exascale platforms
and beyond, where computation is performed on tens of thousands of GPUs and the
relative cost of global communication, and hence of coarse-grid solves, is high [31]. In
such cases, it often pays to have high-quality, broad-bandwidth smoothing, such
as provided by Chebyshev, in order to reduce the number of visits to the bottom of
the V-cycle where the expensive coarse-grid solve is invoked.
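For context, the fast-diagonalization construction behind these local solves can be summarized as follows; the statement below is the standard two-dimensional form in generic notation, not necessarily that of [25,15,24]. If the local operator is separable,
\[
A_e \;=\; B_y \otimes A_x \,+\, A_y \otimes B_x,
\]
with $A_*$ and $B_*$ the one-dimensional stiffness and mass matrices, and if $S_*$ solves the generalized eigenproblem $A_* S_* = B_* S_* \Lambda_*$ with $S_*^T B_* S_* = I$, then
\[
A_e^{-1} \;=\; (S_y \otimes S_x)\,\bigl(I \otimes \Lambda_x + \Lambda_y \otimes I\bigr)^{-1}\,(S_y^T \otimes S_x^T),
\]
so each local Schwarz solve reduces to a few small tensor contractions and a diagonal scaling, at a cost comparable to that of a forward operator evaluation.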
Here, we explore a seemingly simple question: Given $2k$ smoothing iterations,
what is the optimal choice of $m$ pre-smoothing and $n$ post-smoothing applications,
where $m + n = 2k$? More specifically, in the Chebyshev context, the question is
what order $m$ pre-smoothing and order $n$ post-smoothing should be used at the same
cost per iteration. A further important question is, What kind
of Chebyshev smoothing should be used? One could use standard 1st-kind Chebyshev
polynomials with tuned parameters. (Recall, we can afford significant tuning over-
head.) Or, one could use standard or optimized 4th-kind Chebyshev polynomials that
were proposed in recent work by Lottes. (See [23] and references therein.) We explore
these questions under several different conditions: using finite difference and spectral
element discretizations, and using Jacobi, ASM, or RAS as the basic smoother.
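To fix ideas, the following sketch shows a standard 1st-kind Chebyshev smoother built on top of an underlying smoother $M^{-1}$ (Jacobi, ASM, or RAS). The recurrence is the common three-term form; the names and the eigenvalue bounds lam_min and lam_max (for $M^{-1}A$) are illustrative assumptions, and the 4th-kind and optimized 4th-kind variants of [23] differ in the recurrence coefficients rather than in this overall structure.

def chebyshev_smoother(A, M_inv, b, x, k, lam_min, lam_max):
    # Degree-k 1st-kind Chebyshev smoothing of A x = b.  A and M_inv are
    # callables applying the operator and the underlying smoother
    # (e.g., Jacobi, ASM, or RAS) to a vector.  lam_min, lam_max bound the
    # eigenvalues of M_inv(A(.)) targeted by the polynomial (tunable).
    theta = 0.5 * (lam_max + lam_min)   # center of the target interval
    delta = 0.5 * (lam_max - lam_min)   # half-width of the target interval
    sigma = theta / delta
    rho = 1.0 / sigma

    r = b - A(x)
    d = M_inv(r) / theta
    for _ in range(k - 1):
        x = x + d
        r = r - A(d)
        rho_new = 1.0 / (2.0 * sigma - rho)
        d = rho_new * rho * d + (2.0 * rho_new / delta) * M_inv(r)
        rho = rho_new
    return x + d

In practice the lower bound is often taken as a fraction of the estimated $\lambda_{\max}$ (e.g., an interval $[\lambda_{\max}/\alpha, \lambda_{\max}]$ for some $\alpha > 1$), so that the polynomial concentrates its damping on the upper part of the spectrum; the specific choice is one of the tunable parameters alluded to above.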
The structure of this paper is as follows. Section 2 outlines the multigrid V-cycle
and Chebyshev smoothers. 2D finite difference Poisson results on varying aspect ra-
tio grids, along with a comparison between theoretical and observed multigrid error
contraction rates, are presented in section 3. Spectral element (SE)-based pressure
Poisson preconditioning schemes implemented in the scalable open-source CFD code,
nekRS [14], are presented in section 4. nekRS started as a fork of libParanumal [7]
and uses highly optimized kernels based on the Open Concurrent Compute Abstrac-