A portable coding strategy to exploit vectorization on combustion simulations

Fabio Banchelli, Guillermo Oyarzun, Marta Garcia-Gasulla, Filippo Mantovani, Ambrus Both, Guillaume Houzeaux, Daniel Mira
Barcelona Supercomputing Center, Plaza Eusebi Guell, 1-3, 08034 Barcelona (Spain)
Abstract
The complexity of combustion simulations demands the latest high-performance computing tools to reduce their time-to-solution. A current trend in HPC systems is the use of CPUs with SIMD or vector extensions to exploit data parallelism. Our work proposes a strategy to improve the automatic vectorization of finite-element-based scientific codes. The approach applies a parametric configuration to the data structures that helps the compiler detect the blocks of code that can take advantage of vector computation, while keeping the code portable. We study in detail the computational impact of this methodology on the different stages of a CFD solver using the PRECCINSTA burner simulation. Our parametric implementation helps the compiler generate more vector instructions in the assembly operation: this results in a reduction of up to 9.39× in the total number of executed instructions, while the Instructions Per Cycle and the CPU frequency remain constant. The proposed strategy improves the performance of the CFD case under study by up to 4.67× on the MareNostrum 4 supercomputer.
Keywords: vectorization, high performance computing, combustion simulations, performance analysis
Corresponding author
Email address: guillermo.oyarzun@bsc.es (Guillermo Oyarzun)
Preprint submitted to Computers & Fluids October 24, 2022
arXiv:2210.11917v1 [cs.DC] 21 Oct 2022
1. Introduction and related work
The decarbonization of the transportation sector is a field of high strategic importance for our society [1,2]. Implementing new, greener fuels in real combustion systems demands advanced combustion simulations, as their physical and chemical properties are expected to differ significantly from those of conventional transportation fuels [3]. In such complex simulations, the investigation of more accurate and efficient numerical algorithms is key to increasing accuracy and reducing time-to-solution. The difficulty lies in the constant evolution of High-Performance Computing (HPC) systems: scientific software requires periodic updates to exploit new hardware features and run efficiently on those systems.
On modern CPUs, the use of vector or Single Instruction Multiple Data (SIMD) extensions is becoming increasingly relevant. Besides Intel's AVX-512 SIMD extension, the market has recently seen the first CPU implementing the Arm SVE extension (the Fujitsu A64FX, processor of the Fugaku supercomputer, ranked first in the Top500) and the NEC SX-Aurora vector engine, a discrete accelerator built on vector CPUs that operate on registers of up to 256 double-precision elements. On top of these market developments, the RISC-V architecture cannot be ignored: it recently ratified v1.0 of the V (vector) extension, boosting vector computation in the academic world and the open-source community.
The efficient use of vector units within CPUs relies on auto-vectorization by the compiler and often requires adapting or rewriting classical algorithms to exploit their full computing power [4]. Large-scale CFD codes are generally dominated by two operations: the linear solver and the matrix assembly. The former can be considered a black-box component that receives a matrix and a right-hand side as input and returns a solution vector [5]. The solver is composed of algebraic operations that can exploit vectorization through specialized libraries [6]. This strategy allows part of a large scientific code to be ported to vector accelerators in a relatively smooth way [7]. Regarding the matrix assembly, the algorithm for unstructured meshes depends on the discretization method, finite volumes (FV) and finite elements (FE) being the most common strategies. Obtaining gains from vectorization in FV assembly requires changes that have proven impractical in large-scale combustion codes [8,9]. In contrast, FE assembly consists of matrix-like structures that lend themselves to SIMD-friendly functions [10]. Our work is implemented in Alya [11], a large-scale FE-based computational mechanics code that is one of the thirteen codes of the Unified European Applications Benchmark Suite. We propose and analyze a parametric configuration of its data structures that allows the compiler to auto-vectorize the code. We evaluate the proposed implementation on a state-of-the-art supercomputer, MareNostrum 4, powered by Intel Skylake CPUs. We show that Alya takes advantage of the AVX-512 SIMD units present in the Skylake CPUs while keeping the code portable. The strategy is extensible to any other FE-based code.
The main contributions of this paper are: i) we propose a parametric configuration of the data structure for a complex fluid-dynamic code; ii) we measure and explain the impact of the proposed configuration from a computational point of view; iii) we quantify the overall performance gain on a state-of-the-art HPC supercomputer.
The remainder of the paper is structured as follows: Section 2 summarizes the computational fluid-dynamics problem solved with Alya; Section 3 briefly presents the technological context of the study, including details of the hardware and software configurations; Section 4 analyzes the optimizations applied to Alya in terms of execution time, instruction mix and cache effects to quantify the overall performance gain; Section 5 closes the paper with general remarks and conclusions.
2. Application context
2.1. Governing equations
The governing equations describing the reacting flow field in the turbulent premixed flame correspond to the low-Mach number approximation of the Navier-Stokes equations, with the energy equation represented by the total enthalpy. The combustion process is assumed to take place in the flamelet regime, and the flamelet database is based on the tabulation of a laminar premixed flamelet at constant equivalence ratio that uses the chemistry from the San Diego mechanism [12]. A Favre-filtered description of the governing equations is followed to avoid modelling terms involving density fluctuations [13]. The governing equations are given by:
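\begin{equation}
\frac{\partial \bar{\rho}}{\partial t} + \nabla\cdot(\bar{\rho}\,\tilde{\mathbf{u}}) = 0 \tag{1}
\end{equation}
\begin{equation}
\frac{\partial (\bar{\rho}\,\tilde{\mathbf{u}})}{\partial t} + \nabla\cdot(\bar{\rho}\,\tilde{\mathbf{u}}\otimes\tilde{\mathbf{u}}) = -\nabla\bar{p} + \nabla\cdot\left[\bar{\rho}\,(\nu+\nu_t)\left(2S - \frac{2}{3}(\nabla\cdot\tilde{\mathbf{u}})\,I\right)\right] \tag{2}
\end{equation}
\begin{equation}
\frac{\partial (\bar{\rho}\,\tilde{h})}{\partial t} + \nabla\cdot(\bar{\rho}\,\tilde{\mathbf{u}}\,\tilde{h}) = \nabla\cdot\left[\bar{\rho}\left(D + \frac{\nu_t}{Pr_t}\right)\nabla\tilde{h}\right] \tag{3}
\end{equation}
where $\bar{\rho}$, $t$, $\tilde{\mathbf{u}}$, $\bar{p}$, $\nu$, $\tilde{h}$ and $D$ represent the density, time, velocity vector, pressure, kinematic viscosity, total enthalpy and thermal diffusion coefficient, respectively.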
Heating due to viscous forces is neglected in the enthalpy equation, and the unresolved heat flux is modelled using a gradient diffusion approach [14]. The formulation is closed by an appropriate expression for the subgrid-scale or eddy viscosity $\nu_t$, which in this study is given by the closure proposed by Vreman [15] with a model constant $c_s = 0.1$. The viscous stress tensor is defined based on Stokes' assumption, and the turbulence contribution is determined using the Boussinesq approximation [13], in which $S = \frac{1}{2}\left[\nabla\tilde{\mathbf{u}} + (\nabla\tilde{\mathbf{u}})^T\right]$ and $I$ are the strain-rate and identity tensors, respectively. A unity Lewis number assumption has been made to simplify the multicomponent transport in the governing equations. The turbulent Schmidt and Prandtl numbers are both set to a constant value of 0.7.
For the present combustion model, a controlling variable based on a reactive scalar is used to couple the chemical states with the fluid flow. This controlling variable can be understood as a progress variable $Y_c$ that is used to describe the thermochemical state from an unreacted mixture to a fully reacted mixture. For numerical reasons [16], a scaled progress variable $c$ is defined as:
\begin{equation}
c = \frac{Y_c - Y_c^0}{Y_c^{eq} - Y_c^0} \tag{4}
\end{equation}
where $Y_c^0$ and $Y_c^{eq}$ are the values of the progress variable in the unreacted mixture and at equilibrium conditions, respectively. When applying this flamelet combustion model to premixed combustion in LES, the subgrid-scale effects must be addressed. The tabulated properties $\psi$ are integrated with a presumed-shape probability density function (PDF), constructed from the filtered progress variable $\tilde{c}$ and the subgrid variance $\widetilde{c''^2} = \widetilde{cc} - \tilde{c}\,\tilde{c}$ using a $\beta$-function [16]. A closure for the subgrid-scale variance is provided by solving a transport equation for $\widetilde{c''^2}$, following Domingo et al. (2005) [16].
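The chemical state of the perfectly premixed flame in the LES framework is ultimately described by the two controlling variables $\tilde{c}$ and $\widetilde{c''^2}$, so the governing equations describing the chemical evolution of the flame are given by:
\begin{equation}
\frac{\partial (\bar{\rho}\,\tilde{c})}{\partial t} + \nabla\cdot(\bar{\rho}\,\tilde{\mathbf{u}}\,\tilde{c}) = \nabla\cdot\left[\bar{\rho}\left(D + \frac{\nu_t}{Sc_t}\right)\nabla\tilde{c}\right] + \tilde{\dot{\omega}}_c \tag{5}
\end{equation}
\begin{align}
\frac{\partial (\bar{\rho}\,\widetilde{c''^2})}{\partial t} + \nabla\cdot(\bar{\rho}\,\tilde{\mathbf{u}}\,\widetilde{c''^2}) ={}& \nabla\cdot\left[\bar{\rho}\left(\tilde{D} + \frac{\nu_t}{Sc_t}\right)\nabla\widetilde{c''^2}\right] \tag{6}\\
&+ 2\bar{\rho}\,\tilde{D}\,|\nabla\tilde{c}|^2 \tag{7}\\
&+ 2\left(\widetilde{c\,\dot{\omega}_c} - \tilde{c}\,\tilde{\dot{\omega}}_c\right) \tag{8}\\
&- \bar{\rho}\,\tilde{\chi}_c \tag{9}
\end{align}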
where $\tilde{\chi}_c$ represents the scalar dissipation rate and $\tilde{\dot{\omega}}_c$ is the filtered source term of the progress variable. The scalar dissipation rate is composed of resolved and unresolved parts, which are given by:
\begin{equation}
\tilde{\chi}_c = 2\tilde{D}\,|\nabla\tilde{c}|^2 + \chi_c^{sgs} = 2\tilde{D}\,|\nabla\tilde{c}|^2 + \frac{C_d}{\tau_t}\,\widetilde{c''^2}
\end{equation}
where $\tau_t$ is a turbulent time scale, obtained following Ventosa et al. [17].