MAgNet: Mesh Agnostic Neural PDE Solver
Oussama Boussif
Mila - Québec AI Institute
DIRO, Université de Montréal
oussama.boussif@mila.quebec
Dan Assouline
Mila - Québec AI Institute
DIRO, Université de Montréal
dan.assouline@mila.quebec
Loubna Benabbou
Université du Québec à Rimouski
Loubna_Benabbou@uqar.ca
Yoshua Bengio
Mila - Québec AI Institute
DIRO, Université de Montréal
yoshua.bengio@mila.quebec
Abstract
The computational complexity of classical numerical methods for solving Partial
Differential Equations (PDE) scales significantly as the resolution increases. As
an important example, climate predictions require fine spatio-temporal resolutions
to resolve all turbulent scales in the fluid simulations. This makes the task of
accurately resolving these scales computationally out of reach even with modern
supercomputers. As a result, current numerical modelers solve PDEs on grids
that are too coarse (3km to 200km on each side), which hinders the accuracy and
usefulness of the predictions. In this paper, we leverage the recent advances in
Implicit Neural Representations (INR) to design a novel architecture that predicts
the spatially continuous solution of a PDE given a spatial position query. By
augmenting coordinate-based architectures with Graph Neural Networks (GNN),
we enable zero-shot generalization to new non-uniform meshes and long-term
predictions up to 250 frames ahead that are physically consistent. Our Mesh
Agnostic Neural PDE Solver (MAgNet) is able to make accurate predictions across
a variety of PDE simulation datasets and compares favorably with existing baselines.
Moreover, MAgNet generalizes well to different meshes and to resolutions up to four times those it was trained on.²
1 Introduction
Partial Differential Equations (PDEs) describe the continuous evolution of multiple variables, e.g.
over time and/or space. They arise everywhere in physics, from quantum mechanics to heat transfer
and have several engineering applications in fluid and solid mechanics. However, most PDEs can’t
be solved analytically, so it is necessary to resort to numerical methods. Since the introduction of
computers, many numerical approximations were implemented, and new fields emerged such as
Computational Fluid Mechanics (CFD) (Richardson and Lynch, 2007). The most famous numerical
approximation scheme is the Finite Element Method (FEM) (Courant, 1943; Hrennikoff, 1941). In the
FEM, the PDE is discretized along with its domain, and the problem is transformed into solving a set
of matrix equations. However, the computational complexity scales significantly with the resolution.
For climate predictions, the required mesh resolution can be quite high if the desired error is to be reached, which renders classical solvers impractical.
CIFAR Senior Fellow
²Code and dataset can be found at: https://github.com/jaggbow/magnet
36th Conference on Neural Information Processing Systems (NeurIPS 2022).
arXiv:2210.05495v1 [cs.LG] 11 Oct 2022
In this paper, we propose to learn the continuous solutions for spatio-temporal PDEs. Previous
methods focused on either generating fixed resolution predictions or generating arbitrary resolution
solutions on a fixed grid (Li et al., 2021; Wang et al., 2020). PDE models based on Multi-Layer
Perceptrons (MLPs) can generate solutions at any point of the domain (Dissanayake and Phan-Thien,
1994; Lagaris et al., 1998; Raissi et al., 2017a). However, without imposing a physics-motivated loss
that constrains the predictions to follow the smoothness bias resulting from the PDE, MLPs become
less competitive than CNN-based approaches especially when the PDE solutions have high-frequency
information (Rahaman et al., 2018).
We leverage the recent advances in Implicit Neural Representations (Tancik et al., 2020; Chen et al., 2020; Jiang et al., 2020) and propose a general-purpose model that can not only learn solutions to a
PDE with a resolution it was trained on, but it can also perform zero-shot super-resolution on irregular
meshes. The added advantage is that we propose a general framework where we can make predictions
given any spatial position query for both grid-based architectures like CNNs and graph-based ones
able to handle sensors and predictions at arbitrary spatial positions.
Contributions
Our main contributions are in the context of machine learning for approximately
but efficiently solving PDEs and can be summarized as follows:
- We propose a framework that enables grid-based and graph-based architectures to generate continuous-space PDE solutions given a spatial query at any position.
- We show experimentally that this approach can generalize to resolutions up to four times those seen during training in zero-shot super-resolution tasks.
2 Related Works
Current solvers can require a lot of computations to generate solutions on a fine spatio-temporal
grid. For example, climate predictions typically use General Circulation Models (GCM) to make
forecasts that span several decades over the whole planet (Phillips, 1956). These GCMs use PDEs to
model the climate in the atmosphere-ocean-land system and to solve these PDEs, classical numerical
solvers are used. However, the quality of predictions is bottlenecked by the grid resolution that is in
turn constrained by the available amount of computing power. Deep learning has recently emerged
as an alternative to these classical solvers in hopes of generating data-driven predictions faster and
making approximations that do not just rely on lower resolution grids but also on the statistical
regularities that underlie the family of PDEs being considered. Using deep learning also makes it
possible to combine the information in actual sensor data with the physical assumptions embedded in
the classical PDEs. All of this would enable practitioners to increase the actual resolution further for
the same computational budget, which in turn improves the quality of the predictions.
Machine Learning for PDE solving:
Dissanayake and Phan-Thien (1994) published one of the
first papers on PDE solving using neural networks. They parameterized the solutions to the Poisson
and heat transfer equations using an MLP and studied the evolution of the error with the mesh size.
Lagaris et al. (1998) used MLPs for solving PDEs and ordinary differential equations. They wrote
the solution as a sum of two components where the first term satisfies boundary conditions and is
not learnable, and the second is parameterized with an MLP and trained to satisfy the equations. In
Raissi et al. (2017a) the authors also parameterized the solution to a PDE using an MLP that takes
coordinates as input. With the help of automatic differentiation, they calculate the PDE residual and
use its MSE loss along with an MSE loss on the boundary conditions. In follow-up work, Raissi et al.
(2017b) also learn the parameters of the PDE (e.g. Reynolds number for Navier-Stokes equations).
The recently introduced Neural Operators framework (Kovachki et al., 2021; Li et al., 2020b,a)
attempts to learn operators between spaces of functions. Li et al. (2021) use "Fourier Layers" to
learn the solution to a PDE by framing the problem as learning an operator from the space of initial
conditions to the space of the PDE solutions. Their model can learn the solution to PDEs that lie on a
uniform grid while maintaining their performance in the zero-shot super-resolution setting. In the
same spirit, Jiang et al. (2020) developed a model based on Implicit Neural Representations called
"MeshFreeFlowNet" where they upsample existing PDE solutions to a higher resolution. They use
3D low-resolution space-time tensors as inputs to a 3DUnet in order to generate a feature map. Next,
some points are sampled uniformly from the corresponding high-resolution tensors and fed to an
MLP called ImNet (Chen and Zhang, 2018). They train their model using a PDE residual loss and
[Figure 1: architecture diagram showing the encoder, nearest-neighbors interpolation, decoder MLP, and forecasting (GNN) modules operating on the parent mesh, parent mesh embedding, and spatial queries.]

Figure 1: We illustrate the "Encode-Interpolate-Forecast" framework of MAgNet. The parent mesh is fed to the encoder to generate the parent mesh embedding. Next, we estimate the values at the spatial queries using the interpolation module, which uses features from both the parent mesh points and the parent mesh embedding points closest to these queries. Finally, the parent mesh observations and the interpolated values at the spatial queries are gathered as nodes forming a new graph using nearest neighbors, and the PDE solution is forecast for all nodes (therefore all spatial locations) into the future using the forecasting module.
are able to predict the flow field at any spatio-temporal coordinate. Their approach is closest to the
one we propose here. The main difference is that we perform super-resolution on the spatial queries
and forecast the solution to a PDE instead of only doing super-resolution on the existing sequence.
Brandstetter et al. (2022) use the message-passing paradigm (Gilmer et al., 2017; Watters et al., 2017; Sanchez-Gonzalez et al., 2020) to solve 1D PDEs. They are able to beat state-of-the-art
Fourier Neural Operators (Li et al., 2021) and classical WENO5 solvers while introducing the
"pushforward trick" that allows them to generate better long-term rollouts. Moreover, they present an
added advantage over existing methods since they can learn PDE solutions on any mesh. However,
they are not able to generalize to different resolutions, which is a crucial capability of our method.
Most machine learning approaches require data from a simulator in order to learn the required PDE
solutions and that can be expensive depending on the PDE and the resolution. Wandel et al. (2020)
alleviate that requirement by using a PDE loss.
Machine Learning for Turbulence Modeling:
Recent years have seen a surge in machine learning-based models for turbulence. Since it is expensive to resolve all relevant scales,
some methods were developed that only solve large scales explicitly and separately model sub-grid
scales (SGS). Recently, Novati et al. (2021) used multi-agent reinforcement learning to learn the
dissipation coefficient of the Smagorinsky SGS model (Smagorinsky, 1963) using as reward the
recovery of the statistical properties of Direct Numerical Simulations (DNS). Rasp et al. (2018) used
MLPs to represent sub-grid processes in clouds and replace previous parametrization models in a
global general circulation model. In the same fashion, Park and Choi (2021) used MLPs to learn
DNS sub-grid scale (SGS) stresses using as input filtered flow variables in a turbulent channel flow.
Brenowitz and Bretherton (2018) use MLPs to predict the apparent sources of heat and moisture using coarse-grained data, and use a multi-step loss to optimize their model. Wang et al. (2020) used
one-layer CNNs to learn the spatial filter in LES methods and the temporal filter in RANS as well
as the turbulent terms. A UNet (Ronneberger et al., 2015) is then used as a decoder to get the flow
velocity. de Bezenac et al. (2017) predict future frames by deforming the input sequence according to
the advection-diffusion equation and apply it to Sea-Surface Temperature forecasting.
Stachenfeld et al. (2021) use the "encode-process-decode" (Sanchez-Gonzalez et al., 2018, 2020)
paradigm along with dilated convolutional networks to capture turbulent dynamics seen in high-
resolution solutions only by training on low spatial and temporal resolutions. Their approach beats
existing neural PDE solvers in addition to the state-of-the-art Athena++ engine (Stone et al., 2020).
We take inspiration from this approach but replace the process module with an interpolation module, allowing the model to capture spatial correlations between known points and new query points.
3 Methodology
We present the developed framework, which leverages recent advances in Implicit Neural Representations (INR) (Jiang et al., 2020; Sitzmann et al., 2020; Chen et al., 2020; Tancik et al., 2020) and draws inspiration from mesh-free methods for PDE solving. We first give a mathematical definition of a PDE. Next, we present the proposed "MAgNet" and derive two variants: a grid-based architecture and a graph-based one.
3.1 Preliminaries
We define a PDE as follows, using $D^k$ to denote $k$-th order derivatives:

Definition 3.1 (Evans (2010)). Let $U$ denote an open subset of $\mathbb{R}^n$ and $k \geq 1$ an integer. An expression of the form
$$L(D^k u(x), D^{k-1} u(x), \dots, u(x), x) = 0, \quad x \in U \tag{1}$$
is called a $k$-th order system of PDEs, where
$$L : \mathbb{R}^{m n^k} \times \mathbb{R}^{m n^{k-1}} \times \cdots \times \mathbb{R}^{mn} \times \mathbb{R}^m \times U \to \mathbb{R}^m$$
is given and $u : U \to \mathbb{R}^m$, $u = (u^1, \dots, u^m)$, is the unknown function to be characterized.
In this paper, we are interested in spatio-temporal PDEs. In this class of PDEs, the domain is $U = [0, +\infty) \times \mathcal{S}$ (time $\times$ space) where $\mathcal{S} \subset \mathbb{R}^n$, $n \geq 1$, and, with $D^k$ indicating differentiation w.r.t. $x$, any such PDE can be formulated as:
$$\begin{cases} \dfrac{\partial u}{\partial t} = L(D^k u(x), \dots, u(x), x, t), & t \geq 0,\ x \in \mathcal{S} \\ u(0, x) = g(x), & x \in \mathcal{S} \\ B u = 0, & t \geq 0,\ x \in \partial\mathcal{S} \end{cases} \tag{2}$$
where $\partial\mathcal{S}$ is the boundary of $\mathcal{S}$, $B$ is a non-linear operator enforcing boundary conditions on $u$, and $g : \mathcal{S} \to \mathbb{R}^m$ represents the initial condition constraints for the solution $u$.
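As a concrete illustrative instance of formulation (2) (our example, not one from the paper), the 1D heat equation with Dirichlet boundary conditions fits this template with $L$ reduced to a diffusion term:

```latex
% 1D heat equation on S = [0, 1], written as an instance of Eq. (2):
\frac{\partial u}{\partial t} = \nu \, \frac{\partial^2 u}{\partial x^2},
    \qquad t \geq 0,\; x \in [0, 1] \\
u(0, x) = g(x) = \sin(\pi x), \qquad x \in [0, 1] \\
B u = u\big|_{x \in \{0, 1\}} = 0
    \qquad \text{(Dirichlet boundary operator } B\text{)}
```

Here $\nu > 0$ is the diffusion coefficient and the initial condition $g$ is chosen arbitrarily for illustration.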
Numerical PDE simulation has seen a great deal of innovation, especially in industrial applications and research where its use is paramount. Mesh-based methods like the FEM numerically compute the PDE solution on a predefined mesh. However, when regions of the PDE domain present large discontinuities, the mesh needs to be modified and given many more points around those regions in order to obtain acceptable approximations. Mesh-based methods typically solve this problem by re-meshing, in what is called Adaptive Mesh Refinement (Berger and Oliger, 1984; Berger and Colella, 1989). However, this process can be quite expensive, which is why mesh-free methods have become an attractive option that sidesteps these limitations.
3.2 MAgNet: Mesh-Agnostic Neural PDE Solver
3.2.1 "Encode-Interpolate-Forecast" framework
Let $\{x_1, x_2, \dots, x_T\} \in \mathbb{R}^{C \times N}$ denote a sequence of $T$ frames representing the ground-truth data coming from a PDE simulator or real-world observations. $C$ denotes the number of physical channels, that is, the number of physical variables involved in the PDE, and $N$ is the number of points in the mesh. These frames are defined on the same mesh; that is, the mesh does not change in time. We call that mesh the parent mesh and denote its normalized coordinates of dimensionality $n$ by $\{p_i\}_{1 \leq i \leq N} \in [-1, 1]^n$. Let $\{c_i\}_{1 \leq i \leq M} \in [-1, 1]^n$ denote a set of $M$ coordinates representing the spatial queries. The task is to predict the solution for subsequent time steps both at: (i) all coordinates of the parent mesh $\{p_i\}_{1 \leq i \leq N}$, and (ii) the coordinates of the spatial queries $\{c_i\}_{1 \leq i \leq M}$. At test time, the model can be queried at any spatially continuous coordinate within the PDE domain to provide an estimate of the PDE solution there.

To perform the prediction, we first estimate the PDE solutions at the spatial queries for the first $T$ frames and then use those estimates to forecast the PDE solutions at subsequent timesteps at the query locations. We do this in three stages (see Figure 1):
1. Encoding: The encoder takes as input the given PDE solution $\{x_t\}_{1 \leq t \leq T}$ at each point of the parent mesh $\{p_i\}_{1 \leq i \leq N}$ and generates a state representation of the original frames, which we refer to as embeddings and denote $\{z_t\}_{1 \leq t \leq T}$. This representation will be used in the interpolation step to find the PDE solution at the spatial queries $\{c_i\}_{1 \leq i \leq M}$. Note that in this encoding step we can either generate one embedding for each frame, giving $T$ embeddings, or summarize all the information in the $T$ frames into a single embedding. We explain the methodology using $T$ embeddings, as the time dimension is easier to grasp in this formulation, but the implementation uses a single summarized embedding, as mentioned in Section 3.2.2. We also note that the embedded mesh remains the same, i.e. we do not change it by upsampling or downsampling.
2. Interpolation: We follow the same approach as Jiang et al. (2020) and Chen et al. (2020) by performing the interpolation in feature space. Note that if we generate one representation that summarizes all $T$ frames, then $z_t = z$ for $t = 1, \dots, T$. Let $\{t_k\}_{1 \leq k \leq T}$ denote the timesteps at which the $x_t$ are generated. For each spatial query $c_i$, let $\mathcal{N}(c_i)$ denote the nearest points $p_j$ in the parent mesh. We generate an interpolation of the features $z_k[c_i]$ at coordinates $c_i$ and timestep $t_k$ as follows:
$$\forall k \in \{1, \dots, T\},\ \forall i \in \{1, \dots, M\}: \quad z_k[c_i] = \frac{\sum_{p_j \in \mathcal{N}(c_i)} w_j \, g_\theta(x_k[p_j], z_k[p_j], c_i - p_j, t_k)}{\sum_{p_j \in \mathcal{N}(c_i)} w_j} \tag{3}$$
where $z_k[p_j]$ and $x_k[p_j]$ denote the embedding and the input frame at position $p_j$ and time $t_k$, respectively. The interpolation weights $w_j$ are positive and sum to one, chosen such that points closer to the spatial query contribute more to the interpolated feature than points farther away, and $g_\theta$ is an MLP. To get the PDE solution $x_k[c_i]$ at coordinate $c_i$, we use a decoder $d_\theta$, which is also an MLP here: $x_k[c_i] = d_\theta(z_k[c_i])$. In practice, we choose $2^n$ neighbors, where $n$ is the dimensionality of the coordinates.
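The feature interpolation of Eq. (3) can be sketched as follows. This is a minimal NumPy sketch under our own naming, not the authors' implementation: `mlp_g` stands in for the learned network $g_\theta$, and we use inverse-distance weights, one common choice satisfying the positivity and normalization constraints stated above.

```python
import numpy as np

def interpolate_query(query, mesh_coords, x_frame, z_frame, t_k, mlp_g, n_neighbors):
    """Estimate the feature z_k[c_i] at one spatial query (sketch of Eq. 3).

    query:       (n,)   coordinates c_i of the spatial query
    mesh_coords: (N, n) parent-mesh coordinates p_j
    x_frame:     (N, C) PDE solution x_k on the parent mesh
    z_frame:     (N, D) encoder embedding z_k on the parent mesh
    mlp_g:       callable standing in for the learned MLP g_theta
    """
    # Nearest neighbors N(c_i) in the parent mesh (2^n of them in the paper).
    dists = np.linalg.norm(mesh_coords - query, axis=1)
    nbrs = np.argsort(dists)[:n_neighbors]

    # Inverse-distance weights: positive; normalized by the denominator below.
    # (The specific weighting scheme is our assumption.)
    w = 1.0 / (dists[nbrs] + 1e-8)

    # g_theta consumes the solution, the embedding, the offset c_i - p_j, and t_k.
    feats = np.stack([
        mlp_g(x_frame[j], z_frame[j], query - mesh_coords[j], t_k) for j in nbrs
    ])
    return (w[:, None] * feats).sum(axis=0) / w.sum()
```

With a dummy `mlp_g` that simply returns the embedding, the query feature reduces to an inverse-distance average of the neighboring embeddings, which makes the normalization in Eq. (3) easy to check.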
3. Forecasting: Now that we have generated the PDE solution at the spatial queries $c_i$ for all the past frames, we forecast the PDE solution at future time points at both the spatial queries and the parent mesh coordinates. Let $G$ denote the Nearest-Neighbors Graph (NNG) whose nodes are all $N$ locations in the parent mesh (at original coordinates $\{p_i\}_{1 \leq i \leq N}$) as well as all $M$ query points (at locations $\{c_i\}_{1 \leq i \leq M}$), with edges that include only the nearest neighbors of each node among the $N + M - 1$ others. This corresponds to a new mesh represented by the graph $G$. Let $\{c'_i\}_{1 \leq i \leq M+N}$ denote the corresponding new coordinates. We generate the PDE solution for subsequent time steps on this graph auto-regressively using a decoder $\Delta_\theta$ as follows:
$$x_{k+1}[c'_i] = x_k[c'_i] + (t_{k+1} - t_k) \, \Delta_\theta(x_k[c'_i], \dots, x_1[c'_i]), \quad k = T, T+1, \dots \tag{4}$$
We train MAgNet using two losses:

- Interpolation loss: ensures that the interpolated points match the ground truth:
$$\mathcal{L}_{\text{interpolation}} = \frac{1}{T \times M} \sum_{i=1}^{M} \sum_{k=1}^{T} \|\hat{x}_k[c_i] - x_k[c_i]\|_1 \tag{5}$$
where $\hat{x}_k[c_i]$ denotes the interpolated values generated by the model at the spatial queries.

- Forecasting loss: ensures that the model's predictions into the future are accurate. If $H$ is the prediction horizon, the loss is:
$$\mathcal{L}_{\text{forecasting}} = \frac{1}{H \times (M+N)} \sum_{i=1}^{M+N} \sum_{k=1}^{H} \|\hat{x}_{k+T}[c'_i] - x_{k+T}[c'_i]\|_1 \tag{6}$$
where $\hat{x}_{k+T}[c'_i]$ denotes the forecast values generated by the model on the graph $G$, which combines the spatial queries and the parent mesh.

The final loss is then: $\mathcal{L} = \mathcal{L}_{\text{forecasting}} + \mathcal{L}_{\text{interpolation}}$.
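The two L1 losses of Eqs. (5) and (6) can be sketched as follows, assuming predictions and targets are stacked into dense arrays (variable names and the per-channel reduction are our assumptions):

```python
import numpy as np

def magnet_loss(interp_pred, interp_true, fc_pred, fc_true):
    """Combined loss L = L_forecasting + L_interpolation (sketch of Eqs. 5-6).

    interp_pred / interp_true: (T, M, C)     interpolated vs. true query values
    fc_pred / fc_true:         (H, M + N, C) forecast vs. true values on graph G
    """
    # ||.||_1 over the C channels, then average over the T x M (resp.
    # H x (M + N)) time-point pairs, matching the normalizations in the paper.
    l_interp = np.abs(interp_pred - interp_true).sum(axis=-1).mean()
    l_forecast = np.abs(fc_pred - fc_true).sum(axis=-1).mean()
    return l_forecast + l_interp
```

Summing the two terms with equal weight mirrors the final loss above; in practice one could also weight them, but the paper states a plain sum.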
3.2.2 Implementation Details
In the previous section, we described the general MAgNet framework. In this section, we present how
we build the inputs to MAgNet as well as the architectural choices for the encoding, interpolation and
forecasting modules and suggest two main architectures: MAgNet[CNN] and MAgNet[GNN].