[12], have emerged as a way to discover scalable multilevel algorithms and operator-adapted wavelets for multiscale PDEs. Low-rank decomposition-based methods are another popular approach to exploiting the low-dimensional nature of MsPDEs. Notable examples include the fast multipole method [13], hierarchical matrices ($\mathcal{H}$ and $\mathcal{H}^2$ matrices) [14], and the hierarchical interpolative factorization [15]. These methods can achieve (near-)linear scaling and high computational efficiency by exploiting the low-rank approximation of the (elliptic) Green’s function [16].
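As a rough illustration of the underlying principle (stated informally here), for two well-separated subdomains $X$ and $Y$, the elliptic Green’s function admits a separable, low-rank approximation
\[
G(x, y) \;\approx\; \sum_{k=1}^{r} u_k(x)\, w_k(y), \qquad x \in X,\; y \in Y,
\]
with a rank $r$ that grows only polylogarithmically in the target accuracy; hierarchical methods exploit this blockwise low-rankness to compress the off-diagonal blocks of the discretized operator, which is what yields the (near-)linear complexity.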

Neural operators, unlike traditional solvers that operate with fixed parameters, are capable of handling a range of input parameters, making them promising for data-driven forward and inverse solving of PDE problems. Pioneering work in operator learning includes [17, 18, 19, 20]. Nevertheless, these methods are limited to problems with fixed discretization sizes. Recently, infinite-dimensional operator learning has been studied, which learns the solution operator (mapping) between infinite-dimensional Banach spaces for PDEs. Most notably, the Deep Operator Network (DeepONet) [21] was proposed as a pioneering model that leverages the universal approximation of operators by deep neural networks [22]. Taking advantage of the Fast Fourier Transform (FFT), the Fourier Neural Operator (FNO) [23] constructs a learnable parametrized kernel in the frequency domain to render the convolutions in the solution operator more efficient. Other developments include the multiwavelet extension of FNO [24], message-passing neural operators [25], dimension reduction in the latent space [26], Gaussian processes [27], Clifford algebra-inspired neural layers [28], and dilated convolutional residual networks [29].
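For concreteness, a single Fourier layer of FNO updates a latent function $v_t$ roughly as follows (notation ours; see [23] for the precise formulation):
\[
v_{t+1}(x) \;=\; \sigma\Big( W v_t(x) + \mathcal{F}^{-1}\big( R_\phi \cdot \mathcal{F} v_t \big)(x) \Big),
\]
where $\mathcal{F}$ denotes the (fast) Fourier transform, $R_\phi$ is a learnable complex-valued multiplier kept only on a truncated set of low-frequency modes, $W$ is a pointwise linear map, and $\sigma$ is a nonlinearity; the FFT reduces the kernel integral (a convolution) to a pointwise multiplication in frequency space.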

Attention-based neural architectures, popularized by the Transformer deep neural network [30], have emerged as universal backbones in deep learning. These architectures serve as the foundation for numerous state-of-the-art models, including GPT [31], the Vision Transformer (ViT) [32], and diffusion models [33, 34]. More recently, Transformers have been studied and have become increasingly popular in PDE operator learning, e.g., in [35, 36, 37, 38, 39, 40, 41] and many others. Attention architectures offer several advantages. Attention can be viewed as a parametrized, instance-dependent kernel integral that learns a “basis” [35] similar to those in numerical homogenization; see also the exposition in the neural operator framework [42]. This layerwise latent update resembles the learned “basis” in DeepONet [39] or a frame [43]. Attention is also flexible enough to encode non-uniform geometries in the latent space [44]. In [45, 46], advanced Transformer architectures (ViT) and diffusion models are combined with the neural operator framework. In [47], Transformers are combined with reduced-order modeling to accelerate fluid simulations of turbulent flows. In [48], tensor decomposition techniques are employed to enhance the efficiency of attention mechanisms for high-dimensional PDE problems.
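As a brief illustration of the kernel-integral viewpoint (our paraphrase of the interpretation in [35, 42]), a single attention-based latent update on a discretization $\{x_i\}$ can be read as
\[
z_{t+1}(x_i) \;=\; \sum_{j} \frac{\exp\!\big(q(z_t(x_i)) \cdot k(z_t(x_j))/\sqrt{d}\big)}{\sum_{l} \exp\!\big(q(z_t(x_i)) \cdot k(z_t(x_l))/\sqrt{d}\big)} \, v(z_t(x_j)) \;\approx\; \int_{\Omega} \kappa_\theta(x_i, y)\, v(z_t(y))\, \mathrm{d}y,
\]
where the softmax-normalized query-key products play the role of a learned, data-dependent kernel $\kappa_\theta$ and the value map $v$ plays the role of the integrand; unlike a fixed Green’s function, this kernel adapts to each input instance.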

Among these data-driven operator learning models, the numerical results can, under certain circumstances, surpass classical numerical methods in efficiency or even in accuracy. For instance, full waveform inversion is considered in [49] with a fusion of FNO and DeepONet (Fourier-DeepONet); DNNs inspired by direct methods, applied to boundary-value Calderón problems, achieve much more accurate reconstructions with the help of data [50, 51, 52]; in [53], the capacity of FNO to take significantly larger time steps for spatiotemporal PDEs is exploited to infer wave-packet scattering in quantum physics, achieving results that are orders of magnitude more efficient than a traditional implicit Euler marching scheme. [54] exploits the capacity of graph neural networks to accelerate particle-based simulations. [55] investigates the integration of the neural operator DeepONet with classical relaxation techniques, resulting in a hybrid iterative approach. Meanwhile, Wu et al. [56] introduce an asymptotic-preserving convolutional DeepONet designed to capture the diffusive characteristics of multiscale linear transport equations.

For multiscale PDEs, operator learning methods can be viewed as an advancement beyond