CONVERGENCE RATES FOR ANSATZ-FREE DATA-DRIVEN
INFERENCE IN PHYSICALLY CONSTRAINED PROBLEMS
S. CONTI1, F. HOFFMANN1 AND M. ORTIZ2,3
1Institut für Angewandte Mathematik, Universität Bonn, Germany
2Hausdorff Center for Mathematics, Universität Bonn, Germany
3Division of Engineering and Applied Science, California Institute of Technology,
Pasadena
Abstract. We study a Data-Driven approach to inference in physical systems in
a measure-theoretic framework. The systems under consideration are characterized
by two measures defined over the phase space: i) A physical likelihood measure
expressing the likelihood that a state of the system be admissible, in the sense of
satisfying all governing physical laws; ii) A material likelihood measure expressing
the likelihood that a local state of the material be observed in the laboratory. We
assume deterministic loading, which means that the first measure is supported on
a linear subspace. We additionally assume that the second measure is only known
approximately through a sequence of empirical (discrete) measures. We develop
a method for the quantitative analysis of convergence based on the flat metric and
obtain error bounds both for annealing and the discretization or sampling procedure,
leading to the determination of appropriate quantitative annealing rates. Finally,
we provide an example illustrating the application of the theory to transportation
networks.
1. Introduction
We consider the problem of inferring the probability of finding a physical system
in a given state z in a linear space Z, or phase space, which we assume to be
finite-dimensional. For instance, if the system under consideration is an electrical circuit, then
the state of the system consists of the array of potential differences across the elements of
the circuit and the corresponding array of electric currents; if the system is a hydraulic
network, then the state of the system consists of the array of head differences across
each pipe and the corresponding array of mass fluxes; if the system is a mechanical truss
structure, then the state of the system consists of the array of displacement differences,
or strains, across each member and the corresponding array of internal forces, or stresses;
et cetera. We note that, in all these examples, the state of the system consists of a pair
of dual variables and the dimension of phase space is even.
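To fix ideas, the paired structure of such states can be sketched as follows (an illustrative toy example with made-up numbers, not taken from the paper):

```python
import numpy as np

# Phase space Z = R^{2N} for a circuit with N = 3 elements:
# a state z pairs the potential differences across the elements
# with the corresponding electric currents.
N = 3
voltages = np.array([1.5, -0.5, 1.0])     # potential differences (V)
currents = np.array([0.3, -0.1, 0.2])     # electric currents (A)
z = np.concatenate([voltages, currents])  # state z in Z

print(z.size)  # 6 = 2N: the dimension of phase space is even
```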
Physical systems obey field equations, which place hard constraints on the possible
states attainable by the system. These constraints are material independent and can
be regarded as a restriction of the set of admissible states of the system. The view of
field equations as constraints for purposes of analysis has a long-standing tradition in
continuum mechanics and electromagnetism, and constitutes the foundation of recent
arXiv:2210.02846v1 [math.OC] 20 Sep 2022
methods of data-driven analysis [8, 3], physics-informed neural networks (PINNs) [11]
and other applications of modern data science. Classically, deterministic problems in
mathematical physics are closed by further restricting the states of the system to lie in
a subset representing the material law of the system, i. e., the locus of states attainable
by a specific material.
[Figure 1: two panels (a) and (b), axes ε (horizontal) and σ (vertical), each showing the constraint set E.]
Figure 1. Classical inference. a) Material likelihood function L_D, here
in the form of a sliding Gaussian (dark: low likelihood; light: high likelihood),
constraint set E and likelihood function L obtained by restricting
L_D to E. b) Empirical likelihood measure µ_{D,h} sampled from L_D.
In this paper, we work within a general framework [2] for systems in which the
material law and the admissibility constraints are described by positive Radon likelihood
measures µ_D ∈ M(Z) and µ_E ∈ M(Z), respectively, representing the likelihood of y ∈ Z
being a (local) material state observed in the laboratory and of z ∈ Z being admissible.
Before presenting our new contributions, the main ideas underlying the work may be
summarized as follows. The admissible states of the system may be random, e. g., due
to the application of random forcing to the system. The observed material states may be
random either because the material itself is random or because of experimental scatter,
cf. Fig. 1a. We expect the material states y ∈ Z and admissible states z ∈ Z of the
system to be distributed according to a notion of intersection measure µ_D ∩ µ_E ∈ M(Z × Z),
which can be qualitatively understood as the product measure µ_D × µ_E conditioned to
y = z. In the special case in which µ_D and µ_E are regular with respect to the Lebesgue
measure, with continuous densities L_D and L_E, the likelihood of finding the system at
state z ∈ Z is, simply, L_D(z) L_E(z), which determines the intersection µ_D ∩ µ_E. In
particular, if L_E(z) L_D(z) is integrable and non-zero, the expression
(1) E[f] = \frac{\int_Z f(z) L_D(z) L_E(z) \, d\mathcal{L}^{2N}(z)}{\int_Z L_D(z) L_E(z) \, d\mathcal{L}^{2N}(z)} = \int_Z f(z) L(z) \, d\mathcal{L}^{2N}(z)
gives the expected value of a quantity of interest f ∈ C_c(Z). Similarly, if µ_D = L_D L^{2N}
with L_D continuous and µ_E = H^N ⌊ E, corresponding to deterministic loading, then
µ_D ∩ µ_E = L_D H^N ⌊ E, Fig. 1a. If Z = R², µ_D = H^1 ⌊ (Ra) and µ_E = H^1 ⌊ (Rb), with a
and b ∈ R² not parallel, then µ_D ∩ µ_E = δ_0.
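When the densities are known, rule (1) can be evaluated directly by quadrature. The following sketch (with illustrative Gaussian densities chosen for this example only, not taken from the paper) approximates E[f] on a tensor-product grid in Z = R²; the quadrature weights cancel in the ratio:

```python
import numpy as np

# Illustrative densities on Z = R^2 (placeholders for L_D and L_E):
def L_D(z):
    return np.exp(-np.sum((z - np.array([1.0, 0.0]))**2, axis=-1))

def L_E(z):
    return np.exp(-np.sum((z - np.array([0.0, 1.0]))**2, axis=-1))

def f(z):
    return z[..., 0]  # quantity of interest: first coordinate of the state

# Tensor-product grid quadrature for the ratio of integrals in (1).
x = np.linspace(-6.0, 6.0, 401)
X, Y = np.meshgrid(x, x, indexing="ij")
Z = np.stack([X, Y], axis=-1)
w = L_D(Z) * L_E(Z)                # unnormalized likelihood L_D * L_E
Ef = np.sum(f(Z) * w) / np.sum(w)  # grid weights cancel in the ratio
print(Ef)  # ≈ 0.5 (midpoint of the two Gaussian centers)
```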
Suppose now that, as is often the case, the likelihood measure µ_D is not known exactly,
but only approximately through sequences of empirical measures (µ_{D,h}) obtained, e. g.,
by means of material testing. Suppose further that the empirical measures supply an
increasingly better approximation of µ_D, e. g., as a result of increasingly accurate and
extensive measurements. We may then expect that, under appropriate conditions of
convergence of (µ_{D,h}), the sequence of approximate intersections (µ_{D,h} ∩ µ_E) converges to
the exact limiting likelihood measure µ_D ∩ µ_E, thus defining a convergent approximation
scheme for the inference problem.
A fundamental difficulty that arises immediately is that, for most notions of intersections
of measures, the intersection of certain pairs of measures may not be well-defined
or may be zero. Consider for example the setting in which both µ_D and µ_E are
approximated by empirical measures (µ_{D,h}) and (µ_{E,h}). A conventional response to this
challenge is to introduce Lebesgue-regular approximations (µ̃_{D,h}) and (µ̃_{E,h}) fitted to
the data (µ_{D,h}) and (µ_{E,h}) by means of some method of regression. By regularity, (µ̃_{D,h})
and (µ̃_{E,h}) then have well-defined, continuous densities L̃_{D,h} and L̃_{E,h}, respectively, and
the intersections (µ̃_{D,h} ∩ µ̃_{E,h}), which are intended as approximations of the exact
intersection µ_D ∩ µ_E, are simply given by (L̃_{D,h}(z) L̃_{E,h}(z)) L^{2N}(z). However, there is no
guarantee that this approximation will work in general, and the approximations (µ̃_{D,h})
and (µ̃_{E,h}) need to be chosen appropriately. Here, we take a different approach using
thermalizations.
Overall, there are three main cases of interest:
(1) Lebesgue-regular likelihoods;
(2) empirical measures;
(3) likelihood measures supported on linear subspaces.
The general framework presented here allows for the physical likelihood and the material
likelihood to be in any of these three classes independently of each other. For (2), we
may consider that there is a sequence of approximating empirical measures with the
limiting measure belonging to class (1) or (3). In this work, we focus on the case in
which the material likelihood µ_D is in (2), approximating (1), and the physical likelihood
µ_E is in (3); see (5) and (6) below.
Whereas the measure-theoretical framework just outlined is remarkable for its directness
and simplicity, a Bayesian reinterpretation of the rules of inference is often favored
in the literature (cf. [14] and references therein). A common ansatz is to introduce the
representation z = (ε, σ) and a sequence of functions g_{D,h}, parameterized by a set of
parameters p_h, providing the model
(2) \sigma = g_{D,h}(\varepsilon; p_h) + \eta,
where η is a random variable, interpreted as observational noise, with likelihood f_{D,h}(·; q_h),
parameterized by further parameters q_h, and to assume the approximate material likelihood
to be of the form
(3) \tilde{L}_{D,h}((\varepsilon, \sigma)) = f_{D,h}\big(\sigma - g_{D,h}(\varepsilon; p_h); q_h\big).
Evidently, if f_{D,h} attains its maximum at 0, then (2) represents the most likely material
law given the ansatz and may thus be regarded as an identified, or learned, material
model. A common choice for g_{D,h} is a neural network, in the context of machine
learning [7], whereas a common choice of f_{D,h} is Gaussian [10, 4]. Common methods of
regression used to determine the parameters from the data include classical methods of
statistical inference such as maximum likelihood [9, 5], variational approaches based on
the introduction of a loss function [14], and measure-theoretical approaches based, e. g., on
the Wasserstein distance [1] or the Kullback-Leibler discrepancy [12].
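For concreteness, the Bayesian ansatz (2)-(3) can be sketched as follows, with an illustrative linear model g_{D,h} and Gaussian observational noise f_{D,h} (both hypothetical choices standing in for the parameterized families in the text):

```python
import numpy as np

def g(eps, p):
    # Hypothetical material-model ansatz: linear law sigma = p * eps,
    # standing in for a neural network or other parameterized g_{D,h}.
    return p * eps

def f_noise(eta, q):
    # Gaussian observational-noise likelihood with variance q,
    # a common (but by no means canonical) choice for f_{D,h}.
    return np.exp(-eta**2 / (2.0 * q)) / np.sqrt(2.0 * np.pi * q)

def L_tilde(eps, sigma, p, q):
    # Approximate material likelihood (3): f_{D,h}(sigma - g_{D,h}(eps; p); q).
    return f_noise(sigma - g(eps, p), q)

# States on the model line sigma = 2*eps are most likely;
# states off the line are penalized by the noise likelihood.
print(L_tilde(1.0, 2.0, p=2.0, q=0.1))  # on-model state
print(L_tilde(1.0, 3.0, p=2.0, q=0.1))  # off-model state
```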
An essential problem with this approach is that the choices of material model g_{D,h},
observational noise f_{D,h}, priors, loss functions and parameterizations thereof are often
not prescribed by theory or fundamental considerations but instead dictated by
convenience. Worse still, the form of f_{D,h} is often fixed throughout the sequence, e. g., to be
Gaussian, which renders the approximation scheme non-convergent in cases where the
underlying likelihood measures µ_D and µ_E are not of the same form. Since the limiting
likelihood measures µ_D and µ_E are often not known in practice, it is generally not
possible to ensure that approximation schemes tied to particular choices of models and
priors be convergent. In addition, it is clear that, even in the best of circumstances,
representations of the form (2) and (3) introduce modeling bias and error and incur a
loss of information relative to the data sets themselves.
The ansatz-free approach of [2] adopted here leads to a direct connection between data
and inference and is therefore lossless and free of modeling bias. In addition, it makes it
possible to treat unbounded likelihoods, a setting in which it is not clear how to set up a
Bayesian framework that is able to address the questions of inference and approximation. Our
approach overcomes the problem of unbounded likelihoods and of zero intersection between
the approximating likelihood measures (µ_{D,h}) and µ_E by recourse to thermalization
and annealing. Specifically, we consider a sequence β_h → +∞ of reciprocal temperatures
as h → +∞, and replace µ_h = µ_{D,h} × µ_E by its thermalization
(4) \mu_{h,\beta_h} := B_{\beta_h}^{-1} \, e^{-\beta_h \|y - z\|^2} \mu_h, \qquad B_{\beta_h} := \int_Z e^{-\beta_h \|\xi\|^2} \, d\mathcal{L}^{2N}(\xi).
As h → ∞, this regularization increasingly concentrates µ_h on the diagonal diag(Z × Z)
and is therefore expected to deliver the sought intersection µ_D ∩ µ_E in the limit. Suppose,
for instance, that the approximate material likelihood measure is
(5) \mu_{D,h} = \sum_{p \in P_h} m_p \, \delta_p,
where (P_h) are point data sets in Z and m_p ≥ 0 are weights. Suppose, in addition, that
the loading is deterministic,
(6) \mu_E = \mathcal{H}^N \llcorner E,
where E is an affine subspace of Z of dimension N and H^N ⌊ E is the Hausdorff
measure restricted to E. Then, the approximate expectation of a quantity f ∈ C_b(Z × Z)
corresponding to (4) is, cf. Section 4.2,
(7) E_h[f] = \frac{\sum_{p \in P_h} \int_E m_p B_{\beta_h}^{-1} e^{-\beta_h \|p - z\|^2} f(p, z) \, d\mathcal{H}^N(z)}{\sum_{p \in P_h} \int_E m_p B_{\beta_h}^{-1} e^{-\beta_h \|p - z\|^2} \, d\mathcal{H}^N(z)},
which is explicit in the data and eschews the need for ansätze of any type, be they
material models or priors. We do not consider the definition of the constraint set E as a
modeling step, given that it encodes the governing physical laws. Data-driven inference
rules such as (7) are amenable to efficient numerical implementation in combination with
stochastic quadrature formulas for the evaluation of the integrals [13].
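A minimal numerical sketch of the inference rule (7), under illustrative assumptions not taken from the paper: Z = R², E a one-dimensional subspace discretized by quadrature nodes along its direction, and a synthetic Gaussian point cloud P_h with uniform weights. The normalization constants B_{β_h} cancel between numerator and denominator and are therefore omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic material data set P_h in Z = R^2 with uniform weights m_p
# (illustrative; real data would come from material testing).
P = rng.normal(loc=[1.0, 1.0], scale=0.1, size=(200, 2))
m = np.full(len(P), 1.0 / len(P))

# Deterministic loading: E = {t * e : t in R}, a 1D subspace of Z,
# discretized by quadrature nodes along its direction e.
e = np.array([1.0, 1.0]) / np.sqrt(2.0)
t = np.linspace(-5.0, 5.0, 1001)
z = t[:, None] * e[None, :]          # quadrature nodes on E

def E_h(f, beta):
    # Data-driven expectation (7); the constant B_beta cancels in the
    # ratio, so it is omitted.  Quadrature weights on E cancel likewise.
    d2 = np.sum((P[:, None, :] - z[None, :, :])**2, axis=-1)
    w = m[:, None] * np.exp(-beta * d2)          # thermalized weights
    num = np.sum(w * f(P[:, None, :], z[None, :, :]))
    return num / np.sum(w)

# Expected first component of the admissible state z.
print(E_h(lambda p, zz: zz[..., 0], beta=50.0))  # ≈ 1.0 near the data cloud
```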
However, the analysis of [2], based on the concept of weak convergence, is not
quantitative. It does not permit one to obtain convergence rates, either for the convergence of µ_β
to some limit µ_∞ or for the convergence of µ_{h,β} to µ_β. In particular, it is not clear how the
thermalization parameter β_h in (7) should be chosen in practice.
1.1. Main Results. Our aim is to obtain quantitative estimates for the convergence
of µ_β to its limit µ_∞ and for the convergence of µ_{h,β} to µ_β, leading in particular to a
prescription for the choice of β_h which ensures the desired convergence µ_{h,β_h} → µ_∞. In
order to make convergence quantitative, we work in a metric setting and not only in
terms of weak convergence as in [2]. Our starting point is this observation:
The flat norm metrizes weak convergence on bounded and tight sets of measures.
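For reference, a standard formulation of the flat norm (bounded Lipschitz norm) of a measure µ ∈ M(Z) is the following (the precise normalization used later in the paper may differ):

```latex
\|\mu\|_{\flat} \;=\; \sup\left\{ \int_Z f \, d\mu \;:\; f \in C^{0,1}(Z),\ \|f\|_{\infty} \le 1,\ \operatorname{Lip}(f) \le 1 \right\}.
```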
We adopt, in this metric setting, the notions of transversality and of diagonal concentration
for (possibly unbounded) measures via thermalization as developed in [2]. We
introduce a weaker concept, weak transversality, which corresponds to transversality along
subsequences, see Definition 3.1, and circumvents the need for regularity assumptions on
the measure µ. Then, Prokhorov’s theorem shows that:
If µ is such that the measures µ_β are uniformly bounded and uniformly tight,
then it is weakly transversal and, in particular, it has one or more diagonal
concentrations.
Having framed the thermalization problem in a metric setting, we may make use of
standard devices such as uniform convergence and diagonal subsequences. This permits
us to decouple the thermalization problem (β → ∞) from the approximation problem
(h → ∞). A typical statement (not detailed here) is:
Assume that i) the µ_h are uniformly transversal and ii) the µ_h have thermalizations
that are uniformly bounded, uniformly tight, and uniformly approximate the
thermalization of µ. Then µ is transversal and its diagonal concentration is
approximated by the diagonal concentrations of the µ_h.
In particular, the diagonal concentration of an unknown measure µ is recovered from the
diagonal concentrations of an approximating sequence of measures.
We demonstrate the usefulness of this abstract framework by considering specific
classes of measures. We first focus on sub-Gaussian material likelihoods combined with
a deterministic physical likelihood. Specifically, we consider measures of the form
(8) \mu = e^{-\Phi} \mathcal{L}^{2N} \times \mathcal{H}^N \llcorner E
for some N-dimensional affine subspace E of Z and Φ : Z → R satisfying
(9) \beta_0 \|y - z\|^2 + \Phi(y) \ge c \, (\|y\|^2 + \|z\|^2) - b \quad \text{for all } y \in Z, \ z \in E
for some constants β_0 > 0, c > 0 and b > 0. Condition (9) can be interpreted as a
transversality condition in view of the following result.
Theorem 1.1 (informal, see Prop. 3.3 and Prop. 3.4). Measures µ satisfying (8)-(9)
are weakly transversal and admit a diagonal concentration. Further, if Φ ∈ C¹ and its
derivative does not grow too fast, then µ is strongly transversal and the thermalizations
µ_β of µ converge to the diagonal concentration µ_∞ with rate β^{-1/2}.
We remark that, for fully deterministic systems, the potential Φ is the indicator
function of a set D ⊂ Z, in which case (9) corresponds precisely to the definition of
transversality introduced in [3].
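The β^{-1/2} rate in Theorem 1.1 can be illustrated numerically in a toy one-dimensional setting (an assumption of this sketch, not the paper's general setup): take Φ(y) = y² and E = R, and measure the mean diagonal spread E|y − z| under the thermalized weight e^{−β(y−z)²} e^{−y²}; quadrupling β should halve the spread.

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_spread(beta, n=200_000):
    # Toy thermalization with joint weight e^{-beta (y - z)^2} e^{-y^2}:
    # the weight factorizes, so we can sample y ~ N(0, 1/2) and then
    # z | y ~ N(y, 1/(2 beta)) directly, and average the distance to
    # the diagonal {y = z}.
    y = rng.normal(0.0, np.sqrt(0.5), n)
    z = rng.normal(y, np.sqrt(1.0 / (2.0 * beta)))
    return np.mean(np.abs(y - z))

s1, s4 = mean_spread(100.0), mean_spread(400.0)
print(s1 / s4)  # ≈ 2: the spread decays like beta^{-1/2}
```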