A Review of Data-Driven Discovery for Dynamic Systems
Joshua S. North1,*, Christopher K. Wikle2, and Erin M. Schliep3
1Earth and Environmental Sciences, Lawrence Berkeley National Laboratory, Berkeley, CA, 1 Cyclotron Road
2Department of Statistics, University of Missouri, Columbia, MO, 146 Middlebush Hall
3Department of Statistics, North Carolina State University, Raleigh, NC, 2311 Stinson Drive
*Corresponding author: jsnorth@lbl.gov
Abstract
Many real-world scientific processes are governed by complex nonlinear dynamic systems that can be represented by differential equations. Recently, there has been increased interest in learning, or discovering, the forms of the equations driving these complex nonlinear dynamic systems using data-driven approaches. In this paper we review the current literature on data-driven discovery for dynamic systems. We provide a categorization of the different approaches for data-driven discovery and a unified mathematical framework to show the relationship between the approaches. Importantly, we discuss the role of statistics in the data-driven discovery field, describe a possible approach by which the problem can be cast in a statistical framework, and provide avenues for future work.
Key Words: Differential Equations, Dynamic Equation Discovery, Probabilistic Dynamic Equation Discovery
1 Introduction
Recently, there has been a push from within the computer science, physics, applied mathematics, and statistics communities to learn the governing equations of complex dynamic systems parameterized through dynamic equations (DEs). Researchers may want to know the underlying laws driving a system for a variety of reasons: to reinforce their assumptions, to uncover extra information about the system, or to produce a more realistic mathematical representation of the system. Historically, scientists have relied on their ability to represent physical systems using mathematical equations in the form of DEs. Dating back to at least the inference of equations describing the motion of orbital bodies around the sun from the positions of celestial bodies (Legendre, 1806; Gauss, 1809), DEs have been used to model the evolution of complex processes (e.g., susceptible-infected-recovered models for epidemics) and have become ubiquitous across virtually every area of science and engineering. Here, we review some of the methods used to discover the governing equations driving complex, potentially nonlinear, processes, an undertaking often referred to as data-driven discovery.

arXiv:2210.10663v1 [stat.ME] 19 Oct 2022
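Consider the general DE describing the evolution of a continuous process $\{\mathbf{u}(\mathbf{s},t) : \mathbf{s} \in D_s, t \in D_t\}$,

$$\mathbf{u}_{t^{(J)}}(\mathbf{s},t) = \mathcal{M}\big(\mathbf{u}(\mathbf{s},t), \mathbf{u}_x(\mathbf{s},t), \mathbf{u}_y(\mathbf{s},t), \ldots, \mathbf{u}_{t^{(1)}}(\mathbf{s},t), \ldots, \mathbf{u}_{t^{(J-1)}}(\mathbf{s},t), \boldsymbol{\omega}(\mathbf{s},t)\big), \tag{1}$$

where the vector $\mathbf{u}(\mathbf{s},t) \in \mathbb{R}^N$ denotes the state of the system at location $\mathbf{s}$ and time $t$, $\mathbf{u}_{t^{(j)}}(\mathbf{s},t)$ is the $j$th-order temporal derivative of $\mathbf{u}(\mathbf{s},t)$, $J$ denotes the highest order of the temporal derivative, $\mathcal{M}(\cdot)$ represents the (potentially nonlinear) evolution function, and $\boldsymbol{\omega}(\mathbf{s},t)$ represents any covariates that might be included in the system. We denote partial derivatives by a subscript; for example, $\partial \mathbf{u}/\partial x = \mathbf{u}_x$ and $\partial \mathbf{u}/\partial t = \mathbf{u}_t$. Here, $N$ is the number of components in the system (e.g., $\mathbf{u}(\mathbf{s},t) = [u(\mathbf{s},t,1), u(\mathbf{s},t,2), \ldots, u(\mathbf{s},t,N)]'$, sometimes called the system state), $\mathbf{s} \in \{\mathbf{s}_1, \ldots, \mathbf{s}_S\} = D_s$ is a discrete location in the domain with $|D_s| = S$, and $t \in \{1, \ldots, T\} = D_t$ indexes the discrete times at which the system is realized, with $|D_t| = T$. Equation (1) contains partial derivatives of the system with $D_s \subset \mathbb{R}^2$ and $\mathbf{s} = (x, y)$ (although this can be simplified to $\mathbb{R}^1$ with $\mathbf{s} = x$ or generalized to higher dimensions) and is often referred to as a partial differential equation (PDE). Removing the spatial component from (1) results in an ordinary differential equation (ODE) in time,

$$\mathbf{u}_{t^{(J)}}(t) = \mathcal{M}\big(\mathbf{u}(t), \mathbf{u}_{t^{(1)}}(t), \ldots, \mathbf{u}_{t^{(J-1)}}(t), \boldsymbol{\omega}(t)\big), \tag{2}$$

where $\mathcal{M}$ is composed solely of derivatives of the components in time (i.e., no partial derivatives). This review focuses on methods to discover the evolution function $\mathcal{M}$ for both PDEs (1) and ODEs (2).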
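To make (2) concrete, the following sketch simulates one hypothetical instance with $N = 2$, $J = 1$, and no covariates: the classical Lotka-Volterra predator-prey equations, with parameter values and initial conditions chosen arbitrarily for illustration. The function names are our own, and the fourth-order Runge-Kutta integrator is only a convenient way to generate data; a discovery method would see only the noisy states `u_obs`, never $\mathcal{M}$ or the derivatives.

```python
import numpy as np


def lotka_volterra(u, alpha=1.0, beta=0.4, delta=0.1, gamma=0.4):
    """Hypothetical evolution function M for an ODE with N = 2 and J = 1."""
    prey, predator = u
    return np.array([
        alpha * prey - beta * prey * predator,
        delta * prey * predator - gamma * predator,
    ])


def simulate_rk4(M, u0, dt, n_steps):
    """Integrate u_t = M(u) with the classical fourth-order Runge-Kutta scheme."""
    u = np.empty((n_steps + 1, len(u0)))
    u[0] = u0
    for k in range(n_steps):
        k1 = M(u[k])
        k2 = M(u[k] + 0.5 * dt * k1)
        k3 = M(u[k] + 0.5 * dt * k2)
        k4 = M(u[k] + dt * k3)
        u[k + 1] = u[k] + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return u


dt = 0.01
u_true = simulate_rk4(lotka_volterra, u0=np.array([10.0, 5.0]), dt=dt, n_steps=2000)

# The analyst only sees noisy states; M and the derivatives stay hidden.
rng = np.random.default_rng(0)
u_obs = u_true + rng.normal(scale=0.1, size=u_true.shape)
```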
The goal of data-driven discovery is to learn the governing equation(s) in (1) and (2), specifically the (non)linear function $\mathcal{M}$, having observed only noisy realizations of the true process $\mathbf{u}$ (i.e., the true derivatives are unknown). Broadly, we group the approaches used for data-driven discovery into three categories (classical sparse methods, classical symbolic methods, and deep modeling methods using either symbolic or sparse regression techniques) but recognize that other categorizations are possible. The first approach uses sparse regression, where a library of potential solutions is proposed and the correct solution set is obtained by regularization-based techniques, resulting in a sparse solution. The second uses symbolic regression, where the solution is learned, or generated, through the estimation procedure. The third uses deep models to facilitate the discovery process of the previous two approaches (e.g., symbolic regression using deep models). As this is an active area of research, we refer the reader to the special issue Epureanu and Ghadami (2022) for emerging areas of research and applications.
While less common than their deterministic counterparts, methods to quantify uncertainty in the discovered equations have been proposed. However, these methods generally do not account for uncertainty in the observed data, missing a vital piece of the statistical puzzle. We draw parallels between traditional statistical models and data-driven discovery, discussing how statistical models can be formulated for data-driven discovery and highlighting possible improvements to the methods.
The remainder of the paper is organized as follows. In Section 2 we review sparse regression methods for data-driven discovery, which are sub-categorized into deterministic and probabilistic approaches. In Section 3 we review symbolic methods for data-driven discovery. In Section 4 we review deep modeling approaches for data-driven discovery, which are sub-divided into methods approximating and methods discovering the underlying dynamics. In Section 5 we show how the problem can be formulated in a statistical paradigm, and in Section 6 we review a possible method of data-driven discovery using a fully probabilistic approach. Section 7 concludes the paper.
2 Sparse Regression
Sparse regression approaches for dynamic discovery of ODEs and PDEs are fundamentally the same. We formulate the general approach using (1), noting that the approach for (2) is equivalent but with only one spatial location (i.e., $S = 1$). First, consider rewriting (1) as a linear (in parameters) system

$$\mathbf{u}_{t^{(J)}}(\mathbf{s},t) = f(\mathbf{u}(\mathbf{s},t), \ldots)\,\mathbf{M},$$

where $\mathbf{M}$ is a $D \times N$ sparse matrix of coefficients, $f(\cdot)$ is a vector-valued nonlinear transformation function of length $D$ termed the feature library, and both sides are written as row vectors. The arguments of $f(\cdot)$ are general and contain terms that potentially relate to the system (e.g., advection terms, polynomial terms, interactions). Sparse identification seeks the nonzero entries of $\mathbf{M}$, thereby identifying the components of $f$ that drive the system and discovering the governing dynamics.
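A common (but not the only) choice for $f(\cdot)$ is the set of all monomials of the state components up to a fixed degree. The sketch below assumes that choice; `polynomial_library` is a hypothetical helper, not part of any particular package, and it returns one row of library evaluations per row of input states.

```python
import itertools

import numpy as np


def polynomial_library(U, degree=2):
    """Evaluate a polynomial feature library f(u) for every row of U.

    U has shape (n_rows, N). Returns F with shape (n_rows, D), where D counts
    the constant term plus every monomial of total degree 1..degree, together
    with human-readable names for the columns.
    """
    n_rows, N = U.shape
    columns = [np.ones(n_rows)]
    names = ["1"]
    for d in range(1, degree + 1):
        # Each combination with replacement is one monomial, e.g. (0, 1) -> u1*u2.
        for combo in itertools.combinations_with_replacement(range(N), d):
            columns.append(np.prod(U[:, list(combo)], axis=1))
            names.append("*".join(f"u{i + 1}" for i in combo))
    return np.column_stack(columns), names


F, names = polynomial_library(np.array([[2.0, 3.0]]), degree=2)
```

For $N = 2$ and degree 2 this gives $D = 6$ columns (1, u1, u2, u1\*u1, u1\*u2, u2\*u2), so the single input row (2, 3) maps to the library row (1, 2, 3, 4, 6, 9).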
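Denote the matrix of all data (all components at all locations and time points) for the $j$th derivative of the system as

$$\mathbf{U}_{t^{(j)}} = \begin{pmatrix}
u_{t^{(j)}}(\mathbf{s}_1,1,1) & u_{t^{(j)}}(\mathbf{s}_1,1,2) & \cdots & u_{t^{(j)}}(\mathbf{s}_1,1,N) \\
u_{t^{(j)}}(\mathbf{s}_1,2,1) & u_{t^{(j)}}(\mathbf{s}_1,2,2) & \cdots & u_{t^{(j)}}(\mathbf{s}_1,2,N) \\
\vdots & \vdots & \ddots & \vdots \\
u_{t^{(j)}}(\mathbf{s}_S,T,1) & u_{t^{(j)}}(\mathbf{s}_S,T,2) & \cdots & u_{t^{(j)}}(\mathbf{s}_S,T,N)
\end{pmatrix}.$$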
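The response matrix is $\mathbf{U}_{t^{(J)}}$, of size $(ST) \times N$, and we generically denote the feature library as

$$\mathbf{F} = \big[\mathbf{1}, \mathbf{U}_{t^{(0)}}, \ldots, \mathbf{U}_{t^{(J-1)}}, \mathbf{U}_x, \mathbf{U}_y, \mathbf{U}_{xx}, \ldots, \boldsymbol{\Omega}\big],$$

where $\boldsymbol{\Omega}$ contains the associated covariates indexed in space and time and $\mathbf{F}$ is an $(ST) \times D$ matrix. The library may also contain interactions of the components, partial derivatives, and covariates.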
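For a scalar field on a one-dimensional spatial grid ($\mathbf{s} = x$, $N = 1$), the sketch below assembles a small library $\mathbf{F}$ containing spatial partial derivatives and one interaction. The specific candidate terms, the helper name `pde_library`, and the use of `np.gradient` (central differences in the interior, applied twice for $u_{xx}$) are illustrative assumptions; plain differencing like this is sensitive to noise.

```python
import numpy as np


def pde_library(u, dx):
    """Assemble a small candidate library for a scalar field on a 1-D grid.

    u has shape (S, T): S grid points in x with spacing dx, T time points.
    Returns F with shape (S*T, 5), columns [1, u, u_x, u_xx, u*u_x], plus names.
    """
    u_x = np.gradient(u, dx, axis=0)     # central differences in the interior
    u_xx = np.gradient(u_x, dx, axis=0)  # differentiate again for u_xx
    columns = [np.ones_like(u), u, u_x, u_xx, u * u_x]
    names = ["1", "u", "u_x", "u_xx", "u*u_x"]
    # reshape(-1) flattens each (S, T) field location by location, time fastest.
    F = np.column_stack([c.reshape(-1) for c in columns])
    return F, names


x = np.linspace(0.0, 2.0 * np.pi, 201)
dx = x[1] - x[0]
u = np.sin(x)[:, None] * np.ones((1, 5))  # S=201 locations, T=5 times
F, names = pde_library(u, dx)
```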
We can write the linear system

$$\mathbf{U}_{t^{(J)}} = \mathbf{F}\mathbf{M}, \tag{3}$$

whereby identifying the nonzero terms of $\mathbf{M}$ identifies the DE.
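In the idealized noise-free case, where exact derivatives are available and every true term is in the library, (3) is an exactly consistent linear system, so ordinary least squares already recovers $\mathbf{M}$ and its zero pattern. The library, coefficient values, and simulated inputs below are hypothetical and chosen only to illustrate this point.

```python
import numpy as np

rng = np.random.default_rng(1)
n_rows = 200                       # plays the role of S*T
u = rng.normal(size=(n_rows, 2))   # N = 2 components

# Hypothetical library with D = 4 columns: [1, u1, u2, u1*u2].
F = np.column_stack([np.ones(n_rows), u[:, 0], u[:, 1], u[:, 0] * u[:, 1]])

# Sparse D x N coefficient matrix, chosen only for illustration.
M_true = np.array([[0.0, 0.0],
                   [1.0, 0.0],
                   [0.0, -0.8],
                   [-0.6, 0.0]])

U_t = F @ M_true  # exact derivatives, so (3) holds with no error

# Ordinary least squares solves the consistent system exactly.
M_hat, *_ = np.linalg.lstsq(F, U_t, rcond=None)
support = np.abs(M_hat) > 1e-8  # nonzero entries flag the active terms
```

Once noise and numerical differentiation error enter, this exact recovery breaks down, which motivates the penalized formulation that follows.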
The derivatives of the system are rarely observed (typically only $\mathbf{U}_{t^{(0)}}$ is measured). To obtain derivatives in space and time, numerical techniques are used to approximate them. There are multiple methods to approximate derivatives numerically, and the choice of approximation has the potential to impact the discovered equation (de Silva et al., 2020). Originally, a finite difference approach was suggested, but this approach is sensitive to noise (Chartrand, 2011). When measurement noise is present, the data are either smoothed a priori before derivatives are computed, or derivatives are computed using total variation regularization (Chartrand, 2011) or polynomial interpolation (Knowles and Renka, 2012).
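The noise sensitivity of finite differences can be seen in a few lines. The sketch below, with a hypothetical signal and noise level, compares central differences on clean and noisy samples of $\sin t$ against a smooth-then-differentiate estimate. The centered moving average is a deliberately simple stand-in for a priori smoothing; it is neither total variation regularization nor the polynomial interpolation cited above.

```python
import numpy as np


def rmse(a, b):
    """Root-mean-square difference between two arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


rng = np.random.default_rng(2)
t = np.linspace(0.0, 2.0 * np.pi, 1001)
dt = t[1] - t[0]
u_clean = np.sin(t)
u_noisy = u_clean + rng.normal(scale=0.01, size=t.shape)  # small measurement noise
true_deriv = np.cos(t)

# Finite differences (central in the interior) applied directly to the data.
d_fd_clean = np.gradient(u_clean, dt)
d_fd_noisy = np.gradient(u_noisy, dt)

# Smooth first with a centered moving average, then difference.
window = 51
u_smooth = np.convolve(u_noisy, np.ones(window) / window, mode="same")
d_smooth = np.gradient(u_smooth, dt)

interior = slice(window, -window)  # ignore edges distorted by zero padding
err_fd_clean = rmse(d_fd_clean[interior], true_deriv[interior])
err_fd_noisy = rmse(d_fd_noisy[interior], true_deriv[interior])
err_smooth = rmse(d_smooth[interior], true_deriv[interior])
```

Even with noise of standard deviation 0.01, differencing the raw samples inflates the noise roughly in proportion to $1/\Delta t$, so `err_fd_noisy` should be orders of magnitude larger than `err_fd_clean`, while smoothing first reduces the error substantially. The price of smoothing is bias and distortion near the boundaries, one reason the differentiation scheme can change the discovered equation.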
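Due to both the numerical approximation of the derivatives and the potential for noise in the observed data, (3) does not hold exactly. Instead,

$$\mathbf{U}_{t^{(J)}} = \mathbf{F}\mathbf{M} + \boldsymbol{\epsilon}, \tag{4}$$

where the rows of $\boldsymbol{\epsilon}$ are i.i.d. $N(\mathbf{0}, \sigma^2 \mathbf{I}_N)$ and $\sigma^2$ is the variance associated with the model approximation and the numerical differentiation. To induce sparsity, and thereby identify the relevant terms governing the system, solutions to (4) of the form

$$\widehat{\mathbf{M}} = \underset{\mathbf{M}}{\arg\min}\; \|\mathbf{U}_{t^{(J)}} - \mathbf{F}\mathbf{M}\|_2^2 + \mathrm{Pen}_{\theta}(\mathbf{M}), \tag{5}$$

are sought, where $\mathrm{Pen}_{\theta}(\mathbf{M})$ generically denotes some penalty term based on parameters $\theta$ (e.g., $\mathrm{Pen}_{\theta}(\mathbf{M}) = \lambda\|\mathbf{M}\|_1$ with $\theta = \lambda$ for the LASSO penalty).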
2.1 Deterministic Approaches
The majority of deterministic approaches are composed of three steps: denoising and differentiation, construction of a feature library, and sparse regression. Assuming the data have been properly differentiated and a library has been proposed, the deterministic approach seeks solutions of the form (5). The original sparse regression approach to data-driven discovery, Sparse Identification of Nonlinear Dynamics (SINDy; Brunton et al., 2016), uses sequential threshold