

A Review of Data-Driven Discovery for Dynamic Systems

Joshua S. North1,*, Christopher K. Wikle2, and Erin M. Schliep3

1Earth and Environmental Sciences, Lawrence Berkeley National Laboratory, Berkeley, CA, 1 Cyclotron Road

2Department of Statistics, University of Missouri, Columbia, MO, 146 Middlebush Hall

3Department of Statistics, North Carolina State University, Raleigh, NC, 2311 Stinson Drive

*Corresponding author: jsnorth@lbl.gov

Abstract

Many real-world scientific processes are governed by complex nonlinear dynamic systems that can be represented by differential equations. Recently, there has been increased interest in learning, or discovering, the forms of the equations driving these complex nonlinear dynamic systems using data-driven approaches. In this paper we review the current literature on data-driven discovery for dynamic systems. We provide a categorization of the different approaches for data-driven discovery and a unified mathematical framework to show the relationship between the approaches. Importantly, we discuss the role of statistics in the data-driven discovery field, describe a possible approach by which the problem can be cast in a statistical framework, and provide avenues for future work.

Key Words: Differential Equations, Dynamic Equation Discovery, Probabilistic Dynamic Equation Discovery

arXiv:2210.10663v1 [stat.ME] 19 Oct 2022

1 Introduction

Recently there has been a push from within the computer science, physics, applied mathematics, and statistics communities to learn the governing equations in complex dynamic systems parameterized through dynamic equations (DEs). There are a variety of reasons researchers may want to know the underlying laws driving a system: to reinforce their assumptions, to uncover extra information about the system, or to produce a more realistic mathematical representation of the system. Historically, scientists have relied on their ability to represent physical systems using mathematical equations in the form of DEs. Dating back to at least the inference of equations describing the motion of orbital bodies around the sun based on the positions of celestial bodies (Legendre, 1806; Gauss, 1809), DEs have been used to model the evolution of complex processes (e.g., the use of susceptible, infected, recovered models for epidemics), and have become ubiquitous across virtually every area of science and engineering. Here, we review some of the methods used to discover the governing equations driving complex, potentially nonlinear, processes, often referred to as data-driven discovery.

Consider the general DE describing the evolution of a continuous process $\{\mathbf{u}(\mathbf{s},t) : \mathbf{s} \in D_s, t \in D_t\}$,

$$\mathbf{u}_{t^{(J)}}(\mathbf{s},t) = M\left(\mathbf{u}(\mathbf{s},t), \mathbf{u}_x(\mathbf{s},t), \mathbf{u}_y(\mathbf{s},t), \ldots, \mathbf{u}_{t^{(1)}}(\mathbf{s},t), \ldots, \mathbf{u}_{t^{(J-1)}}(\mathbf{s},t), \boldsymbol{\omega}(\mathbf{s},t)\right), \tag{1}$$

where the vector $\mathbf{u}(\mathbf{s},t) \in \mathbb{R}^N$ denotes the state of the system at location $\mathbf{s}$ and time $t$, $\mathbf{u}_{t^{(j)}}(\mathbf{s},t)$ is the $j$th order temporal derivative of $\mathbf{u}(\mathbf{s},t)$, $J$ denotes the highest order of the temporal derivative, $M(\cdot)$ represents the (potentially nonlinear) evolution function, and $\boldsymbol{\omega}(\mathbf{s},t)$ represents any covariates that might be included in the system. We will denote partial derivatives by a subscript; that is, $\frac{\partial u}{\partial x} = u_x$ and $\frac{\partial u}{\partial t} = u_t$, for example. Here, $N$ is the number of components in the system (e.g., $\mathbf{u}(\mathbf{s},t) = [u(\mathbf{s},t,1), u(\mathbf{s},t,2), \ldots, u(\mathbf{s},t,N)]'$, sometimes called the system state), $\mathbf{s} \in \{\mathbf{s}_1, \ldots, \mathbf{s}_S\} = D_s$ is a discrete location in the domain with $|D_s| = S$, and $t \in \{1, \ldots, T\} = D_t$ is the realization of the system at discrete times where $|D_t| = T$. Equation (1) is composed of partial derivatives of the system with $D_s \subset \mathbb{R}^2$ and $\mathbf{s} = (x,y)$ (although this can be simplified to $\mathbb{R}^1$ with $\mathbf{s} = x$ or generalized to higher dimensions) and is often referred to as a partial differential equation (PDE). Removing the spatial component from (1) results in a temporal ordinary differential equation (ODE),

$$\mathbf{u}_{t^{(J)}}(t) = M\left(\mathbf{u}(t), \mathbf{u}_{t^{(1)}}(t), \ldots, \mathbf{u}_{t^{(J-1)}}(t), \boldsymbol{\omega}(t)\right), \tag{2}$$

where $M$ is composed solely of derivatives of the components in time (i.e., no partial derivatives). This review will focus on methods to discover the evolution function $M$ for both PDEs (1) and ODEs (2).
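
As a concrete instance of (2), consider the susceptible, infected, recovered (SIR) epidemic model mentioned in the Introduction. The block below assumes the standard textbook form with population proportions $S$, $I$, and $R$, a transmission rate $\beta$, and a recovery rate $\gamma$; these symbols are illustrative only and are not part of the notation of this review. In this example $N = 3$, $J = 1$, and there are no covariates.

```latex
\mathbf{u}(t) = [S(t), I(t), R(t)]', \qquad
\mathbf{u}_{t^{(1)}}(t) = M(\mathbf{u}(t)) =
\begin{bmatrix}
-\beta S(t) I(t) \\
\beta S(t) I(t) - \gamma I(t) \\
\gamma I(t)
\end{bmatrix}
```

In this setting, data-driven discovery would aim to recover the interaction term $S I$ and the linear term $I$, along with their coefficients, from noisy observations of $\mathbf{u}$ alone.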

The goal of data-driven discovery is to learn the governing equation(s) in (1) and (2), specifically the (non)linear function $M$, having only observed noisy realizations of the true process $\mathbf{u}$ (i.e., true derivatives are unknown). Broadly, we group the approaches used for data-driven discovery into three categories (classical sparse methods, classical symbolic methods, and deep modeling methods using either symbolic or sparse regression techniques) but recognize that other categorizations are possible. The first approach uses sparse regression, where a library of potential solutions is proposed and the correct solution set is obtained by regularization-based techniques, resulting in a sparse solution. The second uses symbolic regression, where the solution is learned, or generated, through the estimation procedure. The third uses deep models to facilitate the discovery process of the previous two approaches (e.g., symbolic regression using deep models). As this is an active area of research, we refer the reader to the special issue of Epureanu and Ghadami (2022) for emerging areas of research and applications.

While less common than their deterministic counterparts, methods to quantify uncertainty in the discovered equations have been proposed. However, these methods generally do not account for uncertainty in the observed data, missing a vital piece of the statistical puzzle. We draw parallels between traditional statistical models and data-driven discovery, discussing how statistical models can be formulated for data-driven discovery and highlighting possible improvements to the methods.

The remainder of the paper is organized as follows. In Section 2 we review sparse regression methods for data-driven discovery, which are sub-categorized into deterministic and probabilistic approaches. In Section 3 we review symbolic methods for data-driven discovery. In Section 4 we review deep modeling approaches for data-driven discovery, which are sub-divided into methods approximating and discovering the underlying dynamics. In Section 5 we show how the problem can be formulated in a statistical paradigm, and in Section 6 we review a possible method of data-driven discovery using a fully probabilistic approach. Section 7 concludes the paper.

2 Sparse Regression

Sparse regression approaches for dynamic discovery of ODEs and PDEs are fundamentally the same. We formulate the general approach using (1), noting that the approach for (2) is equivalent but with only one spatial location (i.e., $S = 1$). First, consider rewriting (1) as a linear (in parameters) system

$$\mathbf{u}_{t^{(J)}}(\mathbf{s},t) = \mathbf{f}(\mathbf{u}(\mathbf{s},t), \ldots)\,\mathbf{M},$$

where $\mathbf{M}$ is a $D \times N$ sparse matrix of coefficients and $\mathbf{f}(\cdot)$ is a vector-valued nonlinear transformation function of length $D$ termed the feature library. The arguments of $\mathbf{f}(\cdot)$ are general and contain terms that potentially relate to the system (e.g., advection terms, polynomial terms, interactions). Sparse identification seeks to identify the relevant terms of $\mathbf{M}$, thereby identifying the components of $\mathbf{f}$ that drive the system and discovering the governing dynamics.

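Denote the matrix of all data (all components at all time points) for the $j$th derivative of the system as

$$\mathbf{U}_{t^{(j)}} =
\begin{bmatrix}
u_{t^{(j)}}(\mathbf{s}_1,1,1) & u_{t^{(j)}}(\mathbf{s}_1,1,2) & \cdots & u_{t^{(j)}}(\mathbf{s}_1,1,N) \\
u_{t^{(j)}}(\mathbf{s}_1,2,1) & u_{t^{(j)}}(\mathbf{s}_1,2,2) & \cdots & u_{t^{(j)}}(\mathbf{s}_1,2,N) \\
\vdots & \vdots & \ddots & \vdots \\
u_{t^{(j)}}(\mathbf{s}_S,T,1) & u_{t^{(j)}}(\mathbf{s}_S,T,2) & \cdots & u_{t^{(j)}}(\mathbf{s}_S,T,N)
\end{bmatrix}.$$

The response matrix is $\mathbf{U}_{t^{(J)}}$ of size $(ST) \times N$ and we generically denote the feature library as

$$\mathbf{F} = \left[\mathbf{1}, \mathbf{U}_{t^{(0)}}, \ldots, \mathbf{U}_{t^{(J)}}, \mathbf{U}_x, \mathbf{U}_y, \mathbf{U}_{xx}, \ldots, \boldsymbol{\Omega}\right],$$

where $\boldsymbol{\Omega}$ are the associated covariates indexed in space and time and $\mathbf{F}$ is a $(ST) \times D$ matrix. The library may also contain interactions of the components, partial derivatives, and covariates. We can write the linear system

$$\mathbf{U}_{t^{(J)}} = \mathbf{F}\mathbf{M}, \tag{3}$$

so that identifying the non-zero terms of $\mathbf{M}$ identifies the DE.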
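
To make the construction of the feature library concrete, the following is a minimal sketch for the ODE case ($S = 1$) with a library of a constant, the components themselves, and their pairwise products. The function name `build_library`, the choice of a degree-two polynomial library, and the toy data are assumptions made for illustration; they are not prescribed by the methods reviewed here, and derivative or covariate columns are omitted.

```python
import numpy as np

def build_library(U):
    """Build a degree-2 polynomial feature library from a (T, N) data matrix.

    Rows of U are time points and columns are system components.
    Returns the (T, D) library F and a list of D column names.
    """
    T, N = U.shape
    columns = [np.ones(T)]          # constant term
    names = ["1"]
    for i in range(N):              # linear terms
        columns.append(U[:, i])
        names.append(f"u{i + 1}")
    for i in range(N):              # squares and pairwise interactions
        for j in range(i, N):
            columns.append(U[:, i] * U[:, j])
            names.append(f"u{i + 1}*u{j + 1}")
    return np.column_stack(columns), names

# Toy data with T = 5 time points and N = 2 components (illustrative only).
U = np.arange(10, dtype=float).reshape(5, 2)
F, names = build_library(U)
print(F.shape)   # (5, 6), i.e. D = 1 + N + N(N+1)/2
print(names)
```

Each column of the resulting library corresponds to one row of the coefficient matrix in (3), so a sparse coefficient matrix selects a small subset of these candidate terms.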

The derivatives of the system are rarely observed (i.e., only $\mathbf{U}_{t^{(0)}}(t)$ is measured). To obtain derivatives in space and time, numerical techniques are used to approximate the derivatives. There are multiple methods to approximate derivatives numerically, and the choice of approximation has the potential to impact the discovered equation (de Silva et al., 2020). Originally, a finite difference approach was suggested, but this approach is sensitive to noise (Chartrand, 2011). When measurement noise is present, data are either smoothed a priori and then derivatives are computed, or derivatives are computed using either total variation regularization (Chartrand, 2011) or polynomial interpolation (Knowles and Renka, 2012).
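
The sensitivity of finite differences to noise can be illustrated with a small numerical experiment. The sketch below is an illustration under stated assumptions, not a method from the cited works: the signal $\sin(t)$, the noise level, and the moving-average smoother (a simple stand-in for the a priori smoothing step, chosen here only for brevity) are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 400)
dt = t[1] - t[0]
u_clean = np.sin(t)                      # true process, derivative is cos(t)
u_noisy = u_clean + rng.normal(scale=0.01, size=t.size)

def moving_average(u, width):
    """Smooth u with a centered moving average of the given odd width."""
    kernel = np.ones(width) / width
    return np.convolve(u, kernel, mode="same")

def interior_rmse(estimate, truth, trim=30):
    """Root mean squared error away from the boundaries."""
    diff = (estimate - truth)[trim:-trim]
    return float(np.sqrt(np.mean(diff ** 2)))

truth = np.cos(t)
# np.gradient uses second-order central differences in the interior.
err_clean = interior_rmse(np.gradient(u_clean, dt), truth)
err_noisy = interior_rmse(np.gradient(u_noisy, dt), truth)
err_smooth = interior_rmse(np.gradient(moving_average(u_noisy, 21), dt), truth)

print(f"clean data:            {err_clean:.2e}")
print(f"noisy data:            {err_noisy:.2e}")
print(f"noisy, then smoothed:  {err_smooth:.2e}")
```

Even a small amount of measurement noise is amplified by the division by the time step, while smoothing before differentiating reduces the error at the cost of some bias from the smoother.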

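Due to both the numerical approximation of the derivative and the potential for noise in the observed data, (3) does not hold exactly. Instead,

$$\mathbf{U}_{t^{(J)}} = \mathbf{F}\mathbf{M} + \boldsymbol{\epsilon}, \tag{4}$$

where $\boldsymbol{\epsilon} \overset{\text{i.i.d.}}{\sim} N(\mathbf{0}, \sigma^2 \mathbf{I}_N)$ and $\sigma^2$ is the variance associated with the model approximation and the numerical differentiation. To induce sparsity, and thereby identify the relevant terms governing the system, solutions to (4) of the form

$$\mathbf{M} = \underset{\widehat{\mathbf{M}}}{\arg\min}\; \|\mathbf{U}_{t^{(J)}} - \mathbf{F}\widehat{\mathbf{M}}\|_2^2 + \text{Pen}_\theta(\widehat{\mathbf{M}}), \tag{5}$$

are sought, where $\text{Pen}_\theta(\widehat{\mathbf{M}})$ generically denotes some penalty term based on parameters $\theta$ (e.g., $\text{Pen}_\theta(\widehat{\mathbf{M}}) = \lambda\|\widehat{\mathbf{M}}\|_1$ with $\theta = \lambda$ for the LASSO penalty).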
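
As a minimal sketch of how a solution to (5) might be computed under the LASSO penalty, the code below uses proximal gradient descent (ISTA). This solver is one generic choice and is not attributed to any of the methods reviewed here. The function names, the simulated states, the value of the penalty parameter, and the "true" coefficient matrix are all hypothetical, and the response is simulated directly from (4) rather than obtained by numerical differentiation.

```python
import numpy as np

def soft_threshold(Z, thresh):
    """Proximal operator of thresh * ||.||_1, applied elementwise."""
    return np.sign(Z) * np.maximum(np.abs(Z) - thresh, 0.0)

def lasso_ista(F, Y, lam, n_iter=500):
    """Minimize ||Y - F M||_2^2 + lam * ||M||_1 by proximal gradient (ISTA)."""
    # Lipschitz constant of the gradient of the squared-error term.
    lipschitz = 2.0 * np.linalg.norm(F, 2) ** 2
    step = 1.0 / lipschitz
    M = np.zeros((F.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = 2.0 * F.T @ (F @ M - Y)
        M = soft_threshold(M - step * grad, step * lam)
    return M

# Simulated example (illustrative): N = 2 components, D = 6 library terms.
rng = np.random.default_rng(1)
T = 2000
u1, u2 = rng.normal(size=(2, T))
F = np.column_stack([np.ones(T), u1, u2, u1 ** 2, u1 * u2, u2 ** 2])

M_true = np.zeros((6, 2))
M_true[1, 0] = -1.0   # u1 term in the first equation
M_true[4, 0] = 2.0    # u1*u2 term in the first equation
M_true[2, 1] = 0.5    # u2 term in the second equation

Y = F @ M_true + rng.normal(scale=0.01, size=(T, 2))  # stands in for the response
M_hat = lasso_ista(F, Y, lam=20.0)
print(np.round(M_hat, 2))
```

The soft-thresholding step is what sets small coefficients exactly to zero, so the non-zero pattern of the estimate indicates which library terms are retained. The penalty also shrinks the retained coefficients slightly toward zero, which is a known property of the LASSO.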

2.1 Deterministic Approaches

The majority of deterministic approaches are composed of three steps: denoising and differentiation, construction of a feature library, and sparse regression. Assuming data have been properly differentiated and a library has been proposed, the deterministic approach seeks solutions of the form (5). The original sparse regression approach to data-driven discovery, Sparse Identification of Nonlinear Dynamics (SINDy; Brunton et al., 2016), uses sequential threshold
