Statistical and machine learning approaches for prediction of long-time excitation
energy transfer dynamics
Kimara Naicker,1, 2, Ilya Sinayskiy,1, 2 and Francesco Petruccione1, 2, 3
1Quantum Research Group, School of Chemistry and Physics,
University of KwaZulu-Natal, Durban, KwaZulu-Natal, 4001, South Africa
2National Institute for Theoretical and Computational Sciences (NITheCS), South Africa
3School of Data Science and Computational Thinking and Department of Physics,
Stellenbosch University, Stellenbosch, 7600, South Africa
(Dated: October 28, 2022)
One of the approaches used to solve for the dynamics of open quantum systems is the hierarchical equations of motion (HEOM). Although this method is numerically exact, it requires immense computational resources. The objective here is to determine whether models such as SARIMA, CatBoost, Prophet, and convolutional and recurrent neural networks can bypass this requirement. We demonstrate this by first solving the HEOM to generate a data set of time series that depict the dissipative dynamics of excitation energy transfer in photosynthetic systems; we then use these data to test the models' ability to predict the long-time dynamics when only the initial short-time dynamics is given. Our results suggest that the SARIMA model can serve as a computationally inexpensive yet accurate way to predict long-time dynamics.
I. INTRODUCTION
Time series analysis comprises methods for analysing a series of data points indexed in time order. Its objective is to collect and study past observations of a time series in order to develop an appropriate model that describes the inherent structure of the series. This model is then used to generate future values of the series, i.e. to make forecasts [1]. In this work, the data being analysed describe the dissipative dynamics of excitation energy transfer (EET) in systems operating in a regime similar to that of photosynthetic open quantum systems.
In some cases, information about the underlying dynamical correlations in open quantum systems can be encoded in the initial stages of their evolution. It may therefore be possible to obtain the long-time dynamics of an open quantum system from knowledge of its short-time evolution; if this conjecture holds, direct long-time simulations can be bypassed. Numerically exact methods for simulating the dynamics of open quantum systems often require immense computational resources that scale exponentially with the size of the system under study. It is therefore desirable to develop an approach that can accurately predict the long-time dynamics of open quantum systems while reducing, to some extent, the need for direct calculations.
Various numerical methods for the dynamics of open quantum systems have been developed to handle the complexity of system-bath interactions. The dynamics of an open quantum system, which depend on the system Hamiltonian, can be described by density-matrix-based approaches in the Liouville space of the system. The numerically exact formalism adopted in this study is the hierarchical equations of motion (HEOM), developed by Tanimura and Kubo and later adapted to biological light-harvesting complexes by Ishizaki and Fleming [2–4]. Machine learning (ML) has been applied to this area in many relevant cases [5–10]. L. E. Herrera et al. conducted a comparative study in which they benchmarked ML models by their efficiency in predicting the long-time dynamics of a two-level quantum system linearly coupled to a harmonic bath [9].
kimaranaicker@gmail.com
Successful time series forecasting depends on fitting an appropriate model, and efficient models that improve forecasting accuracy continue to be developed in the literature. Here, using Python, we compare the predictive capabilities of a standard statistical model, an additive regression model and a tree-based model against structurally more complex neural network models for simulating open quantum system dynamics. The first stage of the dynamics is obtained by solving the HEOM for a sufficiently large theoretical system; we then train and test suitable models to determine the validity of our approach, namely, to predict a time series efficiently from its own past values.
This paper is organized as follows: Sec. II describes the formalism used, Sec. III describes the data pre-processing procedure, Sec. IV covers the time series models used, and Sec. V presents our forecasting results, in terms of the mean squared error (MSE) obtained on the relevant data sets, together with a brief conclusion and the prospective future aims of this work.
II. THE THEORY AND THE DATA
A time series is a sequential set of data points measured over successive times. Mathematically, it is defined as a set of vectors $x(t)$, $t = 0, 1, 2, \ldots$, where $t$ represents the elapsed time [11]. The variable $x(t)$ can be treated as a random variable. The measurements taken during an event in a time series are arranged in chronological order.
arXiv:2210.14160v2 [quant-ph] 27 Oct 2022
Quantum systems encountered in the real world are rarely entirely isolated; hence, it is important to consider the influence of the surrounding environment (bath) when studying the dynamical behaviour of a system. By solving the HEOM for an open quantum system evolving in time, such as a photosynthetic pigment-protein complex, we can generate a set of time-dependent observables that depict the coherent movement of electronic excitations through the system. This section describes the theoretical background used to generate the data sets used in this study.
The total Hamiltonian is composed of the Hamiltonians of the system, the bath and the system-bath interaction,
$$\hat{H}_{\mathrm{Tot}} = \hat{H}_S + \hat{H}_B + \hat{H}_{SB}. \tag{1}$$
We focus on the simplest electronic energy transfer system, a dimer, for which the system Hamiltonian describes the electronic states of a complex containing two pigments,
$$\hat{H}_S = \sum_{j=1}^{2} \epsilon_j |j\rangle\langle j| + J_{12}\left(|1\rangle\langle 2| + |2\rangle\langle 1|\right), \tag{2}$$
where $\epsilon_j$ is the excited-state energy of the $j$th site and $J_{12}$ denotes the electronic coupling between the two sites. Here we consider each pigment to be coupled to a separate bath. The bath Hamiltonian represents the environmental phonons,
$$\hat{H}_B = \sum_{j=1}^{2} \hat{H}_{B_j}, \qquad \hat{H}_{B_j} = \sum_{\alpha} \frac{\hbar\omega_{j,\alpha}}{2}\left(\hat{p}_{j,\alpha}^2 + \hat{q}_{j,\alpha}^2\right), \tag{3}$$
where $\hat{p}_{j,\alpha}$ is the momentum conjugate to the dimensionless coordinate $\hat{q}_{j,\alpha}$, and $\omega_{j,\alpha}$ is the frequency of the $\alpha$th phonon mode of the $j$th site. The last term of Eq. (1) represents the fluctuations in the site energies caused by the phonon dynamics,
$$\hat{H}_{SB} = \sum_{j=1}^{2} \hat{u}_j |j\rangle\langle j|, \qquad \hat{u}_j = \sum_{\alpha} g_{j,\alpha}\hat{q}_{j,\alpha}, \tag{4}$$
where $g_{j,\alpha}$ is the coupling constant between the $j$th site and the $\alpha$th phonon mode.
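To make Eq. (2) concrete, the sketch below builds $\hat{H}_S$ as a matrix in the site basis $\{|1\rangle, |2\rangle, \ldots\}$, with energies in cm$^{-1}$. For the three- and four-site linear chains mentioned in Sec. III it assumes that only nearest neighbours are coupled; that assumption, the function name `chain_hamiltonian` and the numerical values are ours and illustrative, not taken from the authors' code.

```python
import numpy as np


def chain_hamiltonian(site_energies, couplings):
    """System Hamiltonian of Eq. (2), extended to a linear chain (assumption).

    site_energies: epsilon_j for each site, in cm^-1.
    couplings: nearest-neighbour couplings J_{j,j+1}, in cm^-1.
    """
    eps = np.asarray(site_energies, dtype=float)
    couplings = list(couplings)
    if len(couplings) != len(eps) - 1:
        raise ValueError("a linear chain of N sites has N - 1 nearest-neighbour couplings")
    H = np.diag(eps)                      # epsilon_j |j><j| on the diagonal
    for j, J in enumerate(couplings):
        H[j, j + 1] = H[j + 1, j] = J     # J (|j><j+1| + |j+1><j|)
    return H


# Dimer with illustrative values eps_1 = 100, eps_2 = 0, J_12 = 50 (cm^-1)
H = chain_hamiltonian([100.0, 0.0], [50.0])
print(np.linalg.eigvalsh(H))
```

For a dimer the exciton energies are $(\epsilon_1+\epsilon_2)/2 \pm \sqrt{(\epsilon_1-\epsilon_2)^2/4 + J_{12}^2}$, i.e. about $-20.71$ and $120.71$ cm$^{-1}$ for these illustrative values.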
The spectral density $J_j(\omega)$ specifies the coupling of an electronic transition of the $j$th pigment to the environmental phonons through the reorganization energy $\lambda_j$ and the phonon relaxation rate $\gamma_j$ (relaxation timescale $\gamma_j^{-1}$). Here it is taken to be an Ohmic spectral density with a Lorentz-Drude cut-off, $J_j(\omega) = 2\lambda_j\gamma_j\omega/(\omega^2 + \gamma_j^2)$.
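The Lorentz-Drude form and the unit conventions can be checked numerically. The minimal sketch below evaluates $J_j(\omega)$ with all energies in cm$^{-1}$ and converts the cut-off $\gamma = 53$ cm$^{-1}$ used in this work into a relaxation time. The function and constant names, and the value $\lambda = 35$ cm$^{-1}$, are illustrative choices of ours; the constants themselves are the standard values of $2\pi c$ and $k_B$ in spectroscopic units.

```python
import numpy as np

# Standard constants in spectroscopic units (names are ours):
CM_TO_RAD_PER_PS = 2 * np.pi * 0.0299792458  # 2*pi*c, with c = 0.0299792458 cm/ps
KB_CM_PER_K = 0.6950348                      # Boltzmann constant, cm^-1 per kelvin


def drude_lorentz(omega, lam, gamma):
    """Ohmic spectral density with Lorentz-Drude cut-off, 2*lam*gamma*w / (w^2 + gamma^2).

    omega, lam and gamma are all given in cm^-1, so J(omega) is in cm^-1 as well.
    """
    omega = np.asarray(omega, dtype=float)
    return 2.0 * lam * gamma * omega / (omega ** 2 + gamma ** 2)


gamma = 53.0  # cut-off frequency used in this work, cm^-1
print(1.0 / (gamma * CM_TO_RAD_PER_PS))  # relaxation time 1/gamma in ps (about 0.1)
print(gamma / (KB_CM_PER_K * 300.0))     # hbar*gamma / (k_B T) at 300 K (about 0.25)
print(drude_lorentz(gamma, lam=35.0, gamma=gamma))  # J peaks at omega = gamma, where J = lam
```

The first number corresponds to a phonon relaxation time of roughly 100 fs, and the second shows that $\hbar\gamma_j/k_BT < 1$ at 300 K, which is the high-temperature condition imposed in the HEOM below.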
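We focus on the application of this theory to EET at physiological temperatures of around 300 K; hence, the high-temperature condition $\hbar\gamma_j/k_BT < 1$ is imposed, and the following hierarchically coupled equations of motion are obtained [4],
$$
\begin{aligned}
\frac{\partial}{\partial t}\hat{\sigma}^{(n_1,n_2)}(t) = {}& -\left(i\hat{\mathcal{L}}_e + n_1\gamma_1 + n_2\gamma_2\right)\hat{\sigma}^{(n_1,n_2)}(t) \\
&+ \hat{\Phi}_1\hat{\sigma}^{(n_1+1,n_2)}(t) + n_1\hat{\Theta}_1\hat{\sigma}^{(n_1-1,n_2)}(t) \\
&+ \hat{\Phi}_2\hat{\sigma}^{(n_1,n_2+1)}(t) + n_2\hat{\Theta}_2\hat{\sigma}^{(n_1,n_2-1)}(t).
\end{aligned}
\tag{5}
$$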
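In Eq. (5), the element $\hat{\sigma}^{(0,0)}(t) \equiv \hat{\sigma}(\mathbf{0}, t)$ is identical to the reduced density operator $\hat{\rho}(t)$, while the remaining elements are auxiliary density operators. The Liouvillian corresponding to the Hamiltonian $\hat{H}_S$, denoted by $\hat{\mathcal{L}}_e$, and the relaxation operators $\hat{\Phi}_j$ and $\hat{\Theta}_j$ are given by Eqs. (6), (7) and (8), respectively,
$$\hat{\mathcal{L}}_e\hat{\rho}_S = \frac{1}{\hbar}\left[\hat{H}_S, \hat{\rho}_S\right], \tag{6}$$
$$\hat{\Phi}_j = iV_j^{\times}, \qquad V_j^{\times}y = [V_j, y], \tag{7}$$
$$\hat{\Theta}_j = i\left(\frac{2\lambda_j}{\beta\hbar^2}V_j^{\times} - i\frac{\lambda_j}{\hbar}\gamma_j V_j^{\circ}\right), \qquad V_j^{\circ}y = \{V_j, y\}, \tag{8}$$
where $V_j = |j\rangle\langle j|$ and $\beta = 1/k_BT$.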
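Formally, the hierarchy in Eq. (5) is infinite and cannot be integrated numerically. To make the problem tractable, the hierarchy is terminated at a finite depth. There are several ways of doing so; in this work we adopt the termination condition of Ishizaki and Fleming [12]. For the integers $\mathbf{n} = (n_1, n_2)$ and the characteristic frequency $\omega_e$ of $\hat{\mathcal{L}}_e$, whenever
$$\mathcal{N} \equiv \sum_{j=1}^{2} n_j \gg \frac{\omega_e}{\min(\gamma_1, \gamma_2)}, \tag{9}$$
Eq. (5) is replaced by
$$\frac{\partial}{\partial t}\hat{\sigma}(\mathbf{n}, t) = -i\hat{\mathcal{L}}_e\hat{\sigma}(\mathbf{n}, t). \tag{10}$$
For a system of $N$ sites, each coupled to its own bath, Eq. (5) takes the form
$$\frac{\partial}{\partial t}\hat{\sigma}(\mathbf{n}, t) = -\left(i\hat{\mathcal{L}}_e + \sum_{j=1}^{N} n_j\gamma_j\right)\hat{\sigma}(\mathbf{n}, t) + \sum_{j=1}^{N}\left[\hat{\Phi}_j\hat{\sigma}(\mathbf{n}_{j+}, t) + n_j\hat{\Theta}_j\hat{\sigma}(\mathbf{n}_{j-}, t)\right], \tag{11}$$
where $\mathbf{n}_{j\pm}$ differs from $\mathbf{n}$ only in its $j$th entry, $n_j \to n_j \pm 1$.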
Eq. (11) is the general form of the reduced hierarchy equation, Eq. (5). It is solved in the following section to generate the time series used to train and test the models in the subsequent sections for systems containing more than two sites.
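As an illustration of how Eqs. (5)-(8) become a single linear system, here is a minimal Python sketch for the dimer. It is not the authors' implementation. It vectorizes each density operator row by row, stacks every $\hat{\sigma}^{(n_1,n_2)}$ with $n_1 + n_2 \le$ `depth` into one vector, and integrates with a fixed-step fourth-order Runge-Kutta scheme. In place of the termination condition of Eqs. (9)-(10) it uses a simpler, commonly used rule: operators deeper than `depth` are set to zero. It further assumes $\gamma_1 = \gamma_2$, $\lambda_1 = \lambda_2$ and the initial state $\hat{\rho}(0) = |1\rangle\langle 1|$; the function name and default arguments are illustrative.

```python
import itertools
import numpy as np

CM_TO_RAD_PER_PS = 2 * np.pi * 0.0299792458  # E/hbar in rad/ps for E given in cm^-1
KB_CM_PER_K = 0.6950348                      # Boltzmann constant in cm^-1 / K


def dimer_heom(eps, J, lam, gamma=53.0, T=300.0, depth=6, t_max=1.0, dt=1e-3):
    """Integrate Eq. (5) for a dimer with RK4; return (times in ps, site populations).

    Energies in cm^-1. Simplification: operators with n1 + n2 > depth are set to
    zero instead of applying Eq. (10).
    """
    d, D = 2, 4
    I = np.eye(d)
    H = np.array([[eps[0], J], [J, eps[1]]]) * CM_TO_RAD_PER_PS  # H_S / hbar
    # Row-major vectorization: vec(A X B) = kron(A, B.T) vec(X)
    L_e = np.kron(H, I) - np.kron(I, H.T)            # Eq. (6): rho -> [H_S, rho] / hbar
    g = gamma * CM_TO_RAD_PER_PS
    lam_w, kT_w = lam * CM_TO_RAD_PER_PS, KB_CM_PER_K * T * CM_TO_RAD_PER_PS
    Phi, Theta = [], []
    for j in range(d):
        V = np.zeros((d, d))
        V[j, j] = 1.0                                # V_j = |j><j|
        Vx = np.kron(V, I) - np.kron(I, V.T)         # commutator superoperator V_j^x
        Vo = np.kron(V, I) + np.kron(I, V.T)         # anticommutator superoperator V_j^o
        Phi.append(1j * Vx)                                                # Eq. (7)
        Theta.append(1j * (2 * lam_w * kT_w * Vx - 1j * lam_w * g * Vo))  # Eq. (8)
    idx = [n for n in itertools.product(range(depth + 1), repeat=d) if sum(n) <= depth]
    pos = {n: k for k, n in enumerate(idx)}          # idx[0] == (0, 0) is rho itself
    G = np.zeros((len(idx) * D, len(idx) * D), dtype=complex)
    for n, k in pos.items():
        row = slice(k * D, (k + 1) * D)
        G[row, row] = -1j * L_e - sum(n) * g * np.eye(D)
        for j in range(d):
            up = tuple(m + (i == j) for i, m in enumerate(n))
            if up in pos:                            # Phi_j sigma(n_{j+})
                G[row, pos[up] * D:(pos[up] + 1) * D] += Phi[j]
            if n[j] > 0:                             # n_j Theta_j sigma(n_{j-})
                down = tuple(m - (i == j) for i, m in enumerate(n))
                G[row, pos[down] * D:(pos[down] + 1) * D] += n[j] * Theta[j]
    y = np.zeros(len(idx) * D, dtype=complex)
    y[0] = 1.0                                       # rho_11(0) = 1, all auxiliaries zero
    steps = int(round(t_max / dt))
    pops = np.empty((steps + 1, d))
    for s in range(steps + 1):
        pops[s] = y[:D].reshape(d, d).diagonal().real
        if s < steps:                                # classical 4th-order Runge-Kutta step
            k1 = G @ y
            k2 = G @ (y + 0.5 * dt * k1)
            k3 = G @ (y + 0.5 * dt * k2)
            k4 = G @ (y + dt * k3)
            y = y + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return np.arange(steps + 1) * dt, pops


t, p = dimer_heom(eps=(100.0, 0.0), J=100.0, lam=35.0)  # illustrative parameters
```

Because every term acting on $\hat{\sigma}^{(0,0)}$ is a commutator, the populations sum to one at every step, which is a useful sanity check; with `lam=0.0` the auxiliary operators stay zero and the populations reduce to coherent oscillations. The columns of `p` are the kind of population time series used as model input in the following sections.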
III. DATA PRE-PROCESSING
Building a representative data set is an important first step in every machine learning project. Although the approach developed in this work can be generalized, we discuss the simplest electronic energy transfer system, a dimer (a spin-boson-type model [13]) or two-level system, as well as three- and four-level systems arranged in linear-chain configurations. The total system under study is fully determined by five independent energy scales: the site energy $\epsilon_j$, the coupling strength $J_{jk}$, the reorganization energy $\lambda$, and the cut-off frequency $\omega_c$ (the $\gamma_j$ of the spectral density) and thermal energy $k_BT$ of the bath. To create a data set suitable for a framework aimed at predicting quantum dynamics in all physically realizable non-Markovian regimes, three parameters were extensively sampled while two were fixed: the cut-off frequency, $\omega_c = 53\ \mathrm{cm^{-1}}$, and the thermal energy $k_BT$ of the bath, with $T = 300$ K. These fixed values are typical of photosynthetic EET [14, 15]; the ranges of the three sampled parameters are given in Table I.
FIG. 1: An example of a sequence generated by the HEOM for a dimer, which is split to form the input and output data for the models.
Parameter       Lower limit (cm^-1)    Upper limit (cm^-1)
$\epsilon_j$    -100                   100
$J_{jk}$        -100                   100
$\lambda$       1                      100
TABLE I: The data set of time-evolved reduced density matrices is generated for all combinations of the following parameters: the site energy $\epsilon_j$, the coupling strength $J_{jk}$ and the reorganization energy $\lambda$.
A Python implementation of the HEOM is used to solve Eq. (5). The hierarchy truncation depth is set to 20, which is sufficient for the chosen cut-off frequency [12]. Time series data must be collected at uniform time intervals. The total propagation time is set to 1.0 ps, which is sufficient for observing coherent dynamics in photosynthetic EET [12]. For each of the 40000 samples, observables based on the diagonal elements of the time-dependent density matrices, i.e. the time evolution of the site populations, are collected. The generated observables are divided into shorter trajectories, or sequences, to test the capability of the models to predict over varying output times.
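The computational cost that motivates this work can be made concrete by counting the density operators kept at a given truncation depth. Assuming, as in Eq. (11), one hierarchy index per site, and reading "truncation set to 20" as keeping every $\mathbf{n}$ with $n_1 + \cdots + n_N \le 20$ (our interpretation), the count is the binomial coefficient $\binom{20+N}{N}$. The function names below are illustrative.

```python
import itertools
from math import comb


def n_auxiliary(n_sites, depth):
    """Number of operators sigma(n), rho included, with n_1 + ... + n_N <= depth."""
    return comb(depth + n_sites, n_sites)  # stars-and-bars count


def n_auxiliary_by_enumeration(n_sites, depth):
    """Brute-force check: enumerate every index vector n and count those kept."""
    return sum(1 for n in itertools.product(range(depth + 1), repeat=n_sites)
               if sum(n) <= depth)


for n_sites in (2, 3, 4):
    k = n_auxiliary(n_sites, depth=20)
    print(n_sites, k, k * n_sites ** 2)  # operators, and complex matrix entries stored
```

Under these assumptions, two, three and four sites require 231, 1771 and 10626 operators respectively, each an $N \times N$ matrix propagated over the full 1.0 ps for every sample.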
One of our objectives is to determine the trade-off between how far ahead a model can forecast and the shortest input time required to maintain high forecast accuracy. Because the total propagation time is fixed at 1.0 ps, the actual size of the data set used varies with the lengths of the input and output times. In testing the models, we performed a grid search over input times of up to 0.2 ps and output times between 0.01 ps and 0.6 ps.
Before testing, each sample time series was split into multiple shorter slices. For example, splitting a single series into inputs 0.2 ps long and outputs 0.6 ps long generates 1001 shorter sequences. This sliding-window technique is illustrated in Figure 2.
FIG. 2: The use of preceding time steps to predict the following time steps is referred to as the sliding-window process. In the figure, the blue portion represents the points in the series used for training and the yellow portion those used for testing the models. Each training/testing set differs because the "window" is moved forward over the entire time series while the lengths of the portions are kept fixed.
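A minimal sketch of the sliding-window split of Figure 2, assuming a uniformly sampled trajectory stored as a NumPy array whose first axis is time (for example a `(time, site)` array of populations). The function name and the `stride` parameter are illustrative. With unit stride a trajectory of $N_t$ points yields $N_t - n_{\mathrm{in}} - n_{\mathrm{out}} + 1$ windows, which is why the data-set size changes with the input and output lengths.

```python
import numpy as np


def sliding_windows(series, n_in, n_out, stride=1):
    """Split one trajectory into (input, target) pairs with a sliding window.

    Each window takes n_in consecutive points as input and the following n_out
    points as the target, then moves forward by `stride` points.
    """
    x = np.asarray(series)
    n_windows = (len(x) - n_in - n_out) // stride + 1
    if n_windows <= 0:
        raise ValueError("series too short for the requested window")
    starts = np.arange(n_windows) * stride
    X = np.stack([x[s:s + n_in] for s in starts])                # model inputs
    Y = np.stack([x[s + n_in:s + n_in + n_out] for s in starts])  # values to forecast
    return X, Y


X, Y = sliding_windows(np.arange(10), n_in=3, n_out=2)
print(X.shape, Y.shape)  # (6, 3) (6, 2)
```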
The data set is partitioned into a training set of 70% of the data and a validation set of 10% of the data. Additionally, 20% of the data is held out during the training procedure and is used for testing.
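The 70/10/20 partition can be sketched as below. The text does not state whether whole trajectories or individual windows are partitioned, nor how samples are shuffled; this sketch assumes that whole trajectories are shuffled with a fixed seed and then split, which keeps windows from the same trajectory out of more than one subset. The function name and seed are illustrative.

```python
import numpy as np


def split_trajectories(n_samples, train=0.7, val=0.1, seed=0):
    """Shuffle trajectory indices and split them into train/validation/test sets.

    Whatever remains after the training and validation fractions (here 20%)
    becomes the held-out test set.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(train * n_samples))
    n_val = int(round(val * n_samples))
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]


train_idx, val_idx, test_idx = split_trajectories(40000)
print(len(train_idx), len(val_idx), len(test_idx))
```

For the 40000 samples this yields 28000 training, 4000 validation and 8000 test trajectories, which can then be cut into windows independently.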
IV. METHODOLOGY
In practice, a suitable model is fitted to a given time series, and the parameters of the underlying model are estimated from the known data values. The procedure of fitting a time series to a proper model is known as time series analysis. It comprises methods that attempt to understand the nature of the series and is often useful for forecasting and simulation. Past observations are collected and analysed to build a suitable mathematical model that captures the underlying data-generating process of the series. Then, in time series forecasting, future events are predicted using the model [16, 17].
Competent time series analysis remains dominated by traditional statistical methods as well as simpler machine learning techniques such as ensembles of trees and linear fits.
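As a simple instance of a linear statistical forecaster, the sketch below fits an autoregressive AR($p$) model with an intercept by least squares, forecasts recursively by feeding each prediction back as an input, and scores the forecast with the MSE. This is a deliberately simplified stand-in, not the SARIMA model of this work: SARIMA adds differencing, moving-average and seasonal terms (available, for example, through the `SARIMAX` class of the `statsmodels` package). The function names and the synthetic series are illustrative.

```python
import numpy as np


def fit_ar(series, p):
    """Least-squares fit of x_t = c + a_1 x_{t-1} + ... + a_p x_{t-p}."""
    x = np.asarray(series, dtype=float)
    # Row t of the design matrix holds [1, x_{t-1}, ..., x_{t-p}]
    X = np.column_stack([np.ones(len(x) - p)] +
                        [x[p - k:len(x) - k] for k in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
    return coef  # [c, a_1, ..., a_p]


def forecast_ar(history, coef, steps):
    """Recursive multi-step forecast: each prediction becomes the next input."""
    p = len(coef) - 1
    buf = list(np.asarray(history, dtype=float)[-p:])
    out = []
    for _ in range(steps):
        nxt = coef[0] + sum(coef[k] * buf[-k] for k in range(1, p + 1))
        out.append(nxt)
        buf.append(nxt)
    return np.array(out)


def mse(y_true, y_pred):
    """Mean squared error between the true and predicted continuations."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))


# Synthetic damped oscillation around 0.5, loosely resembling a site population
t = np.arange(150)
series = 0.5 + 0.5 * 0.97 ** t * np.cos(0.3 * t)
coef = fit_ar(series[:50], p=2)                   # short-time input: first 50 points
pred = forecast_ar(series[:50], coef, steps=100)  # long-time forecast: next 100 points
print(mse(series[50:], pred))
```

A damped cosine obeys an AR(2) recursion exactly, so in this noise-free example the forecast reproduces the series to rounding error; real HEOM populations are not exactly autoregressive, which is where richer models such as SARIMA come in.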