Characterization of anomalous diffusion through
convolutional transformers
Nicolás Firbas1, Òscar Garibo-i-Orts2, Miguel Ángel Garcia-March3, J. Alberto Conejero3
1DBS - Department of Biological Sciences, National University of Singapore, 16
Science Drive 4, Singapore 117558, Singapore.
2VRAIN - Valencian Research Institute for Artificial Intelligence, Universitat
Politècnica de València, 46022 València, Spain.
3IUMPA - Instituto Universitario de Matemática Pura y Aplicada, Universitat
Politècnica de València, 46022 València, Spain.
E-mail: aconejero@upv.es (Corresponding author).
October 20, 2022
Abstract.
The results of the Anomalous Diffusion Challenge (AnDi Challenge) [30] have shown
that machine learning methods can outperform classical statistical methodology at the
characterization of anomalous diffusion in both the inference of the anomalous diffusion
exponent α associated with each trajectory (Task 1), and the determination of the
underlying diffusive regime which produced such trajectories (Task 2). Furthermore,
of the five teams that finished in the top three across both tasks of the AnDi challenge,
three of those teams used recurrent neural networks (RNNs). While RNNs, like
the long short-term memory (LSTM) network, are effective at learning long-term
dependencies in sequential data, their key disadvantage is that they must be trained
sequentially. In order to facilitate training with larger data sets, by training in
parallel, we propose a new transformer-based neural network architecture for the
characterization of anomalous diffusion. Our new architecture, the Convolutional
Transformer (ConvTransformer), uses a bi-layered convolutional neural network to
extract features from our diffusive trajectories that can be thought of as being words
in a sentence. These features are then fed to two transformer encoding blocks that
perform either regression (Task 1) or classification (Task 2). To our knowledge, this
is the first time transformers have been used for characterizing anomalous diffusion.
Moreover, this may be the first time that a transformer encoding block has been used
with a convolutional neural network and without the need for a transformer decoding
block or positional encoding. Apart from being able to train in parallel, we show that
the ConvTransformer is able to outperform the previous state of the art at determining
the underlying diffusive regime (Task 2) in short trajectories (10-50 steps long), which
are the most important for experimental researchers.
Keywords: anomalous diffusion, machine learning, recurrent neural networks,
convolutional networks, transformers, attention
1. Introduction
It could be said that the study of diffusion began in 1827 when Brown first observed the
motion, which now bears his name, of pollen from Clarkia pulchella suspended in
water [5]. This movement results from small particles being bombarded by the molecules
of the liquid in which they are suspended, as was first conjectured by Einstein and later
verified by Perrin [32]. Though Brown never managed to explain the movement he
observed, we now know that Brownian motion is a kind of normal diffusion.
To describe diffusion, we can consider the following analogy. Let us imagine the
particle being an ant, or some other diminutive explorer; we can then think of the mean
squared displacement (MSD), written as ⟨x²⟩, as the portion of the system that it has
explored. For normal diffusion, such as Brownian motion, the relation between the
explored region and time is linear, ⟨x²⟩ ∼ t: as time progresses, the rate at which our
ant explores new territory remains constant. In contrast to normal diffusion, anomalous
diffusion is characterized by ⟨x²⟩ ∼ t^α with α ≠ 1. Anomalous diffusion can be further
subdivided into super-diffusion and sub-diffusion, when α > 1 or α < 1, respectively.
To continue with the analogy of our ant, an intuitive example of sub-diffusion would be
diffusion on a fractal. In this case, it is easy to see how, as time progresses and our ant
ventures into zones of increasing complexity, its movement will in turn be slowed. Thus
the relationship between explored space and time will be ⟨x²⟩ ∼ t^α with α < 1.
Conversely, if we give our ant wings and have it take flight at random times t_i sampled
from a power-law distribution ∼ t^(−σ−1), with flight times positively correlated to
the wait times, then for σ ∈ (0, 2) we would have a super-diffusive Lévy flight trajectory.
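To make the scaling relation concrete, the following minimal sketch (ours, not taken
from the paper) estimates α for a single one-dimensional trajectory by fitting the
time-averaged MSD on a log-log scale; the function name, the lag range, and the
Brownian-motion example are illustrative choices.

```python
# Minimal sketch: estimate the anomalous diffusion exponent alpha from a
# single 1D trajectory via the time-averaged MSD, using MSD(t) ~ t^alpha,
# i.e. log MSD is linear in log t.
import numpy as np

def estimate_alpha(x, max_lag=None):
    """Estimate alpha from a 1D trajectory x (array of positions)."""
    n = len(x)
    max_lag = max_lag or n // 4          # short lags only; long lags are noisy
    lags = np.arange(1, max_lag + 1)
    # time-averaged MSD for each lag
    msd = np.array([np.mean((x[lag:] - x[:-lag]) ** 2) for lag in lags])
    # slope of the log-log fit is the anomalous diffusion exponent
    alpha, _ = np.polyfit(np.log(lags), np.log(msd), 1)
    return alpha

# Example: ordinary Brownian motion should yield alpha close to 1.
rng = np.random.default_rng(0)
print(estimate_alpha(np.cumsum(rng.normal(size=1000))))
```

As discussed below, this kind of fit degrades quickly for short or noisy trajectories,
which is precisely the regime that learning-based methods target.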
Since the discovery of Brownian motion, many systems have shown diffusive
behavior that deviates from the normal one, where MSD scales linearly with time. These
systems can range from the atomic scale to complex organisms such as birds. Examples
of such diffusive systems include ultra-cold atoms [33], telomeres in the nuclei of cells [4],
moisture transport in cement-based materials, the free movement of arthropods [31],
and the migration patterns of birds [39]. Anomalous diffusive patterns can even be
observed in signals that are not directly related to movement, such as heartbeat intervals
and DNA [6, pp. 49-89]. The interdisciplinary scope of anomalous diffusion highlights
the need for modeling frameworks that are able to quickly and accurately characterize
diffusion in real-life scenarios, where data is often limited and noisy.
Despite the importance of anomalous diffusion in many fields of study [23],
detection and characterization remain difficult to this day. Traditionally, mean squared
displacement (MSD(t) ∼ t^α) and its anomalous diffusion exponent α have been used to
characterize diffusion. In practice, computing the MSD is often challenging, as we
typically work with a limited number of points per trajectory, which may be short and/or
noisy; this highlights the need for methods that are robust under real-world conditions.
The problem with using α to characterize anomalous diffusion is that trajectories often
have the same anomalous diffusion exponent while having different underlying diffusive regimes. An
example would be the motion of messenger RNA (mRNA) in a living E. coli cell: the
individual mRNA trajectories share roughly the same α despite being quite distinct [26].
Being able to classify trajectories based on their underlying diffusive regime is
useful because it can shed light on the underlying behavior of the particles undergoing
diffusion. This may be especially important for experimental researchers, who are often
more concerned with how a particle moves than with how much it has moved.
In this vein, the AnDi (Anomalous Diffusion) Challenge organizers identified the
following five diffusive models with which to classify trajectories [28]: the
continuous-time random walk (CTRW) [34], fractional Brownian motion (FBM) [22],
the Lévy walk (LW) [15], annealed transient time motion (ATTM) [24], and scaled
Brownian motion (SBM) [18]. This information is not meant to supplant traditional MSD-based analysis;
rather, it is meant to give us additional information about the underlying stochastic
process behind the trajectory. For example, for a particular exponent α, one may not
have access to an ensemble of homogeneous trajectories. Moreover, one cannot assure
that all measured trajectories have the same behavior and can therefore be associated
with the same anomalous exponent α. In these cases, it may be possible to explain the
behavior of the diffusing particles by using what we know about the five models mentioned
above.
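As a concrete illustration of one of these models, the toy sketch below simulates a
subdiffusive continuous-time random walk: heavy-tailed waiting times between renewals,
Gaussian jumps, and sampling of the resulting piecewise-constant path on a regular time
grid. This is our own illustrative code, not the reference data generator distributed by
the AnDi Challenge organizers, and all parameter choices are arbitrary.

```python
# Toy CTRW generator (illustrative only): waiting times with a power-law tail,
# Gaussian jump lengths, trajectory observed at integer times 0..n_steps-1.
import numpy as np

def ctrw_trajectory(alpha=0.7, n_steps=1000, rng=None):
    rng = rng or np.random.default_rng()
    t, x = 0.0, 0.0
    times, positions = [0.0], [0.0]
    while t < n_steps:
        t += 1.0 + rng.pareto(alpha)     # heavy-tailed waiting time (min. 1)
        x += rng.normal()                # Gaussian jump at each renewal
        times.append(t)
        positions.append(x)
    # sample the piecewise-constant path on a regular time grid
    grid = np.arange(n_steps)
    idx = np.searchsorted(times, grid, side="right") - 1
    return np.asarray(positions)[idx]

traj = ctrw_trajectory(alpha=0.7)        # expected to be subdiffusive
```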
The first applications of machine learning methods to the study of diffusion aimed
to qualitatively discriminate among confined, anomalous, normal, and directed motion
[8, 16]. These ML models did not extract quantitative information, nor did they determine
the underlying physical model. Long short-term memory (LSTM) recurrent neural
networks [13] were first considered for the analysis of anomalous diffusion
trajectories from experimental data in [3]. Later, Muñoz-Gil et al. [27] computed the
distances between consecutive positions in raw trajectories and normalized them by
dividing by the standard deviation. Then, their cumulative sums were fed to random forest
algorithms to infer the anomalous exponent α and to classify each trajectory as one of
the CTRW, FBM, or LW models. Random forests and gradient boosting
methods were already considered for the study of fractional anomalous diffusion of
single-particle trajectories in [14, 20].
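A minimal sketch of this preprocessing, as we read it from the description above (the
exact recipe in [27] may differ in its details): take consecutive displacements, normalize
them by their standard deviation, and accumulate them back into a normalized trajectory
that can then be fed to a random forest or another estimator.

```python
# Sketch of the displacement normalization described above; names are ours.
import numpy as np

def normalized_cumsum(trajectory):
    disp = np.diff(trajectory)           # distances between consecutive positions
    disp = disp / np.std(disp)           # normalize by the standard deviation
    return np.cumsum(disp)               # cumulative sum -> normalized trajectory

raw = np.cumsum(np.random.default_rng(1).normal(size=200))
features = normalized_cumsum(raw)        # input to a random forest, for example
```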
The results of the AnDi Challenge [30] showed that machine learning (ML)
algorithms outperform traditional statistical techniques in the inference of the
anomalous diffusion exponent (Task 1) and in the classification of the underlying
diffusion model (Task 2), across one, two, and three dimensions. Some of the most
successful techniques consisted of: a couple of convolutional layers combined with several
bidirectional LSTM layers and a final dense layer [9], two LSTM layers of decreasing
size with a final dense layer [1], a WaveNet encoder with LSTM layers [17], or the
extraction of classical statistical features combined with three deep feed-forward neural
networks [10].
As we can see, the best performing methods from the AnDi Challenge were either
entirely based on LSTM recurrent neural networks or incorporated them as part of
a larger architecture. For many years, LSTMs have been one of the most successful
techniques in natural language processing (NLP) and time series analysis. As a matter
of fact, the Google Translate algorithm is a stack of just seven large LSTM layers [41].
However, since the landmark paper Attention is All You Need [38], transformers have
become the dominant architecture in NLP, where they have surpassed previous models
based on convolutions and recurrent neural networks [40]. Inspired by the transformers’
success and by drawing a parallel between the sequential nature of language and the
diffusion of a single particle, we propose a new architecture combining convolutional
layers with transformer encoders, which we call the Convolutional Transformer
(ConvTransformer).
1.1. The Convolutional Transformer
The ConvTransformer has been applied to both the inference of the anomalous
diffusion exponent α (Task 1) and the determination of the underlying diffusion model
(Task 2). As the name suggests, the ConvTransformer uses two convolutional layers
followed by a transformer encoding stage. However, unlike the transformer in [38], our
method uses only two transformer encoding blocks in sequence, without a transformer
decoding block or positional encoding. The convolutional layers behave as an encoder,
extracting both linear and non-linear features from the trajectory while retaining
spatiotemporal awareness, which eliminates the need for positional encoding. These features
are then passed to the transformer encoding layers where attention is performed upon
them. The ConvTransformer structure can be intuitively understood if we consider a
single trajectory to be akin to a sentence. In this analogy, the convolutional layers
create pseudo-words, which are the features they produce. Finally, we perform
attention twice on the pseudo-words with our transformer encoder, which allows us to
determine which features are most important; from there we obtain either our estimate
of α or the underlying diffusive regime.
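A minimal PyTorch sketch of the architecture as described here is given below: two
one-dimensional convolutions produce the pseudo-words, two transformer encoder blocks
perform attention over them, and a small head carries out either regression (Task 1) or
classification (Task 2). The layer widths, kernel sizes, pooling step, and head are
placeholders rather than the authors' exact configuration.

```python
# Sketch of the ConvTransformer idea (sizes are placeholders, not the paper's).
import torch
import torch.nn as nn

class ConvTransformer(nn.Module):
    def __init__(self, d_model=64, n_heads=4, n_classes=5, task="classification"):
        super().__init__()
        # bi-layered convolutional feature extractor ("pseudo-words")
        self.conv = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, d_model, kernel_size=5, padding=2), nn.ReLU(),
        )
        # two transformer encoder blocks; no decoder, no positional encoding
        enc_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                               batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.head = nn.Linear(d_model, 1 if task == "regression" else n_classes)

    def forward(self, x):
        # x: (batch, length) 1D trajectories (e.g. normalized displacements)
        h = self.conv(x.unsqueeze(1))        # (batch, d_model, length)
        h = self.encoder(h.transpose(1, 2))  # attention over pseudo-words
        return self.head(h.mean(dim=1))      # pool over the sequence, then head

model = ConvTransformer(task="classification")
logits = model(torch.randn(8, 50))           # 8 trajectories of 50 steps
```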
The ConvTransformer does not require positional encoding because the CNN
kernel slides along the trajectory to create the features and, in doing so, captures
positional information before the transformer encoding block. This was assessed by
testing the ConvTransformer on Task 2 with five-fold validation on a data set of 50K
trajectories (32K for training, 8K for validation, and 10K for testing), using the same
set of hyper-parameters with and without the trigonometric encoding scheme used in
Vaswani et al. 2017 [38]. The five-fold validation showed that classification accuracy
decreased with positional encoding, from a mean of 75.66% (standard deviation 1.54)
without it to 72.39% (standard deviation 4.86) with it. Thus, positional encoding did
not improve ConvTransformer performance, and it was omitted from the model.
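For reference, the ablation above adds or removes the standard sinusoidal positional
encoding of Vaswani et al. [38] in front of the transformer encoder. A textbook
implementation of that encoding, which could be applied to the convolutional features
before attention, is sketched below; it is ours, not the authors' code.

```python
# Standard sinusoidal positional encoding (textbook form), used only for the
# ablation described above, where adding it reduced classification accuracy.
import math
import torch
import torch.nn as nn

class SinusoidalPositionalEncoding(nn.Module):
    def __init__(self, d_model, max_len=1000):
        super().__init__()
        pe = torch.zeros(max_len, d_model)
        pos = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                        * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(pos * div)   # even dimensions
        pe[:, 1::2] = torch.cos(pos * div)   # odd dimensions
        self.register_buffer("pe", pe)

    def forward(self, x):                    # x: (batch, length, d_model)
        return x + self.pe[: x.size(1)]
```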