Characterization of anomalous diffusion through
convolutional transformers
Nicolás Firbas1, Òscar Garibo-i-Orts2, Miguel Ángel Garcia-March3, J. Alberto Conejero3
1DBS - Department of Biological Sciences, National University of Singapore, 16
Science Drive 4, Singapore 117558, Singapore.
2VRAIN - Valencian Research Institute for Artificial Intelligence, Universitat
Politècnica de València, 46022 València, Spain.
3IUMPA - Instituto Universitario de Matemática Pura y Aplicada, Universitat
Politècnica de València, 46022 València, Spain.
E-mail: aconejero@upv.es (Corresponding author).
October 20, 2022
Abstract.
The results of the Anomalous Diffusion Challenge (AnDi Challenge) [30] have shown
that machine learning methods can outperform classical statistical methodology at the
characterization of anomalous diffusion in both the inference of the anomalous diffusion
exponent α associated with each trajectory (Task 1), and the determination of the
underlying diffusive regime which produced such trajectories (Task 2). Furthermore,
of the five teams that finished in the top three across both tasks of the AnDi challenge,
three of those teams used recurrent neural networks (RNNs). While RNNs, like
the long short-term memory (LSTM) network, are effective at learning long-term
dependencies in sequential data, their key disadvantage is that they must be trained
sequentially. In order to facilitate training with larger data sets, by training in
parallel, we propose a new transformer-based neural network architecture for the
characterization of anomalous diffusion. Our new architecture, the Convolutional
Transformer (ConvTransformer), uses a bi-layered convolutional neural network to
extract features from our diffusive trajectories that can be thought of as being words
in a sentence. These features are then fed to two transformer encoding blocks that
perform either regression (Task 1) or classification (Task 2). To our knowledge, this
is the first time transformers have been used for characterizing anomalous diffusion.
Moreover, this may be the first time that a transformer encoding block has been used
with a convolutional neural network and without the need for a transformer decoding
block or positional encoding. Apart from being able to train in parallel, we show that
the ConvTransformer is able to outperform the previous state of the art at determining
the underlying diffusive regime (Task 2) in short trajectories (10-50 steps long), which
are the most important for experimental researchers.
Keywords: anomalous diffusion, machine learning, recurrent neural networks,
convolutional networks, transformers, attention
1. Introduction
It could be said that the study of diffusion began in 1827 when Brown first observed the
motion, which now bears his name, of pollen from Clarkia pulchella suspended in
water [5]. This movement results from small particles being bombarded by the molecules
of the liquid in which they are suspended, as was first conjectured by Einstein and later
verified by Perrin [32]. Though Brown never managed to explain the movement he
observed, we now know that Brownian motion is a kind of normal diffusion.
To describe diffusion, we can consider the following analogy. Let us imagine the
particle being an ant, or some other diminutive explorer; we can then think of the mean
squared displacement (MSD), written as ⟨x²⟩, as the portion of the system that it has
explored. For normal diffusion, such as Brownian motion, the relation between the
explored region and time is linear, ⟨x²⟩ ∼ t: as time progresses, the rate at which our
ant explores new territory remains constant. In contrast to normal diffusion, anomalous
diffusion is characterized by ⟨x²⟩ ∼ t^α with α ≠ 1. Anomalous diffusion can be further
subdivided into super-diffusion and sub-diffusion, when α > 1 or α < 1, respectively.
To continue with the analogy of our ant, an intuitive example of sub-diffusion would be
diffusion on a fractal. In this case, it is easy to see how, as time progresses and our ant
ventures into zones of increasing complexity, its movement will in turn be slowed. Thus
the relationship between explored space and time will be ⟨x²⟩ ∼ t^α with α < 1.
Conversely, if we give our ant wings and have it take flight at random times t_i sampled
from a power-law distribution ∼ t^(−σ−1), with flight times positively correlated to
the wait times, then for σ ∈ (0, 2) we would have a super-diffusive Lévy flight trajectory.
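To make the scaling relation concrete, the following minimal sketch (ours, not taken
from the paper) estimates α for a single one-dimensional trajectory by fitting the
time-averaged MSD on a log-log scale; the function name, the lag range, and the
Brownian-motion example are illustrative choices.

```python
# Minimal sketch: estimate the anomalous diffusion exponent alpha from a
# single 1D trajectory via the time-averaged MSD, using MSD(t) ~ t^alpha,
# i.e. log MSD is linear in log t.
import numpy as np

def estimate_alpha(x, max_lag=None):
    """Estimate alpha from a 1D trajectory x (array of positions)."""
    n = len(x)
    max_lag = max_lag or n // 4          # short lags only; long lags are noisy
    lags = np.arange(1, max_lag + 1)
    # time-averaged MSD for each lag
    msd = np.array([np.mean((x[lag:] - x[:-lag]) ** 2) for lag in lags])
    # slope of the log-log fit is the anomalous diffusion exponent
    alpha, _ = np.polyfit(np.log(lags), np.log(msd), 1)
    return alpha

# Example: ordinary Brownian motion should yield alpha close to 1.
rng = np.random.default_rng(0)
print(estimate_alpha(np.cumsum(rng.normal(size=1000))))
```

As discussed below, this kind of fit degrades quickly for short or noisy trajectories,
which is precisely the regime that learning-based methods target.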
Since the discovery of Brownian motion, many systems have shown diffusive
behavior that deviates from the normal one, where MSD scales linearly with time. These
systems can range from the atomic scale to complex organisms such as birds. Examples
of such diffusive systems include ultra-cold atoms [33], telomeres in the nuclei of cells [4],
moisture transport in cement-based materials, the free movement of arthropods [31],
and the migration patterns of birds [39]. Anomalous diffusive patterns can even be
observed in signals that are not directly related to movement, such as heartbeat intervals
and DNA [6, pp. 49-89]. The interdisciplinary scope of anomalous diffusion highlights
the need for modeling frameworks that are able to quickly and accurately characterize
diffusion in real-life scenarios, where data is often limited and noisy.
Despite the importance of anomalous diffusion in many fields of study [23],
detection and characterization remain difficult to this day. Traditionally, mean squared
displacement (MSD(t) ∼ t^α) and its anomalous diffusion exponent α have been used to
characterize diffusion. In practice, computing the MSD is often challenging, as we
typically work with a limited number of points per trajectory, which may be short and/or
noisy; this highlights the need for methods that are robust under real-world conditions.
The problem with using α to characterize anomalous diffusion is that trajectories often
have the same anomalous diffusion exponent while having different underlying diffusive regimes. An
example would be the motion of messenger RNA (mRNA) in a living E. coli cell: the
individual mRNA trajectories share roughly the same α despite being quite distinct [26].
Being able to classify trajectories based on their underlying diffusive regime is
useful because it can shed light on the underlying behavior of the particles undergoing
diffusion. This may be especially important for experimental researchers, who are often
more concerned with how a particle moves than with how much it has moved.
In this vein, the AnDi (Anomalous Diffusion) Challenge organizers identified the
following five diffusive models with which to classify trajectories [28]: the
continuous-time random walk (CTRW) [34], fractional Brownian motion (FBM) [22],
the Lévy walk (LW) [15], annealed transient time motion (ATTM) [24], and scaled
Brownian motion (SBM) [18]. This information is not meant to supplant traditional MSD-based analysis;
rather, it is meant to give us additional information about the underlying stochastic
process behind the trajectory. For example, for a particular exponent α, one may not
have access to an ensemble of homogeneous trajectories. Moreover, one cannot assure
that all measured trajectories have the same behavior and can therefore be associated
with the same anomalous exponent α. In these cases, it may be possible to explain the
behavior of the diffusing particles by using what we know about the five models mentioned
above.
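As a concrete illustration of one of these models, the toy sketch below simulates a
subdiffusive continuous-time random walk: heavy-tailed waiting times between renewals,
Gaussian jumps, and sampling of the resulting piecewise-constant path on a regular time
grid. This is our own illustrative code, not the reference data generator distributed by
the AnDi Challenge organizers, and all parameter choices are arbitrary.

```python
# Toy CTRW generator (illustrative only): waiting times with a power-law tail,
# Gaussian jump lengths, trajectory observed at integer times 0..n_steps-1.
import numpy as np

def ctrw_trajectory(alpha=0.7, n_steps=1000, rng=None):
    rng = rng or np.random.default_rng()
    t, x = 0.0, 0.0
    times, positions = [0.0], [0.0]
    while t < n_steps:
        t += 1.0 + rng.pareto(alpha)     # heavy-tailed waiting time (min. 1)
        x += rng.normal()                # Gaussian jump at each renewal
        times.append(t)
        positions.append(x)
    # sample the piecewise-constant path on a regular time grid
    grid = np.arange(n_steps)
    idx = np.searchsorted(times, grid, side="right") - 1
    return np.asarray(positions)[idx]

traj = ctrw_trajectory(alpha=0.7)        # expected to be subdiffusive
```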
The first applications of machine learning methods to the study of diffusion aimed
to qualitatively discriminate among confined, anomalous, normal, and directed motion
[8, 16]. These ML models did not extract quantitative information, nor did they determine
the underlying physical model. Long short-term memory (LSTM) recurrent neural
networks [13] were first considered for the analysis of anomalous diffusion
trajectories from experimental data in [3]. Later, Muñoz-Gil et al. [27] computed the
distances between consecutive positions in raw trajectories and normalized them by
dividing by the standard deviation. Then, their cumulative sums were fed to random forest
algorithms to infer the anomalous exponent α and to classify each trajectory as one of
the CTRW, FBM, or LW models. Random forests and gradient boosting
methods were already considered for the study of fractional anomalous diffusion of
single-particle trajectories in [14, 20].
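A minimal sketch of this preprocessing, as we read it from the description above (the
exact recipe in [27] may differ in its details): take consecutive displacements, normalize
them by their standard deviation, and accumulate them back into a normalized trajectory
that can then be fed to a random forest or another estimator.

```python
# Sketch of the displacement normalization described above; names are ours.
import numpy as np

def normalized_cumsum(trajectory):
    disp = np.diff(trajectory)           # distances between consecutive positions
    disp = disp / np.std(disp)           # normalize by the standard deviation
    return np.cumsum(disp)               # cumulative sum -> normalized trajectory

raw = np.cumsum(np.random.default_rng(1).normal(size=200))
features = normalized_cumsum(raw)        # input to a random forest, for example
```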
The results of the AnDi Challenge [30] showed that machine learning (ML)
algorithms outperform traditional statistical techniques in the inference of the
anomalous diffusion exponent (Task 1) and in the classification of the underlying
diffusion model (Task 2), across one, two, and three dimensions. Some of the most
successful techniques consisted of: a couple of convolutional layers combined with several
bidirectional LSTM layers and a final dense layer [9], two LSTM layers of decreasing
size with a final dense layer [1], a WaveNet encoder with LSTM layers [17], or the
extraction of classical statistical features combined with three deep feed-forward neural
networks [10].
As we can see, the best performing methods from the AnDi Challenge were either
entirely based on LSTM recurrent neural networks or incorporated them as part of
a larger architecture. For many years, LSTMs have been one of the most successful
techniques in natural language processing (NLP) and time series analysis. As a matter
of fact, the Google Translate algorithm is a stack of just seven large LSTM layers [41].
However, since the landmark paper Attention is All You Need [38], transformers have
become the dominant architecture in NLP, where they have surpassed previous models
based on convolutions and recurrent neural networks [40]. Inspired by the transformers’
success and by drawing a parallel between the sequential nature of language and the
diffusion of a single particle, we propose a new architecture combining convolutional
layers with transformer encoders, which we call the Convolutional Transformer
(ConvTransformer).
1.1. The Convolutional Transformer
The ConvTransformer has been applied to both the inference of the anomalous
diffusion exponent α (Task 1) and the determination of the underlying diffusion model
(Task 2). As the name suggests, the ConvTransformer uses two convolutional layers
followed by a transformer encoding stage. However, unlike the transformer in [38], our
method uses only two transformer encoding blocks in sequence, without a transformer
decoding block or positional encoding. The convolutional layers behave as an encoder,
extracting both linear and non-linear features from the trajectory while retaining
spatiotemporal awareness, which eliminates the need for positional encoding. These features
are then passed to the transformer encoding layers where attention is performed upon
them. The ConvTransformer structure can be intuitively understood if we consider a
single trajectory to be akin to a sentence. In this analogy, the convolutional layers
create pseudo-words, which are the features they produce. Finally, we perform
attention twice on the pseudo-words with our transformer encoder, which allows us to
determine which features are most important; from there we obtain either our estimate
of α or the underlying diffusive regime.
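A minimal PyTorch sketch of the architecture as described here is given below: two
one-dimensional convolutions produce the pseudo-words, two transformer encoder blocks
perform attention over them, and a small head carries out either regression (Task 1) or
classification (Task 2). The layer widths, kernel sizes, pooling step, and head are
placeholders rather than the authors' exact configuration.

```python
# Sketch of the ConvTransformer idea (sizes are placeholders, not the paper's).
import torch
import torch.nn as nn

class ConvTransformer(nn.Module):
    def __init__(self, d_model=64, n_heads=4, n_classes=5, task="classification"):
        super().__init__()
        # bi-layered convolutional feature extractor ("pseudo-words")
        self.conv = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, d_model, kernel_size=5, padding=2), nn.ReLU(),
        )
        # two transformer encoder blocks; no decoder, no positional encoding
        enc_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                               batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.head = nn.Linear(d_model, 1 if task == "regression" else n_classes)

    def forward(self, x):
        # x: (batch, length) 1D trajectories (e.g. normalized displacements)
        h = self.conv(x.unsqueeze(1))        # (batch, d_model, length)
        h = self.encoder(h.transpose(1, 2))  # attention over pseudo-words
        return self.head(h.mean(dim=1))      # pool over the sequence, then head

model = ConvTransformer(task="classification")
logits = model(torch.randn(8, 50))           # 8 trajectories of 50 steps
```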
The ConvTransformer does not require positional encoding because the CNN
kernel slides along the trajectory to create the features and, in doing so, captures
positional information before the transformer encoding block. This was assessed by
testing the ConvTransformer on Task 2 with five-fold validation on a data set of 50K
trajectories (32K for training, 8K for validation, and 10K for testing), using the same
set of hyper-parameters with and without the trigonometric encoding scheme used in
Vaswani et al. 2017 [38]. The five-fold validation showed that classification accuracy
decreased with positional encoding, from a mean of 75.66% (standard deviation 1.54)
without it to 72.39% (standard deviation 4.86) with it. Thus, positional encoding did
not improve ConvTransformer performance, and it was omitted from the model.
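For reference, the ablation above adds or removes the standard sinusoidal positional
encoding of Vaswani et al. [38] in front of the transformer encoder. A textbook
implementation of that encoding, which could be applied to the convolutional features
before attention, is sketched below; it is ours, not the authors' code.

```python
# Standard sinusoidal positional encoding (textbook form), used only for the
# ablation described above, where adding it reduced classification accuracy.
import math
import torch
import torch.nn as nn

class SinusoidalPositionalEncoding(nn.Module):
    def __init__(self, d_model, max_len=1000):
        super().__init__()
        pe = torch.zeros(max_len, d_model)
        pos = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                        * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(pos * div)   # even dimensions
        pe[:, 1::2] = torch.cos(pos * div)   # odd dimensions
        self.register_buffer("pe", pe)

    def forward(self, x):                    # x: (batch, length, d_model)
        return x + self.pe[: x.size(1)]
```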