EmbryosFormer: Deformable Transformer and Collaborative
Encoding-Decoding for Embryos Stage Development Classification
Tien-Phat Nguyen1,4, Trong-Thang Pham*4, Tri Nguyen*7, Hieu Le*1, Dung Nguyen5,
Hau Lam6, Phong Nguyen1, Jennifer Fowler8, Minh-Triet Tran2,3,4, Ngan Le9
1FPT Software AI Center, Ho Chi Minh City, Vietnam
2University of Science, VNU-HCM; 3Vietnam National University, Ho Chi Minh City, Vietnam
4John von Neumann Institute, Vietnam National University, Ho Chi Minh City, Vietnam
5IVFMD, My Duc Phu Nhuan hospital, Ho Chi Minh City, Vietnam
6Olea Fertility, Vinmec Central Park International Hospital, Ho Chi Minh City, Vietnam
7HOPE Research Center, My Duc Hospital, Ho Chi Minh City, Vietnam
8Arkansas Economic Development Commission, Little Rock, AR USA 72202
9Department of Computer Science and Computer Engineering, University of Arkansas, Fayetteville, AR, USA 72703
*equal contribution
Abstract
The timing of cell divisions in early embryos during the
In-Vitro Fertilization (IVF) process is a key predictor of em-
bryo viability. However, observing cell divisions in Time-
Lapse Monitoring (TLM) is a time-consuming process and
highly depends on experts. In this paper, we propose Em-
bryosFormer, a computational model to automatically de-
tect and classify cell divisions from original time-lapse im-
ages. Our proposed network is designed as an encoder-
decoder deformable transformer with collaborative heads.
The transformer contracting path predicts per-image la-
bels and is optimized by a classification head. The trans-
former expanding path models the temporal coherency be-
tween embryo images to ensure monotonic non-decreasing
constraint and is optimized by a segmentation head. Both
contracting and expanding paths are synergetically learned
by a collaboration head. We have benchmarked our proposed
EmbryosFormer on two datasets: a public dataset of mouse
embryos up to the 8-cell stage and an in-house dataset of
human embryos up to the 4-cell stage. Source code:
https://github.com/UARK-AICV/Embryos.
1. Introduction
Fertility impairment affects approximately 80 million
people globally, with one in every six couples experiencing
infertility [1, 27]. This necessitates the use of IVF for
conceiving. During the IVF procedure, a patient is stimulated
to produce multiple oocytes. Then, a fraction of them
fertilize, and a smaller fraction continue to grow and develop
normally as embryos before being transferred into the
uterus. Because of the increased maternal and fetal risks as-
sociated with multi-fetal gestation, only one embryo with
the highest viability should be chosen for implantation at a
time [31, 32]. Clinically, embryologists select potential em-
bryos manually by considering the morphological features
and rate of development. Unlike the traditional monitoring
process, where embryos are taken out of the incubators at
discrete time points, Time-Lapse Monitoring (TLM)
techniques offer more comprehensive and uninterrupted
observation of embryo development: embryos are kept safely
in their culture environment without any external intervention
while built-in microscope systems periodically capture
images of the embryos inside [28]. However, TLM still
requires human expertise and experience. Consequently, re-
sults often come with variability and large labor expenses.
Therefore, there is an emerging demand for developing an
automated and time-effective tool to support embryologists
in the selection processes.
Embryo morphology is captured at discrete time points
in real-world settings. As a result, the characteristics
or position of the embryo can vary rapidly and unex-
pectedly from frame to frame. Recently, Deep Neural
Networks (DNNs), in particular Convolutional Neural
Networks (CNNs), have made significant progress in
providing decision-making solutions at the human-expert
level. Their
successes have been reported in different fields and
modalities across the diagnostic medical domain [35, 48],
such as chest X-ray abnormality recognition [9, 19, 18],
provision of tumor biomarkers on MRI images [12, 30, 46],
and organ structure analysis on MRI images [17, 20, 13, 41].
arXiv:2210.04615v1 [cs.CV] 7 Oct 2022
DNNs
have been recently applied to the task of classifying embryo
stage development. Existing methods [21, 23, 24, 25, 26]
consider time-lapse embryo videos as sequences of images
and utilize 2D CNNs to perform per-frame classification,
then apply a post-processing step with dynamic programming
to enforce that the predictions follow the monotonic
non-decreasing constraint. Such approaches neither handle
the high imbalance across classes nor consider temporal
information. Other works [24, 23] introduce two-stream
networks that leverage temporal information to address the
imbalance issue while incorporating the monotonic constraint
into the learning stage. Despite showing
promising results, these methods process a fixed-size frame
sequence at a time, which can miss the global context of the
entire video and also increases inference time.
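The dynamic-programming post-processing mentioned above can be sketched as follows. This is an illustrative minimal implementation, not code from the cited works, and the function name is hypothetical: given per-frame stage log-probabilities, it decodes the highest-scoring label sequence subject to the monotonic non-decreasing constraint.

```python
import numpy as np

def monotonic_decode(log_probs):
    """Decode the per-frame stage sequence maximizing total log-probability,
    subject to the constraint that stages never revert (non-decreasing).

    log_probs: (T, K) array of per-frame log-probabilities over K stages.
    Returns: length-T integer array of stage indices, non-decreasing in t.
    """
    T, K = log_probs.shape
    # dp[t, k] = best score of frames 0..t with frame t assigned stage k
    dp = np.full((T, K), -np.inf)
    back = np.zeros((T, K), dtype=int)  # best predecessor stage
    dp[0] = log_probs[0]
    for t in range(1, T):
        # Best predecessor stage <= k: running prefix maximum of dp[t-1]
        best_prev = np.maximum.accumulate(dp[t - 1])
        argbest = np.zeros(K, dtype=int)
        cur = 0
        for k in range(K):
            if dp[t - 1, k] >= dp[t - 1, cur]:
                cur = k
            argbest[k] = cur
        dp[t] = best_prev + log_probs[t]
        back[t] = argbest
    # Backtrack from the best final stage
    labels = np.zeros(T, dtype=int)
    labels[-1] = int(np.argmax(dp[-1]))
    for t in range(T - 1, 0, -1):
        labels[t - 1] = back[t, labels[t]]
    return labels
```

A noisy per-frame dip (e.g. a spurious return to an earlier stage mid-video) is overridden by the globally best non-decreasing sequence, which is the role the cited post-processing step plays.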
In this work, we utilize deformable Transformer [49] to pro-
pose an encoder-decoder deformable Transformer network
for embryo stage development classification. Our proposed
network contains three heads for classification, segmentation,
and refinement. Our contributions are two-fold:
Dataset: We have constructed a human embryo dataset
with a total of 440 time-lapse videos comprising 148,918
images, gathered from a real-world clinical environment and
collected from a diverse set of patients. The dataset has been
carefully pre-processed and annotated by three embryologists.
The data will be made available to the research community;
please contact the authors.
Methodology: We propose EmbryosFormer, an effective
framework for monitoring embryo stage development. Our
network is built on a UNet-like architecture with deformable
transformer blocks and contains two paths. A contracting
path (i.e., the deformable transformer encoder) aims to
predict per-image labels, whereas an expanding path (i.e.,
the deformable transformer decoder) models stage-level
information by taking temporal consistency into account.
The feature encoding at the encoding path is optimized by a
classification
head, and the temporal coherency at the decoding path is
trained by a segmentation head. Both encoding and decod-
ing paths are cooperatively learned by a collaboration head.
We empirically validate the effectiveness of our proposed
EmbryosFormer by showing that, to the best of our
knowledge, it achieves performance superior to all current
state-of-the-art methods on the two benchmarked datasets of
mouse and human embryos.
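To make the stage-level view concrete: a monotonic per-frame label sequence can equivalently be expressed as (stage, start_frame, end_frame) intervals, the kind of interval representation a segmentation-style head supervises. The helper below is an illustrative sketch under that interpretation, not the paper's implementation:

```python
def labels_to_segments(labels):
    """Convert a per-frame stage sequence into (stage, start, end) intervals,
    with end inclusive. Adjacent equal labels are merged into one segment."""
    segments = []
    start = 0
    for t in range(1, len(labels) + 1):
        # Close the current run when the label changes or the sequence ends
        if t == len(labels) or labels[t] != labels[start]:
            segments.append((labels[start], start, t - 1))
            start = t
    return segments
```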
2. Related Work
2.1. Detection Transformer
The core idea behind transformer architecture [42] is
the self-attention mechanism to capture long-range relation-
ships. Transformer has been successfully applied to en-
rich global information in computer vision [47, 4, 43, 40].
When it comes to object detection, Detection Transformer
(DETR) [2] is one of the most well-known approaches,
which performs the task as a set prediction. Unlike tra-
ditional CNNs-based methods [34, 10], Detection Trans-
former (DETR) [2] performs the task as a set prediction.
Even DETR obtains good performance while providing an
efficient way to represent each detected element, it suffers
from high computing complexity of quadratic growth with
image size and slow convergence of global attention mecha-
nism. The recent Deformable Transformer [49] is proposed
to address the limitations while gaining better performance
by incorporating multi-scale feature representations and
attending to sparse spatial locations of images. Beyond the
image domain, DETR has also been successfully applied to
video, e.g., dense video captioning with PDVC [44].
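For intuition, deformable attention replaces dense global attention with a small set of learned sampling locations per query. The sketch below is a simplified single-head, single-scale NumPy illustration; the function names and weight shapes are assumptions, not Deformable DETR's actual implementation. The query predicts offsets around a reference point, features are bilinearly sampled at those sparse locations, and the samples are aggregated with softmax weights.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinearly sample feat (H, W, C) at a fractional location (y, x)."""
    H, W, _ = feat.shape
    y, x = np.clip(y, 0, H - 1), np.clip(x, 0, W - 1)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * feat[y0, x0] + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0] + wy * wx * feat[y1, x1])

def deformable_attention(query, ref_point, feat, W_off, W_attn, W_val):
    """Single-head, single-scale deformable attention for one query.

    Instead of attending to every pixel (quadratic in image size), the query
    predicts N sparse sampling offsets around its reference point and
    aggregates only those N sampled values.
    """
    N = W_off.shape[1] // 2
    offsets = (query @ W_off).reshape(N, 2)   # predicted (dy, dx) per sample
    weights = query @ W_attn                  # unnormalized attention weights
    weights = np.exp(weights - weights.max())
    weights /= weights.sum()                  # softmax over the N samples
    out = np.zeros(W_val.shape[1])
    for n in range(N):
        y = ref_point[0] + offsets[n, 0]
        x = ref_point[1] + offsets[n, 1]
        out += weights[n] * (bilinear_sample(feat, y, x) @ W_val)
    return out
```

The cost per query is O(N) samples rather than O(HW) attention scores, which is the source of the efficiency and faster convergence noted above.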
2.2. Embryo stage development classification
Classifying embryo development stages aims to provide
a cue for quality assessment of fertilized blastocysts, which
requires complex analyses of time-lapse imaging videos be-
sides identifying development stages. Traditionally, embry-
ologists must review the embryo images to determine the
time of division for each cell stage. This process requires
not only expert knowledge but also experience, and it is
time-consuming. With the emergence of
DNNs, CNNs have been used to assess embryo images.
Generally, DNNs-based embryo stage development classi-
fication can be divided into two categories: image-based
and sequence-based. In the first group, Khan et al. [14]
utilize a CNN (AlexNet [16]) and a Conditional Random
Field (CRF) [37] to count human embryonic cells over the
first five cell stages. Ng et al. [29] use ResNet [11] coupled
with a dynamic programming algorithm for post-processing
to predict morphokinetic annotations in human embryos.
Later, Lau et al. extend [29] with region-of-interest (ROI)
detection and an LSTM [8] for sequential classification.
Rad et al. [33] propose Cell-Net, which uses ResNet-50 [11]
to parse the centroid of each cell from an embryo image.
Leahy et al. [21] extract five key features from time-lapse
videos, including stage classification, utilizing ResNeXt101
[45] to predict per-class probabilities for each image.
Malmsten et al. [25] use Inception-V3 [39] to classify
human embryo images into different cell division stages,
up to eight cells. While showing promising re-
sults on automatically classifying embryonic cell stage de-
velopment with DNNs, image-based prediction approaches
ignore temporal coherence between time-lapse images and
the monotonic development order constraint during train-
ing. In the second group, Lukyanenko et al. [24] incorporate
CRFs [37] to include the monotonic condition in the
learning process for sequential stage prediction. Lockhart