EmbryosFormer: Deformable Transformer and Collaborative
Encoding-Decoding for Embryos Stage Development Classification
Tien-Phat Nguyen1,4, Trong-Thang Pham*4, Tri Nguyen*7, Hieu Le*1, Dung Nguyen5,
Hau Lam6, Phong Nguyen1, Jennifer Fowler8, Minh-Triet Tran2,3,4, Ngan Le9
1FPT Software AI Center, Ho Chi Minh City, Vietnam
2University of Science, VNU-HCM; 3Vietnam National University, Ho Chi Minh City, Vietnam
4John von Neumann Institute, Vietnam National University, Ho Chi Minh City, Vietnam
5IVFMD, My Duc Phu Nhuan hospital, Ho Chi Minh City, Vietnam
6Olea Fertility, Vinmec Central Park International Hospital, Ho Chi Minh City, Vietnam
7HOPE Research Center, My Duc Hospital, Ho Chi Minh City, Vietnam
8Arkansas Economic Development Commission, Little Rock, AR USA 72202
9Department of Computer Science and Computer Engineering, University of Arkansas, Fayetteville, AR, USA 72703
*equal contribution
Abstract
The timing of cell divisions in early embryos during the
In-Vitro Fertilization (IVF) process is a key predictor of em-
bryo viability. However, observing cell divisions in Time-
Lapse Monitoring (TLM) is a time-consuming process and
highly depends on experts. In this paper, we propose Em-
bryosFormer, a computational model to automatically de-
tect and classify cell divisions from original time-lapse im-
ages. Our proposed network is designed as an encoder-
decoder deformable transformer with collaborative heads.
The transformer contracting path predicts per-image la-
bels and is optimized by a classification head. The trans-
former expanding path models the temporal coherency be-
tween embryo images to ensure monotonic non-decreasing
constraint and is optimized by a segmentation head. Both
contracting and expanding paths are synergetically learned
by a collaboration head. We have benchmarked our proposed
EmbryosFormer on two datasets: a public dataset of mouse
embryos up to the 8-cell stage and an in-house dataset of
human embryos up to the 4-cell stage. Source code:
https://github.com/UARK-AICV/Embryos.
1. Introduction
Fertility impairment affects approximately 80 million
people globally, with one in every six couples experiencing
infertility [1, 27]. This necessitates the use of IVF for
conceiving. During the IVF procedure, a patient is stimulated
to produce multiple oocytes. Then, a fraction of them
fertilize, and a smaller fraction continue to grow and develop
normally as embryos before being transferred into the
uterus. Because of the increased maternal and fetal risks as-
sociated with multi-fetal gestation, only one embryo with
the highest viability should be chosen for implantation at a
time [31, 32]. Clinically, embryologists select potential em-
bryos manually by considering the morphological features
and rate of development. Unlike the traditional monitoring
process, where embryos are taken out of the incubators at
discrete time points, Time-Lapse Monitoring (TLM)
techniques offer more comprehensive and uninterrupted
observation of embryo development: embryos are kept safely
in their culture environment without any external intervention
while built-in microscope systems periodically capture
images of the embryos inside [28]. However, TLM still
requires human expertise and experience. Consequently, re-
sults often come with variability and large labor expenses.
Therefore, there is an emerging demand for developing an
automated and time-effective tool to support embryologists
in the selection processes.
Embryo morphology is captured at discrete time points
in real-world settings. As a result, the characteristics
or position of the embryo can vary rapidly and unex-
pectedly from frame to frame. Recently, Deep Neural
Networks (DNNs), in particular Convolutional Neural
Networks (CNNs), have made significant progress in
providing decision-making solutions at the human-expert
level. Their
successes have been reported in different fields and
modalities across the diagnostic medical domain [35, 48],
such as chest X-ray abnormality recognition [9, 19, 18],
provision of tumor biomarkers on MRI images [12, 30, 46],
and organ structure analysis on MRI images [17, 20, 13, 41].
arXiv:2210.04615v1 [cs.CV] 7 Oct 2022
DNNs
have been recently applied to the task of classifying embryo
stage development. Existing methods [21, 23, 24, 25, 26]
consider time-lapse embryo videos as sequences of images
and utilize 2D CNNs to perform per-frame classification,
then apply a post-processing step with dynamic programming
to enforce that the predictions follow the monotonic
non-decreasing constraint. Such approaches neither handle
the high imbalance across classes nor consider temporal
information. Other works [24, 23] introduce two-stream
networks that leverage temporal information to address the
imbalance issue while incorporating the monotonic constraint
into the learning stage. Despite showing
promising results, these methods process a fixed-size frame
sequence at a time, which can miss the global context of the
entire video and also increases inference time.
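The dynamic-programming post-processing mentioned above can be sketched as follows. This is an illustrative minimal implementation, not code from the cited works, and the function name is hypothetical: given per-frame stage log-probabilities, it decodes the highest-scoring label sequence subject to the monotonic non-decreasing constraint.

```python
import numpy as np

def monotonic_decode(log_probs):
    """Decode the per-frame stage sequence maximizing total log-probability,
    subject to the constraint that stages never revert (non-decreasing).

    log_probs: (T, K) array of per-frame log-probabilities over K stages.
    Returns: length-T integer array of stage indices, non-decreasing in t.
    """
    T, K = log_probs.shape
    # dp[t, k] = best score of frames 0..t with frame t assigned stage k
    dp = np.full((T, K), -np.inf)
    back = np.zeros((T, K), dtype=int)  # best predecessor stage
    dp[0] = log_probs[0]
    for t in range(1, T):
        # Best predecessor stage <= k: running prefix maximum of dp[t-1]
        best_prev = np.maximum.accumulate(dp[t - 1])
        argbest = np.zeros(K, dtype=int)
        cur = 0
        for k in range(K):
            if dp[t - 1, k] >= dp[t - 1, cur]:
                cur = k
            argbest[k] = cur
        dp[t] = best_prev + log_probs[t]
        back[t] = argbest
    # Backtrack from the best final stage
    labels = np.zeros(T, dtype=int)
    labels[-1] = int(np.argmax(dp[-1]))
    for t in range(T - 1, 0, -1):
        labels[t - 1] = back[t, labels[t]]
    return labels
```

A noisy per-frame dip (e.g. a spurious return to an earlier stage mid-video) is overridden by the globally best non-decreasing sequence, which is the role the cited post-processing step plays.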
In this work, we utilize deformable Transformer [49] to pro-
pose an encoder-decoder deformable Transformer network
for embryo stage development classification. Our proposed
network contains three heads for classification, segmentation,
and refinement. Our contributions are two-fold:
Dataset: We have constructed a human embryo dataset
with a total of 440 time-lapse videos comprising 148,918
images, gathered from a real-world clinical environment and
collected from a diverse set of patients. The dataset has been
carefully pre-processed and annotated by three embryologists.
The data will be made available to the research community;
please contact the authors.
Methodology: We propose EmbryosFormer, an effective
framework for monitoring embryo stage development. Our
network is built on a UNet-like architecture with deformable
transformer blocks and contains two paths. A contracting
path (i.e., the deformable transformer encoder) aims to
predict per-image labels, whereas an expanding path (i.e.,
the deformable transformer decoder) models stage-level
information by taking temporal consistency into account.
The feature encoding at the encoding path is optimized by a
classification
head, and the temporal coherency at the decoding path is
trained by a segmentation head. Both encoding and decod-
ing paths are cooperatively learned by a collaboration head.
We empirically validate the effectiveness of our proposed
EmbryosFormer by showing that, to the best of our
knowledge, it achieves performance superior to all current
state-of-the-art methods on the two benchmarked datasets of
mouse and human embryos.
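To make the stage-level view concrete: a monotonic per-frame label sequence can equivalently be expressed as (stage, start_frame, end_frame) intervals, the kind of interval representation a segmentation-style head supervises. The helper below is an illustrative sketch under that interpretation, not the paper's implementation:

```python
def labels_to_segments(labels):
    """Convert a per-frame stage sequence into (stage, start, end) intervals,
    with end inclusive. Adjacent equal labels are merged into one segment."""
    segments = []
    start = 0
    for t in range(1, len(labels) + 1):
        # Close the current run when the label changes or the sequence ends
        if t == len(labels) or labels[t] != labels[start]:
            segments.append((labels[start], start, t - 1))
            start = t
    return segments
```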
2. Related Work
2.1. Detection Transformer
The core idea behind transformer architecture [42] is
the self-attention mechanism to capture long-range relation-
ships. Transformer has been successfully applied to en-
rich global information in computer vision [47, 4, 43, 40].
When it comes to object detection, Detection Transformer
(DETR) [2] is one of the most well-known approaches,
which performs the task as a set prediction. Unlike tra-
ditional CNNs-based methods [34, 10], Detection Trans-
former (DETR) [2] performs the task as a set prediction.
Even DETR obtains good performance while providing an
efficient way to represent each detected element, it suffers
from high computing complexity of quadratic growth with
image size and slow convergence of global attention mecha-
nism. The recent Deformable Transformer [49] is proposed
to address the limitations while gaining better performance
by incorporating multi-scale feature representations and
attending to sparse spatial locations of images. Beyond the
image domain, DETR has also been successfully applied to
video, e.g., dense video captioning with PDVC [44].
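For intuition, deformable attention replaces dense global attention with a small set of learned sampling locations per query. The sketch below is a simplified single-head, single-scale NumPy illustration; the function names and weight shapes are assumptions, not Deformable DETR's actual implementation. The query predicts offsets around a reference point, features are bilinearly sampled at those sparse locations, and the samples are aggregated with softmax weights.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinearly sample feat (H, W, C) at a fractional location (y, x)."""
    H, W, _ = feat.shape
    y, x = np.clip(y, 0, H - 1), np.clip(x, 0, W - 1)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * feat[y0, x0] + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0] + wy * wx * feat[y1, x1])

def deformable_attention(query, ref_point, feat, W_off, W_attn, W_val):
    """Single-head, single-scale deformable attention for one query.

    Instead of attending to every pixel (quadratic in image size), the query
    predicts N sparse sampling offsets around its reference point and
    aggregates only those N sampled values.
    """
    N = W_off.shape[1] // 2
    offsets = (query @ W_off).reshape(N, 2)   # predicted (dy, dx) per sample
    weights = query @ W_attn                  # unnormalized attention weights
    weights = np.exp(weights - weights.max())
    weights /= weights.sum()                  # softmax over the N samples
    out = np.zeros(W_val.shape[1])
    for n in range(N):
        y = ref_point[0] + offsets[n, 0]
        x = ref_point[1] + offsets[n, 1]
        out += weights[n] * (bilinear_sample(feat, y, x) @ W_val)
    return out
```

The cost per query is O(N) samples rather than O(HW) attention scores, which is the source of the efficiency and faster convergence noted above.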
2.2. Embryo stage development classification
Classifying embryo development stages aims to provide
a cue for quality assessment of fertilized blastocysts, which
requires complex analyses of time-lapse imaging videos be-
sides identifying development stages. Traditionally, embry-
ologists must review the embryo images to determine the
time of division for each cell stage. This process requires
not only expert knowledge but also experience, and it is
time-consuming. With the emergence of
DNNs, CNNs have been used to assess embryo images.
Generally, DNNs-based embryo stage development classi-
fication can be divided into two categories: image-based
and sequence-based. In the first group, Khan et al. [14]
utilize a CNN (AlexNet [16]) and a Conditional Random
Field (CRF) [37] to count human embryonic cells over the
first five cell stages. Ng et al. [29] use ResNet [11] coupled
with a dynamic programming algorithm for post-processing
to predict morphokinetic annotations in human embryos.
Later, Lau et al. extend [29] with region-of-interest (ROI)
detection and an LSTM [8] for sequential classification.
Rad et al. [33] propose Cell-Net, which uses ResNet-50 [11]
to parse the centroid of each cell from an embryo image.
Leahy et al. [21] extract five key features from time-lapse
videos, including stage classification, utilizing ResNeXt101
[45] to predict per-class probabilities for each image.
Malmsten et al. [25] use Inception-V3 [39] to classify
human embryo images into different cell division stages,
up to eight cells. While showing promising re-
sults on automatically classifying embryonic cell stage de-
velopment with DNNs, image-based prediction approaches
ignore temporal coherence between time-lapse images and
the monotonic development order constraint during train-
ing. In the second group, Lukyanenko et al. [24] incorporate
CRFs [37] to include the monotonic condition in the
learning process for sequential stage prediction. Lockhart