Gastrointestinal Disorder Detection with a
Transformer Based Approach
A.K.M. Salman Hosain1, Mynul Islam1, Md Humaion Kabir Mehedi1, Irteza Enan Kabir2, and
Zarin Tasnim Khan3
1Department of Computer Science and Engineering, Brac University, Dhaka, Bangladesh
2University of Rochester, NY, United States
3Shaheed Tajuddin Ahmad Medical College, Dhaka, Bangladesh
1{akm.salman.hosain, mynul.islam, humaion.kabir.mehedi}@g.bracu.ac.bd
2,3 {irtezaenan, zarinkhan27}@gmail.com
Abstract—Accurate disease categorization from endoscopic
images is a significant problem in gastroenterology. This paper
describes a technique for assisting medical diagnosis and
identifying gastrointestinal tract disorders by classifying
features extracted from endoscopic images using a vision
transformer and a transfer learning model. Vision transformers
have shown very promising results on difficult image
classification tasks. We propose a vision transformer based
approach that detects gastrointestinal diseases from curated
wireless capsule endoscopy (WCE) images of the colon with an
accuracy of 95.63%. We compare this approach with the
pretrained convolutional neural network (CNN) model
DenseNet201 and show that the vision transformer surpasses
DenseNet201 on various quantitative performance evaluation
metrics.
Keywords—Vision transformer, Gastrointestinal Disorder,
Transfer Learning, DenseNet201, ViT, Colon
I. INTRODUCTION
The gastrointestinal (GI) tract, also known as the digestive
tract, is prone to several diseases such as polyps, ulcers, and
colorectal cancer [1]. Common symptoms include abdominal pain
or discomfort, loss of appetite, nausea, vomiting, and fatigue.
Some GI diseases often lead to GI cancer, which is considered
the second most common cancer worldwide [2]. One common
disease of the gastrointestinal tract is muco-submucosal
polyps, which result from chronic prolapse of the intestinal
mucosa [3]. Polyps often show few symptoms in the early
stages, but as a polyp enlarges it can block the opening to the
small intestine. Symptoms of polyps may include blood in the
stool and consequent anemia, tenderness when the stomach is
touched, and nausea. Polyps appear as a polypoid mass in
endoscopic imaging and carry an increased risk of cancer.
Esophagitis is another common GI condition; it is inflammation
of the esophagus, the tube connecting the throat to the
stomach. Esophagitis mainly causes difficulty swallowing,
chest pain, heartburn, and swallowed food becoming stuck in the
esophagus [4]. Endoscopy usually shows rings of abnormal
tissue. Ulcerative colitis, an inflammatory bowel disease, is
also a frequently occurring condition; it causes inflammation
in the GI tract along with abdominal pain, diarrhoea, fatigue,
and bloody stool.
These GI diseases often have overlapping symptoms and are
therefore difficult to identify. Early diagnosis may allow a
cure or prevent progression to fatal cancer. Although visual
assessment of endoscopy images gives an initial diagnosis, it
is often time consuming and highly subjective [5]. Moreover, a
shortage of radiologists and other human factors often lead to
false positive or even false negative diagnoses, which can be
detrimental to the patient [6]. Thus, computer-aided diagnosis
would be valuable for high-accuracy detection at early stages.
In this paper, we classify endoscopic images of subjects with
gastrointestinal diseases. We took two different approaches to
the classification task: a vision transformer, and transfer
learning with a pretrained CNN architecture. We then compared
the results of the two classification models. Our data set
consists of four classes:
• Healthy control, or normal class
• Ulcerative colitis
• Polyps
• Esophagitis
Our contributions in this work are:
• We utilized a vision transformer based model (ViT) and the
  pretrained CNN model DenseNet201 to detect three
  gastrointestinal diseases, along with the healthy colon, from
  curated wireless capsule endoscopy (WCE) images of the colon.
• We conducted a comparative analysis of the two models on
  various quantitative performance evaluation metrics and
  identified the superior classifier.
II. RELATED WORKS
Machine learning techniques have previously been used in
medicine for diagnostic purposes, for example neural networks
for the classification of stomach cancer [7] and deep learning
for stomach abnormality classification [8].
978-1-6654-6316-4/22/$31.00 © 2022 IEEE
arXiv:2210.03168v1 [cs.CV] 6 Oct 2022
Escobar et al. [9] provided a method for classifying diseases
and abnormalities of the gastrointestinal tract in endoscopic
images that outperformed existing approaches. The technique is
based primarily on transfer learning with the VGG16 [12]
convolutional neural network, which had previously been trained
on the ImageNet dataset. CNNs [10], [11] have a number of
distinct hidden layers, and one of their greatest strengths is
learning hierarchical layers of concept representation that
correspond to various levels of abstraction. These networks
perform best when the weights that fundamentally determine how
the network operates are learned from large data sets.
Unfortunately, because collecting them is costly, such large
data sets are typically not available in the medical field. For
this reason, the authors relied on transfer learning from the
ImageNet-trained VGG16 rather than training a network from
scratch.
Dosovitskiy et al. [13] investigated how Transformers can be
applied directly to image classification. They represent an
image as a sequence of patches, which is then processed by a
standard Transformer encoder as used in NLP. When combined
with pre-training on large datasets, this method performs very
well. Vision Transformer (ViT) compares especially favorably
once the computational cost of pre-training is taken into
account, reaching state-of-the-art results at a lower
pre-training cost. As a result, ViT is relatively inexpensive
to pre-train and matches or exceeds the state of the art on
numerous image classification datasets.
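As an illustration of treating an image as a sequence of patches, the following minimal sketch splits an image array into flattened, non-overlapping patches with NumPy. The function name `image_to_patches`, the 224 x 224 input size, and the 16 x 16 patch size are assumptions chosen for the example; they are not taken from this paper. In ViT the flattened patches are then linearly projected and combined with position embeddings before entering the encoder, which is not shown here.

```python
import numpy as np

def image_to_patches(image, patch_size):
    """Split an (H, W, C) image into a sequence of flattened square patches."""
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    rows, cols = h // patch_size, w // patch_size
    # Separate each spatial axis into (block index, offset inside block).
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    # Bring the two block indices together: (rows, cols, p, p, c).
    patches = patches.transpose(0, 2, 1, 3, 4)
    # One row per patch, in row-major patch order.
    return patches.reshape(rows * cols, patch_size * patch_size * c)

# Assumed sizes for illustration only.
image = np.zeros((224, 224, 3), dtype=np.float32)
tokens = image_to_patches(image, patch_size=16)
print(tokens.shape)  # (196, 768)
```

Each of the 196 rows corresponds to one 16 x 16 x 3 patch, so the image becomes a sequence that a Transformer encoder can consume in the same way as a sequence of word embeddings.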
Scaling Vision Transformers [14] notes that, with appropriate
scaling of Transformers in NLP, large models not only perform
better but also use large compute budgets more efficiently.
Understanding a model's scaling properties is essential to
designing subsequent generations, since scale is a vital
component in achieving outstanding results. For ViT models
with sufficient training data, the performance-compute
frontier typically resembles a power law. Importantly, in
order to remain on this frontier, one must scale computation
and model capacity together. Failing to increase model size
when additional compute becomes available is suboptimal.
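For readers unfamiliar with the term, a power-law frontier can be written in the generic form below. The symbols are assumptions introduced here for illustration: E is the error rate, C the training compute, and a, b are positive fitted constants. The exact parametrization used in [14] may differ.

```latex
E(C) \approx a \, C^{-b}, \qquad a > 0,\; b > 0
\quad\Longleftrightarrow\quad
\log E(C) \approx \log a - b \log C
```

In this form the frontier is a straight line on a log-log plot of error against compute, with slope -b.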
Vision Transformers with Patch Diversification [15] introduced
special loss functions in vision transformer training that
promote diversity among patch representations for more
discriminative feature extraction. Because this stabilizes
training, wider and deeper vision transformers can be trained.
Vision transformer performance can also be improved by
modifying the transformer architecture, for example by
including convolution layers. In deep vision transformers, the
self-attention blocks tend to map different patches into
similar latent representations, which causes information loss
and performance degradation. By diversifying patch
representations, it is possible to train larger, deeper models
and improve performance on image classification tasks without
changing the transformer model structure.
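The idea of penalizing similar patch representations can be sketched as follows. This is an illustrative penalty based on the mean absolute cosine similarity between distinct patches, written with NumPy; the function name `patch_cosine_penalty` is hypothetical, and the sketch is not claimed to reproduce the exact loss functions of [15].

```python
import numpy as np

def patch_cosine_penalty(patches, eps=1e-8):
    """Mean absolute cosine similarity between distinct rows of an (N, D) array.

    Expects N >= 2. Values near 1 mean the patch representations have
    collapsed onto one direction; values near 0 mean they are diverse.
    """
    norms = np.linalg.norm(patches, axis=1, keepdims=True)
    unit = patches / np.maximum(norms, eps)   # unit-length rows
    sim = unit @ unit.T                       # (N, N) cosine similarities
    n = patches.shape[0]
    off_diagonal = sim[~np.eye(n, dtype=bool)]  # drop self-similarity
    return float(np.abs(off_diagonal).mean())
```

Added to the classification loss with a small weight, a term of this kind would discourage the encoder from mapping different patches to nearly identical vectors.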
III. METHODOLOGY
In this paper, we propose a novel framework to detect
gastrointestinal diseases from curated wireless capsule
endoscopy (WCE) images with a vision transformer (ViT) based
model and a pretrained DenseNet201 [16]. The proposed
framework is depicted in Fig. 1.
Fig. 1: Proposed gastrointestinal disease detection framework
using ViT and DenseNet
A. Dataset Description
We collected our dataset from Kaggle [17]. The dataset
contains WCE images from inside the gastrointestinal (GI)
tract. The original images are 720 x 576 pixels and belong to
four classes: normal, ulcerative colitis, polyps, and
esophagitis. We used our machine learning models to classify
this dataset into these four classes. Sample images from the
dataset are presented in Fig. 2. The training and test data
distribution is presented in Fig. 3.
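A minimal sketch of indexing such a dataset for a four-class classifier is shown below. It assumes one folder per class under a root directory; the folder names, the accepted file extensions, and the function name `index_dataset` are hypothetical, and the actual layout of the Kaggle archive may differ.

```python
from pathlib import Path

# The four classes named in the dataset description. The folder names are
# hypothetical; the downloaded archive may use different ones.
CLASS_NAMES = ["normal", "ulcerative_colitis", "polyps", "esophagitis"]
CLASS_TO_INDEX = {name: i for i, name in enumerate(CLASS_NAMES)}

def index_dataset(root, extensions=(".jpg", ".jpeg", ".png")):
    """Return (image_path, class_index) pairs found under root/<class_name>/."""
    samples = []
    for name in CLASS_NAMES:
        class_dir = Path(root) / name
        if not class_dir.is_dir():
            continue  # tolerate a missing class folder
        for path in sorted(class_dir.iterdir()):
            if path.suffix.lower() in extensions:
                samples.append((path, CLASS_TO_INDEX[name]))
    return samples
```

The resulting list of path and label pairs can be passed to whichever image loading and resizing pipeline the chosen framework provides.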
Fig. 2: Sample WCE images from the dataset. Top left: normal
colon. Top right: ulcerative colitis. Bottom left: polyps.
Bottom right: esophagitis.