
Gastrointestinal Disorder Detection with a Transformer Based Approach

A.K.M. Salman Hosain1, Mynul Islam1, Md Humaion Kabir Mehedi1, Irteza Enan Kabir2, and Zarin Tasnim Khan3

1Department of Computer Science and Engineering, Brac University, Dhaka, Bangladesh

2University of Rochester, NY, United States

3Shaheed Tajuddin Ahmad Medical College, Dhaka, Bangladesh

1{akm.salman.hosain, mynul.islam, humaion.kabir.mehedi}@g.bracu.ac.bd

2,3 {irtezaenan, zarinkhan27}@gmail.com

Abstract—Accurate disease categorization using endoscopic images is a significant problem in gastroenterology. This paper describes a technique for assisting medical diagnosis and identifying gastrointestinal tract disorders by classifying features extracted from endoscopic images using a vision transformer and a transfer learning model. Vision transformers have shown very promising results on difficult image classification tasks. In this paper, we propose a vision transformer based approach to detect gastrointestinal diseases from wireless capsule endoscopy (WCE) curated images of the colon, achieving an accuracy of 95.63%. We compare this transformer based approach with the pretrained convolutional neural network (CNN) model DenseNet201 and demonstrate that the vision transformer surpasses DenseNet201 on various quantitative performance evaluation metrics.

Keywords—Vision transformer, Gastrointestinal Disorder, Transfer Learning, DenseNet201, ViT, Colon

I. INTRODUCTION

The gastrointestinal (GI) tract, also known as the digestive tract, is prone to several diseases such as polyps, ulcers, and colorectal cancer [1]. Common symptoms include abdominal pain or discomfort, loss of appetite, nausea and vomiting, and fatigue. Some GI diseases often lead to GI cancer, which is considered the second most common cancer worldwide [2]. One common disease of the GI tract is muco-submucosal polyps, which result from chronic prolapse of the intestinal mucosa [3]. Polyps often cause few symptoms in the early stages, but as they enlarge they can block the opening to the small intestine. Symptoms of polyps may include blood in the stool (and consequently anemia), tenderness when the stomach is touched, and nausea. Polyps appear as polypoid masses in endoscopic imaging and carry an increased risk of cancer. Esophagitis is another common GI condition, caused by inflammation of the tube connecting the throat to the stomach. Esophagitis mainly causes difficulty swallowing, chest pain, heartburn, and swallowed food becoming stuck in the esophagus [4]. Endoscopy usually shows rings of abnormal tissue. Ulcerative colitis, an inflammatory bowel disease, is also a frequently occurring condition that causes inflammation in the GI tract along with abdominal pain, diarrhoea, fatigue, and bloody stool.

These GI diseases often have overlapping symptoms and are therefore difficult to distinguish. Early diagnosis of these diseases can lead to a cure or prevent the development of fatal cancer. Although visual assessment of endoscopy images provides an initial diagnosis, it is often time consuming and highly subjective [5]. Moreover, radiologist shortcomings and other human factors can lead to false positive or even false negative diagnoses, which can be detrimental to the patient [6]. Thus, computer aided diagnosis would be valuable for high accuracy detection at the early stages.

In this paper, we classify endoscopic images of subjects with gastrointestinal diseases. For the classification task, we took two different approaches: a vision transformer, and transfer learning with a pretrained CNN architecture. We then compared the results of the two classification models. Our dataset consists of four classes:

• Healthy control, or normal class
• Ulcerative colitis
• Polyps, and
• Esophagitis

Our contributions in this work are as follows:

• We have utilized a vision transformer based model (ViT) and the pretrained CNN model DenseNet201 to detect three gastrointestinal diseases, along with the healthy colon, from wireless capsule endoscopy (WCE) curated images of the colon.
• We have conducted a comparative analysis of the two models on various quantitative performance evaluation metrics and identified the superior classifier.
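The contributions refer to "various quantitative performance evaluation metrics" without naming them at this point. As a minimal sketch, assuming the common choice of accuracy plus macro-averaged precision, recall and F1 (our assumption, not necessarily the exact set the authors report), all of these can be derived from a 4 × 4 confusion matrix over the four classes. The function names are illustrative only.

```python
from typing import Dict, List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     n_classes: int) -> List[List[int]]:
    """Rows index the true class, columns the predicted class."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def classification_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Accuracy plus macro-averaged (unweighted mean over classes) precision, recall, F1."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k but truly another class
        fn = sum(cm[k]) - tp                       # truly k but predicted another class
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return {
        "accuracy": correct / total,
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }
```

For example, with labels 0 to 3 standing for normal, ulcerative colitis, polyps and esophagitis, `y_true = [0, 0, 1, 1, 2, 2, 3, 3]` and `y_pred = [0, 0, 1, 2, 2, 2, 3, 3]` (one ulcerative colitis image misclassified as polyps) give an accuracy of 0.875. Macro averaging weights each class equally, which matters when class sizes differ.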

II. RELATED WORKS

Machine learning techniques have previously been used in medicine for diagnosis, for example neural networks for the classification of stomach cancer [7] and deep learning [8] for stomach abnormality classification.

arXiv:2210.03168v1 [cs.CV] 6 Oct 2022

Escober et al. [9] proposed a method for classifying diseases and abnormalities of the gastrointestinal tract in endoscopic images that outperformed existing approaches. Their technique is primarily based on transfer learning with the VGG16 convolutional neural network, pretrained on the ImageNet dataset.

CNNs [10], [11] have a number of distinct hidden layers, and one of their greatest strengths is learning hierarchical representations whose layers correspond to different levels of abstraction. These networks perform best when the weights that determine how the network operates are learned from large amounts of data. Unfortunately, because such datasets are costly to build, they are typically not available in the medical domain. For this reason, Escober et al. proposed a transfer learning method for detecting gastrointestinal abnormalities and disorders in endoscopic images using the VGG16 [12] CNN pretrained on the ImageNet dataset.
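The transfer learning idea described above (reuse a network pretrained on ImageNet and train only what is specific to the new task) can be sketched without any deep learning framework. In this minimal sketch, `frozen_backbone` is a hypothetical stand-in for a pretrained feature extractor such as VGG16 or DenseNet201: it is never updated, and only a linear softmax classification head on top of its features is trained by cross-entropy gradient descent. Real pipelines often also fine-tune some backbone layers; that step is omitted here.

```python
import math
from typing import List, Sequence

Vector = List[float]


def frozen_backbone(x: Sequence[float]) -> Vector:
    """Hypothetical stand-in for a pretrained feature extractor. It has no trainable
    parameters here, mirroring a backbone whose weights are frozen."""
    return [x[0], x[1], x[0] * x[1]]


def softmax(z: Sequence[float]) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def logits(W: List[Vector], feats: Sequence[float]) -> Vector:
    x = list(feats) + [1.0]  # constant 1 so the last weight of each row acts as a bias
    return [sum(wi * xi for wi, xi in zip(w, x)) for w in W]


def train_head(feats: List[Vector], labels: List[int], n_classes: int,
               lr: float = 0.5, epochs: int = 200) -> List[Vector]:
    """Train only a linear softmax head; the backbone features are fixed inputs."""
    d = len(feats[0]) + 1
    W = [[0.0] * d for _ in range(n_classes)]
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = softmax(logits(W, f))
            x = list(f) + [1.0]
            for k in range(n_classes):
                g = p[k] - (1.0 if k == y else 0.0)  # d(cross-entropy)/d(logit_k)
                for j in range(d):
                    W[k][j] -= lr * g * x[j]
    return W


def predict(W: List[Vector], feats: Sequence[float]) -> int:
    z = logits(W, feats)
    return max(range(len(z)), key=lambda k: z[k])
```

Freezing the backbone is what makes this approach attractive for small medical datasets: only `n_classes × (feature_dim + 1)` parameters are learned from the new data, instead of the millions in the full network.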

Alexey Dosovitskiy et al. [13] investigated how Transformers can be applied directly to image classification. They represent an image as a sequence of patches, which is then processed by a standard Transformer encoder of the kind used in NLP. Combined with pre-training on large datasets, this method performs very well. When the computational cost of pre-training is taken into account, the Vision Transformer (ViT) performs exceptionally well, matching or outperforming the state of the art on numerous image classification datasets while being relatively inexpensive to pre-train.
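The patch-sequence idea of [13] can be illustrated with a minimal sketch in plain Python. It splits an H × W × C image (nested lists here, purely for illustration) into non-overlapping P × P patches in row-major order and flattens each into a vector of length P·P·C. In ViT, each such vector is then linearly projected and combined with a position embedding, and a learnable class token is prepended; those steps are omitted except for counting the class token. The 224 × 224 input and 16 × 16 patch size in the usage note are the common ViT-B/16 configuration, assumed for illustration; this excerpt does not state the resolution used in this paper.

```python
from typing import List

Image = List[List[List[float]]]  # H x W x C, nested lists


def image_to_patches(img: Image, patch: int) -> List[List[float]]:
    """Split an image into non-overlapping patch x patch tiles, each flattened."""
    h, w = len(img), len(img[0])
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by the patch size")
    patches = []
    for top in range(0, h, patch):          # walk the patch grid row by row
        for left in range(0, w, patch):
            flat: List[float] = []
            for i in range(top, top + patch):
                for j in range(left, left + patch):
                    flat.extend(img[i][j])  # all C channel values of pixel (i, j)
            patches.append(flat)
    return patches


def sequence_length(h: int, w: int, patch: int, cls_token: bool = True) -> int:
    """Number of tokens the Transformer encoder sees for an h x w input."""
    return (h // patch) * (w // patch) + (1 if cls_token else 0)
```

For a 224 × 224 input with 16 × 16 patches, `sequence_length(224, 224, 16)` gives 14 × 14 = 196 patch tokens plus one class token, i.e. 197. Because self-attention cost grows quadratically with this length, the patch size is a direct compute/resolution trade-off.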

The Scaling Vision Transformers study [14] reports that, as with appropriately scaled Transformers in NLP, larger models not only perform better but also use large computational budgets more effectively. Since scale is a vital component in achieving outstanding results, understanding a model's scaling properties is essential for developing subsequent generations effectively. For ViT models with sufficient training data, the compute-efficient frontier typically resembles a power law. Importantly, to remain on this frontier, compute and model capacity must be scaled together; otherwise, the additional compute is not used optimally.
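A pure power law, error ≈ a · C^(−b) in compute C, is a straight line in log-log space: log(error) = log(a) − b · log(C). The following minimal sketch estimates a and b by ordinary least squares on the logarithms. The pure power-law form is our simplifying assumption; [14] may fit a more elaborate (for example saturating) curve, and the numbers in the usage note are synthetic, not results from [14].

```python
import math
from typing import Sequence, Tuple


def fit_power_law(compute: Sequence[float], error: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of error ~= a * compute**(-b); returns (a, b).

    Works in log space, where the model is linear:
    log(error) = log(a) - b * log(compute).
    """
    if len(compute) != len(error) or len(compute) < 2:
        raise ValueError("need at least two (compute, error) pairs")
    xs = [math.log(c) for c in compute]
    ys = [math.log(e) for e in error]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("compute values must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return math.exp(intercept), -slope
```

Fitting synthetic points generated from error = 2 · C^(−0.5) at C = 1, 10, 100, 1000 recovers a ≈ 2 and b ≈ 0.5. A larger fitted b means each additional unit of compute buys a larger relative reduction in error.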

Vision Transformers with Patch Diversification [15] introduced special loss functions into vision transformer training to promote diversity among patch representations, yielding more discriminative features. Because this stabilizes training, it makes it possible to train wider and deeper vision transformers. Vision transformer performance can also be improved by modifying the architecture to include convolution layers. However, [15] observes that the self-attention blocks of vision transformers tend to map different patches into similar latent representations, which causes information loss and performance degradation. By diversifying patch representations, it is possible to train larger, deeper models and improve performance on image classification tasks without changing the transformer architecture.
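To make "diversity among patch representations" concrete, a natural measure is the mean pairwise cosine similarity between the patch embeddings of one layer: values near 1 indicate the collapse described above, lower values indicate more diverse patches. The following is a minimal sketch of that measure; adding it (or a variant) to the training objective as a penalty is our simplified reading of the idea, not a reproduction of the exact losses in [15], and the function names are illustrative.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def mean_pairwise_cosine(patches: Sequence[Sequence[float]]) -> float:
    """Average cosine similarity over all unordered pairs of patch representations.

    Near 1: patches have collapsed to similar representations.
    Lower: patch representations are more diverse.
    """
    n = len(patches)
    if n < 2:
        raise ValueError("need at least two patch representations")
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total += cosine_similarity(patches[i], patches[j])
            pairs += 1
    return total / pairs
```

Three mutually orthogonal patch vectors score 0, while three parallel ones (for example `[1, 2]`, `[2, 4]`, `[0.5, 1]`) score 1, the fully collapsed case.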

III. METHODOLOGY

In this paper, we propose a novel framework to detect gastrointestinal diseases from wireless capsule endoscopy (WCE) curated images using a vision transformer (ViT) based model and a pretrained DenseNet201 [16] model. The proposed framework is depicted in Fig. 1.

Fig. 1: Proposed gastrointestinal disease detection framework using ViT and DenseNet

A. Dataset Description

We obtained our dataset from Kaggle [17]. The dataset contains WCE images from inside the gastrointestinal (GI) tract, originally 720 × 576 pixels, in four classes: normal, ulcerative colitis, polyps, and esophagitis. We use our machine learning models to classify these images into the four classes. Sample images from the dataset are presented in Fig. 2, and the training and test data distribution is presented in Fig. 3.

Fig. 2: Sample images from the dataset. Top left: normal colon; top right: colon with ulcerative colitis; bottom left: polyps; bottom right: esophagitis. All are WCE images.
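The excerpt does not describe how the training and test sets summarized in Fig. 3 were formed (the Kaggle release may already come pre-split). If a split has to be created, a stratified split keeps the class proportions equal in both sets. The following is a minimal sketch; the class-name strings, `test_fraction` and `seed` are hypothetical choices, not the authors' settings.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical label names for the four classes described above.
CLASSES = ("normal", "ulcerative_colitis", "polyps", "esophagitis")
LABEL_OF: Dict[str, int] = {name: i for i, name in enumerate(CLASSES)}


def stratified_split(items: Sequence[Tuple[str, str]], test_fraction: float,
                     seed: int = 0) -> Tuple[List[Tuple[str, int]], List[Tuple[str, int]]]:
    """Split (image_path, class_name) pairs so each class contributes the same
    fraction of its images to the test set."""
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must be in [0, 1]")
    by_label: Dict[int, List[str]] = defaultdict(list)
    for path, name in items:
        by_label[LABEL_OF[name]].append(path)
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train: List[Tuple[str, int]] = []
    test: List[Tuple[str, int]] = []
    for label in sorted(by_label):
        paths = list(by_label[label])
        rng.shuffle(paths)
        n_test = round(len(paths) * test_fraction)
        test.extend((p, label) for p in paths[:n_test])
        train.extend((p, label) for p in paths[n_test:])
    return train, test
```

With 10 images per class and `test_fraction=0.2`, each class contributes exactly 2 images to the test set, so per-class metrics on the test set are not skewed by an accidental class imbalance.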
