Gastrointestinal Disorder Detection with a
Transformer Based Approach
A.K.M. Salman Hosain 1, Mynul Islam 1, Md Humaion Kabir Mehedi 1, Irteza Enan Kabir 2, and
Zarin Tasnim Khan 3
1Department of Computer Science and Engineering, Brac University, Dhaka, Bangladesh
2University of Rochester, NY, United States
3Shaheed Tajuddin Ahmad Medical College, Dhaka, Bangladesh
1{akm.salman.hosain, mynul.islam, humaion.kabir.mehedi }@g.bracu.ac.bd
2,3 {irtezaenan, zarinkhan27}@gmail.com
Abstract—Accurate disease categorization using endoscopic
images is a significant problem in Gastroenterology. This paper
describes a technique for assisting medical diagnosis procedures
and identifying gastrointestinal tract disorders based on the
categorization of characteristics taken from endoscopic pictures
using a vision transformer and a transfer learning model. Vision
transformers have shown very promising results on difficult image
classification tasks. In this paper, we propose a vision
transformer based approach to detect gastrointestinal diseases
from curated wireless capsule endoscopy (WCE) images of the
colon with an accuracy of 95.63%. We have compared this
transformer based approach with the pretrained convolutional neural
network (CNN) model DenseNet201 and demonstrated that the
vision transformer surpassed DenseNet201 in various quantitative
performance evaluation metrics.
Keywords—Vision transformer, Gastrointestinal Disorder,
Transfer Learning, DenseNet201, ViT, Colon
I. INTRODUCTION
The gastrointestinal (GI) tract, also known as the digestive tract,
is prone to several diseases such as polyps, ulcers, colorectal
cancer, etc. [1]. Common symptoms include pain or discomfort
in the abdomen, loss of appetite, nausea and vomiting,
and fatigue. Some of the GI diseases
often lead to GI cancer, which is considered the second most
common cancer worldwide [2]. One of the common diseases
of the gastrointestinal tract is muco-submucosal polyps, which
are the result of chronic prolapse of the mucosa in the intestine
[3]. Polyps often do not show many symptoms in the early
stages, but as they enlarge, they can block the opening to the small
intestine. The symptoms of polyps may include blood in the
stool and consequent anemia, tenderness when the stomach is touched, and
nausea. They appear as a polypoid mass in endoscopic imaging
and carry an increased risk of cancer. Esophagitis is another
common GI condition, which is an inflammation
of the tube connecting the throat to the stomach. Esophagitis
mainly causes difficulty in swallowing, chest pain, heartburn, and
swallowed food becoming stuck in the esophagus [4]. Endoscopy
usually shows rings of abnormal tissue. Ulcerative colitis, an
inflammatory bowel disease, is also a frequently occurring
condition, which causes inflammation in the GI tract along
with abdominal pain, diarrhoea, fatigue and bloody stool.
These GI diseases often have overlapping symptoms and are thus
difficult to identify. Early diagnosis of these diseases may lead
to a cure or prevent the development of fatal cancer. Although
visual assessment of endoscopy images gives an initial diagnosis,
it is often time-consuming and highly subjective [5].
Moreover, there might be radiologist deficiencies and other
human factors which often lead to false positive or even false
negative diagnoses, which can be detrimental for the patient
[6]. Thus, computer-aided diagnosis would be valuable for
high-accuracy detection at the early stages.
In this paper, we classify endoscopic images of subjects
with gastrointestinal diseases. For the classification task, we
undertook two different approaches: a vision transformer
and a transfer learning method with a pretrained CNN
architecture. We then compared the results between
the two classification models. Our data set
consists of four classes:
• Healthy control, or normal class
• Ulcerative colitis
• Polyps
• Esophagitis
Our contributions in this work are:
• We have utilized a vision transformer based model (ViT)
and the pretrained CNN model DenseNet201 to detect three
gastrointestinal diseases along with healthy colon from
curated wireless capsule endoscopy (WCE) images
of the colon
• We have conducted a comparative analysis between the two
models on various quantitative performance evaluation
metrics and identified the superior classifier
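To make the comparison between the two classifiers concrete, the following is a minimal sketch of how per-class metrics could be computed for a four-class problem. The specific metrics shown (accuracy, precision, recall, F1), the class label strings, and the function names are assumptions for illustration; this section does not specify which metrics or code were actually used.

```python
from collections import Counter

# Hypothetical label strings for the four classes named in the text.
CLASSES = ["normal", "ulcerative_colitis", "polyps", "esophagitis"]


def accuracy(y_true, y_pred):
    """Fraction of images whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred, classes=CLASSES):
    """Return precision, recall and F1 for each class (one-vs-rest)."""
    pairs = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    report = {}
    for c in classes:
        tp = pairs[(c, c)]
        fp = sum(n for (t, p), n in pairs.items() if p == c and t != c)
        fn = sum(n for (t, p), n in pairs.items() if t == c and p != c)
        # Guard against division by zero when a class never occurs.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[c] = {"precision": precision, "recall": recall, "f1": f1}
    return report


if __name__ == "__main__":
    # Toy labels for illustration only; these are not results from the paper.
    y_true = ["normal", "polyps", "polyps", "esophagitis"]
    y_pred = ["normal", "polyps", "normal", "esophagitis"]
    print("accuracy:", accuracy(y_true, y_pred))
    for name, scores in per_class_metrics(y_true, y_pred).items():
        print(name, scores)
```

Running the same functions on the predictions of each model over one shared test set would give a like-for-like comparison on each metric.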
II. RELATED WORKS
Machine learning techniques have previously been used in
medicine for diagnostic purposes, such as
neural networks for the classification of stomach cancer [7] and deep
learning [8] for stomach abnormality classification.
978-1-6654-6316-4/22/$31.00 © 2022 IEEE
arXiv:2210.03168v1 [cs.CV] 6 Oct 2022