Multi-stream Fusion for Class Incremental
Learning in Pill Image Classification
Trong-Tung Nguyen1,2, Hieu H. Pham1,3,*, Phi Le Nguyen4, Thanh Hung
Nguyen4, and Minh Do1,3,5
1VinUni-Illinois Smart Health Center, VinUniversity, Hanoi, Vietnam;
{tung.nt,hieu.ph,minh.do}@vinuni.edu.vn
2John von Neumann Institute, University of Science, VNU-HCM, Vietnam;
3College of Engineering & Computer Science, VinUniversity, Hanoi, Vietnam;
4School of Information and Communication Technology, Hanoi University of Science
and Technology, Vietnam;
{lenp,hungnt}@soict.hust.edu.vn
5University of Illinois at Urbana-Champaign, US; minhdo@illinois.edu
*Corresponding author
Abstract. Classifying pill categories from real-world images is crucial
for various smart healthcare applications. Although existing approaches
in image classification might achieve a good performance on fixed pill
categories, they fail to handle novel instances of pill categories that are
frequently presented to the learning algorithm. To this end, a trivial so-
lution is to train the model with novel classes. However, this may result
in a phenomenon known as catastrophic forgetting, in which the system
forgets what it learned in previous classes. In this paper, we address
this challenge by introducing the class incremental learning (CIL) abil-
ity to traditional pill image classification systems. Specifically, we pro-
pose a novel incremental multi-stream intermediate fusion framework en-
abling incorporation of an additional guidance information stream that
best matches the domain of the problem into various state-of-the-art
CIL methods. Within this framework, we consider color-specific information
of pill images as a guidance stream and devise an approach, namely
“Color Guidance with Multi-stream intermediate fusion” (CG-IMIF), for solving
the CIL pill image classification task. We conduct comprehensive experiments
on a real-world incremental pill image classification dataset, namely
VAIPE-PCIL, and find that CG-IMIF consistently outperforms several
state-of-the-art methods by a large margin in different task settings. Our
code, data, and trained model are available at
https://github.com/vinuni-vishc/CG-IMIF.
arXiv:2210.02313v1 [cs.CV] 5 Oct 2022
1 Introduction
The pill image recognition task has recently attracted many studies that aim to
design high-quality algorithms for visual assistance systems operating on pill
images. Such systems can help the healthcare community automatically identify
unknown pill categories from a few real-world pictures taken with mobile
devices. Notably, real-world pill images are often challenging due to changing
backgrounds and variation among pill instances in shape, color, and texture.
Several works have been developed to mitigate such challenges, most of them
based on hand-crafted features [3, 5, 6, 10]. These
works are then utilized by Ling et al. [16] and combined with a two-stage training
strategy to create a novel framework for the pill recognition model in few-shot
learning. Another approach is to explore external knowledge from medical text
data (e.g. prescription) to improve the detection performance of visual-based
models [18, 19]. However, existing models are often limited by novel instances
of pill categories which frequently arrive at a pill recognition system. This often
happens when a novel class of pill instance is introduced by images uploaded
from the end-user using mobile devices or from the healthcare community. A
report in [1] shows that there are roughly 40-50 novel drugs being approved each
year. In such a scenario, the core learning model of the system, which is often
deployed on a lightweight device (e.g., a mobile phone), might need to rerun the
training process on the whole training data, including the novel categories.
This is not an effective strategy for several reasons. The memory available for
such extensive training data is often limited. Acquiring novel knowledge while
maintaining what the model has learned so far requires the system to store a
huge number of samples for both old and new classes, which is infeasible.
Another solution is to train the model on an initial dataset and then
fine-tune it on novel categories to update its knowledge about new pill
instances. However, this fine-tuning scheme suffers from a well-known failure
mode called catastrophic forgetting [8, 9]: performance on old tasks degrades
as the model is trained on data from novel tasks. The system therefore needs a
flexible and effective strategy for categorizing novel real-world pill image
instances. In
this way, it would be able to incrementally learn from new classes without ex-
haustively storing old category samples. This scenario is called class-incremental
learning (CIL).
Research on class incremental learning (CIL) for visual tasks has progressed
significantly over the years. In the general CIL setting, disjoint sets of
classes arrive at the learning algorithm gradually. Many works, such as
[4,13,21–23], have proposed methods that adapt existing techniques to tackle
the common challenge of catastrophic forgetting. Knowledge distillation [12] is
the most widely adopted of these techniques and was first applied to the CIL
setting by Li et al. [15]. After that, a derived version [21] with additional usage of
representation learning was proposed, in which valuable herding exemplars are
replayed frequently to keep track of the old knowledge. The strategy of herding is
to pick those neighbors which are nearest to the mean sample of the class. Using
this herding strategy, Castro et al. [4] managed to build an end-to-end framework
with an additionally balanced fine-tuning strategy. On the other hand, Wu et
al. [22] introduced a bias correction approach by adding a bias correction layer.
This is conducted at the last layer of each incremental learning task to refine the
overall scores for the final prediction. Meanwhile, Hou et al. [13] identified the
imbalance between previous and new data as the main issue leading to catas-
trophic forgetting. They tackled this imbalanced scenario by incorporating three
main components: cosine normalization, less-forget constraint, and inter-class
separation.
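The herding strategy mentioned above can be sketched in a few lines. This is a minimal illustration of the simplified description given in the text (keep the samples nearest to the class mean), not the implementation of [21] or [4]; the function names `class_mean` and `select_exemplars`, the toy feature vectors, and the use of Euclidean distance are assumptions made here for illustration.

```python
from math import dist


def class_mean(features):
    """Component-wise mean of a non-empty list of equal-length feature vectors."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]


def select_exemplars(features, k):
    """Return the indices of the k feature vectors closest to the class mean.

    Simplified herding as described in the text: rank samples by their
    Euclidean distance to the class mean and keep the k nearest ones.
    """
    mean = class_mean(features)
    order = sorted(range(len(features)), key=lambda i: dist(features[i], mean))
    return order[:k]


# Toy 2-D features for a single class (hypothetical values); the last is an outlier.
feats = [[0.0, 0.0], [1.0, 1.0], [1.5, 1.5], [5.0, 5.0]]
exemplar_idx = select_exemplars(feats, k=2)
print(exemplar_idx)  # [2, 1]
```

In this toy example the class mean is (1.875, 1.875), so the third and second samples are kept and the outlier is never stored as an exemplar.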
In this research, we aim to investigate the application of CIL methods in a pill
classification system. Fig. 1 illustrates the effect of such a system with and
without class incremental learning capability. To the best of our knowledge, we are
the first to explore incremental learning on the pill classification system. Exist-
ing single stream incremental learning methods [4,13,21–23], when being applied
to a domain of application for practical usage, can be improved with the help
of some domain-specific knowledge. This serves as additional information which
might collaborate well with the original RGB image to alleviate catastrophic
forgetting. The introduction of a supplementary information stream requires a
prudent strategy to incorporate such information. Based on this motivation, we
propose a novel integration framework that serves as a plug-in technique for any
available class incremental learning algorithms. Our fusion framework enables
the incremental learning methods to receive additional information streams as
cues. This will then help to flexibly update corresponding feature representa-
tions in an optimal way for each learning task through the intermediate stage.
To demonstrate the usage of such an integration framework, we consider color
information as an additional stream and devise an approach named “Color
Guidance with Multi-stream intermediate fusion” (CG-IMIF). Experimental results
on a real-world incremental pill image classification dataset called VAIPE-PCIL
show that the proposed learning framework consistently surpasses most metric
scores of various state-of-the-art methods in different task settings.
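As a rough illustration of what an additional color stream could look like, the sketch below encodes an image's colors as a coarse histogram and joins it with a feature vector from the RGB stream. It is not the CG-IMIF design, which the paper details in Section 3; the histogram encoding, the bin count, the function names `color_histogram` and `fuse`, and fusion by plain concatenation are all assumptions made here for illustration.

```python
def color_histogram(pixels, bins=4):
    """Normalized joint RGB histogram of (r, g, b) pixels with values in 0..255.

    Assumed encoding of the color guidance stream, for illustration only.
    """
    hist = [0.0] * (bins ** 3)
    step = 256 / bins
    for r, g, b in pixels:
        # Map each channel to a bin index, clamped to the last bin.
        ri, gi, bi = (min(int(c / step), bins - 1) for c in (r, g, b))
        hist[ri * bins * bins + gi * bins + bi] += 1.0
    total = len(pixels)
    return [h / total for h in hist] if total else hist


def fuse(rgb_features, color_features):
    """Hypothetical fusion step: concatenate the two feature vectors."""
    return list(rgb_features) + list(color_features)


# Toy image with two reddish and two bluish pixels (hypothetical values).
pixels = [(255, 0, 0), (250, 10, 5), (0, 0, 255), (0, 0, 250)]
color_feat = color_histogram(pixels, bins=2)  # 2 x 2 x 2 = 8 bins
rgb_feat = [0.2, 0.7]                         # stand-in for RGB-stream features
fused = fuse(rgb_feat, color_feat)
print(len(fused))  # 10
```

Concatenation is only the simplest way to combine two streams; any learned fusion module would replace `fuse` in such a sketch.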
[Figure 1. Panel (a): traditional pipeline of the pill classification problem. Panel (b): incremental learning for the pill classification problem.]
Fig. 1: The pipeline for a learning algorithm to acquire knowledge of pill cate-
gories could be divided into two options: (a) feeding a fixed pill images database
to an off-the-shelf deep learning algorithm; (b) maintaining a few samples of old
categories as exemplars, combining with novel categories to form a growing pill
image dataset, and finally feeding into a growing deep classification model.
Our contributions can be summarized as follows:
1. We introduce CG-IMIF, a novel incremental learning framework based on
multiple streams for the task of pill classification from images. To the best
of our knowledge, we are the first to introduce the incremental learning
capability to this task and provide a new approach to tackle challenges in
learning novel pill classes.
2. We conduct thorough experiments and in-depth ablation studies to demon-
strate the effectiveness of the proposed approach on a real-world incremental
pill image classification dataset. Experimental results show that the CG-
IMIF consistently outperforms previous state-of-the-art methods by a large
margin.
The rest of this paper is organized as follows. Section 2 briefly formulates the
pill CIL problem that we aim to solve. Details of our proposed CG-IMIF framework
are described in Section 3. Experimental results and further analysis are
presented in Sections 4 and 5. Finally, we conclude the paper with a discussion
of strengths and limitations in Sections 6 and 7.
2 Preliminaries
2.1 Problem Definition and Notation
Generally, the Class Incremental Learning (CIL) problem, represented by $\tau$,
consists of a sequence of $n$ image classification learning tasks

$$\tau = \left[(C^1, P^1_{\mathrm{train}}, P^1_{\mathrm{test}}),\ (C^2, P^2_{\mathrm{train}}, P^2_{\mathrm{test}}),\ \dots,\ (C^n, P^n_{\mathrm{train}}, P^n_{\mathrm{test}})\right], \tag{1}$$

where each tuple $(C^t, P^t_{\mathrm{train}}, P^t_{\mathrm{test}})$ depicts a task $t$. $C^t$ is a set of
$m_t$ categories, i.e., $C^t = \{c^t_1, c^t_2, \dots, c^t_{m_t}\}$; $P^t_{\mathrm{train}}$ and
$P^t_{\mathrm{test}}$ denote the training and testing data, respectively. To represent the
total number of classes up to the current task, we define
$M_t = \sum_{i=1}^{t} |C^i|$. The training and testing data are defined as
$P^t = \{(X^t, Y^t)\}$, where $X^t$ and $Y^t$ denote the training images and their
corresponding labels, respectively. During the training phase, the learning
model at stage $t$ is presented with the category set $C^t$, the training samples
$P^t_{\mathrm{train}}$, and an exemplar set $K^t$. In practice, $K^t$ is a fixed-size set
acting as a support set which helps to retain a partial set of images and the
corresponding labels from previous training data, i.e.,
$K^t \subset P^1_{\mathrm{train}} \cup P^2_{\mathrm{train}} \cup \dots \cup P^{t-1}_{\mathrm{train}}$.
Therefore, a revised version of the training samples at stage $t$ can be
obtained by combining $K^t$ and $P^t_{\mathrm{train}}$:
$K^t \cup P^t_{\mathrm{train}} = V^t_{\mathrm{train}}$. It is also assumed that the categories of
different learning tasks do not overlap (i.e., $C^i \cap C^j = \emptyset$ where
$i \neq j$). At testing time, the performance of learner $t$ is evaluated on all
of the previously seen categories $\bigcup_{i=1}^{t} C^i$ with samples from
$\bigcup_{i=1}^{t} P^i_{\mathrm{test}}$.
2.2 Conventional CIL Methods
Several CIL methods have been proposed that exploit various properties of the
CIL problem to tackle the common challenge of catastrophic forgetting. Most CIL