Multi-stream Fusion for Class Incremental Learning in Pill Image Classification

Trong-Tung Nguyen1,2, Hieu H. Pham1,3,*, Phi Le Nguyen4, Thanh Hung Nguyen4, and Minh Do1,3,5

1 VinUni-Illinois Smart Health Center, VinUniversity, Hanoi, Vietnam; {tung.nt,hieu.ph,minh.do}@vinuni.edu.vn

2 John von Neumann Institute, University of Science, VNU-HCM, Vietnam

3 College of Engineering & Computer Science, VinUniversity, Hanoi, Vietnam

4 School of Information and Communication Technology, Hanoi University of Science and Technology, Vietnam; {lenp,hungnt}@soict.hust.edu.vn

5 University of Illinois at Urbana-Champaign, US; minhdo@illinois.edu

* Corresponding author

Abstract. Classifying pill categories from real-world images is crucial for various smart healthcare applications. Although existing image classification approaches may perform well on a fixed set of pill categories, they fail to handle novel pill categories that are frequently presented to the learning algorithm. A trivial solution is to train the model on the novel classes. However, this may result in a phenomenon known as catastrophic forgetting, in which the system forgets what it learned from previous classes. In this paper, we address this challenge by introducing class incremental learning (CIL) capability to traditional pill image classification systems. Specifically, we propose a novel incremental multi-stream intermediate fusion framework that enables an additional guidance information stream, chosen to best match the problem domain, to be incorporated into various state-of-the-art CIL methods. Based on this framework, we consider color-specific information of pill images as a guidance stream and devise an approach, namely “Color Guidance with Multi-stream intermediate fusion” (CG-IMIF), for solving the CIL pill image classification task. We conduct comprehensive experiments on a real-world incremental pill image classification dataset, namely VAIPE-PCIL, and find that CG-IMIF consistently outperforms several state-of-the-art methods by a large margin in different task settings. Our code, data, and trained model are available at https://github.com/vinuni-vishc/CG-IMIF.

1 Introduction

Pill image recognition has recently attracted many studies that aim to design high-quality algorithms for vision-based pill assistance systems. Such systems can help the healthcare community automatically identify unknown pill categories from a few real-world pictures taken with mobile devices. Real-world pill images are often challenging because of changing backgrounds and variations among pill instances in shape, color, and texture. Several works have been developed to mitigate these challenges, most of them based on hand-crafted features [3, 5, 6, 10]. These works were then utilized by Ling et al. [16] and combined with a two-stage training strategy to create a novel framework for few-shot pill recognition. Another approach is to exploit external knowledge from medical text data (e.g., prescriptions) to improve the detection performance of vision-based models [18, 19]. However, existing models are often limited by novel pill categories, which frequently arrive at a pill recognition system. This often happens when a novel class of pill is introduced through images uploaded by end users from mobile devices or by the healthcare community. A report in [1] shows that roughly 40-50 novel drugs are approved each year. In such a scenario, the core learning model of the system, which is often deployed on a lightweight device (e.g., a mobile phone), might need to rerun the training process on the whole training data, including the novel categories. This is not an effective strategy for several reasons. Memory available for such extensive training data is often limited. Acquiring novel knowledge while maintaining what the model has learned so far requires the system to store a huge number of samples for both old and new classes, which is infeasible. Another solution is to train the model on an initial dataset and then fine-tune it on novel categories to update its knowledge about new pill instances. However, this fine-tuning scheme suffers from a serious problem widely known as catastrophic forgetting [8, 9] (performance on old tasks degrades while the model learns from data of novel tasks). The system therefore needs a flexible and effective strategy for categorizing novel real-world pill images, one that lets it learn incrementally from new classes without exhaustively storing samples of old categories. This scenario is called class-incremental learning (CIL).

arXiv:2210.02313v1 [cs.CV] 5 Oct 2022

Research on class incremental learning (CIL) for visual tasks has progressed significantly over the years. In the general CIL setting, disjoint sets of classes arrive at the learning algorithm gradually. Many works, such as [4, 13, 21–23], have proposed methods that employ available techniques to tackle the shared challenge of catastrophic forgetting. Knowledge distillation [12] is the most widely adopted technique for tackling catastrophic forgetting and was first applied to the CIL setting by Li et al. [15]. Later, a derived version [21] that additionally uses representation learning was proposed, in which valuable herding exemplars are replayed frequently to keep track of old knowledge. The herding strategy picks the samples that are nearest to the mean sample of the class. Using this herding strategy, Castro et al. [4] built an end-to-end framework with an additional balanced fine-tuning strategy. On the other hand, Wu et al. [22] introduced a bias correction approach that adds a bias correction layer. This layer is applied at the last layer of each incremental learning task to refine the overall scores for the final prediction. Meanwhile, Hou et al. [13] identified the imbalance between previous and new data as the main issue leading to catastrophic forgetting. They tackled this imbalance by incorporating three main components: cosine normalization, a less-forget constraint, and inter-class separation.
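The herding strategy mentioned above (keeping the samples nearest to the class mean) can be illustrated with a minimal sketch. It assumes that each image has already been mapped to a feature vector by some backbone network; the function names `class_mean` and `select_exemplars` and the toy features are hypothetical and do not come from any of the cited methods. The sketch follows the simplified one-shot description given in the text and is not a reproduction of a specific published implementation.

```python
import math
from typing import List, Sequence


def class_mean(features: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of the feature vectors of one class."""
    count = len(features)
    dim = len(features[0])
    return [sum(vec[d] for vec in features) / count for d in range(dim)]


def select_exemplars(features: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return the indices of the k samples closest to the class mean.

    Simplified herding: rank every sample by its Euclidean distance to the
    class mean and keep the k nearest ones (ties keep their original order).
    """
    if not features or k <= 0:
        return []
    mean = class_mean(features)
    ranked = sorted(range(len(features)),
                    key=lambda i: math.dist(features[i], mean))
    return ranked[:k]


# Toy features for one class (hypothetical values).
toy_features = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [10.0, 10.0]]
print(select_exemplars(toy_features, 2))  # prints [2, 1]
```

In this toy case the outlier at (10, 10) pulls the mean to (3.25, 3.25), so the samples at indices 2 and 1 are kept as exemplars while the outlier and the sample farthest on the other side are dropped.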

In this research, we investigate the application of CIL methods to a pill classification system. Fig. 1 illustrates such a system with and without class incremental learning capability. To the best of our knowledge, we are the first to explore incremental learning for pill classification. Existing single-stream incremental learning methods [4, 13, 21–23], when applied to a practical application domain, can be improved with the help of domain-specific knowledge. This serves as additional information that might work well together with the original RGB image to alleviate catastrophic forgetting. Introducing a supplementary information stream requires a prudent strategy for incorporating it. Based on this motivation, we propose a novel integration framework that serves as a plug-in technique for any available class incremental learning algorithm. Our fusion framework enables incremental learning methods to receive additional information streams as cues, which then helps to flexibly update the corresponding feature representations for each learning task through the intermediate stage. To demonstrate the usage of this integration framework, we consider color information as an additional stream and devise an approach named “Color Guidance with Multi-stream intermediate fusion” (CG-IMIF). Experimental results on a real-world incremental pill image classification dataset called VAIPE-PCIL show that the proposed learning framework consistently surpasses various state-of-the-art methods on most metrics in different task settings.

[Figure 1. Panels: (a) Traditional pipeline of Pill Classification problem; (b) Incremental Learning for Pill Classification problem. Diagram labels: Fixed Training Pill dataset, Deep Classification Model, Test samples, Category A, Category B, Category D, Category E, Exemplars of previous categories, Incoming Novel Disjoint Set of Pill Categories, Growing Training Pill dataset, Growing Deep Classification Model, Classification Results.]

Fig. 1: The pipeline by which a learning algorithm acquires knowledge of pill categories can take one of two forms: (a) feeding a fixed pill image database to an off-the-shelf deep learning algorithm; (b) maintaining a few samples of old categories as exemplars, combining them with novel categories to form a growing pill image dataset, and finally feeding this dataset into a growing deep classification model.

Our contributions can be summarized as follows:

1. We introduce CG-IMIF, a novel incremental learning framework based on multiple streams for the task of pill classification from images. To the best of our knowledge, we are the first to introduce incremental learning capability to this task and to provide a new approach to the challenges of learning novel pill classes.

2. We conduct thorough experiments and in-depth ablation studies to demonstrate the effectiveness of the proposed approach on a real-world incremental pill image classification dataset. Experimental results show that CG-IMIF consistently outperforms previous state-of-the-art methods by a large margin.

The rest of this paper is organized as follows. Section 2 briefly formulates the pill CIL problem setting that we aim to solve. Section 3 describes the details of our proposed CG-IMIF framework. Sections 4 and 5 present experimental results and further analysis. Finally, Sections 6 and 7 conclude the paper with a discussion of strengths and limitations.

2 Preliminaries

2.1 Problem Definition and Notation

Generally, the Class Incremental Learning (CIL) problem, represented by $\tau$, consists of a sequence of $n$ image classification learning tasks

$$\tau = \left[(C^1, P^1_{\text{train}}, P^1_{\text{test}}),\ (C^2, P^2_{\text{train}}, P^2_{\text{test}}),\ \ldots,\ (C^n, P^n_{\text{train}}, P^n_{\text{test}})\right], \tag{1}$$

where each tuple $(C^t, P^t_{\text{train}}, P^t_{\text{test}})$ describes a task $t$. $C^t$ is a set of $m_t$ categories, i.e., $C^t = \{c^t_1, c^t_2, \ldots, c^t_{m_t}\}$, and $P^t_{\text{train}}$ and $P^t_{\text{test}}$ denote the training and testing data, respectively. To represent the total number of classes up to the current task, we define $M_t = \sum_{i=1}^{t} |C^i|$. Both the training and the testing data take the form $P^t = \{(X^t, Y^t)\}$, where $X^t$ and $Y^t$ denote the images and their corresponding labels, respectively. During the training phase, the learning model at stage $t$ is presented with the category set $C^t$, the training samples $P^t_{\text{train}}$, and an exemplar set $K^t$. In practice, $K^t$ is a fixed-size support set that retains a subset of the images and corresponding labels from previous training data, i.e., $K^t \subseteq P^1_{\text{train}} \cup P^2_{\text{train}} \cup \ldots \cup P^{t-1}_{\text{train}}$. Therefore, a revised set of training samples at stage $t$ is obtained by combining $K^t$ and $P^t_{\text{train}}$: $V^t_{\text{train}} = K^t \cup P^t_{\text{train}}$. It is also assumed that the categories of different learning tasks do not overlap (i.e., $C^i \cap C^j = \emptyset$ for $i \neq j$). At testing time, the performance of learner $t$ is evaluated on all previously seen categories $\bigcup_{i=1}^{t} C^i$ with samples from $\bigcup_{i=1}^{t} P^i_{\text{test}}$.
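The bookkeeping implied by this problem definition can be made concrete with a small sketch. It is only an illustration under stated assumptions: samples are represented as hypothetical (image identifier, label) pairs, the names `Task`, `exemplar_set`, `training_set`, and `evaluation_set` are invented for this example, and the exemplar selection policy (an equal quota per old class, first samples encountered) is a placeholder, since the definition only requires a fixed-size subset of earlier training data.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

# Hypothetical sample representation: (image identifier, class label).
Sample = Tuple[str, int]


@dataclass
class Task:
    classes: Set[int]    # category set of the task
    train: List[Sample]  # training data of the task
    test: List[Sample]   # testing data of the task


def check_disjoint(tasks: List[Task]) -> None:
    """Raise if two tasks share a category (the CIL setting assumes none do)."""
    seen: Set[int] = set()
    for index, task in enumerate(tasks, start=1):
        overlap = seen & task.classes
        if overlap:
            raise ValueError(f"task {index} repeats classes {sorted(overlap)}")
        seen |= task.classes


def cumulative_class_counts(tasks: List[Task]) -> List[int]:
    """Total number of classes seen up to and including each task."""
    counts, total = [], 0
    for task in tasks:
        total += len(task.classes)
        counts.append(total)
    return counts


def exemplar_set(tasks: List[Task], t: int, budget: int) -> List[Sample]:
    """Subset of the training data of tasks 1..t-1, at most `budget` samples.

    Placeholder policy (an assumption): equal quota per old class, keeping
    the first samples encountered for each class.
    """
    old = [s for task in tasks[: t - 1] for s in task.train]
    old_classes = sorted({label for _, label in old})
    if not old_classes:
        return []
    quota = budget // len(old_classes)
    kept: List[Sample] = []
    for c in old_classes:
        kept.extend([s for s in old if s[1] == c][:quota])
    return kept


def training_set(tasks: List[Task], t: int, budget: int) -> List[Sample]:
    """Exemplars of earlier tasks combined with the training data of task t."""
    if not 1 <= t <= len(tasks):
        raise IndexError("t must be between 1 and the number of tasks")
    return exemplar_set(tasks, t, budget) + tasks[t - 1].train


def evaluation_set(tasks: List[Task], t: int) -> List[Sample]:
    """Test samples of every task seen so far (tasks 1..t)."""
    return [s for task in tasks[:t] for s in task.test]


# Toy two-task sequence (hypothetical data).
tasks = [
    Task({0, 1}, [("a0", 0), ("a1", 0), ("b0", 1), ("b1", 1)], [("a2", 0), ("b2", 1)]),
    Task({2}, [("c0", 2), ("c1", 2)], [("c2", 2)]),
]
check_disjoint(tasks)
print(cumulative_class_counts(tasks))
print(training_set(tasks, t=2, budget=2))
```

With a budget of two exemplars, the second task trains on one retained sample per old class plus its own two new samples, while evaluation after the second task covers the test samples of all three classes.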

2.2 Conventional CIL Methods

Several CIL methods have been proposed that consider various properties of the CIL problem to tackle the shared challenge of catastrophic forgetting. Most CIL
