
MULTI-VIEW CONTRASTIVE LEARNING WITH ADDITIVE MARGIN FOR ADAPTIVE
NASOPHARYNGEAL CARCINOMA RADIOTHERAPY PREDICTION
Jiabao Sheng1,2, Yuanpeng Zhang∗3, Jing Cai∗1,2, Sai-Kit Lam1,2, Zhe Li1, Jiang Zhang1, Xinzhi Teng1
1The Hong Kong Polytechnic University
2Research Institute for Smart Ageing, The Hong Kong Polytechnic University
3Department of Medical Informatics, Nantong University
ABSTRACT
The prediction of adaptive radiation therapy (ART) prior to radiation therapy (RT) for nasopharyngeal carcinoma (NPC) patients is important for reducing toxicity and prolonging patient survival. Currently, due to the complex tumor microenvironment, a single type of high-resolution image can provide only limited information. Meanwhile, the traditional softmax-based loss is insufficient for quantifying the discriminative power of a model. To overcome these challenges, we propose a supervised multi-view contrastive learning method with an additive margin (MMCon). For each patient, four medical images are considered to form multi-view positive pairs, which provide additional information and enhance the representation of medical images. In addition, the embedding space is learned by means of contrastive learning: NPC samples from the same patient or with similar labels remain close in the embedding space, while NPC samples with different labels lie far apart. To improve the discriminative ability of the loss function, we incorporate a margin into the contrastive learning objective. Experimental results show that this new learning objective can be used to find an embedding space that exhibits superior discriminative ability for NPC images.
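As a sketch of such an objective (our illustrative formulation in the style of additive-margin softmax losses; the margin m, temperature τ, and similarity function are assumptions here, not taken from the paper), the loss for an anchor embedding z_i with positive set P(i) and negative set N(i) could read:

\[
\mathcal{L}_i = -\frac{1}{|P(i)|}\sum_{z_p \in P(i)} \log \frac{\exp\!\big((\mathrm{sim}(z_i, z_p) - m)/\tau\big)}{\exp\!\big((\mathrm{sim}(z_i, z_p) - m)/\tau\big) + \sum_{z_n \in N(i)} \exp\!\big(\mathrm{sim}(z_i, z_n)/\tau\big)}
\]

where sim(·,·) is cosine similarity, m > 0 is the additive margin that tightens the decision boundary around positives, and τ is a temperature; positives are other views of the same patient or samples sharing the same ART label.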
Index Terms—Medical Image Analysis, Multi-view, Nasopharyngeal Carcinoma, Contrastive Learning, Additive Margin
1. INTRODUCTION
Planning intensity-modulated radiotherapy (IMRT) for
NPC requires medical imaging guidance. Previous studies
have shown that the target volume (TV) and organ-at-risk
(OAR) geometry appearing in images can change significantly during IMRT [1, 2].
This work was supported in part by the Shenzhen-Hong Kong-Macau S&T Program (Category C) (SGDX20201103095002019), the Shenzhen Basic Research Program (JCYJ20210324130209023) of the Shenzhen Science and Technology Innovation Committee, the Project of Strategic Importance (P0035421) and the Project of RISA (P0043001) of The Hong Kong Polytechnic University, the NSF of Jiangsu Province (No. BK20201441), the Jiangsu Post-doctoral Research Funding Program (No. 2020Z020), and the NSFC (Grant No. 82072019).
[Figure 1: the four views of patient i are mapped by view-specific encoders to embeddings z_i^1, ..., z_i^4, which serve as positive samples for the query, while the views of a different patient j are encoded as negative samples; the target (tumour) region is marked in each view.]
Fig. 1: Illustration of our basic idea. V_1, V_2, V_3, and V_4 are different medical image views in the NPC-GTV dataset; v_i and v_j denote NPC samples from different patients, and z_i and z_j represent their embedding vectors. Our objective is to learn an embedding space in which similar sample pairs stay close while dissimilar ones are far apart.
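To make the setup in Fig. 1 concrete, the following PyTorch sketch (our illustration, not the authors' released code; the function name margin_supcon_loss and the hyperparameter values are assumptions) computes a supervised contrastive loss with an additive margin over a batch of multi-view embeddings:

    import torch
    import torch.nn.functional as F

    def margin_supcon_loss(z, labels, margin=0.2, tau=0.1):
        # z: (N, d) embeddings from the view encoders (N = patients x 4 views)
        # labels: (N,) patient/ART labels; equal labels define positive pairs
        z = F.normalize(z, dim=1)                      # cosine similarity via dot product
        logits = z @ z.t() / tau
        pos = (labels[:, None] == labels[None, :]).float()
        pos.fill_diagonal_(0)                          # a sample is not its own positive
        logits = logits - (margin / tau) * pos         # additive margin on positive pairs
        logits.fill_diagonal_(float('-inf'))           # drop self-similarity
        log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
        n_pos = pos.sum(1).clamp(min=1)                # guard anchors without positives
        return -(pos * log_prob).sum(1).div(n_pos).mean()

Subtracting the margin from positive-pair logits requires positives to be more similar than negatives by at least m before the loss is satisfied, which is the discriminative effect the abstract attributes to the margin.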
To reduce unnecessary exposure during treatment, it is necessary to incorporate medical image analysis to assist doctors in evaluating whether ART is needed.
In contrast to other medical image classification tasks, such as tumour identification [3, 4] and cancer diagnosis [5, 6], the prediction task for NPC ART is to analyze the properties of the tumour in order to determine whether radiotherapy replanning is needed in the short term. Due to the heterogeneity of tumours [7, 8], the volume, shape, and texture of the tumour region may vary from patient to patient, and many diverse factors may cause these features to change.
In a previous study, [9] used manually extracted magnetic resonance imaging (MRI) features to study radiation therapy planning for NPC. However, a single manually extracted omics signature cannot fully express the information contained in NPC samples [10], and studies of such learning methods [10] have shown that manually extracted multi-omics features do not yield the best possible data representations.
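For context, the "manually extracted" (handcrafted) features referenced here are typically radiomics signatures computed from an image and a tumour mask. A minimal sketch using the pyradiomics library follows (an illustration only; the paper does not state its extraction pipeline, and the file paths are placeholders):

    # Sketch of handcrafted radiomics feature extraction (not the authors' pipeline).
    from radiomics import featureextractor

    extractor = featureextractor.RadiomicsFeatureExtractor()
    extractor.disableAllFeatures()
    extractor.enableFeatureClassByName('firstorder')   # intensity statistics
    extractor.enableFeatureClassByName('shape')        # volume/shape descriptors
    extractor.enableFeatureClassByName('glcm')         # texture features

    # placeholder paths: an MRI volume and its gross-tumour-volume (GTV) mask
    features = extractor.execute('patient_mri.nii.gz', 'gtv_mask.nii.gz')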