RAIS: Robust and Accurate Interactive Segmentation
via Continual Learning
Yuying Hao, Yi Liu, Juncai Peng, Haoyi Xiong, Guowei Chen,
Shiyu Tang, Zeyu Chen, Baohua Lai
Baidu Inc.
{haoyuying, liuyi22}@baidu.com
Abstract
Interactive image segmentation aims to segment a
target region through human-computer interaction.
Recent works based on deep learning have achieved
excellent performance, but most of them focus on improving
accuracy on the training set and ignore potential
improvement on the test set. At inference, they tend
to perform well on domains similar to the training set
but adapt poorly to domain shift, so they require
more user effort to obtain satisfactory results. In this work,
we propose RAIS, a robust and accurate architecture for
interactive segmentation with continual learning, in which the
model learns from both the training and test sets. For
efficient learning on the test set, we propose a novel
optimization strategy that updates global and local parameters
with a basic segmentation module and an adaptation module,
respectively. Extensive experiments on several benchmarks
show that our method handles data distribution shifts
and achieves SOTA performance compared
with recent interactive segmentation methods. Our
method is also robust on remote sensing and medical
imaging datasets, where the training and testing data
domains are completely different.
1. Introduction
Deep learning methods have shown superior performance
on segmentation tasks [32, 13, 18], such as portrait
segmentation [3], satellite image processing [10], and
intelligent driving [29]. Most of them require
large-scale annotated images to learn powerful abstractions.
However, the cost of manual annotation grows rapidly as
the amount of data increases, especially when it comes
to pixel-level segmentation tasks. To improve the
efficiency of the annotation process, interactive segmentation
serves as an effective auxiliary tool: it is a
semi-automatic method based on human-computer interaction.
It allows annotators to provide a small amount
of interactive information and generates the final
segmentation result progressively. It can therefore accelerate
segmentation annotation while maintaining satisfactory
quality. Recently, interactive segmentation has attracted
intensive attention in both academia and industry. Several
types of interactive information are used in interactive
segmentation, e.g., bounding boxes [30], scribbles [1],
or clicks [27, 22, 11, 17, 26], whose characteristics
have been well studied in previous work. Among them,
click-based interaction is the most widely used,
because it provides sufficient region-of-interest
information with minimal interaction time. Click-based
methods usually employ two kinds of user clicks, i.e.,
positive clicks and negative clicks, which indicate the target
region and non-target regions, respectively. Most
interactive methods [27, 22, 17] train the model on a
training set without updating its parameters at test time.
They usually do well on test data similar to the training set,
but as the difference in data distribution increases, their
performance can deteriorate significantly. Consequently,
they require more user clicks to refine the final results,
or even need to be re-trained on the new data, which
increases annotation costs.
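To make the positive/negative click terminology above concrete, the following is a minimal sketch. It is not the encoding used by RAIS, which this section does not specify. It assumes a hypothetical click format `(row, col, is_positive)` and a binary mask stored as nested lists; the helper names `clicks_to_sparse_labels` and `click_disagreements` are illustrative only.

```python
from typing import List, Tuple

# Hypothetical click format (an assumption): (row, col, is_positive).
Click = Tuple[int, int, bool]

UNLABELED = -1  # marks pixels the user has not clicked


def clicks_to_sparse_labels(
    clicks: List[Click], height: int, width: int
) -> List[List[int]]:
    """Build a sparse label map: 1 = positive click, 0 = negative click, -1 = no click."""
    labels = [[UNLABELED] * width for _ in range(height)]
    for row, col, is_positive in clicks:
        if not (0 <= row < height and 0 <= col < width):
            raise ValueError(f"click ({row}, {col}) is outside the image")
        labels[row][col] = 1 if is_positive else 0
    return labels


def click_disagreements(mask: List[List[int]], labels: List[List[int]]) -> int:
    """Count clicked pixels where a binary mask contradicts the user's click."""
    return sum(
        1
        for mask_row, label_row in zip(mask, labels)
        for pred, label in zip(mask_row, label_row)
        if label != UNLABELED and pred != label
    )


# One positive click on the target, one negative click on the background.
clicks = [(1, 1, True), (0, 3, False)]
labels = clicks_to_sparse_labels(clicks, height=3, width=4)

# A predicted binary mask (1 = target) that wrongly includes pixel (0, 3).
mask = [
    [0, 0, 0, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]
errors = click_disagreements(mask, labels)
```

Here the mask agrees with the positive click but contradicts the negative one, so an annotator would need a further correction. Sparse labels of this kind are one plausible way to view user clicks as partial hints about the ground truth.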
In this work, we propose RAIS, a robust and accurate
architecture for interactive segmentation with continual
learning, to address this deterioration problem. We treat
interactive segmentation as a continuous adaptation process
and allow the model to learn from both the training
set and the test set. On the training set, we update the
model parameters with full supervision, like other methods.
On the test set, we propose a weakly-supervised method that
refines the model using the user annotations and
intermediate outputs. Since the user interactions already
provide useful hints about the ground truth, the intermediate
results have the potential to improve performance on
subsequent data. Hence, our model can adapt to the new data
distribution gradually, and relieve the impact of the deteri-
oration problem. Also, to prevent the model from forget-
arXiv:2210.10984v1 [cs.CV] 20 Oct 2022