B gives up some high-level semantic information and focuses more on high-resolution detail. The second row of Figure 1(d) shows that Complementary B produces more reliable segmentation boundaries in the challenging branch area (indicated by the arrow in the figure). The third row of Figure 1(d) clearly shows that the segmentation region predicted by Complementary B is largely enclosed by the ground-truth label. These results indicate that the probability maps generated by the Complementary A and Complementary B models carry rich, mutually learnable information.
Therefore, this paper proposes a new network based on complementary consistency training, using V-Net as the main model and constructing two complementary auxiliary models. The two auxiliary models form a complementary symmetric structure by alternating whether skip connections are used at each layer of the V-Net decoder. Drawing on cross-pseudo-supervision (Chen et al., 2021) and mutual consistency (Wu et al., 2021), a sharpening function converts the probability maps generated by the two auxiliary models into pseudo-labels that strengthen the training of the main model; at the same time, the high-quality probability maps generated by the main model are converted into pseudo-labels that guide the training of the auxiliary models. The perturbation between the main model and the auxiliary models thus forms complementary consistency training (a minimal sketch of the resulting loss is given at the end of this section). The Dice loss serves as the supervised loss for labeled input data, and the model-consistency regularization loss serves as the unsupervised loss for all input data. After training, only the main model is used for testing, which greatly reduces the number of network parameters at test time while still achieving accurate segmentation results. Consequently, the contributions and novelty of this paper are summarised as follows:
• This paper utilizes two complementary auxiliary models constructed by alternating the use of skip connections (see the sketch after this list). This creates a model perturbation from a complementary-information perspective and effectively exploits unlabeled data.
• The use of model-consistency methods allows the main model to learn complementary information from the auxiliary models.
• An independent encoder structure is proposed for complementary consistency learning.
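
To make the first point concrete, the following is a minimal sketch, written in PyTorch, of how two complementary auxiliary decoders can be built by toggling skip connections per decoder level. The module names, channel sizes, three-level depth, simplified convolution blocks, and the shared encoder are illustrative assumptions, not the full V-Net implementation.

import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Simplified 3D convolution block (the actual V-Net uses residual blocks).
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm3d(out_ch),
        nn.ReLU(inplace=True),
    )

class SharedEncoder(nn.Module):
    # Three-level encoder; an independent-encoder variant would instantiate
    # one encoder per model instead of sharing this one.
    def __init__(self, in_ch=1):
        super().__init__()
        self.enc1 = conv_block(in_ch, 16)
        self.enc2 = conv_block(16, 32)
        self.enc3 = conv_block(32, 64)
        self.pool = nn.MaxPool3d(2)

    def forward(self, x):
        f1 = self.enc1(x)                  # high-resolution features
        f2 = self.enc2(self.pool(f1))      # mid-level features
        bottom = self.enc3(self.pool(f2))  # deepest features
        return f1, f2, bottom

class ToggleSkipDecoder(nn.Module):
    # Decoder whose skip connections can be switched off per level; the two
    # complementary auxiliary decoders use opposite patterns, e.g.
    # (True, False) and (False, True).
    def __init__(self, use_skips=(True, True), num_classes=2):
        super().__init__()
        self.use_skips = use_skips
        self.up1 = nn.ConvTranspose3d(64, 32, kernel_size=2, stride=2)
        self.dec1 = conv_block(32 + (32 if use_skips[0] else 0), 32)
        self.up2 = nn.ConvTranspose3d(32, 16, kernel_size=2, stride=2)
        self.dec2 = conv_block(16 + (16 if use_skips[1] else 0), 16)
        self.head = nn.Conv3d(16, num_classes, kernel_size=1)

    def forward(self, feats):
        f1, f2, bottom = feats
        x = self.up1(bottom)
        if self.use_skips[0]:
            x = torch.cat([x, f2], dim=1)
        x = self.dec1(x)
        x = self.up2(x)
        if self.use_skips[1]:
            x = torch.cat([x, f1], dim=1)
        x = self.dec2(x)
        return self.head(x)

# Main decoder keeps all skips; the auxiliary decoders are complementary.
encoder = SharedEncoder()
main_dec = ToggleSkipDecoder(use_skips=(True, True))
aux_a = ToggleSkipDecoder(use_skips=(True, False))
aux_b = ToggleSkipDecoder(use_skips=(False, True))

x = torch.randn(1, 1, 32, 32, 32)  # toy 3D patch
feats = encoder(x)
p_main, p_a, p_b = main_dec(feats), aux_a(feats), aux_b(feats)

The main decoder keeps all skip connections, while the two auxiliary decoders drop them at complementary positions, so each auxiliary model trades off semantic and detail information differently.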
This method is validated on two public datasets and the re-
sults show that it effectively increases the utilization of unlabeled
data and achieves excellent semi-supervised segmentation perfor-
mance.
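
To complement the training description above, the following is a minimal, hypothetical sketch of the loss computation: a temperature-sharpening step turns probability maps into soft pseudo-labels, the Dice loss supervises the main model on labeled data, and a consistency term couples the main and auxiliary predictions on all data. The sharpening temperature, the MSE consistency measure, and the ramp-up weight cons_weight are illustrative assumptions rather than exact implementation details.

import torch
import torch.nn.functional as F

def sharpen(p, T=0.1):
    # Temperature sharpening of a foreground-probability map into a soft
    # pseudo-label; T=0.1 is a common choice, assumed here.
    p_t = p ** (1.0 / T)
    return p_t / (p_t + (1.0 - p) ** (1.0 / T))

def dice_loss(prob, target, eps=1e-5):
    # Soft Dice loss on the foreground probabilities of labeled samples.
    inter = (prob * target).sum()
    return 1.0 - (2.0 * inter + eps) / (prob.sum() + target.sum() + eps)

def complementary_consistency_loss(p_main, p_a, p_b, label, labeled_mask,
                                   cons_weight=0.1):
    # p_main, p_a, p_b: foreground probabilities from the main model and the
    # two complementary auxiliary models, shape (B, 1, D, H, W).
    # label: ground truth for the labeled samples; labeled_mask: boolean mask
    # over the batch; cons_weight: ramp-up weight (assumed).
    sup = dice_loss(p_main[labeled_mask], label)  # supervised, labeled data only

    # Sharpened, gradient-detached pseudo-labels.
    pl_main = sharpen(p_main).detach()
    pl_a, pl_b = sharpen(p_a).detach(), sharpen(p_b).detach()

    # Complementary consistency on all data: the auxiliaries supervise the
    # main model and the main model supervises the auxiliaries (MSE is used
    # here as a placeholder consistency measure).
    cons = (F.mse_loss(p_main, pl_a) + F.mse_loss(p_main, pl_b)
            + F.mse_loss(p_a, pl_main) + F.mse_loss(p_b, pl_main))
    return sup + cons_weight * cons

# Toy usage with random probabilities standing in for network outputs.
B, S = 4, 16
p_main = torch.rand(B, 1, S, S, S)
p_a, p_b = torch.rand_like(p_main), torch.rand_like(p_main)
labeled = torch.tensor([True, True, False, False])
label = (torch.rand(2, 1, S, S, S) > 0.5).float()
loss = complementary_consistency_loss(p_main, p_a, p_b, label, labeled)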
2. Related work
2.1. Semi-supervised segmentation
Because labeled data are costly and difficult to obtain for segmentation tasks, semi-supervised segmentation has developed rapidly, aiming to learn as much useful information as possible from unlabeled data. Zhai et al. (2022) defined two asymmetric generators and a discriminator for adversarial learning, in which the discriminator filters the masks produced by the generators to provide a supervisory signal for learning from unlabeled data. Xiao et al. (2022) added a teacher model that combines
CNN and Transformer structures on the basis of the mean teacher
model, aiming to guide the student model to learn more informa-
tion. Zhang et al. (2022) proposed a dual correction method to
improve the quality of pseudo labels and obtain better segmenta-
tion performance. This paper constructs complementary auxiliary models to help the main model explore ambiguous regions in unlabeled data, and the complementary consistency between the main model and the auxiliary models enables effective learning from unlabeled data.
2.2. Consistency regularization
Consistency regularization is a common and effective method
in semi-supervised learning. To address inherent perturbations
among related tasks, Luo et al. (2021) introduced the dual-task
consistency between the segmentation map derived from the level
set and directly predicted segmentation map. Liu et al. (2022)
added a classification model on top of the segmentation model
and constructed a contrastive consistency loss using the class vec-
tors obtained from the classification model. Wang et al. (2022a)
added spatial context perturbations to the input on top of model-
level perturbations, resulting in a dual-consistency segmentation
network. Hu et al. (2022) embedded self-attention in the mean
teacher model and encouraged the attention maps to remain con-
sistent across feature layers, forming attention-guided consistency.
In the cross-modal domain, Chen et al. (2022b) used cross-modal
consistency between non-paired CT and MRI to learn modality-
independent information. Ouali et al. (2020) proposed cross-
consistency to enforce consistency between the primary decoder
and auxiliary decoder in decision-making for low-density regions.
To address perturbations between different levels of image en-
hancement, Zhong et al. (2021) introduced pixel-wise contrastive
consistency based on the label-consistency and contrastive-feature-space properties between pixels. Inspired by mutual consistency, our approach uses complementary consistency, built on model-level consistency, to enable the main model to learn complementary information from the auxiliary models.
2.3. Multi-view training
The purpose of multi-view training is to utilize the redundant in-
formation between different views to improve learning efficiency
(Yan et al., 2021). Chen et al. (2018) construct three divergent models and generate pseudo-labels using a voting mechanism.
With a large amount of unlabeled data and the introduction of
noise, this method yields good results. Xia et al. (2020) perform
multi-view collaborative training by rotating the input and using
the uncertainty estimates from each view to obtain accurate seg-
mentation. This method is simple to implement, but requires a
relatively large number of views to achieve ideal performance,
leading to redundant learning. Zheng et al. (2022) split labeled data into complementary subsets and train two models on these subsets, effectively improving the network’s ability to explore ambiguous areas. Our approach constructs two complementary auxiliary models that guide the main model’s attention to ambiguous areas from two complementary views; in addition, leveraging the multi-view information leads to low-entropy predictions.
2.4. Uncertainty estimation
Uncertainty estimation helps reduce the randomness of predic-
tions, and is crucial for learning reliable information from un-
labeled data (Chen et al., 2022a). In (Yu et al., 2019), Monte
Carlo sampling is used to obtain uncertainty maps from the teacher
model, which guides the student model to gradually acquire reli-
able information. Zheng et al. (2022) use uncertainty maps as loss weights to learn high-confidence information. Wu et al. (2022) construct three different upsampling decoders and use mutual learning to obtain low-uncertainty predictions. Wang et al. (2022b) use a triple uncertainty-guided framework, allowing the student model to obtain more reliable knowledge from the teacher model for all three tasks. Our method reduces uncertainty