1.1 Related Work
Sample reweighting. Sample reweighting is a popular strategy for handling distribution/subpopulation shifts in training data, in which different weights are assigned to samples from different subpopulations. In particular, the distributionally robust optimization (DRO) framework [Ben-Tal et al., 2013, Duchi and Namkoong, 2018, Duchi et al., 2016, Sagawa et al., 2020] considers a collection of training sample groups drawn from different distributions. With the explicit grouping of samples, the goal is to minimize the worst-case loss over the groups. Without prior knowledge of the sample grouping, importance sampling [Alain et al., 2015, Gopal, 2016, Katharopoulos and Fleuret, 2018, Loshchilov and Hutter, 2015, Needell et al., 2014, Zhao and Zhang, 2015], iterative trimming [Kawaguchi and Lu, 2020, Shen and Sanghavi, 2019], and empirical-loss-based reweighting [Wu et al., 2022] are commonly incorporated into the stochastic optimization process to adaptively reweight and separate samples from different subpopulations.
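With explicit grouping, the worst-group objective above can be written, for illustration, as the minimax problem
\[
\min_{\theta} \; \max_{g \in [G]} \; \mathbb{E}_{(x,y) \sim P_g}\big[\ell(\theta; x, y)\big],
\]
where G denotes the number of groups, P_g the data distribution of group g, θ the model parameters, and ℓ the per-sample loss; these symbols are introduced only for this example and are not tied to any single formulation in the works cited above.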
Data augmentation consistency regularization. As a popular way of exploiting data augmentations, consistency regularization encourages models to learn the vicinity among augmentations of the same sample, based on the assumption that data augmentations generally preserve the semantic information in the data and therefore produce augmented samples that lie close to one another on proper manifolds. Beyond being a powerful building block in semi-supervised [Bachman et al., 2014, Berthelot et al., 2019, Laine and Aila, 2016, Sajjadi et al., 2016, Sohn et al., 2020] and self-supervised [Chen et al., 2020, Grill et al., 2020, He et al., 2020, Wu et al., 2018] learning, the incorporation of data augmentation and consistency regularization also provably improves generalization and feature learning even in the supervised learning setting [Shen et al., 2022, Yang et al., 2022].
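In its most common form, this amounts to adding a penalty on the discrepancy between a model's outputs on two random augmentations of the same sample. The following is a minimal sketch of such a penalty; the `model` and `augment` callables, the squared-error discrepancy, and the 0.5 weight are assumptions chosen only for illustration and do not correspond to the formulation of any specific work cited above.

```python
import torch
import torch.nn.functional as F

def consistency_loss(model, x, augment):
    """Penalize disagreement between predictions on two augmented views
    of the same batch (squared-error discrepancy; an illustrative choice)."""
    z1 = model(augment(x))  # predictions on the first augmented view
    z2 = model(augment(x))  # predictions on the second augmented view
    return F.mse_loss(z1, z2)

# Hypothetical usage: add the penalty to a standard supervised loss.
model = torch.nn.Linear(8, 3)
augment = lambda x: x + 0.1 * torch.randn_like(x)  # toy augmentation
x, y = torch.randn(4, 8), torch.randint(0, 3, (4,))
loss = F.cross_entropy(model(x), y) + 0.5 * consistency_loss(model, x, augment)
```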
For medical imaging, data augmentation consistency regularization is generally leveraged as a semi-supervised learning tool [Basak et al., 2022, Bortsova et al., 2019, Li et al., 2020, Wang et al., 2021a, Zhang et al., 2021, Zhao et al., 2019, Zhou et al., 2021]. To incorporate consistency regularization into segmentation tasks with augmentation-sensitive labels, Li et al. [2020] encourage transformation consistency between the predictions obtained by applying augmentations to the image inputs and those obtained by applying the same augmentations to the segmentation outputs. Basak et al. [2022] penalize inconsistent segmentation outputs between teacher-student models, with MixUp [Zhang et al., 2017] applied to the image inputs of the teacher model and the segmentation outputs of the student model. Instead of enforcing consistency in the segmentation output space as above, we leverage the insensitivity of sparse labels to augmentations and encourage consistent encodings (in the latent space of encoder outputs) on label-sparse samples.
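As a rough illustration of the kind of encoder-space consistency term described above, the sketch below penalizes disagreement between encoder outputs on two augmented views, restricted to label-sparse samples. The `encoder` and `augment` callables, the `is_sparse` mask, and the squared-distance penalty are assumptions made for this sketch and need not match the paper's exact formulation.

```python
import torch

def latent_consistency_loss(encoder, x, is_sparse, augment):
    """Encourage similar encoder outputs across two augmented views of a batch,
    counting only label-sparse samples (illustrative sketch).

    is_sparse: boolean tensor of shape (batch,) marking label-sparse samples.
    """
    e1 = encoder(augment(x))  # encodings of the first augmented view
    e2 = encoder(augment(x))  # encodings of the second augmented view
    # Per-sample squared distance in the encoder's latent space.
    per_sample = ((e1 - e2) ** 2).flatten(start_dim=1).mean(dim=1)
    # Only label-sparse samples contribute to the penalty.
    mask = is_sparse.float()
    return (per_sample * mask).sum() / mask.sum().clamp(min=1.0)
```

The mask restricts the penalty to label-sparse samples, consistent with leveraging their insensitivity to augmentations; in practice such a term would presumably be added to the supervised segmentation loss with a tunable weight.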
2 Problem Setup
Notation. For any K ∈ N, we denote [K] = {1, . . . , K}. We represent the elements and subtensors of an arbitrary tensor by adapting the syntax for Python slicing on the subscript (except counting from 1). For example, x[i,j] denotes the (i, j)-entry of the two-dimensional tensor x, and x[i,:] denotes the i-th row. Let I be a function onto {0, 1} such that, for any event e, I{e} = 1 if e is true and 0 otherwise. For any distribution P and n ∈ N, we let P^n denote the joint distribution of n samples drawn i.i.d. from P. Finally,