
An Unsupervised Hunt for Gravitational Lenses
2015]). These will enable the detection of orders-of-
magnitude more strong lenses than with current data
(Oguri and Marshall, 2010). The early challenge of an-
alyzing these surveys is that astronomers will not have
access to lenses to build their classifiers. In this case,
there are two primary options: (1) use lenses found
from other surveys and hope the features are effective
and transferable, or (2) create simulated lenses based
on each survey and train a classifier on those. For (1),
the biggest issue is that transferred performance
may vary significantly, because images from other
surveys are produced with different instruments and
different preprocessing pipelines. The training samples
may therefore be too distributionally dissimilar from
the target (i.e., exhibit covariate shift) to be useful.
One possibility to
ameliorate the effects of covariate shifts is to use Cycle-
GANs (Zhu et al., 2017) to transform these images to
look like the target data distribution. However, it still
doesn’t solve the issue of the extremely small number
of known lenses with heterogeneous imaging. Thus, (2)
is the more realistic option for producing consistent
performance across surveys, since the lenses are
simulated directly on the target set.
Using simulations for training data is quite common
in deep learning (Nikolenko, 2019). The problem with
option (2), though, is that researchers will be creating
simulated lenses without a reference point for how they
look in their survey. This results in classifiers having
good performance when evaluating on held out sim-
ulations, but poor performance when classifying real
lenses. This is especially problematic for multi-channel
images (see Fig. 1) since getting the channel informa-
tion incorrect can lead to an ineffective classifier. In-
stead of trying to get all the channel information of the
arcs correct, one possibility is to simulate lenses on a
single channel and build classifiers to detect lenses in
this setting (Cañameras et al., 2020). This sidesteps
the issue of getting the channel information correct,
but this workaround causes us to lose some contex-
tual information about the “coloring” of the lenses and
the surrounding objects, which may actually help the
model learn to detect lenses. As a result, we do not
explore this option in this paper. We also do not ex-
plore using pretrained networks here. Instead, we will
focus on a completely self-contained regimen for build-
ing classifiers from simulated data. Data augmentation
is one way to address this issue of realism without
sacrificing the multi-channel information in the image.
Additionally, while we can obtain a small sample of
non-lensed images to train our classifier, the majority
of the survey remains unlabeled, so the use of semi-
supervised learning (SSL) algorithms is also a prudent
direction to boost the performance of the classifier.
By understanding the correct ways to leverage these
methods in concert, we show that one can create
highly effective classifiers for detecting lenses even
when training only on potentially “bad” simulated
lenses.
2 SSL And Unsupervised Learning
The simplest approach to building a classifier is to use
the simulated lenses as our targets and train a fully
supervised classifier. The limitations, of course, are
that the unlabeled data is not leveraged and that the
simulated lens distribution may differ from that of the
real lenses.
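As a minimal sketch of this supervised baseline, consider a logistic-regression classifier trained only on the labeled pool (simulated lenses as positives, a small set of real non-lenses as negatives). The toy feature vectors below stand in for image features; everything here is illustrative, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for image features: simulated lenses (label 1) and
# real non-lensed cutouts (label 0). Real pipelines would use CNN features.
X_sim_lens = rng.normal(1.0, 1.0, size=(100, 16))
X_nonlens = rng.normal(-1.0, 1.0, size=(100, 16))
X = np.vstack([X_sim_lens, X_nonlens])
y = np.concatenate([np.ones(100), np.zeros(100)])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Plain logistic regression fit by gradient descent on the labeled set only;
# the unlabeled survey images are never touched (the limitation noted above).
w, b = np.zeros(16), 0.0
for _ in range(200):
    p = sigmoid(X @ w + b)
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

acc = np.mean((sigmoid(X @ w + b) > 0.5) == y)
```

The baseline can fit its labeled training set well; the open question is whether that accuracy transfers when the simulated lens distribution differs from the real one.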
2.1 Semi-supervised Learning
We find that SSL algorithms are another indispens-
able tool for classifying lenses. In recent years, the
field of deep learning has seen significant progress in
the area of semi-supervised learning algorithms (Yang
et al., 2021; van Engelen and Hoos, 2019). Instead of
covering all of them, we will focus on a narrow collec-
tion of state-of-the-art algorithms: Pseudo-label (Lee,
2013), Π-model (Laine and Aila, 2017), Mean Teacher
(Tarvainen and Valpola, 2017), VAT (Miyato et al.,
2019), and MixMatch (Berthelot et al., 2019).
For semi-supervised learning algorithms, there are usu-
ally two primary strategies: consistency regularization
and entropy minimization. Some of the SSL methods
considered here (e.g. those based on consistency regu-
larization) require data augmentation (DA); we sum-
marize the DA methods used in Table 2. These meth-
ods were chosen specifically with this application in
mind.
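Label-preserving augmentations of the kind summarized in Table 2 can be sketched as follows; flips and 90-degree rotations are natural choices for survey cutouts because they preserve lens morphology. The specific transforms here are illustrative assumptions, not a reproduction of Table 2.

```python
import numpy as np

def augment(img, rng):
    """Randomly flip and rotate a (channels, H, W) image cutout.

    These transforms permute pixel positions but change no pixel values,
    so the lens/non-lens label is preserved.
    """
    if rng.random() < 0.5:
        img = img[:, ::-1, :]      # vertical flip
    if rng.random() < 0.5:
        img = img[:, :, ::-1]      # horizontal flip
    k = int(rng.integers(0, 4))    # 0-3 quarter turns
    img = np.rot90(img, k=k, axes=(1, 2))
    return np.ascontiguousarray(img)

rng = np.random.default_rng(0)
img = rng.normal(size=(3, 32, 32))   # toy 3-channel cutout
aug = augment(img, rng)
```

Because the same transform is applied to all channels at once, the "coloring" information discussed earlier is left intact.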
Consistency regularization is based on the idea that
a classifier should output the same predictions even if
the image has been augmented. This is usually carried
out by appending a regularizing term to the loss that
computes the “distance” between the outputs of the
classifier evaluated on two stochastically augmented
versions of the same image. Almost all the algorithms
we listed above utilize this in some form, with the
exception of Pseudo-label. The set of augmentations
is usually predefined, which means the approach is not
domain-agnostic, and performance will largely depend
on well-chosen domain-specific augmentations. The
exception, of course, is VAT (Miyato et al., 2019),
which generates its perturbations during training
rather than drawing from a predefined set.
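The consistency term described above can be sketched as a squared distance between the class probabilities a classifier assigns to two augmented views of the same unlabeled image (in the style of the Π-model); the networks and augmentations themselves are abstracted away here.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def consistency_loss(logits_a, logits_b):
    """Mean squared distance between predicted class distributions for
    two stochastically augmented views of the same unlabeled image."""
    return float(np.mean((softmax(logits_a) - softmax(logits_b)) ** 2))

# Agreeing predictions incur no penalty; divergent ones are penalized.
view_a = np.array([[2.0, 0.0]])   # logits from augmented view 1
view_b = np.array([[2.0, 0.0]])   # view 2, classifier agrees
view_c = np.array([[0.0, 2.0]])   # view 2, classifier flips its answer
```

In training, this term is added to the supervised loss with a weight, pulling the classifier toward functions that are invariant to the chosen augmentations.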
Entropy minimization is based on the idea that the
decision boundary of the classifier should lie in low-
density regions. Worded another way, if two images
x1 and x2 are close in a high-density region, then the
predictions y1 and y2 should be close as well. Pseudo-
label and MixMatch both try to enforce these proper-
ties. Pseudo-label does it by assigning pseudo-labels
to unlabeled images which are determined by the class