semantic segmentation, and iris recognition [1,2,3], owing to its emergence as a high-performing approach whose success is primarily conditioned on the availability of large amounts of training data. An interesting challenge is how to efficiently incorporate such data-hungry deep learning tools into supposedly data-efficient AL frameworks.
Adapting AL algorithms to deep neural networks has proven very challenging: extending the model complexity/capacity to that of CNNs typically resulted in either poor performance or minor improvements at the cost of querying almost all samples. Moreover, the sequential training of such expressive models, as well as the extension of the framework to high-dimensional data, injects even more complexity [4,5,6]. This challenge remained relatively underexplored until the breakthrough work by Gal et al. [7], which cast the problem of incorporating deep learning into AL for high-dimensional data as closely connected to that of uncertainty representation [8]. They thus approached the problem from the perspective of uncertainty representation in deep learning for AL, and developed a Bayesian AL framework for image data. Later work (such as [9]), however, argued that the approach scales poorly to large datasets due to its limited model capacity.
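For concreteness, the following minimal sketch illustrates the kind of Bayesian uncertainty estimation underlying such frameworks. It assumes a Monte Carlo dropout approximation of the posterior and an entropy-based acquisition score; the architecture, dropout rate, number of stochastic passes, and query size are illustrative assumptions rather than the exact configuration of [7].

```python
import torch
import torch.nn as nn

class DropoutClassifier(nn.Module):
    """Small classifier whose dropout stays active at inference time,
    so repeated forward passes act as approximate posterior samples."""
    def __init__(self, in_dim=128, n_classes=10, p=0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(), nn.Dropout(p),
            nn.Linear(256, n_classes),
        )

    def forward(self, x):
        return self.net(x)

def mc_dropout_entropy(model, x, n_passes=20):
    """Predictive entropy of the mean MC-dropout prediction."""
    model.train()  # keep dropout stochastic during inference
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(n_passes)]
        ).mean(dim=0)
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)

# Toy usage: score an unlabeled pool and query the 32 most uncertain points.
pool = torch.randn(1000, 128)
scores = mc_dropout_entropy(DropoutClassifier(), pool)
query_idx = scores.topk(32).indices
```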
Another approach that also relies on uncertainty representation is ensemble-based AL [9]. Here, an ensemble of classifiers is used, where the classifiers independently learn from the data in parallel. The major drawback is poor sample diversity (lack of exploration), even with larger ensembles. Our approach, while enjoying the power of ensembles, solves this problem by offering an inherent exploration/exploitation trade-off, as the classifiers maintain some dependency in the form of a shared prior.
Apart from uncertainty representation, another set of emerging methods that primarily rely on geometric data representation [10] has shown improved performance in deep AL. However, similar to [11], we empirically observed that these geometric approaches typically suffer performance degradation as the class diversity (number of classes) increases (a coverage-based selection criterion typical of this family is sketched after this paragraph). Another recent approach is the work reported in [11], which takes advantage of adversarial training to improve on previous methods. We empirically find that it provides balanced performance on datasets of different scales and diversity. As we will show later, our proposed model outperforms this approach in multiple settings by significant margins, with results approaching those of supervised learning models in some cases.
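In contrast to uncertainty scores, the geometric methods above select points for coverage of the unlabeled pool in a learned feature space. A common instantiation of this idea is k-center greedy selection, sketched below under the assumption that fixed feature embeddings are available; the criterion and embeddings are illustrative, not necessarily those of [10].

```python
import numpy as np

def k_center_greedy(features, labeled_idx, k):
    """Greedily pick k pool points for coverage: each new query is the
    point farthest (in feature space) from the current labeled set."""
    dist = np.full(len(features), np.inf)
    for i in labeled_idx:  # distance of every point to the labeled set
        dist = np.minimum(dist, np.linalg.norm(features - features[i], axis=1))
    queries = []
    for _ in range(k):
        j = int(dist.argmax())  # farthest remaining point
        queries.append(j)
        dist = np.minimum(dist, np.linalg.norm(features - features[j], axis=1))
    return queries

# Toy usage on random embeddings.
feats = np.random.default_rng(0).normal(size=(1000, 64))
print(k_center_greedy(feats, labeled_idx=[0, 1, 2], k=8))
```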
In the first part of the paper, motivated primarily by the goal of efficiently integrating the advantages of uncertainty and geometric representations, we propose an approach built upon approximate Thompson sampling. On one hand, this provides an improved representation of uncertainty over the unlabeled data; on the other hand, it supports an inherent, tunable exploration/exploitation trade-off for diverse sampling [12,13]. Unlike conventional ensemble-based methods, whose performance tends to saturate quickly, under our tunable model adding a few more classifiers tends to improve both the uncertainty and geometric representations.
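As a generic illustration of this exploration/exploitation behavior (a sketch of the general Thompson sampling idea, not our exact algorithm), each query below draws one ensemble member, treated as a sample from the shared approximate posterior, and selects the pool point that draw is least confident about; the confidence measure and per-query resampling are assumptions made for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def thompson_acquisition(ensemble_probs, k):
    """ensemble_probs: (n_models, n_pool, n_classes) softmax outputs,
    with ensemble members viewed as approximate posterior samples.
    Each query re-draws a member (exploration) and takes the point
    that member is least confident about (exploitation)."""
    n_models, n_pool, _ = ensemble_probs.shape
    available = np.ones(n_pool, dtype=bool)
    queries = []
    for _ in range(k):
        m = rng.integers(n_models)            # posterior draw
        conf = ensemble_probs[m].max(axis=1)  # that draw's top-class confidence
        conf[~available] = np.inf             # mask already-queried points
        j = int(conf.argmin())
        queries.append(j)
        available[j] = False
    return queries

# Toy usage: 5 members, 1000 pool points, 10 classes.
logits = rng.normal(size=(5, 1000, 10))
probs = np.exp(logits)
probs /= probs.sum(-1, keepdims=True)
print(thompson_acquisition(probs, k=8))
```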
To mitigate the general sample diversity problem of ensemble models (see [14,9]), we use an inclusive sample selection strategy. Our framework showed a noticeable improvement over the state-of-the-art, with performance approaching