
Figure 2: Positive samples in contrastive learning. In EA-MoCo, our env-aware baseline over MoCo v3, we use two positive samples: the standard augmented anchor (left) and the closest sample from another, randomly chosen environment (right). Distances are computed over representations obtained with a diffusion-based autoencoder, trained under ERM on samples from all environments.
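To make the positive-pair selection from Figure 2 concrete, below is a minimal sketch of the procedure, assuming a pretrained ERM-trained autoencoder encoder (`encode_ae`), a standard MoCo v3 augmentation pipeline (`augment`), and a per-environment data dictionary (`data_by_env`); these names are hypothetical and not part of the paper's released code.

```python
# Sketch of EA-MoCo positive selection (hypothetical helper names).
import random
import torch

def ea_moco_positives(anchor, anchor_env, data_by_env, encode_ae, augment):
    """Return the two positives for `anchor`:
    (1) a standard augmented view of the anchor itself, and
    (2) its nearest neighbour, in autoencoder feature space, drawn from
        another, randomly chosen environment."""
    # Positive 1: the usual augmented view of the anchor.
    pos_aug = augment(anchor)

    # Positive 2: nearest sample from a different, randomly chosen environment.
    other_envs = [e for e in data_by_env if e != anchor_env]
    env = random.choice(other_envs)
    candidates = data_by_env[env]                      # tensor [N, C, H, W]

    with torch.no_grad():
        z_anchor = encode_ae(anchor.unsqueeze(0))      # [1, D]
        z_cand = encode_ae(candidates)                 # [N, D]
        dists = torch.cdist(z_anchor, z_cand).squeeze(0)  # [N]
    pos_cross_env = candidates[dists.argmin()]

    return pos_aug, pos_cross_env
```

Because the cross-environment neighbour shares Content with the anchor but differs in Style, pulling it closer encourages representations that are invariant to Style.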
features spuriously correlated with the target. Inferring these latent factors is an extremely difficult problem, seen as a goal of representation learning [32]. It is impossible in the unsupervised setting without additional inductive biases or other information [17, 29], and it is outside our scope. Instead, we start from a weaker assumption: that we have data in which only the Style changes. We aim to use this factorization in AD to highlight directions toward building methods that are robust to irrelevant changes (involving Style) while remaining capable of detecting relevant changes (involving Content).
Environments
We call domains or environments [18, 2] sub-groups of the data, each with a different distribution, but all respecting some common basic rules: the Content is shared, while the Style, or relations involving Style, change. Examples of domains include pictures taken indoors vs. outdoors [51] or in different locations [5], real photos vs. sketches [25], or images of animals with changing associations in each environment [26]. Our goal is to be robust to the Style differences between environments while identifying Content changes as anomalies.
2.1 Out-of-distribution regimes
When dealing with real-world data, the test distributions usually differ from the training ones, exhibiting changes in Style and/or Content. We next provide an in-depth characterization of possible AD scenarios in these regimes, linking them to common methods that work in each category for supervised tasks. For explicit examples and details, see Appendix A.2.
A. ID setting
This is the default paradigm in Machine Learning, in both supervised and unsupervised learning. However, the usual assumption that train and test data come from the same distribution is very strong and almost never holds for real-world datasets [9, 45, 12, 27, 18].
B. Style OOD
Most works that develop methods robust to some (i.e. Style) distribution changes reside in this category [43, 2, 19, 49]. Environments differ in Style but share the same Content, and the goal is to learn representations that are invariant across environments.
C. Content OOD
The assumption here is that environments contain distribution changes that are always relevant to the task (i.e. changes in Content) and should be detected. Methods in this category must detect such changes while optionally performing another basic task. Anomaly, novelty, or OOD detection methods work in this regime [48].
D. Style and Content OOD
Here, environments bring changes in both Content and Style. We argue that this is the most realistic setting, and it remains largely unaddressed in the anomaly detection literature. An ideal anomaly detection method would detect only Content anomalies while being robust to Style changes. Our main analyses and experiments are performed in this setting, showing the blind spots of current approaches and possible ways forward.
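To illustrate the desired behaviour in regime D, the sketch below evaluates a generic anomaly scorer by how well it separates Content anomalies from Style-shifted but otherwise normal samples; the helpers `score_fn`, `style_shifted_normals`, and `content_anomalies` are hypothetical placeholders, not the paper's evaluation code.

```python
# Illustrative evaluation sketch for regime D (assumed helper names):
# a robust detector should assign low scores to Style-shifted normals
# and high scores to Content anomalies.
import numpy as np
from sklearn.metrics import roc_auc_score

def evaluate_regime_d(score_fn, style_shifted_normals, content_anomalies):
    """`score_fn` maps a batch of samples to anomaly scores (higher means
    more anomalous). Returns the AUROC for separating Content anomalies
    from Style-shifted normal samples."""
    s_normal = score_fn(style_shifted_normals)   # should stay low
    s_anom = score_fn(content_anomalies)         # should be high
    labels = np.concatenate([np.zeros_like(s_normal), np.ones_like(s_anom)])
    scores = np.concatenate([s_normal, s_anom])
    return roc_auc_score(labels, scores)
```

A detector that reacts to Style shifts will score many normal samples highly in this setup, which is exactly the blind spot this setting exposes.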
We formalize and detail the distribution-shift scenarios in Appendix A.2. To the best of our knowledge, we are the first to cover this topic for anomaly detection in particular and for unsupervised learning in general.