
On Background Bias in Deep Metric Learning
Konstantin Kobs and Andreas Hotho
University of Würzburg, Am Hubland, 97074 Würzburg, Germany
ABSTRACT
Deep Metric Learning trains a neural network to map input images to a lower-dimensional embedding space such
that similar images are closer together than dissimilar images. When used for item retrieval, a query image is
embedded using the trained model and the closest items from a database storing their respective embeddings
are returned as the most similar items for the query. Especially in product retrieval, where a user searches for
a certain product by taking a photo of it, the image background is usually not important and thus should not
influence the embedding process. Ideally, the retrieval process always returns fitting items for the photographed
object, regardless of the environment the photo was taken in. In this paper, we analyze the influence of the
image background on Deep Metric Learning models by utilizing five common loss functions and three common
datasets. We find that Deep Metric Learning networks are prone to so-called background bias, which can lead to
a severe decrease in retrieval performance when changing the image background during inference. We also show
that replacing the background of images during training with random background images alleviates this issue.
Since we use an automatic background removal method to do this background replacement, no additional manual
labeling work and model changes are required while inference time stays the same. Qualitative and quantitative
analyses, for which we introduce a new evaluation metric, confirm that models trained with replaced backgrounds
attend more to the main object in the image, benefiting item retrieval systems.
Keywords: Deep Metric Learning, Background Bias, Item Retrieval
1. INTRODUCTION
Deep Metric Learning (DML) is the task of training a neural network to embed input items (in this case, images)
such that embeddings of similar items are closer together than embeddings of dissimilar items.3 This technique
is often used for face recognition, person reidentification, and item retrieval.4 For instance, in item retrieval, a
query image of an item is used to find semantically similar images by identifying the closest images in embedding
space. Two images are deemed similar if they show the same item. Given this definition, the background of
the images should not play a role in the embedding process, since objects can be photographed in different
environments and thus appear in front of different backgrounds. Similar desired properties can be defined for
other DML applications such as person reidentification.
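The retrieval step described above can be sketched as a nearest-neighbor search over precomputed embeddings. The following is a minimal illustration (not code from the paper): embeddings are L2-normalized so that cosine similarity reduces to a dot product, and the database indices with the highest similarity to the query are returned.

```python
import numpy as np

def retrieve(query_emb, db_embs, k=3):
    """Return indices of the k database embeddings most similar to the query
    (cosine similarity; after L2-normalization this is a dot product)."""
    q = query_emb / np.linalg.norm(query_emb)
    db = db_embs / np.linalg.norm(db_embs, axis=1, keepdims=True)
    sims = db @ q                     # cosine similarity to every database item
    return np.argsort(-sims)[:k]      # indices sorted by decreasing similarity

# Toy example: four database items in a 2-d embedding space
db = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]])
query = np.array([1.0, 0.05])
top = retrieve(query, db, k=2)        # the two items closest to the query
```

In practice the database embeddings are computed once with the trained model and stored, so a query only requires one forward pass plus this similarity search.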
Previous analytical work on the different task of image classification shows that neural networks suffer
from so-called background bias, i.e., they use information from the image background to identify the image
category. For example, image classifiers trained to identify ships often focus on the water and not on the ship
itself. As a result, such a classifier is not able to identify ships on land.5
Since DML does not classify images but embeds them, the findings on background bias from the literature are
not directly transferable to these models. If background bias were also present in DML models, image backgrounds
would influence the embedding process. Then, taking a picture of an object on the street or in a studio setup
could lead to different search results in an item retrieval system, degrading its retrieval
performance. Figure 1 shows such a situation: Placing the bike in front of a brick
wall or a studio backdrop gives completely different nearest neighbor search results. This is not desirable, since
the retrieval system should only take the main object into account.
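The background-replacement idea summarized in the abstract amounts to compositing the main object onto a randomly chosen background image during training. A minimal sketch follows; the segmentation mask is assumed to come from any automatic background-removal method (the mask source and function name are illustrative, not from the paper).

```python
import numpy as np

def replace_background(image, mask, background):
    """Composite the foreground object onto a new background.
    `mask` is an (H, W) array in [0, 1], e.g. from an automatic
    background-removal model: 1 = object pixel, 0 = background pixel."""
    m = mask[..., None]  # broadcast the mask over the color channels
    return (m * image + (1.0 - m) * background).astype(image.dtype)

# Toy 2x2 RGB example: the masked pixel keeps the object color,
# all other pixels take on the replacement background
img = np.full((2, 2, 3), 200, dtype=np.uint8)   # "object" image
bg = np.zeros((2, 2, 3), dtype=np.uint8)        # random background stand-in
mask = np.array([[1.0, 0.0], [0.0, 0.0]])
out = replace_background(img, mask, bg)
```

Since the replacement happens only in the training data pipeline, the model architecture and inference procedure remain unchanged.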
Further author information: (Send correspondence to K.K.)
K.K.: E-mail: kobs@informatik.uni-wuerzburg.de
A.H.: E-mail: hotho@informatik.uni-wuerzburg.de
arXiv:2210.01615v1 [cs.CV] 4 Oct 2022