
2nd Place Solution to Google Universal Image Embedding
Xiaolong Huang1   Qiankun Li2,3*
1School of Artificial Intelligence, Chongqing University of Technology
2Institute of Intelligent Machines, Chinese Academy of Sciences
3Department of Automation, University of Science and Technology of China
*Corresponding Author: Qiankun Li (qklee@mail.ustc.edu.cn)
Abstract
This paper presents the 2nd place solution to the Google
Universal Image Embedding Competition 2022. We adopt an instance-level fine-grained image classification approach and focus on data processing, model structure, and training strategies. To balance the class weights, we employ sampling and resampling strategies. For the model, we choose CLIP-ViT-H-14 pretrained on LAION-2B and remove its projection layer to reduce the risk of overfitting. In addition, dynamic margin and stratified learning rate training strategies further improve the model's performance.
Finally, the method scored 0.713 on the public leaderboard
and 0.709 on the private leaderboard. Code is available at
https://github.com/XL-H/GUIE-2nd-Place-Solution.
1. Introduction
The Google Universal Image Embedding Competition
[3] is part of the ECCV 2022 Instance-Level Recognition
Workshop. Unlike previous competitions [1,2], this year's edition does not focus on the landmark domain; instead, it requires submitted models to be applicable to many different object types. Specifically, participants were asked to submit a model that extracts a feature embedding of at most 64 dimensions for each test image to represent its content. The submitted model should be able to retrieve the database images associated with a given query image.
There are 200,000 index images and 5,000 query images in
the test set, covering a variety of object types such as cloth-
ing, artwork, landmarks, furniture, packaging items, etc.
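To make the retrieval setting concrete, the following is a minimal sketch of how such low-dimensional embeddings can be used: query and index embeddings are L2-normalized and index images are ranked by cosine similarity to each query. The function name and the brute-force search are our own simplification, not the official evaluation code.
```python
# Illustrative retrieval with <= 64-d embeddings (our simplification,
# not the official evaluation code).
import numpy as np

def retrieve(query_emb: np.ndarray, index_emb: np.ndarray, topk: int = 5) -> np.ndarray:
    """Rank index images by cosine similarity to each query embedding.

    query_emb: (num_queries, d), index_emb: (num_index, d), with d <= 64.
    Returns the top-k index ids per query.
    """
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    x = index_emb / np.linalg.norm(index_emb, axis=1, keepdims=True)
    sims = q @ x.T                              # cosine similarity matrix
    return np.argsort(-sims, axis=1)[:, :topk]  # highest similarity first
```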
In this paper, we present our detailed 2nd place solution to the Google Universal Image Embedding Competition 2022 [3]. Since the competition did not provide train-
ing data, building a generic dataset was important. Accord-
ing to the official categories, we sampled from various open
benchmarks and preprocessed the data to build a univer-
sal dataset containing 7,400,000 images. For image embedding, standard solutions include contrastive learning, generic classification, and fine-grained classification. Since the competition test set is annotated at the instance level, we adopt a fine-grained classification approach. To obtain a strong baseline, we use the Transformer-based [7–9] OpenCLIP ViT-H-14 [11] model for feature extraction. To enhance training stability, we initialize it with weights pretrained on LAION-2B [10], a dataset of two billion image-text pairs. In addition, dynamic margin [6] and stratified
learning rate training strategies also helped us win 2nd place
in the competition.
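As a rough sketch of the backbone setup described above, assuming the open_clip package (the pretrained tag and attribute names are our assumptions and may differ across library versions), one can load the LAION-2B pretrained ViT-H-14 and drop the CLIP projection so that training operates on the pre-projection visual features:
```python
# Minimal sketch of the backbone setup, assuming the open_clip package;
# the pretrained tag and attribute names may differ across versions.
import torch
import open_clip

# ViT-H-14 initialized from weights pretrained on LAION-2B.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-H-14", pretrained="laion2b_s32b_b79k"
)
visual = model.visual

# Remove the CLIP projection layer: the vision tower then returns the
# pre-projection transformer features (1280-d for ViT-H-14) instead of
# the 1024-d CLIP embedding, which reduces the risk of overfitting.
visual.proj = None

with torch.no_grad():
    dummy = torch.randn(1, 3, 224, 224)
    feats = visual(dummy)  # (1, 1280) once the projection is removed
```
On top of these pre-projection features, a new projection to the 64-dimensional competition embedding and a margin-based classification head can then be added for fine-grained training.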
2. Method
2.1. Datasets
Datasets selection. Since we use a fine-grained classification method, the corresponding datasets should contain instance-level data. After collecting information from major open websites, we selected the following datasets [4]: Aliproducts, Art MET, DeepFashion (consumer-to-shop), DeepFashion2 (hard triplets), Fashion200K, ICCV
2021 LargeFineFoodAI, Food Recognition 2022, JD Prod-
ucts 10K, Landmark2021, Grocery Store, rp2k, Shopee,
Stanford Cars, Stanford Products. The tasks of these
datasets are related to image retrieval or fine-grained recog-
nition.
Datasets pre-processing. The model's ability to identify a class of instances is not directly related to the number of images in that class. Therefore, capping each class at a reasonable size can improve model performance and save training time. We sampled at most 100 images from categories with more than 100 images and filtered out categories with fewer than 3 images. To ensure the model can extract enough information from each class in the training data, we resampled lightly to balance the class weights, setting the minimum number of images per class to 20. Finally, we obtained a universal dataset containing 7,400,000 images.
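A minimal sketch of this sampling and light resampling step is given below, assuming the training list comes as (image path, class id) pairs; balance_classes is a hypothetical helper, and the thresholds follow the text (cap of 100, drop classes with fewer than 3 images, oversample to at least 20).
```python
# Sketch of the class-balancing step described above; balance_classes is a
# hypothetical helper, and the thresholds (cap of 100, drop classes with
# fewer than 3 images, oversample to at least 20) are taken from the text.
import random
from collections import defaultdict

def balance_classes(samples, cap=100, min_keep=3, min_per_class=20, seed=0):
    """samples: list of (image_path, class_id) pairs; returns a balanced list."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for path, cls in samples:
        by_class[cls].append(path)

    balanced = []
    for cls, paths in by_class.items():
        if len(paths) < min_keep:        # drop categories with too few images
            continue
        if len(paths) > cap:             # keep at most `cap` images per class
            paths = rng.sample(paths, cap)
        if len(paths) < min_per_class:   # light resampling with repetition
            paths = paths + rng.choices(paths, k=min_per_class - len(paths))
        balanced.extend((p, cls) for p in paths)
    return balanced
```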