
2nd Place Solution to Google Universal Image Embedding
Xiaolong Huang1   Qiankun Li2,3*
1School of Artificial Intelligence, Chongqing University of Technology
2Institute of Intelligent Machines, Chinese Academy of Sciences
3Department of Automation, University of Science and Technology of China
*Corresponding Author: Qiankun Li (qklee@mail.ustc.edu.cn)
Abstract
This paper presents the 2nd place solution to the Google
Universal Image Embedding Competition 2022. We adopt an instance-level fine-grained image classification approach and focus on data processing, model structure, and training strategies. To balance the class weights, we employ sampling and resampling strategies. For the model, we choose CLIP-ViT-H-14 pretrained on LAION-2B and remove its projection layer to reduce the risk of overfitting. In addition, dynamic margin and stratified learning rate training strategies further improve the model's performance.
Finally, the method scored 0.713 on the public leaderboard
and 0.709 on the private leaderboard. Code is available at
https://github.com/XL-H/GUIE-2nd-Place-Solution.
1. Introduction
The Google Universal Image Embedding Competition
[3] is part of the ECCV 2022 Instance-Level Recognition
Workshop. Unlike previous competitions [1,2], this year's edition does not focus on the landmark domain; instead, it requires submitted models to be applicable to many different object types. Specifically, participants were asked to submit a model that extracts a feature embedding of at most 64 dimensions for each test image to represent its content. The submitted model should be able to retrieve the database images associated with a given query image.
There are 200,000 index images and 5,000 query images in
the test set, covering a variety of object types such as cloth-
ing, artwork, landmarks, furniture, packaging items, etc.
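To make the retrieval setting concrete, the following is a minimal sketch of how such low-dimensional embeddings can be used: query and index embeddings are L2-normalized and index images are ranked by cosine similarity to each query. The function name and the brute-force search are our own simplification, not the official evaluation code.
```python
# Illustrative retrieval with <= 64-d embeddings (our simplification,
# not the official evaluation code).
import numpy as np

def retrieve(query_emb: np.ndarray, index_emb: np.ndarray, topk: int = 5) -> np.ndarray:
    """Rank index images by cosine similarity to each query embedding.

    query_emb: (num_queries, d), index_emb: (num_index, d), with d <= 64.
    Returns the top-k index ids per query.
    """
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    x = index_emb / np.linalg.norm(index_emb, axis=1, keepdims=True)
    sims = q @ x.T                              # cosine similarity matrix
    return np.argsort(-sims, axis=1)[:, :topk]  # highest similarity first
```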
In this paper, we present our detailed 2nd place solution to the Google Universal Image Embedding Competition 2022 [3]. Since the competition did not provide train-
ing data, building a generic dataset was important. Accord-
ing to the official categories, we sampled from various open
benchmarks and preprocessed the data to build a univer-
sal dataset containing 7,400,000 images. For image embedding, standard solutions include contrastive learning, generic classification, and fine-grained classification. Since the competition test set is annotated at the instance level, we adopt a fine-grained classification approach. To obtain a strong baseline, we use the Transformer-based [7–9] OpenCLIP ViT-H-14 [11] model for feature extraction. To enhance training stability, we initialize it with weights pretrained on LAION-2B [10], a dataset of two billion image-text pairs. In addition, dynamic margin [6] and stratified
learning rate training strategies also helped us win 2nd place
in the competition.
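As a rough sketch of the backbone setup described above, assuming the open_clip package (the pretrained tag and attribute names are our assumptions and may differ across library versions), one can load the LAION-2B pretrained ViT-H-14 and drop the CLIP projection so that training operates on the pre-projection visual features:
```python
# Minimal sketch of the backbone setup, assuming the open_clip package;
# the pretrained tag and attribute names may differ across versions.
import torch
import open_clip

# ViT-H-14 initialized from weights pretrained on LAION-2B.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-H-14", pretrained="laion2b_s32b_b79k"
)
visual = model.visual

# Remove the CLIP projection layer: the vision tower then returns the
# pre-projection transformer features (1280-d for ViT-H-14) instead of
# the 1024-d CLIP embedding, which reduces the risk of overfitting.
visual.proj = None

with torch.no_grad():
    dummy = torch.randn(1, 3, 224, 224)
    feats = visual(dummy)  # (1, 1280) once the projection is removed
```
On top of these pre-projection features, a new projection to the 64-dimensional competition embedding and a margin-based classification head can then be added for fine-grained training.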
2. Method
2.1. Datasets
Datasets selection. Since we use a fine-grained classification method, the corresponding datasets should contain instance-level data. After collecting information from major open websites, we selected the following datasets [4]: Aliproducts, Art MET, DeepFashion (consumer-to-shop), DeepFashion2 (hard triplets), Fashion200K, ICCV
2021 LargeFineFoodAI, Food Recognition 2022, JD Prod-
ucts 10K, Landmark2021, Grocery Store, rp2k, Shopee,
Stanford Cars, Stanford Products. The tasks of these
datasets are related to image retrieval or fine-grained recog-
nition.
Datasets pre-processing. The model's ability to identify a class of instances is not directly related to the number of images in that class. Therefore, capping each class at a reasonable size can improve model performance and save training time. We sampled at most 100 images from categories with more than 100 images and filtered out categories with fewer than 3 images. To ensure the model can extract enough information from each class in the training data, we resampled lightly to balance the class weights, setting the minimum number of images per class to 20. Finally, we obtained a universal dataset containing 7,400,000 images.
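A minimal sketch of this sampling and light resampling step is given below, assuming the training list comes as (image path, class id) pairs; balance_classes is a hypothetical helper, and the thresholds follow the text (cap of 100, drop classes with fewer than 3 images, oversample to at least 20).
```python
# Sketch of the class-balancing step described above; balance_classes is a
# hypothetical helper, and the thresholds (cap of 100, drop classes with
# fewer than 3 images, oversample to at least 20) are taken from the text.
import random
from collections import defaultdict

def balance_classes(samples, cap=100, min_keep=3, min_per_class=20, seed=0):
    """samples: list of (image_path, class_id) pairs; returns a balanced list."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for path, cls in samples:
        by_class[cls].append(path)

    balanced = []
    for cls, paths in by_class.items():
        if len(paths) < min_keep:        # drop categories with too few images
            continue
        if len(paths) > cap:             # keep at most `cap` images per class
            paths = rng.sample(paths, cap)
        if len(paths) < min_per_class:   # light resampling with repetition
            paths = paths + rng.choices(paths, k=min_per_class - len(paths))
        balanced.extend((p, cls) for p in paths)
    return balanced
```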