General Image Descriptors for Open World Image Retrieval using ViT CLIP
Marcos V. Conde1, Ivan Aerlic2, Simon Jégou3
1H2O.ai and Computer Vision Lab, CAIDAS, University of Würzburg, Germany
2Independent researcher and Team Leader, Australia
3Independent researcher, France
marcos.conde-osorio@uni-wuerzburg.de
https://github.com/IvanAer/G-Universal-CLIP
Abstract
The Google Universal Image Embedding (GUIE) Challenge is one of the first competitions in multi-domain image representations in the wild, covering a wide distribution of objects: landmarks, artwork, food, etc. This is a fundamental computer vision problem with notable applications in image retrieval, search engines and e-commerce.
In this work, we explain our 4th place solution to the GUIE Challenge, and our "bag of tricks" to fine-tune zero-shot Vision Transformers (ViT) pre-trained using CLIP.
1. Introduction
Image representations are a critical building block of computer vision applications [11]. Traditionally, research on image embedding learning has been conducted with a focus on per-domain models [18,20,23]. Generally, solutions are based on generic embedding learning techniques which are applied to different domains separately, rather than developing generic embedding models which could be applied to all domains combined.
At the Google Universal Image Embedding (GUIE) Challenge, the proposed models are expected to retrieve the index database images relevant to a given query image (i.e. images containing the same object as the query) across a great variety of domains. Our proposed solution has real-world visual search applications, such as organizing photos, improving search engines, and visual e-commerce.
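Problem definition. We seek a function $\phi$ such that:
$$\phi : \mathbb{R}^{H \times W \times 3} \mapsto \mathbb{R}^{64}, \qquad \phi(x) = q \in \mathbb{R}^{64} \tag{1}$$
Given an input 3-channel RGB image $x$ of dimension $H \times W$, our model $\phi$ extracts a compact 64-dimensional (64D) image descriptor or embedding $\phi(x)$.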
The image retrieval task [1,4,20] then considers an index-reference database of images $Z = \{z_1, z_2, \ldots, z_n\}$. Given a query image $x$, we calculate
$$\underset{z_i \in Z}{\operatorname{argmin}} \; \|\phi(x) - \phi(z_i)\|_2^2 \tag{2}$$
and finally retrieve the top-$k$ most similar images (i.e. those that minimize the previous equation).
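The retrieval step in Equation 2 can be illustrated with a minimal pure-Python sketch. It assumes the embeddings have already been computed; the helper names `squared_l2` and `retrieve_top_k` and the toy 3-dimensional vectors are hypothetical and stand in for the 64-dimensional descriptors. It is not the authors' implementation.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def retrieve_top_k(query_emb, index_embs, k=5):
    """Return the positions of the k index embeddings closest to query_emb.

    Ranks every index embedding by the squared L2 distance of Equation 2
    and keeps the k smallest; ties are broken by index position.
    """
    distances = [(squared_l2(query_emb, z), i) for i, z in enumerate(index_embs)]
    return [i for _, i in heapq.nsmallest(k, distances)]


# Toy 3-D embeddings (assumption: real descriptors would be 64-D).
index_embs = [
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [3.0, 3.0, 3.0],
]
query_emb = [0.9, 0.1, 0.0]

top2 = retrieve_top_k(query_emb, index_embs, k=2)
print(top2)
```

A brute-force scan like this is linear in the size of the index; for a large database an approximate nearest-neighbour index would typically replace it, which is outside the scope of this sketch.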
Evaluation. Methods are evaluated according to the mean Precision at $k = 5$ (abbreviated as mP@5):
$$\text{mP@5} = \frac{1}{Q} \sum_{q=1}^{Q} \frac{1}{\min(n_q, 5)} \sum_{j=1}^{\min(n_q, 5)} \mathrm{rel}_q(j) \tag{3}$$
where $Q$ is the number of query images and $n_q$ is the number of index images containing an object in common with the query image $q$. Note that $n_q > 0$ for any query image $q$. The term $\mathrm{rel}_q(j)$ denotes the relevance of prediction $j$ for the $q$-th query: $\mathrm{rel}_q(j) = 1$ if the $j$-th prediction is correct, and 0 otherwise.
Participants must submit a model file (e.g. .pt). The model must take an image as input and return a float vector (i.e. the image embedding) as output. The challenge platform, Kaggle, uses the submitted model to:
1. Extract embeddings for the private test dataset (query and index images).
2. Create a kNN ($k = 5$) lookup for each test sample, using the Euclidean distance between test and index embeddings. See Equation 2.
3. Score the quality of the lookups using Equation 3.
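The scoring step can be sketched as follows. This is a minimal illustration of Equation 3, not Kaggle's actual scoring code: the function name `mean_precision_at_k`, the input layout (one ranked prediction list and one set of relevant index ids per query), and the toy data are all assumptions made for the example.

```python
def mean_precision_at_k(predictions, relevant, k=5):
    """Mean Precision at k, following Equation 3.

    predictions: one ranked list of retrieved index ids per query.
    relevant:    one set of correct index ids per query (n_q = its size).
    """
    if len(predictions) != len(relevant):
        raise ValueError("predictions and relevant must have one entry per query")
    total = 0.0
    for preds, rel in zip(predictions, relevant):
        n_q = len(rel)
        if n_q == 0:
            # The challenge guarantees n_q > 0 for every query.
            raise ValueError("each query needs at least one relevant index image")
        cutoff = min(n_q, k)
        # rel_q(j) is 1 when the j-th prediction is correct, 0 otherwise.
        hits = sum(1 for j in preds[:cutoff] if j in rel)
        total += hits / cutoff
    return total / len(predictions)


# Toy example with two queries (hypothetical ids).
predictions = [
    [1, 2, 4, 0, 3],  # query 0: n_q = 2, so only the first 2 predictions count
    [0, 2, 9, 3, 8],  # query 1: n_q = 6, so all 5 predictions count
]
relevant = [
    {1, 4},
    {0, 2, 3, 5, 6, 7},
]

score = mean_precision_at_k(predictions, relevant)
print(score)
```

Here query 0 contributes 1/2 and query 1 contributes 3/5, so the mean is 0.55. Note that the normalisation by min(n_q, 5) means a query with fewer than five relevant index images can still reach a perfect score.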
In Figure 1 we provide an illustrative example of a real-world image retrieval system, similar to the one employed in this challenge for evaluating the quality of the produced image descriptors.
arXiv:2210.11141v1 [cs.CV] 20 Oct 2022