
SimANS: Simple Ambiguous Negatives Sampling for Dense Text Retrieval
Kun Zhou1,3†, Yeyun Gong4, Xiao Liu4, Wayne Xin Zhao2,3∗, Yelong Shen5, Anlei Dong5,
Jingwen Lu5, Rangan Majumder5, Ji-Rong Wen2,3, Nan Duan4, Weizhu Chen5
1School of Information, Renmin University of China,
2Gaoling School of Artificial Intelligence, Renmin University of China,
3Beijing Key Laboratory of Big Data Management and Analysis Methods,
4Microsoft Research, 5Microsoft
†This work was done during an internship at MSRA.
∗Corresponding author, email: batmanfly@gmail.com.
Abstract
Sampling proper negatives from a large document pool is vital to effectively train a dense retrieval model. However, existing negative sampling strategies suffer from the uninformative or false negative problem. In this work, we empirically show that, according to the measured relevance scores, the negatives ranked around the positives are generally more informative and less likely to be false negatives. Intuitively, these negatives are neither too hard (potential false negatives) nor too easy (uninformative). They are the ambiguous negatives and need more attention during training. Thus, we propose a simple ambiguous negatives sampling method, SimANS, which incorporates a new sampling probability distribution to sample more ambiguous negatives. Extensive experiments on four public datasets and one industry dataset show the effectiveness of our approach. We have made the code and models publicly available at https://github.com/microsoft/SimXNS.
1 Introduction
Dense text retrieval, which uses low-dimensional vectors to represent queries and documents and measure their relevance, has become a popular topic (Karpukhin et al., 2020; Luan et al., 2021) for both researchers and practitioners. It can improve various downstream applications, e.g., web search (Brickley et al., 2019; Qiu et al., 2022) and question answering (Izacard and Grave, 2021). A key challenge in training a dense text retrieval model is how to select appropriate negatives from a large document pool (i.e., negative sampling), as most existing methods use a contrastive loss (Karpukhin et al., 2020; Xiong et al., 2021) to encourage the model to rank positive documents higher than negatives. However, the commonly used negative sampling strategies, namely random negative sampling (Luan et al., 2021; Karpukhin et al., 2020) (using random documents in the same batch) and top-k hard negatives sampling (Xiong et al., 2021; Zhan et al., 2021) (using an auxiliary retriever to obtain the top-k documents), have their limitations. Random negative sampling tends to select uninformative negatives that are easily distinguished from positives and fail to provide useful information (Xiong et al., 2021), while top-k hard negatives sampling may include false negatives (Qu et al., 2021), degrading the model performance.
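For concreteness, the contrastive loss in question is typically the softmax-based negative log-likelihood used by DPR-style retrievers. A sketch in our own notation (the paper's exact formulation may differ), with s(q, d) the relevance score between query q and document d, positive d+, and sampled negatives d_i^-:

\mathcal{L}\big(q, d^{+}, \{d_{i}^{-}\}_{i=1}^{n}\big) = -\log \frac{\exp\big(s(q, d^{+})\big)}{\exp\big(s(q, d^{+})\big) + \sum_{i=1}^{n} \exp\big(s(q, d_{i}^{-})\big)}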
Motivated by these problems, we propose to sample the ambiguous negatives¹ that are neither too easy (uninformative) nor too hard (potential false negatives). Our approach is inspired by an empirical observation from experiments (in §3) that use gradients to assess the impact of data instances on deep models (Koh and Liang, 2017; Pruthi et al., 2020): according to the relevance scores measured by the dense retrieval model, negatives that rank lower are mostly uninformative, as their gradient means are close to zero, whereas negatives that rank higher are likely to be false negatives, as their gradient variances are significantly higher than expected. Both types of negatives are detrimental to the convergence of deep matching models (Xiong et al., 2021; Qu et al., 2021). Interestingly, we find that the negatives ranked around the positives tend to have relatively larger gradient means and smaller variances, indicating that they are informative and carry a lower risk of being false negatives, and are thus likely to be high-quality ambiguous negatives.
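To illustrate the kind of diagnostic this observation rests on, below is a minimal, hypothetical Python sketch (our own illustration, not the paper's analysis code) that tracks the mean and variance of one negative's loss gradient, taken with respect to its relevance score, across training steps:

import torch

def negative_gradient_stats(pos_scores, neg_scores):
    """pos_scores, neg_scores: (T,) tensors holding the positive's and one
    negative's relevance scores recorded over T training steps.
    Returns the mean and variance of d(loss)/d(neg_score) across steps."""
    grads = []
    for sp, sn in zip(pos_scores, neg_scores):
        sp = sp.clone().detach().requires_grad_(True)
        sn = sn.clone().detach().requires_grad_(True)
        # Contrastive loss restricted to this (positive, negative) pair.
        loss = -torch.log_softmax(torch.stack([sp, sn]), dim=0)[0]
        loss.backward()
        grads.append(sn.grad.item())
    grads = torch.tensor(grads)
    # Near-zero mean -> uninformative negative; unusually large variance
    # -> the unstable pattern the paper associates with false negatives.
    return grads.mean().item(), grads.var().item()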
Based on these insights, we propose a Simple Ambiguous Negatives Sampling method, namely SimANS, for improving dense text retrieval. Our main idea is to design a sampling probability distribution that assigns higher probabilities to the ambiguous negatives and lower probabilities to the too-easy and too-hard ones.

¹We call them ambiguous negatives following the definition of ambiguous examples (Swayamdipta et al., 2020; Meissner et al., 2021), referring to the instances that are neither too hard nor too easy to learn.
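The following Python sketch illustrates one plausible instantiation of such a distribution (a minimal sketch of the idea, not necessarily the paper's exact formula; the hyperparameter names a and b are our own): negatives whose relevance score lies near the positive's score receive the highest sampling probability.

import numpy as np

def sample_ambiguous_negatives(neg_scores, pos_score, k, a=1.0, b=0.0):
    """Sample k negatives, favoring those scored close to the positive.

    neg_scores: (N,) array of relevance scores for candidate negatives.
    pos_score:  relevance score of the positive document.
    a, b:       scale and offset hyperparameters (assumed names).
    """
    # Gaussian-like weight: peaks when a negative scores near the positive
    # and decays for much easier (lower) or much harder (higher) negatives.
    weights = np.exp(-a * (neg_scores - pos_score - b) ** 2)
    probs = weights / weights.sum()
    return np.random.choice(len(neg_scores), size=k, replace=False, p=probs)

# E.g., with pos_score = 0.8, candidates scoring near 0.8 dominate the draw,
# while very low-scoring (easy) and very high-scoring (possibly false)
# negatives are rarely selected.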