develop high-performance algorithms in typical hash-based applications but also provides potential
insights to inspire future work in hash learning.
Although some works [1, 54] try to answer the above questions by building connections between
hash codes and common descriptors with domain knowledge, theoretical guarantees for hash codes'
performance are still missing. Similar situations occur in related works [34, 52, 53], where the
objectives used to optimize hash-models are designed by intuition or heuristics.
By formulating correlations among hash codes as inter-class distinctiveness and intra-class compactness,
we address the above issues and provide a lower bound on hash codes' performance. Based on this bound,
we further derive objectives that lift it. Moreover, to effectively train models towards such targets,
estimating and controlling the posterior of hash codes is necessary, since bit-level dependence exists
in the posterior distribution. Neglecting this dependence introduces bias during hash-model optimization
and hinders performance. Our main contributions are summarized as follows:
1) To the best of our knowledge, we are the first to formulate a lower bound of hash codes' performance,
which is directly proportional to inter-class distinctiveness and intra-class compactness among codes.
Based on this, an objective is further proposed to lift this lower bound and thus enhance performance.
2) A novel posterior estimation over hash codes is proposed to perform low-bias optimization towards
the above objective. As a non-linear function, it can be used to optimize hash-models via gradient
descent. This plug-and-play component can be integrated into existing hash-models to boost performance.
3) Extensive experiments on three benchmark datasets demonstrate the effectiveness of the proposed
method in two typical tasks. Specifically, we obtain up to a 26.5% increase in mean Average Precision
for fast retrieval, and observe up to a 20.5% increase in accuracy for hash-based recognition.
The rest of the paper is organized as follows. We first briefly review current hashing methods and
give preliminaries (Secs. 2 and 3). Then, a lower bound on hash codes' performance in terms of inter-class
distinctiveness and intra-class compactness is derived (Sec. 4). To fully exploit this guidance, a
posterior estimation over the multivariate Bernoulli distribution is designed for code control (Sec. 5).
Experiments (Sec. 6) are then conducted for evaluation.
2 Related Works
This paper focuses on data-dependent learning-to-hash methods. Recent advances in building
these models have been validated to be effective in hash-based recognition and fast retrieval
[1, 21, 22, 30, 45]. The typical approach utilizes a projection function to convert raw inputs into
compact binary codes and conducts classification or approximate nearest neighbour search on the codes
for downstream tasks [9, 46, 56]. As mentioned in the introduction, to obtain such a projector, current
practices formulate it as a Hamming space metric learning problem under binary constraints.
Pointwise [25, 33, 57], pairwise [5, 6, 29, 30, 42, 63], triplet [13, 32], listwise [51, 62]
and prototype-based [2, 7, 21, 59] objectives have all been studied for model training, where the core
idea is to adjust similarities among samples from the same / different categories.
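For concreteness, the retrieval step on learned codes can be illustrated with the following minimal
sketch (NumPy, hypothetical variable names, assuming codes are stored as 0/1 bit arrays); it only
illustrates Hamming-distance ranking, not any specific method cited above.

import numpy as np

def hamming_rank(query_code, db_codes):
    # query_code: (K,) array of {0, 1} bits for one query item.
    # db_codes:   (N, K) array of {0, 1} bits for N database items.
    # XOR marks disagreeing bits; summing them gives the Hamming distance.
    dists = np.bitwise_xor(db_codes, query_code).sum(axis=1)
    # Return database indices sorted from nearest to farthest.
    return np.argsort(dists, kind="stable")

# Toy usage: 4 database items with K = 8 bits each.
rng = np.random.default_rng(0)
db = rng.integers(0, 2, size=(4, 8))
query = db[2].copy()              # a query identical to item 2
print(hamming_rank(query, db))    # item 2 is ranked first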
Besides the above objectives, we should also note that hash-model optimization is NP-hard
[1, 33, 45, 54], caused by the discrete, binarized formulation of hash codes. To tackle this issue,
previous works propose many methods targeting the following purposes: a) Surrogate functions that
approximate hashing by continuous functions, such as tanh [6], Stochastic neuron [12], affine [43]
or Straight-Through [3]. By adopting these, the discrete hash-model optimization is transformed into
an approximate but easier one solvable via gradient descent (see the sketch after this paragraph).
b) Non-differentiable optimization algorithm design, which focuses on solutions that directly handle
discrete inputs, including Coordinate descent [41], Discrete gradient estimation [16], etc. There is
also a bag of tricks that helps stabilize training. For example, minimizing quantization error reduces
the gap between the raw outputs of the model and the final hash codes [30], and code-balancing
alleviates the problem of producing trivial codes [15].
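As an illustration only (a PyTorch-style sketch with hypothetical names, not the exact formulations
of [3], [30] or [15]), a straight-through sign surrogate combined with quantization-error and
code-balancing regularizers could look as follows:

import torch

class SignSTE(torch.autograd.Function):
    # sign(.) in the forward pass, identity gradient in the backward pass
    # (a straight-through surrogate for the non-differentiable binarization).
    @staticmethod
    def forward(ctx, x):
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output  # pass gradients through the sign unchanged

def hashing_regularizers(h):
    # h: (batch, K) real-valued model outputs before binarization.
    b = SignSTE.apply(h)                   # {-1, +1} codes (0 only if an entry of h is exactly 0)
    quant = ((h - b) ** 2).mean()          # quantization error: pull outputs toward +/-1
    balance = (b.mean(dim=0) ** 2).mean()  # code balance: each bit roughly half +1 and half -1
    return quant, balance

In practice, such terms are typically added to the task objective with small weights.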
Please refer to the literature reviews [34, 52, 53] for comprehensive studies on learning to hash.
Two issues remain largely unexplored in the above works. The first is objective design, which is
commonly driven by intuition and comes with few theoretical guarantees.