
learning rate of class centroids (Teh et al., 2020). Classification losses have traditionally performed well on small metric learning benchmarks; these include normalized-softmax (Zhai & Wu, 2018), arcface (Deng et al., 2019), proxy NCA, and proxy anchor. More recently, IBC (Seidenschwarz et al., 2021) and HIST (Lim et al., 2022) report an improvement in R@1 when learning a graph neural network in conjunction with class centroids. However, even with these additional tricks, classification methods lag behind pairwise methods on larger benchmarks.
Pairwise Ranking Methods Pairwise ranking losses include the contrastive loss (Hadsell et al., 2006), triplet loss (Weinberger et al., 2005; Wu et al., 2017), multi-similarity, and AP surrogates (cited in the previous section). Despite being more than a decade old, contrastive and triplet losses remain the go-to methods for metric learning, and Musgrave et al. (2020a) show that they are comparable in performance to many recent methods. Multi-similarity includes a hard pair mining scheme that is effectively learning to rank. AP maximization methods explicitly learn to rank samples within a mini-batch. AP maximization is challenging because it involves back-propagating through the non-differentiable Heaviside function, similar to the current work. As a workaround, Fast-AP uses soft binning, Smooth-AP uses a low-temperature sigmoid, and Roadmap uses an upper bound on the Heaviside instead of an approximation. We find that using heuristic gradients works better for optimizing contextual similarity.
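To make the differentiability issue concrete, the sketch below (ours, not a reproduction of any cited implementation) contrasts a hard Heaviside-based rank with a Smooth-AP-style low-temperature sigmoid relaxation; the function names and the temperature `tau` are illustrative assumptions.

```python
import torch

def heaviside_rank(scores: torch.Tensor) -> torch.Tensor:
    # scores: (n,) similarities of retrieved items to a query.
    # Hard rank of item j = 1 + number of items scored strictly higher.
    diff = scores.unsqueeze(0) - scores.unsqueeze(1)   # diff[i, j] = s_i - s_j
    return 1.0 + (diff > 0).float().sum(dim=0)         # zero gradient everywhere

def sigmoid_rank(scores: torch.Tensor, tau: float = 0.01) -> torch.Tensor:
    # Smooth-AP-style relaxation: replace the Heaviside step with a
    # low-temperature sigmoid so gradients can flow back to the scores.
    n = scores.numel()
    diff = scores.unsqueeze(0) - scores.unsqueeze(1)
    mask = 1.0 - torch.eye(n, device=scores.device)    # drop self-comparisons
    return 1.0 + (torch.sigmoid(diff / tau) * mask).sum(dim=0)
```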
Unsupervised Metric Learning The concept of contextual similarity is extensively studied in the unsupervised metric learning literature, mainly in the context of person re-ID (see the survey by Ye et al. (2021)). Most unsupervised person re-ID methods use the $k$-reciprocal re-rank distance (Zhong et al., 2017), which is a weighted combination of the Euclidean distance and the Jaccard distance between reciprocal-neighbor sets, calculated over the entire dataset. More recently, STML (Kim et al., 2022) proposes an unsupervised metric learning framework for image retrieval using a simpler batch-wise contextual similarity measure. We loosely follow STML's contextual similarity definition, making significant changes to accommodate the change in problem setting and to address optimization issues (these changes are enumerated in Appendix E.1). We emphasize that prior work on contextual similarity optimizes the cosine similarity towards the contextual similarity, focusing on the unsupervised scenario, while our work optimizes the contextual similarity towards the true similarity, requiring full supervision.
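As background intuition, the rough sketch below computes $k$-reciprocal neighbor sets and combines Euclidean and Jaccard distances; it is our own simplification, not the full method of Zhong et al. (2017), which additionally refines and re-weights the reciprocal sets, and the function name and weight `lam` are hypothetical.

```python
import torch

def k_reciprocal_rerank_distance(feats: torch.Tensor, k: int = 20,
                                 lam: float = 0.3) -> torch.Tensor:
    # feats: (N, d) L2-normalized embeddings for the whole gallery.
    dist = torch.cdist(feats, feats)                     # Euclidean distance
    knn = dist.topk(k, largest=False).indices            # (N, k) nearest neighbors
    N = feats.size(0)
    nbr = torch.zeros(N, N, device=feats.device)
    nbr.scatter_(1, knn, 1.0)                            # k-NN indicator matrix
    # Reciprocal neighbors: i and j must each appear in the other's k-NN list.
    recip = nbr * nbr.t()
    # Jaccard distance between reciprocal-neighbor sets.
    inter = recip @ recip.t()                            # |R(i) ∩ R(j)|
    union = recip.sum(1, keepdim=True) + recip.sum(1) - inter
    jaccard = 1.0 - inter / union.clamp(min=1)
    # Weighted combination of Euclidean and Jaccard distance.
    return lam * dist + (1.0 - lam) * jaccard
```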
Robust Metric Learning Over-reliance on binary supervision is a long-standing problem in metric learning. Many studies overcome this issue by taking advantage of the hierarchical nature of labels in metric learning datasets. Sun et al. (2021), Zheng et al. (2022), and Ramzi et al. (2022) explicitly use hierarchical labels for training. These methods assign a higher cost to mistakes in discriminating labels that are farther apart in the hierarchy, leading to a more robust embedding space. Yan et al. (2021) propose to generate synthetic hierarchical labels for unsupervised metric learning, and Yan et al. (2023) extend this idea to metric learning with synthetic label noise. These two works use a hyperbolic embedding space to better capture hierarchical relationships (Khrulkov et al., 2020). Ermolov et al. (2022) show that simply using a hyperbolic embedding space instead of a Euclidean embedding space improves metric learning performance. Our work has a similar motivation to the above hierarchical and hyperbolic metric learning works, but we use contextual similarity instead of hierarchical labels to mitigate label inconsistency. Appendix I.4 contains some results on hierarchical retrieval metrics.
3. Method
Notation Denote the normalized output of the embedding network as $f_i \in \mathbb{R}^d$. $s_{ij} = \langle f_i, f_j \rangle \in [-1, 1]$ denotes the cosine similarity between samples $i$ and $j$. There are $n$ samples in a mini-batch. We always use balanced sampling, where $k$ images are selected from each of $n/k$ randomly sampled labels. $n$ is divisible by $k$. $k$ is divisible by 2, but we always use $k \geq 4$ in experiments. $y_{ij} \in \{0, 1\}$ denotes the true similarity between $i$ and $j$, defined as $y_{ij} = 1$ if samples $i$ and $j$ share the same label and $0$ otherwise. We use uppercase letters to denote matrices, math script to denote sets, and lowercase letters to denote scalars. $i$, $j$, and $p$ are reserved for sample indices. $N$ is used to denote the binary indicator matrix for the set $\mathcal{N}$. For instance, let $\mathcal{N}(i)$ denote the set of neighbors of sample $i$; then $N(i, j) = 1$ if $j \in \mathcal{N}(i)$, and $0$ otherwise.
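For concreteness, here is a minimal sketch (ours, not part of Algorithm 1) of balanced batch sampling and of building the true-similarity matrix $y_{ij}$ from the sampled labels; the helper names are hypothetical, and we assume every label has at least $k$ images.

```python
import torch

def sample_balanced_batch(all_labels: torch.Tensor, n: int, k: int) -> torch.Tensor:
    # Pick n/k distinct labels, then k image indices per label (n divisible by k).
    assert n % k == 0
    uniq = all_labels.unique()
    chosen = uniq[torch.randperm(uniq.numel())[: n // k]]
    idx = []
    for c in chosen:
        pool = (all_labels == c).nonzero(as_tuple=True)[0]
        idx.append(pool[torch.randperm(pool.numel())[:k]])  # assumes >= k images
    return torch.cat(idx)                                    # (n,) dataset indices

def true_similarity(labels: torch.Tensor) -> torch.Tensor:
    # y_ij = 1 if samples i and j share the same label, 0 otherwise.
    return (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
```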
Contextual Similarity Definition We loosely follow the definition of contextual similarity proposed in STML (Kim et al., 2022), with significant modifications to accommodate the change in problem setting and to address optimization issues (these modifications are enumerated in Appendix E.1). In this section, we present the similarity definition using indicator matrices in order to show an efficient implementation in PyTorch. Note that the binary "and" is replaced by multiplication for differentiability. Algorithm 1 contains PyTorch-like pseudo-code for Equations 1-7. We include the code here for reproducibility and to show that our contextual loss can be compactly implemented despite the cumbersome mathematical notation.
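As a small illustration of this "and"-to-multiplication replacement (a toy example under our own names, not a reproduction of Algorithm 1): for 0/1 indicator matrices the elementwise product coincides with the logical "and", and it remains differentiable once the indicators are relaxed.

```python
import torch

# Hard indicator matrices with entries in {0, 1}: the logical "and" and the
# elementwise product give identical results.
A = (torch.rand(4, 4) > 0.5).float()
B = (torch.rand(4, 4) > 0.5).float()
assert torch.equal(A * B, torch.logical_and(A.bool(), B.bool()).float())

# With a relaxed (soft) indicator in (0, 1), the product stays differentiable,
# so gradients can flow through the "and".
scores = torch.randn(4, 4, requires_grad=True)
A_soft = torch.sigmoid(scores)   # relaxed indicator
soft_and = A_soft * B            # B kept as a fixed hard mask here
soft_and.sum().backward()
print(scores.grad is not None)   # True: gradients reach the underlying scores
```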
We denote the contextual similarity between samples $i$ and $j$ as $w_{ij}$. The matrix with entries $w_{ij}$ is entirely a function of the cosine similarity matrix with entries $s_{ij}$. The goal of Equations 1-4 is to calculate $w_{ij}$ in terms of $s_{ij}$. This is implemented as \texttt{get\_contextual\_similarity} in Algorithm 1. For readability, we present the $w_{ij}$ calculation as