
MM ’22, October 10–14, 2022, Lisboa, Portugal Weichen Yu, Hongyuan Yu, Yan Huang, & Liang Wang
Figure 1: t-SNE visualization on the CASIA-B test set. Each color
denotes a class. The red boxes mark suboptimal representations.
(a) Baseline method. (b) The proposed generalized inter-class loss.
dependent on pose estimation accuracy. Appearance-based
methods, including Gait Energy Image (GEI)-based, set-based, and
3D-CNN-based approaches, extract fine-grained features that enlarge
inter-class variance. However, none of these approaches explicitly
constrains the inter-class feature distribution.
To address the small inter-class variance, it is beneficial to
explicitly constrain the inter-class feature distribution. From the
sample-level perspective, some samples of the same viewpoint from
different classes are close to each other due to visual similarity.
These samples need to be emphasized so that their distances grow and
the inter-class variance increases. However, previous gait works
[4, 5, 11, 21, 31, 35] usually treat pairs from different classes
inflexibly, restricting the penalty strength on pair scores to be
equal. From the class-level perspective, constraining the inter-class
feature distribution to be more uniform can increase inter-class
variance. Previous works [27, 29, 30, 33, 34, 39, 59] seldom constrain
the inter-class distribution, resulting in a lack of spatial symmetry,
which is not optimal for preserving maximal mutual information.
Moreover, although a margin is intended to constrain the distance
between classes, prior works [4, 5, 11, 21] treat all pedestrians
equally with the same given margin, which lacks flexibility for
optimization. In addition, different classes that share the same given
margin are harder to discriminate from each other.
To this end, we propose a generalized inter-class loss that addresses
the inter-class variance problem at both the sample level and the
class level. From the sample-level perspective, the proposed
generalized inter-class loss treats different pairs with dynamic and
automatic coefficients, which enables different inter-class samples to
dynamically adjust their distances from the anchor class.
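The idea of dynamic, automatic coefficients can be illustrated with a minimal Python sketch. It is not the loss defined in this paper: it assumes a smooth log-sum-exp penalty over inter-class similarity scores with a hypothetical scale `gamma`, and shows that the gradient with respect to each pair score is a softmax weight, so pairs that are more similar to the anchor receive a stronger penalty. A hinge penalty with one fixed margin is included for contrast, where every violating pair receives the same coefficient. All function names and values are illustrative assumptions.

```python
import math

def lse_pair_weights(neg_sims, gamma=16.0):
    """Gradient of (1/gamma) * log(sum_j exp(gamma * s_j)) w.r.t. each s_j.

    neg_sims: similarity scores between an anchor and samples of other classes.
    gamma: hypothetical scale factor (an assumption of this sketch).
    """
    m = max(neg_sims)  # subtract the max for numerical stability
    exps = [math.exp(gamma * (s - m)) for s in neg_sims]
    total = sum(exps)
    return [e / total for e in exps]  # softmax weights, they sum to 1

def hinge_pair_weights(neg_sims, pos_sim, margin=0.2):
    """Gradient of sum_j max(0, s_j - pos_sim + margin) w.r.t. each s_j.

    Every pair that violates the fixed margin gets coefficient 1, others 0.
    """
    return [1.0 if s - pos_sim + margin > 0 else 0.0 for s in neg_sims]

# Three inter-class pairs: a hard one, a moderate one, and an easy one.
neg_sims = [0.9, 0.6, 0.1]
dynamic = lse_pair_weights(neg_sims, gamma=16.0)
fixed = hinge_pair_weights(neg_sims, pos_sim=0.7, margin=0.2)
print("dynamic coefficients:", dynamic)
print("fixed-margin coefficients:", fixed)
```

In this toy case the fixed-margin penalty cannot distinguish the hard pair from the moderate one, while the softmax weights concentrate on the hardest pair.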
Further, from the class-level perspective, the proposed generalized
inter-class loss adds a constraint on the uniformity of the inter-class
feature representation, which has three advantages. First, uniformity
favors the inter-class feature distribution that preserves maximal
information. The proposed similarity cross entropy (SimCE) in the
generalized inter-class loss can be regarded as a variation of von
Mises-Fisher kernel density estimation [14, 16, 54], and forces the
inter-class feature distribution to approximate a hypersphere in
high-dimensional space. Thus, inter-class uniformity enables maximal
inter-class variance. Second, the proposed loss is robust to
inter-class feature representation differences in its local area.
Third, to address the fixed given margin between different classes,
the proposed generalized inter-class loss automatically adjusts the
margins between different classes and enforces a flexible inter-class
feature distribution.
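The link between a von Mises-Fisher (vMF) kernel and uniformity can be illustrated with a small sketch. This is not the SimCE term itself; it assumes unit-normalized class centers and a hypothetical concentration parameter `kappa`, and computes the log of the average pairwise vMF kernel value exp(kappa * cosine similarity). Under these assumptions, a lower value indicates class centers that are spread more evenly over the hypersphere.

```python
import math

def normalize(v):
    """Scale a vector to unit length so that it lies on the hypersphere."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def vmf_uniformity(centers, kappa=2.0):
    """Log of the mean vMF kernel value exp(kappa * cos) over distinct pairs.

    centers: one feature vector per class (hypothetical class centers).
    kappa: hypothetical concentration parameter (an assumption of this sketch).
    """
    unit = [normalize(c) for c in centers]
    vals = []
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            cos = sum(a * b for a, b in zip(unit[i], unit[j]))
            vals.append(math.exp(kappa * cos))
    return math.log(sum(vals) / len(vals))

# Four class centers crowded into one direction vs. spread around the circle.
clustered = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.95, 0.05]]
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
print("clustered:", vmf_uniformity(clustered))
print("spread:   ", vmf_uniformity(spread))
```

Minimizing such a quantity pushes class centers apart from all other centers at once, which is one way to read the claim that uniformity enlarges inter-class variance.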
Fig. 1 shows the t-SNE visualization of test features. With the
proposed loss, the feature distribution is visibly more uniform, the
hard exemplars are effectively optimized, and fewer suboptimal
representations (red boxes) remain.
The contributions of the proposed method are summarized as
follows:
• We propose a unified method to address the inter-class variance
  of gait features at both the sample level and the class level,
  which dynamically and automatically adjusts the penalty
  strength on pair scores and the margins between different classes.
• We further analyze the properties of the proposed method
  from three aspects, namely inter-class hard mining, uniformity
  and robustness of the inter-class feature distribution, and
  dynamic margin, and we illustrate how these properties
  lead to a better inter-class feature distribution.
• The proposed gait recognition method improves performance
  regardless of model structure. Experimental results on the
  public datasets CASIA-B and OUMVLP achieve state-of-the-art
  performance, especially with an improvement of 6.2%
  in the different clothing (CL) condition.
2 RELATED WORKS
2.1 Gait Recognition
Gait recognition [5, 19, 31, 40, 43, 57] aims to learn the unique
spatio-temporal pattern of human gait characteristics in order to
obtain identity information. Gait model inputs fall into two
categories: 3D-based methods [1–3, 69] reconstruct 3D human models
from different camera views, while 2D gait data [29, 30, 48] is more
convenient and easier to acquire. In early gait recognition, to deal
with the large variance in gait representations of the same identity,
hand-crafted view-invariant features [13, 22, 37] and the View
Transformation Model (VTM) [24, 58] were proposed. More recently, deep
gait recognition networks, mostly CNN-based, are used to capture gait
information. GEInet [46] and the siamese gait network [62] apply CNNs
to GEI input. Temporal information can be captured by compressing the
gait sequence into one frame using order-consistent statistic
operations along the temporal dimension [4, 19]. It can also be
captured by an LSTM or GRU that aggregates pose features over time to
generate the final gait feature [68].
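As a rough illustration of compressing a gait sequence into one frame with a statistic along the temporal dimension, the sketch below pools a list of tiny binary silhouettes pixel by pixel. The per-pixel mean corresponds to how a Gait Energy Image is formed from aligned silhouettes; the per-pixel maximum is a second common choice. The toy frames and the function name are assumptions, and real silhouettes would first be aligned and size-normalized. The shuffle shows that these statistics do not depend on frame order.

```python
import random

def temporal_pool(frames, op="mean"):
    """Collapse a list of equally sized 2D frames into one frame, pixel by pixel."""
    height, width = len(frames[0]), len(frames[0][0])
    pooled = []
    for r in range(height):
        row = []
        for c in range(width):
            pixels = [frame[r][c] for frame in frames]
            if op == "max":
                row.append(max(pixels))
            else:
                row.append(sum(pixels) / len(pixels))
        pooled.append(row)
    return pooled

# Four toy 2x2 binary silhouettes (1 = foreground pixel).
frames = [
    [[0, 1], [1, 0]],
    [[0, 1], [0, 0]],
    [[1, 1], [0, 0]],
    [[0, 1], [0, 1]],
]
gei_like = temporal_pool(frames, "mean")   # fraction of frames with foreground
max_pooled = temporal_pool(frames, "max")  # 1 if any frame has foreground

# Reordering the frames leaves both pooled results unchanged.
shuffled = frames[:]
random.Random(0).shuffle(shuffled)
print(gei_like, temporal_pool(shuffled, "mean") == gei_like)
print(max_pooled, temporal_pool(shuffled, "max") == max_pooled)
```

Because such pooling discards frame order, recurrent models such as an LSTM or GRU are the alternative when the temporal ordering itself should be modeled.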
To further improve the spatio-temporal gait representation,
Zhang et al. [65] utilize a temporal attention mechanism that
adaptively adjusts the weights of different frames. GaitNet [67, 68]
and ICDNet [30] emphasize disentangled representation learning. GANs
are also utilized [7, 60] to generate more data and help with feature
construction. SelfGait [39] uses self-supervised learning to perform
gait recognition. Gait recognition in the wild has also attracted
researchers' attention [63, 71]; this line of work focuses on
real-world gait conditions and provides in-the-wild datasets. However,
most of the works above focus on addressing large intra-class variance
and seldom consider small inter-class variance, which is equally
important.
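The temporal attention idea mentioned above, adaptively weighting different frames, can be sketched generically as softmax-weighted pooling of per-frame features. This is not the architecture of Zhang et al. [65]; the function name is hypothetical, and the frame scores are passed in directly here, whereas in a trained network they would be predicted from the frames themselves.

```python
import math

def attention_pool(frame_feats, frame_scores):
    """Weighted average of per-frame feature vectors with softmax weights.

    frame_feats: list of feature vectors, one per frame.
    frame_scores: one scalar relevance score per frame (assumed given).
    """
    m = max(frame_scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in frame_scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frame_feats[0])
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, frame_feats))
        for d in range(dim)
    ]
    return pooled, weights

feats = [[1.0, 0.0], [0.0, 1.0]]
# Equal scores reduce to plain averaging over frames.
uniform_pooled, uniform_weights = attention_pool(feats, [0.0, 0.0])
# A much higher score lets the first frame dominate the pooled feature.
focused_pooled, focused_weights = attention_pool(feats, [10.0, 0.0])
print(uniform_pooled, focused_pooled)
```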