
Figure 2: Left: Class-level overfitting exists between base and novel classes, and simply applying margins during training cannot improve the overall performance. Mid: Patterns fit base classes better as the margin increases, making them more discriminative but less transferable. Right: Transferability of patterns decreases as the margin increases, pushing classes away from each other.
Pattern fitness measured by the template-matching score.
To further verify the fitness increase,
we view each pattern as a semantic template and measure its matching score to each base class. As
analyzed in [47] and [2], each pattern can be understood as a template [5] that the model matches against
the input (so that each class has its own set of templates for recognition), and the activation
can be viewed as the matching score. Therefore, we can measure how well the patterns fit (match)
each class by finding the most important patterns for each class and comparing their activations. As
analyzed in [47] and [49], patterns (channels) with higher weights in the classification layer are more
important, and the most important ones dominate the model decisions. Therefore, given an input,
we select its most important patterns by the top classification weights of its ground-truth class, and
record the average activation on these patterns. The Mean value of such Top Activation across all
samples is denoted as MTA in Fig. 2 (mid). As can be seen, as the margin increases, MTA increases
consistently, which further verifies patterns’ increase in fitting each base class.
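The MTA computation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the number of selected patterns `k`, and the toy data are all hypothetical, and plain Python lists stand in for real feature tensors.

```python
def top_pattern_indices(class_weights, k):
    """Indices of the k channels with the largest classification weights."""
    order = sorted(range(len(class_weights)),
                   key=lambda c: class_weights[c], reverse=True)
    return order[:k]


def mean_top_activation(features, labels, weights, k):
    """Mean, over samples, of the average activation on each sample's
    top-k patterns, selected by the weights of its ground-truth class."""
    per_sample = []
    for x, y in zip(features, labels):
        top = top_pattern_indices(weights[y], k)
        per_sample.append(sum(x[c] for c in top) / len(top))
    return sum(per_sample) / len(per_sample)


# Toy data (hypothetical): 2 classes, 4 channels, 3 samples.
weights = [[0.9, 0.1, 0.5, 0.0],   # class 0 relies on channels 0 and 2
           [0.0, 0.8, 0.1, 0.7]]   # class 1 relies on channels 1 and 3
features = [[1.0, 0.0, 0.5, 0.0],
            [0.8, 0.2, 0.4, 0.1],
            [0.0, 0.6, 0.1, 0.9]]
labels = [0, 0, 1]

mta = mean_top_activation(features, labels, weights, k=2)
```

Here `features` holds one activation vector per sample and `weights` holds one classifier row per base class. A higher `mta` means the patterns that a class relies on are activated more strongly by that class's own samples.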
Better pattern fitness, worse pattern transferability.
As each pattern fits its corresponding
base class better, its discriminability increases accordingly, but can it still be transferred across classes?
To answer this, we test the transferability of patterns. Since classes are related (e.g., cat and tiger),
transferable patterns activated in one class could also be activated in other classes (e.g., felid patterns).
Therefore, we first find the important patterns for each base class by the classification weights, then
record the activation of these patterns on other classes, and measure the transferability of patterns by the
mean value of this other-class activation. The results are plotted in Fig. 2 (right). As can be seen,
the transferability consistently decreases as the margin increases. Combining this result with Fig. 2
(mid), we hold that patterns tend to be less transferable when they fit each base class better.
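The transferability measurement can be sketched in the same style. This is a hedged illustration rather than the paper's code: the function name, the choice of `k`, the toy data, and the decision to average first within each class and then across classes are assumptions, since the text only specifies "the mean value" of the other-class activation.

```python
def pattern_transferability(features, labels, weights, k):
    """Mean activation of each class's top-k patterns on samples
    that belong to OTHER classes."""
    per_class = []
    for c, w in enumerate(weights):
        # Important patterns of class c: channels with the largest weights.
        top = sorted(range(len(w)), key=lambda j: w[j], reverse=True)[:k]
        # Activation of those patterns on samples from every other class.
        others = [sum(x[j] for j in top) / len(top)
                  for x, y in zip(features, labels) if y != c]
        if others:  # skip a class if no other-class samples exist
            per_class.append(sum(others) / len(others))
    # Assumption: average within each class, then across classes.
    return sum(per_class) / len(per_class)


# Toy data (hypothetical): 2 classes, 4 channels, 3 samples.
weights = [[0.9, 0.1, 0.5, 0.0],
           [0.0, 0.8, 0.1, 0.7]]
features = [[1.0, 0.0, 0.5, 0.0],
            [0.8, 0.2, 0.4, 0.1],
            [0.0, 0.6, 0.1, 0.9]]
labels = [0, 0, 1]

transferability = pattern_transferability(features, labels, weights, k=2)
```

In this toy example the value is small because the two classes rely on disjoint channels. Related classes that share patterns (such as the cat and tiger example) would yield a larger value.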
Discussion.
The fitness also reflects how much a given pattern is specific to a base class.
Consider the extreme situation where each base class needs only one pattern for its representation: the
fitness would reach its upper bound, making that pattern thoroughly specific to the corresponding
class. Therefore, we interpret that the higher the margin is, the more specific (overfitting) the patterns
are to each base class, which makes patterns more discriminative but less transferable. Meanwhile,
the lower the margin is, the more the patterns can be shared between classes (underfitting), making
patterns more transferable but less discriminative. The CO dilemma is that patterns can hardly be
both class-specific and shared among classes by simply applying the classification margin.
2.3.2 Inherent Class Relations Lead to the Change in Pattern’s Base-Class Fitness
Pattern’s fitness negatively influences class relations.
In Fig. 2 (right), we also plot the class
relations w.r.t. the margins. The class relations are measured by the average cosine similarity
between every two classes’ prototypes. As can be seen, the relation drops as the margin grows,
consistent with the trend of the patterns’ transferability. This is reasonable because if two prototypes
share some patterns, the activations of the corresponding channels will be similar, making the cosine
similarity larger. As the transferability of patterns is negatively related to the patterns’ base-class fitness,
we hold that the class relations are also negatively related to the base-class fitness.
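The class-relation measure, the average cosine similarity over every pair of class prototypes, can be sketched as below. The function names and toy prototypes are hypothetical, and how prototypes are obtained (for example, as per-class mean features) is left as an input assumption.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_relations(prototypes):
    """Cosine similarity for every unordered pair of distinct classes."""
    return [cosine(u, v) for u, v in combinations(prototypes, 2)]


def class_relation(prototypes):
    """Average of all pairwise cosine similarities."""
    relations = pairwise_relations(prototypes)
    return sum(relations) / len(relations)


# Toy prototypes (hypothetical): the third class shares one pattern
# with each of the first two, which share nothing with each other.
prototypes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
relation = class_relation(prototypes)
```

For K classes, `pairwise_relations` returns K(K - 1)/2 values, one per unordered pair. Prototypes that share active channels push the average up, which matches the argument above.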
Inherent class relations influence pattern’s fitness.
The margin applied to the classification
directly modifies the decision boundary between every two classes, and the decision boundary is
related to the relationship between every two classes. Therefore, we study how the class relation
influences the pattern’s fitness to base classes. Specifically, given 60 base classes, for the model
trained without margins, we first calculate the cosine similarity between every two different classes,
which gives $60 \times (60 - 1) / 2 = 1{,}770$ relations denoted as $R_0$, which represent the inherent relations
between all classes. Similarly, we calculate 1,770 relations for the models trained with positive and
negative margins respectively, denoted as $R_{pos}$ and $R_{neg}$. Then we calculate $D_{pos} = R_{pos} - R_0$ and