Unbiased Scene Graph Generation using Predicate Similarities
Misaki Ohashi, Yusuke Matsui

Abstract
Scene Graphs are widely applied in computer vision as a graphical rep-
resentation of relationships between objects shown in images. However,
these applications have not yet reached a practical stage of development
owing to biased training caused by long-tailed predicate distributions. In
recent years, many studies have tackled this problem. In contrast, relatively
few works have considered predicate similarities, a distinctive feature of the
dataset that also leads to biased predictions. Because of these similarities,
infrequent predicates (e.g., “parked on”, “covered in”) are easily misclassified
as closely related frequent predicates (e.g., “on”, “in”). Utilizing predicate
similarities, we propose a new classification scheme that branches
the process to several fine-grained classifiers for similar predicate groups.
The classifiers aim to capture the differences among similar predicates in
detail. We also introduce the idea of transfer learning to enhance the
features for the predicates which lack sufficient training samples to learn
the descriptive representations. The results of extensive experiments on
the Visual Genome dataset show that the combination of our method
and an existing debiasing approach greatly improves performance on tail
predicates in challenging SGCls/SGDet tasks. Nonetheless, the overall
performance of the proposed approach does not reach that of the current
state of the art, so further analysis remains necessary as future work.
1 Introduction
Scene graphs describe the objects that appear in an image and the relationships
among them. Generally, scene graph generation (SGG) is divided into three stages:
object detection, object classification, and relationship classification.
Scene graphs comprehensively capture the content of image scenes. Hence, they
can be applied to high-level and wide-ranging practical tasks, including visual
question answering [2, 3, 4], image captioning [5, 6, 7], and image retrieval [8, 9].
The relationship classification stage in SGG typically suffers from class
imbalance in the most widely used Visual Genome dataset [10]. As shown
in Fig. 1, the number of training samples for “on” is about 50 times higher than
that for “standing on”. A model trained on such an imbalanced dataset is more
likely to predict a few frequent predicates (e.g., “on”, “in”) than the many
infrequent predicates (e.g., “lying on”, “covered in”). Hereafter, we refer to
frequent and infrequent predicates as head and tail predicates, respectively.
The University of Tokyo
arXiv:2210.00920v1 [cs.CV] 3 Oct 2022
[Figure 1 panels: (a) Input Image; (b) Ground Truth, with predicates “walking on”, “parked on”, and “wearing”; (c) Label Distribution, with axes Predicate and Frequency; (d) Prediction by [1], with predicates “on”, “on”, and “wearing”. Label similarities shown: <on, walking on> and <on, parked on>.]
Figure 1: The class imbalance problem in scene graph generation. (a) An input
image. (b) Ground-truth scene graph. (c) Frequency distribution of training
samples for top-30 most frequent labels. (d) Affected by label similarities and
imbalanced data distribution, the prediction results of an earlier method [1]
misclassify some descriptive predicates as “on”.
Existing unbiased methods [11, 12, 13, 14, 15, 16, 17, 18, 19] have focused
on the long-tailed distribution in the dataset. However, few works have focused
on another unique dataset feature, predicate similarities, which are also an im-
portant cause of the biased predictions. In contrast to general classification
tasks, the dataset includes many semantically similar predicates. These simi-
larities make distinguishing between heads and tails challenging and encourage
misclassification of tail predicates as more predictable head predicates. Because
head predicates are less descriptive than tail predicates, graphs dominated by
head predicates are less informative and less practical. For example, Fig. 1 (b)
and (d) show that the action “walking on” and the state “parked on” are both
predicted as “on”, resulting in an ambiguous description of the image content.
Scene graphs that represent limited visual information typically perform poorly
when applied to high-level tasks. Therefore, SGG models should predict as
specific a predicate as possible for the subjects shown in the image.
In this study, we propose a new relation predictor that utilizes the predi-
cate similarities of the dataset. Conventional all-class classifiers consider only
significant differences between dissimilar predicates. In contrast, our proposed
predictor consists of several independent fine-grained classifiers, each focusing
on slight differences between semantically similar predicates. The proposed ap-
proach is designed to recognize tail predicates that conventional classifiers tend
to misclassify as similar head predicates.
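As a rough illustration of what a classifier restricted to one group of similar predicates looks like, the following minimal sketch scores only the members of a single group and normalizes over them. The class name `GroupClassifier`, the group membership, the two-dimensional feature, and the hand-picked weights are all hypothetical; how the branches are selected and merged is not specified in this section, so the sketch covers a single branch only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class GroupClassifier:
    """Linear classifier restricted to one group of similar predicates (illustrative)."""

    def __init__(self, labels, weights):
        self.labels = labels      # predicates belonging to this group
        self.weights = weights    # one weight vector per predicate (hand-picked, not learned)

    def predict(self, feature):
        # Score each predicate in the group with a dot product.
        scores = [sum(w_k * x_k for w_k, x_k in zip(w, feature)) for w in self.weights]
        # Normalize over the group members only, not over all predicate classes.
        probs = softmax(scores)
        best = max(range(len(probs)), key=probs.__getitem__)
        return self.labels[best], probs[best]


# Hypothetical group of "on"-like predicates and a toy relation feature.
on_group = GroupClassifier(
    ["on", "parked on", "walking on"],
    [[0.5, 0.5], [1.0, -1.0], [-1.0, 1.0]],
)
label, prob = on_group.predict([2.0, 0.0])
```

Because the softmax runs over three candidates instead of all predicate classes, the head predicate “on” competes only with its close neighbours, which is the situation the fine-grained classifiers are meant to resolve.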
Furthermore, inspired by earlier work [14], we adopt a knowledge transfer
module for better representation learning. It enhances poorly learned features of
tail predicates by transferring the features of heads learned with sufficient sam-
ples. In contrast to the previous method [14], we transfer the knowledge within
similar predicates rather than all predicates. Because each fine-grained classifier
targets specific similar predicates, features would be noisy if the knowledge from
all predicates were incorporated, including dissimilar ones.
The contributions of this study are summarized as follows.
• We propose a method to handle the long-tailed distribution and semantic
similarities of predicate labels by combining a similarity-based branching
scheme and a knowledge transfer module.
• The proposed method effectively improves the prediction of tail predicates.
In particular, when combined with an existing debiasing inference method, it
achieved the best recall on the challenging SGCls/SGDet tasks.
• Although our approach improved the accuracy of tail labels, its overall
performance was lower than the current state of the art, especially for a
relatively easy task (PredCls). Further analysis remains as future work.
2 Related Work
2.1 Imbalanced Classification
In recent years, three primary methods have been applied to perform classifica-
tion tasks involving long-tailed datasets.
Data re-balancing is a classical approach that adjusts the amount of data
to achieve a more balanced distribution. This method includes over-sampling
for minority classes [20, 21] and under-sampling for majority classes [22].
Over-sampling is prone to over-fitting on the tail classes, whereas under-sampling
discards a considerable portion of the data, which makes it difficult
to apply to highly imbalanced datasets.
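The trade-off described above can be seen in a minimal resampling sketch. The function `rebalance`, the target size, and the toy data are assumptions made for illustration; they do not come from the cited works.

```python
import random
from collections import Counter, defaultdict


def rebalance(samples, target, seed=0):
    """Resample every class to exactly `target` items (illustrative sketch)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for item, label in samples:
        by_class[label].append((item, label))

    balanced = []
    for label, items in by_class.items():
        if len(items) >= target:
            # Under-sampling: keep a random subset and discard the rest.
            balanced.extend(rng.sample(items, target))
        else:
            # Over-sampling: repeat existing items, drawn with replacement.
            extra = rng.choices(items, k=target - len(items))
            balanced.extend(items + extra)
    return balanced


# Toy data: 50 samples of a head predicate and 2 of a tail predicate.
samples = [(i, "on") for i in range(50)] + [(i, "standing on") for i in range(2)]
balanced = rebalance(samples, target=10)
class_counts = Counter(label for _, label in balanced)
distinct_tail = {item for item, label in balanced if label == "standing on"}
```

After resampling, both classes have 10 samples, but the head class has lost 40 of its 50 samples and the tail class still contains only 2 distinct items, which illustrates the data loss and over-fitting risks respectively.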
Cost-sensitive re-weighting assigns different loss weights at the class level
or the sample level. Commonly used methods weight each class in proportion to
the inverse of its frequency [23, 24] or the inverse square root of its
frequency [25, 26]. More recently, Cui et al. [27] proposed re-weighting
by the inverse of the effective number of samples, and Lin et al. [28]
introduced sample-level re-weighting.
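The three class-level weighting schemes mentioned above can be compared in a short sketch. The class counts are hypothetical (chosen so that “on” has 50 times as many samples as “standing on”), the value of `beta` is an assumed hyperparameter, and the effective-number weight is written as (1 - beta) / (1 - beta^n), which is our reading of the formulation in [27].

```python
import math


def inverse_frequency_weights(counts):
    """Weight each class in proportion to 1 / n_c."""
    return {c: 1.0 / n for c, n in counts.items()}


def inverse_sqrt_weights(counts):
    """Weight each class in proportion to 1 / sqrt(n_c)."""
    return {c: 1.0 / math.sqrt(n) for c, n in counts.items()}


def effective_number_weights(counts, beta=0.999):
    """Weight each class by the inverse effective number of samples."""
    return {c: (1.0 - beta) / (1.0 - beta ** n) for c, n in counts.items()}


def normalize(weights):
    """Rescale the weights so that their mean is 1."""
    mean = sum(weights.values()) / len(weights)
    return {c: w / mean for c, w in weights.items()}


# Hypothetical training-sample counts per predicate.
counts = {"on": 50000, "standing on": 1000, "parked on": 200}

schemes = {
    "inverse": normalize(inverse_frequency_weights(counts)),
    "inverse_sqrt": normalize(inverse_sqrt_weights(counts)),
    "effective_number": normalize(effective_number_weights(counts)),
}
```

All three schemes give tail predicates larger weights than head predicates; they differ in how steeply the weight grows as the class count shrinks, with the inverse square root being the mildest of the two frequency-based variants.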
Transfer learning involves transferring features learned from head classes
with abundant samples to tail classes that are learned insufficiently. Liu et
al. [29] introduced dynamic meta-embedding to exchange visual knowledge be-
tween heads and tails by combining a direct image feature and associated mem-
ory representations.
2.2 Scene Graph Generation
In the first stage of SGG, an object detector (e.g., Faster R-CNN [30]) detects
several objects in an image. As the next step, object classification is performed
after encoding the detections from the first stage into object contextual infor-
mation. In most studies, the contexts are incorporated by message passing al-
gorithms such as graph attention networks [31], LSTM [1], and TreeLSTM [32].
Finally, the relationships among detected objects are predicted with a module
similar to object classification.
Many studies [11, 12, 13, 14, 15, 16, 17, 18, 19] have proposed various meth-
ods to deal with the class imbalance problem since Chen et al. [33] and Tang
et al. [32] proposed the more balanced mean recall metrics. Tang et al. [13]
adopted a counterfactual approach in making inferences to remove a context
co-occurrence bias. Chiou et al. [19] recovered the unbiased probabilities from
biased probabilities by label frequencies estimated dynamically in training. Also,
recent works have adopted general ideas to tackle long-tailed issues, as
shown in Sec. 2.1. Li et al. [18] proposed bi-level data resampling, including
image-level oversampling and instance-level undersampling. Moreover, task-
specific loss functions and weighting methods have also been proposed. Yan
et al. [15] introduced loss re-weighting by an inverse of a degree of predicate
correlations. Yu et al. [16] proposed a loss for a hierarchical cognitive structure
to support coarse-to-fine classification. Suhail et al. [17] adopted a loss for-
mulation using an energy-based model for structured learning of scene graphs.
Furthermore, He et al. [14] applied the approach of transfer learning to SGG
tasks.
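To see why a mean recall metric is considered more balanced than plain recall, consider the following simplified sketch. It ignores the top-K ranking used in practice and assumes each ground-truth relationship is simply marked as recovered or not; the function names and the toy data are illustrative assumptions.

```python
from collections import defaultdict


def recall(samples):
    """Fraction of all ground-truth relationships that were recovered."""
    return sum(1 for _, hit in samples if hit) / len(samples)


def mean_recall(samples):
    """Average of per-predicate recalls, so every predicate counts equally."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for predicate, hit in samples:
        totals[predicate] += 1
        hits[predicate] += int(hit)
    per_predicate = [hits[p] / totals[p] for p in totals]
    return sum(per_predicate) / len(per_predicate)


# Toy results: the head predicate is mostly recovered, the tail predicate never is.
samples = [("on", True)] * 9 + [("on", False)] + [("parked on", False)] * 2

overall = recall(samples)        # 9 of 12 relationships recovered
balanced = mean_recall(samples)  # mean of 0.9 ("on") and 0.0 ("parked on")
```

In this toy case plain recall is 0.75 even though the tail predicate is never predicted correctly, whereas mean recall drops to 0.45 because each predicate contributes equally.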
These recent works [11, 12, 13, 14, 15, 16, 17, 18, 19] have improved SGG per-
formance, but few studies have addressed predicate similarities in the dataset.
Yan et al. [15] mentioned the feature but focused on predicates having weak
correlations with others, and thereby did not directly take advantage of the
relationship between similar predicates. Yu et al. [16] adopted a similar focus
to that of the present work, but their method only considers parent-child
relationships among predicates, whereas the proposed method is not limited to
such hierarchical similarities.
3 Proposed Approach
Scene graph generation tasks involve generating a graph representation com-
prising objects and the visual relationships among them shown in a given input
image. In particular, we aim to address the biased relationship classification
caused by imbalanced predicate distributions and semantic overlaps among the
predicates. To this end, we introduce a classification strategy which focuses on
predicate similarities and utilizes the idea of transfer learning. In this section,
we first present the problem setting in Sec. 3.1. We then explain the details of
our proposed predictor in Sec. 3.2. Fig. 2 shows an overview of the model.
3.1 Problem Setting
We first detect object candidates using a standard object detector such as Faster
R-CNN [30]. Given an image $I$, the detector outputs $N$ bounding boxes
$B = \{b_i\}_{i=1}^{N} \subset \mathbb{R}^4$. Each box also includes an ROIAlign
feature [34] and a tentative object label such as “dog” and “man”. We then refine
these features with a message-passing module for the final object classification
and relationship classification.
Relationship classification is then performed as follows. Given a pair of
bounding boxes, a relation predictor classifies the pair from a set of $A$ predicate
labels (e.g., “on”, “in”) denoted as $\mathcal{A} = \{1, 2, \ldots, A\}$. Here, for each pair of
bounding boxes, we have three input features, $e$, $z$, and $u$ (see an example in
Fig. 2). A $P$-dimensional pairwise relation feature $e \in \mathbb{R}^P$ is obtained from