Learning by Asking Questions for Knowledge-based Novel Object
Recognition
Kohei Uehara
The University of Tokyo
uehara@mi.t.u-tokyo.ac.jp
Tatsuya Harada
The University of Tokyo
RIKEN
harada@mi.t.u-tokyo.ac.jp
Abstract
In real-world object recognition, there are
numerous object classes to be recognized.
Conventional image recognition based on su-
pervised learning can only recognize object
classes that exist in the training data, and
thus has limited applicability in the real world.
On the other hand, humans can recognize
novel objects by asking questions and acquir-
ing knowledge about them. Inspired by this,
we study a framework for acquiring exter-
nal knowledge through question generation
that would help the model instantly recog-
nize novel objects. Our pipeline consists of
two components: the Object Classifier, which
performs knowledge-based object recognition,
and the Question Generator, which generates
knowledge-aware questions to acquire novel
knowledge. We also propose a question gen-
eration strategy based on the confidence of
the knowledge-aware prediction of the Ob-
ject Classifier. To train the Question Gen-
erator, we construct a dataset that contains
knowledge-aware questions about objects in
the images. Our experiments show that the pro-
posed pipeline effectively acquires knowledge
about novel objects compared to several base-
lines.
1 Introduction
Object category recognition has long been a central
topic in computer vision research. Traditionally,
object recognition has been addressed by super-
vised learning using a large dataset of image-label
pairs (Deng et al., 2009). However, with supervised
approaches, the model can only recognize a frozen
set of object classes and is not suitable for real-
world object recognition, where numerous object
classes exist. Recently, image recognition methods
based on contrastive learning using image-text pair
datasets have emerged (Radford et al., 2021; Jia
et al., 2021). By training on hundreds of millions
of image-text pairs, these models have acquired
remarkable zero-shot recognition capabilities for
a wide variety of objects. However, these mod-
els can recognize objects that commonly appear
in the pre-training dataset but are not as effective
for rare objects (Shen et al., 2022). Collecting new
data and retraining the entire model to make these
models recognize novel objects is impractical con-
sidering the cost of data collection and computation.
Therefore, it is essential to develop a method that
enables the model to recognize novel objects while
maintaining low data collection costs and avoiding
model retraining as much as possible.
When humans acquire knowledge about the
world, asking questions to explicitly acquire that
knowledge is an important skill (Chouinard et al.,
2007; Ronfard et al., 2018). Inspired by this, we explore methods to
dynamically increase knowledge in image recogni-
tion by asking questions. This approach has several
advantages over the traditional supervised learning
method: (1) it requires only a small amount of data
to acquire knowledge because the system acquires
only the knowledge it needs, and (2) it has a low
data collection cost because the system itself seeks
the required data.
We propose a pipeline consisting of a knowledge-
based object classifier (OC) and a question gener-
ator (QG) for knowledge acquisition. Following
previous research on structured knowledge (Ji et al.,
2022), we represent knowledge as a knowledge
triplet, that is, a list of three words or phrases: head,
relation, and tail, such as 〈dog, IsA, mammal〉. We
train the OC to retrieve knowledge from knowledge
sources, which outputs the corresponding head in
the knowledge source as the predicted object class
(e.g., 〈IsA, mammal〉 → dog). The QG model then
generates questions to add new knowledge to the
knowledge source for novel object recognition. In
the QG model, we use two modes in question gen-
eration: “confirmation” and “exploration”, as illustrated in Figure 1. First, “confirmation” is used
when the unknown object is relatively close to a
arXiv:2210.05879v1 [cs.CV] 12 Oct 2022