datasets, and evaluation metrics. We present our results in
Section V and draw conclusions on this work and future
directions in Section VI.
II. RELATED WORK
Several visual place recognition techniques have been
proposed in the literature over the last decades. One popular
approach is to compute local image descriptors from the most
informative features in images from a training set [16], [17],
building an internal feature map. The map is then coupled
with a retrieval algorithm, such as Bag-of-Words (BoW),
which searches the feature map for the best match for an
input image at runtime. While this pipeline has been heavily
explored for VPR [18], [19], it faces important challenges
such as consistently selecting the most distinctive regions of an image and efficiently storing and matching the computed descriptors over the long term. The use of convolutional neural networks (CNNs) as feature extractors [20] addresses the first issue by having the network learn what information to extract from the image, and techniques based on CNN feature extraction have recently achieved state-of-the-art VPR performance
[21], [22], [23]. However, the use of often large CNNs
for feature extraction followed by searching the developed
feature map for an optimal match makes these algorithms
highly computationally intensive and hence unsuitable for resource-constrained robotic platforms.
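For illustration only, the following sketch outlines this retrieval pipeline; the visual vocabulary, local descriptors, and similarity measure are placeholders rather than the implementation of any cited technique.

```python
# A minimal sketch of the descriptor-map retrieval pipeline described above.
# The vocabulary, descriptors and similarity measure are illustrative only.
import numpy as np

def bow_histogram(descriptors, vocabulary):
    """Quantise local descriptors (n x d) against a visual vocabulary (k x d)."""
    # Assign each descriptor to its nearest visual word.
    dists = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / (hist.sum() + 1e-9)  # normalise so images are comparable

def match_place(query_descriptors, reference_histograms, vocabulary):
    """Return the index of the reference image with the most similar histogram."""
    q = bow_histogram(query_descriptors, vocabulary)
    similarities = reference_histograms @ q
    return int(np.argmax(similarities))
```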
A possible alternative is to treat VPR as an image classification problem, where each place is a class and each image is a training example belonging to that class.
This approach has the advantage of encoding an image and
inferring the place match in a single step, which can be de-
signed to be highly efficient. Both DrosoNet [24] and FlyNet
[25] are proposed classifier models designed for extremely
lightweight VPR, with the downside of reduced performance.
These small classifiers tend to exhibit inconsistent performance, with two models of the same architecture trained on the same environment sequence often outputting different place predictions for the same input image. [24] exploits this observation by using a large number of classifiers and combining their outputs with a handcrafted voting mechanism, aiming to cancel out the weak spots of individual units.
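The following sketch illustrates this classification formulation with a plain majority vote over several small classifiers; the classifier interface is hypothetical, and the vote shown is not the handcrafted mechanism of [24].

```python
# Sketch of VPR framed as classification, with several small classifiers
# voting on the place prediction. The predict_scores interface is hypothetical.
import numpy as np

def predict_place(image, classifiers, num_places):
    votes = np.zeros(num_places)
    for clf in classifiers:
        scores = clf.predict_scores(image)   # one score per known place
        votes[int(np.argmax(scores))] += 1   # each classifier votes for its best place
    return int(np.argmax(votes))             # place with the most votes wins
```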
Combining the estimations of multiple classifiers to im-
prove classification performance has been heavily researched
in the broader machine learning (ML) field [26], [27],
leading to powerful and popular ensemble algorithms [28],
[29]. For a combination of classifiers to perform well, it
is important that their class estimations for a given input
are, to some degree, different [30]. Several approaches are
possible to choose a set of baseline classifiers that meets this
requirement. The most obvious is to simply use different classifier algorithms as baselines. When multiple instances of the same baseline classifier are used, training the different models on different training sets will also generate variation in the predicted scores. When combining multiple identical neural
networks, it is also possible to rely on the random weight
initializations [31], [32] to achieve different estimations,
even when trained on the same training data. Regardless of the chosen approach, it is also important that the baseline
classifiers do not overfit the training data. Focusing on
efficiency, using multiple fully binary neural networks has
been shown to improve performance in image classification
tasks [33] while retaining the benefits of a low computational
footprint and power usage. Binary neural networks are particularly attractive candidates for ensembling, as each individual network is a weak but efficient classifier [34]. The
combination method itself has also been thoroughly explored
in the ML literature [26], [35]. The most common approach is to
use a standard mathematical operation, such as multiplication
or summation, on the outputs of the different classifiers. In
contrast, it is also possible to train an output classifier using
the scores of the baseline classifiers as inputs [36].
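The two families of combination rules can be sketched as follows; the score shapes and the stacked model's interface are assumed for illustration.

```python
# Sketch contrasting the two combination strategies discussed above.
# "scores" holds each baseline classifier's output for one input,
# with shape (num_classifiers, num_classes).
import numpy as np

def combine_fixed(scores, rule="sum"):
    """Fixed mathematical combination of class scores (summation or product)."""
    return scores.sum(axis=0) if rule == "sum" else scores.prod(axis=0)

def combine_learned(scores, output_classifier):
    """Learned combination: a trained model takes the baseline scores as input [36]."""
    return output_classifier.predict(scores.reshape(1, -1))  # hypothetical model API
```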
The combination of multiple baseline techniques to im-
prove place matching performance has also been proposed
for visual place recognition. [13] fuses the features obtained
by multiple state-of-the-art (SOTA) VPR techniques in a
Hidden Markov Model that also incorporates sequential
information. Rather than parallel fusion, [14] proposes using the baseline techniques hierarchically. Measuring how different baseline techniques complement each
other’s weaknesses has also been a recent research topic [37],
with [12] proposing a frame-by-frame selection of optimal
techniques to combine. While fusion-based techniques often achieve SOTA performance, they are computationally demanding and hence unsuitable for hardware-constrained robotic applications. [24] shows that
it is possible to design lightweight VPR systems based on
merging classifiers as long as both the baseline unit and
the combination method are efficient. Finally, to the best of
our knowledge, all of the fusion approaches proposed for VPR so far are based on variations of the fixed mathematical operations discussed above, leaving the use of a learned method for merging models largely unexplored.
This manuscript proposes using a learned classifier-merging
neural network for lightweight visual place recognition. We
start from a compact baseline classifier with binary weights
whose efficiency and compactness allow for the use of
multiple units in parallel and whose outputs are then treated
as inputs to a neural network, the merger. The merger
network is based on a one-dimensional convolutional layer
which combines the outputs of the baseline classifiers and
relates the scores of nearby places in a single operation.
We design an end-to-end training scheme based on data
augmentation, allowing the baseline classifiers and merger
network to be trained with a single environment traversal.
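A minimal sketch of the merger idea is given below; the layer sizes, kernel width, and the absence of additional layers are illustrative choices rather than our exact configuration.

```python
# Sketch of a merger built around a one-dimensional convolution: baseline
# score vectors are stacked as channels and mixed by the convolution, which
# also relates the scores of nearby places. Sizes are illustrative only.
import torch
import torch.nn as nn

class Merger(nn.Module):
    def __init__(self, num_classifiers, kernel_size=5):
        super().__init__()
        # One input channel per baseline classifier, one fused output channel.
        self.conv = nn.Conv1d(in_channels=num_classifiers, out_channels=1,
                              kernel_size=kernel_size, padding=kernel_size // 2)

    def forward(self, scores):
        # scores: (batch, num_classifiers, num_places)
        fused = self.conv(scores)   # (batch, 1, num_places)
        return fused.squeeze(1)     # one fused score per place; argmax gives the match
```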
III. METHODOLOGY
The overall system consists of two main components:
a binary-weighted neural network classifier used as the
baseline unit and a convolutional neural network that merges
multiple one-dimensional score vectors to achieve better clas-
sification performance. In this section, we start by explaining the two networks' architectures and then the merger network's training process. The last two subsections cover the
datasets that we utilize in our experiments and performance