
ARUBA: An Architecture-Agnostic Balanced Loss for Aerial Object Detection

Rebbapragada V C Sairam Monish Keswani Uttaran Sinha

Nishit Shah Vineeth N Balasubramanian

Indian Institute of Technology Hyderabad

{ai20resch13001, monish.keswani, cs17mtech11003, cs18mtech11020, vineethnb}@iith.ac.in

Abstract

Deep neural networks tend to reciprocate the bias of their training dataset. In object detection, the bias exists in the form of various imbalances such as class, background-foreground, and object size. In this paper, we denote size of an object as the number of pixels it covers in an image and size imbalance as the over-representation of certain sizes of objects in a dataset. We aim to address the problem of size imbalance in drone-based aerial image datasets. Existing methods for solving size imbalance are based on architectural changes that utilize multiple scales of images or feature maps for detecting objects of different sizes. We, on the other hand, propose a novel ARchitectUre-agnostic BAlanced Loss (ARUBA) that can be applied as a plugin on top of any object detection model. It follows a neighborhood-driven approach inspired by the ordinality of object size. We evaluate the effectiveness of our approach through comprehensive experiments on aerial datasets such as HRSC2016, DOTAv1.0, DOTAv1.5 and VisDrone and obtain consistent improvement in performance.

1. Introduction

In recent years, drones have shown immense potential in numerous disciplines. In military warfare, they can be used as target decoys for combat missions. In agriculture, drones provide farmers with real-time data to make informed harvesting decisions. For search-and-rescue, they can reach places where humans cannot. They are also used in firefighting, delivery of essentials, and aerial photography. This increasing demand for drones in various domains has recently encouraged the computer vision community to work extensively on vision from drones [3].

Deep neural networks have led computer vision research and development for a decade now on multiple challenging problems such as semantic segmentation, object detection/tracking, and image classification. In object detection, methods like Faster R-CNN [27], YOLO [25], RetinaNet [15] and their variants have achieved decent performance on many challenging datasets. With increased interest and creation of datasets in drone-based imagery, aerial object detection [34,6] has gained a lot of interest from the research community. Although the aforementioned methods exhibit exceptional performance on popular general object detection datasets such as MSCOCO [16], aerial-object datasets [34,6] pose more challenges, even to state-of-the-art object detection models.

Figure 1: Predictions on an image from the VisDrone dataset [6] with Focal loss [14] vs. ours. Top: Focal loss fails to detect many objects. Bottom: Ours is able to recognize additional objects, including small ones, because of our ARchitectUre-agnostic BAlanced (ARUBA) loss. Yellow boxes indicate objects additionally detected.

High variation in the scale and orientation of objects in aerial datasets, especially in drone images, makes detecting these objects quite challenging. Specialized methods [8,37] have been proposed to capture oriented bounding boxes efficiently. An added difficulty in aerial datasets [34,6,19] is that they are highly skewed in their object size distribution in addition to the class distribution, as shown in Figure 2. In Figure 2b, the x-axis shows the object area bins, with object size increasing from left to right, and the y-axis shows the number of object instances per area bin. We also observe that size imbalance is more severe in aerial object datasets than in more general-purpose object detection datasets (see Figure 3), which motivates us to address this imbalance problem of drone-based aerial datasets in this work.
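The size distribution described above (instances per area bin) can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the function name `area_histogram`, the use of bounding-box area as a proxy for the number of pixels an object covers, and the choice of log-spaced bins are all assumptions made here for the example.

```python
import numpy as np

def area_histogram(boxes_wh, num_bins=10):
    """Count object instances per area bin (hypothetical helper).

    boxes_wh: array-like of shape (N, 2) holding box width and height in pixels.
    Returns (counts, edges), with bins ordered from smallest to largest area.
    """
    boxes_wh = np.asarray(boxes_wh, dtype=float)
    # Assumption: box area stands in for the number of pixels the object covers.
    areas = boxes_wh[:, 0] * boxes_wh[:, 1]
    # Assumption: log-spaced bin edges, so small and large objects both get bins.
    edges = np.geomspace(areas.min(), areas.max(), num_bins + 1)
    # Pin the end points so rounding cannot push the extreme areas out of range.
    edges[0], edges[-1] = areas.min(), areas.max()
    counts, _ = np.histogram(areas, bins=edges)
    return counts, edges
```

Plotting `counts` against the bin index, with a logarithmic y-axis, would give a plot of the same kind as the size-imbalance panels in the figures.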

arXiv:2210.04574v3 [cs.CV] 18 Nov 2023

Figure 2: Highly skewed class and size distributions in the VisDrone dataset. (a) Class imbalance: instances per class (bar labels 144867, 79337, 29647, 27059, 24956, 12875, 10480, 5926, 4812, 3246) for the classes car, pedestrian, motor, people, van, truck, bicycle, bus, tricycle, a-tricycle. (b) Size imbalance: instances per area bin.

Size imbalance is a common problem in object detection datasets, and many methods have been proposed to mitigate this issue, as summarized in [20]. Existing methods [13,18,28] have largely proposed architectural modifications to enhance the model's ability to view objects at different scales. However, such multi-scale approaches arise from careful engineering of architectures to suit a specific domain or setting. In this work, we propose to address the size imbalance problem in aerial datasets from an architecture-agnostic balanced loss perspective. One could also view our approach as a long-tailed perspective on a size imbalance problem, unlike the class imbalance setting that is typically studied in long-tailed detection/recognition problems. In contrast to the existing methods, we propose an architecture-independent approach which can be applied as a plugin on top of any object detection method.

Long-tailed object detection methods typically focus on datasets with skewed class distribution, aiming to improve performance on detecting and classifying minority classes. Many methods [10,2,32,31,5,30,24] have been proposed to tackle this problem from a class imbalance perspective (summarized in Sec 2). We focus on the idea of loss reweighting [5,30,24], wherein higher weights are assigned to tail classes. Unlike class labels, size (when ordered from large to small) is an ordinal variable, making it non-trivial to apply existing solutions for class imbalance to size. Besides, as shown in Figure 2b, small-sized objects are dominant in drone-based aerial datasets and large-sized objects are sparse. Although large-sized objects form the tail, they have larger spatial support, which can provide richer and more useful features than small objects and thus make them easier to detect.

On the other hand, learning small-sized objects, although they are the majority in such datasets, is challenging even for state-of-the-art detection models [27,25,15]. The increasing use of drone images and the lack of a consistent method for detecting objects of different sizes in such datasets motivate us to address the severe size imbalance in such aerial datasets. In summary, we address the long-tailed size imbalance issue in drone-based aerial datasets rather than the long-tailed class imbalance issue that is typically addressed in earlier related efforts.
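For reference, the class-based loss reweighting mentioned above, in which tail classes receive higher weights, can be illustrated with plain inverse-frequency weights. This is a generic sketch and not the specific formulation of [5,30,24]; the helper name `inverse_frequency_weights` and the normalization to a mean weight of 1 are assumptions made for the example. The counts are the per-class bar labels from the VisDrone class distribution figure.

```python
import numpy as np

def inverse_frequency_weights(counts):
    """Per-class loss weights proportional to 1 / frequency (generic sketch).

    Rare (tail) classes receive larger weights than frequent (head) classes.
    Weights are rescaled so that their mean is 1.
    """
    counts = np.asarray(counts, dtype=float)
    weights = 1.0 / counts
    return weights * len(counts) / weights.sum()

# Per-class instance counts as labeled on the bars of the class-imbalance plot.
visdrone_counts = [144867, 79337, 29647, 27059, 24956,
                   12875, 10480, 5926, 4812, 3246]
class_weights = inverse_frequency_weights(visdrone_counts)
```

Such a scheme treats categories as unordered, which is why it does not transfer directly to an ordinal variable like object size.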

Figure 3: Comparison of size imbalance severity between general and drone-based aerial object datasets. (a) Size imbalance in COCO; (b) size imbalance in VisDrone. Both panels plot instances per bin against area bins. Note that the y-axis is the log of frequency, hence the effect is exponential in terms of occurrence.

To this end, we propose a novel architecture-agnostic loss-reweighting strategy which considers the ordinality of the size variable in its design. The performance of an object detection model on instances of a given size would have a contribution from object instances of neighboring sizes. For example, given a particular class, a model learned on object instances of area X is more likely to recognize an instance of area X ± δ than one of area X ± kδ, where k is a large integer. We hence apply a Gaussian amplification on the size distribution to consider the effect of such neighborhood instances (as detailed in Section 3).
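The neighborhood idea can be sketched as below. The exact formulation is the one given in Section 3, which is not reproduced here, so everything in this block is an assumption for illustration: the function name `gaussian_amplify`, the unnormalized Gaussian kernel over bin-index distance, and the bandwidth `sigma`.

```python
import numpy as np

def gaussian_amplify(counts, sigma=1.0):
    """Add Gaussian-discounted neighbor counts to each size bin (sketch).

    counts: instances per size bin, ordered from small to large.
    sigma:  assumed kernel bandwidth, measured in bins.
    Each bin keeps its own count (kernel value 1 at distance 0) and receives
    a contribution from every other bin that decays with bin distance.
    """
    counts = np.asarray(counts, dtype=float)
    idx = np.arange(len(counts))
    # kernel[i, j] is the influence of bin j on bin i.
    kernel = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / sigma) ** 2)
    return kernel @ counts
```

With this kernel, instances at area X contribute strongly to bins near X and only negligibly to bins many steps away, which mirrors the X ± δ versus X ± kδ argument in the text.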

We subsequently use a clustering approach to assign weights to object instances based on their sizes. Finally, inspired by previous balanced loss work that focuses on class imbalance [5], we reweight the loss based on size clusters to suit our problem. Unlike existing methods for long-tailed class imbalance, which assign lower weights to head categories, our method assigns higher weights to the head categories (small-sized objects), ensuring that the model learns better on them. We show that the size-imbalance problem can be addressed using such a loss-based approach without the need for time-consuming architecture engineering. To summarize, our key contributions are as follows:

• We propose a novel architecture-agnostic loss-reweighting strategy to solve the severe size imbalance issue in drone-based aerial image datasets. We call this ARchitectUre-agnostic BAlanced Loss (ARUBA), which can be applied while training any object detection model.

• To the best of our knowledge, this is the first such loss-based approach to handle size imbalance in this domain. Our key observations around the ordinality of the considered categories and the connection of such ordering to a model's performance may be useful in other settings with ordinal categories (e.g. class labels of a disease with increasing severity levels).

• We propose a simple yet effective pipeline based on well-known modules to achieve the objectives using our loss-reweighting strategy. Our extensive experimental results corroborate the usefulness of this pipeline.

• We perform a comprehensive suite of experiments on

multiple drone-based aerial image datasets including

HRSC2016, DOTA-v1.0, DOTA-v1.5 and VisDrone to
