
LEARNABLE ACOUSTIC FRONTENDS IN BIRD ACTIVITY DETECTION
Mark Anderson, Naomi Harte
SIGMEDIA Lab
School of Engineering
Trinity College Dublin, Ireland
{andersm3, nharte}@tcd.ie
©2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including
reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or
reuse of any copyrighted component of this work in other works.
ABSTRACT
Autonomous recording units and passive acoustic monitor-
ing present minimally intrusive methods of collecting bioa-
coustics data. Combining this data with species agnostic bird
activity detection systems enables the monitoring of activity
levels of bird populations. Unfortunately, variability in am-
bient noise levels and subject distance contribute to difficul-
ties in accurately detecting bird activity in recordings. The
choice of acoustic frontend directly affects the impact these
issues have on system performance. In this paper, we bench-
mark traditional fixed-parameter acoustic frontends against
the new generation of learnable frontends on a wide-ranging
bird audio detection task using data from the DCASE2018
BAD Challenge. We observe that Per-Channel Energy Nor-
malization is the best overall performer, achieving an accu-
racy of 89.9%, and that in general learnable frontends sig-
nificantly outperform traditional methods. We also identify
challenges in learning filterbanks for bird audio.
Index Terms: Bird Activity Detection, Bioacoustics,
Learnable Frontend, PCEN, LEAF, STRF, TD Filterbanks
1. INTRODUCTION
Monitoring of bird populations is of increasing importance as
biodiversity continues to decline and the impacts of climate
change become more evident [1]. Many population studies
are utilising passive acoustic monitoring and autonomous
recording units to record bioacoustic audio [2, 3], which
present non-intrusive methods for carrying out research on
bird populations [4]. The results of these studies can serve as
indicators of the effects of climate change [5].
Audio is highly suitable for monitoring bird populations
as many species are identifiable by their vocalisations and vi-
sual analysis can be difficult. In the age of Deep Learning,
spectrogram representations of the audio are the most com-
mon input features to a system [6]. This form of representa-
tion is easily interpretable by humans and also allows Con-
volutional Neural Networks to extract spectro-temporal pat-
terns.
The primary parameters governing the spectrogram rep-
resentation of audio are the window size (determining the
time/frequency trade-off), frequency axis scaling and ampli-
tude compression. There is evidence that increased time res-
olution favours bird audio [7] but there is little consensus on
the best representation of the frequency axis, whether it is
linear, logarithmic or mel scaling [6]. There is further de-
bate on whether mel scaling is suitable for bioacoustics, as
it is derived from human auditory perception [8]. Bird audio
is typically in the range of 800-8000 Hz, and mel scaling
focuses much of the energy in the lower bands. Log compression
is loudness dependent and can increase the appearance
of noise in the spectrogram. The robustness of the chosen
representation to environmental noise and loudness variation
is paramount. In spite of this, standard log-mel spectrograms
used in human speech applications are ubiquitous in the anal-
ysis of bird vocalisations [6].
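
As an illustration of the two concerns raised above, the following sketch
uses only the Python standard library. It is not the configuration used in
this paper: the HTK-style mel formula, the 40-band layout and the 11025 Hz
upper limit are all assumptions chosen for the example. It counts how many
mel-spaced bands fall below the typical 800 Hz lower limit of bird audio,
and shows that log compression turns a change in recording gain into a
constant additive offset.

```python
import math

def hz_to_mel(f_hz):
    """HTK-style mel formula (one common convention; assumed here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_edges(n_bands, f_min, f_max):
    """Edges (in Hz) of n_bands bands equally spaced on the mel axis."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / n_bands
    return [mel_to_hz(m_min + i * step) for i in range(n_bands + 1)]

def fraction_of_bands_below(edges, cutoff_hz):
    """Fraction of bands whose upper edge lies at or below cutoff_hz."""
    n_bands = len(edges) - 1
    return sum(1 for upper in edges[1:] if upper <= cutoff_hz) / n_bands

# Hypothetical layout: 40 bands from 0 Hz to 11025 Hz (assumed, not from the paper).
edges = mel_band_edges(n_bands=40, f_min=0.0, f_max=11025.0)
below_800 = fraction_of_bands_below(edges, 800.0)
linear_share = 800.0 / 11025.0  # share of a linear axis lying below 800 Hz

# Log compression: the same call recorded 100 times louder is shifted by log(100).
quiet_energy = 1e-4
loud_energy = 100.0 * quiet_energy
log_offset = math.log(loud_energy) - math.log(quiet_energy)

print(f"mel bands ending at or below 800 Hz: {below_800:.1%}")
print(f"share of a linear axis below 800 Hz: {linear_share:.1%}")
print(f"log-energy offset for a 100x gain change: {log_offset:.3f}")
```

Under these assumptions the mel axis devotes a noticeably larger share of
its bands to frequencies below 800 Hz than a linear axis would, and the
log-compressed value of an otherwise identical call depends on how loud it
is in the recording.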
New trainable frontends have been developed for a range
of audio applications which can learn the filterbanks [9, 10],
perform loudness normalisation and noise reduction [11, 10],
or exploit patterns in temporal and frequency modulation
[12]. Per-Channel Energy Normalisation has been employed
recently in bioacoustics [13, 14] and the remaining frontends
have been tested on various bird audio tasks, including bird
activity detection. However, no dedicated comparative as-
sessment of the suitability of these methods in bird activity
detection has been done. Because bird activity detection is a vital
first step in any subsequent population analysis, such an assessment
is crucial for judging the suitability of these frontends for bird
bioacoustics work.
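
To make the idea of loudness normalisation concrete, the following is a
minimal sketch of the commonly published form of Per-Channel Energy
Normalisation for a single frequency channel, written with the Python
standard library only. It is not the implementation evaluated in this
paper: the parameter values are illustrative assumptions, and in a
learnable frontend they would be trained per channel rather than fixed.
The function name pcen and the toy frame energies are hypothetical.

```python
def pcen(energy, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Apply PCEN to one channel given a list of non-negative frame energies.

    m is a first-order smoothed version of the energy; dividing by
    (eps + m) ** alpha acts as automatic gain control, and the root
    (.. + delta) ** r - delta ** r replaces log compression.
    """
    out = []
    m = energy[0]  # initialise the smoother with the first frame (an assumption)
    for e in energy:
        m = (1.0 - s) * m + s * e
        out.append((e / (eps + m) ** alpha + delta) ** r - delta ** r)
    return out

# Toy channel: steady background with a short, louder event in frames 3 and 4.
frames = [1.0, 1.2, 0.9, 5.0, 6.0, 1.1, 1.0]

# With alpha = 1 and eps = 0 the gain control is exact, so the same event
# recorded 100 times quieter produces (numerically) the same output.
near = pcen(frames, alpha=1.0, eps=0.0)
far = pcen([0.01 * e for e in frames], alpha=1.0, eps=0.0)
max_diff = max(abs(a - b) for a, b in zip(near, far))

loudest_frame = near.index(max(near))
print("largest difference between near and far outputs:", max_diff)
print("frame with the strongest PCEN response:", loudest_frame)
```

The near and far outputs agree to within floating-point rounding, which is
the behaviour that makes this kind of normalisation attractive when subject
distance varies, while the short event still stands out against the
smoothed background.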
This paper compares traditional and learnable frontends
using the same datasets and model architecture. We evalu-
ate performance on a species-agnostic bird activity detection
task. The datasets contain a large variety of bird species and
call types, as well as many noise sources (e.g. wind, traffic,
human activity and speech). We provide the most comprehensive
investigation to date on the suitability of learnable
frontends in bird audio detection. Section 2 details the
frontends. In Section 3 we outline our methodology for eval-
uation. In Section 4, we present and discuss the experimental
results and Section 5 concludes the paper.
978-1-6654-6867-1/22/$31.00 ©2022 IEEE
arXiv:2210.00889v1 [eess.AS] 3 Oct 2022