
MODEL-BASED ESTIMATION OF IN-CAR-COMMUNICATION FEEDBACK APPLIED TO SPEECH ZONE DETECTION

Kaspar Müller1, Simon Doclo2, Jan Østergaard3, Tobias Wolff1

1Cerence GmbH, Acoustic Speech Enhancement, Ulm, Germany, kaspar.mueller@cerence.com

2University of Oldenburg, Department of Medical Physics and Acoustics and Cluster of Excellence Hearing4all, Oldenburg, Germany

3Aalborg University, Department of Electronic Systems, Aalborg, Denmark

ABSTRACT

Modern cars provide versatile tools to enhance speech communication. While an in-car communication (ICC) system aims to enhance communication between the passengers by playing back desired speech via loudspeakers in the car, these loudspeaker signals may disturb a speech enhancement system required for hands-free telephony and automatic speech recognition. In this paper, we focus on speech zone detection, i.e., detecting which passenger in the car is speaking, which is a crucial component of the speech enhancement system. We propose a model-based feedback estimation method to improve the robustness of speech zone detection against ICC feedback. Specifically, since the zone detection system typically does not have access to the ICC loudspeaker signals, the proposed method estimates the feedback signal from the observed microphone signals based on the ICC gain and a free-field propagation model between the loudspeakers and the microphones. We propose an efficient recursive implementation in the short-time Fourier transform domain using convolutive transfer functions. A realistic simulation study indicates that the proposed method makes it possible to increase the ICC gain by about … dB while still achieving robust speech zone detection results.

Index Terms: feedback suppression, in-car communication, hands-free, speaker activity detection, speech zone detection

1. INTRODUCTION

While multiple built-in loudspeakers for passenger entertainment have been standard in cars for decades, modern car cabins are increasingly equipped with multiple distributed microphones for applications such as hands-free telephony, automatic speech recognition, or in-car communication (ICC). The former two applications are designed for communication with external parties and require speech enhancement, e.g., beamforming or noise reduction [1–3]. ICC systems, on the other hand, are designed to enhance speech intelligibility between passengers by reinforcing desired speech signals in the car cabin [4–6]. Often, this is achieved by recording the speech signal of a front passenger and reproducing it over loudspeakers in the rear of the cabin (see Fig. 1), so that the driver does not have to turn around or shout in order to be understood. The main challenge of ICC systems is to stabilize the closed electroacoustic loop resulting from the feedback of the loudspeaker signals to the microphones [4–10]. In practice, hands-free and ICC systems often run on different processors with highly restricted information exchange, which means that the loudspeaker signals used by the ICC system are generally not available to the hands-free system. This rules out joint processing of both systems as proposed in [11].

This project has received funding from the SOUNDS European Training Network, funded by the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 956369.

[Figure: car cabin with an ICC system and a hands-free system linked only by limited information exchange]

Fig. 1. Exemplary car setup with independent hands-free and in-car communication (ICC) systems. Dashed circles symbolize speech zones.

Depending on the ICC gain, the speech enhancement performance of the hands-free system may be substantially degraded by the ICC system. In this paper, we specifically focus on its influence on speech zone detection: when speech zones are defined in the car (one zone per seat) to achieve speech enhancement for each zone individually, it is necessary to determine which zone is active, i.e., which passenger is speaking. According to [12], speech zone detection can be achieved by evaluating the maximum signal power ratios of the passenger-dedicated microphones, since the microphone dedicated to the active speaker typically exhibits the highest signal power. However, this assumption may be violated when an ICC system is active, since the signal power of microphones close to the ICC loudspeakers may exceed that of the speaker-dedicated microphone. One might consider classical feedback cancellation techniques [7] to remove the ICC feedback from the microphone signals. However, this would require a loudspeaker reference signal, which in practice is not available for speech zone detection.
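To make the power-based decision rule concrete, here is a minimal sketch of maximum-power zone detection. It assumes one power estimate per passenger-dedicated microphone (e.g., a smoothed frame power) and a hypothetical decision threshold `threshold_db`; the function name `detect_zone`, the threshold, and all numbers are illustrative assumptions and are not taken from [12].

```python
import math
from typing import Optional, Sequence

EPS = 1e-12  # guards against log of zero and division by zero


def detect_zone(mic_powers: Sequence[float], threshold_db: float = 3.0) -> Optional[int]:
    """Hypothetical rule: return the index of the microphone (zone) with the
    highest power, or None if it does not exceed every other microphone by at
    least threshold_db."""
    if len(mic_powers) < 2:
        raise ValueError("need at least two passenger-dedicated microphones")
    # Strongest microphone and the strongest of the remaining ones
    best = max(range(len(mic_powers)), key=lambda m: mic_powers[m])
    runner_up = max(p for m, p in enumerate(mic_powers) if m != best)
    # Power ratio (in dB) between the strongest and the second-strongest microphone
    ratio_db = 10.0 * math.log10((mic_powers[best] + EPS) / (runner_up + EPS))
    return best if ratio_db >= threshold_db else None
```

With the driver speaking and microphone 0 dedicated to the driver, `detect_zone([1.0, 0.2, 0.05, 0.05])` returns 0. If strong ICC feedback raises the power at a rear microphone, e.g. `[1.0, 0.2, 2.5, 0.3]`, the same rule returns 2 and the wrong zone is declared active, which is the failure mode described above.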

In this paper, a model-based feedback signal estimation method is proposed. This method estimates the ICC feedback contribution in the observed microphone signals without requiring access to the clean loudspeaker signals, assuming only free-field propagation between the loudspeakers and microphones. We propose an efficient recursive implementation in the short-time Fourier transform (STFT) domain using convolutive transfer functions (CTFs) [13]. Finally, we suppress the estimated ICC feedback contribution in the power spectral densities (PSDs) of the observed microphone signals, which helps improve the robustness of speech zone detection against ICC feedback.
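The following sketch illustrates the general idea of model-based feedback PSD suppression under strong simplifying assumptions: all ICC loudspeakers play the signal of one known ICC input microphone scaled by a broadband linear gain, propagation from loudspeaker l to microphone m is modeled by the free-field Green's function $e^{-j 2\pi f r_l / c}/(4\pi r_l)$, and the modeled feedback PSD is subtracted with a spectral floor. It is a non-recursive, PSD-domain simplification that uses no CTFs and ignores the fact that the ICC input microphone itself contains feedback, so it is not the authors' implementation. The names `free_field_response`, `suppress_feedback_psd`, `icc_gain`, and `floor`, as well as the speed of sound of 343 m/s, are assumptions.

```python
import cmath
import math
from typing import List, Sequence

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def free_field_response(freq_hz: float, distances_m: Sequence[float]) -> complex:
    """Coherent sum of free-field Green's functions from loudspeakers that play
    the same signal to one microphone: sum_l exp(-j*2*pi*f*r_l/c) / (4*pi*r_l)."""
    return sum(
        cmath.exp(-2j * math.pi * freq_hz * r / SPEED_OF_SOUND) / (4.0 * math.pi * r)
        for r in distances_m
    )


def suppress_feedback_psd(
    mic_psd: Sequence[float],        # observed PSD of microphone m, one value per bin
    icc_input_psd: Sequence[float],  # PSD of the (assumed known) ICC input microphone
    freqs_hz: Sequence[float],       # center frequency of each bin
    distances_m: Sequence[float],    # loudspeaker-to-microphone distances r_l
    icc_gain: float,                 # hypothetical broadband linear ICC gain
    floor: float = 0.1,              # spectral floor relative to the observed PSD
) -> List[float]:
    """Subtract a free-field model of the ICC feedback PSD from a microphone PSD."""
    out = []
    for phi_y, phi_x, f in zip(mic_psd, icc_input_psd, freqs_hz):
        h = free_field_response(f, distances_m)
        # Modeled feedback PSD: |g * H(f)|^2 times the ICC input PSD
        phi_fb = (icc_gain ** 2) * abs(h) ** 2 * phi_x
        # Spectral subtraction, limited from below to avoid negative PSDs
        out.append(max(phi_y - phi_fb, floor * phi_y))
    return out
```

Summed over frequency, such suppressed PSDs could then replace the raw microphone powers in a maximum-power zone decision. The coherent sum in `free_field_response` is what makes the modeled feedback frequency dependent when several loudspeakers are at different distances.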

2. SIGNAL MODEL

We consider a car environment with a single speaker, $M$ passenger-dedicated microphones, and an ICC system with $L$ loudspeakers (see Fig. 2). The observed microphone signals $y_m(n)$, with microphone index $m \in \{0, \dots, M-1\}$ and sample time index $n$, consist of three

arXiv:2210.03363v1 [cs.SD] 7 Oct 2022
