MODEL-BASED ESTIMATION OF IN-CAR-COMMUNICATION FEEDBACK
APPLIED TO SPEECH ZONE DETECTION
Kaspar Müller1, Simon Doclo2, Jan Østergaard3, Tobias Wolff1
1Cerence GmbH, Acoustic Speech Enhancement, Ulm, Germany, kaspar.mueller@cerence.com
2University of Oldenburg, Department of Medical Physics and Acoustics and
Cluster of Excellence Hearing4all, Oldenburg, Germany
3Aalborg University, Department of Electronic Systems, Aalborg, Denmark
ABSTRACT
Modern cars provide versatile tools to enhance speech communication. While an
in-car communication (ICC) system aims at enhancing communication between the
passengers by playing back desired speech via loudspeakers in the car, these
loudspeaker signals may disturb a speech enhancement system required for
hands-free telephony and automatic speech recognition. In this paper, we focus
on speech zone detection, i.e., detecting which passenger in the car is
speaking, which is a crucial component of the speech enhancement system. We
propose a model-based feedback estimation method to improve the robustness of
speech zone detection against ICC feedback. Specifically, since the zone
detection system typically does not have access to the ICC loudspeaker signals,
the proposed method estimates the feedback signal from the observed microphone
signals based on a free-field propagation model between the loudspeakers and
the microphones as well as the ICC gain. We propose an efficient recursive
implementation in the short-time Fourier transform domain using convolutive
transfer functions. A realistic simulation study indicates that the proposed
method allows the ICC gain to be increased by about 6 dB while still achieving
robust speech zone detection results.
Index Terms: feedback suppression, in-car communication, hands-free, speaker
activity detection, speech zone detection
1. INTRODUCTION
While multiple built-in loudspeakers for passenger entertainment have been
standard in cars for decades, modern car cabins are increasingly equipped with
multiple distributed microphones for applications such as hands-free telephony,
automatic speech recognition or in-car communication (ICC). The former two
applications are designed for communication with external parties and require
speech enhancement, e.g., beamforming or noise reduction [1–3]. ICC systems, on
the other hand, are designed to enhance speech intelligibility between
passengers by reinforcing desired speech signals in the car cabin [4–6]. Often,
this is achieved by recording the speech signal of a front passenger and
reproducing it over loudspeakers in the rear of the cabin (see Fig. 1), so that
the driver does not need to turn around or shout in order to be understood.
The main challenge of ICC systems is to stabilize the closed electroacoustic
loop resulting from the feedback of the loudspeaker signals to the microphones
[4–10]. In practice, hands-free and ICC systems often run on different
processors with highly restricted information exchange, meaning that the
loudspeaker signals used for the ICC system are generally not available to the
hands-free system, which rules out joint processing of both systems as proposed
in [11].
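The closed electroacoustic loop mentioned above can be illustrated with a toy simulation. The following is only a minimal sketch under strong assumptions that are not taken from the paper: the loudspeaker-to-microphone path is reduced to a pure delay, and the ICC gain and the acoustic attenuation are lumped into one hypothetical scalar `loop_gain`. Under these assumptions the echoes of an impulse decay when the loop gain is below one and grow when it exceeds one.

```python
import numpy as np


def simulate_icc_loop(speech, loop_gain, delay):
    """Toy closed loop: y[n] = speech[n] + loop_gain * y[n - delay].

    loop_gain lumps together the ICC gain and the acoustic attenuation of
    the loudspeaker-to-microphone path (both hypothetical scalars here).
    """
    y = np.zeros(len(speech))
    for n in range(len(speech)):
        feedback = loop_gain * y[n - delay] if n >= delay else 0.0
        y[n] = speech[n] + feedback
    return y


# Excite the loop with a single impulse and observe how the echoes evolve.
impulse = np.zeros(2000)
impulse[0] = 1.0

stable = simulate_icc_loop(impulse, loop_gain=0.5, delay=40)    # echoes decay
unstable = simulate_icc_loop(impulse, loop_gain=1.2, delay=40)  # echoes grow

print("largest late echo, loop gain 0.5:", np.abs(stable[-40:]).max())
print("largest late echo, loop gain 1.2:", np.abs(unstable[-40:]).max())
```

A real cabin has frequency-dependent, reverberant feedback paths, so this scalar picture only conveys why the ICC gain cannot be raised arbitrarily.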
This project has received funding from the SOUNDS European Training Network,
a European Union Horizon 2020 research and innovation programme under the
Marie Skłodowska-Curie grant agreement No. 956369.
[Figure 1: block diagram of the ICC system and the hands-free system with
limited information exchange; signal labels y0, y1, y2, y3 and u; numeric
labels 0 to 3.]
Fig. 1. Exemplary car setup with independent hands-free and in-car
communication (ICC) systems. Dashed circles symbolize speech zones.
Depending on the ICC gain, the speech enhancement performance of the
hands-free system may be substantially degraded by the ICC system. In this
paper, we specifically focus on the influence of the ICC system on speech zone
detection: when speech zones are defined in the car (one zone for each seat) to
achieve speech enhancement for each zone individually, it is necessary to
determine which zone is active, i.e., which passenger is speaking. According to
[12], speech zone detection can be achieved by evaluating the maximum signal
power ratios of passenger-dedicated microphones, as the speaker-dedicated
microphone typically shows the highest signal power. However, this assumption
may be violated in combination with an ICC system, since the signal power of
microphones close to ICC loudspeakers may exceed that of the speaker-dedicated
microphone. One might consider classical feedback cancellation techniques [7]
to remove the ICC feedback from the microphone signals. This would, however,
require a loudspeaker reference signal, which in practice is not available for
speech zone detection.
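The failure mode described above can be made concrete with a small numerical example. This is a minimal sketch and not the detector of [12]: the decision rule (strongest microphone must exceed the runner-up by a threshold), the threshold of 3 dB, the function name `detect_zone` and all power values are assumptions chosen for illustration. Feedback power is simply added to the direct speech power, which is itself a simplification.

```python
import numpy as np


def detect_zone(mic_powers, ratio_threshold_db=3.0):
    """Return the index of the dominant microphone, or None if none dominates.

    Hypothetical rule: the strongest microphone must exceed the second
    strongest by at least ratio_threshold_db.
    """
    powers = np.asarray(mic_powers, dtype=float)
    order = np.argsort(powers)
    strongest, runner_up = order[-1], order[-2]
    ratio_db = 10.0 * np.log10(powers[strongest] / powers[runner_up])
    return int(strongest) if ratio_db >= ratio_threshold_db else None


# Driver (zone 0) is speaking; made-up direct speech power per microphone.
direct_power = np.array([1.0, 0.3, 0.1, 0.1])

# Made-up ICC feedback power: the rear microphones (2, 3) sit close to the
# ICC loudspeakers and therefore pick up most of the played-back speech.
feedback_power = np.array([0.05, 0.05, 2.5, 0.8])

zone_without_icc = detect_zone(direct_power)
zone_with_icc = detect_zone(direct_power + feedback_power)

print("detected zone without ICC:", zone_without_icc)
print("detected zone with ICC:   ", zone_with_icc)
```

With these illustrative numbers the rear microphone 2 becomes the strongest once the feedback is added, so the power-based rule points to the wrong seat.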
In this paper, a model-based feedback signal estimation method is proposed.
This method estimates the ICC feedback contribution in the observed microphone
signals without requiring access to the clean loudspeaker signals, considering
only free-field propagation between the loudspeakers and microphones. We
propose an efficient recursive implementation in the short-time Fourier
transform (STFT) domain using convolutive transfer functions (CTF) [13].
Finally, we suppress the ICC feedback contribution in the power spectral
densities (PSDs) of the observed microphone signals, which helps to improve the
robustness of speech zone detection against ICC feedback.
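Two ingredients named above, free-field propagation and PSD-domain suppression, can be sketched as follows. This is not the recursive CTF-based estimator of the paper. It assumes a textbook free-field model (pure delay plus 1/distance attenuation), a subtraction rule with a spectral floor, and, purely for illustration, a known loudspeaker PSD, whereas the proposed method estimates the feedback from the microphone signals alone. All names and numbers are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, approximate value at room temperature


def free_field_response(distance_m, freqs_hz):
    """Free-field transfer function: propagation delay and 1/distance decay."""
    delay_s = distance_m / SPEED_OF_SOUND
    return np.exp(-2j * np.pi * freqs_hz * delay_s) / distance_m


def suppress_feedback_psd(mic_psd, feedback_psd, floor=0.1):
    """Subtract the estimated feedback PSD, keeping at least floor * mic_psd."""
    return np.maximum(mic_psd - feedback_psd, floor * mic_psd)


# Frequency grid of a 512-point STFT at 16 kHz (illustrative choice).
freqs = np.fft.rfftfreq(512, d=1.0 / 16000.0)

# Hypothetical setup: loudspeaker 0.5 m from the microphone, linear ICC gain
# 0.5, flat PSD of the ICC input, and a weak direct speech component.
H = free_field_response(0.5, freqs)
icc_gain = 0.5
icc_input_psd = np.ones_like(freqs)
direct_psd = 0.2 * np.ones_like(freqs)

feedback_psd = np.abs(icc_gain * H) ** 2 * icc_input_psd
mic_psd = direct_psd + feedback_psd

cleaned_psd = suppress_feedback_psd(mic_psd, feedback_psd)
print("mean microphone PSD:", mic_psd.mean())
print("mean cleaned PSD:   ", cleaned_psd.mean())
```

The spectral floor keeps the cleaned PSD positive when the feedback estimate is too large, which matters because a power-ratio-based detector cannot work with zero or negative power values.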
2. SIGNAL MODEL
We consider a car environment with a single speaker, $M$ passenger-dedicated
microphones and an ICC system with $L$ loudspeakers (see Fig. 2). The observed
microphone signals $y_m(n)$, with the microphone index
$m \in \{0, \dots, M-1\}$ and the sample time index $n$, consist of three
978-1-6654-6867-1/22/$31.00 ©2022 IEEE
arXiv:2210.03363v1 [cs.SD] 7 Oct 2022