BLIND SIGNAL DEREVERBERATION FOR MACHINE SPEECH RECOGNITION

Samik Sadhu1, Hynek Hermansky1,2

1Center for Language and Speech Processing, Johns Hopkins University, USA

2Human Language Technology Center of Excellence, Johns Hopkins University, USA

ABSTRACT

We present a method to remove unknown convolutive noise introduced to speech by reverberations of recording environments, utilizing some amount of training speech data from the reverberant environment and any available non-reverberant speech data. Using Fourier transforms computed over long temporal windows, which ideally cover the entire room impulse response, we convert room-induced convolution to addition in the log spectral domain. Next, we compute a spectral normalization vector from statistics gathered over reverberated as well as clean speech in the log spectral domain. During operation, this normalization vector is used to alleviate reverberation in complex speech spectra recorded under the same reverberant conditions. The dereverberated complex speech spectra are then used to compute complex FDLP-spectrograms for use in automatic speech recognition.

Index Terms: blind dereverberation, robust speech recognition

1. INTRODUCTION

In many speech-to-text (STT) applications, the message-carrying speech signal $s(t)$ is corrupted by room reverberations $n(t)$, yielding the reverberated signal $o(t) = s(t) * n(t)$, where $*$ is the convolution operator and $t$ denotes time. According to the convolution theorem of the Fourier transform, convolution in the time domain turns into multiplication in the spectral domain, and thereby into addition in the log spectral domain, as shown in equation 1.

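$$
\begin{aligned}
\log \mathcal{F}(o(t)) &= \log \mathcal{F}(s(t) * n(t)) \\
&= \log\big(\mathcal{F}(s(t)) \times \mathcal{F}(n(t))\big) \\
&= \log \mathcal{F}(s(t)) + \log \mathcal{F}(n(t))
\end{aligned}
\tag{1}
$$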

Here $\mathcal{F}$ denotes the Fourier transform operator. Thus, for known $n(t)$, the original signal $s(t)$ could easily be recovered as

$$
s(t) = \mathcal{F}^{-1}\big(\exp(\log \mathcal{F}(o(t)) - \log \mathcal{F}(n(t)))\big) \tag{2}
$$

where $\mathcal{F}^{-1}$ is the inverse Fourier transform operator.
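Equations 1 and 2 can be checked numerically on a toy discrete example. The following is a minimal sketch rather than the authors' implementation: it assumes NumPy, a white-noise stand-in for speech, and a synthetic impulse response whose spectrum has no zeros. All names and lengths (`T`, `S`, `s`, `n`, `n_fft`) are illustrative choices, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

T, S = 449, 64                      # hypothetical lengths of source and impulse response
s = rng.standard_normal(T)          # white-noise surrogate for the speech signal

# Toy impulse response: unit direct path plus a small decaying tail, scaled so
# that its spectrum stays away from zero and equation 2 is well behaved.
tail = rng.standard_normal(S - 1) * np.exp(-np.arange(1, S) / 10.0)
n = np.concatenate(([1.0], 0.5 * tail / np.sum(np.abs(tail))))

o = np.convolve(s, n)               # reverberated signal, length S + T - 1
n_fft = len(o)                      # 512 samples here

# Zero-padding every signal to n_fft makes the DFT product match the linear convolution.
O = np.fft.fft(o, n_fft)
S_f = np.fft.fft(s, n_fft)
N_f = np.fft.fft(n, n_fft)

# Equation 1: convolution in time is multiplication in the spectral domain.
product_error = np.max(np.abs(O - S_f * N_f))

# Equation 2: subtract log spectra, exponentiate, and invert the transform.
# The principal-value complex log is enough here because exp() removes any
# multiple of 2*pi*j in the phase.
s_hat = np.fft.ifft(np.exp(np.log(O) - np.log(N_f))).real[:T]
recovery_error = np.max(np.abs(s_hat - s))

print(product_error, recovery_error)
```

Both printed errors should sit at the level of floating-point rounding. This only holds because the impulse response is known exactly and its spectrum is nonzero in every bin, which is the idealized situation equation 2 describes.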

However, a few practical issues arise in this analysis.

• Even though $n(t)$ has infinite duration, in digital signal processing it can only be represented as a finite-length discrete-time signal.

• In order to use equation 2 with digital signals, we need equal-length discrete Fourier transforms of the digitized versions of the signals $o(t)$ and $n(t)$.

• As we shall show, arithmetic operations in the log spectral domain need phase unwrapping to remove phase ambiguity.

• $n(t)$ is typically not known.
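Two of these issues, equal-length transforms and phase ambiguity, can be illustrated with a pure delay, whose true phase is a straight line in frequency. The sketch below is an assumption-laden illustration: it uses NumPy's `np.unwrap` as one common unwrapping routine (the source does not specify its unwrapping procedure here), and the helper `log_spectrum` and the values of `n_fft` and `d` are hypothetical.

```python
import numpy as np

def log_spectrum(x, n_fft):
    """Complex log spectrum of x: log-magnitude + 1j * unwrapped phase.

    np.fft.fft zero-pads x to n_fft samples, so signals of different
    lengths end up with spectra of equal length.
    """
    X = np.fft.fft(x, n_fft)
    return np.log(np.abs(X)) + 1j * np.unwrap(np.angle(X))

n_fft, d = 32, 5                    # hypothetical DFT length and delay in samples
delay = np.zeros(8)
delay[d] = 1.0                      # impulse response of a pure d-sample delay

k = np.arange(n_fft)
true_phase = -2.0 * np.pi * k * d / n_fft            # linear phase of a pure delay
wrapped_phase = np.angle(np.fft.fft(delay, n_fft))   # principal value, confined to (-pi, pi]
unwrapped_phase = log_spectrum(delay, n_fft).imag    # phase after unwrapping

print(np.max(np.abs(wrapped_phase - true_phase)))    # large: wrapped phase loses multiples of 2*pi
print(np.max(np.abs(unwrapped_phase - true_phase)))  # rounding level: linear phase recovered
```

The wrapped phase is only defined up to multiples of $2\pi$, so adding, subtracting, or averaging it across signals is ambiguous. Unwrapping works here because the phase changes by less than $\pi$ between adjacent bins; for real speech and room responses that condition is not guaranteed.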

2. PROPOSED TECHNIQUE

Suppose that the infinitely long impulse response $n(t)$ can be approximated by its truncated digital version $n_0 = \{n_{0,k}\}_{k=1}^{S}$. Even though $n_0$ is unknown, it can be assumed to be a constant vector of real numbers.

Assume a digitized observed reverberated speech utterance $o = \{o_k\}_{k=1}^{S+T-1}$, obtained by convolving a source signal $s = \{s_k\}_{k=1}^{T}$ with $n_0$. An appropriate number of zeros can be appended to the signals to make them sequences of uniform length, leading to the equation

$$
\log \mathcal{F}(o) = \log \mathcal{F}(s) + \log \mathcal{F}(n_0) \tag{3}
$$

where $\mathcal{F}$ is the discrete Fourier transform operator. Since $n_0$ is assumed to be a constant vector, so is $\log \mathcal{F}(n_0)$. Computing the expected values on both sides of equation 3, we get

$$
\mathbb{E}[\log \mathcal{F}(o)] = \mathbb{E}[\log \mathcal{F}(s)] + \log \mathcal{F}(n_0) \tag{4}
$$

Thus, an estimate of the unchanging logarithmic spectrum of the room impulse response can simply be obtained as

$$
\log \mathcal{F}(n_0) = \mathbb{E}[\log \mathcal{F}(o)] - \mathbb{E}[\log \mathcal{F}(s)] \tag{5}
$$

Equation 5 forms the basis of our algorithm, where the expected values are replaced with empirical averages computed over a finite number of speech utterances to obtain an estimate $\phi$ of the log spectrum of the room impulse response $\log \mathcal{F}(n_0)$. This estimate can be used to normalize the log spectrum of the observed speech to estimate the clean speech

$$
\hat{s} = \mathcal{F}^{-1}\big(\exp(\log \mathcal{F}(o) - \phi)\big) \tag{6}
$$
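The estimation and normalization steps of equations 5 and 6 can be sketched as follows. This is a simplified illustration under several assumptions, not the authors' implementation: it estimates only the real (log-magnitude) part of the normalization vector and leaves the phase untouched, because averaging phases would require the consistent phase unwrapping mentioned earlier. White noise stands in for speech, the room response is synthetic, and every name and size (`mean_log_magnitude`, `T`, `S`, `M`, `n0`) is hypothetical. The clean and reverberant sets are drawn independently, so no parallel recordings are used.

```python
import numpy as np

def mean_log_magnitude(utterances, n_fft):
    """Sample mean of log|DFT| over utterances, each zero-padded to n_fft."""
    return np.mean([np.log(np.abs(np.fft.fft(u, n_fft))) for u in utterances], axis=0)

rng = np.random.default_rng(0)
T, S, M = 449, 64, 200              # hypothetical utterance length, response length, set size
n_fft = T + S - 1                   # long enough to hold a full reverberant utterance

n0 = rng.standard_normal(S) * np.exp(-np.arange(S) / 20.0)   # toy room impulse response

# Two independent (non-parallel) sets of white-noise surrogates for speech.
clean = [rng.standard_normal(T) for _ in range(M)]
reverberant = [np.convolve(rng.standard_normal(T), n0) for _ in range(M)]

# Equation 5, log-magnitude part: expectations replaced by sample means.
phi = mean_log_magnitude(reverberant, n_fft) - mean_log_magnitude(clean, n_fft)
true_log_mag = np.log(np.abs(np.fft.fft(n0, n_fft)))
estimation_error = np.mean(np.abs(phi - true_log_mag))

# Equation 6 on a held-out utterance recorded in the same "room".
s_test = rng.standard_normal(T)
O = np.fft.fft(np.convolve(s_test, n0), n_fft)
S_hat = np.exp(np.log(O) - phi)     # normalized complex spectrum (phase left as observed)
s_hat = np.fft.ifft(S_hat).real     # time-domain estimate

# Log-spectral distance to the clean utterance before and after normalization.
log_S = np.log(np.abs(np.fft.fft(s_test, n_fft)))
err_before = np.mean(np.abs(np.log(np.abs(O)) - log_S))
err_after = np.mean(np.abs(np.log(np.abs(S_hat)) - log_S))

print(estimation_error, err_before, err_after)
```

In this toy setting the log-spectral distance to the clean utterance should drop noticeably after normalization, and the residual shrinks as more utterances are averaged. Because the phase of `S_hat` is still that of the reverberant signal, `s_hat` is not a full dereverberation in the sense of equation 6, which operates on complex log spectra.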

arXiv:2210.00117v1 [eess.AS] 30 Sep 2022
