NAMED ENTITY DETECTION AND INJECTION FOR DIRECT SPEECH TRANSLATION

Marco Gaido∗†‡, Yun Tang⋆, Ilia Kulikov⋆, Rongqing Huang⋆, Hongyu Gong⋆, Hirofumi Inaguma⋆

⋆Meta AI, USA, †Fondazione Bruno Kessler, Italy, ‡University of Trento, Italy

mgaido@fbk.eu, {yuntang,kulikov,rhuangq,hygong,hirofumii}@meta.com

ABSTRACT

In a sentence, certain words are critical to its semantics. Among them, named entities (NEs) are notoriously challenging for neural models. Despite their importance, their accurate handling has been neglected in speech-to-text (S2T) translation research, and recent work has shown that S2T models perform poorly on locations and, notably, person names, whose spelling is challenging unless known in advance. In this work, we explore how to leverage dictionaries of NEs known to likely appear in a given context to improve S2T model outputs. Our experiments show that we can reliably detect NEs likely present in an utterance starting from S2T encoder outputs. Indeed, we demonstrate that the current detection quality is sufficient to improve NE accuracy in the translation, with a 31% reduction in person name errors.

Index Terms—speech translation, named entities

1. INTRODUCTION

Translation is the process of conveying the semantic meaning of a source sentence in a target language. In this process, named entities (NEs), which identify real-world people, locations, organizations, etc., play a paramount role, and their correct translation is crucial to express the accurate meaning [1]. On the other hand, current neural translation systems are known to struggle in the presence of rare words [2], as NEs often are. These motivations drove researchers to study dedicated solutions that exploit additional information available at inference time, such as bilingual dictionaries [3, 4, 5, 6]. All these works, however, target text-to-text (T2T) translation and assume that the dictionary entities present in the source sentence can be easily retrieved with pattern matching. This assumption does not hold for the speech-to-text (S2T) translation task, where the source modality is audio.
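To make the pattern-matching assumption concrete, the following is a minimal sketch of how dictionary entries might be retrieved from a textual source sentence. It is an illustration only, not the method of any of the cited works: the function name `find_dictionary_entities` and the toy dictionary entries are hypothetical. The lookup works because the source is a string; with an audio source there is no string to match against, which is why this retrieval step does not carry over to S2T.

```python
import re


def find_dictionary_entities(source: str, dictionary: dict[str, str]) -> dict[str, str]:
    """Return the dictionary entries whose source form occurs in the text."""
    found = {}
    for src_form, tgt_form in dictionary.items():
        # Word-boundary match, so that "Rome" does not fire inside "Romeo".
        if re.search(rf"\b{re.escape(src_form)}\b", source):
            found[src_form] = tgt_form
    return found


# Hypothetical bilingual (en -> es) dictionary of NEs expected in this context.
ne_dictionary = {"London": "Londres", "Rome": "Roma"}

matches = find_dictionary_entities("Romeo flew from London yesterday.", ne_dictionary)
```

Here `matches` contains only the `London` entry, whose target form could then be passed to a T2T system by any of the injection strategies mentioned above.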

The S2T task was initially accomplished by a cascade of automatic speech recognition (ASR) and T2T translation systems. However, end-to-end (or direct) S2T solutions have recently progressed to the point of achieving similar translation quality [7], with the benefits of a simpler architecture and lower latency. Cascade and direct models have been shown to equally struggle with NEs [8], even more than T2T ones, especially regarding person names [9], which are particularly hard to recognize from speech. Despite this and the importance of NEs, to the best of our knowledge, no work has so far explored how to exploit contextual dictionaries of NEs available at inference time in S2T. In addition, existing methods designed for T2T are not applicable due to the different input modality.

∗Work done during an internship at Meta AI.

Motivated by the practical relevance of the problem and the lack of existing solutions, we present the first approach to exploit contextual information, in the form of a bilingual dictionary of NEs, in direct S2T. Specifically, our main focus is the detection of the NEs present in an utterance, among those in a given contextual dictionary. Performing this task allows us to rely on existing solutions to inject the correct translations for the NEs. To showcase that the quality of our NE detector is sufficient to be useful, we adopt a decoder architecture similar to Contextual Listen, Attend and Spell (CLAS) [10] and provide it with the list of translated NEs considered present by our detector module. Experimental results on 3 language pairs (en→es, fr, it) demonstrate that we can improve NE accuracy by up to 7.1% over a base S2T model, and reduce the errors on person names by up to 31.3% over a strong baseline exploiting the same inference-time contextual data.

2. ENTITY DETECTION FOR S2T TRANSLATION

Two operations are necessary to exploit a dictionary of NEs likely to appear in an utterance: i) detecting the relevant NEs among those in the dictionary, and ii) looking at the corresponding translations to accurately generate them. Accordingly, we add two modules to the S2T model: i) a detector identifying the NEs present in the utterance, and ii) a module informing the decoder about the forms of the NEs in the target language.
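The data flow between the two modules can be sketched as follows. This is only an interface-level illustration under stated assumptions, not the implementation described in this paper: `DictionaryEntry`, `detect_entities`, `translate_with_context`, the score threshold, and the toy scoring and decoding functions are all hypothetical placeholders standing in for the neural detector and the context-aware decoder.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass(frozen=True)
class DictionaryEntry:
    source: str  # NE form in the source language
    target: str  # NE form to generate in the target language


def detect_entities(
    utterance: Any,
    dictionary: Sequence[DictionaryEntry],
    score_fn: Callable[[Any, DictionaryEntry], float],
    threshold: float = 0.5,  # assumed cut-off, not a value from the paper
) -> List[DictionaryEntry]:
    """Module i): keep the dictionary entries the detector considers present."""
    return [entry for entry in dictionary if score_fn(utterance, entry) >= threshold]


def translate_with_context(
    utterance: Any,
    dictionary: Sequence[DictionaryEntry],
    score_fn: Callable[[Any, DictionaryEntry], float],
    decode_fn: Callable[[Any, List[str]], Any],
    threshold: float = 0.5,
) -> Any:
    """Module ii): pass the target-language forms of the detected NEs to the decoder."""
    detected = detect_entities(utterance, dictionary, score_fn, threshold)
    return decode_fn(utterance, [entry.target for entry in detected])


# Toy stand-ins so the sketch runs end to end (all values are made up).
toy_dictionary = [DictionaryEntry("London", "Londres"), DictionaryEntry("Rome", "Roma")]
toy_scores = {"London": 0.9, "Rome": 0.1}


def toy_score(utterance: Any, entry: DictionaryEntry) -> float:
    return toy_scores[entry.source]


def toy_decode(utterance: Any, target_forms: List[str]) -> List[str]:
    return target_forms  # a real decoder would generate text; this echoes its context


context_seen = translate_with_context("utterance-0", toy_dictionary, toy_score, toy_decode)
```

In this toy run only the `Londres` form reaches the decoder, since the `Rome` entry scores below the assumed threshold. Keeping detection separate from decoding means the decoder only has to consider the entries judged present, rather than the whole dictionary.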

2.1. Entity Detection

A recent research direction in S2T consists of training models that jointly perform S2T and T2T to improve the quality of direct S2T [11, 12]. These speech/text-to-text (ST2T) models include auxiliary tasks to force the encoder outputs of different modalities to be close when the text/audio content is the same. Fig. 1 confirms that encoder outputs for text (the text is actually converted into phonemes before being fed to the

arXiv:2210.11981v2 [cs.CL] 11 Mar 2023
