Machine Translation between Spoken Languages and Signed Languages
Represented in SignWriting
Zifan Jiang
University of Zurich
jiang@cl.uzh.ch
Amit Moryossef
Bar-Ilan University
University of Zurich
amitmoryossef@gmail.com
Mathias Müller
University of Zurich
mmueller@cl.uzh.ch
Sarah Ebling
University of Zurich
ebling@cl.uzh.ch
Abstract
This paper presents work on novel machine translation (MT) systems between spoken and signed languages, where signed languages are represented in SignWriting, a sign language writing system. Our work¹ seeks to address the lack of out-of-the-box support for signed languages in current MT systems and is based on the SignBank dataset, which contains pairs of spoken language text and SignWriting content. We introduce novel methods to parse, factorize, decode, and evaluate SignWriting, leveraging ideas from neural factored MT. In a bilingual setup—translating from American Sign Language to (American) English—our method achieves over 30 BLEU, while in two multilingual setups—translating in both directions between spoken languages and signed languages—we achieve over 20 BLEU. We find that common MT techniques used to improve spoken language translation similarly affect the performance of sign language translation. These findings validate our use of an intermediate text representation for signed languages to include them in NLP research.
1 Introduction
Most current machine translation (MT) systems only support spoken language input and output (text or speech), which excludes around 200 different signed languages used by up to 70 million deaf people² worldwide from modern language technology. Since signed languages are also natural languages, Yin et al. (2021) call for including sign language processing (SLP) in natural language processing (NLP) research.
¹ Code and documentation available at https://github.com/J22Melody/signwriting-translation
² According to the World Federation of the Deaf: https://wfdeaf.org/our-work/

[Figure 1 screenshot: the Sign Translate demo application at https://sign.mt, translating an English Bible verse into SignWriting]
Figure 1: Demo application based on our models, translating from spoken languages to signed languages represented in SignWriting, then to human poses.

From a technical point of view, SLP brings novel challenges to NLP due to the visual-gestural modality of sign language and special linguistic features (e.g., the use of space, simultaneity, referencing), which requires both computer vision (CV) and NLP technologies. Crucially, the lack of a standardized or widely used written form for signed languages has hindered their inclusion in NLP research.

However, sign language writing systems do exist and are sporadically used (e.g., SignWriting (Sutton, 1990) and HamNoSys (Prillwitz and Zienert, 1990)). Therefore, we adopt the proposal of Yin et al. (2021) to formulate the sign language translation (SLT) task using a sign language writing system as an intermediate step (illustrated by Figure 1): given spoken language text, we propose to translate to sign language in a written form, then transform this intermediate result into a final video or pose output³—and vice versa. According to this multi-step view of SLT, in this work we study translation between signed languages in written form and spoken languages. We use SignWriting as the intermediate writing system.
SignWriting has many advantages, such as being universal (multilingual), comparatively easy to understand, extensively documented, and computer-supported. In addition, despite looking pictographic, it is a well-defined writing system. Every sign can be written as a sequence of symbols (box markers, graphemes, and punctuation marks) and their location on a 2-dimensional plane.

³ Note that the second step, animation of SignWriting into human poses or video, is not included in this research. In the demo application, spoken language text is translated directly into sign language poses, resulting in low-quality output.
To our knowledge, this work is the first to create automatic SLT systems that use SignWriting. Our main contributions are as follows: (a) we propose methods to parse (§3.3), factorize (§3.4), decode (§4.3), and evaluate (§4.3) SignWriting sequences; (b) we report experiments on multilingual machine translation systems between SignWriting and spoken language text (§4); (c) we demonstrate that common techniques for low-resource MT are beneficial for SignWriting translation systems (§5).
2 Background
2.1 Sign language processing (SLP)
SLP (Bragg et al., 2019; Yin et al., 2021; Moryossef and Goldberg, 2021) is an emerging subfield of both NLP and CV, which focuses on the automatic processing and analysis of sign language content. Prominent tasks include pose estimation from sign language videos (Cao et al., 2017, 2021; Güler et al., 2018), gloss transcription (Mesch and Wallin, 2012; Johnston and Beuzeville, 2016; Konrad et al., 2018), sign language detection (Borg and Camilleri, 2019; Moryossef et al., 2020), sign language identification (Gebre et al., 2013; Monteiro et al., 2016), and sign language segmentation (Bull et al., 2020; Farag and Brock, 2019; Santemiz et al., 2009).

In addition, tasks such as sign language recognition (Adaloglou et al., 2021), translation, and production involve transforming one sign language representation into another or from/to spoken language text, as shown in Figure 2⁴. We find that existing works cover gloss-to-text (Camgöz et al., 2018; Yin and Read, 2020) (where “text” denotes spoken language text), text-to-gloss (Zhao et al., 2000; Othman and Jemni, 2012), video-to-text (Camgöz et al., 2020b,a), pose-to-text (Ko et al., 2019), and text-to-pose (Saunders et al., 2020a,b,c; Zelinka and Kanis, 2020; Xiao et al., 2020).
2.2 Motivation
[Figure 2 diagram: graph with nodes Video and Pose (CV side) and Glosses, Writing System, and Text (NLP side)]
Figure 2: SLP tasks. Every edge on the left side represents a task in CV (language-agnostic). Every edge on the right side represents a task in NLP (language-specific). Every edge crossing both sides represents a task requiring a combination of CV and NLP. Figure taken from Moryossef and Goldberg (2021).

⁴ In this paper, we distinguish between a phonetic “writing system” (e.g., SignWriting) and “glosses” (lexical notation, marking the semantics of each sign with a distinct category).
⁵ Related work based on HamNoSys: Morrissey (2011); Sanaullah et al. (2021); Walsh et al. (2022).

Our work is the first to explore translation between spoken language text and sign language content represented in SignWriting⁵. We focus on a sign language writing system for the following reasons:
(a) currently an end-to-end (video-to-text/text-to-video) approach is not feasible: state-of-the-art systems either have a BLEU score lower than 1 (Müller et al., 2022a) or work only on a very narrow linguistic domain, e.g., Camgöz et al. (2020b,a) work on the RWTH-PHOENIX-Weather 2014T dataset, which covers only 1,231 unique signs from weather reports (fewer than we use, see Table 2); (b) a writing system is lower-dimensional than video (not all parts of a video are relevant in a linguistic sense), while still adequate to encode the information of signs; (c) written sign language is a closer fit to current MT pipelines than videos or poses; (d) a phonetic writing system is a more universal solution than glossing, since glosses are semantic and therefore language-specific, and are an inadequate representation of meaning (Müller et al., 2022b).
2.3 SignWriting, FSW, and SWU
SignWriting (Sutton, 1990) is a featural and visually iconic sign language writing system (introduced extensively in Appendix A). Previous work explored recognition (Stiehl et al., 2015) and animation (Bouzid and Jemni, 2013) of SignWriting.

SignWriting has two computerized specifications, Formal SignWriting in ASCII (FSW) and SignWriting in Unicode (SWU). SignWriting is two-dimensional, but FSW and SWU are written linearly, similar to spoken languages. Figure 3 gives an example of the relationship between SignWriting, FSW, and SWU⁶. We use FSW in our research instead of SWU to explore the potential of factorizing SignWriting symbols and utilizing the numerical values of their positions (§3.3, §3.4).

⁶ Online demonstration: https://slevinski.github.io/SuttonSignWriting/characters/index.html

Figure 3: “Hello world.” in FSW, SWU, and SignWriting graphics. In FSW/SWU, A/SWA and M/SWM are the box markers (acting as sign boundaries); S14c20 and S27106 (graphemes in SWU) are the symbols; 518 and 529 are the x, y positional numbers on a 2-dimensional plane that denote a symbol’s position within a sign box; S38800 (horizontal bold line in SWU) is the punctuation full stop symbol.
3 Data and method
The data source we use for this research is SignBank, the largest repository of SignPuddles⁷. A SignPuddle is a community-driven dictionary where users add parallel examples of SignWriting and spoken language text (not necessarily with corresponding videos and glosses). The puddles contain material from various signed languages and linguistic domains (e.g., general literature or the Bible) without a strict writing standard. We use the Sign Language Datasets (Moryossef and Müller, 2021) library to load SignBank as a TensorFlow dataset, as sketched below.
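As a brief illustration, here is a minimal sketch of loading SignBank through TFDS; the dataset name "sign_bank" is our assumption about how the library registers it, so consult the library's documentation:

```python
# Sketch: loading SignBank via the Sign Language Datasets library
# (Moryossef and Müller, 2021). The TFDS dataset name "sign_bank" is an
# assumption; the registered name may differ.
import tensorflow_datasets as tfds
import sign_language_datasets.datasets  # registers the datasets with TFDS

signbank = tfds.load("sign_bank")
for example in signbank["train"].take(1):
    # Each example pairs SignWriting content with spoken language text.
    print(example)
```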
3.1 Data statistics
In SignBank, there are roughly 220k parallel samples from 141 puddles covering 76 language pairs, yet the distribution is unbalanced (full details in Appendix C). Relatively high-resource language pairs (over 10k samples) are listed in Table 1.

Notably, most of the puddles are dictionaries, which we consider less valuable than sentence pairs (instances of continuous signing) for a general MT system. If dictionaries are used as training data, we expect models to memorize word mappings rather than learn to generate sentences.

Therefore, we treat the four sentence-pair puddles (Table 2) of the relatively high-resource language pairs as primary data and the other dictionary puddles as auxiliary data. Note that even the high-resource language pairs of SignBank are low-resource compared to the datasets used in mature MT systems for spoken languages, where millions of parallel sentences are commonplace (Akhbardeh et al., 2021).
⁷ https://www.signbank.org/signpuddle/
3.2 Data preprocessing
We first perform general data cleaning to extract the main body of spoken language text and remove irrelevant parts such as HTML tags, as well as samples that are empty or too long (more than 100 words for a dictionary entry). We then learn a byte pair encoding (BPE) segmentation (Sennrich et al., 2016) on the cleaned spoken language text, using a vocabulary size of 2,000.
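The paper does not name its BPE implementation; the following is a minimal sketch using the subword-nmt toolkit that accompanies Sennrich et al. (2016), with illustrative file names:

```python
# Sketch: learn a BPE model with vocabulary size 2,000 on the cleaned text
# and apply it to a sentence. File names are illustrative, not the paper's.
from subword_nmt.learn_bpe import learn_bpe
from subword_nmt.apply_bpe import BPE

with open("spoken_train.txt") as infile, open("bpe.codes", "w") as outfile:
    learn_bpe(infile, outfile, num_symbols=2000)

with open("bpe.codes") as codes:
    bpe = BPE(codes)
print(bpe.process_line("Hello world."))  # e.g. "H@@ ello wor@@ ld ."
```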
Multilingual models
In our multilingual experiments (§4.2, §4.3), we learn a shared BPE model across all spoken languages.

Following Johnson et al. (2017), we add special tags at the beginning of source sequences to indicate the desired target language and the nature of the training data (sentence pair or dictionary). Three types of tags are designed to encode all necessary information: (a) spoken language code; (b) country code⁸; (c) dictionary vs. sentence pair. For example, an English sentence to be translated into American Sign Language is represented as follows:

<2en> <4us> <sent> Hello world.

⁸ The spoken language code plus the country code specifies a one-to-one mapping to a related signed language in our data.
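A minimal sketch of this tagging scheme follows; the helper name and the <dict> tag for dictionary entries are our assumptions (the paper only shows <sent>):

```python
# Sketch: prepend Johnson et al. (2017)-style tags to a source sequence.
# "tag_source" and "<dict>" are illustrative assumptions.
def tag_source(text: str, lang: str, country: str, is_dictionary: bool = False) -> str:
    nature = "<dict>" if is_dictionary else "<sent>"
    return f"<2{lang}> <4{country}> {nature} {text}"

# Reproduces the example above:
assert tag_source("Hello world.", "en", "us") == "<2en> <4us> <sent> Hello world."
```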
Data split
We shuffle the data and split it into
95%, 3%, and 2% for training, validation, and test
sets, respectively.
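For concreteness, a sketch of such a shuffled 95%/3%/2% split; the seed and helper are ours, and the released repository may implement this differently:

```python
# Sketch: shuffle and split samples into train/validation/test (95/3/2).
import random

def split_data(samples, seed=42):
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    n = len(samples)
    n_train, n_dev = int(0.95 * n), int(0.03 * n)
    train = samples[:n_train]
    dev = samples[n_train:n_train + n_dev]
    test = samples[n_train + n_dev:]
    return train, dev, test
```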
language pair                                            #samples   #puddles
en-us (American English & American Sign Language)          43,698          7
pt-br (Brazilian Portuguese & Brazilian Sign Language)     42,454          3
de-de (Standard German & German Sign Language)             24,704          3
fr-ca (Canadian French & Quebec Sign Language)             11,189          3

Table 1: Relatively high-resource language pair statistics.

puddle name                          language pair   #samples   #signs   mean sequence len
Literature US                        en-us                700    9,922                  24
ASL Bible Books NLT                  en-us             11,667   51,485                  24
ASL Bible Books Shores Deaf Church   en-us              4,321   44,612                  31
Literatura Brasil                    pt-br              1,884   19,221                  13

Table 2: Primary sentence-pair puddles. Mean sequence length is measured as the mean number of words in the spoken language sentences.

3.3 FSW parsing

On the sign language side, an appropriate segmentation and tokenization strategy is needed for the FSW data. We parse an original FSW sequence (e.g., Figure 3) into several pieces (see the parsing sketch after this list):

- box markers: A, M, L, R, B;
- symbols: S1f010, S18720, etc.;
- positional numbers x and y: 515, 483, etc.;
- punctuation marks (special symbols without box markers): S38800, etc.
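A minimal regex-based sketch of this parsing step, reconstructed from the FSW notation shown in Figure 3 and the public SuttonSignWriting documentation; the paper's own tokenization is given in Listing 1 (Appendix C) and may differ in detail:

```python
# Sketch: split an FSW string into box markers, symbols, and x/y positions.
import re

SYMBOL = r"S[123][0-9a-f]{2}[0-5][0-9a-f]"   # e.g. S14c20
COORD = r"(\d{3})x(\d{3})"                   # e.g. 518x529

def parse_fsw(fsw: str):
    """Parse one FSW sequence into (kind, token, x, y) tuples."""
    # Drop the optional "A..." sorting prefix (symbols without coordinates).
    fsw = re.sub(rf"A(?:{SYMBOL})+", "", fsw)
    tokens = []
    for match in re.finditer(rf"([BLMR]){COORD}|({SYMBOL}){COORD}", fsw):
        box, bx, by, sym, x, y = match.groups()
        if box:  # box marker with the sign's placement coordinate
            tokens.append(("box", box, int(bx), int(by)))
        else:    # a symbol placed at (x, y) within the current sign box
            tokens.append(("symbol", sym, int(x), int(y)))
    return tokens

# The "hello" sign from Figure 3 (string as in the SuttonSignWriting docs):
print(parse_fsw("AS14c20S27106M518x529S14c20481x471S27106503x489"))
# [('box', 'M', 518, 529), ('symbol', 'S14c20', 481, 471), ('symbol', 'S27106', 503, 489)]
```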
We further factorize each symbol into several parts regarding its orientation (see Figure 7 in Appendix A for an explicit motivation of this step). For example, the symbol S1f010 is split into:

- symbol core: S1f0;
- column number (from 0 to 5): 1;
- row number (from 0 to hex F): 0.

For positional numbers, which have a large range (from 250 to 750) and are encoded discretely, we hypothesise that models might have difficulty understanding their relative order. Therefore, we further calculate two additional factors that denote a symbol’s relative position (based on the absolute numbers) within a sign: relative x and relative y, both ranging from 0 to #symbols - 1.
We provide a full example of the result of FSW parsing in Listing 1 in Appendix C.
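A sketch of the symbol factorization and relative-position computation just described; function names are ours, and the exact factor layout is the one shown in Listing 1:

```python
# Sketch: factorize a symbol and rank absolute coordinates within a sign.
def factorize_symbol(symbol: str):
    """Split e.g. 'S1f010' into core 'S1f0', column 1, row 0 (hex digit)."""
    core, column, row = symbol[:4], int(symbol[4]), int(symbol[5], 16)
    return core, column, row

def relative_positions(values):
    """Rank absolute x (or y) values within one sign: 0 .. #symbols - 1."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, i in enumerate(order):
        ranks[i] = rank
    return ranks

print(factorize_symbol("S1f010"))      # ('S1f0', 1, 0)
print(relative_positions([481, 503]))  # [0, 1]: relative x of the two
                                       # "hello" symbols from Figure 3
```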
3.4 Factored machine translation
We use a factored machine translation system (Koehn and Hoang, 2007; Garcia-Martinez et al., 2016) to encode or decode parsed FSW sequences. We argue that this architecture is suitable because concatenating all parsed FSW tokens results in sequences much longer than the maximum length of many Transformer models (e.g., 512).

From another perspective, the essential information units are the symbols. Nevertheless, the positional numbers are necessary to determine how symbols are assembled: the same symbols can be arranged differently in space to convey different meanings.

In our setup, we treat the symbols (including punctuation marks and box markers) as the primary source/target tokens and the rest as source/target factors that are strictly aligned with each source/target token (illustrated by Figure 4).

[Figure 4 content: the source token S1f010 carries the factors X = 515, Y = 483, relative X = 0, relative Y = 1, symbol core = S1f0, column = 1, row = 0; the target side is the English word “Hi”.]
Figure 4: Representation of translating an FSW symbol together with its factors to English.
Depending on the translation direction, factored FSW representations need to be encoded or decoded. For encoding (when FSW is the source), we embed each factor separately and then concatenate the factor embeddings.
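To make the encoding concrete, here is a minimal PyTorch sketch of per-factor embedding followed by concatenation; the class, dimensions, and vocabulary sizes are illustrative assumptions, not the paper's actual implementation:

```python
# Sketch: one embedding table per factor stream; per-token embeddings are
# concatenated, as described above. All sizes are illustrative.
import torch
import torch.nn as nn

class FactoredEmbedding(nn.Module):
    def __init__(self, vocab_sizes, dims):
        super().__init__()
        # One embedding table per factor (symbol, x, y, relative x/y, ...).
        self.tables = nn.ModuleList(
            nn.Embedding(v, d) for v, d in zip(vocab_sizes, dims)
        )

    def forward(self, factors):
        # factors: list of (batch, seq_len) index tensors, one per factor,
        # strictly aligned with the primary token sequence.
        return torch.cat([emb(f) for emb, f in zip(self.tables, factors)], dim=-1)

# Example: primary symbol stream plus 4 positional factor streams.
embed = FactoredEmbedding(vocab_sizes=[700, 500, 500, 30, 30],
                          dims=[256, 32, 32, 16, 16])
factors = [torch.zeros(2, 10, dtype=torch.long) for _ in range(5)]
print(embed(factors).shape)  # torch.Size([2, 10, 352])
```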