Submission Date: September 30, 2022; Updated on October 12, 2022
1. Introduction
According to a deaf-community survey study conducted by R. Mitchell [1], 5% of the worldwide population
suffers from hearing impairment, and 80% of all deaf people cannot read or write. In the United States,
about one million people are legally deaf. Even among the educated deaf population, reading ability lags
far behind that of the hearing population: the mean reading grade level for a deaf high-school graduate is
5.9, while hearing graduates have a mean reading grade level of 9.8. Additionally, there are 500,000 people
who use American Sign Language (ASL) as their primary method of communication, and English is usually
a second language for most ASL users. According to the National Association of the Deaf, many deaf people
find captions inaccurate, difficult to follow, or “inaccessible” [2]. During the global COVID-19 pandemic,
the federal and state governments, including the Department of Health [3], made a special effort to serve
the deaf community by producing ASL videos and hiring ASL signers for COVID-19 responses and other
related announcements. However, many deaf people still report that they cannot understand from
coronavirus briefings what they need to do to stay safe and healthy [2].
Non-sign-language users enjoy better language-related services than sign language users, such as voice
commands and dialogue, hands-free communication with devices, language translation, and automatic
captioning, transcription, and translation for streaming and conferencing video. Early research in sign
language processing shows that recent breakthroughs in machine learning, computer vision, and natural
language processing can also be applied to sign languages [9, 10, 11]. Advances in sign language research
will eventually help bring these services to the Deaf community. In addition, computers may perform sign
language interpretation when a human interpreter is not available.
It is well known that deep neural network techniques require very large datasets. The lack of large
public-domain sign language data has been recognized as the primary barrier to advancing sign language
processing research. Efforts to provide public data for ASL understanding have yielded two datasets of up
to 80 hours of video clips. These datasets are large enough to enable meaningful early deep learning
research on sign languages, but they are far too small to lead to any solution that can be practically
deployed. So far, there is still no suitable dataset for ASL production.
In this paper, we introduce the SDW-ASL system, which facilitates the generation of large-scale ASL
datasets, and we release the first-generation SDW-ASL dataset to the public.
2. Background and Related Work
2.1. Methods to Record Sign Language
Sign language is a multimodal visual language. One way to record a sign language is through glossing. A
gloss is a written approximation of another language; it is possible to gloss American Sign Language (ASL)
using English words, conventionally written in capital letters, with additional symbols. However, gloss
annotation requires trained ASL linguists, and the process is limited, time-consuming, and expensive. For
example, the How2Sign dataset includes gloss