Submission Date: September 30, 2022; Updated on October 12, 2022
1. Introduction
According to a deaf-community survey study conducted by R. Mitchell [1], 5% of the worldwide population
suffers from hearing impairment, and 80% of all deaf people cannot read or write. In the United States,
about one million people are legally deaf. Even among the educated deaf population, reading ability lags
far behind that of the hearing population: the mean reading grade level for a deaf high-school graduate is
5.9, while hearing graduates have a mean reading grade level of 9.8. Additionally, there are 500,000 people
who use American Sign Language (ASL) as their primary method of communication, and English is usually
a second language for most ASL users. According to the National Association of the Deaf, many deaf people
find captions inaccurate, difficult to follow, or “inaccessible” [2]. During the global COVID-19 pandemic,
the federal and state governments, including the Department of Health [3], made a special effort to serve
the deaf community by producing ASL videos and hiring ASL signers for COVID-19 responses and other
related announcements. However, many deaf people still report that they cannot understand from
coronavirus briefings what they need to do to stay safe and healthy [2].
Non-sign-language users enjoy better language-related services than sign language users, such as voice
commands and dialogue, hands-free communication with devices, language translation, and automatic
captioning, transcription, and translation for streaming and conferencing video. Early research in sign
language processing shows that recent breakthroughs in machine learning, computer vision, and natural
language processing can also be applied to sign languages [9, 10, 11]. Advances in sign language research
will eventually help bring these services to the Deaf community. In addition, computers may perform sign
language interpretation when a human interpreter is not available.
It is well known that deep neural network techniques require very large datasets. The lack of large
public-domain sign language data has been recognized as the primary barrier to advancing sign language
processing research. Efforts to provide public data for ASL understanding have yielded two datasets of up
to 80 hours of video clips. These datasets are large enough to enable meaningful early deep learning
research on sign languages, but they are far too small to lead to any solution that can be practically
deployed. So far, there is still no suitable dataset for ASL production.
In this paper, we introduce the SDW-ASL system, which facilitates the generation of large-scale ASL
datasets, and we release the first-generation SDW-ASL dataset to the public.
2. Background and Related Work
2.1. Methods to Record Sign Language
Sign language is a multimodal visual language. One way to record a sign language is through glossing. A
gloss is a written approximation of another language; it is possible to gloss American Sign Language (ASL)
using English words, conventionally written in capital letters, with additional symbols. However, gloss
annotation requires trained ASL linguists, and the process is limited, time-consuming, and expensive. For
example, the How2Sign dataset includes gloss