Combining Efficient and Precise Sign Language Recognition:
Good pose estimation library is all you need
Matyáš Boháček1,2, Zhuo Cao3,4 and Marek Hrúz1
1University of West Bohemia, Pilsen, Czech Republic
2Gymnasium of Johannes Kepler, Prague, Czech Republic
3KU Leuven, Leuven, Belgium
4ML6, Ghent, Belgium
The authors can be contacted at matyas.bohacek@matsworld.io.
Abstract
Notice: This extended abstract was presented at the
CVPR 2022 AVA workshop1 in New Orleans, USA.
Sign language recognition could significantly improve
the user experience for d/Deaf people with the general
consumer technology we use daily, such as IoT devices or
videoconferencing. However, current sign language
recognition architectures are usually computationally heavy
and require robust GPU-equipped hardware to run in real time.
Some models target lower-end devices (such as smartphones)
by minimizing their size and complexity, which leads to
worse accuracy. This severely limits accurate in-the-wild
applications. We build upon the SPOTER architecture, which
belongs to the latter group of lightweight methods, as it
came close to the performance of large models employed for
this task. By substituting its original third-party pose
estimation module with the MediaPipe library, we achieve an
overall state-of-the-art result on the WLASL100 dataset.
Notably, our method beats previous larger architectures
while still being twice as computationally efficient and
almost 11 times faster at inference when compared to a
relevant benchmark. To demonstrate our method's combined
efficiency and precision, we built an online demo that
enables users to translate sign lemmas of American Sign
Language in their browsers. To the best of our knowledge,
this is the first publicly available online application
demonstrating this task.
1 https://accessibility-cv.github.io/
1. Introduction
Sign languages (SLs) are the primary means of
communication for d/Deaf communities. They are natural
language systems based on manual articulations and
non-manual components. Although they can convey the same
semantics as spoken and written languages, they use a
significantly more variable and complex modality. With over
70 million people considering one of the approximately 300
SLs their native language, computational methods that would
bridge the gap between written or spoken languages and SLs
have been studied extensively in the literature since the
1990s. Two prevalent topics concerning SLs have emerged: SL
synthesis and SL recognition (SLR). Despite the time that
has passed, these tasks are far from being solved.
In this work, we address SLR, whose objective is to
translate videos of performed signs from a known set into a
written form. It can be divided into isolated SLR, where
only single lemmas are translated, and continuous SLR,
which translates unconstrained signing utterances. We focus
on the first of these streams: isolated SLR.
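The isolated setting described above can be sketched as a closed-set classification problem: one clip containing a single sign goes in, exactly one label from the known set comes out. The snippet below is only an illustrative sketch under stated assumptions. The vocabulary `VOCAB`, the keypoint-based clip layout, and the functions `recognize_isolated` and `placeholder_scores` are hypothetical names introduced here; the scoring function is a deterministic stand-in, not the SPOTER model or any trained recognizer.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed input layout: a frame is a list of (x, y) keypoints,
# and a clip is a list of frames produced by some pose estimator.
Frame = List[Tuple[float, float]]
Clip = List[Frame]

# Hypothetical closed vocabulary (the "known set" of signs).
VOCAB = ["book", "drink", "computer"]


def recognize_isolated(
    clip: Clip, score_fn: Callable[[Clip], Sequence[float]]
) -> str:
    """Isolated SLR: one clip holding a single sign -> exactly one label."""
    scores = list(score_fn(clip))
    if len(scores) != len(VOCAB):
        raise ValueError("score_fn must return one score per vocabulary entry")
    # Pick the index of the highest-scoring vocabulary entry.
    best = max(range(len(VOCAB)), key=scores.__getitem__)
    return VOCAB[best]


def placeholder_scores(clip: Clip) -> List[float]:
    """Stand-in for a trained model; NOT a real recognizer."""
    # The score depends only on the clip length, so the example is deterministic.
    hot = len(clip) % len(VOCAB)
    return [1.0 if i == hot else 0.0 for i in range(len(VOCAB))]


# Four frames with two keypoints each; the coordinates are made up.
clip = [[(0.5, 0.5), (0.4, 0.6)] for _ in range(4)]
print(recognize_isolated(clip, placeholder_scores))  # prints "drink"
```

Continuous SLR would instead return a sequence of labels of unknown length for one clip, which is why the two streams are usually treated as separate problems.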
We identified that a critical problem of current
lightweight SLR architectures aimed at in-the-wild
applications on standard consumer devices (e.g., smartphones)
is that they perform markedly worse than their heavier
counterparts. We hence focus on boosting their accuracy
without adding more computational demand. For this purpose,
we build upon the SPOTER architecture [5], which came close
to the performance of current heavy architectures at a
notably smaller size and with lower computational
requirements. Boháček et al. use a third-party pose estimation library in their
architecture to represent the videos at the input with se-
arXiv:2210.00893v1 [cs.CV] 30 Sep 2022