
AUTOMATIC SEVERITY CLASSIFICATION OF DYSARTHRIC SPEECH
BY USING SELF-SUPERVISED MODEL WITH MULTI-TASK LEARNING
Eun Jung Yeo1∗, Kwanghee Choi2∗, Sunhee Kim3, Minhwa Chung1
Department of Linguistics, Seoul National University, Republic of Korea1
Department of Computer Science and Engineering, Sogang University, Republic of Korea2
Department of French Language Education, Seoul National University, Republic of Korea3
∗Equal contributors.
ABSTRACT
Automatic assessment of dysarthric speech is essential for sustained treatment and rehabilitation. However, obtaining atypical speech is challenging, often leading to data scarcity issues. To tackle the problem, we propose a novel automatic severity assessment method for dysarthric speech, using a self-supervised model in conjunction with multi-task learning. Wav2vec 2.0 XLS-R is jointly trained for two different tasks: severity classification and auxiliary automatic speech recognition (ASR). For the baseline experiments, we employ hand-crafted acoustic features and machine learning classifiers such as SVM, MLP, and XGBoost. Evaluated on the Korean dysarthric speech QoLT database, our model outperforms the traditional baseline methods, with a relative percentage increase of 1.25% in F1-score. In addition, the proposed model surpasses the model trained without the ASR head, achieving a 10.61% relative percentage improvement. Furthermore, we present how multi-task learning affects severity classification performance by analyzing the latent representations and the regularization effect.
Index Terms—dysarthric speech, automatic assessment,
self-supervised learning, multi-task learning
1. INTRODUCTION
Dysarthria is a group of motor speech disorders resulting from disturbances in neuromuscular control, affecting diverse speech dimensions such as respiration, phonation, resonance, articulation, and prosody [1]. Accordingly, people
with dysarthria often suffer from degraded speech intelligi-
bility, repeated communication failures, and, consequently,
poor quality of life. Hence, accurate and reliable speech as-
sessment is essential in the clinical field, as it helps track the
condition of patients and the effectiveness of treatments.
The most common way of assessing the severity of dysarthria is to conduct standardized tests such as the Frenchay Dysarthria Assessment (FDA) [2]. However, these tests heavily rely on human perceptual evaluations, which can be subjective and laborious. Therefore, automatic assessments that are highly consistent with expert judgments hold great potential for assisting clinicians in diagnosis and therapy.
Research on automatic assessment of dysarthria can be
grouped into two approaches. The first is to investigate a
novel feature set. For instance, paralinguistic features such as eGeMAPS were explored for their usability in atypical speech analysis [3]. On the other hand, common symptoms of dysarthric speech provided insights into new feature sets: glottal [4], resonance [5], pronunciation [6, 7], and prosody features [8, 9]. Furthermore, representations extracted from deep neural networks were also examined, such as spectro-temporal subspaces [10], i-vectors [11], and DeepSpeech posteriors [12]. While this approach can provide intuitive descriptions of the acoustic cues used in assessments, it has the drawback of discarding information that may be valuable to the task.
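
To make the feature-based approach concrete, the following is a minimal sketch, assuming the opensmile and scikit-learn Python packages, of extracting eGeMAPS functionals and training an SVM severity classifier in the spirit of the baselines used later in this paper; the file paths and labels are hypothetical placeholders, not data from this work.

# Minimal sketch of a hand-crafted feature baseline (not the paper's exact
# setup): eGeMAPS functionals via openSMILE + an SVM classifier.
import opensmile
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# 88-dimensional eGeMAPSv02 functionals, one feature vector per utterance
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)

wav_paths = ["spk01_utt01.wav", "spk02_utt01.wav"]  # hypothetical files
severity = [0, 3]  # hypothetical severity labels, e.g., 0 (mildest) to 4 (most severe)

X = [smile.process_file(p).values.squeeze() for p in wav_paths]

# SVM baseline; an MLP or XGBoost classifier can be swapped in the same way
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X, severity)
predictions = clf.predict(X)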
The second approach is to explore network architectures that take raw waveforms as input. These include, but are not limited to, distance-based neural networks [13], LSTM-based models [14, 15], and CNN-RNN hybrid models [16, 17]. As neural networks are often data-hungry, research in this direction suffers from the scarcity of atypical speech data. Consequently, studies have often been limited to dysarthria detection, a binary classification task. However, multi-class classification should also be considered for more detailed diagnoses. Recently, self-supervised representation learning has emerged to alleviate such problems, showing success in various downstream tasks with small amounts of data [18, 19]. Promising results have also been reported for different atypical speech tasks, including automatic speech recognition (ASR) [20, 21] and assessment [22, 23, 24]. However, the severity assessment of dysarthric speech remains underexplored.
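
As context for the approach proposed below, the following is a minimal sketch, assuming the PyTorch and HuggingFace transformers packages, of fine-tuning a pre-trained wav2vec 2.0 XLS-R encoder with a severity classification head and an auxiliary CTC head for ASR; the checkpoint, pooling, vocabulary size, and loss weighting are illustrative assumptions rather than this paper's exact configuration.

# Illustrative multi-task fine-tuning sketch (not the paper's exact configuration):
# wav2vec 2.0 XLS-R encoder + severity classification head + auxiliary CTC head.
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

class SeverityMTL(nn.Module):
    def __init__(self, num_severity=5, vocab_size=70, asr_weight=0.1):
        super().__init__()
        self.encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-xls-r-300m")
        hidden = self.encoder.config.hidden_size
        self.cls_head = nn.Linear(hidden, num_severity)  # main task: severity levels
        self.asr_head = nn.Linear(hidden, vocab_size)    # auxiliary task: ASR (CTC)
        self.asr_weight = asr_weight
        self.ctc = nn.CTCLoss(blank=0, zero_infinity=True)

    def forward(self, waveform, severity=None, tokens=None, token_lens=None):
        h = self.encoder(waveform).last_hidden_state      # (batch, frames, hidden)
        cls_logits = self.cls_head(h.mean(dim=1))         # mean-pooled utterance logits
        loss = h.new_zeros(())                             # accumulated multi-task loss
        if severity is not None:
            loss = loss + nn.functional.cross_entropy(cls_logits, severity)
        if tokens is not None:                             # add weighted auxiliary CTC loss
            log_probs = self.asr_head(h).log_softmax(-1).transpose(0, 1)  # (frames, batch, vocab)
            frame_lens = torch.full((h.size(0),), h.size(1), dtype=torch.long)
            loss = loss + self.asr_weight * self.ctc(log_probs, tokens, frame_lens, token_lens)
        return cls_logits, loss

# Usage (hypothetical tensors): logits, loss = model(waves, severity=y, tokens=t, token_lens=tl)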
This paper proposes a novel automatic severity classifi-
cation method for dysarthric speech using a self-supervised
learning model fine-tuned with multi-task learning (MTL).
The model handles 1) a five-way multi-class classification of
dysarthria severity levels as the main task and 2) automatic
speech recognition as the auxiliary task. We expect MTL to