Inner speech recognition through electroencephalographic signals

Francesca Gasparini1,2[0000-0002-6279-6660], Elisa Cazzaniga1, and Aurora Saibene1,2[0000-0002-4405-8234]
1University of Milano-Bicocca, Viale Sarca 336, 20126, Milano, Italy
aurora.saibene@unimib.it, e.cazzaniga@campus.unimib.it,
francesca.gasparini@unimib.it
2NeuroMI, Milan Center for Neuroscience, Piazza dell’Ateneo Nuovo 1, 20126,
Milano, Italy
Abstract. This work focuses on inner speech recognition starting from EEG signals. Inner speech recognition is defined as the internalized process in which the person thinks in pure meanings, generally associated with an auditory imagery of one's own inner “voice”. The decoding of the EEG into text should be understood as the classification of a limited number of words (commands) or of the presence of phonemes (units of sound that make up words). Speech-related BCIs provide effective vocal communication strategies for controlling devices through speech commands interpreted from brain signals, improving the quality of life of people who have lost the capability to speak, by restoring communication with their environment. Two public inner speech datasets are analysed. Using these data, several classification models are studied and implemented, starting from basic methods such as Support Vector Machines, moving to ensemble methods such as the eXtreme Gradient Boosting classifier, up to neural networks such as Long Short-Term Memory (LSTM) and Bidirectional Long Short-Term Memory (BiLSTM). With the LSTM and BiLSTM models, generally not used in the inner speech recognition literature, results in line with or superior to the state-of-the-art are obtained.

Keywords: EEG · inner speech recognition · BCI
1 Introduction
Human speech production is a complex motor process that starts in the brain and
ends with respiratory, laryngeal, and articulatory gestures for creating acoustic
signals of verbal communication. Physiological measurements using specialized
sensors and methods can be made at each level of speech processing, including
the central and peripheral nervous systems, muscular action potentials, speech
kinematics (tongue, lips, jaw), and sound pressure [25]. However, subjects suffering from neurodegenerative diseases or motor disorders may lose the normal transmission of signals from the brain to the peripheral areas, which prevents them from communicating or carrying out certain actions.
Brain Computer Interfaces (BCIs) are promising technologies for improving
the quality of life of people who have lost the capability to move or speak, by
restoring communication with their environment. A BCI is a system that makes
possible the interaction between an individual and a computer without using the
brain’s normal output pathways of peripheral nerves and muscles. In particular,
speech-related BCI technologies provide neuro-prosthetic help for people with speaking disabilities and neuro-muscular disorders and diseases. They can equip these users with a medium to communicate and express their thoughts, thereby improving the quality of rehabilitation and clinical neurology [23]. Speech-related paradigms, based on either silent, imagined or inner speech, provide a more natural way of controlling external devices [20].
There are different types of brain-signal recording techniques that are mainly
divided into invasive and non-invasive methods. The former involve implanting electrodes directly into the brain. They provide better spatial and temporal resolution, also increasing the quality of the obtained signal. However, invasive technologies have problems related to usability and the need for surgical intervention on the subject, which is why non-invasive techniques are increasingly used
in BCI research. Among the non-invasive technologies, the electroencephalogram
(EEG) is the most used method for measuring the electrical activity of the brain
from the human scalp. It has an exceedingly high time resolution, it is simple to
record and it is sufficiently inexpensive [13]. Over the years, EEG hardware technology has evolved and several wireless multichannel systems have emerged that deliver high-quality EEG and physiological signals in a simpler, more convenient and comfortable design than the traditional, cumbersome systems.
This paper focuses on inner speech recognition starting from EEG signals,
where the basic definition of inner speech is [1] “the subjective experience of language in the absence of overt and audible articulation”.
As suggested in [3], there is evidence from past neuroscience research that
inner speech engages brain regions that are commonly associated with language
comprehension and production [2]. This includes temporal, frontal and sensorimotor areas, predominantly in the left hemisphere of the brain [2,4]. Therefore, by monitoring these brain areas, it is theoretically possible to develop an inner speech BCI that classifies neural representations of imagined words [4].
In section 2 the studies in the field of inner speech are described. Section 3
presents the two publicly available datasets used for the analyses proposed in
section 4. In section 5 the results obtained with our models are presented and
discussed. Finally, in section 6 some conclusions are proposed.
2 Related works
Most studies on the classification of inner speech focus on invasive methods, such as electrocorticography (ECoG) [17], as they provide higher spatial resolution, while fewer studies concerning inner speech classification using EEG data are available [21]. It is important for a BCI application to be non-invasive, accessible and easy
to implement so that it can be used by a large number of subjects.
Inner speech recognition is generally addressed considering phonemes, typically vowels or syllables such as /ba/ or /ku/, or simple words such as left, right, up and down, in subject-dependent approaches.
Preliminary works were conducted with very few participants and syllables
by D’Zmura et al. [11], where EEG waveform envelopes have been adopted to
recognize EEG patterns. Brigham and Kumar [5] and Deng et al. [10] also considered the recognition of two syllables. In the first work, the accuracy obtained for the 7 subjects ranges from 46% to 88%: the authors preprocessed the raw EEG data to reduce the effects of artifacts and noise, and applied a k-Nearest Neighbor classifier to autoregressive coefficients extracted as features. Deng and colleagues, using Hilbert spectra and linear discriminant analysis, recognized the two syllables imagined in three different rhythms, a 6-class task, with accuracy ranging from 19% to 22%.
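To make this family of approaches concrete, the following minimal sketch (ours, not the cited authors' code) extracts per-channel autoregressive coefficients by solving the Yule-Walker equations and feeds them to a k-Nearest Neighbor classifier; the AR order, the number of neighbors and the `X_epochs`/`y` arrays are hypothetical placeholders.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from sklearn.neighbors import KNeighborsClassifier

def ar_coefficients(x, order=6):
    # Estimate AR coefficients of a 1-D signal by solving the
    # Yule-Walker equations built from its autocorrelation.
    x = x - x.mean()
    r = np.array([x[:len(x) - k] @ x[k:] for k in range(order + 1)]) / len(x)
    return solve_toeplitz(r[:-1], r[1:])  # symmetric Toeplitz system R a = r[1:]

def epoch_features(epoch, order=6):
    # epoch: (n_channels, n_samples) -> concatenated per-channel AR coefficients.
    return np.concatenate([ar_coefficients(ch, order) for ch in epoch])

# X_epochs: (n_trials, n_channels, n_samples) preprocessed EEG; y: trial labels.
# X = np.stack([epoch_features(e) for e in X_epochs])
# clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)
```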
Considering works where the recognition of phonemes has been investigated, DaSalla et al. [8,9] analyzed the recognition of three tasks, /a/, /u/ and rest, obtaining accuracies from 68% to 79% by using common spatial patterns. On the same dataset, several other researchers have tested different models, obtaining promising results [14,22].
Kim et al. [15], instead, considered three vowels, /a/, /i/ and /u/, and applied multivariate empirical mode decomposition and common spatial patterns for feature extraction, together with linear discriminant analysis, reaching around 70% accuracy.
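Since common spatial patterns recur in several of the works above, a minimal sketch of such a pipeline may be useful; it relies on MNE's CSP implementation with an LDA head and is our assumption of the general setup, not the cited authors' code (`X` and `y` are placeholders for band-pass filtered epochs and their labels). Replacing the LDA with an SVM yields a pipeline analogous to that of Wang et al. discussed below.

```python
from mne.decoding import CSP
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# X: (n_trials, n_channels, n_samples) band-pass filtered EEG epochs; y: labels.
csp_lda = Pipeline([
    ("csp", CSP(n_components=4, log=True)),  # spatial filters -> log-variance features
    ("lda", LinearDiscriminantAnalysis()),
])
# scores = cross_val_score(csp_lda, X, y, cv=5)
# print(f"mean CV accuracy: {scores.mean():.2f}")
```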
Few representative studies that try to recognize imagined words using EEG
data are reported in the literature. Given the complexity of the task, the number
of terms considered is generally limited.
Suppes et al. [26] proposed an experiment in which five subjects performed internal speech on the following words: first, second, third, yes, no, right and left for all subjects, with the addition of to, too and hear for the last three subjects.
In the work performed by Wang et al. [27], eight Chinese subjects were required to read in mind two Chinese characters (meaning left and one). The authors were able to distinguish between the two characters and the rest state. Feature vectors of the EEG signals were extracted using CSP, and these vectors were then classified with an SVM. Accuracies between 73.65% and 95.76% were obtained when comparing each of the imagined words with the rest state, and a mean accuracy of 82.3% was achieved between the two words themselves.
Salama et al. [24] implemented different types of classifiers, such as SVM, discriminant analysis, self-organizing maps, feed-forward back-propagation networks and a combination of them, to recognize two words (yes and no). They used a single-electrode EEG device to collect data from seven subjects, and the obtained accuracy ranges from 57% to 59%.
In [18], Mohanchandra et al. constructed a one-against-all multiclass SVM classifier to discriminate five subvocalized words (water, help, thanks, food and stop) and reported an accuracy ranging from 60% to 92%.
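Such a one-against-all scheme can be written compactly with scikit-learn; the sketch below is only illustrative (feature extraction is omitted, and `X_feat`/`y_words` are hypothetical precomputed feature vectors and labels).

```python
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# The five word classes of the study above; labels in y_words index this list.
words = ["water", "help", "thanks", "food", "stop"]
ovr_svm = OneVsRestClassifier(SVC(kernel="rbf", C=1.0))  # one binary SVM per word
# scores = cross_val_score(ovr_svm, X_feat, y_words, cv=5)
```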
In the González-Castañeda et al. [12] analyses, some techniques of sonification