Inner speech recognition through electroencephalographic signals

Francesca Gasparini1,2[0000-0002-6279-6660], Elisa Cazzaniga1, and Aurora Saibene1,2[0000-0002-4405-8234]
1University of Milano-Bicocca, Viale Sarca 336, 20126, Milano, Italy
aurora.saibene@unimib.it, e.cazzaniga@campus.unimib.it,
francesca.gasparini@unimib.it
2NeuroMI, Milan Center for Neuroscience, Piazza dell’Ateneo Nuovo 1, 20126,
Milano, Italy
Abstract. This work focuses on inner speech recognition starting from EEG signals. Inner speech recognition is defined as the internalized process in which the person thinks in pure meanings, generally associated with an auditory imagery of one's own inner “voice”. The decoding of the EEG into text should be understood as the classification of a limited number of words (commands) or of the presence of phonemes (units of sound that make up words). Speech-related BCIs provide effective vocal communication strategies for controlling devices through speech commands interpreted from brain signals, improving the quality of life of people who have lost the capability to speak, by restoring communication with their environment. Two public inner speech datasets are analysed. Using these data, several classification models are studied and implemented, starting from basic methods such as Support Vector Machines, moving to ensemble methods such as the eXtreme Gradient Boosting classifier, up to neural networks such as Long Short-Term Memory (LSTM) and Bidirectional Long Short-Term Memory (BiLSTM). With the LSTM and BiLSTM models, generally not used in the inner speech recognition literature, results in line with or superior to the state-of-the-art are obtained.

Keywords: EEG · inner speech recognition · BCI
1 Introduction
Human speech production is a complex motor process that starts in the brain and
ends with respiratory, laryngeal, and articulatory gestures for creating acoustic
signals of verbal communication. Physiological measurements using specialized
sensors and methods can be made at each level of speech processing, including
the central and peripheral nervous systems, muscular action potentials, speech
kinematics (tongue, lips, jaw), and sound pressure [25]. However, subjects suffering from neurodegenerative diseases or motor disorders may lose the normal transmission of signals from the brain to the peripheral areas, which prevents them from communicating or carrying out certain actions.
Brain Computer Interfaces (BCIs) are promising technologies for improving
the quality of life of people who have lost the capability to move or speak, by
restoring communication with their environment. A BCI is a system that makes
possible the interaction between an individual and a computer without using the
brain’s normal output pathways of peripheral nerves and muscles. In particular,
speech-related BCI technologies provide neuro-prosthetic help for people with speaking disabilities and neuro-muscular disorders and diseases. They can equip these users with a medium to communicate and express their thoughts, thereby improving the quality of rehabilitation and clinical neurology [23]. Speech-related paradigms, based on either silent, imagined or inner speech, provide a more natural way of controlling external devices [20].
There are different types of brain-signal recording techniques that are mainly
divided into invasive and non-invasive methods. The former involve implanting electrodes directly into the brain. They provide better spatial and temporal resolution, also increasing the quality of the obtained signal. However, invasive technologies have problems related to usability and the need for surgical intervention on the subject, which is why non-invasive techniques are increasingly used
in BCI research. Among the non-invasive technologies, the electroencephalogram
(EEG) is the most used method for measuring the electrical activity of the brain
from the human scalp. It has an exceedingly high time resolution, it is simple to
record and it is sufficiently inexpensive [13]. Over the years, EEG hardware technology has evolved and several wireless multichannel systems have emerged that deliver high-quality EEG and physiological signals in a simpler, more convenient and comfortable design than the traditional, cumbersome systems.
This paper focuses on inner speech recognition starting from EEG signals,
where the basic definition of inner speech is [1] “the subjective experience of language in the absence of overt and audible articulation”.
As suggested in [3], there is evidence from past neuroscience research that
inner speech engages brain regions that are commonly associated with language
comprehension and production [2]. This includes temporal, frontal and sensorimotor areas, predominantly in the left hemisphere of the brain [2,4]. Therefore, by monitoring these brain areas, it is theoretically possible to develop an inner speech BCI that classifies neural representations of imagined words [4].
In section 2 the studies in the field of inner speech are described. Section 3
presents the two publicly available datasets used for the analyses proposed in
section 4. In section 5 the results obtained with our models are presented and
discussed. Finally, in section 6 some conclusions are proposed.
2 Related works
Most studies on the classification of inner speech focus on invasive methods, such as electrocorticography (ECoG) [17], as they provide higher spatial resolution, while fewer studies concerning inner speech classification using EEG data are available [21]. It is important for a BCI application to be non-invasive, accessible and easy
to implement so that it can be used by a large number of subjects.
Inner speech recognition is generally addressed considering phonemes, typically vowels or syllables such as /ba/ or /ku/, or simple words such as left, right, up and down, in subject-dependent approaches.
Preliminary works were conducted with very few participants and syllables
by D’Zmura et al. [11], where EEG waveform envelopes have been adopted to
recognize EEG patterns. Brigham and Kumar [5] and Deng et al. [10] also considered the recognition of two syllables. In the first work, the accuracy obtained for the 7 subjects ranges from 46% to 88%: the authors preprocessed the raw EEG data to reduce the effects of artifacts and noise, and applied a k-Nearest Neighbor classifier to autoregressive coefficients extracted as features. Deng and colleagues, using Hilbert spectra and linear discriminant analysis, recognized the two syllables imagined in three different rhythms, a 6-class task, with accuracy ranging from 19% to 22%.
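To make this family of approaches concrete, the following minimal sketch (ours, not the cited authors' code) extracts per-channel autoregressive coefficients by solving the Yule-Walker equations and feeds them to a k-Nearest Neighbor classifier; the AR order, the number of neighbors and the `X_epochs`/`y` arrays are hypothetical placeholders.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from sklearn.neighbors import KNeighborsClassifier

def ar_coefficients(x, order=6):
    # Estimate AR coefficients of a 1-D signal by solving the
    # Yule-Walker equations built from its autocorrelation.
    x = x - x.mean()
    r = np.array([x[:len(x) - k] @ x[k:] for k in range(order + 1)]) / len(x)
    return solve_toeplitz(r[:-1], r[1:])  # symmetric Toeplitz system R a = r[1:]

def epoch_features(epoch, order=6):
    # epoch: (n_channels, n_samples) -> concatenated per-channel AR coefficients.
    return np.concatenate([ar_coefficients(ch, order) for ch in epoch])

# X_epochs: (n_trials, n_channels, n_samples) preprocessed EEG; y: trial labels.
# X = np.stack([epoch_features(e) for e in X_epochs])
# clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)
```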
Considering works where the recognition of phonemes has been investigated, DaSalla et al. [8,9] analyzed the recognition of three tasks, /a/, /u/ and rest, obtaining accuracies from 68% to 79% by using common spatial patterns. On the same dataset, several other researchers have tested different models, obtaining promising results [14,22].
Kim et al. [15], instead, considered three vowels, /a/, /i/ and /u/, and applied multivariate empirical mode decomposition and common spatial patterns for feature extraction, together with linear discriminant analysis, reaching around 70% accuracy.
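Since common spatial patterns recur in several of the works above, a minimal sketch of such a pipeline may be useful; it relies on MNE's CSP implementation with an LDA head and is our assumption of the general setup, not the cited authors' code (`X` and `y` are placeholders for band-pass filtered epochs and their labels). Replacing the LDA with an SVM yields a pipeline analogous to that of Wang et al. discussed below.

```python
from mne.decoding import CSP
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# X: (n_trials, n_channels, n_samples) band-pass filtered EEG epochs; y: labels.
csp_lda = Pipeline([
    ("csp", CSP(n_components=4, log=True)),  # spatial filters -> log-variance features
    ("lda", LinearDiscriminantAnalysis()),
])
# scores = cross_val_score(csp_lda, X, y, cv=5)
# print(f"mean CV accuracy: {scores.mean():.2f}")
```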
Few representative studies that try to recognize imagined words using EEG
data are reported in the literature. Given the complexity of the task, the number
of terms considered is generally limited.
Suppes et al. [26] proposed an experiment in which five subjects performed internal speech on the following words: first, second, third, yes, no, right and left for all subjects, with the addition of to, too and hear for the last three subjects.
In the work performed by Wang et al. [27], eight Chinese subjects were required to read in mind two Chinese characters (meaning left and one). The authors were able to distinguish between the two characters and the rest state. Feature vectors of the EEG signals were extracted using CSP, and these vectors were then classified with an SVM. Accuracies between 73.65% and 95.76% were obtained when comparing each of the imagined words with the rest state, and a mean accuracy of 82.3% was achieved between the two words themselves.
Salama et al. [24] implemented different types of classifiers, such as SVM, discriminant analysis, self-organizing maps, feed-forward back-propagation networks and a combination of them, to recognize two words (yes and no). They used a single-electrode EEG device to collect data from seven subjects, and the obtained accuracy ranges from 57% to 59%.
In [18], Mohanchandra et al. constructed a one-against-all multiclass SVM classifier to discriminate five subvocalized words (water, help, thanks, food and stop) and reported an accuracy ranging from 60% to 92%.
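Such a one-against-all scheme can be written compactly with scikit-learn; the sketch below is only illustrative (feature extraction is omitted, and `X_feat`/`y_words` are hypothetical precomputed feature vectors and labels).

```python
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# The five word classes of the study above; labels in y_words index this list.
words = ["water", "help", "thanks", "food", "stop"]
ovr_svm = OneVsRestClassifier(SVC(kernel="rbf", C=1.0))  # one binary SVM per word
# scores = cross_val_score(ovr_svm, X_feat, y_words, cv=5)
```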
In the González-Castañeda et al. [12] analyses, some techniques of sonification