On the Generalizability of ECG-based Stress Detection Models Pooja Prajod

On the Generalizability of ECG-based Stress
Detection Models
Pooja Prajod
Human-Centered Artificial Intelligence
University of Augsburg
Augsburg, Germany
pooja.prajod@uni-a.de
Elisabeth André
Human-Centered Artificial Intelligence
University of Augsburg
Augsburg, Germany
elisabeth.andre@uni-a.de
©2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including
reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or
reuse of any copyrighted component of this work in other works. DOI: 10.1109/ICMLA55696.2022.00090
Abstract—Stress is prevalent in many aspects of everyday life
including work, healthcare, and social interactions. Many works
have studied handcrafted features from various bio-signals that
are indicators of stress. Recently, deep learning models have also
been proposed to detect stress. Typically, stress models are trained
and validated on the same dataset, often involving one stressful
scenario. However, it is not practical to collect stress data for
every scenario. So, it is crucial to study the generalizability of
these models and determine to what extent they can be used
in other scenarios. In this paper, we explore the generalization
capabilities of Electrocardiogram (ECG)-based deep learning
models and models based on handcrafted ECG features, i.e.,
Heart Rate Variability (HRV) features. To this end, we train three
HRV models and two deep learning models that use ECG signals
as input. We use ECG signals from two popular stress datasets -
WESAD and SWELL-KW - differing in terms of stressors and
recording devices. First, we evaluate the models using leave-one-
subject-out (LOSO) cross-validation using training and validation
samples from the same dataset. Next, we perform a cross-dataset
validation of the models, that is, LOSO models trained on the
WESAD dataset are validated using SWELL-KW samples and
vice versa. While deep learning models achieve the best results
on the same dataset, models based on HRV features considerably
outperform them on data from a different dataset. This trend is
observed for all the models on both datasets. Therefore, HRV
models are a better choice for stress recognition in applications
that are different from the dataset scenario. To the best of our
knowledge, this is the first work to compare the cross-dataset
generalizability between ECG-based deep learning models and
HRV models.
Index Terms—Stress, Deep learning, Convolutional neural
networks, Recurrent neural networks, Machine learning, Support
vector machines, Physiology, Heart rate variability, Electrocar-
diography
I. INTRODUCTION
Stress recognition research has become an important part
of affective computing, especially in applications involving
human-computer interaction [1]. Long-term stress has severe
consequences and hence, there is a need for automatic stress
recognition to detect stress early [1], [2]. Stress stimuli or
stressors trigger physiological responses in people which can
be detected through different bio-signals such as Electrocardiogram
(ECG) and Electrodermal Activity (EDA) [1]–[3].
Stress recognition research is further facilitated by the increasing
popularity of wearable sensors that can unobtrusively
collect real-time bio-signal data [4], [5].
This work is partially funded by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 847926 MindBot.
ECG is one of the most common bio-signals used in stress
and affect recognition [5], [6]. There are two popular ap-
proaches to detect stress from ECG - models based on hand-
crafted Heart Rate Variability (HRV) features [1], [2], [4],
[7] and deep learning models [5], [6], [8]. HRV features and
their relationship with stress have been studied thoroughly [9],
[10]. They have also been validated as indicators of stress in
different stressful conditions [1], [4], [11]. However, cleaning
the ECG signal and computing the HRV features often require
specific domain knowledge [5]. This paved the way for deep
learning models, which typically have convolution layers for
automatic feature extraction.
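To make the notion of hand-crafted HRV features concrete, the following is a minimal sketch and not the feature set used in this work. It assumes that R-peaks have already been detected and that the resulting RR intervals are given in milliseconds. The function name `hrv_time_domain` and the selection of features (mean heart rate, SDNN, RMSSD, pNN50) are illustrative assumptions.

```python
import math


def hrv_time_domain(rr_ms):
    """Compute a few common time-domain HRV features.

    rr_ms: sequence of RR intervals in milliseconds (assumed to be
    already cleaned of ectopic beats and artifacts).
    """
    if len(rr_ms) < 3:
        raise ValueError("need at least three RR intervals")

    n = len(rr_ms)
    mean_rr = sum(rr_ms) / n

    # SDNN: sample standard deviation of the RR intervals.
    sdnn = math.sqrt(sum((x - mean_rr) ** 2 for x in rr_ms) / (n - 1))

    # Differences between successive RR intervals.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]

    # RMSSD: root mean square of the successive differences.
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))

    # pNN50: percentage of successive differences larger than 50 ms.
    pnn50 = 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs)

    return {
        # Mean heart rate derived from the mean RR interval.
        "mean_hr_bpm": 60000.0 / mean_rr,
        "sdnn_ms": sdnn,
        "rmssd_ms": rmssd,
        "pnn50_pct": pnn50,
    }
```

A feature vector of this kind, computed per window of ECG data, is what a classical classifier would consume in place of the raw signal.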
We say an ECG-based stress recognition model has good
generalization capability if it performs well on samples col-
lected using different sensor devices under different stress
conditions. It is crucial to evaluate the generalizability of a
model as it is not possible to collect stress data and train
specialized models in every scenario. In some cases, the
models have to be trained on an available dataset and deployed
in a scenario different from the training dataset. For example,
a neuro-rehabilitation use-case described in [12] employs
an agent which adapts exercises by taking into account the
stress level of the patient. Due to ethical considerations, it is
difficult to collect a dataset by stressing the patients during a
rehabilitation session. Another example to consider is stress
recognition for special groups of people, like people with
autism spectrum disorder (ASD), dementia, etc. Often, there is
a lack of stress datasets that include data collected from these
groups of people. Moreover, there could be differences in the
intensity or the characteristics of stress responses of the people
belonging to these groups. For instance, one of the datasets
we consider in this study is the WESAD dataset [1], which
uses social evaluation as a stressor. However, the authors of [13]
found that children with ASD had a blunted physiological stress
response to social evaluation stressors.
In this work, we investigate if the models trained on one
stress dataset can detect stress in another dataset. Specifically,
among ECG deep learning models and HRV models, we
determine if one group outperforms the other in detecting
stress samples from another scenario.
arXiv:2210.06225v2 [cs.LG] 31 Jan 2024
II. RELATED WORK
Due to the health consequences of stress, there is extensive
research on stress recognition. It is beyond the scope of this
work to summarize the numerous works that improve stress
recognition. So, we focus on works that compare various
models or stress datasets to gain insights into trends pertaining
to their performance.
There are multiple feature-based models proposed for stress
recognition in various works. Bobade and Vani [2] compare
the stress recognition performance of various machine learning
models trained on hand-crafted features from various physi-
ological signals. They use the WESAD dataset [1] to train
K-Nearest Neighbour (KNN), Linear Discriminant Analysis
(LDA), Random Forest Classifier (RFC), Support Vector Ma-
chine (SVM), etc. They also propose a simple feed-forward
Artificial Neural Network (ANN) trained on the same input.
Their comparison shows that ANN achieves higher accuracy
than other models.
As mentioned before, there are two main types of stress
recognition models - deep neural networks and feature-
based machine learning models. Naturally, questions arise on
whether one type is better than the other. Zhang et al. [11]
address this question by studying the performance of a deep
neural network and feature-based models on a dataset they
collected. They propose a stress recognition model consisting
of both convolutional neural networks (CNN) and bidirectional
long short-term memory (BiLSTM). For comparison, they
extract HRV features and train popular machine learning
models like SVMs, RFC, AdaBoost, etc. The CNN-LSTM
model takes 10 s of raw ECG signal, whereas the other
machine learning models use HRV features extracted from
60 s of ECG data. Zhang et al. demonstrate that deep neural
networks significantly outperform HRV-based models.
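The difference in input length (10 s versus 60 s) amounts to segmenting the same recording with different window sizes. The sketch below illustrates this under stated assumptions: the function name `segment_signal`, the non-overlapping default, and the toy sampling rate are hypothetical and are not taken from Zhang et al. [11].

```python
def segment_signal(signal, fs_hz, window_s, step_s=None):
    """Split a 1-D signal into fixed-length windows.

    signal:   sequence of samples
    fs_hz:    sampling rate in Hz
    window_s: window length in seconds
    step_s:   hop between window starts in seconds
              (defaults to window_s, i.e. no overlap)
    Trailing samples that do not fill a whole window are dropped.
    """
    win = int(round(window_s * fs_hz))
    step = win if step_s is None else int(round(step_s * fs_hz))
    if win <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [signal[i:i + win] for i in range(0, len(signal) - win + 1, step)]


# Toy example: 120 s of a dummy signal sampled at 10 Hz (assumed values).
dummy = list(range(1200))
short_windows = segment_signal(dummy, fs_hz=10, window_s=10)  # raw-signal input
long_windows = segment_signal(dummy, fs_hz=10, window_s=60)   # HRV-feature input
```

Shorter windows yield more training samples from the same recording, whereas HRV features generally need longer windows to contain enough heartbeats.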
Dzieżyc et al. [14] compare various deep learning models
on their performance in emotion recognition tasks (including
stressful conditions). An extensive study is performed on four
different datasets, separately. They chose an input signal length
of 50–60 s, which is longer than the typical input length for
deep learning models. They note that CNN-based models tend
to perform better than LSTM-based models.
All the above works train and test the stress recognition
models on the same dataset. Cho et al. [15] consider two
datasets differing in size and train ECG-based deep learning
models to detect stress. They propose a transfer learning ap-
proach, which involves training a model on the bigger dataset
and then fine-tuning it on the smaller dataset. They observe
that the stress recognition on the smaller dataset improves
through transfer learning. Other than the size, the datasets were
very similar (e.g. same ECG sensor and configuration). The
authors note that when data from other datasets are used, their
model shows high bias to the type of stressor and a dependency
on the sensor used. In line with this observation, Liapis et
al. [16] demonstrate that a high stress recognition accuracy on
one dataset does not necessarily translate to high accuracy in
another dataset. To this end, they extract Skin Conductance
(SC) features from the WESAD dataset [1] and train four
machine learning models for stress recognition. These models
achieve high accuracy while testing on the WESAD dataset.
However, they did not achieve good results on input signals
from a different dataset (UX evaluation dataset). Since the UX
evaluation dataset is annotated primarily for emotion and not
stress, it is difficult to draw conclusions about the generalizability of
the models. Nevertheless, their observation highlights the need
for cross-dataset evaluations and assessing the generalizability
of the stress recognition models.
As a first step towards combining stress datasets for devel-
oping generic models, Baird et al. [17] evaluate three datasets
on their ability to predict cortisol values. Cortisol values are
considered the ground truth for stress response. As they note,
the scales of cortisol values of the datasets are incompatible
and thus, a cross-dataset evaluation is not feasible. However,
all three datasets were collected through similar Trier Social
Stress Test (TSST) procedures. So, the responses in each
condition of the test are expected to be similar and therefore,
the trends in predicted cortisol values can be compared. To
this end, they extract features from the speech signals in the
datasets and train models for each dataset. They highlight
the feasibility of using speech signals from one dataset as
predictors of stress in another dataset.
III. APPROACH
Deep learning models trained directly on the ECG signal
typically outperform models based on hand-crafted HRV features on a given
dataset [11]. However, it remains unexplored if these deep
learning models perform equally well in cross-dataset evaluations.
To investigate this, we train five stress recognition models
- two deep learning models using ECG signals as input, and
three models based on hand-crafted HRV features. First, we
train and evaluate the stress models on the same dataset using
leave-one-subject-out (LOSO) cross-validation. We perform
this evaluation on two different datasets. Then, we evaluate
the LOSO models trained on dataset A using samples from
the other dataset B (cross-dataset evaluation) to assess their
generalization capabilities. Baird et al. [17] note that machine
learning models can benefit from combining stress datasets
as it increases the data available for training. It has not been
investigated whether this holds true when the datasets are vastly different,
especially in terms of the stressors, the intensity of stress
experienced, and the brand of sensors used. So additionally, we
train the models on a combined dataset (merging samples from
the two datasets) and evaluate them using LOSO validation.
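The evaluation protocol described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: a toy nearest-centroid classifier stands in for the actual HRV and deep learning models, and all function names and the toy data are hypothetical. Each LOSO fold trains on dataset A with one subject held out, is scored on that held-out subject (within-dataset), and is then scored on all samples of dataset B (cross-dataset).

```python
from collections import defaultdict


def loso_splits(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) per subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


def fit_centroids(X, y):
    """Toy stand-in model: the mean feature vector of each class."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}


def predict(model, X):
    """Assign each sample to the class with the nearest centroid."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(model, key=lambda label: sq_dist(row, model[label]))
            for row in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate(dataset_a, dataset_b):
    """Return (mean LOSO accuracy on A, mean accuracy of the fold models on B)."""
    X_a, y_a, subj_a = dataset_a
    X_b, y_b, _ = dataset_b
    within, cross = [], []
    for _, train, test in loso_splits(subj_a):
        model = fit_centroids([X_a[i] for i in train], [y_a[i] for i in train])
        within.append(accuracy([y_a[i] for i in test],
                               predict(model, [X_a[i] for i in test])))
        cross.append(accuracy(y_b, predict(model, X_b)))
    return sum(within) / len(within), sum(cross) / len(cross)


# Made-up toy data: one feature per sample, label 0 = no stress, 1 = stress.
# Dataset B is shifted relative to A to mimic a different sensor or stressor.
toy_a = ([[1.0], [5.0]] * 3, [0, 1] * 3, ["s1", "s1", "s2", "s2", "s3", "s3"])
toy_b = ([[4.0], [9.0]] * 2, [0, 1] * 2, ["t1", "t1", "t2", "t2"])
within_acc, cross_acc = evaluate(toy_a, toy_b)
```

Splitting by subject rather than by sample keeps data from the held-out person out of training, so the within-dataset score reflects performance on unseen individuals. The cross-dataset score additionally exposes shifts caused by the stressor and the recording device.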
A. Datasets
1) WESAD: WESAD [1] is a multimodal dataset that
contains motion (ACC) and physiological (ECG, EDA, etc.)
signals, which were collected using chest-worn RespiBAN and
wrist-worn Empatica E4 devices. The data was collected from
15 participants under three conditions: baseline, stress, and
amusement. Stress was elicited using the Trier Social Stress
Test (TSST) involving public speaking and mental arithmetic