On the Generalizability of ECG-based Stress Detection Models Pooja Prajod


On the Generalizability of ECG-based Stress

Detection Models

Pooja Prajod

Human-Centered Artificial Intelligence

University of Augsburg

Augsburg, Germany

pooja.prajod@uni-a.de

Elisabeth André

Human-Centered Artificial Intelligence

University of Augsburg

Augsburg, Germany

elisabeth.andre@uni-a.de

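© 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. DOI: 10.1109/ICMLA55696.2022.00090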

Abstract—Stress is prevalent in many aspects of everyday life

including work, healthcare, and social interactions. Many works

have studied handcrafted features from various bio-signals that

are indicators of stress. Recently, deep learning models have also

been proposed to detect stress. Typically, stress models are trained

and validated on the same dataset, often involving one stressful

scenario. However, it is not practical to collect stress data for

every scenario. So, it is crucial to study the generalizability of

these models and determine to what extent they can be used

in other scenarios. In this paper, we explore the generalization

capabilities of Electrocardiogram (ECG)-based deep learning

models and models based on handcrafted ECG features, i.e.,

Heart Rate Variability (HRV) features. To this end, we train three

HRV models and two deep learning models that use ECG signals

as input. We use ECG signals from two popular stress datasets -

WESAD and SWELL-KW - differing in terms of stressors and

recording devices. First, we evaluate the models using leave-one-

subject-out (LOSO) cross-validation, with training and validation

samples drawn from the same dataset. Next, we perform a cross-dataset

validation of the models, that is, LOSO models trained on the

WESAD dataset are validated using SWELL-KW samples and

vice versa. While deep learning models achieve the best results

on the same dataset, models based on HRV features considerably

outperform them on data from a different dataset. This trend is

observed for all the models on both datasets. Therefore, HRV

models are a better choice for stress recognition in applications

that differ from the scenario of the training dataset. To the best of our

knowledge, this is the first work to compare the cross-dataset

generalizability between ECG-based deep learning models and

HRV models.

Index Terms—Stress, Deep learning, Convolutional neural

networks, Recurrent neural networks, Machine learning, Support

vector machines, Physiology, Heart rate variability, Electrocardiography

I. INTRODUCTION

Stress recognition research has become an important part

of affective computing, especially in applications involving

human-computer interaction [1]. Long-term stress has severe

consequences and hence, there is a need for automatic stress

recognition to detect stress early [1], [2]. Stress stimuli or

stressors trigger physiological responses in people which can

be detected through different bio-signals such as Electrocar-

diogram (ECG) and Electrodermal Activity (EDA) [1]–[3].

This work is partially funded by the European Union's Horizon 2020 research and innovation programme under grant agreement No 847926 MindBot.

Stress recognition research is further facilitated by the

increasing popularity of wearable sensors that can unobtrusively

collect real-time bio-signal data [4], [5].

ECG is one of the most common bio-signals used in stress

and affect recognition [5], [6]. There are two popular ap-

proaches to detect stress from ECG - models based on hand-

crafted Heart Rate Variability (HRV) features [1], [2], [4],

[7] and deep learning models [5], [6], [8]. HRV features and

their relationship with stress have been studied thoroughly [9],

[10]. They have also been validated as indicators of stress in

different stressful conditions [1], [4], [11]. However, cleaning

the ECG signal and computing the HRV features often require

specific domain knowledge [5]. This paved the way for deep

learning models, which typically have convolutional layers for

automatic feature extraction.
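To make the notion of hand-crafted HRV features concrete, the following is a minimal sketch that computes a few widely used time-domain HRV measures (mean heart rate, SDNN, RMSSD, pNN50) from a list of RR intervals. It assumes R-peak detection and artifact cleaning have already been done, which is the step that requires domain knowledge. The function name `hrv_features` and the choice of these four measures are illustrative assumptions; this section does not specify the exact feature set used by the HRV models.

```python
import math
from statistics import mean, stdev


def hrv_features(rr_ms):
    """Compute a few common time-domain HRV features from RR intervals (ms).

    Illustrative only: the feature set is an assumption, not the paper's list.
    """
    if len(rr_ms) < 3:
        raise ValueError("need at least three RR intervals")
    # Successive differences between adjacent RR intervals.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return {
        # Average heart rate in beats per minute.
        "mean_hr_bpm": 60000.0 / mean(rr_ms),
        # SDNN: sample standard deviation of the RR intervals.
        "sdnn_ms": stdev(rr_ms),
        # RMSSD: root mean square of successive differences.
        "rmssd_ms": math.sqrt(mean([d * d for d in diffs])),
        # pNN50: fraction of successive differences larger than 50 ms.
        "pnn50": sum(abs(d) > 50 for d in diffs) / len(diffs),
    }
```

A feature vector of this kind, computed per window of ECG, would then be the input to a classical classifier, whereas a deep learning model would consume the raw signal window directly.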

We say an ECG-based stress recognition model has good

generalization capability if it performs well on samples col-

lected using different sensor devices under different stress

conditions. It is crucial to evaluate the generalizability of a

model as it is not possible to collect stress data and train

specialized models in every scenario. In some cases, the

models have to be trained on an available dataset and deployed

in a scenario different from the training dataset. For example,

a neuro-rehabilitation use-case described in [12] employs

an agent which adapts exercises by taking into account the

stress level of the patient. Due to ethical considerations, it is

difficult to collect a dataset by stressing the patients during a

rehabilitation session. Another example to consider is stress

recognition for special groups of people, like people with

autism spectrum disorder (ASD), dementia, etc. Often, there is

a lack of stress datasets that include data collected from these

groups of people. Moreover, there could be differences in the

intensity or the characteristics of stress responses of the people

belonging to these groups. For instance, one of the datasets

we consider in this study is the WESAD dataset [1], which

uses social evaluation as a stressor. However, the authors of [13]

found that children with ASD had a blunted physiological stress

response to a social evaluation stressor.

In this work, we investigate if the models trained on one

stress dataset can detect stress in another dataset. Specifically,

among ECG deep learning models and HRV models, we

determine if one group outperforms the other in detecting

stress samples from another scenario.

arXiv:2210.06225v2 [cs.LG] 31 Jan 2024

II. RELATED WORK

Due to the health consequences of stress, there is extensive

research on stress recognition. It is beyond the scope of this

work to summarize the numerous works that improve stress

recognition. So, we focus on works that compare various

models or stress datasets to gain insights into trends pertaining

to their performance.

There are multiple feature-based models proposed for stress

recognition in various works. Bobade and Vani [2] compare

the stress recognition performance of various machine learning

models trained on hand-crafted features from various physi-

ological signals. They use the WESAD dataset [1] to train

K-Nearest Neighbour (KNN), Linear Discriminant Analysis

(LDA), Random Forest Classifier (RFC), Support Vector Machine

(SVM), etc. They also propose a simple feed-forward

Artificial Neural Network (ANN) trained on the same input.

Their comparison shows that ANN achieves higher accuracy

than other models.

As mentioned before, there are two main types of stress

recognition models - deep neural networks and feature-

based machine learning models. Naturally, questions arise on

whether one type is better than the other. Zhang et al. [11]

address this question by studying the performance of a deep

neural network and feature-based models on a dataset they

collected. They propose a stress recognition model consisting

of both convolutional neural networks (CNN) and bidirectional

long short-term memory (BiLSTM). For comparison, they

extract HRV features and train popular machine learning

models like SVMs, RFC, AdaBoost, etc. The CNN-LSTM

model takes 10 s of raw ECG signal, whereas the other

machine learning models use HRV features extracted from

60 s of ECG data. Zhang et al. demonstrate that deep neural

networks significantly outperform HRV-based models.
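The different input lengths mentioned above (10 s of raw ECG versus 60 s of ECG for HRV features) both amount to cutting a continuous recording into fixed-length windows. The sketch below shows one simple way to do this. The helper name `segment_signal`, the optional overlap via `step_s`, and the policy of dropping a trailing partial window are assumptions made for illustration, not details taken from the cited works.

```python
def segment_signal(samples, fs_hz, window_s, step_s=None):
    """Split a 1-D signal into fixed-length windows.

    samples  : sequence of signal values
    fs_hz    : sampling rate in Hz
    window_s : window length in seconds
    step_s   : hop between window starts in seconds (defaults to window_s,
               i.e. non-overlapping windows)
    A trailing partial window is dropped.
    """
    step_s = window_s if step_s is None else step_s
    win = int(round(window_s * fs_hz))
    step = int(round(step_s * fs_hz))
    if win <= 0 or step <= 0:
        raise ValueError("window and step must cover at least one sample")
    # range() is empty when the signal is shorter than one window.
    return [samples[i:i + win] for i in range(0, len(samples) - win + 1, step)]
```

For example, with a hypothetical sampling rate of 100 Hz, `segment_signal(ecg, 100, 10)` would yield 1000-sample windows for a raw-signal model, and `segment_signal(ecg, 100, 60)` would yield 6000-sample windows from which HRV features could be computed.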

Dzieżyc et al. [14] compare various deep learning models

on their performance in emotion recognition tasks (including

stressful conditions). An extensive study is performed on four

different datasets, separately. They choose an input signal length

of 50 to 60 s, which is longer than the typical input length for

deep learning models. They note that CNN-based models tend

to perform better than LSTM-based models.

All the above works train and test the stress recognition

models on the same dataset. Cho et al. [15] consider two

datasets differing in size and train ECG-based deep learning

models to detect stress. They propose a transfer learning ap-

proach, which involves training a model on the bigger dataset

and then fine-tuning it on the smaller dataset. They observe

that stress recognition on the smaller dataset improves

through transfer learning. Other than the size, the datasets were

very similar (e.g., same ECG sensor and configuration). The

authors note that when data from other datasets are used, their

model shows high bias to the type of stressor and a dependency

on the sensor used. In line with this observation, Liapis et

al. [16] demonstrate that a high stress recognition accuracy on

one dataset does not necessarily translate to high accuracy in

another dataset. To this end, they extract Skin Conductance

(SC) features from the WESAD dataset [1] and train four

machine learning models for stress recognition. These models

achieve high accuracy when tested on the WESAD dataset.

However, they do not achieve good results on input signals

from a different dataset (UX evaluation dataset). Since the UX

evaluation dataset is annotated primarily for emotion and not

stress, it is difficult to draw conclusions about the generalizability of

the models. Nevertheless, their observation highlights the need

for cross-dataset evaluations and assessing the generalizability

of the stress recognition models.

As a first step towards combining stress datasets for developing

generic models, Baird et al. [17] evaluate three datasets

on their ability to predict cortisol values. Cortisol values are

considered the ground truth for stress response. As they note,

the scales of cortisol values of the datasets are incompatible

and thus, a cross-dataset evaluation is not feasible. However,

all three datasets were collected through similar Trier Social

Stress Test (TSST) procedures. So, the responses in each

condition of the test are expected to be similar and therefore,

the trends in predicted cortisol values can be compared. To

this end, they extract features from the speech signals in the

datasets and train models for each dataset. They highlight

the feasibility of using speech signals from one dataset as

predictors of stress in another dataset.

III. APPROACH

Deep learning models trained directly on the ECG signal

typically outperform models based on hand-crafted HRV features on a given

dataset [11]. However, it remains unexplored whether these deep

learning models perform equally well in cross-dataset evaluations.

To investigate this, we train five stress recognition models:

two deep learning models using ECG signals as input, and

three models based on hand-crafted HRV features. First, we

train and evaluate the stress models on the same dataset using

leave-one-subject-out (LOSO) cross-validation. We perform

this evaluation on two different datasets. Then, we evaluate

the LOSO models trained on dataset A using samples from

the other dataset B (cross-dataset evaluation) to assess their

generalization capabilities. Baird et al. [17] note that machine

learning models can benefit from combining stress datasets

as it increases the data available for training. It has not been

investigated whether this holds when the datasets are vastly different,

especially in terms of the stressors, the intensity of stress

experienced, and the brand of sensors used. Therefore, we additionally

train the models on a combined dataset (merging samples from

the two datasets) and evaluate them using LOSO validation.
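The evaluation protocol described above can be sketched in a few lines of standard-library Python. For each subject in dataset A, a model is trained on all other subjects of A, scored on the held-out subject (same-dataset LOSO), and also scored on every sample of dataset B (cross-dataset). Several parts are assumptions made for illustration: the `NearestCentroid` class is a tiny stand-in classifier and not one of the five models in this work, accuracy is used as the metric, and the cross-dataset score is taken as the mean over the LOSO fold models.

```python
from collections import defaultdict


def loso_splits(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


class NearestCentroid:
    """Stand-in classifier (assumption): predicts the class with the closest mean."""

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, v in enumerate(row):
                acc[j] += v
            counts[label] += 1
        self.centroids_ = {c: [v / counts[c] for v in acc] for c, acc in sums.items()}
        return self

    def predict(self, X):
        def dist2(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids_, key=lambda c: dist2(row, self.centroids_[c]))
                for row in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def loso_and_cross_dataset(X_a, y_a, subj_a, X_b, y_b):
    """Return (mean same-dataset LOSO accuracy on A, mean cross-dataset accuracy on B)."""
    within, cross = [], []
    for _, train, test in loso_splits(subj_a):
        model = NearestCentroid().fit([X_a[i] for i in train], [y_a[i] for i in train])
        # Same-dataset score: only the held-out subject of dataset A.
        within.append(accuracy([y_a[i] for i in test],
                               model.predict([X_a[i] for i in test])))
        # Cross-dataset score: the same fold model applied to all of dataset B.
        cross.append(accuracy(y_b, model.predict(X_b)))
    return sum(within) / len(within), sum(cross) / len(cross)
```

Calling `loso_and_cross_dataset` twice with the roles of the two datasets swapped gives both directions of the cross-dataset evaluation. The combined-dataset variant corresponds to concatenating the samples, labels, and subject identifiers of both datasets (keeping subject identifiers unique across datasets) and running `loso_splits` on the merged list.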

A. Datasets

1) WESAD: WESAD [1] is a multimodal dataset that

contains motion (ACC) and physiological (ECG, EDA, etc.)

signals, which were collected using chest-worn RespiBAN and

wrist-worn Empatica E4 devices. The data was collected from

15 participants under three conditions: baseline, stress, and

amusement. Stress was elicited using the Trier Social Stress

Test (TSST) involving public speaking and mental arithmetic
