CorrLoss: Integrating Co-Occurrence Domain
Knowledge for Affect Recognition
Ines Rieger∗†, Jaspar Pahl∗†, Bettina Finzel†and Ute Schmid†
∗Fraunhofer IIS, Fraunhofer Institute for Integrated Circuits IIS, Erlangen
†University of Bamberg, Cognitive Systems Group
Email: {ines.rieger, jaspar.pahl, bettina.finzel, ute.schmid}@uni-bamberg.de
Abstract—Neural networks are widely adopted, yet the integration of domain knowledge is still underutilized. We propose to integrate domain knowledge about co-occurring facial movements as a constraint in the loss function to enhance the training of neural networks for affect recognition. As co-occurrence patterns tend to be similar across datasets, applying our method can lead to higher generalizability of models and a lower risk of overfitting. We demonstrate this by showing performance increases in cross-dataset testing for various datasets. We also show the applicability of our method for calibrating neural networks to different facial expressions.
A purely data-driven approach for training neural networks may reach its limits, for example, when the training data is of low quality or when the model must satisfy constraints such as natural laws or other regulations [1]. Additionally, as neural networks become more complex, the need for interpretability increases. Integrating domain knowledge can address these disadvantages by forcing the neural network to adhere to constraints, which also enhances its interpretability.
In our approach, we propose to integrate domain knowledge about co-occurring target classes directly into the loss function to enhance affect recognition models. In our experiments, we concentrate on detecting facial movements called Action Units (AUs). AUs are a psychological framework for describing distinct, objective facial muscle movements, such as lowering the brow or raising the cheek, in a modular way. For more information about AUs, see the description by Ekman and Friesen [2] and the survey on automatic facial AU analysis by Zhi et al. [3].
One disadvantage of affective computing datasets, and especially AU datasets, is their varying properties regarding recording conditions, i.e., in-the-lab vs. in-the-wild or acted vs. natural. Training on datasets with very specific properties leads to models that generalize poorly and therefore do not evaluate well on datasets with different properties [4] in a cross-dataset setting. Domain knowledge can tackle this disadvantage, since it is to a certain degree disentangled from the dataset properties (e.g., recording setting or subject metadata) and therefore provides general information about the task. For AUs, domain knowledge in the form of co-occurrences exists because facial expressions such as emotions, pain, or stress activate specific subgroups of AUs [5]. Furthermore, due to the anatomically predetermined dependence of movements in the face, the contraction of one muscle can lead to the activation of several AUs. Since the patterns for the same facial expression are similar across subjects, we propose to use this co-occurrence information to enhance the model's generalizability and to calibrate models on distinct facial expressions.
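As an illustration of how such co-occurrence knowledge could be extracted, the following minimal sketch estimates pairwise AU correlations from binary annotations. The function name and the choice of Pearson correlation are illustrative assumptions; the exact statistic and preprocessing used in the paper may differ.

```python
import numpy as np

def au_cooccurrence(labels: np.ndarray) -> np.ndarray:
    """Estimate pairwise AU co-occurrence as Pearson correlations.

    labels: (num_samples, num_aus) binary matrix, where entry [i, j] is 1
    if AU j is active in sample i. Returns a (num_aus, num_aus) matrix
    containing both positive and negative correlations.
    """
    return np.corrcoef(labels, rowvar=False)
```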
More specifically, we formulate the co-occurrence information as a weighted regularization term (CorrLoss) that optimizes positive and negative AU correlations, and combine it with the binary cross-entropy loss (BCE). In contrast to other approaches that model the co-occurrence information in a hypothesis space (see Section I), we formulate the co-occurrence constraint as a regularization term. We consider this a lightweight solution that is also flexible to steer, since the domain knowledge does not need to be modeled explicitly first.
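The following PyTorch-style sketch shows one way such a correlation regularizer could be combined with BCE. All names (corr_loss, cooc_target, weight) and the specific penalty (mean absolute deviation between the batch-wise correlation matrix and the target co-occurrence matrix) are assumptions for illustration, not the paper's exact formulation.

```python
import torch

def corr_loss(pred: torch.Tensor, cooc_target: torch.Tensor) -> torch.Tensor:
    """Penalize deviation of the batch-wise AU correlation matrix from a
    target co-occurrence matrix (illustrative formulation)."""
    # pred: (batch, num_aus) sigmoid outputs; cooc_target: (num_aus, num_aus)
    centered = pred - pred.mean(dim=0, keepdim=True)
    cov = centered.T @ centered / (pred.shape[0] - 1)
    std = pred.std(dim=0).clamp_min(1e-8)
    corr = cov / (std.unsqueeze(0) * std.unsqueeze(1))
    return (corr - cooc_target).abs().mean()

def total_loss(pred, labels, cooc_target, weight=0.1):
    """Weighted combination of BCE and the co-occurrence regularizer."""
    bce = torch.nn.functional.binary_cross_entropy(pred, labels)
    return bce + weight * corr_loss(pred, cooc_target)
```

The weight steers how strongly the model is pushed toward the ground-truth co-occurrence structure relative to the per-AU classification objective.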
To highlight the interpretability aspect, we provide visualizations of the ground-truth and learned co-occurrences that can be inspected for plausibility (Fig. 1). To the best of our knowledge, we are the first to formalize a co-occurrence constraint directly in the loss function and to finetune with this knowledge on facial expressions. We are also the first to conduct a comprehensive cross-dataset evaluation assessing the generalizability gained from co-occurrence knowledge. Concisely, we answer the following research questions. Does CorrLoss improve
1) within-dataset performance?
2) cross-dataset performance?
3) calibration on facial expressions?
To evaluate our approach, we use several AU benchmark datasets (Section II-A): BP4D [6], [7], CK+ [8], [9], GFT [10], Actor Study [11], AffWild2 [12]–[15], and EmotioNet (manually annotated part) [16].
Our key findings are: (1) In the within-dataset evaluation, our CorrLoss decreases the variance over different data folds but does not significantly increase the mean results (Section III-C). The lower variance over several data folds can indicate enhanced robustness, and we also observe a decreased risk of overfitting during training. (2) In the cross-dataset setting, the mean performance increases and the variance decreases for most datasets compared to our baseline, meaning that CorrLoss can increase the robustness and generalizability of the model (Section III-C). This is also reflected in our comparison with the state of the art, which our model outperforms in the cross-dataset evaluation (Table VI). (3) We see a performance gain when we calibrate our trained models with CorrLoss on distinct facial expressions.
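To make the calibration idea concrete, a minimal sketch follows: the co-occurrence matrix is re-estimated only on samples of the target expression and the trained model is finetuned with the combined loss. The loop reuses corr_loss from the sketch above; the function name, hyperparameters, and loader interface are hypothetical.

```python
import torch

def calibrate(model, loader, cooc_expr, epochs=3, lr=1e-5, weight=0.1):
    """Finetune a trained AU model on one facial expression.

    cooc_expr: co-occurrence matrix estimated only from samples of the
    target expression (e.g., via au_cooccurrence above); loader yields
    (images, au_labels) for that expression. Hyperparameters are
    placeholders, not values reported in the paper.
    """
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            pred = torch.sigmoid(model(x))
            loss = torch.nn.functional.binary_cross_entropy(pred, y)
            loss = loss + weight * corr_loss(pred, cooc_expr)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```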