CorrLoss: Integrating Co-Occurrence Domain
Knowledge for Affect Recognition
Ines Rieger, Jaspar Pahl, Bettina Finzel and Ute Schmid
Fraunhofer IIS, Fraunhofer Institute for Integrated Circuits IIS, Erlangen
University of Bamberg, Cognitive Systems Group
Email: {ines.rieger, jaspar.pahl, bettina.finzel, ute.schmid}@uni-bamberg.de
Abstract—Neural networks are widely adopted, yet the inte-
gration of domain knowledge is still underutilized. We propose
to integrate domain knowledge about co-occurring facial move-
ments as a constraint in the loss function to enhance the training
of neural networks for affect recognition. As the co-occurrence
patterns tend to be similar across datasets, applying our method
can lead to a higher generalizability of models and a lower
risk of overfitting. We demonstrate this by showing performance
increases in cross-dataset testing for various datasets. We also
show the applicability of our method for calibrating neural
networks to different facial expressions.
A purely data-driven approach for training neural networks
may reach its limits, for example, when there is training data
of low quality or when there are constraints the model must satisfy, such as natural laws or other regulations [1]. Additionally, as neural networks become more complex, the need
for interpretability increases. Integration of domain knowledge
can tackle all of these disadvantages by forcing the neural
network to adhere to constraints, which also enhances the
interpretability.
In our approach, we propose to integrate domain knowledge
on co-occurring target classes directly in the loss function to
enhance affect recognition models. For our experiments, we
concentrate on detecting facial movements called Action Units
(AUs). AUs are a psychological framework to describe distinct,
objective facial muscle movements such as lowering the brow,
or raising the cheek in a modular way. For more information
about AUs see the description by Ekman and Friesen [2] and
the survey on automatic facial AU analysis by Zhi et al. [3].
One disadvantage of affective computing and especially AU datasets is their varying properties regarding recording conditions, e.g., in-the-lab vs. in-the-wild or acted vs. natural.
Training on datasets with very specific properties leads to models that generalize poorly and therefore perform worse on datasets with different properties [4]
in a cross-dataset setting. Domain knowledge can tackle this
disadvantage, since it is to a certain degree disentangled
from the dataset properties (e.g. recording setting or subject
metadata) and therefore provides general information about
the task. For AUs, domain knowledge in the form of co-occurrences exists because facial expressions such as emotions, pain or stress activate specific subgroups of AUs [5].
Furthermore, because of the anatomically predetermined de-
pendence of movements in the face, the contraction of muscles
can lead to the activation of several AUs. Since the patterns
for the same facial expression are similar across subjects, we
propose to use the co-occurrence information to enhance the
model’s generalizability and to calibrate models on distinct
facial expressions.
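As an illustration of this kind of domain knowledge, a co-occurrence matrix can be extracted directly from binary AU annotations, e.g. as a Pearson correlation matrix. The following NumPy sketch uses hypothetical toy labels; the paper's exact extraction procedure may differ:

```python
import numpy as np

# Hypothetical toy annotations: rows = frames, columns = four AUs (1 = active).
labels = np.array([
    [1, 1, 0, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
])

# Pearson correlation between AU activations: positive entries mark
# co-occurring AUs, negative entries mutually exclusive ones.
ground_truth_corr = np.corrcoef(labels, rowvar=False)
```

In this toy example the first and last AU always fire together (correlation 1), while the first and third never do (correlation -1).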
More specifically, we formulate the co-occurrence information
as a weighted regularization term (CorrLoss) to optimize
positive and negative AU correlations and combine it with
binary cross-entropy loss (BCE). In contrast to other approaches that model the co-occurrence information in a hypothesis space (see Section I), we formulate the co-occurrence constraint as a regularization term. This is a lightweight solution that is also flexible to steer, since the domain knowledge does not need to be modeled explicitly first. To highlight the interpretability aspect, we provide visualizations of the ground truth and
learned co-occurrences that can be inspected with respect to
plausibility (Fig. 1). To the best of our knowledge, we are
the first to formalize a co-occurrence constraint directly in the
loss function and to finetune with this knowledge on facial
expressions. We are also the first to conduct a comprehensive
cross-dataset evaluation for assessing the generalizability of
using co-occurrence knowledge. Concisely, we answer the
following research questions. Does CorrLoss improve
1) within dataset performance?
2) cross-dataset performance?
3) calibration on facial expressions?
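To make the idea of a co-occurrence regularization term concrete, the following NumPy sketch shows one plausible form: the correlation matrix of the batch predictions is compared against the ground-truth AU correlation matrix, and the squared deviation is added to the BCE loss with a weighting factor. The function names, the squared-error form, and the weight value are our illustrative assumptions, not the paper's exact formulation; a training setup would implement the same operations with a differentiable framework.

```python
import numpy as np

def corr_loss(probs, target_corr, eps=1e-8):
    # probs: (batch, n_aus) predicted AU probabilities
    # target_corr: (n_aus, n_aus) ground-truth AU correlation matrix
    centered = probs - probs.mean(axis=0, keepdims=True)
    std = probs.std(axis=0) + eps
    pred_corr = (centered.T @ centered) / probs.shape[0] / np.outer(std, std)
    # squared deviation from the ground-truth co-occurrence pattern
    return np.mean((pred_corr - target_corr) ** 2)

def bce(probs, labels, eps=1e-8):
    # standard binary cross-entropy over all AU outputs
    return -np.mean(labels * np.log(probs + eps)
                    + (1 - labels) * np.log(1 - probs + eps))

def total_loss(probs, labels, target_corr, weight=0.1):
    # BCE plus weighted co-occurrence regularization
    # (the weight value here is a hypothetical choice)
    return bce(probs, labels) + weight * corr_loss(probs, target_corr)
```

When the batch predictions reproduce the ground-truth correlation pattern, the regularization term vanishes and only the BCE term remains.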
To evaluate our approach, we use several AU benchmark datasets (Section II-A): BP4D ([6], [7]), CK+ ([8], [9]), GFT [10], Actor Study [11], AffWild2 ([12]–[15]), and EmotioNet (manually annotated part) [16].
Our key findings are: (1) When evaluating within-dataset performance, our CorrLoss decreases the variance over different data folds, but does not significantly increase the mean results (Section III-C). The lower variance across folds can indicate enhanced robustness. We also observe a decreased risk of overfitting during training.
(2) When evaluating our CorrLoss in a cross-dataset setting,
the mean performance increases and variance decreases for
most datasets compared to our baseline. This means that
CorrLoss can increase the robustness and generalizability of
the model (Section III-C). This is also reflected in our state-of-the-art comparison, where our model outperforms competing methods in the cross-dataset evaluation (Table VI). (3) We see a performance
gain when we calibrate our trained models with CorrLoss on specific facial expression tasks like happiness or pain (Section III-D).
arXiv:2210.17233v1 [cs.CV] 31 Oct 2022
Fig. 1. Intuition on how well the model learns ground truth correlations with our CorrLoss regularization term. The example stems from Section III-D, where we finetune models on facial expressions. (a) Happy face example from [11]. (b) Ground truth correlations of facial AUs for happy faces: there is eyebrow movement (AU 1-2), cheek raising and lid tightening (AU 6-7), and mouth movements that extend also to the cheek and chin (AU 12-24). (c) Predictions for happy faces in test data when trained with binary cross-entropy (BCE) only: the neural network learns the correlations automatically to a certain degree. (d) Predictions when trained with BCE and CorrLoss: CorrLoss forces the NN to learn the true correlations between AUs, so this matrix is more similar to the ground truth correlation than when trained with BCE only.
I. RELATED WORK
There are different ways to include domain knowledge in
deep learning models as a constraint. Borghesi et al. [17]
highlight the following main approaches: feature engineering,
modelling the hypothesis space e.g. with a Graph Neural Net-
work [18], using constrained data augmentation, and adding a
regularization term that includes mathematically formulated
constraint knowledge. Based on these categories, we can
categorize the related work with respect to our approach:
a) Regularization Term: A regularization term for en-
forcing domain constraints needs to be mathematically for-
mulated in such a way that the term is differentiable and
therefore suitable for updating the weights in a neural network.
To the best of our knowledge we are the first to incorporate
co-occurrence domain knowledge in a loss function. However,
there are approaches that formulate regularization terms based
on other domain knowledge. Muralidhar et al. [19] incorporate
monotonicity and approximation constraints in the loss func-
tion for predicting the solubility of oxygen in water. Song et
al. [20] use an additional channel correlation loss for image
classification in order to constrain the relations between classes
and channels. Since these constraints differ from ours, we cannot directly build on their approaches.
b) Model Hypothesis Space: Most approaches using AU
correlations model this information in a hypothesis space.
Corneanu et al. [21] apply Structure Inference, which is inspired by graphical models, to exploit the correlation information and update the AU predictions accordingly. Before applying Structure Inference, they use patch learning and fuse the patches. Li et al. [22] use a Gated Graph Neural Network, which is guided by a knowledge graph containing AU correlations. Like Corneanu et al. [21], they combine it with patch learning. Cui et al. [23] use a Bayesian Network to model the correlation information as weak supervision for backpropagation. Song et al. [24] employ different Bayesian graph structures to capture different correlations for each facial expression. In contrast to our work, they do not use the correlations to finetune or evaluate on facial expressions.
All these approaches demonstrate increased performance within a single dataset, which is encouraging, but in contrast to us they do not evaluate cross-dataset to demonstrate the model's
generalizability. Furthermore, modeling the information in a
hypothesis space is an extra step our approach does not need, which, for example, allows us to finetune easily on new data.
c) Other: Wang et al. [25] use the correlation informa-
tion between AUs and emotions as a probability for generating
pseudo AU expressions in a semi-supervised approach. They
evaluate their approach cross-dataset with three datasets, but
do not compare with and without correlation information.
Also, since theirs is a semi-supervised approach, it is not suitable for comparison. Zhao et al. [26] propose a patch-learning approach that takes region patches of co-occurring AUs into account and thus preserves the correlations of the dataset.
Shao et al. [27] reach state-of-the-art results by learning AU detection and face alignment together. We compare our results to their JÂA model.
All in all, none of these approaches use the AU correlation
information directly as a constraint in the loss function and we
can conclude that there is a lack of cross-dataset evaluation.
To the best of our knowledge, we are also the first to use co-occurrence information for calibration on facial expressions.
II. METHODS
A. Data Pre-Processing
Table I describes the different properties of our datasets.
For training, we load the video frames or images in color
and crop the faces with the OpenCV [28] DNN module for
face detection, a state-of-the-art and open-source framework.
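The OpenCV DNN face detector (e.g. its SSD model loaded via cv2.dnn.readNetFromCaffe) returns detections as a tensor of shape (1, 1, N, 7), where each row holds two index fields, a confidence, and four box coordinates relative to the input size. A small post-processing helper for cropping the most confident face might look as follows; the threshold and the model file names in the usage comment are illustrative, not the paper's exact settings.

```python
import numpy as np

def best_face_box(detections, width, height, conf_threshold=0.5):
    # detections: output of net.forward(), shape (1, 1, N, 7);
    # each row: [_, _, confidence, x1, y1, x2, y2] with coords in [0, 1]
    best_conf, best_box = 0.0, None
    for i in range(detections.shape[2]):
        conf = float(detections[0, 0, i, 2])
        if conf >= conf_threshold and conf > best_conf:
            box = detections[0, 0, i, 3:7] * np.array([width, height, width, height])
            best_conf, best_box = conf, np.rint(box).astype(int)
    return best_box

# Typical usage (requires cv2 and the pretrained detector files; names are
# illustrative):
#   net = cv2.dnn.readNetFromCaffe("deploy.prototxt", "res10_ssd.caffemodel")
#   blob = cv2.dnn.blobFromImage(cv2.resize(img, (300, 300)), 1.0, (300, 300),
#                                (104.0, 177.0, 123.0))
#   net.setInput(blob)
#   box = best_face_box(net.forward(), img.shape[1], img.shape[0])
#   face_crop = img[box[1]:box[3], box[0]:box[2]]
```

Keeping only the highest-confidence detection is a simple heuristic for single-face frames; multi-face datasets would need a different selection strategy.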