CorrLoss: Integrating Co-Occurrence Domain
Knowledge for Affect Recognition
Ines Rieger∗†, Jaspar Pahl∗†, Bettina Finzel†and Ute Schmid†
∗Fraunhofer IIS, Fraunhofer Institute for Integrated Circuits IIS, Erlangen
†University of Bamberg, Cognitive Systems Group
Email: {ines.rieger, jaspar.pahl, bettina.finzel, ute.schmid}@uni-bamberg.de
Abstract—Neural networks are widely adopted, yet the integration of domain knowledge is still underutilized. We propose to integrate domain knowledge about co-occurring facial movements as a constraint in the loss function to enhance the training of neural networks for affect recognition. As co-occurrence patterns tend to be similar across datasets, applying our method can lead to higher generalizability of models and a lower risk of overfitting. We demonstrate this by showing performance increases in cross-dataset testing for various datasets. We also show the applicability of our method for calibrating neural networks to different facial expressions.
A purely data-driven approach for training neural networks may reach its limits, for example, when the training data is of low quality or when the model must satisfy constraints such as natural laws or other regulations [1]. Additionally, as neural networks become more complex, the need for interpretability increases. Integrating domain knowledge can address these disadvantages by forcing the neural network to adhere to constraints, which also enhances its interpretability.
In our approach, we propose to integrate domain knowledge about co-occurring target classes directly into the loss function to enhance affect recognition models. In our experiments, we concentrate on detecting facial movements called Action Units (AUs). AUs are a psychological framework for describing distinct, objective facial muscle movements, such as lowering the brow or raising the cheek, in a modular way. For more information about AUs, see the description by Ekman and Friesen [2] and the survey on automatic facial AU analysis by Zhi et al. [3].
One disadvantage of affective computing datasets, and especially AU datasets, is their varying properties regarding recording conditions, i.e., in-the-lab vs. in-the-wild or acted vs. natural. Training on datasets with very specific properties leads to models that generalize poorly and therefore do not evaluate well on datasets with different properties [4] in a cross-dataset setting. Domain knowledge can tackle this disadvantage, since it is to a certain degree disentangled from the dataset properties (e.g., recording setting or subject metadata) and therefore provides general information about the task. For AUs, domain knowledge in the form of co-occurrences exists because facial expressions such as emotions, pain, or stress activate specific subgroups of AUs [5]. Furthermore, due to the anatomically predetermined dependence of movements in the face, the contraction of one muscle can lead to the activation of several AUs. Since the patterns for the same facial expression are similar across subjects, we propose to use this co-occurrence information to enhance the model's generalizability and to calibrate models on distinct facial expressions.
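As an illustration of how such co-occurrence knowledge could be extracted, the following minimal sketch estimates pairwise AU correlations from binary annotations. The function name and the choice of Pearson correlation are illustrative assumptions; the exact statistic and preprocessing used in the paper may differ.

```python
import numpy as np

def au_cooccurrence(labels: np.ndarray) -> np.ndarray:
    """Estimate pairwise AU co-occurrence as Pearson correlations.

    labels: (num_samples, num_aus) binary matrix, where entry [i, j] is 1
    if AU j is active in sample i. Returns a (num_aus, num_aus) matrix
    containing both positive and negative correlations.
    """
    return np.corrcoef(labels, rowvar=False)
```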
More specifically, we formulate the co-occurrence information as a weighted regularization term (CorrLoss) that optimizes positive and negative AU correlations, and combine it with the binary cross-entropy loss (BCE). In contrast to other approaches that model the co-occurrence information in a hypothesis space (see Section I), we formulate the co-occurrence constraint as a regularization term. We consider this a lightweight solution that is also flexible to steer, since the domain knowledge does not need to be modeled explicitly first.
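The following PyTorch-style sketch shows one way such a correlation regularizer could be combined with BCE. All names (corr_loss, cooc_target, weight) and the specific penalty (mean absolute deviation between the batch-wise correlation matrix and the target co-occurrence matrix) are assumptions for illustration, not the paper's exact formulation.

```python
import torch

def corr_loss(pred: torch.Tensor, cooc_target: torch.Tensor) -> torch.Tensor:
    """Penalize deviation of the batch-wise AU correlation matrix from a
    target co-occurrence matrix (illustrative formulation)."""
    # pred: (batch, num_aus) sigmoid outputs; cooc_target: (num_aus, num_aus)
    centered = pred - pred.mean(dim=0, keepdim=True)
    cov = centered.T @ centered / (pred.shape[0] - 1)
    std = pred.std(dim=0).clamp_min(1e-8)
    corr = cov / (std.unsqueeze(0) * std.unsqueeze(1))
    return (corr - cooc_target).abs().mean()

def total_loss(pred, labels, cooc_target, weight=0.1):
    """Weighted combination of BCE and the co-occurrence regularizer."""
    bce = torch.nn.functional.binary_cross_entropy(pred, labels)
    return bce + weight * corr_loss(pred, cooc_target)
```

The weight steers how strongly the model is pushed toward the ground-truth co-occurrence structure relative to the per-AU classification objective.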
To highlight the interpretability aspect, we provide visualizations of the ground-truth and learned co-occurrences that can be inspected for plausibility (Fig. 1). To the best of our knowledge, we are the first to formalize a co-occurrence constraint directly in the loss function and to finetune with this knowledge on facial expressions. We are also the first to conduct a comprehensive cross-dataset evaluation assessing the generalizability gained from co-occurrence knowledge. Concisely, we answer the following research questions. Does CorrLoss improve
1) within-dataset performance?
2) cross-dataset performance?
3) calibration on facial expressions?
To evaluate our approach, we use several AU benchmark datasets (Section II-A): BP4D [6], [7], CK+ [8], [9], GFT [10], Actor Study [11], AffWild2 [12]–[15], and EmotioNet (manually annotated part) [16].
Our key findings are: (1) In the within-dataset evaluation, our CorrLoss decreases the variance over different data folds but does not significantly increase the mean results (Section III-C). The lower variance over several data folds can indicate enhanced robustness, and we also observe a decreased risk of overfitting during training. (2) In the cross-dataset setting, the mean performance increases and the variance decreases for most datasets compared to our baseline, meaning that CorrLoss can increase the robustness and generalizability of the model (Section III-C). This is also reflected in our comparison with the state of the art, which our model outperforms in the cross-dataset evaluation (Table VI). (3) We see a performance gain when we calibrate our trained models with CorrLoss on distinct facial expressions.
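To make the calibration idea concrete, a minimal sketch follows: the co-occurrence matrix is re-estimated only on samples of the target expression and the trained model is finetuned with the combined loss. The loop reuses corr_loss from the sketch above; the function name, hyperparameters, and loader interface are hypothetical.

```python
import torch

def calibrate(model, loader, cooc_expr, epochs=3, lr=1e-5, weight=0.1):
    """Finetune a trained AU model on one facial expression.

    cooc_expr: co-occurrence matrix estimated only from samples of the
    target expression (e.g., via au_cooccurrence above); loader yields
    (images, au_labels) for that expression. Hyperparameters are
    placeholders, not values reported in the paper.
    """
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            pred = torch.sigmoid(model(x))
            loss = torch.nn.functional.binary_cross_entropy(pred, y)
            loss = loss + weight * corr_loss(pred, cooc_expr)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```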