LEARNING DIVERSIFIED FEATURE REPRESENTATIONS
FOR FACIAL EXPRESSION RECOGNITION IN THE WILD
Negar Heidari and Alexandros Iosifidis
Department of Electrical and Computer Engineering, Aarhus University, Denmark
ABSTRACT
Diversity of the features extracted by deep neural networks is important for enhancing the model generalization ability and, accordingly, its performance in different learning tasks. Facial expression recognition in the wild has attracted interest in recent years due to the challenge of extracting discriminative and informative features from occluded images in real-world scenarios. In this paper, we propose a mechanism to diversify the features extracted by the CNN layers of state-of-the-art facial expression recognition architectures, enhancing the model's capacity to learn discriminative features. To evaluate the effectiveness of the proposed approach, we incorporate this mechanism in two state-of-the-art models to (i) diversify local/global features in an attention-based model and (ii) diversify the features extracted by different learners in an ensemble-based model. Experimental results on three well-known in-the-wild facial expression recognition datasets, AffectNet, FER+ and RAF-DB, show the effectiveness of our method, achieving state-of-the-art performance of 89.99% on RAF-DB and 89.34% on FER+, and a competitive accuracy of 60.02% on the AffectNet dataset.
Index Terms— Facial Expression Recognition, Feature Representation, Feature Diversity, Deep Learning, Ensemble Learning
1. INTRODUCTION
Facial expression, as a fundamental natural signal for human social communication, plays an important role in many applications of artificial intelligence, such as Human Computer Interaction (HCI), healthcare, and driver fatigue monitoring. Deep Convolutional Neural Networks (CNNs) have led to considerable progress in automatic Facial Expression Recognition (FER) on large-scale datasets in real-world scenarios. FER methods aim to solve a visual perception problem by learning feature representations from facial images/videos and classifying them into emotional categories, i.e., happiness, sadness, fear, anger, surprise, disgust, neutral, and contempt. On laboratory-controlled datasets, such as CK+ [1] and JAFFE [2], where the facial images are captured in a fixed frontal pose without any occlusion, FER methods have achieved excellent performance. However, these methods face challenges on in-the-wild datasets, such as AffectNet [3], FER+ [4], and RAF-DB [5], where facial images exhibit illumination, occlusion and pose variations that cause considerable changes in facial appearance. To address this, many recent methods rely on transfer learning to exploit feature representations learned for other visual perception tasks, such as object recognition, with well-designed networks, like ResNet-18 [6], trained on large datasets, like VGG-Face [7] and MS-Celeb-1M [8], and transferred to facial expression recognition on challenging in-the-wild datasets. However, considering that many face datasets are small and imbalanced, these deep neural networks are mostly over-parameterized and tend to overfit the training data, which can degrade their generalization ability on unseen data.

This work received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 871449 (OpenDR). This publication reflects the authors' views only. The European Commission is not responsible for any use that may be made of the information it contains.
Increasing the diversity of the features learned by different network layers/neurons has been recognized as an effective way to improve model generalization [9]. It has been theoretically shown in [10, 11] that within-layer activation diversity improves the generalization performance of neural networks and mitigates overfitting. In this paper, we propose a mechanism for learning diversified facial feature representations by encouraging the learner to extract diverse spatial and channel-wise features. This mechanism can be used in different CNN architectures to increase the feature diversity between layers or branches, spatial regions, and/or channels of feature maps. We incorporate our proposed optimization mechanism into two state-of-the-art models, i.e., MA-Net [12] and ESR [13], and conduct experiments on three well-known in-the-wild datasets, i.e., AffectNet, FER+ and RAF-DB. Experimental results demonstrate the effectiveness of learning diversified features in improving the accuracy and generalization of the pretrained state-of-the-art models on new samples.
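The exact formulation of the diversity mechanism is given later in the paper; as a rough illustration of the underlying idea only (the function name, cosine-similarity measure, and normalization choice here are our own assumptions, not the paper's loss), a channel-wise diversity penalty that discourages different channels from encoding the same information could be sketched as follows:

```python
import numpy as np

def diversity_loss(features):
    """Illustrative channel-wise diversity penalty (not the paper's loss).

    features: array of shape (C, D) holding C channel descriptors,
    each a flattened spatial feature map of dimension D.
    Returns the mean pairwise cosine similarity between distinct
    channels; minimizing it pushes channels toward diverse features.
    """
    # L2-normalize each channel descriptor (guard against zero norms)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    normed = features / np.maximum(norms, 1e-12)
    # Cosine-similarity matrix between all channel pairs
    sim = normed @ normed.T
    c = features.shape[0]
    # Average only the off-diagonal entries (exclude self-similarity)
    off_diag = sim[~np.eye(c, dtype=bool)]
    return off_diag.mean()
```

Such a term would be added to the classification loss during training, so that the optimizer trades off discriminative power against redundancy between channels; an analogous penalty can be applied across spatial regions or across ensemble branches.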
The contributions of the paper can be summarized as follows:
• We propose a mechanism for learning diversified features in the spatial and channel dimensions of CNNs to improve the model's accuracy in discriminating facial expressions.
• We evaluate our feature extraction mechanism by incorporating it into two state-of-the-art models with different properties, i.e., one benefits from a region-based attention mechanism and transfer learning, and the other is an efficient ensemble-based architecture. In both cases, our diversified feature learning mechanism boosts the performance.
• Experiments conducted on three benchmark in-the-wild datasets, including the large-scale AffectNet dataset, indicate the effectiveness and adaptability of our method, which can be used in different types of models. Our code is publicly available at https://github.com/negarhdr/Diversified-Facial-Expression-Recognition.
2. RELATED WORKS
Recent studies address the challenges of in-the-wild facial expression recognition by training models with multi-pose examples [14], and by extracting key facial features based on facial landmarks and region-based attention mechanisms [15, 16, 17]. Learning facial features from both global and local perspectives simulates the human brain's perception mechanism and helps achieve better performance in visual perception problems. MA-Net [12] is a global
arXiv:2210.09381v2 [cs.CV] 19 Feb 2023