EXPLOITING MODALITY-INVARIANT FEATURE FOR ROBUST MULTIMODAL EMOTION
RECOGNITION WITH MISSING MODALITIES
Haolin Zuo1, Rui Liu1,*, Jinming Zhao2, Guanglai Gao1, Haizhou Li3
1Inner Mongolia University, Hohhot, China
2Qiyuan Lab, Beijing, China
3The Chinese University of Hong Kong, Shenzhen, China
zuohaolin_0613@163.com, liurui_imu@163.com, zhaojinming@qiyuanlab.com, csggl@imu.edu.cn, haizhouli@cuhk.edu.cn
ABSTRACT
Multimodal emotion recognition leverages complementary information across modalities to improve performance. However, we cannot guarantee that the data of all modalities are always present in practice. In studies that predict missing data across modalities, the inherent difference between heterogeneous modalities, namely the modality gap, presents a challenge. To address this, we propose a missing modality imagination network with invariant features (IF-MMIN), which includes two novel mechanisms: 1) an invariant feature learning strategy based on the central moment discrepancy (CMD) distance under the full-modality scenario; 2) an invariant-feature-based imagination module (IF-IM) that alleviates the modality gap during missing-modality prediction, thus improving the robustness of the multimodal joint representation. Comprehensive experiments on the benchmark dataset IEMOCAP demonstrate that the proposed model outperforms all baselines and consistently improves overall emotion recognition performance under uncertain missing-modality conditions. We release the code at: https://github.com/ZhuoYulang/IF-MMIN.
Index Terms— Multimodal emotion recognition, Missing modality imagination, Central moment discrepancy (CMD), Invariant feature
1. INTRODUCTION
The study of multimodal emotion recognition with missing modalities seeks to perform emotion recognition in realistic environments [1, 2], where some data could be missing due to obscured cameras, damaged microphones, etc. Mainstream solutions to the missing-modality problem fall into two categories: 1) missing data generation [3–5] and 2) multimodal joint representation learning [6, 7]. In [3], an encoder-decoder network was proposed to generate high-quality images of the missing modality according to the available modality. In [7], a translation-based method with a cycle consistency loss was studied to learn joint representations between modalities. In [1], a Missing Modality Imagination Network (MMIN) was studied that combines the above two approaches, learning joint representations by predicting the missing modalities.

*: Corresponding author.
This research is funded by the High-level Talents Introduction Project of Inner Mongolia University (No. 10000-22311201/002) and the Young Scientists Fund of the National Natural Science Foundation of China (NSFC) (No. 62206136).
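Conceptually, such an imagination network reduces to an encoder-decoder that predicts the features of a missing modality from an available one. The following PyTorch-style sketch is purely illustrative; the class name, dimensions, and loss are our own assumptions, not the implementation of [1] or [3]:

import torch.nn as nn

class ImaginationModule(nn.Module):
    """Minimal encoder-decoder that 'imagines' missing-modality features
    from an available modality (hypothetical sizes, not the authors' code)."""
    def __init__(self, in_dim=128, hidden_dim=64, out_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.decoder = nn.Linear(hidden_dim, out_dim)

    def forward(self, available_feat):
        z = self.encoder(available_feat)  # joint representation of the input
        imagined = self.decoder(z)        # predicted missing-modality features
        return imagined, z

During full-modality training, a reconstruction loss such as nn.functional.l1_loss(imagined, real_missing_feat) would supervise the imagination; a cycle consistency loss as in [7] additionally maps the imagined features back to the source modality.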
The modality gap between heterogeneous modalities [8–10] remains an issue that adversely affects emotion recognition accuracy. The question is how to alleviate this modality gap. While the modalities each have unique characteristics, they share the same information in the semantic space. Modality-invariant features have been introduced to multimodal emotion recognition with full-modality data and show remarkable performance. Hazarika et al. [8] proposed a shared subspace, sketched below, that learns potential commonalities between modalities to reduce the influence of the modality gap. Liu et al. [11] proposed discrete shared spaces that capture fine-grained representations to improve cross-modal retrieval accuracy. These studies suggest that modality-invariant features effectively bridge the modality gap; however, no prior work has applied them to emotion recognition under missing-modality conditions.
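A shared subspace of this kind can be as simple as per-modality projections into a common space, with a similarity loss pulling the projected distributions together. This is a minimal sketch under assumed feature dimensions, not the architecture of [8]:

import torch.nn as nn

class SharedSubspace(nn.Module):
    """Project heterogeneous modality features into one common space.
    All dimensions are hypothetical placeholders."""
    def __init__(self, audio_dim=130, text_dim=768, visual_dim=342, shared_dim=128):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, shared_dim)
        self.text_proj = nn.Linear(text_dim, shared_dim)
        self.visual_proj = nn.Linear(visual_dim, shared_dim)

    def forward(self, a, t, v):
        # A distributional similarity loss (e.g. the CMD loss sketched
        # further below) then pulls the three projected outputs together.
        return self.audio_proj(a), self.text_proj(t), self.visual_proj(v)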
In this work, we propose a missing modality imagination network with invariant features (IF-MMIN). Specifically, we first learn the modality-invariant feature among various modalities using a constraint training strategy based on the central moment discrepancy (CMD) distance [12]. We then design the IF-MMIN neural architecture to predict the invariant features of the missing modality from the available modality. In this way, we fully exploit the available modality to alleviate the modality gap in cross-modal imagination, thus improving the robustness of the multimodal joint representation. Experimental results on the benchmark dataset IEMOCAP show that the proposed method outperforms the state-of-the-art baseline models under all missing-modality conditions.
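For reference, the CMD distance of [12] matches the means and higher-order central moments of two feature distributions. Below is a minimal PyTorch implementation that omits the 1/(b-a)^k normalization factors (assuming features are already normalized); it is an illustrative sketch, not the authors' exact training code:

import torch

def cmd_loss(x, y, k_moments=5):
    """Central moment discrepancy between two (batch, dim) feature batches."""
    mx, my = x.mean(dim=0), y.mean(dim=0)
    loss = torch.norm(mx - my, p=2)          # distance between the means
    cx, cy = x - mx, y - my                  # centered features
    for k in range(2, k_moments + 1):
        # match the k-th order central moments in L2 norm
        loss = loss + torch.norm((cx ** k).mean(dim=0) - (cy ** k).mean(dim=0), p=2)
    return loss

Minimizing cmd_loss over pairs of modality features during full-modality training encourages the learned representations to be modality-invariant.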
The main contributions of this work are: 1) we propose a CMD-based distance constraint training strategy to learn the modality-invariant feature among various modalities; 2) we design an invariant-feature-based imagination module (IF-IM) to predict the missing modality, alleviating the modality gap and improving the robustness of the multimodal joint representation.