Missing Modality meets Meta Sampling (M3S): An Efficient Universal
Approach for Multimodal Sentiment Analysis with Missing Modality
Haozhe Chi Minghua Yang Junhao Zhu Guanhong Wang Gaoang Wang
Zhejiang University-University of Illinois at Urbana-Champaign Institute,
Zhejiang University, China
{haozhe.20, minghua.20, junhao.20, gaoangwang}@intl.zju.edu.cn
guanhongwang@zju.edu.cn
arXiv:2210.03428v1 [cs.CV] 7 Oct 2022
Abstract
Multimodal sentiment analysis (MSA) is an im-
portant way of observing mental activities with
the help of data captured from multiple modal-
ities. However, due to recording or transmission errors, some modalities may contain incomplete data. Most existing works that
address missing modalities usually assume a
particular modality is completely missing and
seldom consider a mixture of missingness across
multiple modalities. In this paper, we pro-
pose a simple yet effective meta-sampling ap-
proach for multimodal sentiment analysis with
missing modalities, namely Missing Modality-
based Meta Sampling (M3S). To be specific,
M3S formulates a missing modality sampling
strategy into the model-agnostic meta-learning
(MAML) framework. M3S can be treated as
an efficient add-on training component for existing models and significantly improves their performance on multimodal data with a mixture of missing modalities. We conduct experiments on the IEMOCAP, SIMS, and CMU-MOSI
datasets, and superior performance is achieved
compared with recent state-of-the-art methods.
1 Introduction
Multimodal sentiment analysis (MSA) aims to esti-
mate human mental activities from multimodal data,
such as a combination of audio, video, and text.
Though much progress has been made recently,
there still exist challenges, including the missing modality problem. In reality, missing modality is a common problem due to errors in data collection, storage, and transmission. To address the issue of
missing modality in MSA, many approaches have
been proposed (Ma et al., 2021c; Zhao et al., 2021; Ma et al., 2021b; Parthasarathy and Sundaram, 2020; Ma et al., 2021a; Tran et al., 2017).
In general, methods that address the missing
modality issue usually only consider the situation
where a certain input modality is severely damaged.
Corresponding author.
Figure 1: M3S helps MMIN model achieve superior
performance.
The strategies of these proposed methods can be
divided into three categories: 1) Designing new ar-
chitectures with a reconstruction network to recover
missing modality with the information from other
modalities (Ma et al., 2021c; Ding et al., 2014); 2)
Formulating innovative and efficient loss functions
to tackle missing modality (Ma et al., 2021a, 2022);
3) Improving the encoding and embedding strate-
gies from existing models (Tran et al., 2017; Cai et al., 2018).
In MSA tasks, most proposed methods focus on the situation where certain modalities are completely missing while the remaining modalities are complete. However, due to transmission or collection errors, each modality may contain only partial information, governed by a certain missing rate. Existing methods seldom consider this type of scenario and are not directly applicable to it. Our experiments also verify the inefficacy of existing methods in this challenging situation, as demonstrated in Section 5.
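Such a mixture of partial missingness can be simulated by masking each modality's feature sequence with its own missing rate. The NumPy sketch below is illustrative only; the feature shapes, rates, and the `mask_modality` helper are our assumptions, not the paper's exact corruption protocol:

```python
import numpy as np

def mask_modality(features, missing_rate, rng):
    """Randomly zero out a fraction of time steps in one modality's features."""
    keep = rng.random(features.shape[0]) >= missing_rate  # keep with prob 1 - rate
    return features * keep[:, None]

rng = np.random.default_rng(0)
sample = {
    "text":  rng.standard_normal((20, 768)),   # 20 token embeddings (hypothetical dims)
    "audio": rng.standard_normal((50, 74)),    # 50 acoustic frames
    "video": rng.standard_normal((50, 35)),    # 50 visual frames
}
# Each modality is only partially missing, with an independent rate.
rates = {"text": 0.1, "audio": 0.3, "video": 0.5}
corrupted = {m: mask_modality(x, rates[m], rng) for m, x in sample.items()}
```

No modality is dropped entirely; each loses a different fraction of its time steps, which is the setting existing complete-missing methods do not cover.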
To address the aforementioned problem, in this paper, we propose a simple yet effective solution to the Missing Modality problem with Meta Sampling in the MSA task, namely M3S. To be specific, M3S combines the augmented missing modality transform in sampling, following the model-agnostic
meta-learning (MAML) framework (Finn et al., 2017). M3S maintains the advantages of meta-learning and enables models to adapt easily to data with different missing rates. M3S can be treated as an efficient add-on training component for existing models and significantly improves their performance on multimodal data with a mixture of
missing modalities. We conduct experiments on
IEMOCAP (Busso et al., 2008), SIMS (Yu et al., 2020), and CMU-MOSI (Zadeh et al., 2016) datasets, and superior performance is achieved compared with recent state-of-the-art (SOTA) methods. A simple example is shown in Figure 1, demonstrating the effectiveness of our proposed M3S compared with other methods. More details are provided in the experiment section.
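The training loop can be pictured as MAML where each meta-task corresponds to a sampled missing rate. The sketch below is a first-order, toy illustration (linear regression in NumPy); `m3s_meta_step`, the rate range, and the learning rates are our assumptions, not the authors' implementation:

```python
import numpy as np

def loss_and_grad(w, X, y):
    """Mean squared error and its gradient for a linear model y ≈ X @ w."""
    err = X @ w - y
    return (err ** 2).mean(), 2 * X.T @ err / len(y)

def m3s_meta_step(w, X, y, rng, inner_lr=0.05, outer_lr=0.05, n_tasks=4):
    """One meta-update: each task corrupts the input with a sampled missing
    rate, adapts on the corrupted support view, and accumulates the query
    gradient (first-order MAML approximation)."""
    meta_grad = np.zeros_like(w)
    for _ in range(n_tasks):
        rate = rng.uniform(0.0, 0.7)                  # task = a missing rate
        support_mask = rng.random(X.shape) >= rate
        _, g = loss_and_grad(w, X * support_mask, y)
        w_fast = w - inner_lr * g                     # inner-loop adaptation
        query_mask = rng.random(X.shape) >= rate      # fresh mask, same rate
        _, gq = loss_and_grad(w_fast, X * query_mask, y)
        meta_grad += gq
    return w - outer_lr * meta_grad / n_tasks

# Toy run: meta-training toward parameters robust across missing rates.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
w_true = rng.standard_normal(5)
y = X @ w_true
w = np.zeros(5)
for _ in range(200):
    w = m3s_meta_step(w, X, y, rng)
```

Because the missing rate varies from task to task, the outer update pushes the parameters toward a point from which a single inner-loop step adapts well at any rate in the sampled range, which is the intuition behind treating missing rates as meta-tasks.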
The main contributions of our work are as fol-
lows:
• We formulate a simple yet effective meta-training framework to address the problem of a mixture of partially missing modalities in MSA tasks.

• The proposed method, M3S, can be treated as an efficient add-on training component for existing models and significantly improves their performance in dealing with missing modalities.

• We conduct comprehensive experiments on widely used datasets in MSA, including IEMOCAP, SIMS, and CMU-MOSI. Superior performance is achieved compared with recent SOTA methods.
2 Related Work
2.1 Emotion Recognition
Emotion recognition aims to identify and predict emotions through physiological and behavioral responses. Emotions are expressed in a variety of modalities. However, early studies on emotion recognition often focus on a single modality. Shaheen
et al. (2014) and Calefato et al. (2017) present
novel approaches to automatic emotion recognition
from text. Burkert et al. (2015) and Deng et al.
(2020) conduct research on facial expressions
and the emotions behind them. Koolagudi and Rao
(2012) and Yoon et al. (2019) exploit acoustic data from different types of speech for emotion recognition and classification tasks. Though much progress
has been made for emotion recognition with sin-
gle modality data, how to combine information
from diverse modalities has become an interesting
direction in this area.
2.2 Multimodal Sentiment Analysis
Multimodal sentiment analysis (MSA) is a popular research area at present, since the world we live in contains many modalities. When a dataset consists of more than one modality, traditional single-modality methods struggle to handle it. MSA mainly focuses on three
modalities: text, audio, and video. It makes use
of the complementarity of multimodal information
to improve the accuracy of emotion recognition.
However, the heterogeneity of data and signals brings significant challenges because it creates distributional modality gaps. Hazarika et al. (2020)
propose a novel framework, MISA, which projects
each modality to two distinct subspaces to aid the
fusion process. Hori et al. (2017) introduce
a multimodal attention model that can selectively
utilize features from different modalities. Since
the performance of a model highly depends on the
quality of multimodal fusion, Han et al. (2021b)
construct a framework named MultiModal InfoMax
(MMIM) to maximize the mutual information in
unimodal input pairs as well as obtain information
related to tasks through multimodal fusion process.
Besides, Han et al. (2021a) make use of an end-to-
end network Bi-Bimodal Fusion Network (BBFN)
to better utilize the dynamics of independence and
correlation between modalities. Due to the unified
multimodal annotation, previous methods are re-
stricted in capturing differentiated information. Yu
et al. (2021) design a label generation module based
on a self-supervised learning strategy, then jointly train the multimodal and unimodal tasks to learn their consistency and differences. However, limited by the pre-processed features, the results show that the generated audio and vision labels are not significant enough.
2.3 Missing Modality Problem
Compared with unimodal learning methods, multimodal learning has achieved great success. It
improves the performance of emotion recognition
tasks by effectively combining the information from
different modalities. However, the multimodal data
may have missing modalities in reality due to a
variety of reasons like signal transmission error