form in sampling, following the model-agnostic
meta-learning (MAML) framework (Finn et al.,
2017). M³S retains the advantages of meta-learning and enables models to adapt easily to data with different missing rates. M³S can be treated as an efficient add-on training component for existing models, significantly improving their performance on multimodal data with a mixture of missing modalities. We conduct experiments on the IEMOCAP (Busso et al., 2008), SIMS (Yu et al., 2020), and CMU-MOSI (Zadeh et al., 2016) datasets, achieving superior performance compared with recent state-of-the-art (SOTA) methods. A simple example is shown in Figure 1, demonstrating the effectiveness of our proposed M³S compared with other methods. More details are provided in the experiment section.
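To make the meta-training scheme concrete, below is a minimal first-order sketch of MAML-style training in which each meta-task samples its own missing rate. It assumes a PyTorch model that accepts a dict of modality features; the helper names (`mask_modalities`, `m3s_meta_step`), the uniform missing-rate distribution, and the first-order gradient approximation are illustrative assumptions, not our exact implementation. Here `tasks` would be a list of (support, query) batch pairs drawn from the training set.

```python
import copy
import random
import torch

def mask_modalities(batch, missing_rate):
    # Zero out each modality with probability `missing_rate`
    # to simulate missing inputs (hypothetical masking scheme).
    inputs = {}
    for name, feats in batch["inputs"].items():
        drop = random.random() < missing_rate
        inputs[name] = torch.zeros_like(feats) if drop else feats
    return {"inputs": inputs, "label": batch["label"]}

def m3s_meta_step(model, meta_opt, tasks, loss_fn, inner_lr=1e-3):
    # One first-order MAML-style meta-update: adapt a copy of the
    # model on a support set under one sampled missing rate, then
    # accumulate the query-set gradient into the meta-gradient.
    meta_opt.zero_grad()
    for support, query in tasks:
        fast = copy.deepcopy(model)  # task-specific fast weights
        inner_opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
        s = mask_modalities(support, random.uniform(0.0, 0.7))
        inner_loss = loss_fn(fast(s["inputs"]), s["label"])
        inner_opt.zero_grad()
        inner_loss.backward()
        inner_opt.step()
        q = mask_modalities(query, random.uniform(0.0, 0.7))
        inner_opt.zero_grad()  # clear leftover inner-loop gradients
        query_loss = loss_fn(fast(q["inputs"]), q["label"])
        query_loss.backward()
        # First-order approximation: treat the adapted weights'
        # gradients as gradients of the original parameters.
        for p, fp in zip(model.parameters(), fast.parameters()):
            p.grad = fp.grad.clone() if p.grad is None else p.grad + fp.grad
    meta_opt.step()
```

In the full second-order MAML of Finn et al. (2017), the outer gradient is taken through the inner update; the first-order variant sketched above is a common, cheaper approximation.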
The main contributions of our work are as follows:
• We formulate a simple yet effective meta-training framework to address the problem of a mixture of partially missing modalities in MSA tasks.
• The proposed method M³S can be treated as an efficient add-on training component for existing models and significantly improves their performance when dealing with missing modalities.
• We conduct comprehensive experiments on widely used MSA datasets, including IEMOCAP, SIMS, and CMU-MOSI, achieving superior performance compared with recent SOTA methods.
2 Related Work
2.1 Emotion Recognition
Emotion recognition aims to identify and predict emotions from physiological and behavioral responses. Emotions are expressed in a variety of modalities. However, early studies on emotion recognition often focus on a single modality. Shaheen
et al. (2014) and Calefato et al. (2017) present
novel approaches to automatic emotion recognition
from text. Burkert et al. (2015) and Deng et al. (2020) conduct research on facial expressions
and the emotions behind them. Koolagudi and Rao (2012) and Yoon et al. (2019) exploit acoustic data from different types of speech for emotion recognition and classification tasks. Though much progress has been made in emotion recognition with single-modality data, how to combine information from diverse modalities has become an interesting direction in this area.
2.2 Multimodal Sentiment Analysis
Multimodal sentiment analysis (MSA) is currently a popular research area, since the world we live in presents information in several modalities. When a dataset contains more than one modality, traditional single-modality methods struggle to handle it. MSA mainly focuses on three modalities: text, audio, and video. It makes use of the complementarity of multimodal information to improve the accuracy of emotion recognition. However, the heterogeneity of data and signals brings significant challenges because it creates distributional modality gaps. Hazarika et al. (2020)
propose a novel framework, MISA, which projects
each modality to two distinct subspaces to aid the
fusion process. Hori et al. (2017) introduce
a multimodal attention model that can selectively
utilize features from different modalities. Since
the performance of a model highly depends on the
quality of multimodal fusion, Han et al. (2021b)
construct a framework named MultiModal InfoMax
(MMIM) to maximize the mutual information in
unimodal input pairs as well as to obtain task-related information through the multimodal fusion process.
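For intuition only, mutual information between unimodal representations is typically maximized through a tractable contrastive lower bound such as InfoNCE; the sketch below shows the general form with hypothetical encoder outputs of shape (B, d), and is not MMIM's exact objective.

```python
import torch
import torch.nn.functional as F

def infonce_loss(z_a, z_b, temperature=0.1):
    # Contrastive (InfoNCE) lower bound on the mutual information
    # between two batches of unimodal representations of shape (B, d):
    # matching rows are positive pairs, all other pairings are negatives.
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature              # (B, B) similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)
    # Minimizing this cross-entropy raises the MI lower bound.
    return F.cross_entropy(logits, targets)
```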
Besides, Han et al. (2021a) employ an end-to-end network, the Bi-Bimodal Fusion Network (BBFN), to better exploit the dynamics of independence and correlation between modalities. Because of the unified multimodal annotation, previous methods are restricted in capturing differentiated information. Yu et al. (2021) design a label generation module based on a self-supervised learning strategy and then jointly train the multimodal and unimodal tasks to learn both the consistency and the differences between modalities. However, limited by the pre-processed features, their results show that the generated audio and vision labels are not informative enough.
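As a rough sketch of this joint training idea (with hypothetical head names and loss weighting, not the authors' exact formulation), the overall objective combines the multimodal task loss, supervised by the human annotation, with unimodal losses supervised by the generated labels:

```python
import torch.nn.functional as F

def joint_loss(outputs, labels, alpha=0.1):
    # Multimodal head supervised by the human annotation; each unimodal
    # head supervised by its self-supervised, generated label.
    loss = F.mse_loss(outputs["multi"], labels["multi"])
    for m in ("text", "audio", "video"):
        loss = loss + alpha * F.mse_loss(outputs[m], labels[m])
    return loss
```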
2.3 Missing Modality Problem
Compared with unimodal learning methods, multimodal learning has achieved great success. It improves the performance of emotion recognition tasks by effectively combining information from different modalities. However, in reality multimodal data may have missing modalities for a variety of reasons, such as signal transmission error