Transfer Learning with Joint Fine-Tuning for Multimodal Sentiment Analysis
Guilherme Lourenço de Toledo¹  Ricardo Marcacini¹
Abstract
Most existing methods focus on sentiment analysis of textual data. However, recently there has been a massive use of images and videos on social platforms, motivating sentiment analysis from other modalities. Current studies show that exploring other modalities (e.g., images) increases sentiment analysis performance. State-of-the-art multimodal models, such as CLIP and VisualBERT, are pre-trained on datasets with the text paired with images. Although the results obtained by these models are promising, pre-training and sentiment analysis fine-tuning tasks of these models are computationally expensive. This paper introduces a transfer learning approach using joint fine-tuning for sentiment analysis. Our proposal achieved competitive results using a more straightforward alternative fine-tuning strategy that leverages different pre-trained unimodal models and efficiently combines them in a multimodal space. Moreover, our proposal allows flexibility when incorporating any pre-trained model for texts and images during the joint fine-tuning stage, being especially interesting for sentiment classification in low-resource scenarios.
1. Introduction
Methods for sentiment analysis have been widely studied in recent years, both in academia and industry (Birjali et al., 2021). The key idea is to automatically identify sentiment polarities from data, such as texts and images, in order to analyze people's opinions and emotions about products, services, or other entities (Zhang et al., 2018). Most existing methods focus on sentiment analysis on textual data (Poria et al., 2018). However, recently there has been a massive use of images and videos on social platforms, motivating sentiment analysis from other modalities (Zhu et al., 2022). Multimodal sentiment analysis was proposed to deal with these scenarios and combine the different modalities into more robust representations to improve sentiment classification.

*Equal contribution. ¹Institute of Mathematics and Computer Sciences (ICMC), University of São Paulo, São Carlos-SP, Brazil. Correspondence to: Guilherme Lourenço de Toledo <guitld@usp.br>, Ricardo Marcacini <ricardo.marcacini@icmc.usp.br>.

Proceedings of the LXAI Workshop at the 39th International Conference on Machine Learning, Baltimore, Maryland, USA, PMLR 162, 2022. Copyright 2022 by the author(s).
A crucial step for multimodal sentiment analysis in real-world applications is to obtain sufficient training data, especially when using state-of-the-art methods based on deep neural models. In unimodal scenarios, such methods already depend on large datasets for model training (Dang et al., 2020). In the multimodal scenario, there is an extra challenge associated with the need to align instances of the different modalities (Zhu et al., 2022). For example, a social media post must contain both the image and the associated text to form an instance in the multimodal scenario. Recent methods, such as CLIP (Radford et al., 2021) and VisualBERT (Li et al., 2019), are pre-trained on datasets with the text paired with images. Although the results obtained by these models are promising, the need for multimodal pre-training is computationally expensive. On the other hand, dozens of pre-trained models for images (e.g., ResNet and Inception) and texts (e.g., BERT and DistilBERT) have been proposed and made available for fine-tuning on specific tasks, which requires significantly less computational resources. Thus, we raise the following question: how can different pre-trained unimodal models be fine-tuned considering a multimodal objective for sentiment analysis tasks?
This paper introduces a transfer learning approach using joint fine-tuning for sentiment analysis, where joint fine-tuning is a training technique that considers multiple modalities. Pre-existing joint fine-tuning techniques assume that pre-trained models are originally multimodal (Yao et al., 2020). On the other hand, our proposal is agnostic to the initial unimodal pre-trained models. Moreover, we allow fine-tuning of both pre-trained models through a single loss function during the sentiment classifier training step. In practice, we are transferring knowledge from unimodal models that have been pre-trained on different (unpaired) image and text datasets. Our proposal is a deep neural network that incorporates two pre-trained models, one for each modality (i.e., text and image). A fusion layer is added to the output of these models to project a latent space that unifies both modalities. This latent space is a multimodal feature extractor used for sentiment classification.
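The fusion idea above can be sketched numerically: take the pooled output of each unimodal encoder, concatenate the two vectors, and project the result into a shared latent space that feeds a sentiment head. The sketch below uses random vectors as stand-ins for encoder outputs, and all dimensions (768 for text, 2048 for image, a 256-d latent space, three sentiment classes) are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for pooled unimodal encoder outputs (assumed dimensions:
# e.g., a BERT [CLS] embedding and a ResNet pooled feature).
text_emb = rng.normal(size=(1, 768))
image_emb = rng.normal(size=(1, 2048))

# Fusion layer: one linear map projects the concatenated unimodal
# features into a shared multimodal latent space.
d_latent = 256
W_fuse = rng.normal(scale=0.01, size=(768 + 2048, d_latent))
b_fuse = np.zeros(d_latent)

fused = np.concatenate([text_emb, image_emb], axis=1)  # shape (1, 2816)
latent = np.tanh(fused @ W_fuse + b_fuse)              # multimodal features

# Sentiment head on top of the latent space. In joint fine-tuning, the
# gradient of this head's single loss would flow back through the fusion
# layer into both pre-trained encoders, updating them together.
W_out = rng.normal(scale=0.01, size=(d_latent, 3))     # e.g., neg/neu/pos
logits = latent @ W_out
```

In a real implementation the two encoders would be trainable networks and the fusion and output layers would be learned jointly by backpropagating one classification loss, which is what allows unimodal models pre-trained on unpaired data to be combined.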
arXiv:2210.05790v1 [cs.LG] 11 Oct 2022