Transfer Learning with Joint Fine-Tuning for Multimodal Sentiment Analysis
Guilherme Lourenço de Toledo¹  Ricardo Marcacini¹
Abstract
Most existing methods focus on sentiment analysis of textual data. However, recently there has been a massive use of images and videos on social platforms, motivating sentiment analysis from other modalities. Current studies show that exploring other modalities (e.g., images) increases sentiment analysis performance. State-of-the-art multimodal models, such as CLIP and VisualBERT, are pre-trained on datasets with the text paired with images. Although the results obtained by these models are promising, pre-training and sentiment analysis fine-tuning tasks of these models are computationally expensive. This paper introduces a transfer learning approach using joint fine-tuning for sentiment analysis. Our proposal achieved competitive results using a more straightforward alternative fine-tuning strategy that leverages different pre-trained unimodal models and efficiently combines them in a multimodal space. Moreover, our proposal allows flexibility when incorporating any pre-trained model for texts and images during the joint fine-tuning stage, being especially interesting for sentiment classification in low-resource scenarios.
1. Introduction
Methods for sentiment analysis have been widely studied in recent years, both in academia and industry (Birjali et al., 2021). The key idea is to automatically identify sentiment polarities from data, such as texts and images, in order to analyze people's opinions and emotions about products, services, or other entities (Zhang et al., 2018). Most existing methods focus on sentiment analysis on textual data (Poria et al., 2018). However, recently there has been a massive use of images and videos on social platforms, motivating sentiment analysis from other modalities (Zhu et al., 2022). Multimodal sentiment analysis was proposed to deal with these scenarios and combine the different modalities into more robust representations to improve sentiment classification.

*Equal contribution. ¹Institute of Mathematics and Computer Sciences (ICMC), University of São Paulo, São Carlos-SP, Brazil. Correspondence to: Guilherme Lourenço de Toledo <guitld@usp.br>, Ricardo Marcacini <ricardo.marcacini@icmc.usp.br>.

Proceedings of the LXAI Workshop at the 39th International Conference on Machine Learning, Baltimore, Maryland, USA, PMLR 162, 2022. Copyright 2022 by the author(s).
A crucial step for multimodal sentiment analysis in real-world applications is to obtain sufficient training data, especially when using state-of-the-art methods based on deep neural models. In unimodal scenarios, such methods already depend on large datasets for model training (Dang et al., 2020). In the multimodal scenario, there is an extra challenge associated with the need to align instances of the different modalities (Zhu et al., 2022). For example, a social media post must contain both the image and the associated text to form an instance in the multimodal scenario. Recent methods, such as CLIP (Radford et al., 2021) and VisualBERT (Li et al., 2019), are pre-trained on datasets with the text paired with images. Although the results obtained by these models are promising, the need for multimodal pre-training is computationally expensive. On the other hand, dozens of pre-trained models for images (e.g., ResNet and Inception) and texts (e.g., BERT and DistilBERT) have been proposed and made available for fine-tuning on specific tasks, which requires significantly less computational resources. Thus, we raise the following question: how can different pre-trained unimodal models be fine-tuned considering a multimodal objective for sentiment analysis tasks?
This paper introduces a transfer learning approach using joint fine-tuning for sentiment analysis, where joint fine-tuning is a training technique that considers multiple modalities. Pre-existing joint fine-tuning techniques assume that pre-trained models are originally multimodal (Yao et al., 2020). On the other hand, our proposal is agnostic to the initial unimodal pre-trained models. Moreover, we allow fine-tuning of both pre-trained models through a single loss function during the sentiment classifier training step. In practice, we are transferring knowledge from unimodal models that have been pre-trained on different (unpaired) image and text datasets. Our proposal is a deep neural network that incorporates two pre-trained models, one for each modality (i.e., text and image). A fusion layer is added to the output of these models to project a latent space that unifies both modalities. This latent space is a multimodal feature extractor used for sentiment classification.
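The fusion idea above can be sketched numerically: take the pooled output of each unimodal encoder, concatenate the two vectors, and project the result into a shared latent space that feeds a sentiment head. The sketch below uses random vectors as stand-ins for encoder outputs, and all dimensions (768 for text, 2048 for image, a 256-d latent space, three sentiment classes) are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for pooled unimodal encoder outputs (assumed dimensions:
# e.g., a BERT [CLS] embedding and a ResNet pooled feature).
text_emb = rng.normal(size=(1, 768))
image_emb = rng.normal(size=(1, 2048))

# Fusion layer: one linear map projects the concatenated unimodal
# features into a shared multimodal latent space.
d_latent = 256
W_fuse = rng.normal(scale=0.01, size=(768 + 2048, d_latent))
b_fuse = np.zeros(d_latent)

fused = np.concatenate([text_emb, image_emb], axis=1)  # shape (1, 2816)
latent = np.tanh(fused @ W_fuse + b_fuse)              # multimodal features

# Sentiment head on top of the latent space. In joint fine-tuning, the
# gradient of this head's single loss would flow back through the fusion
# layer into both pre-trained encoders, updating them together.
W_out = rng.normal(scale=0.01, size=(d_latent, 3))     # e.g., neg/neu/pos
logits = latent @ W_out
```

In a real implementation the two encoders would be trainable networks and the fusion and output layers would be learned jointly by backpropagating one classification loss, which is what allows unimodal models pre-trained on unpaired data to be combined.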
arXiv:2210.05790v1 [cs.LG] 11 Oct 2022