DISENTANGLED AND ROBUST REPRESENTATION LEARNING
FOR BRAGGING CLASSIFICATION IN SOCIAL MEDIA
Xiang Li1, Yucheng Zhou2
1College of Intelligence and Computing, Tianjin University,
2Australian AI Institute, School of Computer Science, FEIT, University of Technology Sydney
lixiang eren@tju.edu.cn, yucheng.zhou-1@student.uts.edu.au
ABSTRACT
Researching bragging behavior on social media has aroused the
interest of computational (socio)linguists. However, existing
bragging classification datasets suffer from a serious data
imbalance issue. Because labeling a balanced dataset is
expensive, most methods introduce external knowledge to
improve model learning. Nevertheless, such methods inevitably
introduce noise and irrelevant information from external
knowledge. To overcome this drawback, we propose a
novel bragging classification method with disentangle-based
representation augmentation and domain-aware adversarial
strategy. Specifically, the model learns to disentangle and
reconstruct representations and to generate augmented features
via disentangle-based representation augmentation. Moreover,
the domain-aware adversarial strategy constrains the domain
of the augmented features to improve their robustness.
Experimental results demonstrate that our method achieves
state-of-the-art performance compared to other methods.
Index Terms: Bragging Classification, Disentangled
Feature, Adversarial Learning, Social Media
1. INTRODUCTION
Bragging classification aims to predict the bragging type for a
social media text. As online communication on social media
becomes more pervasive and essential in human life, bragging
(or self-promotion) classification has become a significant
area of computational (socio)linguistics [1, 2]. It has been
widely applied in academia and industry, such as helping
linguists dive into the context and types of bragging [2],
supporting social scientists in studying the relation between
bragging and other traits (e.g., gender, age, economic status,
occupation) [1, 3], enhancing online users' self-presentation
strategies [4, 5], and enabling many real-world NLP applications
in business, economics, and education [6, 7].
Although bragging has been widely studied in the context
of online communication and forums, these studies all rely
on manual analyses of small datasets [8, 4, 9, 10, 3]. To
efficiently research bragging on social media, Jin et al. [2]
collect the first large-scale dataset for bragging classification in
computational linguistics, which contains six bragging types
and a non-bragging type. However, the dataset suffers from
a severe data imbalance issue. For example, there are 2,838
examples of the non-bragging type, while each bragging type
has only 58 to 166 (i.e., 1% to 4%). This imbalance severely
hinders the model's learning on examples of these bragging types.
To alleviate the data imbalance issue, apart from employing
a weighted loss function to balance sample learning across
types [11, 12], many researchers perform data augmentation
by injecting models with external knowledge, such as knowledge
graphs [13, 14], pre-trained word embeddings [15, 16],
translation [17], and related pragmatics tasks, e.g., complaint
severity classification [18]. For bragging classification,
Jin et al. [2] inject language models with external knowledge
from the NRC word-emotion lexicon, Linguistic Inquiry and
Word Count (LIWC), and vectors clustered by Word2Vec.
Despite their success, the improvement from external knowledge
injection relies on the relevance between bragging classification
and the other pragmatics tasks. However, the knowledge provided
by these tasks is fixed and obtained in a model-based manner,
which inevitably introduces noise.
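The weighted-loss baseline mentioned above can be sketched as follows: class weights inversely proportional to type frequency upweight the rare bragging types. Only the 2,838 non-bragging count and the 58-166 range are from the dataset's reported statistics; the individual per-type counts below are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Illustrative per-type counts: one non-bragging type plus six bragging
# types (only the 2,838 and the 58-166 range match reported statistics).
counts = torch.tensor([2838.0, 166.0, 140.0, 120.0, 100.0, 80.0, 58.0])

# Inverse-frequency weights: rare types get proportionally larger weights.
weights = counts.sum() / (len(counts) * counts)
criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(4, 7)          # a batch of 4 examples, 7 classes
labels = torch.tensor([0, 3, 6, 1])
loss = criterion(logits, labels)    # rare-type examples contribute more
```

With these weights, a misclassified example of the rarest type (58 instances) contributes roughly 50 times more to the loss than a non-bragging one, which partially compensates for the skewed label distribution.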
To eliminate the noise from external knowledge injection,
we propose disentangle-based feature augmentation, which
learns disentangled representations and augmented features
without any external knowledge. Specifically, we
first disentangle content and bragging-type information from
a representation. Next, we generate a reconstructed representation
by integrating the disentangled information, and then constrain
the consistency between the original and reconstructed
representations. To address the data imbalance problem, we
fuse disentangled information from different examples to
generate augmented features for model training.
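The disentangle-reconstruct-fuse procedure can be sketched as below. The projection heads, the linear decoder, the MSE consistency loss, and the in-batch pairing via `torch.roll` are all assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class DisentangleAugmenter(nn.Module):
    """Sketch of disentangle-based representation augmentation.

    Two projection heads split an encoder representation h into content
    and bragging-type components; a decoder reconstructs h from their
    concatenation, and components from different in-batch examples are
    fused to produce augmented features.
    """

    def __init__(self, hidden: int = 768):
        super().__init__()
        self.content_head = nn.Linear(hidden, hidden // 2)  # content info
        self.type_head = nn.Linear(hidden, hidden // 2)     # bragging-type info
        self.decoder = nn.Linear(hidden, hidden)            # reconstruction

    def forward(self, h: torch.Tensor):
        c, t = self.content_head(h), self.type_head(h)
        # Reconstruct and constrain consistency with the original representation.
        h_rec = self.decoder(torch.cat([c, t], dim=-1))
        loss_rec = nn.functional.mse_loss(h_rec, h)
        # Fuse disentangled parts across examples (roll pairs each example's
        # content with another example's type info) to generate augmented features.
        h_aug = self.decoder(torch.cat([c, torch.roll(t, 1, dims=0)], dim=-1))
        return h_rec, h_aug, loss_rec
```

Since the fused feature inherits the rolled example's bragging-type component, it can be labeled with that example's type, yielding extra training samples for the rare classes.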
Moreover, we propose a domain-aware adversarial strategy
to mitigate the domain disorder caused by augmented features.
Specifically, we place a discriminator on top of the language
model, which is trained to distinguish whether its input is a
representation from the encoder or an augmented feature.
Meanwhile, trained jointly with the classification objective,
the encoder learns to fool the discriminator, which pushes
arXiv:2210.15180v1 [cs.CL] 27 Oct 2022
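The adversarial objective described above might be sketched as follows. The discriminator architecture and the use of a gradient-reversal layer to let the encoder fool the discriminator in a single backward pass are assumptions, not details confirmed by the text.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Gradient reversal: identity in the forward pass, negated gradient
    in the backward pass, so minimizing the discriminator loss trains the
    discriminator while simultaneously pushing the encoder to fool it."""

    @staticmethod
    def forward(ctx, x):
        return x

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

# Hypothetical discriminator on top of the language model: scores whether
# its input is an encoder representation or an augmented feature.
discriminator = nn.Sequential(
    nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 1)
)

def adversarial_loss(h_real: torch.Tensor, h_aug: torch.Tensor) -> torch.Tensor:
    """BCE loss for telling encoder representations (label 1) from
    augmented features (label 0), with reversed gradients to the encoder."""
    feats = torch.cat([h_real, h_aug], dim=0)
    labels = torch.cat([torch.ones(h_real.size(0), 1),
                        torch.zeros(h_aug.size(0), 1)], dim=0)
    logits = discriminator(GradReverse.apply(feats))
    return nn.functional.binary_cross_entropy_with_logits(logits, labels)
```

In training, this loss would be added to the weighted classification objective, constraining the augmented features to stay in the same domain as genuine encoder representations.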