Towards Domain-Independent Supervised Discourse Parsing Through Gradient Boosting Patrick Huber and Giuseppe Carenini

Towards Domain-Independent Supervised Discourse Parsing
Through Gradient Boosting
Patrick Huber and Giuseppe Carenini
Department of Computer Science
University of British Columbia
Vancouver, BC, Canada, V6T 1Z4
{huberpat, carenini}@cs.ubc.ca
1 Introduction
Discourse analysis and discourse parsing have
had a substantial impact on many important
problems in the field of Natural Language
Processing (NLP) (e.g., Ji and Smith (2017);
Bhatia et al. (2015); Nejat et al. (2017);
Gerani et al. (2014)). Given the direct impact
of discourse annotations on model performance
and interpretability, robustly extracting
discourse structures from arbitrary documents
is a key task for further improving
computational models in NLP. To this end, a
variety of complementary discourse theories have
been proposed in the past, such as the
lexicalized discourse framework (Webber et al.,
2003), Segmented Discourse Representation
Theory (SDRT) (Asher, 1993; Asher et al., 2003),
and Rhetorical Structure Theory (RST) (Mann and
Thompson, 1988). This work uses RST, which
focuses on the semantic and pragmatic structure
of complete monologue documents.
Despite the importance of discourse analysis
and discourse parsing for NLP and the value of
RST for many downstream applications, a major
obstacle to the wider use of discourse
information is severe data sparsity. For
instance, the popular RST-DT (Carlson et al.,
2002) and GUM (Zeldes, 2017) treebanks do not
exceed 400 documents. Although data sparsity is
a long-standing problem, modern data-intensive
machine learning approaches make it even more
acute.
In general, three modelling alternatives have
been established in the current landscape: (i)
Supervised approaches (e.g., Ji and Eisenstein
(2014); Feng and Hirst (2014); Joty et al. (2015);
Li et al. (2016); Wang et al. (2017); Guz
et al. (2020)), which perform well in the domain
in which they are trained but suffer severely
reduced performance under a domain shift, as
shown in Huber and Carenini (2019, 2020).
(ii) Distantly supervised models (e.g., Huber
and Carenini (2019, 2020); Nishida and Nakayama
(2020); Karimi and Tang (2019); Liu et al.
(2019); Huber et al. (2021); Xiao et al.
(2020)), which aim to overcome the domain
adaptation problem by exploiting large-scale
supervised datasets from context-sensitive
auxiliary tasks (e.g., sentiment analysis).
(iii) Self-supervised/unsupervised methods
(e.g., Zhu et al. (2020); Koto et al. (2021);
Wu et al. (2020); Kobayashi et al. (2019);
Huber and Carenini (2021, 2022)), which predict
discourse structure from pre-trained language
models, from auto-encoder style frameworks, or
by recursively computing dissimilarity scores.
In this landscape of models aiming to overcome
the data sparsity and domain dependency of
current discourse parsers, we present a new
supervised paradigm that directly tackles the
domain adaptation issue. Specifically, we
introduce the first fully supervised discourse
parser designed to alleviate domain dependency
through a staged model of weak classifiers,
bringing the gradient boosting framework
(Schaal and Atkeson, 1995; Drucker et al., 1994;
Schwenk and Bengio, 1997; Badirli et al., 2020)
into the process of discourse parsing. We start
from the assumption that any discourse treebank
contains a mix of frequently appearing, general
discourse features (applicable to any domain)
and a number of dataset-related nuances (which
are domain-specific), and postulate that
successive weak classifiers are likely to learn
increasingly specific and rare features of the
training data. Under this assumption, it is
reasonable to expect that there exists a
threshold number of weak classifiers that
effectively separates the general features of
discourse from
arXiv:2210.09565v1 [cs.CL] 18 Oct 2022
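The staged model of weak classifiers described in the introduction can be illustrated with a generic gradient boosting loop. The following is a minimal sketch, not the authors' parser: it assumes squared loss, one-dimensional toy data, and regression stumps as weak learners, and all names (fit_stump, fit_boosted, predict, n_stages, lr) are hypothetical. It shows the one property the argument relies on: each stage fits what the earlier stages left unexplained, and prediction can be truncated to the first k stages.

```python
from typing import List, Optional, Tuple

# Hypothetical weak learner: (threshold, value_if_left, value_if_right).
Stump = Tuple[float, float, float]


def fit_stump(xs: List[float], residuals: List[float]) -> Stump:
    """Fit a one-split regression stump to the residuals by least squares."""
    best_err, best_stump = float("inf"), (0.0, 0.0, 0.0)
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv = sum(left) / len(left)  # never empty, since t is one of the xs
        rv = sum(right) / len(right) if right else 0.0
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if err < best_err:
            best_err, best_stump = err, (t, lv, rv)
    return best_stump


def predict(stages: List[Stump], x: float,
            n_stages: Optional[int] = None, lr: float = 0.5) -> float:
    """Sum the scaled outputs of the first n_stages weak learners (all if None)."""
    used = stages if n_stages is None else stages[:n_stages]
    return sum((lr * (lv if x <= t else rv) for t, lv, rv in used), 0.0)


def fit_boosted(xs: List[float], ys: List[float],
                n_stages: int, lr: float = 0.5) -> List[Stump]:
    """Each new stage is fitted to the residuals of the ensemble built so far."""
    stages: List[Stump] = []
    for _ in range(n_stages):
        residuals = [y - predict(stages, x, lr=lr) for x, y in zip(xs, ys)]
        stages.append(fit_stump(xs, residuals))
    return stages


def mse(stages: List[Stump], xs: List[float], ys: List[float], k: int) -> float:
    """Mean squared training error when only the first k stages are used."""
    return sum((y - predict(stages, x, n_stages=k)) ** 2
               for x, y in zip(xs, ys)) / len(xs)


# Toy data (an assumption for illustration): a dominant step pattern plus
# one deviating point that only later stages can account for.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.6]
stages = fit_boosted(xs, ys, n_stages=10)
errors = [mse(stages, xs, ys, k) for k in range(len(stages) + 1)]
```

Here errors[k] is the training error of the ensemble truncated to its first k stages. Choosing a cut-off k below the total number of stages is the toy analogue of the threshold hypothesised in the text. The sketch does not show how such a threshold would be chosen for a discourse parser.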