Similarity between Units of Natural Language: The Transition from Coarse to Fine Estimation

Similarity between Units of Natural Language:
The Transition from Coarse to Fine Estimation
Submitted by
MU Wenchuan
Thesis Advisor
Dr. LIM Kwan Hui
arXiv:2210.14275v1 [cs.CL] 25 Oct 2022
A thesis submitted to the Singapore University of Technology and Design in fulfillment
of the requirement for the degree of Doctor of Philosophy
2022
PhD Thesis Examination Committee
TEC Chair: Prof. Yuen Chau
Main Advisor: Prof. Lim Kwan Hui
Internal TEC member 1: Prof. Mohan Rajesh Elara
Internal TEC member 2: Prof. Dorien Herremans
Abstract
Doctor of Philosophy
Similarity between Units of Natural Language: The Transition from Coarse to Fine
Estimation
by MU Wenchuan
Capturing the similarities between human language units is crucial for explaining how
humans associate different objects, and its computation has therefore received extensive attention
in both research and applications. With the ever-increasing amount of information around us,
calculating similarity becomes increasingly complex. In many settings, such as legal or medical
affairs, measuring similarity requires extra care and precision, as small acts within a language unit
can have significant real-world effects. My research goal in this thesis is to develop regression
models that account for similarities between language units in a more refined way.
Computation of similarity has come a long way, but approaches to debugging the measures
are often based on continually fitting human judgment values. In contrast, my goal is to develop
an algorithm that precisely catches loopholes in a similarity calculation. Furthermore, most
methods have vague definitions of the similarities they compute and are often difficult to interpret.
The proposed framework addresses both shortcomings. It continually improves the model by
catching different loopholes, and every refinement of the model comes with a reasonable
explanation. The regression model introduced in this thesis is called progressively refined
similarity computation, which combines attack testing with adversarial training. The similarity
regression model of this thesis achieves state-of-the-art performance in handling edge cases.
Chapter 2 is an introductory chapter on similarity computation in general. The four main chapters
of the thesis explore the applications of general similarities, how to capture their omissions, and
how to design a similarity model that can be refined over time.
The first practical work looks at applying similarity to classify topics in online discussions on
social networking services such as Twitter. The popularity of these services, and the large number
of tweets, challenge automatic topic detection models. To complicate matters, these topics
need to be identified in the absence of prior knowledge about their type and number, and expertise
is needed to tune numerous parameters. To address this challenge, I modified a cluster-based
topic modelling algorithm that was originally based on word networks and n-gram co-occurrence.
A more stable similarity scheme based on word embeddings is used to construct the networks, and the
refined algorithm can better utilise community detection methods to determine topics.
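
The idea of building a word network from embedding similarity and then grouping words into topics can be illustrated with a minimal sketch. This is not the algorithm from the thesis: the three-dimensional vectors in EMBEDDINGS, the threshold of 0.8, and the function names are all hypothetical, and connected components are used here only as a simplified stand-in for a real community detection method.

```python
import math
from itertools import combinations

# Toy "embeddings" with made-up values, for illustration only.
# Real word embeddings would have hundreds of dimensions.
EMBEDDINGS = {
    "goal": [0.9, 0.1, 0.0],
    "match": [0.8, 0.2, 0.1],
    "striker": [0.85, 0.15, 0.05],
    "vote": [0.1, 0.9, 0.1],
    "election": [0.05, 0.95, 0.0],
    "ballot": [0.1, 0.85, 0.2],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_graph(embeddings, threshold):
    """Connect two words when their cosine similarity reaches the threshold."""
    graph = {word: set() for word in embeddings}
    for a, b in combinations(embeddings, 2):
        if cosine(embeddings[a], embeddings[b]) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Group nodes by reachability (a crude stand-in for community detection)."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


# Each component is treated as one candidate topic.
topics = connected_components(build_graph(EMBEDDINGS, threshold=0.8))
for topic in topics:
    print(sorted(topic))
```

With these toy vectors the sports-related and election-related words fall into separate groups. In practice, the threshold and the grouping method would both need tuning, and a modularity-based community detection method would likely replace the connected-components step.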
So far, similarity has been used as a binary value, even though a regression value is calculated. Only a
medium level of precision is required to determine whether two units are similar or not. However, regression