Multi-view Semantic Matching of Question retrieval using Fine-grained Semantic Representations

Li Chong
Renmin University of China
Beijing, China
chongli@ruc.edu.cn
Denghao Ma
Meituan
Beijing, China
madenghao5@gmail.com
Yueguo Chen
Renmin University of China
Beijing, China
chenyueguo@ruc.edu.cn
ABSTRACT
As a key task of question answering, question retrieval has attracted much attention from the communities of academia and industry. Previous solutions mainly focus on the translation model, topic model, and deep learning techniques. Distinct from these solutions, we propose to construct fine-grained semantic representations of a question by assigning a learned importance score to each keyword, so that we can achieve fine-grained question matching with these semantic representations of different lengths. Accordingly, we propose a multi-view semantic matching model that reuses the important keywords in multiple semantic representations.
As the key to constructing fine-grained semantic representations, we are the first to use a cross-task weakly supervised extraction model that applies question-question labelled signals to supervise the keyword extraction process (i.e., to learn the keyword importance). The extraction model integrates deep semantic representations and lexical matching information with statistical features to estimate the importance of keywords. We conduct extensive experiments on three public datasets, and the experimental results show that our proposed model significantly outperforms the state-of-the-art solutions.
CCS CONCEPTS
• Computer systems organization → Embedded systems; Redundancy; Robotics; • Networks → Network reliability.
KEYWORDS
Question answering, Question retrieval, Semantic representation
ACM Reference Format:
Li Chong, Denghao Ma, and Yueguo Chen. 2018. Multi-view Semantic Matching of Question retrieval using Fine-grained Semantic Representations. In Proceedings of Make sure to enter the correct conference title from your rights confirmation email (Conference acronym ’XX). ACM, New York, NY, USA, 10 pages. https://doi.org/XXXXXXX.XXXXXXX
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Conference acronym ’XX, June 03–05, 2018, Woodstock, NY
© 2018 Association for Computing Machinery.
ACM ISBN 978-1-4503-XXXX-X/18/06. . . $15.00
https://doi.org/XXXXXXX.XXXXXXX
1 INTRODUCTION
Question retrieval is to retrieve semantically equivalent questions from an archived repository with large quantities of questions. It has been verified to be very important to both community-based question answering (CQA) [14], e.g., Yahoo! Answers, Quora and Baidu Knows, and domain-specific question answering, e.g., Apple Siri and Microsoft Azure Bot. Although many researchers from the communities of academia and industry have paid much attention to the task, it still suffers from a key challenge, i.e., the lexical gap. The lexical gap has two aspects: 1) textually distinct yet semantically equivalent; 2) textually similar yet semantically distinct. For example in Figure 1, given three questions 𝑞 = “how to keep the mobile phone cool”, 𝑑1 = “stop my iphone from overheating” and 𝑑2 = “how to keep the mobile phone fast”, 𝑞 and 𝑑1 are textually distinct yet semantically equivalent, while 𝑞 and 𝑑2 are textually similar yet semantically distinct. The lexical gap hinders the effective retrieval of semantically equivalent questions from the archives for a user’s query (a question or some keywords).
To address the challenge of textually distinct yet semantically equivalent, many solutions have been proposed based on the translation model [13], topic model [8], and deep learning techniques [19]. The solutions based on the translation model [13, 30] learn translation probabilities for both words and phrases over some parallel corpora, and then use the translation probabilities to estimate the semantic similarity of two questions. The solutions based on the topic model [1, 8, 15] construct the latent topic distributions for questions and estimate the semantic similarity of two questions based on their topic distributions. Deep learning solutions mainly focus on 1) modelling the semantic relations of two questions by different neural network architectures and then estimating their similarity based on the relations [16, 20, 28]; 2) constructing good semantic representations for questions, such as BERT [5] and ERNIE [22].
These solutions model the semantics of a question by using a global representation, i.e., a bag of words, a topic distribution or an embedding, and match two questions globally. Because of the “global representation”, questions with similar text yet distinct semantics have similar topic-distribution and embedding representations; because of the “global matching”, the above solutions can hardly distinguish the subtle differences in the important keywords of two questions and thus cannot effectively address the challenge of textually similar yet semantically distinct.
To address the two aspects of the lexical gap, this paper attempts to find answers to the following two research questions: 1) How to represent questions? 2) How to match two questions?
arXiv:2210.11806v2 [cs.IR] 16 Feb 2023

Figure 1: Insights of reusing important keywords to construct the fine-grained representations and matchings.

Insight one: multi-level keyword sets. We propose new insights of reusing important keywords to construct fine-grained semantic representations of questions and then fine-grained matchings. For example in Figure 1, given two questions 𝑞 = “how to keep the mobile phone cool” and 𝑑1 = “stop my iphone from overheating”, their corresponding keywords can be extracted and assigned different importance scores. Based on the importance scores, the multi-level keyword sets can be generated, i.e., the question 𝑞 has 7 keyword sets and 𝑑1 has 5 keyword sets. For 𝑞, 𝐾𝑆1(𝑞) represents the finest semantic representation of 𝑞, and 𝐾𝑆7(𝑞) represents the coarsest one. The multi-level keyword sets can model the question semantics at various granularities, i.e., from “finest” to “coarsest”. The finest keyword set can be used for modelling the global semantics of a question, thus addressing the challenge of textually distinct yet semantically equivalent. The other sets reuse the important keywords so that the subtle differences in the important keywords of two questions can be distinguished, and the challenge of textually similar yet semantically distinct can then be addressed by the fine-grained matchings. For example in Figure 1, for the questions 𝑞 and 𝑑2, the matchings between 𝐾𝑆5(𝑞) and 𝐾𝑆5(𝑑2) as well as 𝐾𝑆6(𝑞) and 𝐾𝑆6(𝑑2) can effectively identify the semantic distinction between 𝑞 and 𝑑2.
Insight two: comparable keyword set pairs. Given two questions and their multi-level keyword sets, how to match them? For example in Figure 1, the questions 𝑞 and 𝑑1 have 7 and 5 keyword sets respectively. We match 𝐾𝑆1(𝑞) to 𝐾𝑆1(𝑑1), 𝐾𝑆2(𝑞) to 𝐾𝑆1(𝑑1), 𝐾𝑆3(𝑞) to 𝐾𝑆2(𝑑1), 𝐾𝑆4(𝑞) to 𝐾𝑆3(𝑑1), 𝐾𝑆5(𝑞) to 𝐾𝑆3(𝑑1), 𝐾𝑆6(𝑞) to 𝐾𝑆4(𝑑1), and 𝐾𝑆7(𝑞) to 𝐾𝑆5(𝑑1), because each pair is at the same or a similar semantic level and is therefore comparable. Conversely, 𝐾𝑆1(𝑞) and 𝐾𝑆5(𝑑1) as well as 𝐾𝑆7(𝑞) and 𝐾𝑆1(𝑑1) are not comparable because they are at different semantic levels. The matchings on comparable keyword set pairs can benefit the similarity estimation of two questions, while those on incomparable keyword set pairs will hurt it. Therefore, to effectively estimate the similarity of two questions, we need to construct the comparable keyword set pairs for the two questions.
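The pattern-based assignment method itself is introduced later in the paper; as an illustration only, a simple proportional (floor-division) alignment reproduces the example pairing of Figure 1, though the paper's actual method may differ in detail:

```python
def comparable_pairs(n, m):
    """Align n keyword-set levels of one question with m levels of another.

    A hypothetical proportional alignment (not the paper's exact method):
    level i of the first question is paired with the level of the second
    question at the same relative depth. Returns 1-based (i, j) pairs,
    pairing KSi(q) with KSj(d).
    """
    if n == 1:
        return [(1, 1)]
    # Floor division keeps early (fine) levels paired with fine levels
    # and late (coarse) levels paired with coarse levels.
    return [(i, (i - 1) * (m - 1) // (n - 1) + 1) for i in range(1, n + 1)]
```

For n = 7 and m = 5 this yields (1,1), (2,1), (3,2), (4,3), (5,3), (6,4), (7,5) — exactly the comparable pairs listed above for 𝑞 and 𝑑1.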
Solution: ne-grained matching network.
Based on the in-
sights, we propose the ne-grained matching network with two
cascaded units to estimate the similarity of two questions.
Fine-grained Representation Unit. According to the insight one, the
ne-grained representation unit constructs the multi-level keyword
sets of a question for modelling its ne-grained semantics. Because
of the deciency of keyword annotations, we are the rst to use
the question-question labeled signals from the training set of ques-
tion retrieval to supervise the keyword extraction (i.e., learn the
keyword importance), and develop a cross-task weakly supervised
extraction model. In the model, we integrate deep semantic repre-
sentations, lexical information, part-of-speech and statistical fea-
tures to estimate the importance of keywords in a question. Based
on the importance, the multi-level keyword sets are generated by
iteratively removing one keyword with the lowest importance.
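The generation step above can be sketched as follows, assuming the importance scores have already been produced by the extraction model (the scores in the usage example are made up for illustration):

```python
def multilevel_keyword_sets(keywords):
    """Generate multi-level keyword sets KS1..KSn for one question.

    `keywords` maps each extracted keyword to its learned importance
    score. KS1 keeps all keywords (finest representation); each further
    level drops the least important remaining keyword, down to the single
    most important keyword (coarsest representation).
    """
    remaining = dict(keywords)
    levels = []
    while remaining:
        levels.append(set(remaining))              # snapshot current level
        least = min(remaining, key=remaining.get)  # lowest-importance keyword
        del remaining[least]
    return levels
```

With hypothetical scores {"keep": 0.6, "mobile": 0.7, "phone": 0.8, "cool": 0.9}, this produces four levels, from the full set down to {"cool"}.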
Fine-grained Matching Unit. According to insight two, we design a pattern-based assignment method to construct the comparable keyword set pairs from the multi-level keyword sets of two questions. The fine-grained matching unit estimates the semantic similarity of two questions by matching their comparable keyword set pairs. First, it matches two questions from global matching to local matching by using comparable pairs of different granularities. Second, it matches every comparable pair from both the semantic matching and lexical matching perspectives. For the semantic matching, we develop two types of methods, i.e., MLP-based matching and attention-based matching; for the lexical matching, we use some widely used textual matching models such as BM25 and the Jaccard similarity. Third, it aggregates the matchings of the multiple comparable pairs by learning their weights, and outputs an overall score as the semantic similarity of the two questions.
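A minimal sketch of the lexical view and the weighted aggregation, assuming the comparable pairs and per-pair weights are given (in the full model the weights are learned, and BM25 plus the MLP/attention-based semantic views are aggregated as well):

```python
def jaccard(a, b):
    """Jaccard similarity of two keyword sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def aggregate_similarity(pairs, weights):
    """Weighted sum of per-pair lexical scores into one overall score.

    `pairs` is a list of comparable keyword-set pairs (KSi(q), KSj(d));
    `weights` are the per-pair weights, here assumed given and normalized.
    Only the Jaccard view is shown for brevity.
    """
    return sum(w * jaccard(a, b) for (a, b), w in zip(pairs, weights))
```

For the Figure 1 example, a pair such as ({"keep", "phone", "cool"}, {"keep", "phone", "fast"}) scores 0.5, exposing the "cool"/"fast" distinction that a global representation blurs.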
Our contributions are summarized as follows:
• We propose new insights of reusing important keywords to construct the fine-grained representations and matchings, and design the fine-grained matching network to estimate the similarity of two questions from multiple granularities and multiple views.
• We are the first to use question-question labeled signals to supervise the keyword extraction process, and develop the cross-task weakly supervised extraction model.