
Multi-view Semantic Matching of Question Retrieval using
Fine-grained Semantic Representations
Li Chong
Renmin University of China
Beijing, China
chongli@ruc.edu.cn
Denghao Ma
Meituan
Beijing, China
madenghao5@gmail.com
Yueguo Chen
Renmin University of China
Beijing, China
chenyueguo@ruc.edu.cn
ABSTRACT
As a key task of question answering, question retrieval has attracted much attention from the communities of academia and industry. Previous solutions mainly focus on the translation model, topic model, and deep learning techniques. Distinct from these, we propose to construct fine-grained semantic representations of a question by assigning a learned importance score to each keyword, so that we can achieve a fine-grained question matching solution with these semantic representations of different lengths. Accordingly, we propose a multi-view semantic matching model that reuses the important keywords in multiple semantic representations.
As the key to constructing fine-grained semantic representations, we are the first to use a cross-task weakly supervised extraction model that applies question-question labelled signals to supervise the keyword extraction process (i.e., to learn the keyword importance). The extraction model integrates the deep semantic representation and lexical matching information with statistical features to estimate the importance of keywords. We conduct extensive experiments on three public datasets, and the experimental results show that our proposed model significantly outperforms the state-of-the-art solutions.
CCS CONCEPTS
• Computer systems organization → Embedded systems; Redundancy; Robotics; • Networks → Network reliability.
KEYWORDS
Question answering, Question retrieval, Semantic representation
ACM Reference Format:
Li Chong, Denghao Ma, and Yueguo Chen. 2018. Multi-view Semantic Matching of Question Retrieval using Fine-grained Semantic Representations. In Proceedings of Make sure to enter the correct conference title from your rights confirmation email (Conference acronym ’XX). ACM, New York, NY, USA, 10 pages. https://doi.org/XXXXXXX.XXXXXXX
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Conference acronym ’XX, June 03–05, 2018, Woodstock, NY
© 2018 Association for Computing Machinery.
ACM ISBN 978-1-4503-XXXX-X/18/06. . . $15.00
https://doi.org/XXXXXXX.XXXXXXX
1 INTRODUCTION
Question retrieval is to retrieve semantically equivalent questions from an archived repository with large quantities of questions. It has been verified to be very important to both community-based question answering (CQA) [14], e.g., Yahoo! Answers, Quora and Baidu Knows, and domain-specific question answering, e.g., Apple Siri and Microsoft Azure Bot. Although many researchers from the communities of academia and industry have paid much attention to the task, it still suffers from a key challenge, i.e., the lexical gap. The lexical gap contains two aspects: 1) textually distinct yet semantically equivalent; 2) textually similar yet semantically distinct. For example, in Figure 1, given three questions 𝑞 = “how to keep the mobile phone cool”, 𝑑1 = “stop my iphone from overheating” and 𝑑2 = “how to keep the mobile phone fast”, 𝑞 and 𝑑1 are textually distinct yet semantically equivalent, while 𝑞 and 𝑑2 are textually similar yet semantically distinct. The lexical gap hinders the effective retrieval of semantically equivalent questions from the archives for a user’s query (a question or some keywords).
To address the challenge of textually distinct yet semantically equivalent questions, many solutions have been proposed based on the translation model [13], topic model [8], and deep learning techniques [19]. The solutions based on the translation model [13, 30] learn translation probabilities for both words and phrases over some parallel corpora, and then use the translation probabilities to estimate the semantic similarity of two questions. The solutions based on the topic model [1, 8, 15] construct latent topic distributions for questions and estimate the semantic similarity of two questions based on their topic distributions. Deep learning solutions mainly focus on 1) modelling the semantic relations of two questions with different neural network architectures and then estimating their similarity based on the relations [16, 20, 28]; 2) constructing good semantic representations for questions, such as BERT [5] and ERNIE [22]. These solutions model the semantics of a question by a global representation, i.e., a bag of words, a topic distribution, or an embedding, and globally match two questions. Because of the global representation, questions with similar text yet distinct semantics have similar topic distribution and embedding representations; because of the global matching, the above solutions can hardly distinguish the subtle differences between the important keywords of two questions, and thus cannot effectively address the challenge of textually similar yet semantically distinct questions.
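To make this failure mode concrete, the following is a minimal sketch (not any of the cited models) of a global bag-of-words match applied to the Figure 1 example; the whitespace tokenization and cosine similarity here are our own illustrative assumptions:

```python
from collections import Counter
import math

def cosine_bow(a: str, b: str) -> float:
    """Cosine similarity of simple bag-of-words vectors, standing in
    for a 'global representation' match of two questions."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb)

q  = "how to keep the mobile phone cool"
d1 = "stop my iphone from overheating"
d2 = "how to keep the mobile phone fast"

# q and d2 differ only in one keyword (cool vs. fast), so a global
# match scores them as near-duplicates, while the semantically
# equivalent d1 shares no tokens with q and scores zero.
print(cosine_bow(q, d2))  # high: 6 of 7 tokens shared
print(cosine_bow(q, d1))  # zero token overlap
```

A fine-grained, keyword-importance-aware match would instead weight the single distinguishing keyword (“cool” vs. “fast”) heavily, which is the motivation for the representations proposed in this paper.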
To address the two aspects of the lexical gap, this paper attempts to find answers to the following two research questions: 1) How to represent questions? 2) How to match two questions?
Insight one: multi-level keyword sets. We propose new insights of reusing important keywords to construct fine-grained
arXiv:2210.11806v2 [cs.IR] 16 Feb 2023