
Multi-view Semantic Matching of Question Retrieval using
Fine-grained Semantic Representations
Li Chong
Renmin University of China
Beijing, China
chongli@ruc.edu.cn
Denghao Ma
Meituan
Beijing, China
madenghao5@gmail.com
Yueguo Chen
Renmin University of China
Beijing, China
chenyueguo@ruc.edu.cn
ABSTRACT
As a key task of question answering, question retrieval has attracted much attention from the communities of academia and industry. Previous solutions mainly focus on the translation model, topic model, and deep learning techniques. Distinct from these, we propose to construct fine-grained semantic representations of a question by assigning a learned importance score to each keyword, so that we can achieve a fine-grained question matching solution with these semantic representations of different lengths. Accordingly, we propose a multi-view semantic matching model that reuses the important keywords in multiple semantic representations.
As the key to constructing fine-grained semantic representations, we are the first to use a cross-task weakly supervised extraction model that applies question-question labelled signals to supervise the keyword extraction process (i.e., to learn the keyword importance). The extraction model integrates the deep semantic representation and lexical matching information with statistical features to estimate the importance of keywords. We conduct extensive experiments on three public datasets, and the experimental results show that our proposed model significantly outperforms the state-of-the-art solutions.
CCS CONCEPTS
• Computer systems organization → Embedded systems; Redundancy; Robotics; • Networks → Network reliability.
KEYWORDS
Question answering, Question retrieval, Semantic representation
ACM Reference Format:
Li Chong, Denghao Ma, and Yueguo Chen. 2018. Multi-view Semantic Matching of Question Retrieval using Fine-grained Semantic Representations. In Proceedings of Make sure to enter the correct conference title from your rights confirmation email (Conference acronym ’XX). ACM, New York, NY, USA, 10 pages. https://doi.org/XXXXXXX.XXXXXXX
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Conference acronym ’XX, June 03–05, 2018, Woodstock, NY
© 2018 Association for Computing Machinery.
ACM ISBN 978-1-4503-XXXX-X/18/06. . . $15.00
https://doi.org/XXXXXXX.XXXXXXX
1 INTRODUCTION
Question retrieval is to retrieve semantically equivalent questions from an archived repository with large quantities of questions. It has been verified to be very important to both community-based question answering (CQA) [14], e.g., Yahoo! Answers, Quora and Baidu Knows, and domain-specific question answering, e.g., Apple Siri and Microsoft Azure Bot. Although many researchers from the communities of academia and industry have paid much attention to the task, it still suffers from a key challenge, i.e., the lexical gap. The lexical gap contains two aspects: 1) textually distinct yet semantically equivalent; 2) textually similar yet semantically distinct. For example, in Figure 1, given three questions 𝑞 = “how to keep the mobile phone cool”, 𝑑1 = “stop my iphone from overheating” and 𝑑2 = “how to keep the mobile phone fast”, 𝑞 and 𝑑1 are textually distinct yet semantically equivalent, while 𝑞 and 𝑑2 are textually similar yet semantically distinct. The lexical gap hinders the effective retrieval of semantically equivalent questions from the archives for a user’s query (a question or some keywords).
To address the challenge of textually distinct yet semantically equivalent questions, many solutions have been proposed based on the translation model [13], topic model [8], and deep learning techniques [19]. The solutions based on the translation model [13, 30] learn translation probabilities for both words and phrases over some parallel corpora, and then use the translation probabilities to estimate the semantic similarity of two questions. The solutions based on the topic model [1, 8, 15] construct latent topic distributions for questions and estimate the semantic similarity of two questions based on their topic distributions. Deep learning solutions mainly focus on 1) modelling the semantic relations of two questions with different neural network architectures and then estimating their similarity based on the relations [16, 20, 28]; 2) constructing good semantic representations for questions, such as BERT [5] and ERNIE [22]. These solutions model the semantics of a question by a global representation, i.e., a bag of words, a topic distribution, or an embedding, and globally match two questions. Because of the global representation, questions with similar text yet distinct semantics have similar topic distribution and embedding representations; because of the global matching, the above solutions can hardly distinguish the subtle differences between the important keywords of two questions, and thus cannot effectively address the challenge of textually similar yet semantically distinct questions.
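To make this failure mode concrete, the following is a minimal sketch (not any of the cited models) of a global bag-of-words match applied to the Figure 1 example; the whitespace tokenization and cosine similarity here are our own illustrative assumptions:

```python
from collections import Counter
import math

def cosine_bow(a: str, b: str) -> float:
    """Cosine similarity of simple bag-of-words vectors, standing in
    for a 'global representation' match of two questions."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb)

q  = "how to keep the mobile phone cool"
d1 = "stop my iphone from overheating"
d2 = "how to keep the mobile phone fast"

# q and d2 differ only in one keyword (cool vs. fast), so a global
# match scores them as near-duplicates, while the semantically
# equivalent d1 shares no tokens with q and scores zero.
print(cosine_bow(q, d2))  # high: 6 of 7 tokens shared
print(cosine_bow(q, d1))  # zero token overlap
```

A fine-grained, keyword-importance-aware match would instead weight the single distinguishing keyword (“cool” vs. “fast”) heavily, which is the motivation for the representations proposed in this paper.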
To address the two aspects of the lexical gap, this paper attempts to find answers to the following two research questions: 1) How to represent questions? 2) How to match two questions?
Insight one: multi-level keyword sets. We propose new insights of reusing important keywords to construct fine-grained
arXiv:2210.11806v2 [cs.IR] 16 Feb 2023