

Deploying a Retrieval based Response Model for Task Oriented Dialogues

Lahari Poddar∗, Gyuri Szarvas∗, Cheng Wang, Jorge Balazs, Pavel Danchenko, Patrick Ernst

Amazon

{poddarl, szarvasg, cwngam, jabalazs, danchenk, peernst}@amazon.com

Abstract

Task-oriented dialogue systems in industry settings need to have high conversational capability, be easily adaptable to changing situations and conform to business constraints. This paper describes a 3-step procedure to develop a conversational model that satisfies these criteria and can efficiently scale to rank a large set of response candidates. First, we provide a simple algorithm to semi-automatically create a high-coverage template set from historic conversations without any annotation. Second, we propose a neural architecture that encodes the dialogue context and applicable business constraints as profile features for ranking the next turn. Third, we describe a two-stage learning strategy with self-supervised training, followed by supervised fine-tuning on limited data collected through a human-in-the-loop platform. Finally, we describe offline experiments and present results of deploying our model with human-in-the-loop to converse with live customers online.

1 Introduction

A Task Oriented Dialogue (TOD) system aims to accomplish specific tasks such as hotel reservation (Budzianowski et al., 2018), flight booking, customer support (Moore et al., 2021) and so on. An end-to-end TOD system directly takes a multi-turn dialogue context as input and predicts the next response with a single model (Wen et al., 2016). These can be developed using either retrieval-based approaches (Tao et al., 2021; Chen et al., 2017), where the model ranks candidate responses from a pre-constructed response pool, or generative approaches, where a response is sequentially generated with encoder-decoder architectures (Serban et al., 2017; Sordoni et al., 2015). Although generative models are widely studied in the literature for dialogue systems (Hosseini-Asl et al., 2020; Yang et al., 2021) as they are capable of generating free text, it is nearly impossible to provide guarantees on the style, quality and privacy risks for their real-world applications.

Figure 1: Production Ranking Models. (a) V0: Response Ranking with Poly-Encoder. (b) V1: Response Ranking with Shared BERT. The dialogue history, response and profile features are encoded with transformers (top) or using a shared BERT. Cross-attention layers learn the semantic correlation between history, features and candidate response. A score function computes a score for each candidate response and ranks the candidates.

∗These authors contributed equally.

arXiv:2210.14379v1 [cs.CL] 25 Oct 2022

In this work, we focus on the development and deployment of a retrieval-based conversational system for an online retail store, in the customer service domain. Our main contributions are:

1. We design a simple yet effective algorithm for generating a large, representative response pool from un-annotated dialogues and show that it can achieve high coverage for handling natural language conversations.

2. We present an approach which combines self-supervised training (from human-human conversations) and supervised fine-tuning (from human-in-the-loop interactions) for learning dialogue models in real industry settings.

3. We enhance the state-of-the-art Poly-Encoder architecture for retrieval-based dialogue systems, incorporating multi-modal information from dialogue text and non-textual features associated with the order and the customer.

4. We present a breakdown of the development and deployment stages of the conversational system: from offline evaluation, to a controlled human-in-the-loop setting, to fully online operation on live traffic with real customer contacts.

2 Related Work

Retrieval-based dialogue systems (Tao et al., 2021) involve single- and multi-turn response matching (Chen et al., 2017; Lu et al., 2019; Henderson et al., 2019; Gu et al., 2020; Whang et al., 2020; Poddar et al., 2022; Xu et al., 2021; Vig and Ramea, 2019). The selection of an appropriate response is usually based on computing and ranking the similarity between context and response. Two popular model architectures for such similarity computation are Cross-encoders (Wolf et al., 2019), which perform full self-attention over a given input and label candidate, and Bi-encoders (Dinan et al., 2018), which encode the input and candidate separately and combine them at the end for a final representation. Bi-encoders can cache the encoded candidates and reuse their representations for fast inference. Cross-encoders, on the other hand, often achieve higher accuracy but are prohibitively slow at test time. A recent method, Poly-encoders (Humeau et al., 2019), combines the strengths of the two architectures: it allows for caching response representations while implementing an attention mechanism between context and response for improved performance.

Transformer-based architectures (Vaswani et al., 2017; Devlin et al., 2019) are widely used to encode information in TOD systems. For instance, TOD-BERT (Wu et al., 2020) incorporates user and system tokens into the masked language modeling task and uses a contrastive objective function to simulate the response selection task. In this work, we also adapt Transformer architectures and enhance Poly-Encoders to encode conversational history, response and profile features.

Figure 2: Template coverage on general conversations for Return Refund intent. Upper bound is established by adding templates to the pool based on human expert suggestions through several months of active use.
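The Poly-encoder scoring discussed in this section can be illustrated with a minimal sketch. It follows the general recipe of Humeau et al. (2019): a small set of codes attends over the context token vectors, and each cached candidate embedding then attends over the resulting context vectors before a dot-product score is taken. The function names, the toy 2-dimensional vectors and the fixed codes are assumptions for illustration only; in a real model the token vectors, codes and candidate embeddings come from trained transformer encoders, and this sketch does not include the profile features used in the production models.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def weighted_sum(weights, vectors):
    """Return sum_i weights[i] * vectors[i] as a new vector."""
    dim = len(vectors[0])
    return [sum(w * vec[i] for w, vec in zip(weights, vectors)) for i in range(dim)]


def poly_encoder_score(context_tokens, codes, candidate):
    """Score one candidate embedding against a context (illustrative only)."""
    # Step 1 (candidate-independent): each code attends over the context
    # token vectors and yields one global context vector.
    context_vectors = [
        weighted_sum(softmax([dot(code, tok) for tok in context_tokens]), context_tokens)
        for code in codes
    ]
    # Step 2: the candidate embedding attends over those context vectors.
    weights = softmax([dot(candidate, vec) for vec in context_vectors])
    attended_context = weighted_sum(weights, context_vectors)
    # Step 3: the final score is a dot product, as in a Bi-encoder.
    return dot(attended_context, candidate)


def rank_candidates(context_tokens, codes, candidates):
    """Return candidate indices sorted by descending score, plus the scores."""
    scores = [poly_encoder_score(context_tokens, codes, c) for c in candidates]
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Toy inputs (hypothetical values, not taken from the paper).
context_tokens = [[1.0, 0.0], [0.0, 1.0]]
codes = [[1.0, 0.0], [0.0, 1.0]]
candidates = [[1.0, 0.0], [-1.0, 0.0], [0.0, 0.0]]  # could be precomputed and cached

order, scores = rank_candidates(context_tokens, codes, candidates)
print(order, scores)
```

Because the candidate embeddings enter only in steps 2 and 3, they can be computed once for the whole response pool and reused across requests, which is the caching property attributed to Bi-encoders and Poly-encoders above.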

3 Response Pool Creation

We semi-automatically extract a broad template pool from a large number of anonymized human dialogues. We first select the template texts from human responses in actual dialogues. This ensures that the bot language conforms to the desired style.

Our primary selection criteria for response candidates are frequency and novelty. We iteratively select sentences that are (1) most frequently used in human dialogues, and (2) contain information different from already selected responses (detailed algorithm in Appendix A). This directly maximizes the dialogue model's coverage, as measured by the fraction of contexts for which the model has a suitable response in the pool. An alternative approach would have been clustering frequent sentences and selecting a representative for each cluster (Hong et al., 2020) as templates. We instead opted for the deterministic procedure, which is more intuitive for ingesting prior linguistic knowledge and provides interpretability.
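The frequency-and-novelty selection described above can be sketched as a greedy loop. This is not the algorithm from Appendix A, which is not reproduced here: the novelty test (Jaccard overlap of whitespace tokens), the thresholds, the `coverage` proxy and all function names are assumptions chosen only to make the idea concrete.

```python
from collections import Counter


def tokens(sentence):
    """Lowercased whitespace tokens; punctuation is kept attached (a simplification)."""
    return set(sentence.lower().split())


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def select_templates(sentences, max_templates, max_similarity=0.5):
    """Greedily pick frequent sentences that differ from those already picked."""
    counts = Counter(s.strip() for s in sentences if s.strip())
    selected = []
    # most_common() yields sentences by descending frequency;
    # ties keep first-encountered order.
    for sentence, _count in counts.most_common():
        if len(selected) >= max_templates:
            break
        candidate = tokens(sentence)
        # Novelty check (assumed): reject near-duplicates of selected templates.
        if all(jaccard(candidate, tokens(t)) <= max_similarity for t in selected):
            selected.append(sentence)
    return selected


def coverage(responses, templates, min_similarity=0.5):
    """Assumed proxy: share of human responses that have a close template."""
    if not responses:
        return 0.0
    hits = sum(
        1
        for r in responses
        if any(jaccard(tokens(r), tokens(t)) >= min_similarity for t in templates)
    )
    return hits / len(responses)


# Hypothetical agent sentences, not data from the paper.
history = [
    "I can help you with that.",
    "I can help you with that.",
    "I can help you with that.",
    "I can help you with that today.",
    "I can help you with that today.",
    "Your refund has been issued.",
    "Please share your order number.",
]

templates = select_templates(history, max_templates=3)
print(templates)
print(coverage(history, templates))
```

In this toy run the near-duplicate "I can help you with that today." is skipped in favor of less frequent but novel sentences, which is the intended effect of combining the two criteria. A deterministic loop of this kind is also easy to inspect and to seed with hand-picked templates, in line with the interpretability argument above.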

Quantitative Evaluation of Coverage: Figure 2
