Deploying a Retrieval based Response Model for Task Oriented Dialogues
Lahari Poddar Gyuri Szarvas Cheng Wang
Jorge Balazs Pavel Danchenko Patrick Ernst
Amazon
{poddarl, szarvasg, cwngam, jabalazs, danchenk, peernst}@amazon.com
Abstract
Task-oriented dialogue systems in industry settings need to have high conversational capability, be easily adaptable to changing situations and conform to business constraints. This paper describes a 3-step procedure to develop a conversational model that satisfies these criteria and can efficiently scale to rank a large set of response candidates. First, we provide a simple algorithm to semi-automatically create a high-coverage template set from historic conversations without any annotation. Second, we propose a neural architecture that encodes the dialogue context and applicable business constraints as profile features for ranking the next turn. Third, we describe a two-stage learning strategy with self-supervised training, followed by supervised fine-tuning on limited data collected through a human-in-the-loop platform. Finally, we describe offline experiments and present results of deploying our model with human-in-the-loop to converse with live customers online.
*These authors contributed equally
arXiv:2210.14379v1 [cs.CL] 25 Oct 2022
1 Introduction
A Task Oriented Dialogue (TOD) system aims to accomplish specific tasks such as hotel reservation (Budzianowski et al., 2018), flight booking, customer support (Moore et al., 2021) and so on. An end-to-end TOD system directly takes a multi-turn dialogue context as input and predicts the next response with a single model (Wen et al., 2016). These can be developed using either retrieval-based approaches (Tao et al., 2021; Chen et al., 2017), where the model ranks responses from a pre-constructed response pool, or generative approaches, where a response is sequentially generated with encoder-decoder architectures (Serban et al., 2017; Sordoni et al., 2015). Although generative models are widely studied in the literature for dialogue systems (Hosseini-Asl et al., 2020; Yang et al., 2021) as they are capable of generating free text, it is nearly impossible to provide guarantees on the style, quality and privacy risks for their real-world applications.
Figure 1: Production Ranking Models. (a) V0: Response Ranking with Poly-Encoder. (b) V1: Response Ranking with Shared BERT. The dialogue history, response and profile features are encoded with transformers (top) or using a shared BERT. Cross-attention layers learn the semantic correlation between history, features and candidate response. A score function scores and ranks candidate responses.
In this work, we focus on the development and deployment of a retrieval-based conversational system for an online retail store, in the customer service domain.
Our main contributions are:
1. We design a simple yet effective algorithm for generating a large, representative response pool from un-annotated dialogues and show that it can achieve high coverage for handling natural language conversations.
2. We present an approach which combines self-supervised training (from human-human conversations) and supervised fine-tuning (from human-in-the-loop interactions) for learning dialogue models in real industry settings.
3. We enhance the state-of-the-art Poly-Encoder architecture for retrieval-based dialogue systems, incorporating multi-modal information from dialogue text and non-textual features associated with the order and the customer.
4. We present a breakdown of the development and deployment stages of the conversational system, from offline evaluation, to a controlled human-in-the-loop setting, to fully online deployment on live traffic with real customer contacts.
2 Related Work
Figure 2: Template coverage on general conversations for the Return Refund intent. The upper bound is established by adding templates to the pool based on human expert suggestions through several months of active use.
Retrieval-based dialogue systems (Tao et al., 2021) involve single- and multi-turn response matching (Chen et al., 2017; Lu et al., 2019; Henderson et al., 2019; Gu et al., 2020; Whang et al., 2020; Poddar et al., 2022; Xu et al., 2021; Vig and Ramea, 2019). The selection of an appropriate response is usually based on computing and ranking the similarity between context and response. Two popular model architectures for such similarity computation are Cross-encoders (Wolf et al., 2019), which perform full self-attention over a given input and label candidate, and Bi-encoders (Dinan et al., 2018), which encode the input and candidate separately and combine them at the end for a final representation. Bi-encoders can cache the encoded candidates and reuse their representations for fast inference. Cross-encoders, on the other hand, often achieve higher accuracy but are prohibitively slow at test time. A recent method, Poly-encoders (Humeau et al., 2019), combines the strengths of the two architectures: it allows caching response representations while implementing an attention mechanism between context and response for improved performance.
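The difference between Bi-encoder and Poly-encoder scoring can be illustrated with a minimal NumPy sketch. It assumes the context tokens and the candidates have already been encoded into vectors (the encoders themselves are omitted), and the function names, array shapes and the number of codes are illustrative assumptions, not the production models of Figure 1.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def bi_encoder_scores(ctx_vec, cand_vecs):
    """Bi-encoder: dot product of one context vector (d,) with cached
    candidate vectors (N, d). No context-candidate attention."""
    return cand_vecs @ ctx_vec

def poly_encoder_scores(ctx_tokens, codes, cand_vecs):
    """Poly-encoder style scoring (sketch).

    ctx_tokens: (T, d) token-level outputs of the context encoder
    codes:      (m, d) learned query codes (random here, for illustration)
    cand_vecs:  (N, d) cached candidate embeddings
    """
    # Each of the m codes attends over the context tokens,
    # giving m global context features.
    attn = softmax(codes @ ctx_tokens.T, axis=-1)      # (m, T)
    ctx_feats = attn @ ctx_tokens                       # (m, d)
    # Each candidate attends over the m context features,
    # giving a candidate-specific context vector.
    weights = softmax(cand_vecs @ ctx_feats.T, axis=-1)  # (N, m)
    ctx_per_cand = weights @ ctx_feats                   # (N, d)
    # Final score: dot product of candidate and its context vector.
    return np.sum(ctx_per_cand * cand_vecs, axis=-1)     # (N,)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ctx_tokens = rng.normal(size=(12, 16))   # hypothetical encoded context
    codes = rng.normal(size=(4, 16))         # hypothetical m = 4 codes
    cand_vecs = rng.normal(size=(100, 16))   # hypothetical cached pool
    scores = poly_encoder_scores(ctx_tokens, codes, cand_vecs)
    ranking = np.argsort(-scores)            # best candidate first
    print(ranking[:5])
```

Only the two small attention steps depend on the incoming context, so the candidate embeddings can be computed once and reused across requests, which is the caching property the paragraph above refers to.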
Transformer-based architectures (Vaswani et al., 2017; Devlin et al., 2019) are widely used to encode information in TOD systems. For instance, TOD-BERT (Wu et al., 2020) incorporates user and system tokens into the masked language modeling task and uses a contrastive objective function to simulate the response selection task. In this work, we also adapt the Transformer architecture and enhance Poly-Encoders to encode conversational history, response and profile features.
3 Response Pool Creation
We semi-automatically extract a broad template pool from a large number of anonymized human dialogues. We first select the template texts from human responses in actual dialogues. This ensures that the bot language conforms to the desired style.
Our primary selection criteria for response candidates are frequency and novelty. We iteratively select sentences that are (1) most frequently used in human dialogues, and (2) contain information different from already selected responses (detailed algorithm in Appendix A). This directly maximizes the dialogue model's coverage, as measured by the fraction of contexts for which the model has a suitable response in the pool. An alternative approach would have been clustering frequent sentences and selecting a representative for each cluster (Hong et al., 2020) as templates. We instead opted for the deterministic procedure, which is more intuitive for ingesting prior linguistic knowledge and provides interpretability.
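The frequency-and-novelty selection described above can be sketched as a greedy loop. This is not the algorithm of Appendix A, which is not reproduced here: the Jaccard token overlap used as the novelty test, the thresholds, and the names `select_templates` and `coverage` are all assumptions made for illustration. The coverage function likewise approximates "a suitable response in the pool" by token overlap with the human response.

```python
from collections import Counter

def tokens(sentence):
    """Lowercased whitespace tokens as a set (a deliberately crude tokenizer)."""
    return set(sentence.lower().split())

def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def select_templates(agent_sentences, pool_size, max_similarity=0.5):
    """Greedily pick the most frequent sentences that are novel with
    respect to the templates already selected.

    max_similarity is a hypothetical novelty threshold: a sentence is
    skipped if it overlaps too much with any selected template.
    """
    counts = Counter(s.strip() for s in agent_sentences if s.strip())
    selected = []
    # most_common() yields sentences by decreasing frequency, so one pass
    # equals repeatedly taking the most frequent remaining novel sentence.
    for sentence, _ in counts.most_common():
        if len(selected) >= pool_size:
            break
        cand = tokens(sentence)
        if all(jaccard(cand, tokens(t)) <= max_similarity for t in selected):
            selected.append(sentence)
    return selected

def coverage(human_responses, pool, min_similarity=0.5):
    """Fraction of human responses with a sufficiently similar template
    in the pool (an assumed proxy for 'suitable response')."""
    if not human_responses:
        return 0.0
    pool_tokens = [tokens(t) for t in pool]
    hits = sum(
        any(jaccard(tokens(r), p) >= min_similarity for p in pool_tokens)
        for r in human_responses
    )
    return hits / len(human_responses)

if __name__ == "__main__":
    history = (
        ["I can help you with that."] * 3
        + ["I can help you with that!"] * 2
        + ["Your refund has been issued."] * 2
        + ["Thanks for contacting us."]
    )
    pool = select_templates(history, pool_size=2)
    print(pool)
    print(coverage(["I can help you with that.", "Please hold on."], pool))
```

Because the procedure is deterministic, a human expert can inspect why a sentence was skipped and adjust the threshold or add templates by hand, which is the interpretability argument made above.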
Quantitative Evaluation of Coverage: Figure 2