Team Flow at DRC2022: Pipeline System for Travel Destination
Recommendation Task in Spoken Dialogue
Ryu Hirai1, Atsumoto Ohashi1, Ao Guo1, Hideki Shiroma1, Xulin Zhou1,
Yukihiko Tone1, Shinya Iizuka2 and Ryuichiro Higashinaka1
Abstract: To improve the interactive capabilities of a
dialogue system, e.g., to adapt to different customers, the
Dialogue Robot Competition (DRC2022) was held. As one of the
teams, we built a dialogue system with a pipeline structure
containing four modules. The natural language understanding
(NLU) and natural language generation (NLG) modules were
GPT-2 based models, and the dialogue state tracking (DST) and
policy modules were designed on the basis of hand-crafted
rules. After the preliminary round of the competition, we
found that the low variation in training examples for the NLU
and failed recommendations caused by the policy were probably
the main reasons for the limited performance of the system.
I. INTRODUCTION
With the popularization of human-machine dialogue, dialogue
systems are expected to achieve objectives in various
situations, e.g., to respond appropriately to different
customers in a customer service task. To improve the
interactive capabilities of dialogue systems, the Dialogue
Robot Competition 2022 (DRC2022) [1] was held, following the
previous competition [2]. Each team was required to develop a
dialogue system embedded within a humanoid robot to handle the
“travel destination recommendation task.” In the task, the
robot plays the role of a counter salesperson and aims to
satisfy the customer by helping them choose one of two tourist
attractions.
This paper reports the work of the team “Flow” in
DRC2022. Our dialogue system was built with a pipeline
composed of four modules: natural language understanding
(NLU), dialogue state tracking (DST), policy, and natural
language generation (NLG). By configuring the system in
a pipeline fashion, (1) it is easy to tune the functionality
of each module, and (2) in the future, we can expect to
introduce a method, such as [3], to integrate all modules and
optimize the dialogue performance of the entire system. We
built the NLU and NLG modules by fine-tuning GPT-2 [4], a
popular large-scale language model, with our crowdsourced
data for the travel destination recommendation task. We
further designed the DST and policy modules with hand-
crafted rules.
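
As a rough illustration of how one generative language model can serve both as NLU (utterance to dialogue act) and as NLG (dialogue act to utterance), the sketch below serializes training pairs into single text sequences. The dialogue act notation `intent(slot=value)`, the separator, the end marker, and all function names are assumptions made for this example only; they are not the format used by our system.

```python
from typing import Dict, Tuple

SEP = " => "            # hypothetical separator between input and target text
EOS = "<|endoftext|>"   # assumed end marker (the end-of-text string of the original GPT-2)


def serialize_da(intent: str, slots: Dict[str, str]) -> str:
    """Render a dialogue act as text, e.g. inform(atmosphere=quiet)."""
    args = ", ".join(f"{key}={value}" for key, value in sorted(slots.items()))
    return f"{intent}({args})"


def parse_da(text: str) -> Tuple[str, Dict[str, str]]:
    """Inverse of serialize_da: recover (intent, slots) from generated text."""
    intent, _, rest = text.partition("(")
    body = rest[:-1] if rest.endswith(")") else rest
    slots: Dict[str, str] = {}
    for pair in body.split(","):
        pair = pair.strip()
        if not pair:
            continue  # skip the empty string produced by "intent()"
        key, _, value = pair.partition("=")
        slots[key.strip()] = value.strip()
    return intent.strip(), slots


def nlu_example(utterance: str, intent: str, slots: Dict[str, str]) -> str:
    """One NLU fine-tuning sequence: the utterance is the prompt, the DA is the target."""
    return f"{utterance}{SEP}{serialize_da(intent, slots)}{EOS}"


def nlg_example(intent: str, slots: Dict[str, str], utterance: str) -> str:
    """One NLG fine-tuning sequence: the DA is the prompt, the utterance is the target."""
    return f"{serialize_da(intent, slots)}{SEP}{utterance}{EOS}"
```

With such a format, the text generated after the separator at inference time would be passed through `parse_da` before being handed to the next module.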
After the preliminary round of the competition, we examined
the evaluation results and dialogue histories. We found two
main reasons for the limited performance: (1) the NLU was not
able to track the customer dialogue acts properly due to the
low variation in training examples for GPT-2, and (2) the
rules of the policy resulted in a recommendation strategy
that ignored customers’ preferences.

1 Graduate School of Informatics, Nagoya University, Japan
hirai.ryu.k6@s.mail.nagoya-u.ac.jp
2 School of Informatics, Nagoya University, Japan
Equal contribution.

[Figure 1: Customer -> customer utterance (speech) -> ASR ->
customer utterance (text) -> NLU -> customer DA -> DST ->
dialogue state -> Policy -> system DA -> NLG -> system
utterance (text) -> TTS -> system utterance (speech) ->
Customer]
Fig. 1. Diagram of the pipeline structure of our spoken
dialogue system. At each turn, the customer’s speech
recognition results obtained by automatic speech recognition
(ASR) are processed by the NLU, DST, policy, and NLG modules
to generate the system’s response text, which is finally
converted to speech by text-to-speech (TTS) to respond to the
customer.
II. IMPLEMENTATION
Fig. 1 shows the pipeline structure of the spoken dialogue
system our team implemented. At each turn, the customer’s
speech input to the robot is converted into text by the
automatic speech recognition (ASR) module, and the utterance
text is input to the NLU module. The NLU predicts the
customer’s dialogue act (DA), which is a semantic
representation of the customer’s utterance. The DST module
then updates the dialogue state on the basis of the customer’s
DA. The dialogue state consists of information such as the
history of the DAs, the customer profile, and the belief state,
which is the set of the customer’s travel preferences. The
policy module decides the next action to be taken by the
system, expressed as the system’s DA, by using the dialogue
state and information on tourist attractions from the
database. The NLG module then converts the system’s DA into a
system utterance. Finally, the text-to-speech (TTS) module
responds to the customer by converting the text response to
speech.
ASR and TTS were implemented using the Google Speech
Recognition system and Amazon Polly API, respectively,
which were prepared by the competition organizers. The
robot’s expression control and motion control were based
on the expression and motion rules defined for each system
dialogue act.
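
The data flow of one turn through the four text-level modules can be sketched as follows. This is a minimal sketch only: the keyword-based `nlu` and template-based `nlg` are stand-ins for the GPT-2 based models, and the DA format, slot names, rules, and database fields are hypothetical examples rather than the ones used in our system. ASR and TTS are omitted, so the turn starts and ends with text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# A dialogue act (DA) is modeled here as (intent, {slot: value}); assumed format.
DialogueAct = Tuple[str, Dict[str, str]]


@dataclass
class DialogueState:
    da_history: List[DialogueAct] = field(default_factory=list)   # history of DAs
    customer_profile: Dict[str, str] = field(default_factory=dict)
    belief_state: Dict[str, str] = field(default_factory=dict)    # travel preferences


def nlu(utterance: str) -> DialogueAct:
    """Stand-in for the NLU model: keyword matching instead of GPT-2."""
    text = utterance.lower()
    if "quiet" in text:
        return ("inform", {"atmosphere": "quiet"})
    if "lively" in text:
        return ("inform", {"atmosphere": "lively"})
    return ("other", {})


def dst(state: DialogueState, customer_da: DialogueAct) -> DialogueState:
    """Rule-based update: log the DA and merge informed slots into the belief state."""
    state.da_history.append(customer_da)
    intent, slots = customer_da
    if intent == "inform":
        state.belief_state.update(slots)
    return state


def policy(state: DialogueState, database: List[Dict[str, str]]) -> DialogueAct:
    """Rule-based policy: ask for a preference first, then recommend a matching spot."""
    if not state.belief_state:
        return ("request", {"slot": "atmosphere"})
    for spot in database:
        if all(spot.get(k) == v for k, v in state.belief_state.items()):
            return ("recommend", {"name": spot["name"]})
    return ("recommend", {"name": database[0]["name"]})  # fallback: first attraction


def nlg(system_da: DialogueAct) -> str:
    """Stand-in for the NLG model: fixed templates instead of GPT-2."""
    intent, slots = system_da
    if intent == "recommend":
        return f"I recommend {slots['name']}."
    return "What kind of atmosphere do you prefer?"


def run_turn(utterance: str, state: DialogueState,
             database: List[Dict[str, str]]) -> Tuple[str, DialogueState]:
    """One turn: NLU -> DST -> policy -> NLG."""
    customer_da = nlu(utterance)
    state = dst(state, customer_da)
    system_da = policy(state, database)
    return nlg(system_da), state
```

Because each stage only exchanges a DA or the dialogue state with its neighbors, any single function can be replaced, e.g., by a learned model, without changing the others.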
In the following sections, we first describe the DA ontology
for the travel destination recommendation task, followed by
the NLU, DST, policy, and NLG modules. Then, we describe the
robot’s facial expression control and motion control. Finally,
we present and discuss the evaluation results.
arXiv:2210.09518v1 [cs.CL] 18 Oct 2022