Doc2Bot: Accessing Heterogeneous Documents via Conversational Bots
Haomin Fu1,2, Yeqin Zhang1,2, Haiyang Yu2, Jian Sun2, Fei Huang2, Luo Si2, Yongbin Li2 and Cam-Tu Nguyen1
1State Key Laboratory for Novel Software Technology, Nanjing University, China
2Alibaba Group
{haominfu, zhangyeqin}@smail.nju.edu.cn
{yifei.yhy, jian.sun, f.huang, luo.si}@alibaba-inc.com
shuide.lyb@alibaba-inc.com, ncamtu@nju.edu.cn
Abstract
This paper introduces Doc2Bot, a novel dataset for building machines that help users seek information via conversations. This is of particular interest for companies and organizations that own a large number of manuals or instruction books. Despite its potential, the nature of our task poses several challenges: (1) documents contain various structures that hinder the ability of machines to comprehend, and (2) user information needs are often underspecified. Compared to prior datasets that either focus on a single structural type or overlook the role of questioning to uncover user needs, the Doc2Bot dataset is developed to target such challenges systematically. Our dataset contains over 100,000 turns based on Chinese documents from five domains, larger than any prior document-grounded dialog dataset for information seeking. We propose three tasks in Doc2Bot: (1) dialog state tracking to track user intentions, (2) dialog policy learning to plan system actions and contents, and (3) response generation which generates responses based on the outputs of the dialog policy. Baseline methods based on the latest deep learning models are presented, indicating that our proposed tasks are challenging and worthy of further research.
1 Introduction
The last decade has witnessed a dramatic change in how humans interact with information retrieval systems. Although traditional search engines still play an important role in our daily life, the wide adoption of smart devices with small screens requires systems to answer user requests more concisely. Early attempts focus on answering independent questions (Rajpurkar et al., 2016), whereas recent studies pay attention to handling interconnected questions via conversations around a single passage (Pasupat and Liang, 2015; Chen et al., 2020) or documents (Feng et al., 2020, 2021). Yet, the nature of heterogeneous documents and our conversational setting pose challenges that require further attention. We, therefore, develop Doc2Bot¹ with these considerations in mind.

*Equal contribution.
Corresponding authors.
The first concerns the nature of heterogeneous documents, which often contain different types of structures such as tables and sequences. To answer questions regarding such structural types, systems need to acquire various skills. Figure 1 shows a conversation between a user and an agent, where the agent has access to a collection of documents. In this conversation, every utterance except the first one depends on the conversation history and the grounded documents. The document contains diverse structures including conditions (N2), sequences (N3), and tables (N4). Each type of structure poses different challenges for the agent. For example, the agent needs to understand the rule text in N2 to ask questions such as A3 and A5. The agent also needs to be able to summarize the procedure in N3 to produce the answer (A13) or jump to a specific step of the procedure if asked (A15). In addition, the agent needs to be capable of summarizing the tables (A19) or answering questions with row and column constraints (A21). Note that there have been previous attempts (Pasupat and Liang, 2015; Saeidi et al., 2018; Chen et al., 2020; Zhang et al., 2022b) to address different structures in isolation. However, little research has been done on handling multiple types of structures in the same document context.
The second concerns the nature of our conversational setting, which is to help users seek information from documents. Since user information needs are often under-specified, it is desirable for systems to ask clarifying questions. This resembles the task of relevance feedback and query refinement in traditional information retrieval. However, in Doc2Bot, system feedback is in the form of natural questions, and thus more user-friendly. For example, in Figure 1, A1 is a kind of multiple-choice question that the agent asks to narrow down the search for the answer. In contrast, A3 and A5 are to verify user situations to answer questions regarding condition/solution structure. Although learning to construct questions from a single passage has been studied in Machine Reading Comprehension (Saeidi et al., 2018; Guo et al., 2021), such finer-grained questions are required only when the passage containing the answer has been found. For document-grounded dialog systems (DGDS), the agent needs to have the skills to narrow down the search (A1) as well as to ask finer questions such as A3 and A5.

¹ https://github.com/Doc2Bot/Doc2Bot
arXiv:2210.11060v3 [cs.CL] 20 Nov 2022

[Figure 1 content]
Document (left): HZ city > Social security and medical insurance > Unemployment Insurance, with segments N1, N2, N3, N4.
Dialog (right):
U0: Hello, I'd like to know about the social security and medical insurance of HZ City.
A1: Which one do you want to know? Industrial injury insurance, unemployment insurance or medical insurance?
U2: Tell me more unemployment insurance, I like to know if I can apply for unemployment benefit?
A3: From January to December 2021, did you terminate the labor contract due to your own reasons while you have paid for the insurance over a year?
U4: Yes
A5: Is your last insured institution in HZ City before you lost your job?
U6: Yes, I am insured in HZ City
A7: Ok, then you meet the current conditions for HZ City to apply for unemployment benefits
...
U12: Can you tell me the procedure for the continuation of unemployment insurance?
A13: Handling process: 1) Application; 2) Receiving; 3) Recheck; 4) Review; 5) Completion and 6) Delivery. Would you like to know more about the application step?
U14: Not now, I'd like to know the processing time of the forth step?
A15: 1-3 working days
...
U18: What materials do I need to submit to apply for unemployment benefits?
A19: It is required to submit the application form, social security card and the certificate of termination of labor issued by the employer.
U20: What is the required paper size of the application form?
A21: A4

Figure 1: An example dialog (right) grounded on a document (left) with heterogeneous structures. From the top, the dialog contains 4 segments S1-4 grounded on 4 corresponding document segments N1-4. Here U and A stand for user and agent, respectively.
Towards such goals, there are several challenges that we need to address. First, documents come in different formats, and thus the process of constructing our dataset is more difficult than that of datasets built from single passages with homogeneous structures. The difference in formats also hinders the ability of machines to learn common patterns. Second, like human-human conversations, it is desirable to have samples of human-system conversations that are natural and coherent while being diverse enough for machine learning purposes. We target such challenges systematically and make the following contributions:
• We present a unified representation for heterogeneous structures, which not only facilitates our data collection process but also helps systems to learn patterns across documents.
• We propose an agenda-based dialog collection protocol that controls the diversity and coherence of dialogues by design. The protocol also encourages crowd-collaborators to introduce ambiguities to conversations.
• We introduce a new dataset Doc2Bot which is larger in scale compared to recent datasets for DGDS (Feng et al., 2020, 2021) while introducing new challenges such as a new language (Chinese), richer relations (e.g., sections, conditions, tables, sequences) and new tasks (e.g., dialog policy learning).
• We evaluate our proposed tasks with the latest machine learning methods. The experiments show that our tasks are still challenging, which suggests room for further research.
2 Related Works
Our work is most closely related to the document-grounded dialog systems (DGDS) in the literature. Based on the conversation objective, we can roughly categorize the related tasks into chitchat, comprehension, or information seeking.
Document-grounded chitchat datasets such as WoW (Dinan et al., 2019), Holl-E (Moghe et al., 2018), and CMU-DoG (Zhou et al., 2018) aim to enhance early chitchat systems by using information from grounded textual passages for answer generation. The goal is similar to an open chitchat system as the dialog agent tries to keep users engaged in long, informative, and interactive conversations. This is different from our setting because users of our system often have clear goals (information needs), and the dialog agent needs to provide users with accurate information as soon as possible.
For document-grounded “comprehension” such as CoQA (Reddy et al., 2019), Abg-CoQA (Guo et al., 2021) and ShARC (Saeidi et al., 2018), the agent is given a textual paragraph and needs to answer users’ questions about the paragraph. This setting is similar to Machine Reading Comprehension (MRC). However, the difference is that questions in MRC may not form a coherent dialog. Notably, several questioning strategies have been targeted in Abg-CoQA and ShARC. For example, in Abg-CoQA, systems can ask clarifying questions to resolve different types of ambiguities. In ShARC, the authors created conversations where the system can learn to ask “yes/no” questions to understand users’ information and provide appropriate answers. The questioning strategy in ShARC is designed based on text rules that define the relationship between “conditions” and “solutions” exhibited in the given paragraph. Although we also address questioning strategies, our tasks are more challenging because we focus on multiple documents.
The third type of DGDS (Penha et al., 2019; Feng et al., 2020, 2021) is closest to our setting where the agent needs to provide answers to information seekers in the shortest possible time. Mantis (Penha et al., 2019) was collected from online forums, and the grounded documents are not given in advance. As a result, Mantis does not come with the detailed annotation needed to study the capability of agents to understand documents. In contrast, given a set of documents, Doc2dial (Feng et al., 2020) and Multidoc2dial (Feng et al., 2021) were collected in two stages: 1) dialog flows are first generated by labeling and linking paragraphs, 2) crowdsourcers then write conversations based on the suggested flows. Note that Multidoc2dial was built by rearranging dialogues from Doc2dial so that one conversation can contain information from multiple documents. Although we follow similar steps for constructing the dataset, our dialog flow generation is essentially different: it addresses the coherence of the generated dialogues and the multi-document grounding issue by design. In addition, our dataset exceeds Doc2dial and Multidoc2dial in scale, while also highlighting new challenges such as under-specified user requests.
3 Dataset Collection
This section details the process of collecting Doc2Bot, which includes four stages: 1) document collection, which selects targeted domains and documents; 2) document graph construction, which unifies heterogeneous structures from multiple domains to build document graphs; 3) dialog flow generation, which simulates the agenda of a user seeking information from a document graph; and 4) dialog collection, where crowd-collaborators write dialogs based on the generated dialog flows.
3.1 Document Collection
For document collection, we examine several potential domains and select 5 representative ones including public services, technology, insurance, health care services, and wikiHow. For each domain, documents are selected based on two criteria: 1) the documents should be rich in structural types; 2) each document should have links to other documents so that we can test the ability of machines to reason over multiple documents. We design a simple ranking score based on these criteria and select the top-ranked documents for each domain.
3.2 Document Graph Construction
Documents from different domains or sources have vastly different formats (HTML, PDF, etc.). Towards building scalable dialog systems across domains, it is important to have a unified format for encoding heterogeneous semantic structures in documents. Bear in mind that our target is to preserve those structures in the document context. This is unlike knowledge graphs and event graphs (Fu et al., 2020; Ma et al., 2021; Hogan et al., 2021) in which only entities or events are extracted while other context information is discarded.
[Figure 2 diagram, node labels]
type=section: Exemption from liability
type=disjunction: If the insured dies due to one of the following circumstances
type=cond: (1) The applicant intentionally causes the insured to have an acute disease
type=cond: (2) The insured intentionally commits a crime
type=solution: The company shall not be liable

Figure 2: The structure of a disjunction of conditions and the associated solution in the insurance domain.
[Figure 3 diagram, node labels]
type=table: Application Materials
type=object: Application form for tractor driver’s license
  type=attr: Number of copies; type=value: 1 copy
  type=attr: Material Specification; type=value: A4
type=object: Photos
  type=attr: Number of copies; type=value: 2 copies
  type=attr: Material Specification; type=value: 1 inch

Figure 3: The structure of a table and its objects in the domain of public services.
Document Graph is defined as a directed graph where a node corresponds to a span of text in the document. Inspired by property graphs (Hogan et al., 2021), we associate each node with a node type and a set of additional property-value pairs. Each domain has a root node that connects to domain documents via title hierarchy.
A number of node types are defined to cover common discourse relations exhibited in multiple domains (Das et al., 2018; Stede et al., 2019). These include the section type to denote section titles in documents. The types disjunction, conjunction, condition, solution, and negation are used to describe the condition-solution relation as depicted in Figure 2. The types table, object, attribute, and value encode the relations in tables as shown in Figure 3. The types sequence and sequence-step are introduced to indicate the relations of texts describing procedures such as N3 in Figure 1. Last but not least, the see-more type is used to encode hyperlinks, and the ordinary type is assigned to the nodes belonging to none of the above.
The property-value pairs associated with nodes are used for additional information. For example, each node can be identified with docid and nodeid. Likewise, see-more nodes have properties such as linked nodeid. Additionally, we introduce is_super_leaf to indicate whether a node should be targeted in the dialog flow generation.
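The document graph described above can be sketched as a small typed tree. This is only an illustration, not the released data format: the class name `Node`, the field names `node_id`, `doc_id`, `properties` and `children`, the helper `super_leaves`, and the spelling of the `is_super_leaf` key are assumptions, and the parent-child layout of the Figure 2 example is one plausible reading of the diagram.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Node types named in the text; the exact string spellings are assumed.
NODE_TYPES = {
    "section", "disjunction", "conjunction", "condition", "solution",
    "negation", "table", "object", "attribute", "value",
    "sequence", "sequence-step", "see-more", "ordinary",
}


@dataclass
class Node:
    """A span of document text with a type and free-form properties."""
    node_id: str
    doc_id: str
    node_type: str
    text: str
    properties: Dict[str, Any] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.node_type not in NODE_TYPES:
            raise ValueError(f"unknown node type: {self.node_type}")

    def add_child(self, child: "Node") -> "Node":
        """Attach a directed edge from this node to child and return child."""
        self.children.append(child)
        return child


def super_leaves(root: Node) -> List[Node]:
    """Collect nodes flagged as candidate goals, in depth-first order."""
    found = [root] if root.properties.get("is_super_leaf") else []
    for child in root.children:
        found.extend(super_leaves(child))
    return found


# Hypothetical encoding of the Figure 2 example (hierarchy assumed).
section = Node("n0", "doc0", "section", "Exemption from liability",
               {"is_super_leaf": True})
disjunction = section.add_child(Node(
    "n1", "doc0", "disjunction",
    "If the insured dies due to one of the following circumstances"))
disjunction.add_child(Node(
    "n2", "doc0", "condition",
    "(1) The applicant intentionally causes the insured to have an acute disease"))
disjunction.add_child(Node(
    "n3", "doc0", "condition",
    "(2) The insured intentionally commits a crime"))
section.add_child(Node("n4", "doc0", "solution",
                       "The company shall not be liable"))
```

Keeping properties in a plain dictionary mirrors the property-graph idea in the text: new keys such as a link target for see-more nodes can be added without changing the node class.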
3.3 Dialog Flow Generation
Studies of human behaviors in goal-oriented dialog systems have long recognized the fact that users have hidden agendas (Schatzmann and Young, 2009) which direct the interactions between users and chatbots. This is also the idea behind the construction of well-known datasets such as MultiWoz (Budzianowski et al., 2018). Although the connection between DGDS in information-seeking scenarios and goal-oriented dialog systems has been suggested (Feng et al., 2020, 2021), DGDS have no explicit schemes, thus hindering the agenda-based approach to dialog collection. As an alternative, we exploit the graph structure of the document graph to build up agendas for simulating dialog flows between a user and an agent. Here, a dialog flow is defined as a sequence of goals, where each goal corresponds to a node in our document graph. We mark the nodes that can be used as goals by setting is_super_leaf to true, using a semi-automatic method.
Our agenda-based procedure for generating a dialog flow is demonstrated in Algorithm 1. Here, the procedure takes as inputs the document graph G, the transition probabilities ξ, the maximum number of goals nGoal, and the initial selected document d. The objective is to generate diverse dialog flows based on which crowd contributors can write conversations. For each goal, a prompt can be generated to suggest questions that can be asked about the subtree rooted at the goal node (line 6). For example, given the table in Figure 3 as a goal, we can generate the corresponding prompt by: (1) randomly selecting some “objects” and “attributes” as constraints, e.g. paper size and application form; (2) using templates to convert the constraints to a guideline such as “write a number of question-answer turns so that the system final answer is A4 - the paper size of the application form.”
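The two-step prompt generation for a table goal can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function `table_prompt`, the constant `PROMPT_TEMPLATE`, and the nested-dictionary encoding of the Figure 3 table are hypothetical, and only one object and one attribute are sampled for simplicity.

```python
import random
from typing import Dict, Optional

# Template paraphrasing the guideline quoted in the text (wording assumed).
PROMPT_TEMPLATE = (
    "Write a number of question-answer turns so that the system final "
    "answer is {value} - the {attribute} of the {obj}."
)


def table_prompt(table: Dict[str, Dict[str, str]],
                 rng: Optional[random.Random] = None) -> str:
    """Sample an (object, attribute) constraint and render a writing prompt."""
    rng = rng or random.Random()
    # Step 1: randomly select an object (row) and an attribute (column).
    obj = rng.choice(sorted(table))
    attribute = rng.choice(sorted(table[obj]))
    # Step 2: convert the constraints to a guideline with a template.
    return PROMPT_TEMPLATE.format(
        value=table[obj][attribute],
        attribute=attribute.lower(),
        obj=obj.lower(),
    )


# The table from Figure 3, encoded as object -> attribute -> value.
APPLICATION_MATERIALS = {
    "Application form for tractor driver's license": {
        "Number of copies": "1 copy",
        "Material Specification": "A4",
    },
    "Photos": {
        "Number of copies": "2 copies",
        "Material Specification": "1 inch",
    },
}

# A seeded generator makes the sampled prompt reproducible.
prompt = table_prompt(APPLICATION_MATERIALS, random.Random(0))
```

Passing an explicit `random.Random` instance keeps sampling reproducible for a given seed while still allowing different seeds to yield different constraints, which is in the spirit of the diversity goal stated above.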
We use an agenda stack to contain a list of potential goals that a user might switch to (from the last goal). The candidates nearer to the top of the agenda stack are closer to the last goal in the document graph. The action of a user switching from one goal to another is simulated by three factors, the follow-up rate ξ_fl, the in-jump rate ξ_inj and the out-jump rate ξ_outj. When the action is follow-up, users tend to ask about the related information of