Knowledge-grounded Dialog State Tracking
Dian Yu*, Mingqiu Wang, Yuan Cao, Izhak Shafran, Laurent El Shafey, Hagen Soltau
Google Research
{dianyu, mingqiuwang, yuancao, izhak, shafey, soltau}@google.com
*Work done while at University of California, Davis.
Abstract

Knowledge (including structured knowledge such as schemas and ontologies, and unstructured knowledge such as web corpora) is a critical part of dialog understanding, especially for unseen tasks and domains. Traditionally, such domain-specific knowledge is encoded implicitly into model parameters for the execution of downstream tasks, which makes training inefficient. In addition, such models are not easily transferable to new tasks with different schemas. In this work, we propose to perform dialog state tracking grounded on externally encoded knowledge. We query relevant knowledge of various forms based on the dialog context, and this information grounds the prediction of dialog states. We demonstrate superior performance of our proposed method over strong baselines, especially in the few-shot learning setting.
1 Introduction
Pre-trained language models (LMs; Radford et al., 2019; Raffel et al., 2020) are the backbone of contemporary task-oriented dialog (TOD) models (Peng et al., 2020; Yang et al., 2021). However, these models are pre-trained on large generic corpora, so they do not contain task-specific knowledge. Previous work primarily suggests further pre-training or fine-tuning the LMs on in-domain data for adaptation (Wu et al., 2020; Hosseini-Asl et al., 2020), but this cannot capture information beyond the surface level. It makes downstream tasks challenging, especially in the few-shot learning setting, because mapping representations to the output space and encoding knowledge into the model parameters are entangled, and the latter may require more training data. More recent research proposes to incorporate external knowledge for response generation tasks (Dinan et al., 2019; Shuster et al., 2022; Chen et al., 2022; Komeili et al., 2022), but it is not clear how to utilize such information for language understanding.
Figure 1: Model architecture for our proposed knowledge-grounded DST. The encoder first encodes the query and the knowledge into representations (step 1), and we find the top-k knowledge elements most relevant to the context in step 2. We flatten the retrieved elements in step 3 and append them to the query context as the input to the encoder-decoder model. The retrieved elements serve as a prior for DST.
In TOD settings, because the API call structure is restricted to certain intents, slots, and values, the schema is often provided. For example, in a flight booking system, slots such as departure location and airline are pre-defined. Users, even though not directly bounded in what they can say to agents, draw on a limited and predictable vocabulary to some extent. If the schema information is utilized, a model does not need to learn from the LM that "San Francisco" represents a departure place rather than a generic city name. This is particularly important for new information, such as movie titles or locations that do not appear in the LM training corpus. Similar to human annotators, grounding a dialog model on such knowledge makes understanding conversations easier and more accurate.
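To make this concrete, such a schema can be represented as a mapping from slot types to candidate values. The minimal sketch below uses made-up slot and value names; it is an illustrative assumption, not an ontology from the paper.

```python
# A hypothetical flight-booking schema: slot types mapped to example values.
# Slot and value names are illustrative assumptions, not taken from the paper.
flight_schema = {
    "flight-departure_city": ["san francisco", "new york", "seattle"],
    "flight-airline": ["united", "delta", "alaska"],
    "flight-date": [],  # open-valued slot; candidates are not enumerable
}

# Grounded on this schema, a model can map "San Francisco" to the
# flight-departure_city slot instead of relying on the LM to infer its role.
```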
In this paper, we investigate knowledge-grounded understanding for dialog state tracking (DST). In addition to using structured knowledge such as the ontology of slot type-value pairs, we also consider unstructured knowledge from the raw training data. We train a TOD model to query relevant knowledge for each turn in the context, and leverage the retrieved knowledge to predict dialog states. We evaluate our method on MultiWOZ (Budzianowski et al., 2018) in both the full-data and few-shot settings, and show superior performance compared to previous methods.
2 Related Work
2.1 Knowledge grounding
To relax the requirement of encoding knowledge of the whole world into model parameters, one direction is to disentangle knowledge representation from LMs. Most of these methods are applied to knowledge-intensive text generation tasks such as open-domain question answering (Lee et al., 2019; Karpukhin et al., 2020; Guu et al., 2020; Lewis et al., 2020; Borgeaud et al., 2021) and response generation with factual information (Dinan et al., 2019; Komeili et al., 2022; Thoppilan et al., 2022; Kim et al., 2020; Thulke et al., 2021; Chen et al., 2022). Similarly, some work retrieves information to serve as a reference that refines the model generation process (Weston et al., 2018; Gonzalez et al., 2019; Khandelwal et al., 2021; Zhang et al., 2021). Different from these approaches, our method focuses on learning and utilizing available domain-relevant knowledge for language understanding tasks. Moreover, we propose to leverage knowledge of various formats.
2.2 Knowledge-guided dialog understanding
Encoding domain schemas into model parameters (Hosseini-Asl et al., 2020; Madotto et al., 2020) may not be efficient for unseen domains and tasks where the ontology can differ. One line of research (Ren et al., 2018; Wu et al., 2019; Zhou and Small, 2019; Rastogi et al., 2020; Du et al., 2021; Lee et al., 2021) leverages question-answering techniques to predict values for each slot, or prepends all slot-value information to the context (Zhao et al., 2022). However, such methods are not scalable when the number of slot-value pairs is large, especially in multi-domain TOD systems. In addition, probably due to blurry attention over long contexts (Fan et al., 2021), Lee et al. (2021) find that adding potential slot values does not improve model performance. In contrast, retrieving only the relevant schema entries solves the scalability problem by restricting the knowledge to a fixed length.
Alternatively, instead of structured schema knowledge, recent research proposes to use hand-crafted demonstrations as prompts (Gupta et al., 2022) or to find similar examples to guide understanding tasks such as conversational semantic parsing (Yu et al., 2021; Pasupat et al., 2021; Yao et al., 2021). However, a single turn can contain multiple dialog states, so the examples retrieved by previous methods may not provide sufficient evidence. Furthermore, our method can unify different forms of knowledge, both structured and unstructured.
3 Methodology
Our proposed method is illustrated in Figure 1. Given the context $x$, we first retrieve $k$ relevant knowledge entries $e$ by the similarity between $\mathrm{Enc}(x)$ and $\mathrm{Enc}(e)$ using an encoder $\mathrm{Enc}$. We then integrate the retrieved entries $e_1, e_2, \ldots, e_k$ with the original context to form $x'$, which is used as the input for the target DST task.
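A minimal sketch of this retrieve-and-concatenate step follows, assuming a shared dual encoder and exact inner-product search. The toy encoder, the value of k, and the input formatting are our assumptions; the paper does not fix these details in this excerpt.

```python
import zlib
import numpy as np

def encode(texts):
    """Toy stand-in for the shared encoder Enc: deterministic unit-norm
    vectors seeded by a hash of each text. A real system would use a
    trained dual encoder here."""
    vecs = np.stack([
        np.random.default_rng(zlib.crc32(t.encode())).normal(size=128)
        for t in texts
    ])
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

def retrieve_and_ground(context, knowledge_entries, k=3):
    """Retrieve the top-k knowledge entries by inner-product similarity
    (the MIPS step in Figure 1) and prepend them to the dialog context."""
    query_vec = encode([context])[0]
    entry_vecs = encode(knowledge_entries)
    scores = entry_vecs @ query_vec      # inner-product similarity
    top_k = np.argsort(-scores)[:k]      # exact search here; a deployed
                                         # system would use approximate MIPS
    retrieved = [knowledge_entries[i] for i in top_k]
    return " ".join(retrieved) + " [CONTEXT] " + context

# Schema-style entries and the user turn are taken from Figure 1.
entries = [
    "moderate: hotel-price range",
    "yes: hotel-parking",
    "moderate: restaurant-price range",
]
x_prime = retrieve_and_ground(
    "User: can I also book a not expensive hotel with free parking?",
    entries,
)
print(x_prime)
```

In practice the entry embeddings would be pre-computed once and searched with approximate MIPS, since the ontology does not change per query.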
Knowledge retrieval
Different from previous work (such as question answering) where there is only one ground-truth knowledge entry for each query, multiple entries in the form of slot-value pairs may exist in the ontology base that match the conversation context. Importantly, unlike passage retrieval, where the query (e.g., a sentence) and the target (e.g., another sentence or passage) are similar to the pre-training corpus, structured knowledge such as schema pairs may have a different representation distribution. Thus, an off-the-shelf encoder may retrieve noisy elements and degrade final performance, especially when training with the target task optimized on DST generation. Moreover, non-parametric retrieval methods such as TF-IDF and BM25 (Robertson and Zaragoza, 2009) rely on lexical overlap, which can be detrimental when entries in schemas have high word overlap (e.g., the same value for different slots).
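The toy example below shows why lexical overlap cannot separate such entries; the user utterance is made up, and plain shared-token counting stands in crudely for BM25 scoring.

```python
import re

def lexical_score(query, entry):
    """Toy lexical scorer (a crude stand-in for BM25): shared-token count."""
    tokens = lambda s: set(re.findall(r"[a-z]+", s.lower()))
    return len(tokens(query) & tokens(entry))

query = "something in the moderate price range, please"
for entry in ["moderate: hotel-price range",
              "moderate: restaurant-price range"]:
    print(lexical_score(query, entry), entry)
# Both entries score 3 (sharing "moderate", "price", and "range"), so a
# lexical scorer cannot tell the hotel slot from the restaurant slot; a
# trained dense retriever can use the dialog context to separate them.
```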
We therefore train our knowledge retriever to promote similar representations between a query and its ground-truth knowledge. We started by optimizing the marginal likelihood over all positive knowledge entries, but found in our preliminary studies that this resulted in a peaky distribution centered around specific elements. Instead, we mini-