grounded understanding for dialog state tracking (DST). In addition to structured knowledge such as the ontology of slot type-value pairs, we also consider unstructured knowledge from the raw training data. We train a TOD model to query relevant knowledge for each turn in the context, and to leverage the retrieved knowledge to predict the dialog state. We evaluate our method on MultiWOZ (Budzianowski et al., 2018) in both the full-data and few-shot settings, and show superior performance compared to previous methods.
2 Related Work
2.1 Knowledge grounding
To relax the requirement of encoding knowledge of the whole world into model parameters, one direction is to disentangle knowledge representation from LMs. Most of these methods are applied to knowledge-intensive text generation tasks such as open-domain question answering (Lee et al., 2019; Karpukhin et al., 2020; Guu et al., 2020; Lewis et al., 2020; Borgeaud et al., 2021) and response generation with factual information (Dinan et al., 2019; Komeili et al., 2022; Thoppilan et al., 2022; Kim et al., 2020; Thulke et al., 2021; Chen et al., 2022). Similarly, some work retrieves information to serve as a reference that refines the model generation process (Weston et al., 2018; Gonzalez et al., 2019; Khandelwal et al., 2021; Zhang et al., 2021). Different from these approaches, our method focuses on learning and utilizing available domain-relevant knowledge for language understanding tasks. Moreover, we propose to leverage knowledge in various formats.
2.2 Knowledge-guided dialog understanding
Encoding the domain schema into model parameters (Hosseini-Asl et al., 2020; Madotto et al., 2020) may not be efficient for unseen domains and tasks where the ontology can differ. One line of research (Ren et al., 2018; Wu et al., 2019; Zhou and Small, 2019; Rastogi et al., 2020; Du et al., 2021; Lee et al., 2021) leverages question-answering techniques to predict values for each slot, or prepends all slot-value information to the context (Zhao et al., 2022). However, the latter approach is not scalable when the number of slot-value pairs is large, especially in multi-domain TOD systems. In addition, probably due to blurry attention over long contexts (Fan et al., 2021), Lee et al. (2021) find that adding potential slot values does not improve model performance. In contrast, retrieving only the relevant schema effectively addresses the scalability problem by specifying the knowledge with a fixed length.
Alternatively, instead of structured schema knowledge, recent research proposes to use hand-crafted demonstrations as prompts (Gupta et al., 2022) or to find similar examples that guide understanding tasks such as conversational semantic parsing (Yu et al., 2021; Pasupat et al., 2021; Yao et al., 2021). However, one turn can contain multiple dialog states, so the examples retrieved by previous methods may not provide sufficient evidence. Furthermore, our method can be applied to unify different forms of knowledge, including structured and unstructured ones.
3 Methodology
Our proposed method is illustrated in Figure 1. Given the context x, we first retrieve k relevant knowledge entries e based on the similarity between Enc(x) and Enc(e), where Enc is an encoder. We then integrate the retrieved entries e_1, e_2, ..., e_k with the original context to form x', which is used as the input for the target DST task.
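As a minimal sketch of this retrieve-then-concatenate step (the `encode` helper, the normalization assumption, and the separator tokens are illustrative assumptions, not the paper's actual implementation):

```python
import numpy as np

def retrieve_and_augment(context, knowledge_entries, encode, k=5):
    """Retrieve the k knowledge entries most similar to the dialog context
    and concatenate them with the context for the downstream DST model.

    `encode` is assumed to map a list of strings to a (n, d) array of
    L2-normalized vectors, e.g. any off-the-shelf or fine-tuned sentence
    encoder (hypothetical stand-in for Enc in the paper).
    """
    q = encode([context])[0]            # Enc(x)
    E = encode(knowledge_entries)       # Enc(e) for every candidate entry
    scores = E @ q                      # dot product = cosine sim on normalized vectors
    top_k = np.argsort(-scores)[:k]     # indices of the k highest-scoring entries
    retrieved = [knowledge_entries[i] for i in top_k]
    # Form x': retrieved entries concatenated with the original dialog context.
    # The "[KNOWLEDGE]" / "[CONTEXT]" separators are illustrative only.
    return "[KNOWLEDGE] " + " ; ".join(retrieved) + " [CONTEXT] " + context
```

In this sketch, both structured entries (serialized slot-value pairs) and unstructured entries (snippets from the raw training data) can be ranked the same way, since each is represented as plain text.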
Knowledge retrieval
Different from previous work (such as question answering), where there is only one ground-truth knowledge entry for each query, multiple entries in the form of slot-value pairs may exist in the ontology base that match the conversation context. Importantly, unlike passage retrieval, where the query (e.g., a sentence) and the target (e.g., another sentence or passage) are similar to the pre-training corpus, structured knowledge such as schema pairs may have a different representation distribution. Thus, an off-the-shelf encoder may retrieve noisy elements and degrade final performance, especially when training with the target task optimized on DST generation. Moreover, non-parametric retrieval methods such as TF-IDF and BM25 (Robertson and Zaragoza, 2009) rely on lexical overlap, which can be detrimental when schema entries have high word overlap with each other (e.g., the same value for different slots).
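To make the lexical-overlap concern concrete, a toy example (with made-up schema entries and a simple token-overlap score standing in for TF-IDF/BM25, not the paper's retriever):

```python
# Toy illustration of why lexical overlap can mislead schema retrieval:
# two slot-value entries that share tokens with the context tie, even though
# only one corresponds to the state of the current turn.
def token_overlap(query, entry):
    q, e = set(query.lower().split()), set(entry.lower().split())
    return len(q & e) / len(e)

entries = ["restaurant area centre", "hotel area centre"]
context = "i booked a restaurant in the centre , now i also need a hotel in the centre"

for entry in entries:
    print(f"{entry!r}: {token_overlap(context, entry):.2f}")
# Both entries score 0.67: a purely lexical scorer cannot tell which slot the
# current turn grounds, whereas a trained dense retriever can use the context.
```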
We therefore train our knowledge retriever to promote similar representations between a query and its ground-truth knowledge. We started by optimizing the marginal likelihood over all positive knowledge entries, but found in our preliminary studies that it resulted in a peaky distribution centered around specific elements. Instead, we mini-