intents to benefit SF. Moreover, the results of the two sub-tasks come from the same Seq2Seq model in this framework.
2.2 Multi-intent SLU
Multi-intent utterances appear more frequently in real-world scenarios than single-intent ones, and this problem has attracted increasing attention (Kim et al., 2017; Gangadharaiah and Narayanaswamy, 2019; Qin et al., 2020; Ding et al., 2021; Qin et al., 2021; Chen et al., 2022; Cai et al., 2022). Gangadharaiah and Narayanaswamy (2019) used an attention-based neural network to jointly model ID and SF in multi-intent scenarios. Qin et al. (2020) proposed AGIF, which organizes intent labels and slot labels in a graph structure and focuses on the interaction among labels.
Ding et al. (2021) and Qin et al. (2021) employ graph attention networks to capture the associations among labels. Despite these attempts to bridge intent and slot labels, ID and SF are still modeled in different forms, i.e., multi-label classification for ID but sequence tagging for SF. The essential problem remains unaddressed: the distinct modeling paradigms hinder the extraction of common features. Our text-generation-based framework differs from the above approaches by unifying the formulation of ID and SF. On one hand, this formulation is intuitive to humans and naturally suited to ID with a variable number of intents. On the other hand, every improvement to our framework effectively benefits both sub-tasks. Our framework combines both advantages with an explicitly hierarchical implementation, while the results of the two sub-tasks come from the same Seq2Seq model.
2.3 Prompting
GPT-3 (Brown et al., 2020) brought novel insight to Natural Language Processing through the power of a new paradigm (Liu et al., 2021), namely prompting. It has since been explored in many directions by transforming the original task into a text-to-text form (Zhong et al., 2021; Lester et al., 2021; Cui et al., 2021; Wang et al., 2022a; Tassias, 2021; Su et al., 2022), where fine-tuning on downstream tasks effectively draws on knowledge acquired during long-term pre-training and works especially well in low-resource scenarios. Zhong et al. (2021) aligned the form of fine-tuning with next-word prediction to optimize performance on text classification tasks. Alternatively, Lester et al. (2021) resorted to soft prompts to capture implicit and continuous signals. For sequence tagging, prompts appear as templates for text infilling or as natural language instructions (Cui et al., 2021; Wang et al., 2022a). As for SLU in task-oriented dialogue (TOD), it consists of two sub-tasks, ID and SF, which have homologous forms. Tassias (2021) relied on the prior distributions of intents and slots to generate, before inference, a prompt containing all slots to be predicted, thereby lowering the difficulty of SF. Su et al. (2022) proposed a prompt-based framework, PPTOD, to integrate several components of the TOD pipeline, including single-intent detection, which differs from our setting. However, such joint modeling approaches neglect the complexity of co-occurrence and semantic similarity among tokens, intents, and slots in multi-intent SLU, consequently suppressing the potential of PLMs. To this end, we build the Semantic Intent Guidance mechanism and design an auxiliary sub-task, Slot Prediction, to jointly exploit more fine-grained commonalities.
3 Methodology
In this section, we describe the proposed PromptSLU by following the inner flow of data. In this framework, we utilize different prompts to complete different sub-tasks, an intuitive approach similar to Su et al. (2022). However, PromptSLU is specially designed for multi-intent SLU and contains a particular intent-slot interaction mechanism, namely Semantic Intent Guidance (SIG). We also propose a new auxiliary sub-task, Slot Prediction (SP), to improve Intent Detection (ID) and Slot Filling (SF).
Given an utterance $X=\{x_1, x_2, x_3, \dots, x_N\}$, for each sub-task (ID, SF, and SP), PromptSLU feeds the backbone model the concatenation of a task-specific prefix and $X$, and then makes predictions. During SF or SP, intents can also be involved in the input. The whole framework is shown in Figure 2. The three sub-tasks are jointly trained with a shared pre-trained language model (PLM).
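For illustration, the minimal sketch below shows one way such task-specific inputs could be assembled in Python; the prefix wordings and the helper function name are hypothetical and do not reproduce the exact prompts used in PromptSLU.

# A minimal sketch (not the actual implementation): each sub-task gets its
# own prefix, which is concatenated with the utterance X. The prefix
# strings below are illustrative assumptions.
def build_inputs(utterance: str) -> dict:
    prefixes = {
        "ID": "detect intents:",      # Intent Detection
        "SF": "fill slots:",          # Slot Filling
        "SP": "predict slot types:",  # auxiliary Slot Prediction
    }
    # One input string per sub-task; all three share the same backbone PLM.
    return {task: f"{prefix} {utterance}" for task, prefix in prefixes.items()}

inputs = build_inputs("play a song by queen and set an alarm for 7 am")
# e.g. inputs["ID"] == "detect intents: play a song by queen and set an alarm for 7 am"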
3.1 Intent Detection
Traditionally, intents are defined as a fixed set of labels: a model is fed the input text and maps it onto these labels. In contrast, we model the task as text generation, i.e., the backbone model produces a sequence of intents corresponding to the input utterance.
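For illustration, a minimal sketch of this generation-based formulation is given below, assuming a T5-style Seq2Seq backbone from Hugging Face Transformers; the prompt prefix and the separator-joined output format are assumptions, and the backbone would need fine-tuning on multi-intent data before producing such outputs.

# A minimal sketch of Intent Detection as text generation with a T5-style
# backbone; prefix wording and output format are assumptions.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

utterance = "play a song by queen and set an alarm for 7 am"
prompt = f"detect intents: {utterance}"

batch = tokenizer(prompt, return_tensors="pt")
# After fine-tuning, the model generates a sequence of intent labels,
# e.g. "PlayMusic ; SetAlarm", instead of scoring a fixed label set.
output_ids = model.generate(**batch, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))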