A Unified Framework for Multi-intent Spoken Language Understanding
with prompting
Feifan Song, Lianzhe Huang and Houfeng Wang
MOE Key Laboratory of Computational Linguistics, Peking University, China
songff@stu.pku.edu.cn
{hlz, wanghf}@pku.edu.cn
Abstract
Multi-intent Spoken Language Understanding has great potential for widespread implementation. Jointly modeling Intent Detection and Slot Filling in it provides a channel to exploit the correlation between intents and slots. However, current approaches tend to formulate these two sub-tasks differently, which leads to two issues: 1) it hinders models from effectively extracting shared features; 2) fairly complicated structures are introduced to enhance expressive power while damaging the interpretability of frameworks. In this work, we describe a Prompt-based Spoken Language Understanding (PromptSLU) framework that intuitively unifies the two sub-tasks into the same form by sharing a common pre-trained Seq2Seq model. In detail, ID and SF are completed by concisely filling the utterance into task-specific prompt templates as input, while sharing an output format of key-value pair sequences. Furthermore, variable intents are predicted first and then naturally embedded into prompts to guide slot-value pair inference from a semantic perspective. Finally, inspired by prevalent multi-task learning, we introduce an auxiliary sub-task, which helps to learn relationships among provided labels. Experimental results show that our framework outperforms several state-of-the-art baselines on two public datasets.
1 Introduction
Spoken Language Understanding (SLU) is a fundamental part of Task-oriented Dialogue (TOD) modeling. It serves as the entrance of the pipeline with two sub-tasks: understanding semantic information by detecting user intents, and extracting it by filling prepared slots, called Intent Detection (ID) and Slot Filling (SF), respectively. The former is usually modeled as a text classification problem, and the latter is completed by cutting out fragments from the input utterance in the form of sequence tagging.
[Figure 1 (image omitted): the input utterance "show me the cheapest fare in the database and ground transportation san francisco" is mapped to the intent labels atis_cheapest and atis_ground_service, and to BIO slot tags including B-cost_relative on "cheapest".]
Figure 1: Illustration of the prior paradigm for jointly modelling multi-intent Spoken Language Understanding.
Early works (Ravuri and Stolcke, 2015; Kim et al., 2017) focused on the two sub-tasks separately, but the correlation between them has been seen as the key to further improvement of SLU in recent years. Thus, joint modeling techniques were proposed to bridge the gap and exploit the features shared between Intent Detection and Slot Filling. Chen et al. (2019) treated BERT as a shared contextual encoder for the two sub-tasks, while Qin et al. (2019) further brought token-level intent prediction results into the Slot Filling process. Apart from these conventional NLU pathways, Zhang et al. (2021) and Tassias (2021) chose to formulate SLU as a constrained generation task.
However, real-world dialogue is more complex: each utterance can be longer and contain more than one intent. As shown in Figure 1, the utterance "Show me the cheapest fare in the database and ground transportation san francisco" contains two distinct intents (atis_cheapest and atis_ground_service). For this kind of multi-intent SLU, similar joint modelling methods have been discussed (Gangadharaiah and Narayanaswamy, 2019; Qin et al., 2020; Ding et al., 2021; Qin et al., 2021; Chen et al., 2022; Cai et al., 2022) and work well on several datasets, where multi-intent detection is often formulated as a multi-label text classification problem (Gangadharaiah and Narayanaswamy, 2019), and the potential interaction among intent and slot labels also plays an essential role (Qin et al., 2020; Ding et al., 2021; Qin et al., 2021; Chen et al., 2022; Cai et al., 2022).
Although following the path of joint modeling, prior methods tend to integrate ID and SF during encoding by providing a shared feature extractor, but treat them individually when decoding. Figure 1 displays the different decoding processes of the two sub-tasks as multi-label classification and sequence tagging. This vast distinction in task formulation between ID and SF cannot be ignored: it discourages models from effectively extracting shared features and, in turn, hinders the potential for comprehensively better performance. Consequently, unifying the two sub-tasks from start to finish within a common framework is significant.
In this paper, we propose a framework that leverages pre-trained language models (PLMs) and prompting to handle the aforementioned challenges, called the Prompt-based Spoken Language Understanding (PromptSLU) framework. Instead of using complicated structures to exploit the correlation among labels across the two sub-tasks, our framework simply incorporates both of them into text generation tasks with prompts, where the respective outputs share a general format of sequences of key-value pairs, namely Belief Span (Lei et al., 2018). During inference, given an utterance and a certain task requirement, PromptSLU first fills the utterance into a task-specific prompt template and then inputs it into a pre-trained Seq2Seq model.
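As a rough illustration of this flow, the sketch below fills an utterance into a hypothetical prompt prefix and decodes with a generic T5 checkpoint via Hugging Face Transformers. The prefix wordings, the model choice, and the output format are assumptions made here for illustration, not the paper's actual templates.

```python
# Minimal sketch of the prompt-then-generate flow, assuming a T5-style backbone.
# Prefixes such as "detect intents" / "fill slots" are illustrative placeholders.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

def run_subtask(prefix: str, utterance: str) -> str:
    """Fill the utterance into a task-specific prompt and decode the output text."""
    prompt = f"{prefix}: {utterance}"
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

utterance = ("show me the cheapest fare in the database "
             "and ground transportation san francisco")
# Intent Detection: the target is a text sequence of intent labels.
intent_text = run_subtask("detect intents", utterance)
# Slot Filling: the target is a Belief-Span-style sequence of slot-value pairs.
slot_text = run_subtask("fill slots", utterance)
```

Since both sub-tasks go through the same generate-and-decode step, any improvement to the shared backbone benefits ID and SF simultaneously.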
Besides, consistency between the two sub-tasks is crucial in SLU. Compared with prior settings, the multi-intent scenario is more challenging for accurate alignment from intents to slots because of the greater length of utterances and the increased number of labels. For this issue, we explore an intuitive approach in which intents are used to constrain SF by plugging them into prompt templates, namely Semantic Intents Guidance (SIG). Besides being plain to humans in form, this design also allows the semantic information of intents to be utilized to promote the overall comprehension of our framework. Furthermore, inspired by the multi-task learning used by Paolini et al. (2021), Su et al. (2022) and Wang et al. (2022a), we introduce an auxiliary sub-task, called Slot Prediction (SP), to steer models to additionally maintain semantic consistency. Experiments on two datasets demonstrate that PromptSLU outperforms SOTA baselines on most metrics, including those using PLMs. Abundant ablation studies also show the effectiveness of each component.
In summary, our contributions are as follows:
• We take the first step to introduce prompts into multi-intent SLU by transforming Intent Detection and Slot Filling into a common text generation formulation.
• We present the Semantic Intents Guidance mechanism and a new sub-task, Slot Prediction, to provide more intuitive channels for interaction among texts, intents and slots.
• Experiments on two public datasets show that PromptSLU yields better results than methods with SOTA performance, and demonstrate the effectiveness of the proposed strategies and semantic information.
2 Related Work
2.1 Spoken Language Understanding
Spoken Language Understanding (SLU), in the task-oriented dialog system, is mainly composed of two sub-tasks, Intent Detection (ID) and Slot Filling (SF). As for task formulations, most existing methods model ID as a text classification problem and SF as a sequence tagging problem. Early works handled ID and SF separately (Schapire and Singer, 2000; Haffner et al., 2003; Raymond and Riccardi, 2007; Tur et al., 2011; Ravuri and Stolcke, 2015; Kim et al., 2017; Wang et al., 2022b). However, current approaches prefer to model them jointly, considering the high correlation between them, and thus lead to substantial improvement. Joint modeling techniques basically correspond to two different methodologies:
Parallel Model, where ID and SF produce outputs respectively but share an utterance encoder, in an attempt to exploit the common latent features (Zhang and Wang, 2016; Hakkani-Tür et al., 2016; Zhang and Wang, 2019).
Serial Model, built by detecting intents first and then utilizing intent information to guide SF (Zhang et al., 2019; Qin et al., 2019; Tassias, 2021).
Our framework takes both sides mentioned above. First, it can be serially implemented according to the second methodology, i.e., allowing intents to benefit SF. Second, the results of the two sub-tasks come from the same Seq2Seq model in this framework.
2.2 Multi-intent SLU
Multi-intent utterances tend to appear at a higher frequency in reality than those with a single intent, and this problem has attracted increasing attention (Kim et al., 2017; Gangadharaiah and Narayanaswamy, 2019; Qin et al., 2020; Ding et al., 2021; Qin et al., 2021; Chen et al., 2022; Cai et al., 2022). Gangadharaiah and Narayanaswamy (2019) used an attention-based neural network for jointly modeling ID and SF in multi-intent scenarios. Qin et al. (2020) proposed AGIF, which organizes intent labels and slot labels together in a graph structure and focuses on the interaction of labels. Ding et al. (2021) and Qin et al. (2021) utilize graph attention networks to attend to the associations among labels. Despite the attempt to build a bridge between intent and slot labels, ID and SF are still modeled in different forms, i.e., multi-label classification for ID but sequence tagging for SF. The essential problem, that distinct modeling formulations hinder the extraction of common features, remains unhandled. Our text-generation-based framework differs from the above approaches by unifying the formulation of ID and SF. On one hand, this form is plain to humans and naturally suitable for ID with a variable number of intents. On the other hand, every improvement of our framework effectively benefits both sub-tasks. Our framework takes both sides above, being equipped with an explicitly hierarchical implementation while the results of the two sub-tasks come from the same Seq2Seq model.
2.3 Prompting
GPT-3 (Brown et al., 2020) provides a novel insight into the area of Natural Language Processing with the power of a new paradigm (Liu et al., 2021), namely prompting. Currently, it has been explored in many directions by transforming the original task into a text-to-text form (Zhong et al., 2021; Lester et al., 2021; Cui et al., 2021; Wang et al., 2022a; Tassias, 2021; Su et al., 2022), where fine-tuning on downstream tasks effectively draws on knowledge from the long-term pre-training process and works especially well in low-resource scenarios. Zhong et al. (2021) tried to align the form of fine-tuning with next-word prediction to optimize performance on text classification tasks. In another way, Lester et al. (2021) appealed to soft prompts to capture implicit and continuous signals. For sequence tagging, prompts appear as templates for text filling or natural language descriptions of instructions (Cui et al., 2021; Wang et al., 2022a). As for SLU in task-oriented dialogue (TOD), it consists of two sub-tasks, ID and SF, which have homologous forms. Tassias (2021) depended on prior distributions of intents and slots to generate a prompt containing all slots to be predicted before inference, lowering the difficulty of SF. Su et al. (2022) proposed a prompt-based framework, PPTOD, to integrate several parts of the TOD pipeline, including single-intent detection, which differs from our setting. However, such joint modeling fashions neglect the complexity of co-occurrence and semantic similarity among tokens, intents and slots in multi-intent SLU, consequently suppressing the potential of PLMs. To this end, we build the Semantic Intents Guidance mechanism and design an auxiliary sub-task, Slot Prediction, to exploit more detailed generality together.
3 Methodology
In this section, we describe the proposed PromptSLU along the inner flow of data. In this framework, we utilize different prompts to complete different sub-tasks, which is an intuitive way similar to Su et al. (2022). However, PromptSLU is specially designed for multi-intent SLU and contains a particular intent-slot interaction mechanism, namely Semantic Intent Guidance (SIG). We also propose a new auxiliary sub-task, Slot Prediction (SP), to improve Intent Detection (ID) and Slot Filling (SF).
Given an utterance $X = \{x_1, x_2, x_3, \dots, x_N\}$, for each of the sub-tasks, including ID, SF and SP, PromptSLU feeds the backbone model the concatenation of a task-specific prefix and $X$, then makes predictions. During SF or SP, intents can also be involved. The whole framework is shown in Figure 2. The three sub-tasks are jointly trained with a shared pre-trained language model (PLM).
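To make the input construction concrete, the sketch below assembles the three sub-task inputs by concatenating a task-specific prefix with the utterance; for SF and SP, previously predicted intents are plugged into the prompt in the spirit of Semantic Intent Guidance. The prefix wordings, the intent separator, and the exact role of SP are assumptions for illustration, not the paper's verbatim templates.

```python
# Minimal sketch: build task-specific inputs for ID, SF and SP.
# Prefixes and the "given intents" phrasing are hypothetical placeholders.
def build_inputs(utterance: str, predicted_intents=None) -> dict:
    """Concatenate a task-specific prefix with the utterance for each sub-task."""
    intent_hint = ""
    if predicted_intents:  # SIG-style guidance: expose intents to the SF/SP prompts
        intent_hint = " given intents: " + "; ".join(predicted_intents)
    return {
        "ID": f"detect intents: {utterance}",
        "SF": f"fill slots{intent_hint}: {utterance}",
        "SP": f"predict slots{intent_hint}: {utterance}",
    }

inputs = build_inputs(
    "show me the cheapest fare in the database and ground transportation san francisco",
    predicted_intents=["atis_cheapest", "atis_ground_service"],
)
```

Each of these strings would then be fed to the same shared Seq2Seq backbone, so the three sub-tasks differ only in their prompts and target sequences.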
3.1 Intent Detection
Traditionally, intents are defined as a fixed set of labels, and a model is fed with the input text and maps it to these labels. Differently, we model the task as text generation, i.e., the backbone model produces a sequence of intents corresponding to the input utterance.
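A small illustration of this generative formulation is given below: the target is simply the intent labels rendered as one text sequence, which can also be parsed back into a label set for evaluation. The ";" separator is an assumption for illustration, not necessarily the delimiter used in the paper.

```python
# Intent Detection as text generation: serialize and parse intent label sequences.
def intents_to_target(intents: list) -> str:
    """Render a variable-length list of intent labels as one target string."""
    return "; ".join(intents)

def parse_intents(generated: str) -> list:
    """Recover the predicted intent labels from the generated text."""
    return [label.strip() for label in generated.split(";") if label.strip()]

target = intents_to_target(["atis_cheapest", "atis_ground_service"])
assert parse_intents(target) == ["atis_cheapest", "atis_ground_service"]
```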