intents to benefit SF. Moreover, the results of the two sub-tasks come from the same Seq2Seq model in this framework.
2.2 Multi-intent SLU
Multi-intent utterances appear more frequently in real-world scenarios than single-intent ones, and this problem has attracted increasing attention (Kim et al., 2017; Gangadharaiah and Narayanaswamy, 2019; Qin et al., 2020; Ding et al., 2021; Qin et al., 2021; Chen et al., 2022; Cai et al., 2022). Gangadharaiah and Narayanaswamy (2019) used an attention-based neural network to jointly model ID and SF in multi-intent scenarios. Qin et al. (2020) proposed AGIF, which organizes intent labels and slot labels in a graph structure and focuses on the interaction among labels.
Ding et al. (2021) and Qin et al. (2021) employ graph attention networks to capture the associations among labels. Despite these attempts to bridge intent and slot labels, ID and SF are still modeled in different forms, i.e., multi-label classification for ID but sequence tagging for SF. The essential problem remains unaddressed: the distinct modeling paradigms hinder the extraction of common features. Our text-generation-based framework differs from the above approaches by unifying the formulation of ID and SF. On one hand, this formulation is intuitive to humans and naturally suited to ID with a variable number of intents. On the other hand, every improvement to our framework effectively benefits both sub-tasks. Our framework combines both advantages with an explicitly hierarchical implementation, while the results of the two sub-tasks come from the same Seq2Seq model.
2.3 Prompting
GPT-3 (Brown et al., 2020) brought novel insight to Natural Language Processing through the power of a new paradigm (Liu et al., 2021), namely prompting. It has since been explored in many directions by transforming the original task into a text-to-text form (Zhong et al., 2021; Lester et al., 2021; Cui et al., 2021; Wang et al., 2022a; Tassias, 2021; Su et al., 2022), where fine-tuning on downstream tasks effectively draws on knowledge acquired during long-term pre-training and works especially well in low-resource scenarios. Zhong et al. (2021) aligned the form of fine-tuning with next-word prediction to optimize performance on text classification tasks. Alternatively, Lester et al. (2021) resorted to soft prompts to capture implicit and continuous signals. For sequence tagging, prompts appear as templates for text infilling or as natural language instructions (Cui et al., 2021; Wang et al., 2022a). As for SLU in task-oriented dialogue (TOD), it consists of two sub-tasks, ID and SF, which have homologous forms. Tassias (2021) relied on the prior distributions of intents and slots to generate, before inference, a prompt containing all slots to be predicted, thereby lowering the difficulty of SF. Su et al. (2022) proposed a prompt-based framework, PPTOD, to integrate several components of the TOD pipeline, including single-intent detection, which differs from our setting. However, such joint modeling approaches neglect the complexity of co-occurrence and semantic similarity among tokens, intents, and slots in multi-intent SLU, consequently suppressing the potential of PLMs. To this end, we build the Semantic Intent Guidance mechanism and design an auxiliary sub-task, Slot Prediction, to jointly exploit more fine-grained commonalities.
3 Methodology
In this section, we describe the proposed PromptSLU by following the inner flow of data. In this framework, we utilize different prompts to complete different sub-tasks, an intuitive approach similar to Su et al. (2022). However, PromptSLU is specially designed for multi-intent SLU and contains a particular intent-slot interaction mechanism, namely Semantic Intent Guidance (SIG). We also propose a new auxiliary sub-task, Slot Prediction (SP), to improve Intent Detection (ID) and Slot Filling (SF).
Given an utterance $X=\{x_1, x_2, x_3, \dots, x_N\}$, for each sub-task (ID, SF, and SP), PromptSLU feeds the backbone model the concatenation of a task-specific prefix and $X$, and then makes predictions. During SF or SP, intents can also be involved in the input. The whole framework is shown in Figure 2. The three sub-tasks are jointly trained with a shared pre-trained language model (PLM).
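For illustration, the minimal sketch below shows one way such task-specific inputs could be assembled in Python; the prefix wordings and the helper function name are hypothetical and do not reproduce the exact prompts used in PromptSLU.

# A minimal sketch (not the actual implementation): each sub-task gets its
# own prefix, which is concatenated with the utterance X. The prefix
# strings below are illustrative assumptions.
def build_inputs(utterance: str) -> dict:
    prefixes = {
        "ID": "detect intents:",      # Intent Detection
        "SF": "fill slots:",          # Slot Filling
        "SP": "predict slot types:",  # auxiliary Slot Prediction
    }
    # One input string per sub-task; all three share the same backbone PLM.
    return {task: f"{prefix} {utterance}" for task, prefix in prefixes.items()}

inputs = build_inputs("play a song by queen and set an alarm for 7 am")
# e.g. inputs["ID"] == "detect intents: play a song by queen and set an alarm for 7 am"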
3.1 Intent Detection
Traditionally, intents are defined as a fixed set of labels: a model is fed the input text and maps it onto these labels. In contrast, we model the task as text generation, i.e., the backbone model produces a sequence of intents corresponding to the input utterance.
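For illustration, a minimal sketch of this generation-based formulation is given below, assuming a T5-style Seq2Seq backbone from Hugging Face Transformers; the prompt prefix and the separator-joined output format are assumptions, and the backbone would need fine-tuning on multi-intent data before producing such outputs.

# A minimal sketch of Intent Detection as text generation with a T5-style
# backbone; prefix wording and output format are assumptions.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

utterance = "play a song by queen and set an alarm for 7 am"
prompt = f"detect intents: {utterance}"

batch = tokenizer(prompt, return_tensors="pt")
# After fine-tuning, the model generates a sequence of intent labels,
# e.g. "PlayMusic ; SetAlarm", instead of scoring a fixed label set.
output_ids = model.generate(**batch, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))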