trainable parameters.
Following the introduction of the transformer (Vaswani et al., 2017a), an influx of LLM architectures has continually advanced state-of-the-art (SOTA) performance on many natural language processing (NLP) tasks (Otter et al., 2021). These models are usually pre-trained on a general self-supervised learning task, after which they are fine-tuned for a specific task. Fine-tuning such a model can be computationally prohibitive due to the immense number of trainable parameters. Furthermore, Kaplan et al. (2020) found that the most important factor for LLM performance is likely model size, indicating that the development of even larger models is probable. Inspired by in-context prompting, Li and Liang (2021) proposed prefix-tuning as a parameter-efficient alternative to fine-tuning for natural language generation (NLG): the LLM's parameters are frozen, and trainable prefix tokens are prepended to the input sequence. Prefix-tuning has since been adapted to natural language understanding (NLU) and performs comparably to full fine-tuning across scales and tasks (Liu et al., 2022).
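As an illustrative sketch (not code from the cited works), the listing below shows the input-level form of this idea in PyTorch with HuggingFace Transformers: the BERT backbone is frozen and a small matrix of prefix embeddings is prepended to the token embeddings, so only the prefix receives gradient updates. Full prefix-tuning (Li and Liang, 2021; Liu et al., 2022) additionally injects prefix key/value vectors into every attention layer; the class name PrefixBert and the prefix length are illustrative choices.

```python
import torch
import torch.nn as nn
from transformers import BertModel


class PrefixBert(nn.Module):
    """Sketch of prefix-style tuning: freeze BERT and learn only a short
    sequence of prefix embeddings prepended to the input tokens."""

    def __init__(self, prefix_len: int = 16, model_name: str = "bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        for p in self.bert.parameters():  # freeze the entire backbone
            p.requires_grad = False
        hidden = self.bert.config.hidden_size
        # the prefix is the only trainable tensor in this sketch
        self.prefix = nn.Parameter(torch.randn(prefix_len, hidden) * 0.02)

    def forward(self, input_ids, attention_mask):
        batch = input_ids.size(0)
        # word embeddings only; BERT adds position/type embeddings internally
        tok_emb = self.bert.embeddings.word_embeddings(input_ids)
        prefix = self.prefix.unsqueeze(0).expand(batch, -1, -1)
        inputs_embeds = torch.cat([prefix, tok_emb], dim=1)
        # extend the attention mask to cover the prepended prefix positions
        prefix_mask = torch.ones(batch, self.prefix.size(0),
                                 dtype=attention_mask.dtype,
                                 device=attention_mask.device)
        mask = torch.cat([prefix_mask, attention_mask], dim=1)
        return self.bert(inputs_embeds=inputs_embeds, attention_mask=mask)
```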
We achieve SOTA results by augmenting the pre-training architecture of ADB open intent classification (Zhang et al., 2021a) with prefix-tuning. Combining prefix-tuning with fine-tuning only the last transformer layer was motivated by Kumar et al. (2022), who found that fine-tuning the entire model can distort pre-trained features. We find that, in isolation, both prefix-tuning and fine-tuning the last layer under-perform fine-tuning all of BERT, but that when trained in tandem they exceed full fine-tuning.
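The fragment below is a hypothetical illustration of this combination, reusing the PrefixBert sketch above rather than the exact training setup of this work: only the final encoder layer is unfrozen, so it is updated jointly with the prefix while all earlier layers remain fixed.

```python
# Hypothetical combination of the two strategies, reusing PrefixBert above:
# unfreeze only the final BERT encoder layer so it trains jointly with the
# prefix, while all earlier layers stay frozen.
model = PrefixBert(prefix_len=16)
for p in model.bert.encoder.layer[-1].parameters():
    p.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable:,} of {total:,}")
```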
The rest of this paper is structured as follows: Section 2 summarizes prior work in both intent classification and parameter-efficient tuning (PET). Our methodology and model architecture are defined in Section 3. In Sections 4 and 5, respectively, we describe our experimental setup and the corresponding results, along with several ablations. We finish with a conclusion and a brief discussion of limitations and ethics.
2 Related Works
2.1 Financial Virtual Agents
The effectiveness of VAs has led to their adoption in the financial domain. Galitsky and Ilvovsky (2019) demonstrated an exemplary session with a financial VA in which the user queried for investment advice. CalFE leverages commercial chatbot frameworks to train a finance-specific VA (Khan and Rabbani, 2020). Ng et al. (2020) evaluated the impact of a financial VA's social presence on usage intention. All of these works require extracting intent from user utterances.
2.2 Intent Detection
Intent classification is a well-established NLU task, but most research limits the problem to known classes (Zhang et al., 2019; E et al., 2019; Qin et al., 2019; Zhang et al., 2021b). While having prior knowledge of all expected intents is ideal, this is rarely possible in a production environment, especially for new dialogue systems. More realistically, a subset of intents is anticipated and new intents are discovered after deployment. Brychcín and Král (2017) recognized the challenge of identifying intents prior to training and proposed an unsupervised method to group intents, but in doing so likely ignored information available in the already identified intents. Xia et al. (2018) employed zero-shot learning to identify emerging intents but used an LSTM, which is hindered by non-parallelized learning and difficulty propagating long-range dependencies. The same issue is present in DeepUnk, a BiLSTM-based intent classification method using margin loss (Lin and Xu, 2019). Zhan et al. (2021) shared our open intent classification problem formulation but synthetically generated out-of-domain samples for training, which may not be as realistic as a fine-grained open class representation.
Our work directly extends the ADB approach to establishing an open class representation (Zhang et al., 2021a). The novelty of our adaptation lies in leveraging prefix-tuning in combination with partial fine-tuning to improve the pre-training of known intent representations without drastically increasing the number of trainable parameters. In parallel with our work, Zhang et al. (2022) extended their ADB approach to learn distance-aware intent representations, which resulted in performance comparable to our modification of their original approach. However, our tuning method is model-agnostic and can easily be incorporated with their distance-aware representation learning, likely improving the SOTA further.
2.3 Parameter Efficient Tuning
The desire for PET quickly emerged following
the introduction of LLMs. Adapter modules in-