
1.1 Motivation
In Figure 1, we illustrate a real-world case of why personalization and contextualization matter, especially given the specificity of highly entity-centric domains such as music. In this case, masking the very last device response, we observe that valuable information is scattered across the user's requests in the session; yet the device delivers a sub-par experience, responding defectively multiple times before finally getting the user's intent right.
1.2 Notation and Preliminaries
Definition 1. Let integer $\gamma$ satisfy $1 \leq \gamma < \infty$. A natural language (NL) hypothesis is a mapping $h : Q \to D \times I \times [E]^{\gamma}$, where $Q$ refers to the query space, $D$ refers to the domain space, $I$ refers to the intent space, and $E$ refers to the entity space. The entity space, $E := E_T \times E_V$, may further be decomposed into the entity type space $E_T$ and the entity value space $E_V$. All spaces are defined over Unicode strings.
As an example, given a query string $q =$ “play the real slim shady”, the corresponding NL hypothesis is $h(q) =$ (Music, PlayMusicIntent, [(SongName, the real slim shady)]), where the domain is Music, the intent is PlayMusicIntent, and the entity value is “the real slim shady” with the SongName entity type.
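To make the notation concrete, the following minimal Python sketch encodes Definition 1 and the example above as a typed structure. The names (NLHypothesis, Entity, h) are our own illustrative choices and not part of the paper's implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical types illustrating Definition 1; all fields are Unicode strings.
Entity = Tuple[str, str]  # (entity type in E_T, entity value in E_V)

@dataclass
class NLHypothesis:
    domain: str             # element of the domain space D
    intent: str             # element of the intent space I
    entities: List[Entity]  # element of [E]^gamma, 1 <= gamma < inf

def h(query: str) -> NLHypothesis:
    """Toy stand-in for an NL hypothesis h: Q -> D x I x [E]^gamma."""
    # In a real system this is produced by the NLU stack; here we hard-code
    # the paper's example for the query "play the real slim shady".
    return NLHypothesis(
        domain="Music",
        intent="PlayMusicIntent",
        entities=[("SongName", "the real slim shady")],
    )

print(h("play the real slim shady"))
```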
Definition 2. Building on Definition 1, our system, PENTATRON, may be formalized as $\Phi : (C, Q) \to E_V$, where $C$ is the user space (anonymized in practice using a hash function, for privacy).
In a nutshell, given an input query $q$ (with or without dialogue context), our system essentially solves the optimization problem
$$\min_{\theta}\; \mathbb{E}_{(c,q,e)\sim\mathcal{D}}\big[\ell\big(\Phi_{\theta}(c, q),\, e\big)\big] \quad (1)$$
where $\mathcal{D}$ is supported on $C \times Q \times E_V$.
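To illustrate the form of objective (1), the sketch below shows one hypothetical training step in PyTorch that minimizes a cross-entropy surrogate for $\ell$ over sampled $(c, q, e)$ triples. The model class, feature dimensions, and the choice of cross-entropy are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class Phi(nn.Module):
    """Hypothetical stand-in for Phi_theta: scores candidate entity values
    given user-context (c) and query (q) feature vectors."""

    def __init__(self, feat_dim: int, num_values: int):
        super().__init__()
        # Candidate entity-value vocabulary of size |E_V| = num_values.
        self.scorer = nn.Linear(2 * feat_dim, num_values)

    def forward(self, user_feats, query_feats):
        # Concatenate representations of c and q, then score every candidate.
        return self.scorer(torch.cat([user_feats, query_feats], dim=-1))

def train_step(model, optimizer, batch):
    """One SGD step on objective (1), with cross-entropy as the loss l."""
    user_feats, query_feats, target_value_ids = batch  # (c, q, e) ~ D
    logits = model(user_feats, query_feats)
    loss = nn.functional.cross_entropy(logits, target_value_ids)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with random features and labels.
model = Phi(feat_dim=128, num_values=1000)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
batch = (torch.randn(4, 128), torch.randn(4, 128), torch.randint(0, 1000, (4,)))
print(train_step(model, opt, batch))
```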
1.3 Our Contributions and Preview of Results
On the system design front, we build a retrieval-based pipeline. Our model backbone is inspired by attention-based (Vaswani et al., 2017) transformer encoders (Devlin et al., 2018). We achieve personalization via a non-parametric index, which is essentially a key-value look-up table whose keys represent users and whose values represent entity lists derived from historical data aggregation; a minimal sketch is given below.
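The following is a minimal sketch of such a non-parametric personalization index, assuming hashed user IDs as keys and aggregated entity values as the per-user candidate pool. The class and method names are hypothetical, and the encoder-based scoring of retrieved candidates is omitted; this is not the exact production pipeline.

```python
from collections import defaultdict
from typing import Dict, List

class PersonalIndex:
    """Hypothetical non-parametric index: hashed user ID -> entity values
    aggregated from that user's historical queries."""

    def __init__(self):
        self._index: Dict[str, List[str]] = defaultdict(list)

    def add(self, hashed_user_id: str, entity_value: str) -> None:
        """Aggregate an observed entity value into the user's candidate list."""
        if entity_value not in self._index[hashed_user_id]:
            self._index[hashed_user_id].append(entity_value)

    def candidates(self, hashed_user_id: str) -> List[str]:
        """Look up the rewrite candidate pool for one user."""
        return self._index.get(hashed_user_id, [])

# Usage: candidates would subsequently be scored by the transformer encoder.
index = PersonalIndex()
index.add("user_a1b2", "the real slim shady")
print(index.candidates("user_a1b2"))  # ['the real slim shady']
```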
With respect to experimental results, we conduct extensive studies on seven different versions of PENTATRON, involving ablations with prompts, multi-tasking, and non-contextual training data, and show consistent improvements in Exact Match (EM) of up to 500.97% (relative to the baseline), as captured by the preview of results in Figure 2.

Figure 2: Preview of the system performance, which shows consistent, significant improvement in going from a purely personalized system (N) to a fully contextual personalized system (CC). Further details are available in Table 1.
2 Background and Related Work
2.1 Query Rewriting
Query Rewriting (QR) in dialogue systems aims to reduce friction by reformulating the automatic speech recognition component's interpretation of users' queries. Initial efforts (Dehghani et al., 2017; Su et al., 2019) treat QR as a text generation problem.
Some recent studies (Chen et al., 2020; Yuan et al., 2021; Fan et al., 2021; Cho et al., 2021) are based on neural retrieval systems. In retrieval-based systems, the rewrite candidate pool is aggregated from users' habitual or historical queries so that rewrite quality can be tightly controlled. Compared to generation-based systems, retrieval-based systems may sacrifice flexibility and diversity of rewrites, but in return provide more stability, which is more important in a runtime production setup.
Personalization and Contextualization are two popular directions for QR systems. A personalized system such as that of Cho et al. (2021) tends to incorporate diverse affinities and personal preferences to provide an individually tailored user experience within a single unified system. Contextualization attempts to utilize multi-turn queries rather than only leveraging single-turn information. Some pre-