TaskMix: Data Augmentation for Meta-Learning of Spoken Intent Understanding
Surya Kant Sahu
Skit.ai
The Learning Machines
surya.oju@pm.me
Abstract
Meta-Learning has emerged as a research direction to better transfer knowledge from related tasks to unseen but related tasks. However, Meta-Learning requires many training tasks to learn representations that transfer well to unseen tasks; otherwise, it leads to overfitting, and the performance degenerates to worse than Multi-task Learning. We show that a state-of-the-art data augmentation method worsens this problem of overfitting when task diversity is low. We propose a simple method, TaskMix, which synthesizes new tasks by linearly interpolating existing tasks. We compare TaskMix against many baselines on an in-house multilingual intent classification dataset of N-Best ASR hypotheses derived from real-life human-machine telephony utterances, and on two datasets derived from MTOP. We show that TaskMix outperforms baselines, alleviates overfitting when task diversity is low, and does not degrade performance even when it is high.
1 Introduction
Deep learning has seen a meteoric rise in Speech and Language applications, leading to large-scale deployments of Voice-bots, Voice Assistants, Chatbots, etc., which aim to automate mundane tasks such as answering users' queries in either the spoken or the textual modality. In many applications, users tend to code-switch or use words borrowed from other languages. A model trained for a particular language will not understand these borrowed words, and hence language-specific models are undesirable in such scenarios. A multilingual model, on the other hand, can understand and reason about what the user is saying.

Due to the scale of these applications, data captured from various sources have different distributions or serve different use-cases. Recently, Meta-Learning has emerged as a novel research direction that aims to leverage knowledge from diverse sets of data to learn a transferable initialization, so that only a small amount of training data is required to adapt to new datasets or tasks.

However, Meta-Learning requires a large number of training tasks, or else the model overfits to the training tasks and does not generalize well to new tasks (Yao et al., 2021). In this work, we propose a novel data augmentation method for meta-learning problems, TaskMix, inspired by MixUp (Zhang et al., 2018). We evaluate our proposed method against baselines such as MetaMix (Yao et al., 2021), Multi-task Learning, and vanilla Transfer Learning for multi-domain, multilingual Spoken Intent Classification.
2 Preliminaries
In this section, we describe the problem formulation and the prior work that we build upon.
2.1 Problem Formulation
Let $p(\mathcal{T})$ be a distribution over tasks from which training tasks $\mathcal{T}_0, \mathcal{T}_1, \mathcal{T}_2, \ldots, \mathcal{T}_{T-1}$ are sampled. The Meta-Learning objective is to learn a model with parameters $\theta$ such that $\theta$ quickly adapts to previously unseen tasks, which are assumed to be sampled from the same underlying distribution $p(\mathcal{T})$. In this paper, each task is a tuple $\mathcal{T} = (X, Y)$, where $X$ is a set of N-Best hypotheses of utterances and $Y$ is the set of corresponding one-hot-encoded intent classes.

The number of classes in each $Y$ may differ, and the utterances in different $X$ may come from a different language or a different domain. This formulation is general and caters to real-life datasets.
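For concreteness, the short sketch below shows one way such a task could be represented; the names and values are purely illustrative and are not taken from our implementation.

# Illustrative sketch only: one "task" under the formulation above.
# X holds the N-Best ASR hypotheses per utterance, Y the one-hot intents.
from dataclasses import dataclass
from typing import List

@dataclass
class Task:
    X: List[List[str]]  # X[i] = N-Best ASR hypotheses for utterance i
    Y: List[List[int]]  # Y[i] = one-hot intent label for utterance i

# A toy task with one utterance and three intent classes; other tasks may
# use a different language, domain, or number of classes.
task = Task(
    X=[["book a cab to the airport", "book a cap to the airport"]],
    Y=[[0, 1, 0]],
)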
Many meta-learning methods divide each training task into two disjoint sets: a support set $(X^s, Y^s)$ and a query set $(X^q, Y^q)$. However, Bai et al. (2021) have shown that a query set is unnecessary for meta-learning. Hence, throughout this work, we do not split the meta-training tasks, i.e., $X^s = X^q = X$ and $Y^s = Y^q = Y$.
Figure 1: (Left) Statistics of the two datasets used in this paper. MTOP-Wide has a high #tasks and a low mean #examples per task; our in-house dataset has a low #tasks but a high mean #examples per task. (Right) Average Macro F1 scores of Model-Agnostic Meta-Learning (MAML) and MAML+MetaMix on both datasets. MetaMix is beneficial for MTOP-Long due to its low mean #examples per task, whereas MetaMix worsens the performance on our in-house dataset, where the mean #examples per task is high.
Algorithm 1 MAML Update, MetaTrain()
Require: $\alpha$: Learning rate for the inner loop.
Require: $\beta$: Learning rate for the outer loop.
Require: $n$: Iterations for the inner loop.
Require: $L(t, \phi)$: Loss function for task $t$ w.r.t. parameters $\phi$.
1: for $\mathcal{T}_i \sim p(\mathcal{T})$ do   ▷ Sample from support set
2:   $\theta_i \leftarrow \theta$   ▷ Copy weights
3:   for $j = 1$ to $n$ do
4:     Evaluate $\nabla_{\theta_i} L(\mathcal{T}_i^s, \theta_i)$
5:     $\theta_i \leftarrow \theta_i - \alpha \nabla_{\theta_i} L(\mathcal{T}_i^s, \theta_i)$
6:   end for
7: end for
8: $\theta \leftarrow \theta - \beta \nabla_{\theta} \sum_{\mathcal{T}_i^q \sim p(\mathcal{T})} L(\mathcal{T}_i^q, \theta_i)$   ▷ Update using query set
2.2 Model-Agnostic Meta-Learning
MAML (Finn et al., 2017) learns the meta-parameters $\theta$ by first optimizing for multiple steps on a specific task $\mathcal{T}_i$, yielding the optimal task-specific parameters $\theta_i$; this is done for each meta-training task $\mathcal{T}_i \sim p(\mathcal{T})$. Secondly, the loss on the held-out query set is computed and back-propagated through the computation graph of each task. Finally, $\theta$ is updated such that it can quickly be adapted to each $\theta_i$. The procedure is outlined in Algorithm 1.
The authors argued that the held-out query set, which is not used in the inner-loop optimization, prevents overfitting of the task-specific parameters $\theta_i$ and hence improves generalization of the meta-parameters $\theta$ to new and unknown tasks. However, Bai et al. (2021) showed that splitting meta-training tasks into disjoint query and support sets performs worse than not splitting at all. Following these results, we do not split, and we sample data from the same set for both the inner and outer loops.
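A minimal PyTorch-style sketch of this update, without the support/query split, is given below. It uses a first-order approximation of the outer-loop gradient (the exact MAML update would back-propagate through the inner-loop steps, e.g., with the higher library); model, tasks, and loss_fn are assumed placeholders and are not part of our released code.

import copy
import torch

def maml_meta_train_step(model, tasks, loss_fn, alpha=1e-3, beta=1e-3, n_inner=3):
    # Accumulator for the outer-loop gradient over all sampled tasks.
    meta_grads = [torch.zeros_like(p) for p in model.parameters()]
    for task in tasks:                                  # T_i ~ p(T)
        fast = copy.deepcopy(model)                     # theta_i <- theta
        inner_opt = torch.optim.SGD(fast.parameters(), lr=alpha)
        for _ in range(n_inner):                        # n inner-loop steps
            inner_opt.zero_grad()
            loss_fn(fast, task).backward()              # grad of L(T_i, theta_i)
            inner_opt.step()                            # theta_i <- theta_i - alpha * grad
        # First-order approximation: evaluate the task loss at theta_i and
        # apply its gradient directly to the meta-parameters theta.
        task_grads = torch.autograd.grad(loss_fn(fast, task), list(fast.parameters()))
        for acc, g in zip(meta_grads, task_grads):
            acc += g
    with torch.no_grad():                               # theta <- theta - beta * sum_i grad_i
        for p, g in zip(model.parameters(), meta_grads):
            p -= beta * g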
2.3 MixUp
MixUp (Zhang et al., 2018) is a data augmentation technique that synthesizes new datapoints by linearly combining random datapoints in the training set, encouraging simple, linear behavior between training examples and improving generalization and robustness to noise. The interpolation parameter $\lambda$ is sampled from a Beta distribution at each training step. Since mixing sequences of discrete tokens, such as sentences, is not possible, we follow (Sun et al., 2020) and mix only the output features of the transformer model.
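As a rough sketch (not the paper's exact implementation), feature-level MixUp can be written as follows, where features are pooled transformer outputs, labels are one-hot intents, and alpha is the Beta-distribution parameter; all names are illustrative.

import torch

def mixup_features(features, labels, alpha=0.2):
    # features: (B, d) transformer output features; labels: (B, C) one-hot intents.
    lam = torch.distributions.Beta(alpha, alpha).sample()  # lambda ~ Beta(alpha, alpha)
    perm = torch.randperm(features.size(0))                # random pairing within the batch
    mixed_x = lam * features + (1 - lam) * features[perm]  # x~ = lam * x_i + (1 - lam) * x_j
    mixed_y = lam * labels + (1 - lam) * labels[perm]      # y~ = lam * y_i + (1 - lam) * y_j
    return mixed_x, mixed_y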
MetaMix applies MixUp to intra-task datapoints, creating new datapoints within the same task, whereas our proposed method, TaskMix, extends MixUp to cross-task mixing, creating new meta-training tasks.
2.4 MetaMix
MetaMix (Yao et al., 2021) is an application of MixUp to the meta-learning setting. It is a data augmentation method for Meta-Learning in which MixUp is applied to random pairs of datapoints within each task's batch of query-set datapoints, introducing an additional gradient term for each task and encouraging generalization within tasks. Fig. 2 illustrates how MAML+MetaMix differs from MAML.
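A hedged sketch of how such a MetaMix-style outer-loop loss could look is shown below; encode, classify, x_query, and y_query are assumed placeholders for the adapted task model and one task's query batch, not names from the original implementation.

import torch

def metamix_outer_loss(encode, classify, x_query, y_query, alpha=0.5):
    # x_query: inputs of one task's query batch; y_query: (B, C) one-hot intents.
    h = encode(x_query)                                     # (B, d) transformer features
    lam = torch.distributions.Beta(alpha, alpha).sample()   # lambda ~ Beta(alpha, alpha)
    perm = torch.randperm(h.size(0))                        # random intra-task pairing
    mixed_h = lam * h + (1 - lam) * h[perm]                 # MixUp on features within the task
    mixed_y = lam * y_query + (1 - lam) * y_query[perm]
    log_probs = torch.log_softmax(classify(mixed_h), dim=-1)
    # Soft-label cross-entropy against the interpolated targets; this extra
    # term provides the additional per-task gradient described above.
    return -(mixed_y * log_probs).sum(dim=-1).mean()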