TaskMix: Data Augmentation for Meta-Learning of Spoken Intent Understanding
Surya Kant Sahu
Skit.ai
The Learning Machines
surya.oju@pm.me
Abstract
Meta-Learning has emerged as a research direction to better transfer knowledge from related tasks to unseen but related tasks. However, Meta-Learning requires many training tasks to learn representations that transfer well to unseen tasks; otherwise, it leads to overfitting, and the performance degenerates to worse than Multi-task Learning. We show that a state-of-the-art data augmentation method worsens this problem of overfitting when task diversity is low. We propose a simple method, TaskMix, which synthesizes new tasks by linearly interpolating existing tasks. We compare TaskMix against many baselines on an in-house multilingual intent classification dataset of N-Best ASR hypotheses derived from real-life human-machine telephony utterances, and on two datasets derived from MTOP. We show that TaskMix outperforms baselines, alleviates overfitting when task diversity is low, and does not degrade performance even when it is high.
1 Introduction
Deep learning has seen a meteoric rise in Speech and Language applications, leading to large-scale deployments of Voice-bots, Voice Assistants, Chatbots, etc., which aim to automate mundane tasks such as answering users' queries in either the spoken or the textual modality. In many applications, users tend to code-switch or use words borrowed from other languages. A model trained for a particular language will not understand these borrowed words, and hence language-specific models are undesirable in such scenarios. A multilingual model, on the other hand, can understand and reason about what the user is saying.

Due to the scale of these applications, data captured from various sources have different distributions or serve different use-cases. Recently, Meta-Learning has emerged as a novel research direction that aims to leverage knowledge from diverse sets of data to learn a transferable initialization, so that only a small amount of training data is required to adapt to new datasets or tasks.

However, Meta-Learning requires a large number of training tasks, or else the model overfits to the training tasks and does not generalize well to new tasks (Yao et al., 2021). In this work, we propose a novel data augmentation method for meta-learning problems, TaskMix, inspired by MixUp (Zhang et al., 2018). We evaluate our proposed method against baselines such as MetaMix (Yao et al., 2021), Multi-task Learning, and vanilla Transfer Learning for multi-domain, multilingual Spoken Intent Classification.
2 Preliminaries
In this section, we describe the problem formulation and the prior work that we build upon.
2.1 Problem Formulation
Let $p(\mathcal{T})$ be a distribution over tasks from which training tasks $\mathcal{T}_0, \mathcal{T}_1, \mathcal{T}_2, \ldots, \mathcal{T}_{T-1}$ are sampled. The Meta-Learning objective is to learn a model with parameters $\theta$ such that $\theta$ quickly adapts to previously unseen tasks, which are assumed to be sampled from the same underlying distribution $p(\mathcal{T})$. In this paper, each task is a tuple $\mathcal{T} = (X, Y)$, where $X$ is a set of N-Best hypotheses of utterances and $Y$ is the set of corresponding one-hot-encoded intent classes.

The number of classes in each $Y$ may differ, and the utterances in different $X$ may come from a different language or a different domain. This formulation is general and caters to real-life datasets.
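For concreteness, the short sketch below shows one way such a task could be represented; the names and values are purely illustrative and are not taken from our implementation.

# Illustrative sketch only: one "task" under the formulation above.
# X holds the N-Best ASR hypotheses per utterance, Y the one-hot intents.
from dataclasses import dataclass
from typing import List

@dataclass
class Task:
    X: List[List[str]]  # X[i] = N-Best ASR hypotheses for utterance i
    Y: List[List[int]]  # Y[i] = one-hot intent label for utterance i

# A toy task with one utterance and three intent classes; other tasks may
# use a different language, domain, or number of classes.
task = Task(
    X=[["book a cab to the airport", "book a cap to the airport"]],
    Y=[[0, 1, 0]],
)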
Many meta-learning methods divide each training task into two disjoint sets: a support set $(X^s, Y^s)$ and a query set $(X^q, Y^q)$. However, Bai et al. (2021) have shown that a query set is unnecessary for meta-learning. Hence, throughout this work, we do not split the meta-training tasks, i.e., $X^s = X^q = X$ and $Y^s = Y^q = Y$.
Figure 1: (Left) Statistics of the two datasets used in this paper. MTOP-Wide has a high #tasks and a low mean #examples per task; our in-house dataset has a low #tasks but a high mean #examples per task. (Right) Average Macro F1 scores of Model-Agnostic Meta-Learning (MAML) and MAML+MetaMix on both datasets. MetaMix is beneficial for MTOP-Long due to its low mean #examples per task, whereas MetaMix worsens the performance on our in-house dataset, where the mean #examples per task is high.
Algorithm 1 MAML Update, MetaTrain()
Require: $\alpha$: Learning rate for the inner loop.
Require: $\beta$: Learning rate for the outer loop.
Require: $n$: Iterations for the inner loop.
Require: $L(t, \phi)$: Loss function for task $t$ w.r.t. parameters $\phi$.
1: for $\mathcal{T}_i \sim p(\mathcal{T})$ do   ▷ Sample from support set
2:   $\theta_i \leftarrow \theta$   ▷ Copy weights
3:   for $j = 1$ to $n$ do
4:     Evaluate $\nabla_{\theta_i} L(\mathcal{T}_i^s, \theta_i)$
5:     $\theta_i \leftarrow \theta_i - \alpha \nabla_{\theta_i} L(\mathcal{T}_i^s, \theta_i)$
6:   end for
7: end for
8: $\theta \leftarrow \theta - \beta \nabla_{\theta} \sum_{\mathcal{T}_i^q \sim p(\mathcal{T})} L(\mathcal{T}_i^q, \theta_i)$   ▷ Update using query set
2.2 Model-Agnostic Meta-Learning
MAML (Finn et al., 2017) learns the meta-parameters $\theta$ by first optimizing for multiple steps on a specific task $\mathcal{T}_i$, yielding the optimal task-specific parameters $\theta_i$; this is done for each meta-training task $\mathcal{T}_i \sim p(\mathcal{T})$. Secondly, the loss on the held-out query set is computed and back-propagated through the computation graph of each task. Finally, $\theta$ is updated such that it can quickly be adapted to each $\theta_i$. The procedure is outlined in Algorithm 1.
The authors argued that the held-out query set, which is not used in the inner-loop optimization, prevents overfitting of the task-specific parameters $\theta_i$ and hence improves generalization of the meta-parameters $\theta$ to new and unknown tasks. However, Bai et al. (2021) showed that splitting meta-training tasks into disjoint query and support sets performs worse than not splitting at all. Following these results, we do not split, and we sample data from the same set for both the inner and outer loops.
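A minimal PyTorch-style sketch of this update, without the support/query split, is given below. It uses a first-order approximation of the outer-loop gradient (the exact MAML update would back-propagate through the inner-loop steps, e.g., with the higher library); model, tasks, and loss_fn are assumed placeholders and are not part of our released code.

import copy
import torch

def maml_meta_train_step(model, tasks, loss_fn, alpha=1e-3, beta=1e-3, n_inner=3):
    # Accumulator for the outer-loop gradient over all sampled tasks.
    meta_grads = [torch.zeros_like(p) for p in model.parameters()]
    for task in tasks:                                  # T_i ~ p(T)
        fast = copy.deepcopy(model)                     # theta_i <- theta
        inner_opt = torch.optim.SGD(fast.parameters(), lr=alpha)
        for _ in range(n_inner):                        # n inner-loop steps
            inner_opt.zero_grad()
            loss_fn(fast, task).backward()              # grad of L(T_i, theta_i)
            inner_opt.step()                            # theta_i <- theta_i - alpha * grad
        # First-order approximation: evaluate the task loss at theta_i and
        # apply its gradient directly to the meta-parameters theta.
        task_grads = torch.autograd.grad(loss_fn(fast, task), list(fast.parameters()))
        for acc, g in zip(meta_grads, task_grads):
            acc += g
    with torch.no_grad():                               # theta <- theta - beta * sum_i grad_i
        for p, g in zip(model.parameters(), meta_grads):
            p -= beta * g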
2.3 MixUp
MixUp (Zhang et al., 2018) is a data augmentation technique that synthesizes new datapoints by linearly combining random datapoints in the training set, encouraging simple, linear behavior between training examples and improving generalization and robustness to noise. The interpolation parameter $\lambda$ is sampled from a Beta distribution at each training step. Since mixing sequences of discrete tokens, such as sentences, is not possible, we follow (Sun et al., 2020) and mix only the output features of the transformer model.
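As a rough sketch (not the paper's exact implementation), feature-level MixUp can be written as follows, where features are pooled transformer outputs, labels are one-hot intents, and alpha is the Beta-distribution parameter; all names are illustrative.

import torch

def mixup_features(features, labels, alpha=0.2):
    # features: (B, d) transformer output features; labels: (B, C) one-hot intents.
    lam = torch.distributions.Beta(alpha, alpha).sample()  # lambda ~ Beta(alpha, alpha)
    perm = torch.randperm(features.size(0))                # random pairing within the batch
    mixed_x = lam * features + (1 - lam) * features[perm]  # x~ = lam * x_i + (1 - lam) * x_j
    mixed_y = lam * labels + (1 - lam) * labels[perm]      # y~ = lam * y_i + (1 - lam) * y_j
    return mixed_x, mixed_y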
MetaMix applies MixUp to intra-task datapoints, creating new datapoints within the same task, whereas our proposed method, TaskMix, extends MixUp to cross-task mixing, creating new meta-training tasks.
2.4 MetaMix
MetaMix (Yao et al., 2021) is an application of MixUp to the meta-learning setting. It is a data augmentation method for Meta-Learning in which MixUp is applied to random pairs of datapoints within each task's batch of query-set datapoints, introducing an additional gradient term for each task and encouraging generalization within tasks. Fig. 2 illustrates how MAML+MetaMix differs from MAML.
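A hedged sketch of how such a MetaMix-style outer-loop loss could look is shown below; encode, classify, x_query, and y_query are assumed placeholders for the adapted task model and one task's query batch, not names from the original implementation.

import torch

def metamix_outer_loss(encode, classify, x_query, y_query, alpha=0.5):
    # x_query: inputs of one task's query batch; y_query: (B, C) one-hot intents.
    h = encode(x_query)                                     # (B, d) transformer features
    lam = torch.distributions.Beta(alpha, alpha).sample()   # lambda ~ Beta(alpha, alpha)
    perm = torch.randperm(h.size(0))                        # random intra-task pairing
    mixed_h = lam * h + (1 - lam) * h[perm]                 # MixUp on features within the task
    mixed_y = lam * y_query + (1 - lam) * y_query[perm]
    log_probs = torch.log_softmax(classify(mixed_h), dim=-1)
    # Soft-label cross-entropy against the interpolated targets; this extra
    # term provides the additional per-task gradient described above.
    return -(mixed_y * log_probs).sum(dim=-1).mean()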