26, 27] on YouTube and SpeechStew, we achieve state-
of-the-art performance on CHiME-6.
While we have limited our scope to ASR in this work, G-
Augment is a general framework that can be applied to any
task where augmentation is utilized.
2. RELATED WORK
Data augmentation is an effective method for improving the
generalization performance of deep learning models. Do-
main specific augmentations have been utilized for a variety
of domains, from image classification [2] to speech recog-
nition [11]. More recently, automated data augmentation
methods have been utilized for increasing the efficacy of data
augmentation for 2D image classification, where a policy is
(meta)-learned using a large search space of possible data
augmentations [13, 28, 29]. Automated data augmentation
methods have been made more efficient for image classifica-
tion [14, 30, 31] and have been extended to other modalities
in vision [32, 33]. Some efforts to apply automated augmentation search to ASR tasks have also shown success [15, 34].
While these previous attempts learned augmentation pa-
rameters such as application probability and distortion mag-
nitude from the data, they used a manually chosen augmen-
tation structure. For example, AutoAugment learned 25 sub-
policies, each with two layers of augmentation. Decisions
such as whether flips and cropping should be applied were
made manually for each dataset [14, 30, 31]: for instance, Cutout [35] was always applied after the AutoAugment layers on CIFAR-10 [36], but not on reduced SVHN [37]. In
this work, we fully automate the optimization of the data aug-
mentation policy: the graph structure of the policy and param-
eters of individual augmentations that make up the policy are
learned jointly. In addition, we are able to demonstrate posi-
tive transfer of augmentation policies from the search task to
set-ups with different training dynamics and model sizes. To
our knowledge, our work is the first example of an automated
data augmentation approach that outperforms manually designed policies such as SpecAugment in speech recognition.
Methods for searching over graph spaces have been exten-
sively investigated in the context of neural architecture search
[38, 39]. While we choose to use a relatively simple evolu-
tionary algorithm [18, 19] to search over augmentation poli-
cies in this particular work, a variety of methods have been
employed for such searches in the literature [38, 39, 40, 41,
42], an extensive list of which can be found in [43].
3. G-AUGMENT
To search for both how the augmentations are applied and the
parameters of the augmentations themselves, we parameterize
an augmentation policy as a graph with labeled nodes and
edges. Here we describe the details of this parameterization
and the algorithm we employ to search over this space.
3.1. Search Space
We parameterize an augmentation policy by a directed acyclic graph (DAG), consisting of a single input node, a single output node, and a number $N$ of ensemble nodes. The input node
has only outgoing edges that can connect to ensemble nodes.
The output node connects to a single ensemble node via an
edge, which passes its state to the output. Each node repre-
sents an augmented state of the data, while the edges represent
the augmentations themselves.
Each ensemble node of the graph takes two inputs (Figure
3.1). We denote one of the incoming edges the left edge and
the other the right edge of a given ensemble node for con-
venience. We denote the node connected to the tail of the
left/right edge as the left/right input, respectively. The incoming edges are labeled by sampling probabilities $p_l$ and $p_r$ that sum to unity, and by quadruples $a_l$ and $a_r$ that represent augmentations. The state of a given node is obtained by applying the augmentations $a_l$/$a_r$ to the left/right inputs and sampling them with probabilities $p_l$/$p_r$, respectively. In other words,
\[
(\text{node state}) =
\begin{cases}
a_l(\text{left input}) & \text{w/ probability } p_l, \\
a_r(\text{right input}) & \text{w/ probability } p_r.
\end{cases}
\tag{1}
\]
We require the graph to be directed, with all directed paths tracing back to the input. This is enforced by assigning indices $1, \cdots, N$ to the ensemble nodes, the index $0$ to the input node, and selecting the left/right input indices of node $n$ from $[0, n-1]$. The output node is always connected to node $N$.
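To make this parameterization concrete, the following sketch (hypothetical names, not the authors' implementation) encodes a policy as a list of ensemble nodes, where node $n$ draws its left and right inputs from indices $[0, n-1]$ so that every path traces back to the input node:

```python
import random
from dataclasses import dataclass

@dataclass
class EnsembleNode:
    """Ensemble node n of the policy DAG; its inputs come from nodes 0..n-1."""
    left_input: int     # index of the left input node (0 denotes the input node)
    right_input: int    # index of the right input node
    p_left: float       # sampling probability of the left edge; p_right = 1 - p_left
    aug_left: tuple     # augmentation quadruple (t, q, x1, x2) labeling the left edge
    aug_right: tuple    # augmentation quadruple labeling the right edge

def random_policy(num_nodes, augmentations):
    """Sample a random, valid policy graph with `num_nodes` ensemble nodes."""
    nodes = []
    for n in range(1, num_nodes + 1):
        nodes.append(EnsembleNode(
            left_input=random.randrange(n),    # any node with a smaller index
            right_input=random.randrange(n),
            p_left=random.random(),
            aug_left=random.choice(augmentations),
            aug_right=random.choice(augmentations),
        ))
    return nodes  # the output node is implicitly attached to node N = nodes[-1]
```

Here random initialization merely stands in for however a search algorithm chooses to populate the node fields; the search described later mutates exactly these quantities.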
The augmentation policy is applied to an input by stochastically back-propagating through the graph: starting at the output node, we traverse the graph backwards, at each ensemble node randomly choosing the left or the right edge according to its sampling probability. The path connecting the input node to the output node sampled this way represents an augmentation obtained by sequentially composing the augmentations encountered along the path, and the probability of a particular path being selected is the product of the sampling probabilities of its edges. This process is depicted in Figure 3.1
along with pseudo-code. Any AutoAugment [13] policy or
RandAugment policy [14] is representable by such a graph.
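Continuing the sketch above (and reusing its imports), the sampling procedure can be written as follows; `apply_augmentation` is an assumed helper that applies a single quadruple to an example and is sketched further below:

```python
def apply_policy(nodes, example):
    """Augment `example` by stochastically back-propagating through the graph.

    Starting from the output node (attached to node N), walk backwards,
    picking the left or right edge of each ensemble node according to its
    sampling probability, then compose the collected augmentations forward.
    """
    path = []
    current = len(nodes)              # 1-based index of node N
    while current != 0:               # stop once the input node is reached
        node = nodes[current - 1]
        if random.random() < node.p_left:
            path.append(node.aug_left)
            current = node.left_input
        else:
            path.append(node.aug_right)
            current = node.right_input
    # The path was collected output-to-input; compose augmentations in
    # forward order, from the input node towards the output node.
    for aug in reversed(path):
        example = apply_augmentation(example, aug)
    return example
```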
To make the search space uniform, we represent an augmentation by a quadruple $a = (t, q, x_1, x_2)$, where $t$ is a string denoting the type of augmentation, $q$ is the application probability (not to be confused with the sampling probability), and $x_1$ and $x_2$ are the strength parameters for the augmentation. The $x_i$ are taken to have 11 discrete integer values from 0 to 10. We employ 17 augmentations, which we list in Section 3.3.
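A minimal sketch of the assumed `apply_augmentation` helper used above, showing how a quadruple $(t, q, x_1, x_2)$ could be consumed; the `TRANSFORMS` registry is hypothetical and would hold the 17 augmentations of Section 3.3:

```python
# Hypothetical registry mapping each augmentation type string t to a callable
# (example, x1, x2) -> example; the 17 augmentations of Section 3.3 would be
# registered here.
TRANSFORMS = {}

def apply_augmentation(example, aug):
    """Apply one augmentation quadruple a = (t, q, x1, x2) to an example."""
    t, q, x1, x2 = aug
    if random.random() >= q:   # apply with application probability q ...
        return example         # ... otherwise pass the example through unchanged
    # x1 and x2 are discrete strengths in {0, ..., 10}; each transform maps
    # them onto its own parameter range (e.g. time/frequency mask sizes).
    return TRANSFORMS[t](example, x1, x2)
```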