flexible variants, e.g., re-sampling only a tunable share of classes (Tepper et al., 2020) or interpolating between the (imbalanced) data distribution and an almost perfectly balanced distribution (Arivazhagan et al., 2019), can further improve results. Class-aware sampling (CAS, Shen et al., 2016), also referred to as class-balanced sampling, first chooses a class, and then an instance from this class. Performance-based re-sampling during training, following the idea of Pouyanfar et al. (2018), works well in multi-class text classification (Akhbardeh et al., 2021).
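The two-step CAS scheme is compact in code. The following minimal sketch (function and variable names are illustrative, not from the cited work) draws a class uniformly at random and then an instance from that class, so rare classes are sampled as often as frequent ones in expectation:

```python
import random
from collections import defaultdict

def class_aware_sampler(labels, num_samples, rng=random.Random(0)):
    """Yield dataset indices via class-aware sampling (CAS):
    first pick a class uniformly, then an instance of that class."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    classes = list(by_class)
    for _ in range(num_samples):
        c = rng.choice(classes)        # every class equally likely...
        yield rng.choice(by_class[c])  # ...regardless of its frequency

# Example: class "b" is rare but is drawn as often as "a" in expectation.
labels = ["a"] * 98 + ["b"] * 2
sample = list(class_aware_sampler(labels, 10))
```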
Issues in multi-label classification.
In multi-label classification, label dependencies between majority and minority classes complicate sampling approaches, as over-sampling an instance with a minority label may simultaneously amplify the majority class count (Charte et al., 2015; Huang et al., 2021). CAS also suffers from this issue, and additionally introduces within-class imbalance, as instances of one class are selected with different probabilities depending on the co-assigned labels (Wu et al., 2020). Effective sampling in such settings is still an open issue. Existing approaches monitor the class distributions during sampling (Charte et al., 2015) or assign instance-based sampling probabilities (Gupta et al., 2019b; Wu et al., 2020).
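As one illustration of instance-based sampling probabilities, the sketch below weights each multi-label instance by the mean inverse frequency of its labels; this particular weighting is an assumption made for exposition, not the exact formulation of Gupta et al. (2019b) or Wu et al. (2020):

```python
from collections import Counter

def instance_sampling_probs(label_sets):
    """Assign each multi-label instance a sampling probability from the
    inverse frequencies of its labels (illustrative scheme only)."""
    freq = Counter(l for labels in label_sets for l in labels)
    # Weight = mean inverse label frequency: instances carrying rare labels
    # are drawn more often, while frequent co-assigned majority labels
    # pull the weight back down.
    weights = [sum(1.0 / freq[l] for l in labels) / len(labels)
               for labels in label_sets]
    total = sum(weights)
    return [w / total for w in weights]

probs = instance_sampling_probs([{"news"}, {"news", "rare"}, {"rare"}])
```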
3.2 Data Augmentation
Increasing the amount of minority class data during corpus construction, e.g., by writing additional examples or selecting examples to be labeled using Active Learning, can mitigate the class imbalance problem to some extent (Cho et al., 2020; Ein-Dor et al., 2020). However, this is particularly laborious in naturally imbalanced settings, as it may require finding “the needle in the haystack,” or may lead to biased minority class examples, e.g., due to collection via keyword queries. Synthetically generating additional minority instances is thus a promising direction. In this section, we survey data augmentation methods that have been explicitly proposed to mitigate class imbalance and that have been evaluated in combination with DL.
Text augmentation generates new natural language instances of minority classes, ranging from simple string-based manipulations such as synonym replacements to Transformer-based generation. Easy Data Augmentation (EDA, Wei and Zou, 2019), which uses dictionary-based synonym replacements, random insertion, random swap, and random deletion, has been shown to work well in class-imbalanced settings (Jiang et al., 2021; Jang et al., 2021; Juuti et al., 2020).
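To make the four EDA operations concrete, here is a condensed sketch; the toy SYNONYMS dictionary stands in for the WordNet lookup of the original method, and all names and the single-operation interface are illustrative assumptions:

```python
import random

rng = random.Random(0)
SYNONYMS = {"quick": ["fast", "rapid"], "happy": ["glad"]}  # toy stand-in for WordNet

def eda(sentence, n_ops=1):
    """Apply random EDA operations: synonym replacement, random
    insertion, random swap, or random deletion (Wei and Zou, 2019)."""
    words = sentence.split()
    for _ in range(n_ops):
        op = rng.choice(["sr", "ri", "rs", "rd"])
        if op == "sr":  # synonym replacement
            cands = [i for i, w in enumerate(words) if w in SYNONYMS]
            if cands:
                i = rng.choice(cands)
                words[i] = rng.choice(SYNONYMS[words[i]])
        elif op == "ri":  # random insertion of a synonym
            cands = [w for w in words if w in SYNONYMS]
            if cands:
                words.insert(rng.randrange(len(words) + 1),
                             rng.choice(SYNONYMS[rng.choice(cands)]))
        elif op == "rs" and len(words) > 1:  # random swap
            i, j = rng.sample(range(len(words)), 2)
            words[i], words[j] = words[j], words[i]
        elif op == "rd" and len(words) > 1:  # random deletion
            del words[rng.randrange(len(words))]
    return " ".join(words)

print(eda("the quick brown fox is happy"))
```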
Juuti et al. (2020) generate new minority class instances for English binary text classification using EDA and embedding-based synonym replacements, and by adding a random majority class sentence to a minority class document. They also prompt the pretrained language model GPT-2 (Radford et al., 2019) with a minority class instance to generate new minority class samples. Tepper et al. (2020) evaluate generation with GPT-2 on English multi-class text classification datasets, coupled with a flexible balancing policy (see Sec. 3.1).
Similarly, Gaspers et al. (2020) combine machine-translation-based text augmentation with dataset balancing to build a multi-task model. Both the main and auxiliary tasks are German intent classification. Only the training data for the latter is balanced and enriched with synthetic minority instances. In a long-tailed multi-label setting, Zhang et al. (2022) learn an attention-based text augmentation that augments instances with text segments that are relevant to tail classes, leading to small improvements. In general, transferring methods such as EDA or backtranslation to multi-label settings is difficult (Zhang et al., 2022, 2020; Tang et al., 2020).
Hidden space augmentation generates new instance vectors that are not directly associated with a particular natural language string, leveraging the representations of real examples. Using representation-based augmentations to tackle class imbalance is not tied to DL. SMOTE (Chawla et al., 2002), which interpolates minority instances with randomly chosen examples from their K-nearest neighbours, is popular in traditional machine learning (Fernández et al., 2018), but leads to mixed results in DL-based NLP (Ek and Ghanimifard, 2019; Tran and Litman, 2021; Wei et al., 2022).
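SMOTE's interpolation step fits in a few lines. The sketch below, assuming dense instance vectors (e.g., sentence embeddings) and an illustrative function name, synthesizes a point on the segment between a minority instance and one of its k nearest minority-class neighbours:

```python
import numpy as np

def smote_sample(X_min, k=5, rng=np.random.default_rng(0)):
    """Synthesize one minority instance a la SMOTE (Chawla et al., 2002):
    interpolate a random minority point with one of its k nearest
    minority-class neighbours."""
    i = rng.integers(len(X_min))
    dists = np.linalg.norm(X_min - X_min[i], axis=1)
    neighbours = np.argsort(dists)[1:k + 1]  # skip the point itself
    j = rng.choice(neighbours)
    lam = rng.random()                       # position on the segment
    return X_min[i] + lam * (X_min[j] - X_min[i])

X_min = np.random.default_rng(1).normal(size=(20, 768))  # e.g., minority BERT vectors
synthetic = smote_sample(X_min)
```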
Inspired by CutMix (Yun et al., 2019), which cuts and pastes a single pixel region in an image, TextCut (Jiang et al., 2021) randomly replaces small parts of the BERT representation of one instance with those of another. In binary and multi-class text classification experiments, TextCut improves over non-augmented BERT and EDA.
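A minimal sketch of this cut-and-paste operation, assuming per-token hidden representations of shape (sequence length, hidden size); the cut width used here is an illustrative hyperparameter, not the paper's exact setting:

```python
import numpy as np

def text_cut(h_a, h_b, width=0.1, rng=np.random.default_rng(0)):
    """Sketch of TextCut (Jiang et al., 2021): overwrite a small random
    block of instance A's hidden representation with the corresponding
    block from instance B. Block size `width` is an assumed hyperparameter."""
    h_new = h_a.copy()
    seq_len = h_a.shape[0]
    cut = max(1, int(width * seq_len))        # number of token positions to replace
    start = rng.integers(0, seq_len - cut + 1)
    h_new[start:start + cut] = h_b[start:start + cut]
    return h_new

# Two BERT-style representations: (sequence length, hidden size)
rng = np.random.default_rng(1)
h_a, h_b = rng.normal(size=(128, 768)), rng.normal(size=(128, 768))
mixed = text_cut(h_a, h_b)
```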
Good-enough example extrapolation (GE3, Wei, 2021) and REPRINT (Wei et al., 2022) also operate in the original representation space. To synthesize a new minority instance, GE3 adds the vec-