
rules to estimate the probabilistic class distribution
of each data point. In this work, we use a label
model to integrate the weak labels given by the
rules as pseudo-labels during the training process
to obviate the need for manual labeling.
To select the samples most likely to be clean,
we adopt a small-loss selection strategy, a widely used method whose effectiveness has been verified in many settings (Jiang et al., 2018; Yu et al., 2019; Yao et al., 2020). Specifically, deep neural networks, which have a strong memorization ability (Wu et al., 2018; Wei et al., 2021), first memorize the labels of clean data and only later those of noisy data, under the assumption that clean data form the majority of a noisy dataset.
Data with small loss can thus be regarded as clean
examples with high probability. Inspired by this
approach, we propose probabilistic margin score
(PMS) as a criterion to judge whether data are clean.
Instead of directly using the confidence output by a model, we use a confidence margin, which yields better performance (Ye et al., 2020). We also conducted a comparative experiment on using the margin versus directly using the confidence, as described in Sec. 3.3.
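To illustrate the margin-based criterion, one plausible sketch of a per-sample score is the gap between the predicted probability of the pseudo-label and the highest probability among the remaining classes; the function name and exact definition below are assumptions for illustration, not the paper's exact formula:

```python
import numpy as np

def probabilistic_margin_score(probs: np.ndarray, pseudo_label: int) -> float:
    """Margin between the pseudo-label's probability and the best competing class.

    `probs` is the model's class-probability vector for one sample; a larger
    (positive) margin suggests the pseudo-label is more likely to be clean.
    """
    competitors = np.delete(probs, pseudo_label)
    return float(probs[pseudo_label] - competitors.max())

# A confident prediction that agrees with the pseudo-label gives a large
# positive margin; a disagreeing prediction gives a negative one.
print(probabilistic_margin_score(np.array([0.7, 0.2, 0.1]), 0))  # ~0.5
print(probabilistic_margin_score(np.array([0.2, 0.7, 0.1]), 0))  # ~-0.5
```

Unlike raw confidence, the margin also penalizes samples where a competing class is nearly as probable as the pseudo-label.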
Sample selection based on weak labels can lead to severe class imbalance. Consequently, models trained on such imbalanced subsets tend to exhibit superior performance on the majority classes but inferior performance on the minority classes (Cui et al., 2019). A reweighted loss function can partially address this problem; however, performance is still limited by noisy labels: data with majority-class features may be incorrectly annotated as minority-class data, which misleads the
training process. Therefore, we propose a sample
selection strategy based on class-wise ranking (CR)
to address imbalanced data. Using this strategy, we
can select relatively balanced sample batches for
training and avoid the strong influence of the ma-
jority class.
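One plausible reading of class-wise ranking is to rank candidates by their score separately within each pseudo-class and take the top k from every class, so that the selected batch is roughly balanced. The sketch below is an illustrative assumption of this mechanism, not the paper's exact algorithm:

```python
from collections import defaultdict

def class_wise_ranking(scores, pseudo_labels, per_class_k):
    """Select the `per_class_k` highest-scoring sample indices from each class.

    `scores[i]` is e.g. the PMS of sample i and `pseudo_labels[i]` its
    aggregated pseudo-label; the result is a roughly class-balanced subset.
    """
    by_class = defaultdict(list)
    for idx, (s, y) in enumerate(zip(scores, pseudo_labels)):
        by_class[y].append((s, idx))
    selected = []
    for items in by_class.values():
        items.sort(reverse=True)               # highest score first
        selected.extend(idx for _, idx in items[:per_class_k])
    return sorted(selected)

# Two classes, two picks each: top-2 samples per class by score.
print(class_wise_ranking([0.9, 0.1, 0.8, 0.7, 0.2, 0.6],
                         [0,   0,   1,   0,   1,   1],
                         per_class_k=2))  # [0, 2, 3, 5]
```

Because the quota is applied per class rather than globally, a minority class contributes samples to every batch even when its scores are lower overall.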
To further exploit the expertise of labeling rules,
we also propose another sample selection strategy
called rule-aware ranking (RR). In the WS paradigm, we use the aggregated labels as pseudo-labels and discard the individual weak labels. However, the annotations generated by the rules are likely to contain a considerable amount of valid information; for example, some rules yield a high proportion of correct results. The higher the PMS, the more likely the labeling result of the rules is to be close to the ground truth. Using this strategy, we can select batches of clean data for training and mitigate the influence of noise.
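One way rule quality and PMS might interact, sketched under loud assumptions: estimate each rule's reliability as the mean PMS of the samples it labels, then rank samples by the reliability of the rules that cover them. The names and the exact weighting here are guesses for illustration, not the paper's definition of RR:

```python
import numpy as np

def rule_aware_ranking(weak_labels, pms, top_k):
    """Rank samples by the mean reliability of the rules that fire on them.

    `weak_labels[i][j]` is rule j's output for sample i (-1 means abstain);
    `pms[i]` is sample i's probabilistic margin score.
    """
    weak_labels = np.asarray(weak_labels)
    pms = np.asarray(pms, dtype=float)
    fired = weak_labels != -1                     # which rules fired where
    # A rule's reliability: mean PMS over the samples it covers.
    counts = fired.sum(axis=0)
    rule_rel = (fired * pms[:, None]).sum(axis=0) / np.maximum(counts, 1)
    # A sample's score: mean reliability of its firing rules (-inf if none).
    n_fired = fired.sum(axis=1)
    sample_score = np.where(n_fired > 0,
                            (fired * rule_rel).sum(axis=1) / np.maximum(n_fired, 1),
                            -np.inf)
    return [int(i) for i in np.argsort(-sample_score)[:top_k]]

# Sample 0 is covered only by the reliable rule 0, so it ranks first.
print(rule_aware_ranking([[1, -1], [-1, 0], [1, 0]],
                         [0.9, 0.1, 0.5], top_k=2))  # [0, 2]
```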
The primary contributions of this work are summarized as follows. (1) We propose a general, model-agnostic weakly supervised learning framework called ARS2 for imbalanced datasets. (2) We propose two reliable adaptive sampling strategies to address data imbalance. (3) We present experimental results on four benchmark datasets demonstrating that ARS2 improves on existing imbalanced learning and weakly supervised learning methods by 2%-57.8% in terms of F1-score.
2 Weakly Supervised Class-imbalanced
Text Classification
2.1 Problem Formulation
In this work, we study class-imbalanced text classification in a setting with weak supervision. Specifically, we consider an unlabeled dataset $\mathcal{D}$ consisting of $N$ documents, each of which is denoted by $x_i \in \mathcal{X}$. For each document $x_i$, the corresponding label $y_i \in \mathcal{Y} = \{1, 2, \dots, C\}$ is unknown to us, whereas the class prior $p(y)$ is given and highly imbalanced. Our goal is to learn a parameterized function $f(\cdot;\theta) : \mathcal{X} \to \Delta^C$¹ which outputs the class probability $p(y \mid x)$ and can be used to classify documents during inference.
To address the lack of ground-truth training labels, we adopt the two-stage weak supervision paradigm (Ratner et al., 2016b; Zhang et al., 2021). In particular, we rely on $k$ user-provided heuristic rules $\{r_i\}_{i \in \{1, \dots, k\}}$ to provide weak labels. Each rule $r_i$ is associated with a particular label $y_{r_i} \in \mathcal{Y}$, and we denote by $l_i$ the output of the rule $r_i$. It either assigns the associated label ($l_i = y_{r_i}$) to a given document or abstains ($l_i = -1$) on this example. Note that the user-provided rules can be noisy and conflict with one another. For a document $x$, we concatenate the weak labels output by the $k$ rules, $l_1, \dots, l_k$, as $\boldsymbol{l}_x$. Throughout this work, we apply the weak labels output by the heuristic rules to train a text classifier.
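As a concrete illustration of this formulation, the sketch below applies a few heuristic rules to a document and concatenates their outputs into the weak-label vector l_x; the keyword rules themselves are invented for the example:

```python
# Each rule maps a document to its associated label, or -1 to abstain.
# These keyword rules are invented purely for illustration.
RULES = [
    lambda doc: 0 if "refund" in doc else -1,    # r1 -> class 0
    lambda doc: 1 if "delivery" in doc else -1,  # r2 -> class 1
    lambda doc: 0 if "broken" in doc else -1,    # r3 -> class 0
]

def weak_label_vector(doc: str):
    """Concatenate the k rule outputs into the weak-label vector l_x."""
    return [rule(doc) for rule in RULES]

print(weak_label_vector("the item arrived broken, please refund me"))  # [0, -1, 0]
```

Note that rules may all abstain on a document, leaving it without any weak label, and that two firing rules can disagree; resolving such conflicts is the job of the label model described next.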
2.2 Aggregation of Weak Labels
Under the weak supervision paradigm, label models aggregate the weak labels into pseudo-labels, which are in turn used to train the desired end model in the next stage. Existing label models include Majority Voting (MV), Probabilistic Graphical Models
¹ $\Delta^C$ is a $C$-dimensional simplex.