humans to interpret their denoising capacity. Onoe and Durrett (2019) proposed an explicit denoising method that learns a filtering function and a relabeling function to denoise DS data and then trains an entity typing model on the denoised DS dataset. However, they only utilized a small-scale gold-labeled dataset to learn the filtering and relabeling functions. Moreover, their model does not capture the dependency between contexts and entity phrases.
In this paper, we aim to develop an explicit de-
noising method for distantly supervised ultra-fine
entity typing. Our framework mainly consists of
two modules: a noise modeling component and an
entity typing model. The noise model estimates the
unknown labeling noise distribution over input con-
texts and observed (noisy) type labels. However,
noise modeling is challenging because the noise
information in the DS data is often unavailable,
and noise can vary with different distant labeling
techniques. To model the noise, we perturb the labels of the small-scale gold-labeled dataset to mimic the noise in the DS data. Additionally, we apply L1-norm regularization on the large-scale DS data to encourage sparsity of the estimated labeling noise. Our noise model conditions on the input context sentence and its noisy labels to estimate the underlying noise, so that denoised labels can be recovered from the DS data by subtracting that noise. For the entity typing model, we adopt a bi-encoder architecture to match input contexts with type phrases, and we train it on both the gold-labeled and the denoised data. Finally, we design an iterative training procedure (Tanaka et al., 2018; Xie et al., 2020) that alternates between training the noise model and the entity typing model so that they enhance each other.
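To make this concrete, the following is a minimal sketch, under our own assumptions rather than the authors' released code, of how such a noise model could be trained: it predicts labeling noise from the context and the noisy labels, the gold set with perturbed labels supplies a direct supervision signal for the noise, and an L1 penalty encourages sparse noise estimates on the DS data. All module names, dimensions, and loss weights are illustrative.

```python
# A minimal sketch of the denoising idea (not the authors' released code):
# a noise model predicts per-type labeling noise from the context encoding and
# the noisy label vector; denoised labels are recovered by subtracting that
# noise, and an L1 penalty keeps the estimated noise on DS data sparse.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_TYPES = 10331   # size of the type vocabulary (UFET-scale; value illustrative)
HIDDEN = 768        # assumed context-encoder dimension


class NoiseModel(nn.Module):
    """Estimates labeling noise given a context encoding and its noisy labels."""

    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(HIDDEN + NUM_TYPES, HIDDEN),
            nn.ReLU(),
            nn.Linear(HIDDEN, NUM_TYPES),
            nn.Tanh(),  # noise in [-1, 1]: + = spurious label, - = missing label
        )

    def forward(self, ctx_emb, noisy_labels):
        return self.mlp(torch.cat([ctx_emb, noisy_labels], dim=-1))


def denoise(noise_model, ctx_emb, noisy_labels):
    """Recover (soft) denoised labels by subtracting the estimated noise."""
    noise = noise_model(ctx_emb, noisy_labels)
    return (noisy_labels - noise).clamp(0.0, 1.0), noise


noise_model = NoiseModel()

# (1) Gold data with artificially perturbed labels: the true noise is known.
ctx_gold = torch.randn(4, HIDDEN)                       # placeholder encodings
gold = (torch.rand(4, NUM_TYPES) < 0.001).float()       # placeholder gold labels
perturbed = (gold + (torch.rand_like(gold) < 0.001).float()).clamp(0, 1)
_, pred_noise = denoise(noise_model, ctx_gold, perturbed)
loss_gold = F.mse_loss(pred_noise, perturbed - gold)

# (2) Large-scale DS data: only the L1 sparsity regularizer applies.
ctx_ds = torch.randn(4, HIDDEN)
ds_labels = (torch.rand(4, NUM_TYPES) < 0.001).float()
_, ds_noise = denoise(noise_model, ctx_ds, ds_labels)
loss_ds = ds_noise.abs().mean()

(loss_gold + 0.1 * loss_ds).backward()                  # 0.1 is an assumed weight
```

In the full framework, the bi-encoder typing model would then be trained on the gold and denoised labels, and the noise model and typing model would be updated in alternation.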
We summarize our contributions as follows:
(i) We propose a denoising-enhanced ultra-fine entity typing model for the distantly supervised setting, consisting of a noise model and an entity typing model. Unlike previous denoising work (Onoe and Durrett, 2019), which filters out low-quality samples, our noise model directly measures the underlying labeling noise, regardless of the DS technique.
(ii) Whereas Onoe and Durrett (2019) learn a relabeling function to directly relabel samples, we model the labeling noise itself. (iii) We evaluate our model on the Ultra-Fine entity typing (UFET) and OntoNotes datasets, which are benchmarks for distantly supervised ultra-fine and fine-grained entity typing, respectively. Through detailed comparisons, analyses, and a case study, we show that our model effectively denoises the DS data and learns a superior entity typing model.
2 Related Works
2.1 Ultra-Fine Entity Typing
The ultra-fine entity typing task was first proposed
by Choi et al. (2018). They used a multi-task objective that divides labels into three bins (general, fine, and ultra-fine) and computes the loss for a bin only when it contains at least one positive label. To further reduce the distant supervision
noise, Xiong et al. (2019) introduces a graph prop-
agation layer to impose a label-relational bias on
entity typing models to implicitly capture type de-
pendencies. Onoe et al. (2021) uses box embeddings to capture latent type hierarchies, which are more robust to labeling noise than vector embeddings. Dai et al. (2021) proposes to obtain more weakly supervised training data by prompting language models for weak labels. Zhang et al. (2022) leverages retrieval augmentation to mitigate distant supervision noise.
Among previous works, Onoe and Durrett (2019) is the most similar to ours: a filtering function is used to discard useless instances, and a relabeling function is used to relabel an instance. Through filtering and relabeling, Onoe and Durrett (2019) explicitly denoise the distant supervision data. However, their denoising procedure is trained only on a small-scale gold-labeled dataset and ignores the large-scale data with distant supervision labels. In addition, our denoising method directly models the underlying label noise instead of simply discarding all samples with partially wrong labels.
2.2 Learning from Noisy Labeled Datasets
We briefly review the broad techniques for learning
from noisy labeled datasets. Traditionally, regularization techniques such as weight decay and dropout have been effective in mitigating the tendency of DNNs to fit noisy labels. In addition, a few studies achieve noise-robust classification using noise-tolerant loss functions, such as mean squared error and mean absolute error (Ghosh et al., 2017).
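As a minimal illustration (our own sketch, not code from the cited work), a noise-tolerant loss such as mean absolute error bounds each example's contribution, so mislabeled examples pull the model less than under cross-entropy; the function and tensor names below are placeholders.

```python
import torch
import torch.nn.functional as F

def mae_loss(logits, targets):
    # Mean absolute error between predicted probabilities and (possibly noisy)
    # 0/1 targets; its bounded per-example gradient makes it noise-tolerant.
    return (torch.sigmoid(logits) - targets).abs().mean()

def bce_loss(logits, targets):
    # Standard binary cross-entropy, which tends to overfit noisy labels.
    return F.binary_cross_entropy_with_logits(logits, targets)
```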
Recently, self-training (Xie et al., 2020) first uses labeled data to train a strong teacher model, then uses the teacher model to pseudo-label unlabeled data, and finally trains a student model jointly on the labeled and pseudo-labeled data. Furthermore, various