weakly-supervised image segmentation, i.e., seed, expand and constrain (Kolesnikov and Lampert, 2016), we seed with relatively high-quality unigrams and bigrams in the texts, then expand them to extract the candidate spans as accurately as possible. Secondly, we cast span classification as textual
entailment to naturally incorporate the entity type
information. For example, to determine whether “J. K. Rowling” in “J. K. Rowling is a British author.” is a PERSON entity or a non-entity, we treat “J. K. Rowling is a British author.” as the premise, then construct “J. K. Rowling is a person.” and “J. K. Rowling is not an entity.” as hypotheses. In this way, span classification is converted into determining which hypothesis is true. Moreover, this conversion enlarges the training data, which is beneficial in few-shot settings.
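The conversion can be sketched as follows; the hypothesis templates, type inventory, and function name are illustrative assumptions rather than the paper's exact verbalizers.

```python
# Minimal sketch: turning span classification into textual entailment.
# Templates and entity types are illustrative assumptions; the paper's
# actual verbalizers may differ.

TYPE_HYPOTHESES = {
    "PERSON": "{span} is a person.",
    "ORG": "{span} is an organization.",
    "LOC": "{span} is a location.",
}
NONE_HYPOTHESIS = "{span} is not an entity."

def build_entailment_pairs(sentence, span):
    """Return one (premise, hypothesis) pair per entity type,
    plus a non-entity hypothesis for the candidate span."""
    pairs = [(sentence, tpl.format(span=span))
             for tpl in TYPE_HYPOTHESES.values()]
    pairs.append((sentence, NONE_HYPOTHESIS.format(span=span)))
    return pairs

for premise, hypothesis in build_entailment_pairs(
        "J. K. Rowling is a British author.", "J. K. Rowling"):
    print(f"{premise} => {hypothesis}")
```

Each candidate span thus yields multiple (premise, hypothesis) training examples, which is how the conversion enlarges the training data.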
In this paper, we propose SEE-Few, a novel
multi-task learning framework (Seed, Expand and
Entail) for Few-shot NER. The seeding and expanding modules are responsible for providing candidate spans that are as accurate as possible for the entailing module. Specifically, the seed selector chooses unigrams and bigrams as seeds based on metrics such as the Intersection over Foreground. The expanding module takes a seed and the window around it into account and expands the seed into a candidate span. Compared with enumerating all possible n-gram spans, seeding and expanding can significantly reduce the number of candidate spans and alleviate the impact of negative spans in the subsequent span classification stage. The
entailing module reformulates a span classification
task as a textual entailment task, leveraging contex-
tual clues and entity type information to determine
whether a candidate span is an entity and what
type of entity it is. All three modules share the same text encoder and are jointly learned. Experiments were conducted on four NER datasets under the training-from-scratch few-shot setting. Experimental results show that the proposed approach outperforms several state-of-the-art baselines.
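A rough sketch of the candidate generation step is given below; the window size, threshold, scoring interface, and the hand-written expansion rule are assumptions made for illustration, whereas in SEE-Few the seed selector and expander are learned modules that share the text encoder.

```python
# Rough sketch of seed-and-expand candidate generation. Window size,
# threshold, and the placeholder expansion rule are assumptions; the
# actual modules are neural and trained jointly with the encoder.

def intersection_over_foreground(seed, entity):
    """IoF of a seed (start, end) against a gold entity span: the
    fraction of the seed's tokens that fall inside the entity
    (a plausible training signal for the seed selector)."""
    overlap = max(0, min(seed[1], entity[1]) - max(seed[0], entity[0]))
    return overlap / (seed[1] - seed[0])

def generate_candidates(tokens, seed_scorer, window=2, threshold=0.5):
    """Score every unigram/bigram; keep high-scoring seeds and expand
    each one within a local window to form a candidate span."""
    candidates = []
    for start in range(len(tokens)):
        for end in (start + 1, start + 2):  # unigrams and bigrams
            if end > len(tokens) or seed_scorer(tokens, start, end) < threshold:
                continue
            # A learned expander would predict boundary offsets from the
            # window's context; here we only expose its search space.
            left = max(0, start - window)
            right = min(len(tokens), end + window)
            candidates.append((left, right))
    return candidates
```

Because only seeds that pass the score threshold are expanded, the candidate set stays far smaller than the full set of enumerated n-grams.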
The main contributions can be summarized as
follows:
• A novel multi-task learning framework (Seed, Expand and Entail), SEE-Few, is proposed for few-shot NER without using source domain data. Specifically, the seeding and expanding modules provide candidate spans that are as accurate as possible for the entailing module. The entailing module reformulates span classification as a textual entailment task, leveraging contextual clues and entity type information.
• Experiments were conducted on four NER datasets in the training-from-scratch few-shot setting. Experimental results show that the proposed approach outperforms the state-of-the-art baselines by significant margins.
2 Related Work
2.1 Few-shot NER
Few-shot NER aims at recognizing entities based
on only a few labeled instances from each category.
A few approaches have been proposed for few-
shot NER. Methods based on prototypical networks (Snell et al., 2017) require complex episode training (Fritzler et al., 2019; Hou et al., 2020). Yang
and Katiyar (2020) abandon the complex meta-
training and propose NNShot, a distance-based
method with a simple nearest neighbor classifier.
Huang et al. (2021) investigate three orthogonal
schemes to improve the model generalization abil-
ity for few-shot NER. TemplateNER (Cui et al.,
2021) enumerates all possible text spans in input
text as candidate spans and classifies each span
based on its corresponding template score. Ma
et al. (2021) propose a template-free method to re-
formulate NER tasks as language modeling (LM)
problems without any templates. Tong et al. (2021)
propose to mine the undefined classes from miscel-
laneous other-class words, which also benefits few-
shot NER. Ding et al. (2021) present Few-NERD, a
large-scale human-annotated few-shot NER dataset
to facilitate the research.
However, most of these studies follow the manner of episode training (Fritzler et al., 2019; Hou et al., 2020; Tong et al., 2021; Ding et al., 2021) or assume an available rich-resource source domain (Yang and Katiyar, 2020; Cui et al., 2021), which is in contrast to real-world application scenarios in which only very limited labeled data is available for training and validation (Ma et al., 2021). EntLM (Ma et al., 2021) is implemented in the training-from-scratch few-shot setting, but still
needs distant supervision datasets for label word
searching. The construction of distant supervi-
sion datasets requires additional expert knowledge.
Some works study generating NER datasets au-
tomatically to reduce labeling costs (Kim et al.,
2021; Li et al., 2021b). In this paper, we focus on
the few-shot setting without source domain data