Improving Chinese Named Entity Recognition by Search Engine
Augmentation
Qinghua Mao and Kui Meng
Shanghai Jiao Tong University
{mmmm2018,mengkui}@sjtu.edu.cn
Jiatong Li
The University of Melbourne
jiatongl3@student.unimelb.edu.au
Abstract
Compared with English, Chinese suffers from more grammatical ambiguities, such as fuzzy word boundaries and polysemous words. As a result, contextual information alone is not sufficient to support Chinese named entity recognition (NER), especially for rare and emerging named entities. Semantic augmentation using external knowledge is a potential way to alleviate this problem, but how to obtain and leverage external knowledge for the NER task remains a challenge. In this paper, we propose a neural approach that performs semantic augmentation for Chinese NER using external knowledge from a search engine. In particular, a multi-channel semantic fusion model is adopted to generate augmented input representations by aggregating external related texts retrieved from the search engine. Experiments on four NER datasets covering both formal and social media language contexts show the superiority of our model and further prove the effectiveness of our approach.
1 Introduction
Unlike English, Chinese requires word segmentation and suffers from more polysemous words and grammatical ambiguities. When contextual information is limited, external knowledge can be leveraged to support entity disambiguation, which is critical for improving Chinese NER, especially for rare and emerging named entities.
Apart from lexical information (Gui et al., 2019), other external sources of information have been leveraged to perform semantic augmentation for NER, such as external syntactic features (Li et al., 2020a), character radical features (Xu et al., 2019), and domain-specific knowledge (Zafarian and Asghari, 2019). However, extracting this information takes extra effort, and most of it is domain-specific. A search engine is a straightforward way to retrieve open-domain external knowledge, which can serve as evidence for recognizing ambiguous named entities. A motivating example is shown in Figure 1.
Figure 1: A motivating example of recognizing new entities using external related texts retrieved from the search engine. The original input is "失去懂王的日子让我索然无味" ("Life without the king who knows everything makes me dull"), in which "懂王" ("the king who knows everything", "Dong Wang") is an unconventional named entity referring to Donald J. Trump. The retrieved texts explain that "懂王" is an internet slang term for the former U.S. president.
In this paper, we propose to improve Chinese NER by semantic augmentation through a search engine. Inspired by Fusion-in-Decoder (Izacard and Grave, 2021), we propose a multi-channel semantic fusion NER model that leverages external knowledge to augment the contextual information of the original input. Given external related texts retrieved from the search engine, our model first adopts a multi-channel BERT encoder to encode each text independently. An attention fusion layer then incorporates the external knowledge into the original input representation. Finally, the fused semantic representation is fed into a CRF layer for decoding.

We also implement an external related texts generation module to optimize the retrieval results from the search engine. TextRank (Mihalcea and Tarau, 2004) and BM25 (Robertson and Zaragoza, 2009) are utilized to generate external related texts that are semantically relevant to the original input sentence.
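The paper does not give implementation details for the ranking step; the following is a minimal, dependency-free sketch of BM25 scoring over retrieved snippets, assuming character-level tokenization for Chinese. The `bm25_scores` helper and the example snippets are hypothetical illustrations, not the authors' code.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each candidate document against the query with Okapi BM25."""
    N = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / N
    # document frequency of each query term across the candidates
    df = {t: sum(1 for d in docs_tokens if t in d) for t in set(query_tokens)}
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        s = 0.0
        for t in query_tokens:
            if tf[t] == 0:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

# Rank retrieved snippets against the input (split into characters,
# a common choice for Chinese text without word segmentation)
query = list("懂王特朗普")
snippets = [list("懂王是网络流行语指特朗普"), list("今天天气不错")]
scores = bm25_scores(query, snippets)
ranked = sorted(range(len(snippets)), key=lambda i: -scores[i])
```

In this sketch the top-ranked snippets would be kept as the K external related texts; a TextRank keyword pass over the input could likewise be used to shorten the query before retrieval.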
The experimental results on the generic and social media domains show the superiority of our approach, demonstrating that search engine augmentation can effectively improve Chinese NER, especially in the social media domain.

arXiv:2210.12662v1 [cs.CL] 23 Oct 2022

Figure 2: The multi-channel semantic fusion NER model. A multi-channel BERT encodes each text independently. The fusion layer generates fused semantic representations based on the attention mechanism, which are fed into the CRF layer for named entity prediction.
2 Model
The proposed approach can be described in two
steps. Given an input sentence, external related
texts are retrieved from a search engine. The origi-
nal input sentence, along with external related texts,
is fed into the multi-channel semantic fusion NER
model to generate fused representations which ag-
gregates external knowledge obtained from the
search engine.
2.1 Multi-channel Semantic Fusion
We view the NER task as a sequence labeling problem. Our multi-channel semantic fusion NER model is shown in Figure 2, with BERT-CRF (Souza et al., 2019) serving as the backbone structure.
Given the original input x and K external related texts X̃ = {x̃_1, x̃_2, ..., x̃_K}, the multi-channel BERT encoder is utilized to encode each text independently, from which the original input embedding H_x and the external embedding H_external are obtained:

    [H_x, H_external] = BERT([x, X̃])    (1)

where H_external = {h_x̃1, h_x̃2, ..., h_x̃K}.
Processing texts independently with a multi-channel encoder means that the computation time grows only linearly with the number of texts, which makes the model more extensible. Meanwhile, the contextual information of each channel remains independent, which facilitates the subsequent semantic fusion.
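As a back-of-the-envelope illustration (not from the paper): self-attention cost grows roughly quadratically with sequence length, so encoding K texts of length L in separate channels costs about K·L² score computations, versus (K·L)² if all texts were concatenated into one sequence.

```python
# Toy cost model: self-attention over a sequence of length L computes ~L**2
# pairwise scores. The constants below (K external texts of L tokens each)
# are illustrative, not taken from the paper.
def independent_cost(K, L):
    """Each of the K texts is encoded in its own channel."""
    return K * L ** 2

def joint_cost(K, L):
    """All K texts concatenated into a single sequence."""
    return (K * L) ** 2

K, L = 4, 128
ind, joint = independent_cost(K, L), joint_cost(K, L)
# joint encoding is K times more expensive than per-channel encoding
```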
We feed the input embedding and the external embedding into the attention fusion layer to generate the fused semantic representation, which is finally fed into the CRF layer for decoding. In particular, for the input embedding H_x, we compute attention scores over the external embedding H_external to generate the context embedding H_context, which fuses external knowledge according to its semantic relevance to the original input. The fused semantic representation H_fusion is acquired by calculating the weighted sum of the input embedding and the context embedding, whose weights are set to a fusion factor p and 1 − p respectively. A token-level illustration is shown in Figure 3.

    H_context = Attention(H_x, H_external)    (2)

    H_fusion = p × H_x + (1 − p) × H_context    (3)

Figure 3: A token-level illustration of semantic fusion, in which the input "懂王将归来" attends over the external text "懂王特朗普" to produce the context embedding, mixed with the input embedding by the fusion factor p.
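Equations (2) and (3) can be sketched in plain Python as follows. This is a toy, dependency-free illustration that assumes dot-product attention and a flattened list of external token embeddings; the paper does not specify the exact attention variant, so `fuse` is a hypothetical helper.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(v - m) for v in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fuse(H_x, H_ext, p=0.5):
    """Eq. (2)-(3): per-token attention over external embeddings,
    then a weighted sum controlled by the fusion factor p."""
    H_fusion = []
    for h in H_x:                                     # one query per input token
        scores = softmax([dot(h, e) for e in H_ext])  # attention scores
        h_ctx = [sum(w * e[i] for w, e in zip(scores, H_ext))
                 for i in range(len(h))]              # context embedding, Eq. (2)
        H_fusion.append([p * a + (1 - p) * b
                         for a, b in zip(h, h_ctx)])  # fused representation, Eq. (3)
    return H_fusion

H_x = [[1.0, 0.0], [0.0, 1.0]]     # toy input embeddings (2 tokens, dim 2)
H_ext = [[1.0, 0.0], [0.5, 0.5]]   # toy external token embeddings
H_fusion = fuse(H_x, H_ext, p=0.7)
```

Note that setting p = 1 recovers the original input embedding unchanged, matching the design intent that the input is given priority and external knowledge is only supplementary.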
Three points are taken into account when designing the fusion layer. First, sequence dependency should be preserved, which is very important for NER. Second, the relation between the original input and the external contexts should be respected in the semantic fusion, i.e., the former is given priority and external knowledge serves only as supplementary evidence. Third, not all external related texts are necessary for semantic augmentation; we should focus on the parts that help to accurately identify the named entity.