Exploring the Value of Pre-trained Language Models
for Clinical Named Entity Recognition
Samuel Belkadi*, Lifeng Han*, Yuping Wu, and Goran Nenadic
Department of Computer Science, The University of Manchester, UK
*co-first authors
samuel.belkadi@student.manchester.ac.uk
lifeng.han, yuping.wu, g.nenadic@manchester.ac.uk
Abstract
The practice of fine-tuning Pre-trained Language Models (PLMs), built from general or domain-specific data, for a specific task with limited resources has gained popularity within the field of natural language processing (NLP). In this work, we revisit the assumption underlying this practice and carry out an investigation in clinical NLP, specifically Named Entity Recognition on drugs and their related attributes. We compare Transformer models that are trained from scratch to fine-tuned BERT-based LLMs, namely BERT, BioBERT, and ClinicalBERT. Furthermore, we examine the impact of an additional CRF layer on such models to encourage contextual learning. We use the n2c2-2018 shared task data for model development and evaluation. The experimental outcomes show that: 1) CRF layers improved all language models; 2) under BIO-strict span-level evaluation using the macro-average F1 score, although the fine-tuned LLMs achieved 0.83+ scores, the TransformerCRF model trained from scratch achieved 0.78+, demonstrating comparable performance at a much lower cost, e.g. with 39.80% fewer training parameters; 3) under BIO-strict span-level evaluation using the weighted-average F1 score, ClinicalBERT-CRF, BERT-CRF, and TransformerCRF exhibited smaller score differences, reaching 97.59%, 97.44%, and 96.84%, respectively; 4) applying efficient training by down-sampling for a better data distribution further reduced the training cost and the need for data, while maintaining similar scores, i.e. around 0.02 points lower than using the full dataset. Our models will be hosted at https://github.com/HECTA-UoM/TransformerCRF.
1 Introduction
Fine-tuning Pre-trained Language Models (PLMs) has demonstrated state-of-the-art abilities in solving natural language processing tasks, including text mining (Zhang et al., 2021), named entity recognition (Dernoncourt et al., 2017), reading comprehension (Sun et al., 2020), machine translation (Vaswani et al., 2017; Devlin et al., 2019), and summarisation (Gokhan et al., 2021; Wu et al., 2022). Domain applications of PLMs have spanned a much wider variety, including financial, legal, and biomedical texts, in addition to the traditional news and social media domains. For instance, experimental work on BioBERT (Lee et al., 2019) and BioMedBERT (Chakraborty et al., 2020) showcased high evaluation scores by exploiting BERT's (Devlin et al., 2019) structure to train on biomedical data. Additionally, fine-tuned SciFive, BioGPT, and BART models produced reasonable experimental outputs on biomedical abstract simplification tasks (Li et al., 2023).
However, ongoing investigations try to understand the extent to which fine-tuning PLMs improves performance over language models trained from scratch on domain-specific tasks (Gu et al., 2021). Researchers often assume that fine-tuning is particularly helpful when dealing with tasks that have limited available data, where PLMs can leverage additional knowledge acquired from extensive out-of-domain or domain-related data. Therefore, an important question arises: given a domain-specific task, how limited should the available data be for mixed-domain pre-training to be considered beneficial? Surprisingly, no previous studies have provided statistics in this regard.
In this paper, our focus is on clinical domain text mining, and our objective is to examine the aforementioned hypothesis. Specifically, we aim to determine whether PLMs outperform models trained from scratch when given access to limited data in a constrained setting, and to what extent this improvement occurs.
In comparison to other domains, clinical text mining (CTM) is still considered a relatively new task for PLM applications, as CTM is well known for data-scarcity issues caused by the small amount of human-annotated corpora and by privacy concerns. In this work, we fine-tune PLMs from the general domain (BERT), the biomedical domain (BioBERT), and the clinical domain (ClinicalBERT), examining how well they perform on a clinical information extraction task, namely the recognition of drugs and drug-related attributes, using the n2c2-2018 shared task data via adaptation and fine-tuning. We then compare their results with those of a lightweight Transformer model trained from scratch, and further investigate the impact of an additional CRF layer on the deployed models.
Section 2 gives more details on related work, Section 3 introduces the methodologies for our investigation, Section 4 describes our data pre-processing and experimental setups, Section 5 presents the evaluation results and ablation studies, and Section 6 further discusses data-constrained training, looking back at the n2c2-2018 shared tasks; finally, Section 7 concludes this paper and suggests directions for future work. Readers can refer to the Appendix for more details on the experimental analysis and relevant findings.
2 Related Work
The integration of pre-trained language models into applications within the biomedical and clinical domains has emerged as a prominent trend in recent years. A significant contribution to this field is BioBERT (Lee et al., 2019), which was among the first to explore the advantages of training a BERT-based model on domain-specific data, i.e. biomedical text. BioBERT demonstrated that training BERT on PubMed abstracts and PubMed Central (PMC) full-text articles resulted in superior performance on Named Entity Recognition (NER) and Relation Extraction (RE) tasks within the biomedical domain.
However, since BioBERT was pre-trained on general-domain data such as Wikipedia and BooksCorpus (Zhu et al., 2015) and then continually trained on biomedical data, PubMedBERT (Gu et al., 2021) further examined the advantages of training a model from scratch solely on biomedical data, employing the same PubMed data as BioBERT to avoid the influence of mixed domains. This choice was motivated by the observation that word distributions from different domains are represented differently in their respective vocabularies. Furthermore, PubMedBERT created a new benchmark dataset named BLURB, covering more tasks than BioBERT and including entity types such as disease, drug, gene, organ, and cell.
PubMedBERT and BioBERT both focused on biomedical knowledge, leaving other closely related domains, such as the clinical one, for future exploration. Subsequently, Alsentzer et al. (2019) demonstrated that ClinicalBERT, trained on generic clinical text and discharge summaries, exhibited superior performance on medical language inference (i2b2-2010 and 2012) and de-identification tasks (i2b2-2006 and 2014). Similarly, Huang et al. (2019) found that a ClinicalBERT model trained on clinical notes achieved improved predictive performance for hospital readmission after fine-tuning on this specific task.
In our work on the clinical domain, we use the n2c2-2018 shared task corpus, which provides electronic health records (EHRs) as semi-structured letters: their headings specify drug names, patient names, doses, relations, etc., while the body describes the diagnoses and treatment as free text. We aim to examine how fine-tuned PLMs perform against domain-specific Transformers trained from scratch on biomedical and clinical text mining.
Regarding the use of Transformer models for text mining, Wu et al. (2021) implemented the Transformer structure with an adaptation layer for information and communication technology (ICT) entity extraction. Al-Qurishi and Souissi (2021) proposed adding a CRF layer on top of the BERT model to carry out Arabic NER on mixed-domain data, such as news and magazines. Yan et al. (2019) demonstrated that the Encoder-only Transformer could improve previous results on traditional NER tasks in comparison to BiLSTMs. Other related works include Zhang and Wang (2019), Gan et al. (2021), Zheng et al. (2021), and Wang and Su (2022), which applied Transformer and CRF models to spoken language understanding, Chinese NER, power-meter NER, and forest disease texts, respectively.
3 Methodology and Experimental Designs
Figure 1 displays the design of our investigation, which includes the pre-trained LLMs BERT (Devlin et al., 2019), BioBERT (Lee et al., 2019), and ClinicalBERT (Alsentzer et al., 2019), in addition to an Encoder-only Transformer (Vaswani et al., 2017) implementing the “distilbert-base-cased” structure and trained from scratch.
Figure 1: Model Designs upon Investigations
The first step is to adapt these models to Named Entity Recognition by adding an Adaptation (or Classification) layer, resulting in the following models: BERT-Apt, BioBERT-Apt, ClinicalBERT-Apt, and Transformer-Apt. This adaptation layer predicts a probability distribution over all labels for each token independently.
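Concretely, such an adaptation layer can be realised as a linear classification head over the encoder's final hidden states. The following is a minimal sketch, assuming PyTorch and the HuggingFace transformers library; the class name, dropout rate, and other details are illustrative choices rather than the released implementation.

```python
import torch.nn as nn
from transformers import AutoModel


class TokenClassifierApt(nn.Module):
    """PLM encoder plus an adaptation (classification) layer that scores each token independently.

    Hypothetical sketch: names and hyper-parameters are ours, not the paper's released code.
    """

    def __init__(self, model_name: str, num_labels: int):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)  # e.g. "bert-base-cased"
        hidden_size = self.encoder.config.hidden_size
        self.dropout = nn.Dropout(0.1)                         # illustrative value
        self.adaptation = nn.Linear(hidden_size, num_labels)   # per-token label scores

    def forward(self, input_ids, attention_mask):
        hidden_states = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state                                    # (batch, seq_len, hidden)
        logits = self.adaptation(self.dropout(hidden_states))  # (batch, seq_len, num_labels)
        return logits  # a softmax / cross-entropy loss is applied per token outside the model
```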
We then compare the results of the above models with those of the same models augmented with an additional Conditional Random Field (CRF) layer (Lafferty et al., 2001), obtaining the BERT-CRF, BioBERT-CRF, ClinicalBERT-CRF, and Transformer-CRF models. Instead of predicting each label in the sequence independently, the CRF layer takes the neighbouring tokens and their corresponding labels into account when predicting the label of the token under study.
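For illustration, a CRF head of this kind combines per-token emission scores with learned label-transition scores and decodes the whole label sequence jointly. The sketch below assumes the open-source pytorch-crf package; the class name and interface are ours, and the paper's Transformer-CRF implementation may differ in detail.

```python
import torch.nn as nn
from torchcrf import CRF  # pytorch-crf package (assumed here; any linear-chain CRF would do)


class CRFHead(nn.Module):
    """Hypothetical CRF head over encoder hidden states: emissions + label-transition scores."""

    def __init__(self, hidden_size: int, num_labels: int):
        super().__init__()
        self.emission = nn.Linear(hidden_size, num_labels)  # token-level (emission) scores
        self.crf = CRF(num_labels, batch_first=True)        # learns label-transition scores

    def loss(self, hidden_states, labels, mask):
        # Negative log-likelihood of the gold label sequence under the CRF
        emissions = self.emission(hidden_states)
        return -self.crf(emissions, labels, mask=mask, reduction='mean')

    def decode(self, hidden_states, mask):
        # Viterbi decoding: the most probable label sequence given emissions and transitions
        emissions = self.emission(hidden_states)
        return self.crf.decode(emissions, mask=mask)
```

At training time the negative log-likelihood of the gold label sequence is minimised; at inference time Viterbi decoding returns the jointly most probable label sequence rather than per-token argmax labels.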
4 Data Pre-processing and Experimental Setups
In this section, we introduce the n2c2-2018 corpus we utilise for model training and evaluation, as well as the model optimisation strategies, efficient training, and evaluation metrics.
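For reference, the BIO-strict span-level macro- and weighted-average F1 scores quoted in the abstract can be computed roughly as sketched below; this assumes the open-source seqeval library and uses toy label sequences, not the paper's actual evaluation script or entity inventory.

```python
from seqeval.metrics import f1_score
from seqeval.scheme import IOB2

# Toy gold and predicted BIO label sequences for two example sentences
y_true = [['B-Drug', 'I-Drug', 'O', 'B-Dosage'], ['O', 'B-Drug', 'O']]
y_pred = [['B-Drug', 'I-Drug', 'O', 'O'],        ['O', 'B-Drug', 'O']]

# Strict span-level matching: a predicted entity counts only if both its
# boundaries and its type match the gold annotation exactly.
macro_f1 = f1_score(y_true, y_pred, average='macro', mode='strict', scheme=IOB2)
weighted_f1 = f1_score(y_true, y_pred, average='weighted', mode='strict', scheme=IOB2)
print(f"macro F1: {macro_f1:.4f}  weighted F1: {weighted_f1:.4f}")
```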
4.1 Corpus and Model Setting
Regarding the dataset, we utilise the standard n2c2-2018 shared task data from Track-2 (Henry et al., 2020): Adverse Drug Events and Medication Extraction in Electronic Health Records (EHRs) [1]. We note that the World Health Organisation (WHO) [2] defines an ADE as “an injury resulting from medical intervention related to a drug”, while the Patient Safety Network (PSNet) defines it as “harm experienced by a patient as a result of exposure to a medication” [3]. The aim of this task is to investigate whether “NLP systems can automatically discover drug-to-adverse event relations in clinical narratives”. The three sub-tasks under this track are Concepts, Relations, and End-to-End. Among these, the first task is to identify drug names, dosage, duration, and other entities; the second task is to identify relations of drugs with adverse drug events (ADEs) and other entities, given gold-standard entities; finally, the third task is identical to the second one, but involves entities that have been predicted by systems. In total, this track provides 505 annotated files of discharge summaries from the Medical Information Mart for Intensive Care III (MIMIC-III) clinical care database (Johnson et al., 2016), for which annotation was carried out by four physician as-
[1] https://portal.dbmi.hms.harvard.edu/projects/n2c2-2018-t2/
[2] https://www.who.int/
[3] https://psnet.ahrq.gov/primer/medication-errors-and-adverse-drug-events