in the successful application of bottleneck adapters to the cancer identification task, which, to the best of our knowledge, has not been explored before. We compare this method with multiple other strong baselines and conduct experiments and analyses to evaluate the different approaches. The systems developed in this study, and those that will follow in related future work, will be added to the data curation system of a biomedical database with the aim of enabling automatic processing of clinical notes in real EHR data.
2 Pre-Trained Transformers and
Fine-Tuning
In recent years, the Transformer architecture (Vaswani et al., 2017) and large language models (LMs) have become the staple baseline for many NLP tasks. The conventional paradigm is to first pre-train an LM on a large corpus of general text (e.g., Wikipedia) with a pre-training objective such as masked or causal language modeling, and then fine-tune the LM on downstream tasks.
For our task, we focus on Transformers pre-trained with the Masked Language Modeling (MLM) objective. In MLM, a portion of the input text is masked out and the model learns to reconstruct the masked tokens from the available context. The most widely used model pre-trained with MLM is BERT (Devlin et al., 2019).
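For illustration only, the following minimal Python sketch shows how a model pre-trained with the MLM objective reconstructs a masked token from its context. The Hugging Face transformers API, the bert-base-uncased checkpoint, and the example sentence are assumptions made purely for this sketch and are not part of the systems developed in this study.

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Load a generic BERT checkpoint pre-trained with the MLM objective.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Mask one token and let the model reconstruct it from the surrounding context.
text = f"The patient was diagnosed with {tokenizer.mask_token} cancer."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Pick the most probable token at the masked position.
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_pos].argmax(dim=-1)
print(tokenizer.decode(predicted_id))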
Despite BERT's promising results on many downstream NLP tasks, it has been shown that large LMs pre-trained on generic text do not always perform well in specialized domains such as biomedical tasks (Lee et al., 2020; Gururangan et al., 2020). The standard approach, therefore, is to pre-train models on corpora related to the target domain. BioBERT (Lee et al., 2020) is an example of an LM trained on specialized data: it is trained on a large corpus of general and biomedical texts, making it a strong model for biomedical text mining.
2.1 Efficient Fine-Tuning Methods
The benefits of fine-tuning large LMs for downstream applications are offset by a significant computational cost. Some LMs, for example, contain more than 100 billion parameters, making their fine-tuning costly. Furthermore, complete fine-tuning may be ineffective when the training data are scarce or differ from the domain the model was originally trained on, which might result in catastrophic forgetting.
In response to these limitations, more efficient fine-tuning approaches have been developed, among which prompt tuning (Section 2.3) and bottleneck adapters (Section 2.2) are two of the most effective and well-known.
Bottleneck Adapters (BAs) (Houlsby et al., 2019; Pfeiffer et al., 2021; Rücklé et al., 2020; Pfeiffer et al., 2020) are Multi-Layer Perceptron (MLP) blocks composed of a down-projection dense layer, an activation function, and an up-projection dense layer with a residual connection. These blocks are inserted between the frozen attention and feed-forward blocks of a pre-trained LM, and only these adapter modules are updated during fine-tuning. This method has proven effective in terms of both computational and parameter efficiency.
Houlsby et al. (2019) showed that by training only around 3% of the parameters, BERT with adapters can achieve results competitive with complete fine-tuning. Adapter tuning can be expressed by the equation below, where $X_i$ is the output of the frozen attention or MLP component of the $i$-th layer of the pre-trained LM:
$$O_i = f_{\text{up}}\big(\text{Activation}(f_{\text{down}}(X_i))\big) + X_i \quad (1)$$
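To make Equation (1) concrete, the following is a minimal PyTorch sketch of a bottleneck adapter block. The hidden size of 768, the bottleneck dimension of 64, and the GELU activation are illustrative assumptions, not the exact configuration used in this study.

import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Bottleneck adapter as in Eq. (1): down-project, activate,
    up-project, and add a residual connection to the sub-layer output."""

    def __init__(self, hidden_size: int = 768, bottleneck_size: int = 64):
        super().__init__()
        self.f_down = nn.Linear(hidden_size, bottleneck_size)  # down-projection
        self.activation = nn.GELU()                            # non-linearity (assumed)
        self.f_up = nn.Linear(bottleneck_size, hidden_size)    # up-projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x is the output of a frozen attention or MLP block (X_i in Eq. 1).
        return self.f_up(self.activation(self.f_down(x))) + x

# During fine-tuning, only the adapter parameters (and typically a task head)
# receive gradients; the pre-trained LM weights stay frozen.
adapter = BottleneckAdapter()
hidden_states = torch.randn(1, 128, 768)  # (batch, sequence, hidden)
out = adapter(hidden_states)

Because the adapter output is added back to its input through the residual connection, initializing the adapter near zero leaves the pre-trained model's behavior approximately unchanged at the start of fine-tuning.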
2.3 Prompt Tuning
Another efficient method of fine-tuning is Prompt Tuning (PT) (Li and Liang, 2021; Lester et al., 2021). PT is mostly used for autoregressive LMs such as GPT (Brown et al., 2020). In this approach, a set of learnable vectors (a prompt) is concatenated with the original input and passed to the LM. During fine-tuning, the objective is to learn a prompt that encodes task-specific knowledge for the downstream task while the original model parameters are kept frozen. In some variations of PT, instead of concatenating a single set of learnable vectors with the input before passing it to the model, a set of prompts is learned for each individual attention layer of the pre-trained LM (Li and Liang, 2021). The PT approach used in this study can be expressed by the equation below, where $\text{Attention}_i$ is the attention block of the $i$-th layer of the pre-trained Transformer, and $P^k_i$ and $P^v_i$ denote the learnable prompts for the keys and values, respectively:
$$O_i = \text{Attention}_i\big(Q_i,\, [P^k_i, K_i],\, [P^v_i, V_i]\big) \quad (2)$$
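As a sketch of Equation (2) under assumed dimensions, rather than the exact implementation used here, the snippet below prepends learnable key and value prompts to the keys and values of a single (single-head) attention layer. The prompt length of 20 and hidden size of 768 are assumptions made for the example.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptedAttention(nn.Module):
    """Single-head attention with learnable key/value prompts, as in Eq. (2):
    P_k and P_v are prepended to the keys and values of the frozen layer."""

    def __init__(self, hidden_size: int = 768, prompt_length: int = 20):
        super().__init__()
        # Only these prompt parameters are trainable; q, k, v come from the frozen model.
        self.prompt_k = nn.Parameter(torch.randn(prompt_length, hidden_size) * 0.02)
        self.prompt_v = nn.Parameter(torch.randn(prompt_length, hidden_size) * 0.02)

    def forward(self, q, k, v):
        # q, k, v: (batch, seq_len, hidden) produced by the frozen projections.
        batch = q.size(0)
        p_k = self.prompt_k.unsqueeze(0).expand(batch, -1, -1)
        p_v = self.prompt_v.unsqueeze(0).expand(batch, -1, -1)
        k = torch.cat([p_k, k], dim=1)  # [P_k, K_i]
        v = torch.cat([p_v, v], dim=1)  # [P_v, V_i]
        scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
        return F.softmax(scores, dim=-1) @ v

# Example usage with random activations standing in for a frozen layer.
attn = PromptedAttention()
q = k = v = torch.randn(2, 128, 768)
out = attn(q, k, v)  # (2, 128, 768)

Since the prompts only extend the keys and values, the output sequence length is unchanged and the frozen model's weights never receive gradients; only the per-layer prompt parameters are updated during fine-tuning.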