Knowledge Injected Prompt Based Fine-tuning for Multi-label Few-shot
ICD Coding
Zhichao Yang1, Shufan Wang1, Bhanu Pratap Singh Rawat1, Avijit Mitra1, Hong Yu1,2
1College of Information and Computer Sciences, University of Massachusetts Amherst
2Department of Computer Science, University of Massachusetts Lowell
{zhichaoyang,shufanwang,brawat,avijitmitra}@umass.edu hong_yu@uml.edu
Abstract
Automatic International Classification of Diseases (ICD) coding aims to assign multiple ICD codes to a medical note with an average length of 3,000+ tokens. This task is challenging due to the high-dimensional space of multi-label assignment (tens of thousands of ICD codes) and the long-tail challenge: only a few codes (common diseases) are frequently assigned, while most codes (rare diseases) are assigned infrequently. This study addresses the long-tail challenge by adapting a prompt-based fine-tuning technique with label semantics, which has been shown to be effective in the few-shot setting. To further enhance performance in the medical domain, we propose a knowledge-enhanced Longformer that injects three types of domain-specific knowledge: hierarchy, synonyms, and abbreviations, through additional pretraining with contrastive learning. Experiments on MIMIC-III-full, a benchmark dataset for code assignment, show that our proposed method outperforms the previous state-of-the-art method by 14.5% in macro F1 (from 10.3 to 11.8, P<0.001). To further test our model in the few-shot setting, we created a new rare-disease coding dataset, MIMIC-III-rare50, on which our model improves macro F1 from 17.1 to 30.4 and micro F1 from 17.2 to 32.6 compared to the previous method.
1 Introduction
Multi-label learning has many real-world applications in natural language processing (NLP), including but not limited to academic paper labeling (Chen et al., 2020), news framing (Akyürek et al., 2020), waste crisis response (Yang et al., 2020), Amazon product labeling (McAuley et al., 2015; Dahiya et al., 2021), and medical coding (Atutxa et al., 2019). In contrast to multi-class classification, an instance in multi-label learning is frequently linked with more than one class label, making the task more challenging due to the combinatorial space of potential label sets.
In real-world tasks, there is often insufficient training data for rare class labels. Taking automatic International Classification of Diseases (ICD) coding as an example: given discharge summary notes as input, the task is to assign the multiple ICD disease and procedure codes associated with each note. The assigned codes need to be accurate and complete for billing purposes. As an example, the MIMIC-III dataset (Johnson et al., 2016) contains 8,692 unique ICD-9 codes, among which 4,115 (47.3%) occur fewer than 6 times and 203 (2.3%) occur zero times. Clinical practice requires high accuracy; hence, it is not acceptable for a multi-label classifier to miss a disease diagnosis (or code assignment) because it is rare, since such a diagnosis may be of the greatest clinical importance for the patient. Therefore, the classifier is required to perform with high precision even for infrequent codes. This translates to data sparsity due to the availability of only a few training examples.
To mitigate the data sparsity problem, additional structured knowledge can be applied. ICD codes are organized in an ontological/hierarchical structure in which a text description is associated with each code. For instance, ICD 250 (Diabetes mellitus), shown in Figure 1, is the parent of several child codes including 250.0 (Diabetes mellitus without mention of complication), 250.1 (Diabetes with ketoacidosis), and 250.2 (Diabetes with hyperosmolarity). Such child ICD codes are semantically more distinct from each other than from their parent code 250.
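To make the hierarchy concrete, here is a minimal Python sketch of the ICD-9 fragment around code 250, with a helper that returns a code's siblings. The dictionary layout and the `siblings` helper are our own illustration, not the paper's code.

```python
# Illustrative fragment of the ICD-9 hierarchy around code 250 (parent -> children).
icd9_hierarchy = {
    "250": {
        "description": "Diabetes mellitus",
        "children": {
            "250.0": "Diabetes mellitus without mention of complication",
            "250.1": "Diabetes with ketoacidosis",
            "250.2": "Diabetes with hyperosmolarity",
        },
    },
}

def siblings(code: str, hierarchy: dict) -> list[str]:
    """Return the sibling codes of `code`, i.e. the other children of its parent."""
    for parent, node in hierarchy.items():
        if code in node["children"]:
            return [c for c in node["children"] if c != code]
    return []

print(siblings("250.1", icd9_hierarchy))  # ['250.0', '250.2']
```

Siblings of a code are exactly the 1-hop nodes that later serve as hard negatives in our contrastive pretraining.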
Synonyms, including acronyms and abbreviations, are common in medical notes. For instance, the description of code 250.00 is "type II diabetes mellitus". However, this code can be described in different text forms such as "insulin-resistant diabetes", "non-insulin dependent diabetes", "DM2", and "T2DM". Therefore, one naive way to assign ICD codes is to identify matches between candidate code descriptions and their synonyms in medical notes.
Figure 1: An illustration of self-alignment pretraining from the medical knowledge base UMLS, including the usage of (a) Hierarchy, (b) Synonym, and (c) Abbreviation. The pink region is the dynamic margin, ranging from π/2 to π, within which we wish to push negatives apart by a dynamic distance.
In this work, we distinguish synonyms from acronyms and abbreviations due to their importance in the medical domain (Yu et al., 2002). While synonymous relations can be learned implicitly by a pretrained language model (LM) (Michalopoulos et al., 2022; Li et al., 2022), previous research shows that language models serve as only limited biomedical (Sung et al., 2021) or clinical (Yao et al., 2022) knowledge bases due to the data sparsity challenge in the medical domain. An explicit way of adding such medical knowledge into language models should be explored.
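For reference, the naive string-matching baseline mentioned above can be sketched in a few lines. The synonym lists and the helper name are illustrative only, not an extract from UMLS or from our system.

```python
# Naive baseline sketch: assign a code if any of its synonyms/abbreviations
# appears verbatim in the (lower-cased) note. Synonym lists are illustrative.
code_synonyms = {
    "250.00": [
        "type ii diabetes mellitus",
        "insulin-resistant diabetes",
        "non-insulin dependent diabetes",
        "dm2",
        "t2dm",
    ],
}

def naive_code_assignment(note: str, code_synonyms: dict) -> set[str]:
    """Return codes whose synonyms appear as exact substrings of the note."""
    note = note.lower()
    return {code for code, syns in code_synonyms.items()
            if any(s in note for s in syns)}

print(naive_code_assignment("Pt with T2DM, on metformin.", code_synonyms))
# {'250.00'}
```

Exact matching of this kind misses paraphrases and context, which is one motivation for injecting synonym knowledge into the LM itself.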
In this paper, we present a simple but effective Knowledge Enhanced PrompT (KEPT) framework. We implement and evaluate KEPT using an LM based on Longformer because clinical notes are typically longer than 500 tokens. Specifically, we first pretrain a Longformer LM on the MIMIC-III dataset (pretrain_mimic). Then, we further pretrain on the structured medical knowledge base UMLS (Unified Medical Language System) using self-alignment learning with a contrastive loss to inject medical knowledge into the pretrained LM (pretrain_umls). For the downstream ICD-code assignment fine-tuning, we add a sequence of ICD code descriptions (label semantics) as a prompt, in addition to each clinical note, as the KEPT LM input. This allows early fusion of the code descriptions and the input note. Experiments on full disease coding (MIMIC-III-full) and common disease coding (MIMIC-III-50) show that our KEPTLongformer outperforms the previous SOTA MSMN (Yuan et al., 2022). To test its few-shot ability, we create a new few-shot rare-disease coding dataset named MIMIC-III-rare50, on which our method shows significant improvements over MSMN. To facilitate future research, we publicly release the code and trained models at https://github.com/whaleloops/KEPT.
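As a rough illustration of the prompt-based input described above, the sketch below concatenates candidate code descriptions before the clinical note so that the encoder can fuse label semantics and note content in one forward pass. The `[SEP]` separator and the exact template are assumptions for illustration, not necessarily the format used in KEPT.

```python
# Sketch (not the paper's exact template): prepend ICD code descriptions as a
# prompt to the clinical note before feeding the sequence to the encoder.
def build_prompted_input(note: str, code_descriptions: list[str],
                         sep: str = " [SEP] ") -> str:
    """Concatenate all candidate code descriptions, then the note."""
    prompt = sep.join(code_descriptions)
    return prompt + sep + note

descs = ["diabetes with ketoacidosis", "diabetes with hyperosmolarity"]
note = "Patient admitted with DKA, glucose elevated ..."
print(build_prompted_input(note, descs))
```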
2 Related Work
2.1 Prompt-based Fine-tuning
Prompt-based fine-tuning has been shown to be effective on few-shot tasks (Le Scao and Rush, 2021; Gao et al., 2021), even when the language model is relatively small (Schick and Schütze, 2021), because it introduces no new parameters during few-shot fine-tuning. Additional tuning techniques, such as tuning only the bias terms or the language model head, have been shown to be efficient in memory and training time (Ben Zaken et al., 2022; Logan IV et al., 2022). However, most previous work focuses on injecting knowledge into prompts for single-label multi-class classification tasks (Hu et al., 2022; Wang et al., 2022a; Ye et al., 2022). To the best of our knowledge, this is the first work that applies prompting to a multi-label classification task.
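For reference, the bias-term tuning cited above (Ben Zaken et al., 2022) can be sketched in a few lines of PyTorch; this is a generic illustration, not code from the cited work.

```python
import torch.nn as nn

def freeze_all_but_bias(model: nn.Module) -> None:
    """BitFit-style sketch: keep only bias parameters trainable, freeze all other weights."""
    for name, param in model.named_parameters():
        param.requires_grad = name.endswith("bias")
```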
2.2 Entity Representation Pretraining
Much recent research uses synonyms to conduct biomedical entity representation learning (Sung et al., 2020; Liu et al., 2021; Lai et al., 2021; Angell et al., 2021; Zhang et al., 2021; Kong et al., 2021; Seneviratne et al., 2022). Our work is most similar to Liu et al. (2021), who use an additional pretraining scheme that self-aligns the representation space of biomedical entities from a pretrained medical LM. They collect self-supervised synonym examples from the biomedical ontology UMLS and use a multi-similarity contrastive loss to keep the representations of similar entities close to each other, before fine-tuning on the specific downstream task. However, their work differs from ours in that (1) their evaluation is limited to medical entity linking tasks and (2) they do not use hierarchical information, which has been shown to be useful in KRISSBERT (Zhang et al., 2021). In contrast to KRISSBERT, our contrastive learning selects negative samples from siblings (1-hop nodes) instead of random nodes in the graph. Our method follows the InfoMin proposition that selected samples should contain as much task-relevant information as possible while discarding as much irrelevant information in the input as possible (Tian et al., 2020).
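To ground the contrastive setup, the sketch below scores an anchor entity embedding against a synonym (positive) and sibling-code description embeddings (negatives). We use a simplified InfoNCE objective here purely as a stand-in; the actual pretraining uses a multi-similarity loss with a dynamic margin.

```python
import torch
import torch.nn.functional as F

def sibling_contrastive_loss(anchor: torch.Tensor,
                             positive: torch.Tensor,
                             negatives: torch.Tensor,
                             temperature: float = 0.07) -> torch.Tensor:
    """anchor, positive: (d,) embeddings of an entity name and one of its synonyms.
    negatives: (k, d) embeddings of sibling-code descriptions (1-hop nodes).
    Simplified InfoNCE stand-in for the multi-similarity loss with dynamic margin."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)
    pos_logit = (anchor * positive).sum(-1, keepdim=True) / temperature   # (1,)
    neg_logits = negatives @ anchor / temperature                         # (k,)
    logits = torch.cat([pos_logit, neg_logits]).unsqueeze(0)              # (1, k + 1)
    target = torch.zeros(1, dtype=torch.long)                             # index 0 = positive
    return F.cross_entropy(logits, target)
```

Choosing negatives from siblings rather than random nodes keeps the negatives hard and task-relevant, in line with the InfoMin proposition above.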
2.3 ICD Coding
ICD coding uses NLP models to predict expert-labeled ICD codes given discharge summaries as input. Currently, the most straightforward method is to take the best available language model for encoding notes and then use a label attention mechanism that attends ICD code representations to the input notes for prediction (Mullenbach et al., 2018). In comparison, we apply attention between codes and notes much earlier, within the encoder, with the help of prompts.
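For context, a label attention layer in the spirit of Mullenbach et al. (2018) can be sketched as follows; the exact parameterization is a generic illustration rather than their implementation.

```python
import torch
import torch.nn as nn

class LabelAttention(nn.Module):
    """Each code's learned query attends over the note's token representations,
    and a per-code classifier scores the attended vector."""
    def __init__(self, hidden: int, num_codes: int):
        super().__init__()
        self.label_queries = nn.Parameter(torch.randn(num_codes, hidden))
        self.per_code_scorer = nn.Parameter(torch.randn(num_codes, hidden))
        self.bias = nn.Parameter(torch.zeros(num_codes))

    def forward(self, token_reprs: torch.Tensor) -> torch.Tensor:
        # token_reprs: (seq_len, hidden) encoded note tokens
        attn = torch.softmax(self.label_queries @ token_reprs.T, dim=-1)  # (num_codes, seq_len)
        code_ctx = attn @ token_reprs                                     # (num_codes, hidden)
        return (self.per_code_scorer * code_ctx).sum(-1) + self.bias     # (num_codes,) logits
```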
Label representations in the attention mechanism have played an important role in many previous works. Li and Yu (2020) and Vu et al. (2020) randomly initialize the label representations. Chen and Ren (2019), Dong et al. (2021), and Zhou et al. (2021) initialize the label representations with shallow representations of code descriptions using Word2Vec (Mikolov et al., 2013). Yuan et al. (2022) further add semantic information from description synonyms. In comparison, we use deep contextual representations from a Longformer pretrained on both MIMIC and UMLS with a contrastive loss. Similar pretrained language models have been shown to be effective in previous works (Wu et al., 2020; Huang et al., 2022; DeYoung et al., 2022; Michalopoulos et al., 2022).
As stated previously, the high dimensionality of the label space, such as 14,000 diagnosis codes and 3,900 procedure codes in ICD-9 and 80,000 codes in industry coding (Ziletti et al., 2022), makes ICD coding challenging. Another challenge is the long-tail distribution, in which a few codes are frequently used but most codes may be used only a few times due to the rareness of diseases (Shi et al., 2017; Xie et al., 2019). Mottaghi et al. (2020) use active learning with extra human labeling to address this issue. Other recent works focus on using additional medical domain-specific knowledge to better understand the few training instances (Cao et al., 2020; Song et al., 2020; Lu et al., 2020; Falis et al., 2022; Wang et al., 2022b). Wu et al. (2017) perform entity linking to identify medical phrases in the note. Xie et al. (2019) map label codes to entities in a medical hierarchy graph. Compared to a baseline which uses a shallow convolutional neural network to learn n-gram features from notes, they add the complex hierarchical structure between codes by allowing the loss to propagate through a graph convolutional neural network. In contrast to previous systems, which adopt complex pipelines and different tools, our method applies a much simpler training procedure by incorporating knowledge into the language model without requiring any knowledge pre- or post-processing tools (e.g., MedSpaCy, Gensim, NLTK) during fine-tuning. Additionally, previous methods use the knowledge graph as an input source, whereas we train our language model to encode the knowledge graph as a target with a contrastive loss.
3 Methods
ICD coding: ICD coding is a multi-label, multi-class classification task. Specifically, given an input medical note $t$ consisting of thousands of words, the task is to assign a binary label $y_i \in \{0, 1\}$ for each ICD code in the label space $\mathcal{Y}$, where 1 means the note is positive for that ICD disease or procedure and $i$ ranges over $[1, N_c]$. In this study, we define and evaluate the number of candidate codes $N_c$ as 50, although $N_c$ could be higher or lower depending on the specific application. Each candidate code has a short free-text description phrase $c_i$. For instance, code 250.1 has the description "diabetes with ketoacidosis". The code descriptions $c$ are the set of all $N_c$ descriptions $c_i$.
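A tiny worked example of this formulation, with illustrative codes, a hypothetical gold label set, and $N_c = 3$ instead of 50 for brevity:

```python
# Worked example: binary target vector y over Nc candidate codes, with the
# code descriptions c_i. Codes and gold labels here are illustrative only.
candidate_codes = ["250.0", "250.1", "250.2"]          # Nc = 3 for illustration
descriptions = {
    "250.0": "diabetes mellitus without mention of complication",
    "250.1": "diabetes with ketoacidosis",
    "250.2": "diabetes with hyperosmolarity",
}
gold_codes = {"250.1"}                                  # codes assigned to this note
y = [1 if code in gold_codes else 0 for code in candidate_codes]
print(y)  # [0, 1, 0]
```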
3.1 Encoding Text with Longformer
To solve this task, we first need to encode free text into a hidden representation with a pretrained clinical Longformer. Specifically, we convert free text $a$ to a sequence of tokens $x_a$; the vocab embedding then