KG-MTT-BERT: Knowledge Graph Enhanced BERT for
Multi-Type Medical Text Classification
Yong He, Cheng Wang, Shun Zhang, Nan Li, Zhaorong Li, Zhenyu Zeng
{sanyuan.hy,youmiao.wc,changchuan.zs,kido.ln,zhaorong.lzr,zhenyu.zzy}@alibaba-inc.com
Alibaba Group
Hangzhou, China
ABSTRACT
Medical text learning has recently emerged as a promising area to improve healthcare due to the wide adoption of electronic health record (EHR) systems. The complexity of medical text, such as its diverse length, mixed text types, and abundance of medical jargon, poses a great challenge for developing effective deep learning models. BERT has achieved state-of-the-art results in many NLP tasks, such as text classification and question answering. However, the standalone BERT model cannot deal with the complexity of medical text, especially lengthy clinical notes. Herein, we develop a new model called KG-MTT-BERT (Knowledge Graph Enhanced Multi-Type Text BERT) by extending the BERT model for long and multi-type text with the integration of a medical knowledge graph. Our model outperforms all baselines and other state-of-the-art models in diagnosis-related group (DRG) classification, which requires comprehensive medical text for accurate classification. We also demonstrate that our model can effectively handle multi-type text and that the integration of the medical knowledge graph can significantly improve performance.
CCS CONCEPTS
• Computing methodologies → Language resources; • Applied computing → Health informatics.
KEYWORDS
EHR, BERT, Knowledge Graph, Multi-Type Medical Text, Text Classification, Diagnosis-Related Group (DRG)
ACM Reference Format:
Yong He, Cheng Wang, Shun Zhang, Nan Li, Zhaorong Li, Zhenyu Zeng. 2018. KG-MTT-BERT: Knowledge Graph Enhanced BERT for Multi-Type Medical Text Classification. In Woodstock '18: ACM Symposium on Neural Gaze Detection, June 03–05, 2018, Woodstock, NY. ACM, New York, NY, USA, 11 pages.
1 INTRODUCTION
In recent years, the broad usage of electronic health record (EHR) systems has opened up many opportunities for applying deep learning methods to improve the quality of healthcare. Many predictive
models have been developed for more effective clinical decision-making, such as predicting diagnoses and recommending medicine. With the rapidly aging population in many countries, the rational usage of scarce medical resources is critical, and developing a machine learning model for hospital resource management is crucial to improving healthcare quality and outcomes.
Diagnosis-related groups (DRGs) have become one of the most popular prospective payment systems in many countries, introduced to put pressure on hospitals to optimize the allocation of medical resources. The basic idea is to classify inpatients into a limited number of DRG groups, assuming patients in the same DRG group share similar demographic and clinical characteristics and are likely to consume similar amounts of hospital resources. The reimbursement to the hospital is the same for the same DRG category, regardless of the actual costs incurred by individual patients within that category. As a result, the DRG system can motivate hospitals to reduce costs, improve resource allocation, and increase operational efficiency.
The DRG used for reimbursement purposes is determined in two steps when a patient is discharged. First, a medical coder reviews the clinical documents, assigns standard medical codes, and selects one principal diagnosis following the guideline of coding and reporting using the International Classification of Diseases (ICD) or other classification systems. Then a DRG category is assigned by a set of rules implemented in software (i.e., a DRG grouper) using the following variables: principal diagnosis code, secondary diagnosis codes, procedure codes, age, sex, discharge status, and length of stay. In addition to post-hospital billing, DRG can be repurposed for in-hospital operation management, and a previous study has shown that classifying a patient's DRG during the hospital visit can allow the hospital to allocate hospital resources more effectively and to facilitate operational planning [6]. Directly inferring the DRG from the complex medical text of EHR is challenging because the model needs to mimic the professional medical coder's expertise and learn the rules of the DRG grouper.
Multiple types of medical texts from EHR are closely related to DRG, such as diagnosis, procedure, and hospital course. We formulate the DRG classification task as a multi-type text classification problem. The following characteristics of these texts pose a great challenge for medical text mining. 1) Diverse length: diagnoses and procedures are typically short texts, while the hospital course is usually lengthy, with detailed information about the current hospital visit; 2) Multiple types: the different fields in EHR serve different purposes, and thus texts in different fields are of different types (see Appendix A.1 for more details on multi-type medical text); 3) Medical-domain specific: the free text from EHR contains a lot of medical terminology, and a medical coder assigns diagnosis codes with relevant domain knowledge.
BERT [5] has been widely used in text classification, and typically multiple texts are concatenated into one long text for modeling purposes. However, there are three limitations to directly applying the BERT model to multi-type texts. First, the input of BERT is limited to 512 tokens, and combining all texts could easily exceed this limit and lead to information loss. Second, the concatenation of multiple types of text may not be feasible, as some texts are syntactically different or contextually irrelevant, which makes the direct self-attention module of BERT unnecessary. Lastly, BERT was pre-trained over open-domain corpora, and it does not perform well on EHR data in the medical field.
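To make the first limitation concrete, the following minimal sketch (assuming the Hugging Face transformers library; the field contents are hypothetical placeholders, not real EHR data) shows how naively concatenating EHR fields overruns the 512-token budget and silently loses everything past the cutoff:

```python
# A minimal sketch of the 512-token limitation, assuming the Hugging Face
# `transformers` package. The EHR field contents are hypothetical.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")

fields = {
    "diagnosis": "unstable angina",                                 # short text
    "procedure": "coronary angiography",                            # short text
    "hospital_course": "day 1: admitted with chest pain. " * 200,   # lengthy note
}

# Naive approach: concatenate every field into one input string.
combined = " ".join(fields.values())
print(len(tokenizer(combined)["input_ids"]))  # easily exceeds 512

# BERT's input cap forces truncation, so tokens past 512 are dropped.
clipped = tokenizer(combined, truncation=True, max_length=512)
print(len(clipped["input_ids"]))  # 512
```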
To overcome the above limitations for multi-type text classification, we develop a new model, KG-MTT-BERT (Knowledge Graph Enhanced Multi-Type Text BERT), to extend the BERT model for multi-type text and integrate a medical knowledge graph for domain specificity. Our model first applies the same BERT-Encoder to process each text, i.e., the encoders of all texts of one patient share the same parameters. Two levels of encoding outputs from the BERT-Encoder, with different granularities (one at the text level and the other at the token level), are compared in this paper. The multiple encodings from the input texts are concatenated together as the representation matrix, and different types of pooling layers are investigated for better information aggregation. In addition, we use the knowledge graph to enhance the model's ability to handle medical domain knowledge. Finally, a classification layer is utilized to predict DRG categories.
We have conducted experiments on three DRG datasets from tertiary hospitals in China (see Appendix A.2 for the definition of hospital classification). Our model outperforms baselines and other state-of-the-art models on all datasets. The rest of the paper is organized as follows. We review related works and techniques in the Related Works Section and introduce the formulation and architecture of our model in the Method Section. Then we report the performance of our model on the three datasets and investigate hyperparameter effects and multiple ablation studies to further the understanding of the model. Lastly, we conclude our paper and point out future directions.
2 RELATED WORKS
Text classication is an important Natural Language Processing
(NLP) task directly related to many real-world applications. The
goal of text classication is to automatically classify the text into
predened labels. The unstructured nature of free text requires the
transformation of text into a numerical representation for modeling.
Over the past decade, text classication has changed from shallow
to deep learning methods, according to the model structures [
20
].
Shallow learning models focus on the feature extraction and classi-
er selection, while deep learning models can perform automatic
high-level feature engineering and then t with the classier to-
gether.
Shallow learning models first convert text into vectors using text representation methods like Bag-of-Words (BOW), N-grams, term frequency-inverse document frequency (TF-IDF) [22], Word2vec [25], and GloVe [27], and then train a shallow model to classify, such as Naive Bayes [24], Support Vector Machine [12], Random Forest [3], XGBoost [4], and LightGBM [15]. In practice, the classifier is trained and routinely selected from the zoo of shallow models. Therefore, feature extraction and engineering is the critical step for the performance of text classification.
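To illustrate this two-step recipe, here is a minimal sketch using scikit-learn (the toy corpus and labels are hypothetical; any of the classifiers listed above could stand in for the SVM):

```python
# A minimal sketch of the shallow-learning recipe described above:
# TF-IDF feature extraction followed by a classifier from the shallow-model
# zoo. Assumes scikit-learn; the toy corpus and labels are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

docs = [
    "patient admitted with chest pain, troponin elevated",
    "elective knee arthroscopy, uneventful recovery",
]
labels = ["cardiology", "orthopedics"]

# Feature extraction (TF-IDF over word n-grams), then classification.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(docs, labels)
print(model.predict(["acute chest pain with elevated cardiac enzymes"]))
```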
Various deep learning models have been proposed in recent years for text classification, which build on basic deep learning techniques like CNN, RNN, and the attention mechanism. TextCNN [16] applies a convolutional neural network to sentence classification tasks. RNNs, such as long short-term memory (LSTM) networks, are broadly used to capture long-range dependence. Zhang et al. introduce bidirectional long short-term memory networks (BLSTM) to model the sequential information about all words before and after a given word [35]. Zhou et al. integrate attention with BLSTM as Att-BLSTM to capture the most important semantic information in a text [36]. The Recurrent Convolutional Neural Network (RCNN) combines a recurrent structure to capture contextual information with a max-pooling layer to identify the key messages among features [18]. Therefore, RCNN leverages the advantages of both RNN and CNN. Johnson et al. develop a deep pyramid CNN (DPCNN) model that increases the depth of the CNN by repeatedly alternating a convolution block and a down-sampling layer [13]. Yang et al. design a hierarchical attention network (HAN) for document classification by aggregating important words into a sentence representation and then aggregating important sentences into a document representation [34]. The appearance of BERT, which uses the masked language model to pre-train deep bidirectional representations, is a significant turning point in the development of text classification models and other NLP technologies. However, the memory and computation cost of self-attention grows quadratically with text length, which prevents applications to long sequences. Longformer [1] and Reformer [17] are designed to address this limitation.
Domain knowledge serves a principal role in many industries. In order to integrate domain knowledge, it is necessary to embed the entities and relationships of the knowledge graph. Many knowledge graph embedding methods have been proposed, such as TransE [2], TransH [30], TransR [21], TransD [10], KG2E [7], and TranSparse [11]. Some existing works show that the knowledge graph can improve the ability of BERT, such as K-BERT [23] and E-BERT [26].
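As an illustration of the translation idea underlying TransE, here is a minimal sketch of its scoring function (the embeddings are random stand-ins rather than trained vectors; lower scores mean a more plausible triple):

```python
# A minimal sketch of the TransE scoring idea: a relation is modeled as a
# translation in embedding space, so for a true triple (h, r, t) we expect
# h + r ≈ t. The embeddings below are random stand-ins, not trained vectors.
import numpy as np

rng = np.random.default_rng(0)
dim = 50
head = rng.normal(size=dim)      # e.g., embedding of a diagnosis entity
relation = rng.normal(size=dim)  # e.g., a "diagnosis-related procedure" relation
tail = rng.normal(size=dim)      # e.g., embedding of a procedure entity

def transe_score(h, r, t, norm=2):
    """Dissimilarity ||h + r - t||; smaller means a more plausible triple."""
    return np.linalg.norm(h + r - t, ord=norm)

print(transe_score(head, relation, tail))
```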
Data mining of EHR [32] and biomedical data has been an increasingly popular research area, covering tasks such as medical code assignment, medical text classification [29], and medical event sequential learning. Several models have been developed for medical code assignment, which is a prerequisite of the DRG grouper. The Multimodal Fusion Architecture SeArch (MUFASA) model investigates multimodal fusion strategies to predict diagnosis codes from comprehensive EHR data [31]. Yan et al. design a knowledge-driven generation model for short procedure mention normalization [33].
Limited research has been performed on direct DRG prediction. Gartner et al. apply a shallow learning approach to predict the DRG from free text with complicated feature engineering, feature selection, and 9 classification techniques [6]. AMANet [8] treats DRG classification as a dual-view sequential learning task based on standardized diagnosis and procedure sequences. BioBERT [19] is a domain-specific language representation model based on BERT and pre-trained on large-scale biomedical corpora. Med-BERT [28] adapts the BERT framework to pre-train contextualized embedding models on structured diagnosis ICD codes from EHR. ClinicalBERT [9] uses clinical text to train the BERT framework for predicting 30-day hospital readmission at various time points of admission.
3 METHOD
3.1 Problem Formulation
In this section, we define the notations used in this paper. The goal of the text classification task is to predict the target $Y$ given the multi-type input texts $X_s$, where $s$ stands for the type id. Let $S_i = \{X_{s_i}, T_i, y_i\}$ be the sample with index $i$, where $X_{s_i} = \{X_{i1}, X_{i2}, \ldots, X_{ik}\}$ is the set of input texts of sample $i$, $X_{ij} = \{x_{ij1}, x_{ij2}, \ldots, x_{ijl_{ij}}\}$ is the $j$-th text of sample $i$, $k$ is the number of input text types, $l_{ij}$ is the length of the text $X_{ij}$, $x_{ijp}$ is the $p$-th token of the text $X_{ij}$, $T_i = \{t_{i1}, t_{i2}, \ldots, t_{im}\}$ is the set of relevant entities of sample $i$, and $y_i$ is the target label. The entities, including the principal diagnosis, principal procedure, and symptoms, are extracted from the input texts.

In the DRG classification task, we predict the DRG according to the basic information, chief complaint, history of present illness, hospital course, procedure text, diagnosis text, and the entities extracted from the patient's medical record of the current visit. This is a multi-class classification task with $d_y$ categories.
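For concreteness, a single sample $S_i$ could be represented with a structure like the following sketch (the texts, entities, and DRG code are hypothetical placeholders, not data from our datasets):

```python
# A sketch of one sample S_i = {X_si, T_i, y_i} from the formulation above.
# All texts, entities, and the DRG label are hypothetical placeholders.
from dataclasses import dataclass
from typing import List

@dataclass
class Sample:
    texts: List[str]     # X_si: the k multi-type input texts of sample i
    entities: List[str]  # T_i: entities extracted from the input texts
    label: str           # y_i: the target DRG category

sample = Sample(
    texts=[
        "female, 67 years old",        # basic information
        "chest pain for three days",   # chief complaint
        "...",                         # history of present illness
        "...",                         # hospital course
        "coronary angiography",        # procedure text
        "unstable angina",             # diagnosis text
    ],
    entities=["unstable angina", "coronary angiography", "chest pain"],
    label="FM19",  # hypothetical DRG code
)
```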
3.2 Medical Knowledge Graph
Our model integrates a knowledge graph when fine-tuning on the Chinese clinical text classification task. The knowledge graph is dynamically built from the training set of each dataset by extracting entities and the co-occurrence relations between these entities; that is, we build the knowledge graph for each dataset independently, using only entities extracted from its training set. The medical knowledge graph includes four types of entities, i.e., DRG, diagnosis, procedure, and symptom. According to the meaning of the co-occurrence relations, only four relation types are retained, i.e., DRG-related diagnosis, DRG-related procedure, diagnosis-related procedure, and diagnosis-related symptom. The symptom set comes from the Chinese medical knowledge database OMAHA¹. The Chinese names of diagnoses and procedures are from ICD-10 and ICD-9-CM-3 in China, respectively. The DRG comes from multiple different versions. We extract entities from the medical text through NER (named-entity recognition) and a subsequence matching algorithm, and these extracted entities are matched to the standard entity names through a string similarity measure based on the LCS (longest common subsequence). The process is shown in Figure 1. The symptoms are extracted from the EHR fields chief complaint and history of present illness. The principal diagnosis and principal procedure entities are extracted from the EHR fields diagnosis and procedure (free text), respectively. The DRG is the category label. The extracted entities are linked if they occur in the same hospital visit. More information on medical knowledge graph construction and statistics can be found in the Medical Knowledge Graph Section of the Appendix. A part of the knowledge graph is shown in Figure 2.
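The exact LCS-based similarity formula is not spelled out here, but a common choice normalizes the LCS length by the lengths of both strings. The following sketch (with a hypothetical candidate vocabulary) shows one plausible implementation of this entity-linking step:

```python
# A sketch of LCS-based string similarity for linking extracted mentions to
# standard entity names. The exact formula used in the paper is not
# specified; this normalized variant is one common choice.
def lcs_length(a: str, b: str) -> int:
    """Dynamic-programming longest common subsequence length."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a, 1):
        for j, cb in enumerate(b, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if ca == cb else max(dp[i-1][j], dp[i][j-1])
    return dp[len(a)][len(b)]

def lcs_similarity(a: str, b: str) -> float:
    """Normalize LCS length by both string lengths; 1.0 means identical."""
    return 2 * lcs_length(a, b) / (len(a) + len(b)) if (a or b) else 0.0

def link_to_standard(mention: str, standard_names: list) -> str:
    """Match an extracted mention to the most similar standard entity name."""
    return max(standard_names, key=lambda name: lcs_similarity(mention, name))

# Hypothetical candidates from an ICD-10-style vocabulary.
print(link_to_standard("unstable angina pect.",
                       ["unstable angina pectoris", "stable angina pectoris"]))
```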
3.3 Network Architecture
Our model is named KG-MTT-BERT (Knowledge Graph Enhanced Multi-Type Text BERT), and its structure is shown in Figure 3. The pseudo-code of our model is given in the Pseudo-code Section of the Appendix.
1https://www.omaha.org.cn/
Figure 1: Extracting standard entity names from text.
Figure 2: A part of the medical knowledge graph.
We will release our code on GitHub. In this section, we describe the modules of our model in detail.
BERT Encoder. We use the same BERT Encoder to extract features from each of the six types of input texts separately, which solves the length problem: a single text generally does not exceed the 512-token limit, but the six types of texts combined would exceed it. The encoder outputs two types of encoding vectors: text-level encoding vectors and token-level encoding vectors.
Text-Level Encoding. It saves the last-layer hidden state vector $h^{[CLS]}$ of the first token [CLS] as the text-level representation of each text:

$$E_i = \mathrm{Encoder}(X_{s_i}) = \mathrm{Encoder}(\{X_{i1}, \ldots, X_{ik}\}) = \{h^{[CLS]}_{i1}, \ldots, h^{[CLS]}_{ik}\} \quad (1)$$

where $i$ is the sample id and $k$ is the number of input text types.
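A minimal sketch of how Eq. (1) could be computed with the Hugging Face transformers API follows; this is our reading of the text-level encoding, not the authors' released code:

```python
# A sketch of text-level encoding per Eq. (1): run the shared encoder over
# each text and keep the last-layer hidden state of the [CLS] token.
# Assumes Hugging Face `transformers`; not the authors' released code.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
encoder = BertModel.from_pretrained("bert-base-chinese")  # shared weights

def text_level_encoding(texts):
    """Return one [CLS] vector per input text (k vectors for k texts)."""
    cls_vectors = []
    for text in texts:
        inputs = tokenizer(text, return_tensors="pt",
                           truncation=True, max_length=512)
        with torch.no_grad():
            out = encoder(**inputs)
        # [CLS] is the first token of the last hidden layer.
        cls_vectors.append(out.last_hidden_state[:, 0, :])
    return cls_vectors  # list of k tensors, each of shape (1, hidden)
```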
Token-Level Encoding. It saves the last-layer hidden state matrix $H$ of each text as the token-level encoding. The vector $h_{ijp}$ of $H_{ij}$ is the encoding vector of token $x_{ijp}$ of text $X_{ij}$ of sample $i$:

$$E_i = \mathrm{Encoder}(X_{s_i}) = \mathrm{Encoder}(\{X_{i1}, \ldots, X_{ik}\}) = \{H_{i1}, \ldots, H_{ik}\} = \{\{h_{i11}, \ldots, h_{i1l_{i1}}\}, \ldots, \{h_{ik1}, \ldots, h_{ikl_{ik}}\}\} \quad (2)$$

where $i$ is the sample id, $k$ is the number of input text types, and $l_{ik}$ is the length of text $X_{ik}$.

Limited by machine resources and to simplify the model, all types of input texts share the same BERT-Encoder.
Concatenate Layer. This layer concatenates the encoded vectors or matrices $E_i$ into a matrix $C_i$:

$$C_i = \{h^{[CLS]}_{i1}, h^{[CLS]}_{i2}, \ldots, h^{[CLS]}_{ik}\} \quad (3)$$

$$C_i = \{h_{i11}, \ldots, h_{i1l_{i1}}, \ldots, h_{ik1}, \ldots, h_{ikl_{ik}}\} \quad (4)$$
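A minimal sketch of the two concatenation variants in Eqs. (3) and (4), reusing the hypothetical text_level_encoding helper above (again our reading, not the authors' released code):

```python
# A sketch of the concatenate layer: Eq. (3) stacks the k [CLS] vectors into
# a (k, hidden) matrix; Eq. (4) stacks all token vectors from all texts into
# a (sum of text lengths, hidden) matrix. Reuses the hypothetical helpers
# above; not the authors' released code.
import torch

def concat_text_level(cls_vectors):
    """C_i from Eq. (3): one row per input text."""
    return torch.cat(cls_vectors, dim=0)      # shape (k, hidden)

def concat_token_level(hidden_matrices):
    """C_i from Eq. (4): one row per token across all k texts."""
    return torch.cat(hidden_matrices, dim=0)  # shape (sum of l_ij, hidden)

# The resulting matrix C_i then feeds the pooling and classification layers
# described in the model overview.
```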