BERT [5] has been widely used in text classification, and typically multiple texts are concatenated into one long text for modeling purposes. However, there are three limitations of directly applying the BERT model to multi-type texts. First, the input of BERT is limited to 512 tokens, and combining all texts could easily exceed this limit and lead to information loss. Second, the concatenation of multiple types of text may not be feasible, as some texts are syntactically different or contextually irrelevant, which makes the direct self-attention module of BERT unnecessary. Lastly, BERT was pre-trained on open-domain corpora and does not adapt well to EHR data in the medical field.
To overcome the above limitations for multi-type text classification, we develop a new model, KG-MTT-BERT (Knowledge Graph Enhanced Multi-Type Text BERT), to extend the BERT model to multi-type text and integrate a medical knowledge graph for domain specificity. Our model first applies the same BERT-Encoder to process each text, i.e., the encoders for all texts of one patient share the same parameters. Two levels of encoding outputs from the BERT-Encoder with different granularities, one at the text level and the other at the token level, are compared in this paper. The encodings from the input texts are concatenated together as the representation matrix, and different types of pooling layers are investigated for better information summarization. In addition, we use the knowledge graph to enhance the model's ability to handle medical domain knowledge. Finally, a classification layer is utilized to predict DRG categories.
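A minimal sketch of this parameter-shared, text-level variant of the architecture is given below, assuming a PyTorch/HuggingFace setup; the class and argument names (e.g., `MultiTypeTextClassifier`, `num_drg_classes`), the pretrained checkpoint, and the choice of max pooling are illustrative assumptions, and the knowledge-graph enhancement is omitted for brevity:

```python
import torch
import torch.nn as nn
from transformers import BertModel

class MultiTypeTextClassifier(nn.Module):
    """Illustrative sketch: one shared BERT encoder over several text types,
    text-level ([CLS]) encodings stacked into a representation matrix,
    pooled, then classified. Knowledge-graph enhancement is not shown."""

    def __init__(self, num_drg_classes, bert_name="bert-base-chinese"):
        super().__init__()
        # The same encoder (shared parameters) processes every text type.
        self.encoder = BertModel.from_pretrained(bert_name)
        hidden = self.encoder.config.hidden_size
        self.classifier = nn.Linear(hidden, num_drg_classes)

    def forward(self, input_ids_list, attention_mask_list):
        # Encode each text with the same (parameter-shared) BERT encoder.
        text_vecs = []
        for input_ids, attention_mask in zip(input_ids_list, attention_mask_list):
            out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
            text_vecs.append(out.last_hidden_state[:, 0, :])  # text-level [CLS] encoding
        # Stack into a representation matrix of shape (batch, num_texts, hidden).
        rep = torch.stack(text_vecs, dim=1)
        # One possible pooling choice (max pooling) over the text dimension.
        pooled, _ = rep.max(dim=1)
        return self.classifier(pooled)  # logits over DRG categories
```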
We have conducted experiments on three DRG datasets from tertiary hospitals in China (see Appendix A.2 for the definition of hospital classification). Our model outperforms baselines and other state-of-the-art models on all datasets. The rest of the paper is organized as follows. We review related works and techniques in the Related Works Section and introduce the formulation and architecture of our model in the Method Section. Then we report the performance of our model on the three datasets and investigate hyperparameter effects and multiple ablation studies to further the understanding of the model. Lastly, we conclude our paper and point out future directions.
2 RELATED WORKS
Text classification is an important Natural Language Processing (NLP) task directly related to many real-world applications. The goal of text classification is to automatically classify text into predefined labels. The unstructured nature of free text requires transforming the text into a numerical representation for modeling. Over the past decade, text classification has shifted from shallow to deep learning methods, according to the model structures [20]. Shallow learning models focus on feature extraction and classifier selection, while deep learning models can perform automatic high-level feature engineering and fit the classifier together.
A shallow learning model first converts text into vectors using text representation methods such as Bag-of-words (BOW), N-gram, term frequency-inverse document frequency (TF-IDF) [22], Word2vec [25], and GloVe [27], and then trains a shallow model to classify, such as Naive Bayes [24], Support Vector Machine [12], Random Forest [3], XGBoost [4], and LightGBM [15]. In practice, the classifier is trained and routinely selected from the zoo of shallow models. Therefore, feature extraction and engineering is the critical step for the performance of text classification.
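As an illustration of this shallow pipeline, a minimal scikit-learn sketch combining TF-IDF features with a linear SVM might look like the following; the toy texts, labels, and n-gram setting are placeholders rather than data or settings from this paper:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy corpus: feature extraction (TF-IDF) followed by a shallow classifier (linear SVM).
texts = ["chest pain and shortness of breath", "fracture of the left femur"]
labels = ["cardiology", "orthopedics"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(texts, labels)
print(model.predict(["pain in the chest"]))  # expected: ['cardiology']
```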
Various deep learning models have been proposed in recent years for text classification, which build on basic deep learning techniques like CNN, RNN, and the attention mechanism. TextCNN [16] applies a convolutional neural network to sentence classification tasks. RNNs, such as long short-term memory (LSTM) networks, are broadly used to capture long-range dependence. Zhang et al. introduce bidirectional long short-term memory networks (BLSTM) to model the sequential information from the words both before and after each word [35]. Zhou et al. integrate attention with BLSTM as Att-BLSTM to capture the most important semantic information in a text [36]. The Recurrent Convolutional Neural Network (RCNN) combines a recurrent structure that captures contextual information with a max-pooling layer that identifies the key messages among the features [18]. Therefore, RCNN leverages the advantages of both RNN and CNN. Johnson et al. develop a deep pyramid CNN (DPCNN) model that increases the depth of the CNN by repeatedly alternating a convolution block and a down-sampling layer [13]. Yang et al. design a hierarchical attention network (HAN) for document classification by aggregating important words into a sentence representation and then aggregating important sentences into a document representation [34]. The appearance of BERT, which uses the masked language model to pre-train deep bidirectional representations, is a significant turning point in the development of text classification models and other NLP technologies. However, the memory and computation cost of self-attention grows quadratically with text length, which prevents its application to long sequences. Longformer [1] and Reformer [17] are designed to address this limitation.
Domain knowledge serves a principal role in many industries. In order to integrate domain knowledge, it is necessary to embed the entities and relationships of the knowledge graph. Many knowledge graph embedding methods have been proposed, such as TransE [2], TransH [30], TransR [21], TransD [10], KG2E [7], and TranSparse [11]. Some existing works show that the knowledge graph can improve the ability of BERT, such as K-BERT [23] and E-BERT [26].
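For intuition, TransE, the earliest method in this family, embeds a triple (head, relation, tail) so that head + relation approximately equals tail, and scores a triple by the negative distance between the two sides. A minimal sketch of this scoring function follows; the toy embeddings and the medical triple in the comments are hypothetical, not from this paper:

```python
import numpy as np

def transe_score(h, r, t, norm=1):
    """TransE scores a triple by how well head + relation approximates tail:
    a smaller ||h + r - t|| means a more plausible triple."""
    return -np.linalg.norm(h + r - t, ord=norm)

# Toy 4-dimensional embeddings for a triple such as (disease, has_symptom, symptom).
h = np.array([0.1, 0.3, -0.2, 0.5])
r = np.array([0.2, -0.1, 0.4, 0.0])
t = np.array([0.3, 0.2, 0.2, 0.5])
print(transe_score(h, r, t))
```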
Data mining of EHR [32] and biomedical data has become an increasingly popular research area, including medical code assignment, medical text classification [29], and medical event sequential learning. Several models have been developed for medical code assignment, which is a prerequisite of the DRG grouper. The Multimodal Fusion Architecture SeArch (MUFASA) model investigates multimodal fusion strategies to predict the diagnosis code from comprehensive EHR data [31]. Yan et al. design a knowledge-driven generation model for short procedure mention normalization [33]. Limited research has been performed on direct DRG prediction. Gartner et al. apply a shallow learning approach to predict the DRG from free text with complicated feature engineering, feature selection, and 9 classification techniques [6]. AMANet [8] treats DRG classification as a dual-view sequential learning task based on standardized diagnosis and procedure sequences. BioBERT [19] is a domain-specific language representation model based on BERT and pre-trained on large-scale biomedical corpora. Med-BERT [28] adapts the BERT framework for pre-training contextualized embedding models on structured ICD diagnosis codes from EHR. ClinicalBERT [9] uses clinical text to train the BERT framework for