Improving the Factual Correctness of Radiology Report Generation with Semantic Rewards
Jean-Benoit Delbrouck, Pierre Chambon, Christian Bluethgen,
Emily Tsai, Omar Almusa, Curtis P. Langlotz
Stanford University
jbdel@stanford.edu
Abstract

Neural image-to-text radiology report generation systems offer the potential to improve radiology reporting by reducing the repetitive process of report drafting and identifying possible medical errors. These systems have achieved promising performance as measured by widely used NLG metrics such as BLEU and CIDEr. However, current systems face important limitations. First, they introduce increasingly complex architectures that offer only marginal improvements on NLG metrics. Second, systems that achieve high performance on these metrics are not always factually complete or consistent, due to both inadequate training and inadequate evaluation. Recent studies have shown that such systems can be substantially improved by new methods that encourage 1) the generation of domain entities consistent with the reference and 2) the description of these entities in inferentially consistent ways. So far, these methods rely on weakly-supervised (rule-based) approaches and named entity recognition systems that are not specific to the chest X-ray domain. To overcome this limitation, we propose a new method, the RadGraph reward, to further improve the factual completeness and correctness of generated radiology reports. More precisely, we leverage the RadGraph dataset, which contains chest X-ray reports annotated with entities and relations between entities. On two open radiology report datasets, our system substantially improves the scores, by up to 14.2% and 25.3%, on metrics evaluating the factual correctness and completeness of reports.
1 Introduction
An important medical application of natural language generation (NLG) is to build assistive systems that take X-ray images of a patient and generate a textual report describing clinical observations in the images (Jing et al., 2018; Li et al., 2018; Chen et al., 2020; Miura et al., 2021). This is a clinically important task, offering the potential to reduce radiologists' repetitive work and generally improve clinical communication (Kahn Jr et al., 2009).

Figure 1: Overview of our radiology report generation pipeline. First, a neural network generates a radiology report given a chest X-ray image. We then leverage RadGraph to create semantic annotations of the output used to design reinforcement learning rewards.
Recently, a lot of attention has been given to new architectures (Chen et al., 2020, 2021; Alfarghaly et al., 2021) and to how the structure of the data can be input into the system (Liu et al., 2021a). These systems have achieved promising performance as measured by widely used NLG metrics such as BLEU (Papineni et al., 2002) and CIDEr (Vedantam et al., 2015). However, these studies face important limitations. First, they introduce increasingly complex architectures that offer only marginal improvements on NLG metrics. Second, systems that achieve high performance on NLG metrics are not always factually complete or consistent, due to both inadequate training and inadequate evaluation. Miura et al. (2021) have shown that existing systems are inadequate in factual completeness and consistency, and
that an image-to-text radiology report generation (RRG) system can be substantially improved by replacing the widely used NLG metrics with "factually-oriented" methods encouraging 1) the generation of domain entities consistent with the reference and 2) the description of these entities in inferentially consistent ways. So far, these new methods rely on weakly-supervised (rule-based) approaches to construct NLI models for radiology reports, and on biomedical named entity recognition systems (Zhang et al., 2021) that are not specific to chest X-rays.
Despite these "factually-oriented" methods being weakly supervised or limited to generic biomedical entities, their use showed substantial improvements on a wide range of metrics and in board-certified radiologists' evaluations. These findings motivate us to propose a new method to further improve the factual completeness and correctness of generated radiology reports. More precisely, we leverage RadGraph (Jain et al., 2021), a dataset annotated by radiologists containing chest X-ray radiology reports along with annotated entities and relations. These annotations allow us to create two semantic graphs, one for the generated report and one for the reference report. We then introduce three simple rewards that score the differences between the two graphs in terms of entities and relations. These rewards can be directly optimized using Reinforcement Learning (RL) to further improve the quality of the reports generated by our systems. By doing so, we show on two popular chest X-ray datasets that our models not only maximize the defined rewards but also outperform previous works on various NLG and factually-oriented metrics.
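To make the training signal concrete, the sketch below shows how a non-differentiable semantic reward can be plugged into a REINFORCE-style update with a greedy-decoding baseline, in the spirit of self-critical sequence training. This is a minimal illustration rather than the paper's implementation: `model.sample`, `model.greedy_decode`, and `semantic_reward` are assumed interfaces.

```python
import torch

def rl_step(model, image, reference, semantic_reward, optimizer):
    """One REINFORCE-style update with a greedy baseline (illustrative sketch).

    `semantic_reward(hypothesis, reference)` returns a scalar score, e.g. a
    RadGraph-based F-score; it does not need to be differentiable.
    """
    # Sample a report; `sampled_logprob` is the summed log-probability of the
    # sampled tokens and must carry gradients (assumed model API).
    sampled_report, sampled_logprob = model.sample(image)
    # Greedy decoding serves as a baseline to reduce gradient variance.
    with torch.no_grad():
        greedy_report = model.greedy_decode(image)

    reward = semantic_reward(sampled_report, reference)
    baseline = semantic_reward(greedy_report, reference)

    # Self-critical policy gradient: reinforce samples that beat the baseline.
    loss = -(reward - baseline) * sampled_logprob

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward
```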
In summary, our contributions are:

• We propose a simple RRG architecture that 1) is fast to train and suitable for an RL setup and 2) performs on par with the previous, more complex architectures proposed in the literature.

• We leverage the RadGraph dataset and the associated fine-tuned model to design semantic-based rewards that qualitatively evaluate the factual correctness and completeness of the generated reports.

• We show on two datasets that directly optimizing these rewards outperforms previous approaches that prioritize traditional NLG metrics.
The paper is structured as follows: first, we describe our factually-oriented graph-based rewards (§ 2). More precisely, we begin by examining the RadGraph dataset (§ 2.1) and how we leverage its annotations to create our rewards (§ 2.2). Then, we explain the architecture of the model (§ 3) that we use to generate reports and how we train it using negative log-likelihood (NLL) and RL. The following sections are dedicated to the datasets used for the experiments (§ 4) and the metrics (§ 5) chosen to evaluate the generated reports. The latter section is divided into two groups: the classic NLG metrics (§ 5.1) and the factually-oriented metrics (§ 5.2). Finally, we present the results (§ 6) and end the paper with a section addressing related work (§ 7).
2 Factually-oriented Graph-Based Reward
In this section, we present a new semantic graph-based reward, called the RadGraph reward, used throughout our experiments. We first explain in Section 2.1 the RadGraph dataset and how we obtain the annotations that shape our reward. In Section 2.2, we explain how we construct the RadGraph reward and its different variants.
2.1 RadGraph
Figure 2: An example of a report annotated with entities and relations in the RadGraph dataset.
RadGraph (Jain et al., 2021) is a dataset of entities and relations in full-text chest X-ray radiology reports, based on a novel information extraction schema designed to structure radiology reports. The dataset contains board-certified radiologist annotations of 500 radiology reports from the MIMIC-CXR dataset (Johnson et al., 2019), corresponding in total to 14,579 entities and 10,889 relations. RadGraph also includes a test dataset of 100 radiology reports, split between two independent sets of board-certified radiologist annotations on reports from the MIMIC-CXR and CheXpert (Smit et al., 2020) datasets (50 reports each).
Entities. An entity is defined as a continuous span of text that can include one or more adjacent words. Entities in RadGraph center around two concepts: Anatomy and Observation. Three uncertainty levels exist for Observation, leading to four different entity types: Anatomy (ANAT-DP), Observation: Definitely Present (OBS-DP), Observation: Uncertain (OBS-U), and Observation: Definitely Absent (OBS-DA). Anatomy refers to an anatomical body part that occurs in a radiology report, such as a “lung”. Observation refers to words associated with visual features, identifiable pathophysiologic processes, or diagnostic disease classifications. As an example, an Observation could be “effusion” or a more general phrase like “increased”.
Relations. A relation is defined as a directed edge between two entities. Three types exist: Suggestive Of (., .), Located At (., .), and Modify (., .). Suggestive Of (Observation, Observation) is a relation between two Observation entities indicating that the presence of the second Observation is inferred from that of the first Observation. Located At (Observation, Anatomy) is a relation between an Observation entity and an Anatomy entity indicating that the Observation is related to the Anatomy. While Located At often refers to location, it can also be used to describe other relations between an Observation and an Anatomy, such as shape or color. Modify (Observation, Observation) or Modify (Anatomy, Anatomy) is a relation between two Observation entities or two Anatomy entities indicating that the first entity modifies the scope of, or quantifies the degree of, the second entity.
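To make the schema concrete, here is a hypothetical RadGraph-style annotation for the sentence "There is increased opacity in the left lung", written as a plain Python dictionary. The field names (`tokens`, `label`, `relations`) are illustrative assumptions; only the entity and relation types follow the schema described above.

```python
# Hypothetical annotation for: "There is increased opacity in the left lung."
# Each relation is a (relation_type, target_entity_id) pair.
annotation = {
    "entities": {
        "1": {"tokens": "increased", "label": "OBS-DP",  "relations": [["modify", "2"]]},
        "2": {"tokens": "opacity",   "label": "OBS-DP",  "relations": [["located_at", "4"]]},
        "3": {"tokens": "left",      "label": "ANAT-DP", "relations": [["modify", "4"]]},
        "4": {"tokens": "lung",      "label": "ANAT-DP", "relations": []},
    }
}
```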
The authors also released a PubMedBERT (Gu et al., 2021) model fine-tuned on the RadGraph dataset. We leverage this trained model to create the annotations for the datasets used in our experiments. We will refer to this model as the RadGraph model in what follows.
2.2 RadGraph reward
Using the RadGraph annotation scheme and model, we design F-score style rewards that measure the consistency and completeness of generated radiology reports compared to reference reports. Each of our rewards leverages the outputs of the released PubMedBERT model fine-tuned on RadGraph, namely the entities and the relations, on both a generated report and its reference.

Figure 3: Graph view of the RadGraph annotations for the report in Figure 2.
The RadGraph annotations of a report can be represented as a graph $G(V, E)$ with the set of nodes $V = \{v_1, v_2, \ldots, v_{|V|}\}$ containing the entities and the set of edges $E = \{e_1, e_2, \ldots, e_{|E|}\}$ containing the relations between pairs of entities. The graph is directed, meaning that the edge $e = (v_1, v_2) \neq (v_2, v_1)$. An example is depicted in Figure 3. Each node or edge of the graph also has a label, which we denote as $v_i^L$ for an entity $i$ (for example "OBS-DP" or "ANAT") and $e_{ij}^L$ for a relation $e = (v_i, v_j)$ (such as "modify" or "located at"). We now proceed to describe three of our rewards.
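Assuming the hypothetical annotation format sketched in § 2.1, recovering the graph view $G(V, E)$ amounts to collecting labeled nodes and directed, labeled edges:

```python
def to_graph(annotation):
    """Build (nodes, edges) from a RadGraph-style annotation dict (sketch).

    Nodes are (entity_id, label) pairs, i.e. v_i with label v_i^L; edges are
    directed (source_id, relation_label, target_id) triplets, i.e. e = (v_i, v_j)
    with label e_ij^L.
    """
    nodes, edges = [], []
    for ent_id, ent in annotation["entities"].items():
        nodes.append((ent_id, ent["label"]))
        for relation_label, target_id in ent["relations"]:
            edges.append((ent_id, relation_label, target_id))
    return nodes, edges
```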
RG_E. This reward focuses only on the nodes $V$. For the generated report $y$, we create a new set of node-label pairs $\bar{V}_y = \{(v_i, v_i^L)\}_{i \in [1..|V|]}$ comprising all entities and their corresponding labels. We proceed to construct the same set for the reference report $\hat{y}$ and denote this set $\bar{V}_{\hat{y}}$.
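As a minimal sketch of how such an entity-level reward could be scored, the snippet below builds the node-label set from the hypothetical annotation format used earlier and combines set precision and recall with a harmonic mean, matching the "F-score style" description above. The exact matching and normalization details of the paper's reward are not reproduced here, and comparing entities by their token text is an assumption of this sketch.

```python
def entity_pairs(annotation):
    """Set of (entity tokens, label) pairs, approximating the set \bar{V}_y above."""
    return {
        (ent["tokens"].lower(), ent["label"])
        for ent in annotation["entities"].values()
    }

def harmonic_mean_reward(pred_set, ref_set):
    """F-score style reward: harmonic mean of set precision and recall."""
    if not pred_set or not ref_set:
        return 0.0
    overlap = len(pred_set & ref_set)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_set)
    recall = overlap / len(ref_set)
    return 2 * precision * recall / (precision + recall)

# RG_E-style score between a generated and a reference annotation:
# rg_e = harmonic_mean_reward(entity_pairs(gen_ann), entity_pairs(ref_ann))
```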
RG_ER. This reward focuses on the nodes $V$ and whether or not a node has a relation in $E$. For the generated report $y$, we create a new set of triplets $\bar{V}_y = \{(v_i, v_i^L, \mathbb{1}_i)\}_{i \in [1..|V|]}$, where $\mathbb{1}_i$ is $1$ if $v_i$ has a relation in $E$ and $0$ otherwise. We proceed to construct the same set for the reference report $\hat{y}$ and denote this set $\bar{V}_{\hat{y}}$.
RG_ER. This reward focuses on the nodes $V$ and their relations in $E$. For the gener-