Improving the Factual Correctness of Radiology Report Generation with Semantic Rewards
Jean-Benoit Delbrouck, Pierre Chambon, Christian Bluethgen,
Emily Tsai, Omar Almusa, Curtis P. Langlotz
Stanford University
jbdel@stanford.edu
Abstract

Neural image-to-text radiology report generation systems offer the potential to improve radiology reporting by reducing the repetitive process of report drafting and identifying possible medical errors. These systems have achieved promising performance as measured by widely used NLG metrics such as BLEU and CIDEr. However, current systems face important limitations. First, they introduce increasingly complex architectures that offer only marginal improvements on NLG metrics. Second, systems that achieve high performance on these metrics are not always factually complete or consistent, due to both inadequate training and inadequate evaluation. Recent studies have shown that such systems can be substantially improved by new methods that encourage 1) the generation of domain entities consistent with the reference and 2) the description of these entities in inferentially consistent ways. So far, these methods rely on weakly-supervised (rule-based) approaches and named entity recognition systems that are not specific to the chest X-ray domain. To overcome this limitation, we propose a new method, the RadGraph reward, to further improve the factual completeness and correctness of generated radiology reports. More precisely, we leverage the RadGraph dataset, which contains chest X-ray reports annotated with entities and relations between entities. On two open radiology report datasets, our system substantially improves the scores, by up to 14.2% and 25.3%, on metrics evaluating the factual correctness and completeness of reports.
1 Introduction
An important medical application of natural language generation (NLG) is to build assistive systems that take X-ray images of a patient and generate a textual report describing clinical observations in the images (Jing et al., 2018; Li et al., 2018; Chen et al., 2020; Miura et al., 2021). This is a clinically important task, offering the potential to reduce radiologists' repetitive work and generally improve clinical communication (Kahn Jr et al., 2009).

Figure 1: Overview of our radiology report generation pipeline. First, a neural network generates a radiology report given a chest X-ray image. We then leverage RadGraph to create semantic annotations of the output used to design reinforcement learning rewards.
Recently, a lot of attention has been given to new architectures (Chen et al., 2020, 2021; Alfarghaly et al., 2021) and to how the structure of the data can be input into the system (Liu et al., 2021a). These systems have achieved promising performance as measured by widely used NLG metrics such as BLEU (Papineni et al., 2002) and CIDEr (Vedantam et al., 2015). However, these studies face important limitations. First, they introduce increasingly complex architectures that offer only marginal improvements on NLG metrics. Second, systems that achieve high performance on NLG metrics are not always factually complete or consistent, due to both inadequate training and inadequate evaluation. Miura et al. (2021) have shown that existing systems are inadequate in factual completeness and consistency, and
that an image-to-text radiology report generation (RRG) system can be substantially improved by replacing the widely used NLG metrics with "factually-oriented" methods encouraging 1) the generation of domain entities consistent with the reference and 2) the description of these entities in inferentially consistent ways. So far, these new methods rely on weakly-supervised (rule-based) approaches to construct NLI models for radiology reports, and on biomedical named entity recognition systems (Zhang et al., 2021) that are not specific to chest X-rays.
Despite these "factually-oriented" methods being weakly supervised or limited to generic biomedical entities, their use showed substantial improvements on a wide range of metrics and in board-certified radiologists' evaluations. These findings motivate us to propose a new method to further improve the factual completeness and correctness of generated radiology reports. More precisely, we leverage RadGraph (Jain et al., 2021), a dataset annotated by radiologists containing chest X-ray radiology reports along with annotated entities and relations. These annotations allow us to create two semantic graphs, one for the generated report and one for the reference report. We then introduce three simple rewards that score the differences between the two graphs in terms of entities and relations. These rewards can be directly optimized using Reinforcement Learning (RL) to further improve the quality of the reports generated by our systems. By doing so, we show on two popular chest X-ray datasets that our models not only maximize the defined rewards but also outperform previous works on various NLG and factually-oriented metrics.
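To make the training signal concrete, the sketch below shows how a non-differentiable semantic reward can be plugged into a REINFORCE-style update with a greedy-decoding baseline, in the spirit of self-critical sequence training. This is a minimal illustration rather than the paper's implementation: `model.sample`, `model.greedy_decode`, and `semantic_reward` are assumed interfaces.

```python
import torch

def rl_step(model, image, reference, semantic_reward, optimizer):
    """One REINFORCE-style update with a greedy baseline (illustrative sketch).

    `semantic_reward(hypothesis, reference)` returns a scalar score, e.g. a
    RadGraph-based F-score; it does not need to be differentiable.
    """
    # Sample a report; `sampled_logprob` is the summed log-probability of the
    # sampled tokens and must carry gradients (assumed model API).
    sampled_report, sampled_logprob = model.sample(image)
    # Greedy decoding serves as a baseline to reduce gradient variance.
    with torch.no_grad():
        greedy_report = model.greedy_decode(image)

    reward = semantic_reward(sampled_report, reference)
    baseline = semantic_reward(greedy_report, reference)

    # Self-critical policy gradient: reinforce samples that beat the baseline.
    loss = -(reward - baseline) * sampled_logprob

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward
```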
In summary, our contributions are:

• We propose a simple RRG architecture that 1) is fast to train and suitable for an RL setup and 2) performs on par with the previous, more complex architectures proposed in the literature.

• We leverage the RadGraph dataset and the associated fine-tuned model to design semantic-based rewards that qualitatively evaluate the factual correctness and completeness of the generated reports.

• We show on two datasets that directly optimizing these rewards outperforms previous approaches that prioritize traditional NLG metrics.
The paper is structured as follows: first, we describe our factually-oriented graph-based rewards (§ 2). More precisely, we begin by examining the RadGraph dataset (§ 2.1) and how we leverage its annotations to create our rewards (§ 2.2). Then, we explain the architecture of the model (§ 3) that we use to generate reports and how we train it using negative log-likelihood (NLL) and RL. The following sections are dedicated to the datasets used for the experiments (§ 4) and the metrics (§ 5) chosen to evaluate the generated reports. The latter section is divided into two groups: the classic NLG metrics (§ 5.1) and the factually-oriented metrics (§ 5.2). Finally, we present the results (§ 6) and end the paper with a section addressing related work (§ 7).
2 Factually-oriented Graph-Based Reward
In this section, we present a new semantic graph-based reward, called the RadGraph reward, used throughout our experiments. We first explain in Section 2.1 the RadGraph dataset and how we obtain the annotations that shape our reward. In Section 2.2, we explain how we construct the RadGraph reward and its different variants.
2.1 RadGraph
Figure 2: An example of a report annotated with entities and relations in the RadGraph dataset.
RadGraph (Jain et al., 2021) is a dataset of entities and relations in full-text chest X-ray radiology reports, based on a novel information extraction schema designed to structure radiology reports. The dataset contains board-certified radiologist annotations of 500 radiology reports from the MIMIC-CXR dataset (Johnson et al., 2019), corresponding in total to 14,579 entities and 10,889 relations. RadGraph also includes a test dataset of 100 radiology reports, split between two independent sets of board-certified radiologist annotations on reports from the MIMIC-CXR and CheXpert (Smit et al., 2020) datasets (50 reports each).
Entities. An entity is defined as a continuous span of text that can include one or more adjacent words. Entities in RadGraph center around two concepts: Anatomy and Observation. Three uncertainty levels exist for Observation, leading to four different entity types: Anatomy (ANAT-DP), Observation: Definitely Present (OBS-DP), Observation: Uncertain (OBS-U), and Observation: Definitely Absent (OBS-DA). Anatomy refers to an anatomical body part that occurs in a radiology report, such as a “lung”. Observation refers to words associated with visual features, identifiable pathophysiologic processes, or diagnostic disease classifications. As an example, an Observation could be “effusion” or a more general phrase like “increased”.
Relations. A relation is defined as a directed edge between two entities. Three types exist: Suggestive Of (., .), Located At (., .), and Modify (., .). Suggestive Of (Observation, Observation) is a relation between two Observation entities indicating that the presence of the second Observation is inferred from that of the first Observation. Located At (Observation, Anatomy) is a relation between an Observation entity and an Anatomy entity indicating that the Observation is related to the Anatomy. While Located At often refers to location, it can also be used to describe other relations between an Observation and an Anatomy, such as shape or color. Modify (Observation, Observation) or Modify (Anatomy, Anatomy) is a relation between two Observation entities or two Anatomy entities indicating that the first entity modifies the scope of, or quantifies the degree of, the second entity.
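To make the schema concrete, here is a hypothetical RadGraph-style annotation for the sentence "There is increased opacity in the left lung", written as a plain Python dictionary. The field names (`tokens`, `label`, `relations`) are illustrative assumptions; only the entity and relation types follow the schema described above.

```python
# Hypothetical annotation for: "There is increased opacity in the left lung."
# Each relation is a (relation_type, target_entity_id) pair.
annotation = {
    "entities": {
        "1": {"tokens": "increased", "label": "OBS-DP",  "relations": [["modify", "2"]]},
        "2": {"tokens": "opacity",   "label": "OBS-DP",  "relations": [["located_at", "4"]]},
        "3": {"tokens": "left",      "label": "ANAT-DP", "relations": [["modify", "4"]]},
        "4": {"tokens": "lung",      "label": "ANAT-DP", "relations": []},
    }
}
```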
The authors also released a PubMedBERT (Gu et al., 2021) model fine-tuned on the RadGraph dataset. We leverage this trained model to create the annotations for the datasets used in our experiments. We will refer to this model as the RadGraph model in what follows.
2.2 RadGraph reward
Using the RadGraph annotation scheme and model, we design F-score style rewards that measure the consistency and completeness of generated radiology reports compared to reference reports. Each of our rewards leverages the outputs of the released PubMedBERT model fine-tuned on RadGraph, namely the entities and the relations, on both a generated report and its reference.

Figure 3: Graph view of the RadGraph annotations for the report in Figure 2.
The RadGraph annotations of a report can be represented as a graph $G(V, E)$ with the set of nodes $V = \{v_1, v_2, \ldots, v_{|V|}\}$ containing the entities and the set of edges $E = \{e_1, e_2, \ldots, e_{|E|}\}$ containing the relations between pairs of entities. The graph is directed, meaning that the edge $e = (v_1, v_2) \neq (v_2, v_1)$. An example is depicted in Figure 3. Each node or edge of the graph also has a label, which we denote as $v_i^L$ for an entity $i$ (for example "OBS-DP" or "ANAT") and $e_{ij}^L$ for a relation $e = (v_i, v_j)$ (such as "modify" or "located at"). We now proceed to describe three of our rewards.
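Assuming the hypothetical annotation format sketched in § 2.1, recovering the graph view $G(V, E)$ amounts to collecting labeled nodes and directed, labeled edges:

```python
def to_graph(annotation):
    """Build (nodes, edges) from a RadGraph-style annotation dict (sketch).

    Nodes are (entity_id, label) pairs, i.e. v_i with label v_i^L; edges are
    directed (source_id, relation_label, target_id) triplets, i.e. e = (v_i, v_j)
    with label e_ij^L.
    """
    nodes, edges = [], []
    for ent_id, ent in annotation["entities"].items():
        nodes.append((ent_id, ent["label"]))
        for relation_label, target_id in ent["relations"]:
            edges.append((ent_id, relation_label, target_id))
    return nodes, edges
```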
RG_E. This reward focuses only on the nodes $V$. For the generated report $y$, we create a new set of node-label pairs $\bar{V}_y = \{(v_i, v_i^L)\}_{i \in [1..|V|]}$ comprising all entities and their corresponding labels. We proceed to construct the same set for the reference report $\hat{y}$ and denote this set $\bar{V}_{\hat{y}}$.
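As a minimal sketch of how such an entity-level reward could be scored, the snippet below builds the node-label set from the hypothetical annotation format used earlier and combines set precision and recall with a harmonic mean, matching the "F-score style" description above. The exact matching and normalization details of the paper's reward are not reproduced here, and comparing entities by their token text is an assumption of this sketch.

```python
def entity_pairs(annotation):
    """Set of (entity tokens, label) pairs, approximating the set \bar{V}_y above."""
    return {
        (ent["tokens"].lower(), ent["label"])
        for ent in annotation["entities"].values()
    }

def harmonic_mean_reward(pred_set, ref_set):
    """F-score style reward: harmonic mean of set precision and recall."""
    if not pred_set or not ref_set:
        return 0.0
    overlap = len(pred_set & ref_set)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_set)
    recall = overlap / len(ref_set)
    return 2 * precision * recall / (precision + recall)

# RG_E-style score between a generated and a reference annotation:
# rg_e = harmonic_mean_reward(entity_pairs(gen_ann), entity_pairs(ref_ann))
```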
RG_ER. This reward focuses on the nodes $V$ and whether or not a node has a relation in $E$. For the generated report $y$, we create a new set of triplets $\bar{V}_y = \{(v_i, v_i^L, \mathbb{1}_i)\}_{i \in [1..|V|]}$, where $\mathbb{1}_i$ is $1$ if $v_i$ has a relation in $E$ and $0$ otherwise. We proceed to construct the same set for the reference report $\hat{y}$ and denote this set $\bar{V}_{\hat{y}}$.
RG_ER. This reward focuses on the nodes $V$ and their relations in $E$. For the gener-