
CHARD: Clinical Health-Aware Reasoning Across Dimensions for Text Generation Models
Steven Y. Feng1∗, Vivek Khetan2, Bogdan Sacaleanu2, Anatole Gershman3, Eduard Hovy3
1Stanford University, 2Accenture Labs, SF, 3Carnegie Mellon University
syfeng@stanford.edu
{vivek.a.khetan,bogdan.e.sacaleanu}@accenture.com
{anatoleg,hovy}@cs.cmu.edu
Abstract
We motivate and introduce CHARD: Clinical Health-Aware Reasoning across Dimensions, to investigate the capability of text generation models to act as implicit clinical knowledge bases and generate free-flow textual explanations about various health-related conditions across several dimensions. We collect and present an associated dataset, CHARDat, consisting of explanations about 52 health conditions across three clinical dimensions. We conduct extensive experiments using BART and T5 along with data augmentation, and perform automatic, human, and qualitative analyses. We show that while our models can perform decently, CHARD is very challenging with strong potential for further exploration.
1 Introduction
Pretrained language models (PLMs) have seen increasing popularity for NLP tasks and applications, including text generation. Researchers have become interested in the extent to which PLMs can 1) act as knowledge bases, and 2) reason like humans.
Rather than relying on external databases, exposure to large amounts of data during training, combined with their large number of parameters, has given PLMs the ability to store knowledge that can be extracted through effective probing strategies such as text infilling (Donahue et al., 2020), prompting (Liu et al., 2021), and QA (Jiang et al., 2021). PLMs imitate a more high-level information store, allowing for greater abstractness, flexibility, and generalizability. They are also able to better exploit contextual information than simple retrieval.
Studies have also shown that as PLMs scale up, they exhibit emergent abilities (Wei et al., 2022a), including reasoning. There has been increasing attention on their commonsense reasoning through works like COMET (Bosselut et al., 2019).
∗Work done while at CMU.

Table 1: Examples of CHARD templates with explanations (from CHARDat). The human was asked to write the entire output text (not just the explanation) by infilling the template.

Template: A person with Costochondritis has a/an exercise risk factor because/since/as {explanation}
Full text: A person with Costochondritis has an exercise risk factor because costochondritis can be aggravated by any activity that places stress on your chest area.

Template: A person with gout has a/an lose weight prevention because/since/as {explanation}
Full text: A person with gout has a lose weight prevention because losing weight can lower uric acid levels in your body and significantly reduce the chance of gout attacks.

Template: A person with rheumatoid has a/an therapy treatment because/since/as {explanation}
Full text: A person with rheumatoid has a therapy treatment because physiotherapy helps rheumatoid patients with pain control, reducing inflammation and joint stiffness and to return to the normal activities of daily living or sports.

However, studies show that even large PLMs struggle with commonsense tasks that humans can reason through very easily (Talmor et al., 2020). There
are works that investigate more complicated reasoning tasks, e.g., arithmetic and symbolic reasoning (Wei et al., 2022b). PLMs inherently have some extent of reasoning capability, and many more complex reasoning tasks are easier to carry out over the abstract PLM embedding space.
In this paper, we are interested in the intersection of these areas. Can PLMs act as knowledge bases and also reliably reason using their own knowledge? We investigate whether PLMs can learn and reason through health-related knowledge. Work on generation-based reasoning for health has been limited, with most prior work exploring retrieval-based methods. Generation-based reasoning is more difficult, as such a specialized domain contains esoteric information not prevalent in the PLM's training data, and involves a higher degree of specialized reasoning to handle domain-specific problems.
Healthcare is an important domain that deals with human lives, and a large application area for machine learning and NLP. The need for automation in healthcare is rising, as countless studies show that healthcare workers are overworked and burned out, especially recently due to the COVID-19 pandemic (Portoghese et al., 2014; Brophy et al.,
Code: https://github.com/styfeng/CHARD
arXiv:2210.04191v2 [cs.CL] 13 Feb 2023