
approach, which embodies linguistic insights to convert data to text through a sequence of explicit steps.
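As a rough illustration of this idea, such a pipeline might look like the sketch below; the stage names follow the standard NLG literature, while the data fields and rules are invented for this example and are not taken from our system.

    # Minimal sketch of a classical data-to-text pipeline: each stage is an
    # explicit, inspectable step. All field names and rules are hypothetical.
    def content_selection(record: dict) -> dict:
        """Pick which facts from the raw structured input are worth reporting."""
        return {k: v for k, v in record.items() if k in ("location", "wave_height")}

    def lexicalization(facts: dict) -> dict:
        """Map data values to words, e.g. a numeric wave height to an adjective."""
        facts["wave_desc"] = "calm" if facts["wave_height"] < 1.0 else "rough"
        return facts

    def realization(facts: dict) -> str:
        """Assemble the final sentence with a surface template."""
        return f"The sea near {facts['location']} is {facts['wave_desc']} today."

    record = {"location": "Santos", "wave_height": 0.6, "sensor_id": 42}
    print(realization(lexicalization(content_selection(record))))
    # -> The sea near Santos is calm today.

Each stage can be inspected and modified independently, which is the property exploited by the modular approaches discussed below.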
The emergence of neural-based NLG systems in recent years has changed the field: provided there is enough labeled data to train a machine learning model, learning a direct mapping from structured input to textual output has become a reality [Li 2017]. This has led to the development of end-to-end deep learning models, which learn input-output mappings directly and rely far less on explicit intermediate representations and linguistic insights.
Even though it is technically feasible to use neural end-to-end methods in real-world applications, this does not necessarily mean that they are superior to rule-based approaches in every scenario. Recent empirical studies have demonstrated that template and pipeline systems produce texts that are more appropriate than those of neural-based approaches, which frequently hallucinate content unsupported by the semantic input [Ferreira et al. 2019]. For the particular task of automated journalism, reporting inaccurate data would seriously undermine a robot's credibility and could have serious implications in sensitive domains, such as environmental reporting. A modular model also has the advantage of allowing auditing, whereas neural end-to-end approaches behave as black boxes [Campos et al. 2020].
In this paper, we compare the three most frequently used architectures for auto-
mated journalism – template-based, pipeline-based and end-to-end neural models – us-
ing a common domain, the Blue Amazon. With an offshore area of 3.6 million square kilometers along the Brazilian coast, the Blue Amazon is Brazil's exclusive economic zone (EEZ), an oceanic region brimming with marine species and energy resources [Thompson and Muggah 2015]. Ocean monitoring, climate change, and environmental sustainability are promising fields for automated journalism applications. The oceans are severely damaged environments, and if current trends continue, the consequences for the planet will be disastrous; preserving them is essential to halting climate change, fostering economic growth, and protecting biodiversity [e Costa et al. 2022]. Although connecting with public audiences in an approachable way typically requires coverage by trained human journalists, accurate, low-latency information reports can be very helpful. There is a vast
and ever-growing body of information about the oceans; clearly, society can benefit from
a robot journalism system. To address this need, we created a robot journalism application that combines different NLG approaches to generate daily reports about the Blue Amazon and publish them on Twitter (https://twitter.com/BLAB_Reporter).
A corpus of verbalizations of non-linguistic data in Brazilian Portuguese was created, based on syntactic and lexical patterns abstracted from data collected from publicly available sources. To develop the corpus, intermediate representations were annotated for each entry. A combination of automatic and human evaluation, together with a qualitative analysis, was then carried out to measure the fluency, semantic adequacy, and lexical variety of the generated texts.
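As a rough illustration of the automatic side of this evaluation, the sketch below computes a type-token ratio, one simple proxy for lexical variety; the metric and tokenization here are assumptions chosen for illustration, not necessarily the exact tooling used in our experiments.

    # Hedged sketch: type-token ratio (TTR) as a crude lexical-variety proxy.
    # The tokenization (whitespace split, lowercasing, punctuation stripping)
    # is a simplifying assumption for illustration.
    def type_token_ratio(texts: list[str]) -> float:
        tokens = [tok.lower().strip(".,;:!?") for text in texts for tok in text.split()]
        return len(set(tokens)) / len(tokens) if tokens else 0.0

    generated = [
        "O mar em Santos está calmo hoje.",
        "O mar em Santos está agitado hoje.",
    ]
    print(f"TTR: {type_token_ratio(generated):.2f}")

A higher ratio means more distinct word forms per token, which is why systems that reuse the same surface templates tend to score lower on such measures.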
The main contributions of this work are the construction of a publicly available Brazilian Portuguese NLG dataset, a comparison between the three most frequently used automated journalism architectures, and an application that combines different approaches.