
BLAB Reporter: Automated journalism covering the Blue Amazon
Yan Vianna Sym
Escola Politécnica
Universidade de São Paulo
São Paulo, Brazil
yan.sym@usp.br
João Gabriel Moura Campos
Escola Politécnica
Universidade de São Paulo
São Paulo, Brazil
joaogcampos@usp.br
Fabio Gagliardi Cozman
Escola Politécnica
Universidade de São Paulo
São Paulo, Brazil
fgcozman@usp.br
Abstract
This demo paper introduces the BLAB Reporter,
a robot-journalist covering the Brazilian
Blue Amazon. The Reporter is based on a
pipeline architecture for Natural Language
Generation; it offers daily reports, news
summaries and curious facts in Brazilian
Portuguese. By collecting, storing and
analysing structured data from publicly
available sources, the robot-journalist uses
domain knowledge to generate texts and publish
them on Twitter. Code and corpus are publicly
available 1.
1 Introduction
Data-to-text Natural Language Generation (NLG)
is the computational process of generating
meaningful and coherent natural language text or
speech to describe non-linguistic input data
(Reiter and Dale, 2000). Successful examples of
data-to-text systems can be found in both
academia and industry, with applications in
weather forecasting (Belz, 2008), image
captioning and chatbots (Adamopoulou and
Moussiades, 2020). Amongst NLG applications,
robot-journalism is one of the most prominent
endeavors thanks to the high volume of structured
data streams available, which enables automated
systems to report recurrent information with high
fidelity and lexical variety (Teixeira et al., 2020).
An interesting domain for data-to-text
generation is ocean monitoring. For instance,
global attention was drawn in 2021 to a container
ship that obstructed the Suez Canal for six
consecutive days, causing a global shortage of
essential commodities, including medical supplies
and medicines during the coronavirus pandemic.
Accurate, low-latency information reports can be
very helpful in these situations, but communicating
them to general audiences usually demands coverage
by specialized human journalists. To address this
issue, we present our robot-journalist named BLAB
Reporter, an NLG system based on a pipeline
architecture that generates daily reports, news,
content summarization and curious facts about the
Blue Amazon and publishes them on Twitter in
Brazilian Portuguese 2.
1 https://github.com/C4AI/blab-reporter
The Blue Amazon is the exclusive economic zone
(EEZ) of Brazil, with an offshore area of 3.6
million square kilometers along the Brazilian
coast, an area rich in marine biodiversity and
energy resources (Wiesebron, 2013). The BLue
Amazon Brain (BLAB) is a project that aims to
address complex questions about the marine
ecosystem and integrates a number of services
aimed at disseminating information about the Blue
Amazon.
2 System overview
Our system follows a pipeline architecture that
converts non-linguistic data into text in six
steps: Content Selection, Discourse Ordering, Text
Structuring, Lexicalization, Referring Expression
Generation and Textual Realization (Ferreira et
al., 2019). Our system also comprises two
additional steps: Data Acquisition (for extracting
and storing information from multiple data streams
in a structured format) and Summarization (for
summarizing news in the form of small consecutive
tweets). This kind of architecture, depicted in
Figure 1, allows for trustworthy output as well as
easy access to and maintenance of sub-modules.
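The six core steps listed above can be pictured as a chain of small functions, each consuming the output of the previous one. The following is only a minimal sketch under stated assumptions, not the BLAB Reporter implementation: the `Message` class, the intent names, the templates and the toy rules inside each stage are all hypothetical, the templates are in English for readability (the actual system writes Brazilian Portuguese), and the Data Acquisition and Summarization steps are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Message:
    """Hypothetical non-linguistic unit passed between stages."""
    intent: str
    slots: Dict[str, str]


# Hypothetical templates; {VALUE} is filled at lexicalization,
# {LOCATION} is left for referring expression generation.
TEMPLATES = {
    "TEMPERATURE": "the water temperature in {LOCATION} is {VALUE} degrees Celsius",
    "WAVE_HEIGHT": "waves in {LOCATION} reach {VALUE} meters",
}


def content_selection(records: List[dict]) -> List[Message]:
    # Toy rule: drop records explicitly marked as not relevant.
    return [Message(r["intent"], r["slots"]) for r in records
            if r.get("relevant", True)]


def discourse_ordering(messages: List[Message]) -> List[Message]:
    # Toy rule: fixed priority per intent; unknown intents go last.
    priority = {"TEMPERATURE": 0, "WAVE_HEIGHT": 1}
    return sorted(messages, key=lambda m: priority.get(m.intent, len(priority)))


def text_structuring(messages: List[Message]) -> List[List[Message]]:
    # Toy rule: one message per sentence.
    return [[m] for m in messages]


def lexicalization(sentences: List[List[Message]]) -> List[Tuple[str, str]]:
    # Pick a template per message and fill the value slot only.
    return [(TEMPLATES[m.intent].replace("{VALUE}", m.slots["value"]),
             m.slots["location"])
            for sentence in sentences for m in sentence]


def referring_expressions(clauses: List[Tuple[str, str]]) -> List[str]:
    # Full name on first mention, a shorter reference afterwards.
    seen, out = set(), []
    for text, entity in clauses:
        reference = "the same area" if entity in seen else entity
        seen.add(entity)
        out.append(text.replace("{LOCATION}", reference))
    return out


def textual_realization(clauses: List[str]) -> str:
    # Capitalize each clause and add final punctuation.
    return " ".join(c[0].upper() + c[1:] + "." for c in clauses)


PIPELINE = [content_selection, discourse_ordering, text_structuring,
            lexicalization, referring_expressions, textual_realization]


def generate(records: List[dict]) -> str:
    data = records
    for stage in PIPELINE:
        data = stage(data)
    return data
```

Because every stage is a plain function with an inspectable input and output, a single stage can be tested or replaced without touching the others, which is the maintenance property the pipeline design is meant to provide.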
The grammar used by the model was built by first
running the content selection step on previous
data and generating 30 non-linguistic reports.
These non-linguistic reports were then manually
verbalized and the input and output representations
for each pipeline module were manually annotated.
When deployed, each module draws on the selected
combination of templates using rule-based
approaches. Because we deal with a sensitive
domain, we opted to use the pipeline architecture
2 https://twitter.com/BLAB_Reporter
arXiv:2210.06431v1 [cs.CL] 8 Oct 2022