developing remotely accessible microservices.
On top of Margo, HEPnOS uses the Yokan microser-
vice [20] to provide key/value storage capabilities and the
Bedrock microservice [21] to provide bootstrapping and con-
figuration capabilities.
HEPnOS stores HEP data in the form of a hierarchy of datasets, runs, subruns, events, and products, with the products carrying most of the payload in the form of serialized C++ objects. These constructs are mapped onto a flat key/value namespace in a distributed set of Yokan database instances.
For more technical details on how this mapping is done, we refer the reader to HEPnOS’ extensive online documentation (https://hepnos.readthedocs.io).
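For illustration, the sketch below shows one way a hierarchical event address could be flattened into a single key for a key/value store. The key layout is purely hypothetical and does not reflect the actual HEPnOS encoding, which is described in the documentation cited above.

```cpp
// Purely illustrative: one possible way to flatten a hierarchical event
// address (dataset/run/subrun/event) plus a product label into a single
// key. This is NOT the actual HEPnOS encoding.
#include <cstdint>
#include <iostream>
#include <string>

struct EventAddress {
    std::string dataset;  // e.g., a dataset name such as "nova/prod"
    uint64_t    run;
    uint64_t    subrun;
    uint64_t    event;
};

// Build a flat string key; the associated value would be the serialized
// C++ product object.
std::string make_key(const EventAddress& a, const std::string& product_label) {
    return a.dataset + '/' + std::to_string(a.run)    + '/'
                     + std::to_string(a.subrun) + '/'
                     + std::to_string(a.event)  + '#' + product_label;
}

int main() {
    EventAddress a{"nova/prod", 123, 4, 5678};
    std::cout << make_key(a, "example_product") << "\n"; // nova/prod/123/4/5678#example_product
}
```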
Of interest to the present work is the fact that some configuration parameters of HEPnOS are critical to its performance, ranging from the number of database instances, the mapping of these databases to threads, and the way operations are scheduled, down to low-level decisions such as whether to use blocking epoll or busy spinning for network progress.
Thanks to the Mochi Bedrock component, which provides
configuration and bootstrapping capabilities to Mochi services,
all these parameters can easily be provided from a single JSON
file that describes which components form the service and
how they should be configured. This extensive configurability
is critical to the work presented in this paper, and is what
distinguishes a storage service such as HEPnOS from more
traditional storage systems such as a parallel file system.
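As an illustration of what such a file expresses, the fragment below sketches a Bedrock-style configuration declaring an Argobots thread pool and a Yokan provider with one database. The field names are indicative only and should not be taken as the exact Bedrock schema, which is documented by the Mochi project.

```json
{
  "margo": {
    "progress_timeout_ub_msec": 100,
    "argobots": {
      "pools":    [ { "name": "__primary__", "kind": "fifo_wait" } ],
      "xstreams": [ { "name": "__primary__",
                      "scheduler": { "type": "basic_wait", "pools": [ "__primary__" ] } } ]
    }
  },
  "providers": [
    { "name": "hepnos_db", "type": "yokan", "provider_id": 1,
      "config": { "databases": [ { "type": "map" } ] } }
  ]
}
```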
B. NOvA event-selection workflow
As shown in Figure 1, HEPnOS is used as a distributed, in-
memory storage system for HEP workflows. In this work we
focus on the NOvA event-selection workflow, which consists
of two steps: data loading and parallel event processing. A
set of HDF5 files containing tables of event data is read by
a parallel application, the data loader, which converts them
into arrays of C++ objects that are then stored in HEPnOS as
products associated with events. In the event-selection step, all
the events contained in a given dataset are read and processed
in parallel to search for events matching specific criteria.
1) Data loading
In practice, the event-selection workflow does not operate directly on data as it is produced by the particle accelerator. The raw data is first stored in files, either in HDF5 [22] or in ROOT [23] format. While these files can easily be shared across institutions, they become an I/O bottleneck when read by a large number of processes. Hence, in HEPnOS-based workflows, these files need to be loaded into HEPnOS before their data can be processed.
The dataloader is in charge of this task. It is a parallel, MPI-
based application that takes a list of HDF5 files, converts them
into C++ objects, and stores them into HEPnOS. Since the
amount of data differs across HDF5 files, the dataloader does
not distribute the work in a static manner across its processes.
Instead, a list of files is maintained in one process, and all the
processes pull work from this shared list of files until all the
files have been loaded.
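This dynamic distribution is a classical master/worker pattern. The following simplified MPI sketch is hypothetical (the actual dataloader is more elaborate and, for instance, can let the list-owning process participate in the work): ranks pull file indices from a list held by rank 0 until the list is exhausted.

```cpp
// Hypothetical sketch of the dynamic file distribution (not the actual
// dataloader code): rank 0 owns the shared list of HDF5 files and serves
// file indices; the other ranks repeatedly request the next file until
// none remain.
#include <mpi.h>
#include <string>
#include <vector>

void load_one_file(const std::string& /*path*/) {
    // Read the HDF5 tables, convert them into C++ objects,
    // and store them into HEPnOS (omitted in this sketch).
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Example file list, replicated on every rank so indices can be resolved.
    std::vector<std::string> files = {"f0.h5", "f1.h5", "f2.h5"};

    if (rank == 0) {                 // rank 0 maintains the shared list
        int next = 0, finished = 0;
        while (finished < size - 1) {
            int request;
            MPI_Status st;
            MPI_Recv(&request, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &st);
            int idx = (next < static_cast<int>(files.size())) ? next++ : -1;
            if (idx == -1) finished++;
            MPI_Send(&idx, 1, MPI_INT, st.MPI_SOURCE, 1, MPI_COMM_WORLD);
        }
    } else {                         // workers pull file indices until -1
        for (;;) {
            int request = 0, idx;
            MPI_Send(&request, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
            MPI_Recv(&idx, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            if (idx < 0) break;
            load_one_file(files[idx]);
        }
    }
    MPI_Finalize();
    return 0;
}
```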
Several optimizations are available in the dataloader, including batching of events and products (the mapping of events and products to HDF5 files on the one hand, and to HEPnOS databases on the other, is such that all the events coming from the same file end up in the same database, and similarly for products) and overlapping the loading of a file with the storage into HEPnOS of the data from the previous file. These optimizations can be turned on or off and configured in various ways. Along with job-related parameters (number of processes, number of threads, mapping to CPUs), the dataloader offers many configuration parameters that can be tuned to achieve good performance.
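The overlap of file loading with storage into HEPnOS amounts to a two-stage pipeline. The sketch below is hypothetical (types and helper functions are placeholders, and the dataloader's actual implementation may differ); it illustrates the idea using a helper task that stores the previous batch while the next file is being read.

```cpp
// Hypothetical two-stage pipeline (not the actual dataloader code): while
// the batch produced from file i is being stored into HEPnOS on a helper
// task, the main thread reads and converts file i+1.
#include <future>
#include <string>
#include <vector>

struct Batch { /* events and serialized products from one file */ };

Batch read_and_convert(const std::string& /*path*/) { return {}; } // HDF5 -> C++ objects
void  store_into_hepnos(const Batch& /*b*/) {}                     // write to HEPnOS

void load_files(const std::vector<std::string>& files) {
    std::future<void> pending;                // storage of the previous batch
    for (const auto& f : files) {
        Batch b = read_and_convert(f);        // overlaps with the previous store
        if (pending.valid()) pending.wait();  // previous batch fully stored
        pending = std::async(std::launch::async, store_into_hepnos, std::move(b));
    }
    if (pending.valid()) pending.wait();      // drain the pipeline
}
```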
2) Parallel event processing
The second step of the workflow, parallel event processing
(PEP), consists of reading the events and some products
associated with them, and performing some computation on
the data to determine events of interest. If events are stored
in N databases in HEPnOS, N processes of the PEP application will list them, each accessing one database. They will
end up filling a local list of events (<dataset id, run
number, subrun number, event number> tuples).
All the processes pull events from their local queue or request batches of events from other processes. Each event is
processed first by loading the data products associated with it,
then by performing computation on these products.
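The shape of this consumption loop can be sketched as follows, with hypothetical types and helper functions and without the MPI communication and batching of the real application: each process drains its local queue of event identifiers and, when the queue is empty, requests a new batch from its peers.

```cpp
// Hypothetical sketch of the PEP consumption loop (not the actual
// application): events are identified by <dataset id, run, subrun, event>
// tuples, pulled from the local queue or requested in batches from other
// processes when the queue runs dry.
#include <cstdint>
#include <deque>
#include <vector>

struct EventID { uint64_t dataset, run, subrun, event; };
struct Product { /* deserialized C++ object */ };

std::vector<EventID> request_batch_from_peers() { return {}; }     // e.g., MPI exchange
std::vector<Product> load_products(const EventID&) { return {}; }  // read from HEPnOS (I/O part)
void compute(const std::vector<Product>&) {}                       // (possibly simulated) computation

void process_events(std::deque<EventID> local_queue) {
    for (;;) {
        if (local_queue.empty()) {
            auto batch = request_batch_from_peers();
            if (batch.empty()) break;         // no work left anywhere
            local_queue.insert(local_queue.end(), batch.begin(), batch.end());
        }
        EventID ev = local_queue.front();
        local_queue.pop_front();
        compute(load_products(ev));           // load products, then process them
    }
}
```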
The PEP application comes with a benchmark that reproduces its I/O part (loading events and products) and simulates the computation. We use this benchmark in place of the real PEP application in this paper, since we are interested in autotuning only the I/O aspects of this workflow.
Just as in HEPnOS and the data loader, optimizations are in place in the PEP application and benchmark to improve I/O performance: look-ahead prefetching when reading from HEPnOS, batching of events both when they are loaded and when they are sent from one process to another, batching of data products, and multithreaded processing of events inside each process. All these optimizations come with their own set of tunable parameters that can influence the overall performance of the workflow.
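As an example of how one of these optimizations interacts with its tunable parameters, look-ahead prefetching can be sketched as follows (hypothetical code, not the actual HEPnOS/PEP prefetcher); the prefetch depth used here is exactly the kind of parameter targeted by autotuning.

```cpp
// Hypothetical look-ahead prefetching sketch (not the actual HEPnOS/PEP
// prefetcher): product loads for up to `depth` upcoming events are kept in
// flight while the current event is being processed.
#include <cstddef>
#include <deque>
#include <future>
#include <vector>

struct EventID { /* dataset, run, subrun, event */ };
struct Product { /* deserialized C++ object */ };

std::vector<Product> load_products(const EventID&) { return {}; }  // read from HEPnOS
void compute(const std::vector<Product>&) {}                       // analysis on products

void process_with_prefetch(const std::vector<EventID>& events, std::size_t depth) {
    if (depth == 0) depth = 1;                          // keep at least one load in flight
    std::deque<std::future<std::vector<Product>>> inflight;
    std::size_t next = 0;
    while (next < events.size() || !inflight.empty()) {
        // Issue asynchronous product loads up to the prefetch depth.
        while (next < events.size() && inflight.size() < depth) {
            inflight.push_back(std::async(std::launch::async,
                                          load_products, events[next++]));
        }
        auto products = inflight.front().get();         // oldest load completes
        inflight.pop_front();
        compute(products);                              // overlaps with in-flight loads
    }
}
```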
C. Challenges of (auto)tuning the workflow
While the parameters of a single workflow component cannot be tuned independently from one another, neither can they be tuned independently from the parameters of other components. As an example, what seems like an optimal number of threads in the PEP application may influence the optimal number of databases in HEPnOS, which in turn may influence the batch size used by the dataloader when storing events into HEPnOS. Manually tuning such a workflow rapidly becomes intractable, in particular as new optimizations (hence new parameters) are implemented, as new steps are added to the workflow, as the workflow scales up, or as it is ported to a new platform. This situation motivated us to investigate ways of automatically tuning such a workflow using parameter-space exploration and machine learning.
Parameter-space exploration enables defining the list of