
4. Related Work
Evolutionary Algorithms (EA) Traditionally, evolutionary algorithms such as NSGA-II have been widely used for various multi-objective optimization problems (Ehrgott, 2005; Konak et al., 2006; Blank & Deb, 2020). More recently, Miret et al. (2022) incorporated graph neural networks into evolutionary algorithms, enabling them to tackle large combinatorial spaces. Unlike MOGFNs, evolutionary algorithms must solve each MOO instance from scratch, rather than amortizing computation during training so that solutions can be generated quickly at run-time. Evolutionary algorithms can, however, be augmented with MOGFNs for generating mutations to improve efficiency, as in Section 3.2.
Multi-Objective Reinforcement Learning MOO problems have also received significant interest in the RL literature (Hayes et al., 2022). Traditional approaches broadly consist of learning sets of Pareto-dominant policies (Roijers et al., 2013; Van Moffaert & Nowé, 2014; Reymond et al., 2022). Recent work has focused on extending deep RL algorithms to multi-objective settings, e.g., Envelope-MOQ (Yang et al., 2019), MO-MPO (Abdolmaleki et al., 2020; 2021), and MOReinforce (Lin et al., 2021). A general shortcoming of RL-based approaches is that their objective focuses on discovering a single mode of the reward function, so they rarely generate diverse candidates, an issue that persists in the multi-objective setting. In contrast, MOGFNs sample candidates with probability proportional to the reward, which implicitly results in diverse candidates.
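To make this contrast concrete, the following is a minimal sketch (in Python, on a toy discrete candidate set with a hypothetical scalar reward R whose values are purely illustrative and not taken from the paper) of how sampling candidates proportionally to the reward spreads probability mass over all high-reward modes, whereas a reward-maximizing policy collapses onto a single one.

import numpy as np

# Toy candidate set with two well-separated high-reward modes and a
# hypothetical reward function R (illustrative values, not from the paper).
candidates = np.arange(10)
R = np.array([0.1, 0.2, 5.0, 5.1, 0.1, 0.1, 4.9, 5.2, 0.2, 0.1])

# Reward-maximizing selection (the typical RL objective): a single mode.
greedy_choice = candidates[np.argmax(R)]

# GFlowNet-style sampling: candidates drawn with probability R(x) / Z,
# so both high-reward modes are visited, which yields diverse candidates.
rng = np.random.default_rng(0)
samples = rng.choice(candidates, size=1000, p=R / R.sum())
high_reward_modes_hit = np.unique(samples[R[samples] > 4.0])
print(greedy_choice, high_reward_modes_hit)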
Multi-Objective Bayesian Optimization (MOBO) Bayesian optimization (BO) has been used for MOO when the objectives are expensive to evaluate and sample efficiency is a key consideration. MOBO approaches learn a surrogate model of the true objective functions, which is used to define an acquisition function such as expected hypervolume improvement (Emmerich et al., 2011; Daulton et al., 2020; 2021) or max-value entropy search (Belakaria et al., 2019); scalarization-based approaches have also been proposed (Paria et al., 2020; Zhang & Golovin, 2020). Abdolshah et al. (2019) and Lin et al. (2022) study the MOBO problem in settings with preferences over the different objectives. Stanton et al. (2022) proposed LaMBO, which uses language models in conjunction with BO for multi-objective sequence design problems. While recent work (Konakovic Lukovic et al., 2020; Maus et al., 2022) studies the generation of diverse candidates in the context of MOBO, it is limited to local optimization near Pareto-optimal candidates in low-dimensional continuous problems. The key drawbacks of MOBO approaches are thus that they typically do not account for diversity in the generated candidates and that they mainly target continuous, low-dimensional state spaces. As we discuss in Section 3.2, MOBO approaches can be augmented with GFlowNets for diverse candidate generation in discrete spaces.
Other Approaches Zhao et al. (2022) introduced LaMOO, which tackles MOO by iteratively splitting the candidate space into smaller regions, whereas Daulton et al. (2022) introduced MORBO, which performs BO in parallel on multiple local regions of the candidate space. Both of these methods, however, are limited to continuous candidate spaces.
5. Empirical Results
In this section, we present our empirical findings across a wide range of tasks, from sequence design to molecule generation. Through our experiments, we aim to answer the following questions:
Q1 Can MOGFNs model the preference-conditional reward distribution?
Q2 Can MOGFNs sample Pareto-optimal candidates?
Q3 Are candidates sampled by MOGFNs diverse?
Q4 Do MOGFNs scale to high-dimensional problems relevant in practice?
We obtain positive experimental evidence for Q1-Q4.
Metrics: We rely on standard MOO metrics such as the Hypervolume (HV) and R2 indicators, as well as the Generational Distance+ (GD+). To measure diversity we use the Top-K Diversity and Top-K Reward metrics of Bengio et al. (2021a). We detail all metrics in Appendix C. For all our empirical evaluations we follow the same protocol. First, we sample a set of preferences which are fixed for all the methods. For each preference we sample 128 candidates, from which we pick the top 10, compute their scalarized reward and diversity, and report the averages over preferences. We then use these samples to compute the HV and R2 indicators. We pick the best hyperparameters for all methods based on the HV and report the mean and standard deviation over 3 seeds for all quantities.
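For concreteness, a minimal sketch of this evaluation loop is given below (in Python). The sampler sample_candidates, the objective functions in objectives, and the pairwise distance dist are hypothetical placeholders for the task-specific components, and the weighted-sum scalarization shown is only one possible scalarization choice.

import itertools
import numpy as np

def evaluate(sample_candidates, objectives, dist, preferences, n=128, k=10):
    # For each fixed preference w: sample n candidates, keep the top k by
    # scalarized reward, and record the Top-K reward and Top-K diversity.
    top_k_rewards, top_k_divs, fronts = [], [], []
    for w in preferences:                      # preference weights, summing to 1
        xs = sample_candidates(w, n)           # n candidates from the conditional sampler
        scores = np.array([[f(x) for f in objectives] for x in xs])
        scalarized = scores @ w                # weighted-sum scalarization (one option)
        top = np.argsort(scalarized)[-k:]      # indices of the top-k candidates
        top_k_rewards.append(scalarized[top].mean())
        top_k_divs.append(np.mean([dist(xs[i], xs[j])
                                   for i, j in itertools.combinations(top, 2)]))
        fronts.append(scores[top])             # kept to compute HV and R2 afterwards
    # Averages over preferences; the HV and R2 indicators are then computed
    # on the pooled objective vectors returned in the third output.
    return np.mean(top_k_rewards), np.mean(top_k_divs), np.concatenate(fronts)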
Baselines: We consider the closely related MOReinforce (Lin et al., 2021) as a baseline. We also study its variants MOSoftQL and MOA2C, which use Soft Q-Learning (Haarnoja et al., 2017) and A2C (Mnih et al., 2016), respectively, in place of REINFORCE. We additionally compare against Envelope-MOQ (Yang et al., 2019), another popular multi-objective reinforcement learning method. For fragment-based molecule generation we consider an additional baseline, MARS (Xie et al., 2021), a relevant MCMC
approach for this task. Notably, we do not consider baselines such as LaMOO (Zhao et al., 2022) and MORBO (Daulton et al., 2022), since, as discussed in Section 4, they are limited to continuous candidate spaces.