
Inability of a graph neural network heuristic to outperform greedy algorithms in
solving combinatorial optimization problems like Max-Cut
Stefan Boettcher
Department of Physics, Emory University, Atlanta, GA 30322; USA
Matters Arising from: Martin J. A. Schuetz et al., Nature Machine Intelligence, https://doi.org/10.1038/s42256-022-00468-6 (2022).
In Ref. [1], Schuetz et al. provide a scheme to employ graph neural networks (GNN) as a heuristic to solve a variety of classical, NP-hard combinatorial optimization problems. It describes how the network is trained on sample instances and how the resulting GNN heuristic is evaluated by applying widely used techniques to determine its ability to succeed. Clearly, the idea of harnessing the powerful abilities of such networks to "learn" the intricacies of complex, multimodal energy landscapes in such a hands-off approach seems enticing. And based on the observed performance, the heuristic promises to be highly scalable, with a computational cost linear in the input size n, although there is likely a significant overhead in the pre-factor due to the GNN itself. However, closer inspection shows that the reported results for this GNN are only minutely better than those for gradient descent and are outperformed by a greedy algorithm, for example, on Max-Cut. The discussion also highlights what I believe are some common misconceptions in the evaluation of heuristics.
Among the variety of QUBO problems that Ref. [1] considers in its numerical evaluation of the GNN, I want to focus the discussion here on Max-Cut. As explained in the context of Eq. (7), it is derived from an Ising spin-glass Hamiltonian on a d-regular random graph [2] for d = 3. (In the physics literature, for historical reasons, such a graph is often referred to as a Bethe lattice [3, 4].) Minimizing the energy of the Hamiltonian, H, maximizes the cut size, cut = -H. The cut results for the GNN (for both d = 3 and d = 5) are presented in Fig. 4 of Ref. [1], where they find cut ~ γ₃ n with γ₃ ≈ 1.28 via an asymptotic fit to the GNN data, obtained from averaging over randomly generated instances of the problem for a progression of different problem sizes n. In Fig. 1(a) here, I have recreated their Fig. 4, based on the value of γ₃ reported for the GNN (blue line). Like Ref. [1], I have also included what they describe as a rigorous upper bound, cut_ub (black-dashed line), which derives from an exact result obtained for d = ∞ [5]. While the GNN results appear impressively close to that upper bound, including two other sets of data puts these results in a different perspective. The first set I obtained at significant computational cost (~ n³) with another heuristic ("extremal optimization", EO) long ago in Ref. [4] (black circles). The second set is achieved by a simple gradient descent (GD, maroon squares). GD sequentially looks at randomly selected (Boolean) variables x_i among those
[Figure 1. Results discussed in the text for various heuristics and bounds for the Max-Cut problem on a 3-regular random graph ensemble: (a) the raw cut size as a function of the number of nodes n, for cut_ub, EO, GNN, and GD; (b) an extrapolation plot of <e₃>/√3 versus 1/n according to Eq. (1), for Gradient Descent, GNN, GraphSAGE, Greedy Search, the EO-Heuristic, 1-RSB, and -P* (the Parisi energy). Note that in (b), a fit (red-dashed line) to the EO data (circles) suggests a non-linear asymptotic correction ~ 1/n^(2/3) [4].]
whose flip (x_i ↦ ¬x_i) will improve the cost function. (Such "unstable" variables are easy to track.) After only ~ 0.4 n such flips, typically no further improvements were possible and GD converged; it is very scalable and fast (done overnight on a laptop, averaging over 10³-10⁵ instances at each n, up to n = 10⁵). Presented in the form of Fig. 1(a), the results all look rather good, although it is already noticeable that the results for GD are barely distinguishable from those of the elaborate GNN heuristic.
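The gradient-descent dynamics described above is simple enough to sketch in full. The following is a minimal illustration, not the actual implementation used here: the graph representation, tie-breaking, and bookkeeping of unstable variables are my own assumptions. It repeatedly flips randomly chosen unstable variables, i.e., those whose flip increases the cut, until none remain.

```python
import random

def gd_maxcut(adj, seed=0):
    """Single-flip gradient descent for Max-Cut.

    adj -- adjacency list of an undirected graph (list of neighbor lists).
    Returns (cut_size, assignment) at a local optimum, where no
    single flip of a Boolean variable x_i can increase the cut.
    """
    rng = random.Random(seed)
    n = len(adj)
    x = [rng.randint(0, 1) for _ in range(n)]

    def gain(i):
        # Change in cut size if x_i is flipped: same-side edges
        # become cut, currently cut edges become uncut.
        same = sum(1 for j in adj[i] if x[j] == x[i])
        return same - (len(adj[i]) - same)

    # "Unstable" variables (gain > 0) are easy to track locally.
    unstable = {i for i in range(n) if gain(i) > 0}
    while unstable:
        i = rng.choice(tuple(unstable))
        if gain(i) <= 0:           # may have stabilized since insertion
            unstable.discard(i)
            continue
        x[i] ^= 1                  # flip x_i -> not x_i
        for j in adj[i] + [i]:     # only i and its neighbors change
            if gain(j) > 0:
                unstable.add(j)
            else:
                unstable.discard(j)

    cut = sum(1 for i in range(n) for j in adj[i] if i < j and x[i] != x[j])
    return cut, x
```

On the complete graph K₄ (the smallest 3-regular graph), the only stable configurations of this dynamics are the even bipartitions with cut size 4, which makes the routine easy to sanity-check.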
To discern further details, it is essential to present the data in a form that, at least, eliminates some of its trivial aspects. For example, as Schuetz et al. themselves reference, the ratio cut/n ~ γ converges to a stable limit with γ ~ d/4 + P* √(d/4) + o(√d) + o(n⁰) for n, d → ∞ [6], where P* = 0.7632... [5]. In fact, for better comparison with Refs. [3, 4], we focus on the average ground-state energy density of the Hamiltonian in their Eq. (7) at n = ∞, which is related to γ via
arXiv:2210.00623v1 [cond-mat.dis-nn] 2 Oct 2022
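For orientation, the leading-order expression for γ quoted above can be evaluated directly. A minimal sketch follows, using P* ≈ 0.7632 from Ref. [5]; note that the expression is only asymptotic in d, so for small d these numbers locate the upper bound cut_ub rather than the true cut density.

```python
# Leading-order cut density gamma ~ d/4 + P* * sqrt(d/4),
# valid asymptotically for n, d -> infinity.
from math import sqrt

P_STAR = 0.7632  # Parisi constant, P* = 0.7632... [5]

def gamma_estimate(d):
    """Leading-order estimate of cut/n for a d-regular random graph."""
    return d / 4 + P_STAR * sqrt(d / 4)

for d in (3, 5):
    print(f"d = {d}: gamma ~ {gamma_estimate(d):.4f}")
```

For d = 3 this yields γ ≈ 1.411, visibly above the reported GNN fit γ₃ ≈ 1.28, consistent with the upper bound shown in Fig. 1(a).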