applied, these mesh-based graph models can operate on very large graphs; however, graph reduction
is important for efficiency and can promote learning [14]. While physical problems with short-range
(e.g. interface) and long-range (e.g. elastic) interactions are ubiquitous in engineering and materials
science, treating the longer scales with convolutions on the discretization graph can be inefficient
or ineffective.
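As a minimal illustration of this scale mismatch (an illustrative sketch, not drawn from the works cited above): each graph-convolution layer propagates information one hop, so resolving an interaction that spans a mesh of graph diameter d requires on the order of d stacked layers. A NumPy sketch on a 1-D path-graph "mesh" makes the slow growth of the receptive field explicit:

```python
import numpy as np

# A 1-D "mesh" of n nodes as a path graph: node i connects to i-1 and i+1.
n = 100
A = np.zeros((n, n), dtype=int)
idx = np.arange(n - 1)
A[idx, idx + 1] = 1
A[idx + 1, idx] = 1

# One graph-convolution layer propagates information one hop, so k stacked
# layers have a k-hop receptive field: node 0 "sees" node j only where
# ((A + I)^k)[0, j] > 0.
k = 5
reach = np.linalg.matrix_power(A + np.eye(n, dtype=int), k)
receptive = int((reach[0] > 0).sum())
print(receptive)
```

Here five layers reach only six of the hundred nodes; coupling the two ends of the mesh by convolution alone would require roughly n layers, which motivates pooling to coarser graphs for long-range physics.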
Treating aggregated data via graph pooling based on data clustering is a long-standing field of
research with broad applications, e.g. image classification [15], mesh decomposition [16], and chemical
structure [17]. Since clustered data rarely has a regular topology, graph-based networks are
natural representations. In particular, compared to a CNN operating on a pixelized image, a GCNN
operating on an unstructured mesh has less well-defined spatial locality and inherits the varying
neighborhood sizes of the source data. Akin to topologically localized convolutions supplanting
spectral-based convolutions [7, 18] and spectral pre-processing [19] in the literature, spectral clustering
based on the leading eigenvectors of the graph Laplacian has been largely superseded by
less expensive, more easily differentiable techniques, some of which connect back to spectral clustering.
Dhillon et al. [20] showed the equivalence of spectral clustering and kernel k-means clustering,
which allowed expensive eigenvalue problems to be reframed in terms of trace-maximization objectives.
These objectives include maximizing in-cluster links/edges and minimizing cuts, i.e. the number
of links between any cluster and the remainder of the graph. Typically these objectives are normalized
relative to cluster size (number of nodes) or degree (sum of non-self connections/adjacency) but are
agnostic to the data on the graph. More recently, graph-based neural networks, such as DiffPool [21]
and MinCutPool [14], have been designed to take the data on the graph into account in soft-clustering,
trainable pooling operations. Ying et al. [21] developed DiffPool to enable a hierarchical
treatment of graph structure. DiffPool uses one GNN for the ultimate classification task and another,
with a softmax output, for the intermediary pooling task. The separate GNNs learn a soft (in the
sense of not binary and disjoint) assignment of nodes in the input graph to those in a smaller graph,
as well as derived features on the smaller embedded graph. Due to the non-convexity of the clustering
objective, an auxiliary entropy loss is used to regularize the training. Bianchi, Grattarola
and Alippi [14] developed MinCutPool based on a degree-normalized objective of minimizing the edges
between any cluster and the remainder of the graph. They relaxed the objective of finding a binary,
disjoint cluster assignment to a continuous analog, reducing the computational complexity.
Ultimately they proposed a network similar to DiffPool, albeit with the softmax
assignment GNN replaced by a softmax multilayer perceptron (MLP) and a different loss augmentation
designed to promote orthogonality of the clustering matrix mapping graph nodes to clusters.
Grattarola et al. [22] also generalized the myriad approaches to pooling and clustering on graphs
with their select-reduce-connect abstraction of these operations.
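The two MinCutPool loss terms can be sketched in a few lines of NumPy. The following is an illustrative toy, not the authors' implementation: the assignment MLP is replaced by random logits, and the graph is two small cliques joined by a single edge:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 8, 2  # n graph nodes pooled into K clusters

# Toy adjacency: two 4-node cliques joined by one bridging edge.
A = np.zeros((n, n))
A[:4, :4] = 1
A[4:, 4:] = 1
np.fill_diagonal(A, 0)
A[3, 4] = A[4, 3] = 1
D = np.diag(A.sum(axis=1))  # degree matrix

# Soft cluster assignment S (rows sum to 1), here a softmax over
# stand-in random logits where MinCutPool would use an MLP.
logits = rng.normal(size=(n, K))
S = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Degree-normalized cut loss: minimizing it maximizes in-cluster edges
# relative to cluster degree (a relaxed normalized cut).
cut_loss = -np.trace(S.T @ A @ S) / np.trace(S.T @ D @ S)

# Orthogonality loss: pushes the columns of S toward orthogonality and
# similar cluster sizes, penalizing degenerate all-in-one assignments.
StS = S.T @ S
ortho_loss = np.linalg.norm(StS / np.linalg.norm(StS) - np.eye(K) / np.sqrt(K))
print(cut_loss, ortho_loss)
```

The cut loss lies in [-1, 0] and reaches its minimum for disjoint, well-separated clusters; the orthogonality term supplies the regularization that the relaxed cut objective alone lacks.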
Graph convolutional neural networks can be particularly opaque in how they achieve accurate
models. To interpret what convolutional networks are learning through their representations, filter
visualization and activation [23–25], saliency maps [26], sensitivity, attention-based, and related
techniques [27, 28] have been developed. These techniques have largely been applied to the ubiquitous
commercial classification tasks; for instance, Yosinski et al. [25] demonstrated how activations
at the deepest layers of a CNN correspond to obvious features, e.g. faces, in the original image used in
classification. Some of these techniques have enabled a degree of filter interpretation by
translating the filter output so that the learned features are obvious by inspection. For pixel-based
CNNs, deconvolution and inversion of the input-output map [24], as well as guided/regularized
optimization [25], have produced some insights. Some other methods rely on clustering; for example,
Local Interpretable Model-agnostic Explanations (LIME) [29] is based on segmentation and pertur-
bation of the cluster values. In general, clustering simplifies the input-output map created by the