is a dot product of the basic block embedding vector with a
vector of learned weights.
While Ithemal demonstrates a promising path forward
for performance estimation of basic blocks, its input data
format presents the instructions to the model linearly, as
they are laid out in memory, and it relies on the model and
the training process to discover dependencies between the
instructions on its own. Since these dependencies are well
defined and easy to extract using existing tools, we suggest
including them in the basic block representation explicitly, to
guide the computation of the model. This work uses graphs
as a natural and intuitive way to represent basic blocks and
the underlying dependencies, expecting that a graph neural
network model will be able to benefit from the additional
information and produce more precise throughput estimates.
2.3. Graph Neural Network
The family of graph neural networks (GNNs) [24–28]
has been shown to be effective in a diverse range of applications
and domains [29–32]. Generally, GNNs yield promising
results in applications with highly structured inputs where
the relationships between elements of the input can be easily
expressed using a graph. The main objective of a GNN
is to learn to map the information structured as a graph
into an embedding space (a vector representation). In a
nutshell, the learning process of a GNN model consists of
propagating information between graph nodes and edges
via multiple message passing iterations, followed by an
aggregation step. At each message passing iteration, the
node and edge embeddings are updated according to received
messages from their neighbors in the graph. The final learned
embeddings are then employed in downstream tasks such as
regression, classification, and ranking.
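To make the message passing and aggregation steps concrete, the following is a minimal NumPy sketch of one such scheme on a toy graph. The weight matrices, sum aggregation, and ReLU update are illustrative placeholders, not the specific GNN variant used by any of the cited models:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy directed graph: 4 nodes, edges given as (src, dst) pairs.
num_nodes, dim = 4, 8
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]

h = rng.normal(size=(num_nodes, dim))          # initial node embeddings
W_msg = rng.normal(size=(dim, dim)) * 0.1      # message transform (learned in practice)
W_upd = rng.normal(size=(2 * dim, dim)) * 0.1  # update transform (learned in practice)

def message_passing_step(h):
    # Each node sums the messages arriving from its in-neighbors...
    agg = np.zeros_like(h)
    for src, dst in edges:
        agg[dst] += h[src] @ W_msg
    # ...then updates its embedding from [old embedding, aggregated messages].
    return np.maximum(np.concatenate([h, agg], axis=1) @ W_upd, 0.0)  # ReLU

for _ in range(3):  # multiple message passing iterations
    h = message_passing_step(h)

# Final aggregation step: a single graph-level embedding for downstream tasks.
graph_embedding = h.sum(axis=0)
```

In a trained model the transforms would be learned end-to-end from the downstream loss; here they are random stand-ins that only demonstrate the data flow.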
3. GRANITE Model Architecture
The GRANITE model is composed of the following
building blocks:
• Graph encoding of basic blocks. The first step in GRANITE is to transform basic blocks into a graph representation, converting the instructions and basic block dependencies into node and edge labels according to their types. The constructed graph representation of basic blocks is used as the input to the graph neural network.
• Graph neural network. Next, GRANITE uses a GNN model with the objective of learning an expressive embedding for each basic block. As part of the training process, the GNN model iteratively exchanges relevant information between basic block elements in order to compute the embedding vectors.
• Decoder network. Each instruction embedding vector passes through an additional decoder network with a non-linearity that computes the contribution of the instruction to the basic block throughput. GRANITE predicts the final throughput value for each basic block by adding all individual instructions' contributions to the overall throughput.
• Multi-task decoder network. The multi-task version of GRANITE uses a multi-task decoder that predicts the throughput values across multiple microarchitectures simultaneously. The other parts of the model are shared across all target microarchitectures. Intuitively, the task of the shared parts is to learn an internal representation of the basic block structure, while the decoder networks are responsible for throughput estimation.
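The additive decoder step can be sketched as follows. The two-layer MLP shape, the embedding dimension, and the random stand-in embeddings are assumptions for illustration, not the actual GRANITE decoder architecture:

```python
import numpy as np

rng = np.random.default_rng(1)
dim, hidden = 8, 16

# Hypothetical learned parameters of a two-layer decoder with a ReLU non-linearity.
W1 = rng.normal(size=(dim, hidden)) * 0.1
b1 = np.zeros(hidden)
w2 = rng.normal(size=hidden) * 0.1
b2 = 0.0

def instruction_contribution(embedding):
    # Non-linear decoder: maps one instruction embedding to a scalar contribution.
    hid = np.maximum(embedding @ W1 + b1, 0.0)
    return float(hid @ w2 + b2)

# Stand-ins for the instruction embeddings produced by the GNN (a 5-instruction block).
instr_embeddings = rng.normal(size=(5, dim))

# The block-level prediction is the sum of the per-instruction contributions.
predicted_throughput = sum(instruction_contribution(e) for e in instr_embeddings)
```

A consequence of this additive design is that the predicted throughput of a block decomposes exactly into per-instruction terms, which makes the prediction attributable to individual instructions.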
3.1. Graph Encoding of Basic Blocks
We model each basic block as a dependency graph
inspired by [20], but using a more compact format. The
GRANITE graph is designed to capture the semantic relation-
ships between instructions as well as the type and category of
instructions and registers. The nodes of the graph consist of
a set of instruction and value nodes (e.g. values in registers,
immediate values, etc.), whereas the edges indicate data and
structural dependencies between the instructions and values
represented by the nodes. Figure 1 shows an example basic
block in the GRANITE encoding.
Each node of the graph corresponds to an element of
the assembly language, similar to one token in the Ithemal
model [19]. Broadly, we can categorize node types into two
groups: instruction nodes that represent instructions, and
value nodes that model the input and output values passed
between instructions. Table 2 summarizes the node types
in GRANITE graph representation and assembly language
tokens that can be associated with them. We represent each
assembly instruction by a unique instruction mnemonic node.
Infrequently, an assembly instruction may have prefixes
that modify its behavior, such as “LOCK” or “REP”.
We represent each prefix by a separate graph node that
is connected to the instruction mnemonic node by an edge.
Each instruction node is connected to zero or more value
nodes representing the instruction operands. The operands
are values stored in registers or memory, immediate values,
and results of address computation. Each value node has zero
or one incoming edge from the instruction mnemonic node of
the instruction that produces it (no incoming edge means that
the value is not produced by an instruction of the block), and
zero or more outgoing edges to instructions that consume the
value. These edges represent the data dependencies between
instructions. The token associated with a value node is the
name of a register if the value is stored in a register, or
a special token if the value is stored in memory, is an
immediate value, or is the result of an address computation.
Note that the nodes represent a value in a storage location,
not the storage location itself; the graph may contain
multiple value nodes with the same register name if multiple
instructions in the block write to that register. For example, in
Figure 1, register “RAX” is a destination operand of the “MOV”
instruction and is used as a source operand to calculate the
memory address for the “ADD” instruction. In the same example,
there are two different “Memory” nodes; one is used as
an input operand, the other as an output operand. Since the
value written by the “ADD” instruction may be different from
the value it reads, they are represented as two distinct nodes.
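The encoding described above can be sketched with a small routine that builds the node and edge lists for a toy two-instruction block. The node kinds, edge labels, and the `(mnemonic, reads, writes)` instruction tuples are simplifications for illustration; they are not the exact vocabulary or data format used by GRANITE (prefix nodes and address-computation nodes are omitted):

```python
def encode_block(instructions):
    """instructions: list of (mnemonic, reads, writes) with operand names."""
    nodes, edges = [], []   # nodes: (node_id, kind, token); edges: (src, dst, label)
    latest_value = {}       # operand name -> node id of its most recent value

    def add_node(kind, token):
        nodes.append((len(nodes), kind, token))
        return len(nodes) - 1

    for mnemonic, reads, writes in instructions:
        instr = add_node("instruction", mnemonic)
        for operand in reads:
            # Reuse the latest value node; a value with no producing
            # instruction in the block gets a node with no incoming edge.
            if operand not in latest_value:
                latest_value[operand] = add_node("value", operand)
            edges.append((latest_value[operand], instr, "input"))
        for operand in writes:
            # A write always creates a fresh value node: nodes model values
            # in storage locations, not the locations themselves.
            value = add_node("value", operand)
            edges.append((instr, value, "output"))
            latest_value[operand] = value
    return nodes, edges

# Toy block: MOV writes RAX; ADD reads RAX and memory, then writes memory.
block = [
    ("MOV", ["RBX"], ["RAX"]),
    ("ADD", ["RAX", "memory"], ["memory"]),
]
nodes, edges = encode_block(block)
```

On this toy block the routine produces two distinct “memory” value nodes, one read and one written, mirroring the two “Memory” nodes in the example above, while the “RAX” value written by “MOV” is reused as an input of “ADD”, giving the data-dependency edge between the two instructions.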