power, both in embedded applications and in data centers.
This paper is an update to IEEE-HPEC papers from the past
three years [9]–[11]. As in past years, it focuses on accelerators and processors geared toward deep neural networks (DNNs) and convolutional neural networks (CNNs), as these models are quite computationally intensive [12]. This survey focuses on accelerators and processors
for inference for a variety of reasons, including that defense
and national security AI/ML edge applications rely heavily on
inference. We consider all of the numerical precision
types that an accelerator supports, but for most accelerators the
best inference performance is in int8 or fp16/bf16 (IEEE 16-
bit floating point or Google’s 16-bit brain float).
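To make the fp16/bf16 distinction concrete, the sketch below (plain Python, no vendor libraries; the function name is ours, for illustration) derives a bfloat16 value by truncating an IEEE fp32 bit pattern: bf16 keeps fp32's 8 exponent bits but only 7 mantissa bits, whereas IEEE fp16 has 5 exponent and 10 mantissa bits.

```python
import struct

def to_bf16(x: float) -> float:
    """Round toward zero to bfloat16 by truncating an IEEE fp32 bit
    pattern to its top 16 bits (1 sign, 8 exponent, 7 mantissa bits)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

# bf16 keeps fp32's full exponent range but only ~2-3 decimal digits:
print(to_bf16(3.14159))  # → 3.140625
# 1e38 remains finite in bf16 but would overflow fp16, whose max is 65504:
print(to_bf16(1e38))
```

This is why bf16 is popular for training (fp32-like dynamic range, cheap 16-bit storage) while fp16's extra mantissa bits favor inference accuracy at small magnitudes.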
There are many surveys [13]–[24] and other papers that
cover various aspects of AI accelerators. For instance, the first
paper in this multi-year survey included the peak performance
of FPGAs for certain AI models; however, several of the
aforementioned surveys cover FPGAs in depth so they are
no longer included in this survey. This multi-year survey
effort and this paper focus on gathering a comprehensive list
of AI accelerators with their computational capability, power
efficiency, and ultimately the computational effectiveness of
utilizing accelerators in embedded and data center applica-
tions. Along with this focus, this paper mainly compares
neural network accelerators that are useful for government
and industrial sensor and data processing applications. A few
accelerators and processors that were included in previous
years’ papers have been left out of this year’s survey. They
have been dropped because they have been surpassed by
new accelerators from the same company, they are no longer
offered, or they are no longer relevant to the topic.
II. SURVEY OF PROCESSORS
Many recent advances in AI can be at least partly cred-
ited to advances in computing hardware [6], [7], [25], [26],
enabling computationally heavy machine-learning algorithms
and in particular DNNs. This survey gathers performance and
power information from publicly available materials including
research papers, technical trade press, company benchmarks,
etc. While there are ways to access information from com-
panies and startups (including those in their silent period),
this information is intentionally left out of this survey; such
data will be included in this survey when it becomes publicly
available. The key metrics of this public data are plotted in
Figure 2, which graphs recent processor capabilities (as of July
2022) mapping peak performance vs. power consumption. The
dash-dotted box depicts the very dense region that is zoomed
in and plotted in Figure 3.
The x-axis indicates peak power, and the y-axis indicates
peak giga-operations per second (GOps/s), both on a loga-
rithmic scale. The computational precision of the processing
capability is depicted by the geometric shape used; the com-
putational precision spans from analog and single-bit int1 to
four-byte int32, and from two-byte fp16 to eight-byte fp64.
Precisions shown as two types denote the precision of the
multiplication operations on the left and the precision of
the accumulate/addition operations on the right (for example,
fp16.32 corresponds to fp16 for multiplication and fp32 for
accumulate/add). The form factor is depicted by color, which
shows the package for which peak power is reported. Blue
corresponds to a single chip; orange corresponds to a card; and
green corresponds to entire systems (single-node desktop and
server systems). This survey is limited to single-motherboard,
single-memory-space systems. Finally, the hollow geometric
objects are peak performance for inference-only accelerators,
while the solid geometric figures are performance for acceler-
ators that are designed to perform both training and inference.
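As a concrete illustration of the two-type precision convention described above, the following toy example (our own sketch using NumPy scalar arithmetic, not any accelerator's hardware path) contrasts an fp16-multiply/fp32-accumulate dot product (fp16.32) with one that also accumulates in fp16. With many small products, the fp16 accumulator eventually stalls because each addend falls below half an fp16 ulp of the running sum.

```python
import numpy as np

def dot_fp16_32(a, b):
    """fp16 multiplies with an fp32 accumulator (the "fp16.32" scheme)."""
    acc = np.float32(0.0)
    for x, y in zip(a, b):
        acc = np.float32(acc + np.float32(np.float16(x) * np.float16(y)))
    return float(acc)

def dot_fp16_16(a, b):
    """fp16 multiplies AND an fp16 accumulator, for contrast."""
    acc = np.float16(0.0)
    for x, y in zip(a, b):
        acc = np.float16(acc + np.float16(x) * np.float16(y))
    return float(acc)

a = [0.01] * 20000
b = [1.0] * 20000
# The exact answer is ~200. The fp16 accumulator stalls once the sum
# reaches 32, where adding 0.01 is below half an fp16 ulp (0.015625).
print(dot_fp16_32(a, b))  # ≈ 200.04 (fp16 rounds 0.01 up slightly)
print(dot_fp16_16(a, b))  # ≈ 32 (stalled)
```

This is the rationale behind wide-accumulator designs: the multiply array stays cheap at fp16 while long reductions keep fp32 fidelity.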
The survey begins with the same scatter plot that we have
compiled for the past three years. As we did last year, to save
space, we have summarized some of the important metadata
of the accelerators, cards, and systems in Table I, including
the label used in Figure 2 for each of the points on the
graph; many of the points were brought forward from last
year’s plot, and some details of those entries are in [9].
There are several additions which we will cover below. In
Table I, most of the columns and entries are self-explanatory.
However, there are two Technology entries that may
not be: dataflow and PIM. Dataflow processors are custom-
designed processors for neural network inference and training.
Since neural network training and inference computations can
be entirely deterministically laid out, they are amenable to
dataflow processing in which computations, memory accesses,
and inter-ALU communication actions are explicitly/statically
programmed or “placed-and-routed” onto the computational
hardware. Processor in memory (PIM) accelerators integrate
processing elements with memory technology. Among such
PIM accelerators are those based on an analog computing
technology that augments flash memory circuits with in-place
analog multiply-add capabilities. Please refer to the references
for the Mythic and Gyrfalcon accelerators for more details on
this innovative technology.
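As a rough, hypothetical model of such an analog in-memory multiply-accumulate (not a description of Mythic's or Gyrfalcon's actual circuits; all names and parameter values below are illustrative), one can picture weights as stored conductances, inputs as applied voltages, and the column current as their dot product, digitized through a noisy, finite-resolution ADC:

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed for a repeatable sketch

def analog_mac(voltages, conductances, adc_bits=8, noise_std=0.002):
    """Toy model of one analog PIM column: I = sum(G_i * V_i), read
    out through additive noise and an adc_bits-resolution converter.
    Parameters are illustrative assumptions, not vendor specs."""
    current = float(np.dot(voltages, conductances))  # Kirchhoff current sum
    current += rng.normal(0.0, noise_std)            # device/thermal noise
    full_scale = len(conductances)                   # V and G assumed in [0, 1]
    levels = 2 ** adc_bits
    code = min(max(round(current / full_scale * levels), 0), levels)
    return code / levels * full_scale                # quantized dot product
```

With 8 ADC bits the digital result matches the exact dot product to within about half a quantization step plus noise, which is one reason such analog designs are typically positioned for int8-class inference rather than training.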
Finally, a reasonable categorization of accelerators follows
their intended application, and the five categories are shown
as ellipses on the graph, which roughly correspond to perfor-
mance and power consumption: Very Low Power for speech
processing, very small sensors, etc.; Embedded for cameras,
small UAVs and robots, etc.; Autonomous for driver assist
services, autonomous driving, and autonomous robots; Data
Center Chips and Cards; and Data Center Systems.
For most of the accelerators, their descriptions and commen-
taries have not changed since last year, so please refer to the
past two years’ papers for descriptions and commentaries. There
are, however, several new releases that were not covered by
past papers that are covered here.
• Acelera, a Dutch embedded systems startup, reported
the results of an embedded test chip that they have
produced [35]. They claim both digital and analog design
capabilities, and this test chip was made to test the
extent of the digital design capabilities. They expect to
add analog (probably flash) design elements in upcoming
efforts.
• Maxim Integrated has released a system-on-chip
(SoC) for ultra low power applications called the
MAX78000 [74]–[76], which includes an ARM CPU
core, a RISC-V CPU core and an AI accelerator. The