AI and ML Accelerator Survey and Trends
Albert Reuther, Peter Michaleas, Michael Jones, Vijay Gadepally, Siddharth Samsi, and Jeremy Kepner
MIT Lincoln Laboratory Supercomputing Center
Lexington, MA, USA
{reuther,pmichaleas,michael.jones,vijayg,sid,kepner}@ll.mit.edu
Abstract—This paper updates the survey of AI accelerators and processors from the past three years. It collects and summarizes the commercial accelerators that have been publicly announced with peak performance and power consumption numbers. These performance and power values are plotted on a scatter graph, and a number of dimensions and observations from the trends on this plot are again discussed and analyzed. Two new trends plots based on accelerator release dates are included in this year's paper, along with additional trends for some neuromorphic, photonic, and memristor-based inference accelerators.
Index Terms—Machine learning, GPU, TPU, dataflow, accelerator, embedded inference, computational performance
I. INTRODUCTION
Just as last year, the pace of new announcements, releases, and deployments of artificial intelligence (AI) and machine learning (ML) accelerators from startups and established technology companies has been modest. This is not unreasonable: many companies that have released an accelerator report having spent three or four years researching, analyzing, designing, verifying, and validating their accelerator design trade-offs and building the software stack to program the accelerator. Companies that have released subsequent versions of their accelerators report shorter development cycles, though still at least two or three years. The focus of these accelerators continues to be on accelerating deep neural network (DNN) models, with an application space that spans from very low power embedded voice recognition and image classification to data center scale training. The competition for defining markets and application areas continues as part of a much larger industrial and technological shift in modern computing toward machine learning solutions.
AI ecosystems bring together components from embedded computing (edge computing), traditional high performance computing (HPC), and high performance data analysis (HPDA) that must work together to effectively provide capabilities for use by decision makers, warfighters, and analysts [1]. Figure 1 captures an architectural overview of such end-to-end AI solutions and their components. On the left side of Figure 1, structured and unstructured data sources provide different views of entities and/or phenomenology.
This material is based upon work supported by the Assistant Secretary
of Defense for Research and Engineering under Air Force Contract No.
FA8702-15-D-0001. Any opinions, findings, conclusions or recommendations
expressed in this material are those of the author(s) and do not necessarily
reflect the views of the Assistant Secretary of Defense for Research and
Engineering.
Fig. 1: Canonical AI architecture consists of sensors, data conditioning, algorithms, modern computing, robust AI, human-machine teaming, and users (missions). Each step is critical in developing end-to-end AI applications and systems.
These raw data products are fed into a data conditioning step
in which they are fused, aggregated, structured, accumulated,
and converted into information. The information generated by
the data conditioning step feeds into a host of supervised
and unsupervised algorithms such as neural networks, which
extract patterns, predict new events, fill in missing data, or
look for similarities across datasets, thereby converting the
input information to actionable knowledge. This actionable
knowledge is then passed to human beings for decision-making in the human-machine teaming phase, which provides users with useful and relevant insight, turning knowledge into actionable intelligence.
Underpinning this system are modern computing systems.
Moore's law trends have ended [2], as have a number of related laws and trends including Dennard scaling (power density), clock frequency, core counts, instructions per clock cycle, and instructions per Joule (Koomey's law) [3]. Taking a page
from the system-on-chip (SoC) trends first seen in automotive
applications, robotics, and smartphones, advancements and
innovations are still progressing by developing and integrating
accelerators for often-used operational kernels, methods, or
functions. These accelerators are designed with a different
balance between performance and functional flexibility. This
includes an explosion of innovation in deep machine learning
processors and accelerators [4]–[8]. In this series of survey
papers, we explore the relative benefits of these technologies
since they are of particular importance to applying AI to
domains under significant constraints such as size, weight, and
power, both in embedded applications and in data centers.
This paper is an update to IEEE-HPEC papers from the past three years [9]–[11]. As in past years, this paper continues last year's focus on accelerators and processors that are geared toward deep neural networks (DNNs) and convolutional neural networks (CNNs), as they are quite computationally intense [12]. This survey focuses on accelerators and processors for inference for a variety of reasons, including that defense and national security AI/ML edge applications rely heavily on inference. We consider all of the numerical precision types that an accelerator supports, but for most accelerators the best inference performance is in int8 or fp16/bf16 (IEEE 16-bit floating point or Google's 16-bit brain float).
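To make the precision trade-off concrete, here is a minimal Python sketch (an illustration added for this discussion, not part of the survey data): bf16 keeps fp32's 8-bit exponent but only 7 mantissa bits, so it covers fp32's dynamic range at reduced precision, while fp16's 5-bit exponent overflows much sooner.

```python
import numpy as np

# fp16 (IEEE half):   1 sign + 5 exponent + 10 mantissa bits -> max value ~65504
# bf16 (brain float): 1 sign + 8 exponent +  7 mantissa bits -> fp32-like range
def to_bf16(x):
    """Approximate fp32 -> bfloat16 by truncating to the upper 16 bits."""
    bits = np.atleast_1d(np.asarray(x, dtype=np.float32)).view(np.uint32)
    return ((bits >> 16) << 16).view(np.float32)  # keep sign, exponent, 7 mantissa bits

x = np.float32(1e5)        # representable in bf16, but overflows fp16
print(np.float16(x))       # inf      -- exceeds fp16's ~65504 maximum
print(to_bf16(x))          # [99840.] -- coarse mantissa, but still in range
```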
There are many surveys [13]–[24] and other papers that
cover various aspects of AI accelerators. For instance, the first
paper in this multi-year survey included the peak performance
of FPGAs for certain AI models; however, several of the
aforementioned surveys cover FPGAs in depth so they are
no longer included in this survey. This multi-year survey
effort and this paper focus on gathering a comprehensive list
of AI accelerators with their computational capability, power
efficiency, and ultimately the computational effectiveness of
utilizing accelerators in embedded and data center applications. Along with this focus, this paper mainly compares
neural network accelerators that are useful for government
and industrial sensor and data processing applications. A few
accelerators and processors that were included in previous
years’ papers have been left out of this year’s survey. They
have been dropped because they have been surpassed by
new accelerators from the same company, they are no longer
offered, or they are no longer relevant to the topic.
II. SURVEY OF PROCESSORS
Many recent advances in AI can be at least partly credited to advances in computing hardware [6], [7], [25], [26], enabling computationally heavy machine-learning algorithms and in particular DNNs. This survey gathers performance and power information from publicly available materials including research papers, technical trade press, company benchmarks, etc. While there are ways to access information from companies and startups (including those in their silent period), this information is intentionally left out of this survey; such data will be included in this survey when it becomes publicly available. The key metrics of this public data are plotted in Figure 2, which graphs recent processor capabilities (as of July 2022), mapping peak performance vs. power consumption. The dash-dotted box depicts the very dense region that is zoomed in and plotted in Figure 3.
The x-axis indicates peak power, and the y-axis indicates peak giga-operations per second (GOps/s), both on a logarithmic scale. The computational precision of the processing capability is depicted by the geometric shape used; the computational precision spans from analog and single-bit int1 to four-byte int32 and from two-byte fp16 to eight-byte fp64. Precision entries that show two types denote the precision of the multiplication operations on the left and the precision of the accumulate/addition operations on the right (for example, fp16.32 corresponds to fp16 for multiplication and fp32 for accumulate/add). The form factor is depicted by color, which shows the package for which peak power is reported: blue corresponds to a single chip; orange corresponds to a card; and green corresponds to an entire system (single-node desktop and server systems). This survey is limited to single-motherboard, single-memory-space systems. Finally, hollow geometric markers denote peak performance for inference-only accelerators, while solid geometric markers denote peak performance for accelerators that are designed to perform both training and inference.
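To illustrate how such a plot is encoded, the following minimal matplotlib sketch uses invented placeholder entries (not data points from the survey): marker shape stands in for precision, color for form factor, and hollow versus filled markers for inference-only versus training-and-inference.

```python
import matplotlib.pyplot as plt

# Placeholder entries: (label, peak power in W, peak GOps/s, marker, color, filled)
entries = [
    ("chip-int8-inference", 2.0,   4e3, "s", "tab:blue",   False),
    ("card-fp16-training",  300.0, 3e5, "o", "tab:orange", True),
    ("system-fp64-training", 6.5e3, 2e7, "^", "tab:green", True),
]

fig, ax = plt.subplots()
for label, watts, gops, marker, color, filled in entries:
    # Hollow markers (facecolors="none") mark inference-only accelerators.
    ax.scatter(watts, gops, marker=marker, s=80,
               facecolors=color if filled else "none", edgecolors=color)
    ax.annotate(label, (watts, gops), textcoords="offset points", xytext=(5, 5))

ax.set_xscale("log")   # both axes logarithmic, as in Figure 2
ax.set_yscale("log")
ax.set_xlabel("Peak Power (W)")
ax.set_ylabel("Peak Performance (GOps/s)")
plt.show()
```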
The survey begins with the same scatter plot that we have
compiled for the past three years. As we did last year, to save
space, we have summarized some of the important metadata
of the accelerators, cards, and systems in Table I, including
the label used in Figure 2 for each of the points on the
graph; many of the points were brought forward from last
year’s plot, and some details of those entries are in [9].
There are several additions, which we will cover below. In Table I, most of the columns and entries are self-explanatory. However, there are two Technology entries that may
not be: dataflow and PIM. Dataflow processors are custom-
designed processors for neural network inference and training.
Since neural network training and inference computations can
be entirely deterministically laid out, they are amenable to
dataflow processing in which computations, memory accesses,
and inter-ALU communication actions are explicitly/statically
programmed or “placed-and-routed” onto the computational
hardware. Processor in memory (PIM) accelerators integrate
processing elements with memory technology. Among such
PIM accelerators are those based on an analog computing
technology that augments flash memory circuits with in-place
analog multiply-add capabilities. Please refer to the references
for the Mythic and Gyrfalcon accelerators for more details on
this innovative technology.
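As a rough intuition for the analog in-memory approach, consider the toy numerical sketch below (an illustration we add here, not a model of the Mythic or Gyrfalcon hardware): weights sit in flash cells as conductances, input activations are applied as voltages, the cell currents sum on a shared line so an entire dot product happens in place, and the finite resolution of the output ADC is what bounds precision. The full-scale range and 8-bit resolution below are assumed values for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.uniform(-1, 1, 128)   # weights stored as cell conductances
x = rng.uniform(-1, 1, 128)   # input activations applied as voltages

# Currents from all cells sum on a shared bit line: an in-place dot product.
analog_sum = np.dot(w, x)

# The analog result must pass through a finite-resolution ADC; here an
# assumed 8-bit quantizer over an assumed full-scale range of +/- 16.
full_scale, levels = 16.0, 2**8
step = 2 * full_scale / levels
digitized = np.clip(np.round(analog_sum / step) * step, -full_scale, full_scale)

print(f"exact: {analog_sum:.4f}  quantized: {digitized:.4f}")
```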
Finally, a reasonable categorization of accelerators follows their intended application, and the five categories are shown as ellipses on the graph, which roughly correspond to performance and power consumption: Very Low Power for speech processing, very small sensors, etc.; Embedded for cameras, small UAVs and robots, etc.; Autonomous for driver assist services, autonomous driving, and autonomous robots; Data Center Chips and Cards; and Data Center Systems.
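For readers reproducing this categorization, a simple sketch that bins an accelerator by its peak power draw follows; the thresholds here are illustrative approximations of the ellipse boundaries, not values stated in the survey.

```python
def power_category(peak_watts: float) -> str:
    """Illustrative power-based binning; boundary values are approximate."""
    if peak_watts < 1:
        return "Very Low Power"      # speech processing, very small sensors
    if peak_watts < 20:
        return "Embedded"            # cameras, small UAVs and robots
    if peak_watts < 100:
        return "Autonomous"          # driver assist, autonomous driving/robots
    if peak_watts < 1000:
        return "Data Center Chips and Cards"
    return "Data Center Systems"

print(power_category(0.3))   # Very Low Power
print(power_category(75))    # Autonomous
```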
For most of the accelerators, the descriptions and commentaries have not changed since last year, so please refer to the last two years' papers for those descriptions and commentaries. There are, however, several new releases that were not covered by past papers, which are covered here.
• Axelera, a Dutch embedded systems startup, reported the results of an embedded test chip that they have produced [35]. They claim both digital and analog design capabilities, and this test chip was made to test the extent of their digital design capabilities. They expect to add analog (probably flash) design elements in upcoming efforts.
• Maxim Integrated has released a system-on-chip (SoC) for ultra low power applications called the MAX78000 [74]–[76], which includes an ARM CPU core, a RISC-V CPU core, and an AI accelerator.