1 Towards the Multiple Constant Multiplication at Minimal Hardware Cost

2025-04-28 1 0 480.24KB 10 页 10玖币

侵权投诉

Towards the Multiple Constant Multiplication at

Minimal Hardware Cost

R´

emi Garcia and Anastasia Volkova

Abstract—Multiple Constant Multiplication (MCM) over inte-

gers is a frequent operation arising in embedded systems that

require highly optimized hardware. An efﬁcient way is to replace

costly generic multiplication by bit-shifts and additions, i. e. a

multiplierless circuit. In this work, we improve the state-of-

the-art optimal approach for MCM, based on Integer Linear

Programming (ILP). We introduce a new lower-level hardware

cost, based on counting the number of one-bit adders and

demonstrate that it is strongly correlated with the LUT count.

This new model for the multiplierless MCM circuits permitted us

to consider intermediate truncations that permit to signiﬁcantly

save resources when a full output precision is not required. We

incorporate the error propagation rules into our ILP model

to guarantee a user-given error bound on the MCM results.

The proposed ILP models for multiple ﬂavors of MCM are

implemented as an open-source tool and, combined with the

FloPoCo code generator, provide a complete coefﬁcient-to-VHDL

ﬂow. We evaluate our models in extensive experiments, and

propose an in-depth analysis of the impact that design metrics

have on actually synthesized hardware.

Index Terms—multiple constant multiplication, multiplierless

hardware, ILP, datapath optimization

I. INTRODUCTION

Multiplications by integer constants arise in many numerical

algorithms and applications. In particular, algorithms that tar-

get embedded systems often involve Fixed-Point (FxP) num-

bers which can be assimilated to integers. These algorithms

range from dot-product evaluation for deep neural networks

to more complex algorithms such as digital ﬁlters.

In order to save hardware resources, the knowledge on the

constants to multiply with can be used in dedicated multipli-

erless architectures, instead of costly generic multipliers [1].

The shift-and-add approach is the privileged method to reduce

hardware cost, it consists in replacing multiplications by

additions/subtractions and bit-shifts, which are multiplications

of the data by a power of two that can be hardwired for a

negligible cost. For example, multiplying an integer variable

xby the constant 7can be rewritten as 7x= 23x−x, reducing

the cost to a single bit-shift by three positions to the left and

a subtraction, instead of a multiplication.

Given a set of target constants to multiply with, ﬁnding

the implementation with the lowest cost is called the Multiple

Constant Multiplication (MCM) problem. Typical way to

tackle the problem is to ﬁnd shift-and-add implementations

represented using adder graphs, as in Fig. 1, that describe the

multiplierless solutions with the minimum number of adders.

In the following, this problem will be referred to as the MCM-

Adders. The main objective of this work is to ﬁrst improve

R. Garcia and A. Volkova were with Nantes Universit´

e, CNRS, LS2N,

44000 Nantes. Email: ﬁrstname.lastname@univ-nantes.fr

the existing approaches for the MCM-Adders problem, then

push towards ﬁner-grained hardware cost metrics counting

number of one-bit adders (the MCM-Bits problem), and

ﬁnally, use truncations in internal data paths to considerably

save resources (the tMCM problem).

It is straightforward to obtain a ﬁrst shift-and-add solution

from the binary representation of the constant. A greedy

algorithm based on the Canonical Signed Digit (CSD) rep-

resentation permits to reduce the number of adders [2], [3].

Many heuristics enhance the results obtained with the CSD

method [4]–[8], but heuristics do not provide any guarantee on

the solution quality. Optimal approaches for the MCM-Adders

can be roughly divided into two categories: (i) ﬁrst approaches

based on hypergraphs [9] and Integer Linear Programming

(ILP) model [10], which can be solved by generic solvers;

(ii) dedicated optimal algorithms based on branch and bound

technique proposed and further developed by Aksoy et al. [11].

In this work, we improve and extend the results of the ﬁrst

category of approaches, since ILP-based modeling offers a

better versatility for extensions than dedicated algorithms, and

permits to rely on the efﬁciency of generic solvers.

The state-of-the-art ILP-based model for the MCM-Adders

was proposed by Kumm in [12]. This method ﬁnds optimal

solutions in terms of the number of adders in reasonable time,

and has been adapted to SAT/SMT solvers [13] for single

constant multiplication. However, we found an error in the

model which makes it miss some optimal solutions.

In this work, we use [12] as basis, correct the modeling

error for the MCM-Adders problem and build upon it to

solve other ﬂavors of the MCM. In particular, the number

of cascaded adders, called the adder depth (AD), directly

impacts the delay of the circuit, and is hence desired to be

bounded. We ﬁrst encode the AD count in our ILP model,

and secondly propose, for the ﬁrst time, a new bi-objective

formulation called MCMAD, which minimizes the number of

adders and the AD simultaneously.

One of the main contributions of this paper is solving the

MCM for a low-level hardware metric based on counting the

one-bit adders, when the word length of the input xis known

a priori. Assume adder graphs in Fig. 1 taking a 3-bit input x.

While both require only three adders, the solution in Fig. 1a

requires 22 one-bit adders and the other in Fig. 1b uses only 9.

While classical bit-level optimization approaches are iterative

and require synthesis and simulations [4], [14], solving the

MCM with a ﬁne-grained cost model is done only once. By

one-bit adders, we gather both half and full adders as logic

elements having roughly the same cost. This low-level metric

has been discussed for decades, see [4] in which a heuristic

for ﬁxing the topology was presented. To our knowledge, we

arXiv:2210.02742v2 [cs.AR] 10 Oct 2022

(a) AD = 3 (b) AD = 2

Fig. 1: Different adder graph topologies computing the same

outputs: 49xand 51x

propose the ﬁrst optimal approach solving the MCM problem

targeting the one-bit adders (the MCM-Bits problem).

We further extend MCM-Bits model to solve the Truncated

MCM (tMCM) problem. Indeed, the output results of the

computations do not necessarily need to be full-precision

and in practice are often rounded post-MCM. Focusing on a

low-level metric allows to add intermediate truncations [15]–

[17] that will save resources and not waste area and time to

compute bits that will have no impact on the rounded result.

At the same time, our goal is to respect a user-given absolute

error on the result.

Some previous works address the tMCM but applied to a

ﬁxed adder-graph and either do not give a guarantee on the

output error, e. g., the heuristic-based approach [15], or overes-

timate and sometimes wrongly-compute the output error, e. g.,

ILP-based [16]. However, as demonstrated in Fig. 4, some

adder graph topologies are better suited for truncations than

others. Hence, solving directly for tMCM and delegating the

design exploration to an ILP solver, is a better approach. In this

paper, we extend our preliminary work [17] for combining,

for the ﬁrst time, adder graph optimization with internal

truncations. In particular, we tighten the error bounds and treat

several corner cases, compared to [17].

With this paper, we aim at providing a tool that democratizes

access to MCM at minimal hardware cost. We mix the

optimization techniques to model the problem, have an in-

depth look at the hardware addition to provide ﬁne-grain cost

metrics and provide a sound error-analysis to give numerical

guarantees on the computed output. The main ideas of our

approach are presented in Section II and in the next sections

we go through the details of the ILP models. In Section VI,

we demonstrate the efﬁciency of our approach providing

optimization results and obtained hardware comparisons.

II. BIRD VIEW

Our approach is based on ILP modeling, which consists in

stating objectives and constraints as linear equations involving

integer and binary variables. We chose to limit our possibilities

to linear equations because it allows using efﬁcient and robust

solving approaches embedded in commercial or open-source

solvers such as CPLEX, Gurobi, GLPK, etc. Our end goal is

to provide a tool which, given the target constants to multiply

with, the choice of a cost function and several associated

parameters, builds the corresponding ILP model. Solving the

model with your favorite solver results in an adder graph

description, which can be passed to the ﬁxed- and ﬂoating-

point core generator FloPoCo [18] to generate the VHDL code

implementing the MCM circuit.

From the modeling point of view, we search for an adder

graph that computes the product of the input xby given target

constants. In this work, we center the model of an adder graph

around the adders and their associated integer values, called

fundamentals. The rules of construction of an adder graph are

simple:

•for every target constant, there exists one fundamental

equal to its value;

•every fundamental is a sum of its signed and potentially

shifted left and right inputs, which can be other funda-

mentals or the input x;

•every used fundamental can be traced back to the input

x, meaning that the adder graph topology holds.

The search-space for the fundamentals is deduced from

the target constants: a tight upper bound on the number of

fundamentals can be deduced using heuristics [7], [19]; and

the maximum value fundamentals can take is often restricted in

practice by the word length of the largest coefﬁcient. To solve

the MCM-Adders problem correctly, we encode the above

rules as linear constraints. As a result of resolution, we obtain

a sequence of fundamentals, e. g., for Fig. 1a this sequence is

{1,7,49,51}.

On top of that, for the MCM-Bits and tMCM problems,

each adder has an associated one-bit adder cost which must

be correctly computed depending on the values of the inputs.

Furthermore, for the tMCM problem, for each adder we

associate potential truncations for its left and right operands

which permit to reduce the adder cost. However, as each

truncation induces an error, the model of error propagation

should be incorporated as a set of constraints and the output

errors is bounded by a user-given parameter. The challenge is

to consider all possible cases and not overestimate the one-bit

adders cost and errors.

The main challenges are to translate these objective func-

tions and constraints into linear equations over integer/binary

variables. In the following, we give details for that process.

III. OPTIMAL MULTIPLE CONSTANT MULTIPLICATION

COUNTING ADDERS

A. The base model for MCM

The main challenge behind this model is to be able to

formally state the constraints ﬁxing the adder graph topology

with linear equations. For conciseness, we also extensively

use the so-called indicator constraints, which are supported

by most modern MILP solvers, and have the form of:

ax ≤bif y= 1,(1)

where aand bare constants, xis an integer variable and yis

a binary variable. These indicator constraints can be used as

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

10 玖币 0人已下载

立即下载

摘要：

1TowardstheMultipleConstantMultiplicationatMinimalHardwareCostR´emiGarciaandAnastasiaVolkovaAbstractMultipleConstantMultiplication(MCM)overinte-gersisafrequentoperationarisinginembeddedsystemsthatrequirehighlyoptimizedhardware.Anefcientwayistoreplacecostlygenericmultiplicationbybit-shiftsandadditi...

展开>> 收起<<

1 Towards the Multiple Constant Multiplication at Minimal Hardware Cost.pdf

共10页,预览2页

还剩页未读，继续阅读

声明：本站为文档C2C交易模式，即用户上传的文档直接被用户下载，本站只是中间服务平台，本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私，请立即通知玖贝云文库，我们立即给予删除！

1 Towards the Multiple Constant Multiplication at Minimal Hardware Cost

相关推荐

开通VIP享超值会员特权

作者详情

相关内容

热门标签

举报选择: