1 Towards the Multiple Constant Multiplication at Minimal Hardware Cost

2025-04-28 0 0 480.24KB 10 页 10玖币
侵权投诉
1
Towards the Multiple Constant Multiplication at
Minimal Hardware Cost
R´
emi Garcia and Anastasia Volkova
Abstract—Multiple Constant Multiplication (MCM) over inte-
gers is a frequent operation arising in embedded systems that
require highly optimized hardware. An efficient way is to replace
costly generic multiplication by bit-shifts and additions, i. e. a
multiplierless circuit. In this work, we improve the state-of-
the-art optimal approach for MCM, based on Integer Linear
Programming (ILP). We introduce a new lower-level hardware
cost, based on counting the number of one-bit adders and
demonstrate that it is strongly correlated with the LUT count.
This new model for the multiplierless MCM circuits permitted us
to consider intermediate truncations that permit to significantly
save resources when a full output precision is not required. We
incorporate the error propagation rules into our ILP model
to guarantee a user-given error bound on the MCM results.
The proposed ILP models for multiple flavors of MCM are
implemented as an open-source tool and, combined with the
FloPoCo code generator, provide a complete coefficient-to-VHDL
flow. We evaluate our models in extensive experiments, and
propose an in-depth analysis of the impact that design metrics
have on actually synthesized hardware.
Index Terms—multiple constant multiplication, multiplierless
hardware, ILP, datapath optimization
I. INTRODUCTION
Multiplications by integer constants arise in many numerical
algorithms and applications. In particular, algorithms that tar-
get embedded systems often involve Fixed-Point (FxP) num-
bers which can be assimilated to integers. These algorithms
range from dot-product evaluation for deep neural networks
to more complex algorithms such as digital filters.
In order to save hardware resources, the knowledge on the
constants to multiply with can be used in dedicated multipli-
erless architectures, instead of costly generic multipliers [1].
The shift-and-add approach is the privileged method to reduce
hardware cost, it consists in replacing multiplications by
additions/subtractions and bit-shifts, which are multiplications
of the data by a power of two that can be hardwired for a
negligible cost. For example, multiplying an integer variable
xby the constant 7can be rewritten as 7x= 23xx, reducing
the cost to a single bit-shift by three positions to the left and
a subtraction, instead of a multiplication.
Given a set of target constants to multiply with, finding
the implementation with the lowest cost is called the Multiple
Constant Multiplication (MCM) problem. Typical way to
tackle the problem is to find shift-and-add implementations
represented using adder graphs, as in Fig. 1, that describe the
multiplierless solutions with the minimum number of adders.
In the following, this problem will be referred to as the MCM-
Adders. The main objective of this work is to first improve
R. Garcia and A. Volkova were with Nantes Universit´
e, CNRS, LS2N,
44000 Nantes. Email: firstname.lastname@univ-nantes.fr
the existing approaches for the MCM-Adders problem, then
push towards finer-grained hardware cost metrics counting
number of one-bit adders (the MCM-Bits problem), and
finally, use truncations in internal data paths to considerably
save resources (the tMCM problem).
It is straightforward to obtain a first shift-and-add solution
from the binary representation of the constant. A greedy
algorithm based on the Canonical Signed Digit (CSD) rep-
resentation permits to reduce the number of adders [2], [3].
Many heuristics enhance the results obtained with the CSD
method [4]–[8], but heuristics do not provide any guarantee on
the solution quality. Optimal approaches for the MCM-Adders
can be roughly divided into two categories: (i) first approaches
based on hypergraphs [9] and Integer Linear Programming
(ILP) model [10], which can be solved by generic solvers;
(ii) dedicated optimal algorithms based on branch and bound
technique proposed and further developed by Aksoy et al. [11].
In this work, we improve and extend the results of the first
category of approaches, since ILP-based modeling offers a
better versatility for extensions than dedicated algorithms, and
permits to rely on the efficiency of generic solvers.
The state-of-the-art ILP-based model for the MCM-Adders
was proposed by Kumm in [12]. This method finds optimal
solutions in terms of the number of adders in reasonable time,
and has been adapted to SAT/SMT solvers [13] for single
constant multiplication. However, we found an error in the
model which makes it miss some optimal solutions.
In this work, we use [12] as basis, correct the modeling
error for the MCM-Adders problem and build upon it to
solve other flavors of the MCM. In particular, the number
of cascaded adders, called the adder depth (AD), directly
impacts the delay of the circuit, and is hence desired to be
bounded. We first encode the AD count in our ILP model,
and secondly propose, for the first time, a new bi-objective
formulation called MCMAD, which minimizes the number of
adders and the AD simultaneously.
One of the main contributions of this paper is solving the
MCM for a low-level hardware metric based on counting the
one-bit adders, when the word length of the input xis known
a priori. Assume adder graphs in Fig. 1 taking a 3-bit input x.
While both require only three adders, the solution in Fig. 1a
requires 22 one-bit adders and the other in Fig. 1b uses only 9.
While classical bit-level optimization approaches are iterative
and require synthesis and simulations [4], [14], solving the
MCM with a fine-grained cost model is done only once. By
one-bit adders, we gather both half and full adders as logic
elements having roughly the same cost. This low-level metric
has been discussed for decades, see [4] in which a heuristic
for fixing the topology was presented. To our knowledge, we
arXiv:2210.02742v2 [cs.AR] 10 Oct 2022
2
(a) AD = 3 (b) AD = 2
Fig. 1: Different adder graph topologies computing the same
outputs: 49xand 51x
propose the first optimal approach solving the MCM problem
targeting the one-bit adders (the MCM-Bits problem).
We further extend MCM-Bits model to solve the Truncated
MCM (tMCM) problem. Indeed, the output results of the
computations do not necessarily need to be full-precision
and in practice are often rounded post-MCM. Focusing on a
low-level metric allows to add intermediate truncations [15]–
[17] that will save resources and not waste area and time to
compute bits that will have no impact on the rounded result.
At the same time, our goal is to respect a user-given absolute
error on the result.
Some previous works address the tMCM but applied to a
fixed adder-graph and either do not give a guarantee on the
output error, e. g., the heuristic-based approach [15], or overes-
timate and sometimes wrongly-compute the output error, e. g.,
ILP-based [16]. However, as demonstrated in Fig. 4, some
adder graph topologies are better suited for truncations than
others. Hence, solving directly for tMCM and delegating the
design exploration to an ILP solver, is a better approach. In this
paper, we extend our preliminary work [17] for combining,
for the first time, adder graph optimization with internal
truncations. In particular, we tighten the error bounds and treat
several corner cases, compared to [17].
With this paper, we aim at providing a tool that democratizes
access to MCM at minimal hardware cost. We mix the
optimization techniques to model the problem, have an in-
depth look at the hardware addition to provide fine-grain cost
metrics and provide a sound error-analysis to give numerical
guarantees on the computed output. The main ideas of our
approach are presented in Section II and in the next sections
we go through the details of the ILP models. In Section VI,
we demonstrate the efficiency of our approach providing
optimization results and obtained hardware comparisons.
II. BIRD VIEW
Our approach is based on ILP modeling, which consists in
stating objectives and constraints as linear equations involving
integer and binary variables. We chose to limit our possibilities
to linear equations because it allows using efficient and robust
solving approaches embedded in commercial or open-source
solvers such as CPLEX, Gurobi, GLPK, etc. Our end goal is
to provide a tool which, given the target constants to multiply
with, the choice of a cost function and several associated
parameters, builds the corresponding ILP model. Solving the
model with your favorite solver results in an adder graph
description, which can be passed to the fixed- and floating-
point core generator FloPoCo [18] to generate the VHDL code
implementing the MCM circuit.
From the modeling point of view, we search for an adder
graph that computes the product of the input xby given target
constants. In this work, we center the model of an adder graph
around the adders and their associated integer values, called
fundamentals. The rules of construction of an adder graph are
simple:
for every target constant, there exists one fundamental
equal to its value;
every fundamental is a sum of its signed and potentially
shifted left and right inputs, which can be other funda-
mentals or the input x;
every used fundamental can be traced back to the input
x, meaning that the adder graph topology holds.
The search-space for the fundamentals is deduced from
the target constants: a tight upper bound on the number of
fundamentals can be deduced using heuristics [7], [19]; and
the maximum value fundamentals can take is often restricted in
practice by the word length of the largest coefficient. To solve
the MCM-Adders problem correctly, we encode the above
rules as linear constraints. As a result of resolution, we obtain
a sequence of fundamentals, e. g., for Fig. 1a this sequence is
{1,7,49,51}.
On top of that, for the MCM-Bits and tMCM problems,
each adder has an associated one-bit adder cost which must
be correctly computed depending on the values of the inputs.
Furthermore, for the tMCM problem, for each adder we
associate potential truncations for its left and right operands
which permit to reduce the adder cost. However, as each
truncation induces an error, the model of error propagation
should be incorporated as a set of constraints and the output
errors is bounded by a user-given parameter. The challenge is
to consider all possible cases and not overestimate the one-bit
adders cost and errors.
The main challenges are to translate these objective func-
tions and constraints into linear equations over integer/binary
variables. In the following, we give details for that process.
III. OPTIMAL MULTIPLE CONSTANT MULTIPLICATION
COUNTING ADDERS
A. The base model for MCM
The main challenge behind this model is to be able to
formally state the constraints fixing the adder graph topology
with linear equations. For conciseness, we also extensively
use the so-called indicator constraints, which are supported
by most modern MILP solvers, and have the form of:
ax bif y= 1,(1)
where aand bare constants, xis an integer variable and yis
a binary variable. These indicator constraints can be used as
摘要:

1TowardstheMultipleConstantMultiplicationatMinimalHardwareCostR´emiGarciaandAnastasiaVolkovaAbstract—MultipleConstantMultiplication(MCM)overinte-gersisafrequentoperationarisinginembeddedsystemsthatrequirehighlyoptimizedhardware.Anefcientwayistoreplacecostlygenericmultiplicationbybit-shiftsandadditi...

展开>> 收起<<
1 Towards the Multiple Constant Multiplication at Minimal Hardware Cost.pdf

共10页,预览2页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:10 页 大小:480.24KB 格式:PDF 时间:2025-04-28

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 10
客服
关注