1 High-Throughput Flexible Belief Propagation List Decoder for Polar Codes

2025-04-30 0 0 3.78MB 14 页 10玖币
侵权投诉
1
High-Throughput Flexible Belief Propagation
List Decoder for Polar Codes
Yuqing Ren, Yifei Shen, Leyu Zhang, Andreas Toftegaard Kristensen, Alexios
Balatsoukas-Stimming, Member, IEEE, Andreas Burg, Senior Member, IEEE, Chuan Zhang, Senior Member, IEEE
Abstract—Owing to its high parallelism, belief propagation
(BP) decoding is highly amenable to high-throughput implemen-
tations and thus represents a promising solution for meeting
the ultra-high peak data rate of future communication systems.
However, for polar codes, the error-correcting performance of
BP decoding is far inferior to that of the widely used CRC-aided
successive cancellation list (SCL) decoding algorithm. To close the
performance gap to SCL, BP list (BPL) decoding expands the
exploration of candidate codewords through multiple permuted
factor graphs (PFGs). From an implementation perspective,
designing a unified and flexible hardware architecture for BPL
decoding that supports various PFGs and code configurations
presents a big challenge. In this paper, we propose the first
hardware implementation of a BPL decoder for polar codes and
overcome the implementation challenge by applying a hardware-
friendly algorithm that generates flexible permutations on-the-fly.
First, we derive the graph selection gain and provide a sequential
generation (SG) algorithm to obtain a near-optimal PFG set.
We further prove that any permutation can be decomposed
into a combination of multiple fixed routings, and we design
a low-complexity permutation network to satisfy the decoding
schedule. Our BPL decoder not only has a low decoding latency
by executing the decoding and permutation generation in parallel,
but also supports an arbitrary list size without any area overhead.
Experimental results show that, for length-1024 polar codes with
a code rate of one-half, our BPL decoder with 32 PFGs has
a similar error-correcting performance to SCL with a list size
of 4and achieves a throughput of 25.63 Gbps and an area
efficiency of 29.46 Gbps/mm2at SNR = 4.0dB, which is
1.82×and 4.33×faster than the state-of-the-art BP flip and
SCL decoders, respectively.
Index Terms—polar codes, high-throughput, belief propagation
list (BPL) decoding, permuted factor graph, permutation, auto-
morphism ensemble, hardware implementation.
I. INTRODUCTION
P
OLAR codes, proposed by Arıkan in [1], have become
an integral part of 5G new radio (NR), where they
were ratified as the standard codes for the control channels
of 5G enhanced mobile broadband (eMBB) scenarios [2].
Along with the invention of polar codes, Arıkan introduced
successive cancellation (SC) decoding and belief propagation
Y. Ren, Y. Shen, A. T. Kristensen, and A. Burg are with the Telecom-
munications Circuits Laboratory (TCL),
´
Ecole Polytechnique F
´
ed
´
erale
de Lausanne (EPFL), Lausanne 1015, Switzerland (email:
{
yuqing.ren,
yifei.shen, andreas.kristensen, andreas.burg
}
@epfl.ch). Corresponding author:
Andreas Burg.
Y. Shen, L, Zhang and C. Zhang are with the LEADS of Southeast University,
the National Mobile Communications Research Laboratory, and the Purple
Mountain Laboratories, Nanjing 210096, China (email: chzhang@seu.edu.cn).
A. Balatsoukas-Stimming is with the Department of Electrical Engineering,
Eindhoven University of Technology, 5600 MB Eindhoven, The Netherlands
(email: a.k.balatsoukas.stimming@tue.nl).
(BP) decoding. Following the evolution of communication
scenarios, both SC and BP decoding led the development of
polar decoding algorithms and implementations, which were
extended into a series of advanced polar decoders such as
SC list (SCL) [3]–[9], BP list (BPL) [10]–[18], and BP flip
(BPF) [19]–[22] decoders.
While the original SC decoding algorithm can achieve
channel capacity at infinite code lengths, it shows poor error-
correcting performance with practical finite code lengths. To
improve the error-correcting performance of SC decoding, SCL
decoding was proposed in [3] to keep a list of up to
L
candi-
date codewords. Additionally, when concatenated with cyclic
redundancy check (CRC) codes [4], polar codes with SCL
decoding outperform low-density parity-check (LDPC) and
Turbo codes in terms of the error-correcting performance [23].
To satisfy the low-latency and high throughput requirements of
eMBB scenarios, node-based fast SCL decoders [5]–[9] focus
on exploiting special constituent codes [24]–[28], which help
to avoid traversing the lower stages of the decoding tree to
provide a significant reduction in decoding latency compared
to conventional bit-wise SCL decoders. The state-of-the-art
(SOA) node-based SCL decoder [9] with a list size (
L= 8
)
achieves a throughput of more than
2.94
Gbps, which fits
the reliability, latency, and throughput requirements of eMBB
scenarios. However, when considering the ultra-high peak data
rate requirements of future communication systems [29], SC-
based decoders become impractical due to the serial processing
inherent in these algorithms [5]–[9].
In contrast to SC-based decoding, BP decoding is an
inherently parallel algorithm. BP decoding can thus be im-
plemented easily in a multi-stage factor graph in pursuit of
a much higher throughput [30]. Additionally, BP decoding
has the potential to realize iterative detection and decoding
to achieve better system performance than separate detection
and decoding [31], [32], which further raises the interest in
BP decoding for academia and industry. Though the error-
correcting performance of BP decoding improves as the
iteration number increases, it is still far behind the SCL
performance. BPF decoding [19]–[22] and BPL decoding [10]–
[18] are two advanced BP algorithms that can approach the
performance of SCL by expanding the exploration of candidate
codewords. BPF decoding guesses the positions of error-prone
bits and sequentially corrects them in additional decoding
attempts. Unfortunately, online identification of error-prone
bits [20]–[22] through sorting and post-processing of channel
messages increases the hardware complexity and degrades
the maximum operating frequency [22]. Alternatively, BPL
arXiv:2210.13887v2 [cs.IT] 19 Mar 2023
2
decoding proposed in [11] tries to decode on multiple permuted
factor graphs (PFGs), where the number of possible PFGs
is
n! (n= log2N)
for length-
N
polar codes. Decoding
schedules of BPL can be divided into parallel [11]–[14] and
serial schedules [17], respectively. In parallel BPL decoding,
L
independent BP decoders operate in parallel (each BP decoder
works on a unique PFG) and the optimal codeword with the
minimum Euclidean distance to the received signals is selected
from the
L
identified candidate codewords. However, parallel
BPL decoding has very poor hardware utilization, especially
for large list sizes. To avoid the high hardware consumption
caused by the parallel architecture, the authors of [17] proposed
a serial BPL decoding schedule, in which shuffling the input
LLRs can be substituted for permutations of the factor graph
stages. This hardware-friendly decoding strategy allows BPL
decoding to reuse a single BP decoder at the cost of merely
shuffling the input LLRs into a specific order for each PFG.
To improve the error-correcting performance of BPL decod-
ing, numerous researchers have explored methods of optimizing
the PFG selection, including empirical methods [11], [17],
[33] and analytical methods [14]–[16]. It is noteworthy that
the authors of [14] first derived the permutation gain for
parallel BPL decoding, which provides the inspiration for
analytically solving the optimal PFG selection. In view of
hardware implementations, many works of BP decoders have
been presented in [30], [34]–[39]. Compared to the classical
single-column BP architectures [30], the SOA double-column
bidirectional-propagation architecture [39] instantiates two
processing element (PE) arrays and propagates the left-to-
right and right-to-left messages simultaneously to improve the
throughput. Moreover, the most challenging task for the BPL
decoder is the implementation of flexible permutations since
the PFG selection algorithms [14]–[16], [40] are generally
dynamic, corresponding to varying code configurations or
channel environments. Even if based on area-efficient serial
decoding, the BPL decoder still needs to support the generation
of flexible permutations by shuffling the input LLRs into a
specific order for each PFG. A straightforward method is to
utilize the Bene
ˇ
s network [41], [42], which is an optimal non-
blocking network that can achieve any arbitrary permutation.
However, the design space of permutations is
n!
instead of
N!
for length-
N
polar codes, and the control signals of the
Bene
ˇ
s network are difficult to generate on-the-fly for each
PFG. It is not efficient to adopt the Bene
ˇ
s network in the BPL
decoder. In summary, there are thus two critical problems for
the BPL decoder:
How to select the optimal PFG set from
n!
PFGs for
length-Npolar codes?
How to efficiently implement flexible permutations for
BPL decoding in hardware?
It is further noteworthy that BPL decoding is a particular case of
a generalized automorphism ensemble (AE) decoding, in which
we can deploy the SC, SCL, or BP decoding on multiple PFGs
to achieve ML performance of polar or Reed-Muller (RM)
codes [43], [44]. Hence, the solutions to these two problems
are significant for both BPL and for generalized AE decoding.
Contributions:
In this paper, we present the first BPL implementation, which
solves the aforementioned two problems, i.e., the use of near-
optimal PFG sets and the generation of flexible permutations.
Our contributions comprise the following:
We derive the block error probability of serial BPL
decoding and present a criterion for determining the
best PFG set. Then, we propose a sequential generation
(SG) algorithm that can efficiently obtain a near-optimal
PFG set. Simulations show that our BPL decoder with
L= 32
achieves similar error-correcting performance to
SCL with L= 4.
We propose a hardware-friendly algorithm using low-
complexity matrix decomposition to generate flexible
permutation routings for all PFGs. To this end, we
provide a mathematical model for permutations and
demonstrate that the permutation routing of each PFG can
be decomposed into a combination of
n1
fixed sub-
routings. This decomposition process can be done online.
We present the first hardware architecture of a BPL
decoder, based on the double-column bidirectional-
propagation scheme [39], that incorporates the aforemen-
tioned flexible permutation generator. To improve the
throughput of the BPL decoder, we adopt a decoupled
strategy that enables BP decoding and permutation gener-
ation to be executed simultaneously. It is noteworthy that
our decoder can increase the list size arbitrarily without
any additional area overhead. Synthesis results show that,
for
L= 32
, our decoder can achieve a throughput of
25.63
Gbps with an area efficiency of
29.46
Gbps/mm
2
at SNR
= 4
dB, which outperforms the SOA BP and BPF
decoders [20], [21], [38], [39], [45].
The remainder of this paper is organized as follows. Sec-
tion II reviews the background of polar codes, BP decoding, and
BPL decoding. Section III analyses the permutation gain for
serial BPL decoding and presents a graph selection algorithm
for a near-optimal PFG set. In Section IV, a hardware-friendly
algorithm for any permutation generation is proposed. Section V
presents our BPL decoder architecture with several advanced
techniques. Section VI provides our implementation results
and compares them with the SOA polar decoders. Finally,
Section VII concludes this paper.
II. PRELIMINARIES
Notation: Throughout this paper, we use the following
symbol definitions. Boldface lowercase letters
u
denote vectors,
where
ui
means the
i
-th element of
u
and
uj
i
denotes the
sub-vector
[uiui+1 . . . uj], i j
. If
i>j
,
uj
i=
.
Boldface uppercase letters
B
denote matrices, where
Bij
and
bj
denote the element at the
i
-th row and
j
-th column of
B
and the
j
-th column of
B
, respectively. In terms of the
factor graph for polar codes with length
N= 2n
, we use
πo
,
[m0m1m2. . . mn1]
, and
[0 1 2 . . . n 1]
to represent
the original factor graph (OFG), its stages, and its stage order,
respectively. Similarly, we use
π
,
[mπ0mπ1mπ2. . . mπn1]
,
and
[π0π1π2. . . πn1]
to denote any other PFG, its stages,
and its stage order, respectively. If
L
is a set of
L
PFG
3
3
u
6
u
7
u
1
x
2
x
3
x
4
x
5
x
6
x
7
x
1
m
2
m
1
u
2
u
0
m
0
u
4
u
5
u
PE
0
0
0
0
0
0
0
0
0
0
0
0
1
1
1
1
1
1
1
1
1
1
1
1
(000)
(001)
(010)
(011)
(100)
(101)
(110)
(111)
(000)
(001)
(010)
(011)
(100)
(101)
(110)
(111)
11
t
i,j
+
L
1
t
i,j+
R
1
21
j
t
i ,j
++
L
21
j
t
i ,j++
R
2j
t
i ,j+
R
2j
t
i ,j+
L
t
i,j
R
t
i,j
L
Fig. 1. The OFG for length-
8
polar codes, where one PE is marked in red
and F={0,1,2,4}is marked in grey.
candidates,
{˜π0˜π1. . . ˜πL1}
,
|L| =L
means its cardinality.
Note that all indices related to decoding start from
0
. The
hard decision function is defined as
HD(x)=1
if
x < 0
and
HD(x) = 0
if
x0
. We adopt the following parameters
for polar codes,
N
is the code length,
K
is the number of
message bits,
R=K/N
the code rate,
P
the number of CRC
bits,
K0=K+P
the number of information bits with the
CRC bits attached. The frozen and unfrozen bit set indices are
denoted as
F
and
A
, respectively, and we refer to a code as
an
(N, K)
polar code. As in this work we only consider polar
codes that are concatenated with CRC codes, we use the term
SCL decoding to refer to CRC-aided SCL decoding for brevity.
A. Construction and Encoding of Polar Codes
Given an input bit sequence
u
, the encoded vector
x
is
generated by
x=u·GN
, where
GN=Fn
denotes the
Kronecker power of the kernel
F= [ 1 0
1 1 ]
. Based on the
principle of channel polarization [1], the
N
bits in
u
correspond
to
N
coordinated bit channels with different reliabilities, where
the
K0
most reliable bit channels transmit unfrozen bits with
CRC attached and the remaining
NK0
bit channels transmit
frozen bits, typically set to a value of 0. Note that the metric
used to determine the bit channel reliability has an impact on
A
and influences the performance of polar codes. For 5G NR [46],
a universal reliability sequence is applied to formulate
A
with
K0
bits for uplink (UL) and downlink (DL) channels. Besides,
a novel polar code construction framework tailored to a given
decoding algorithm based on a genetic algorithm (GenAlg) was
introduced in [47], where populations of unfrozen sets evolve
based on the error-correcting performance of a given decoder.
B. BP Decoding of Polar Codes on the Factor Graph
The BP algorithm is a classical iterative algorithm to
calculate the marginal probability by the sum-product (SP)
equations on a factor graph [48]. Motivated by BP decoding
for RM codes, Arıkan first proposed BP decoding for polar
codes on the generator matrix-based factor graph [49]. The
OFG structure with three stages [
m0m1m2
] is shown in Fig. 1.
Namely, an
(N, K)
polar code is represented as an
n
-stage
factor graph, and each stage has
N/2
PEs. Two types of LLR
messages (left-to-right
R
and right-to-left
L
) are propagated over
Decoding
on new PFG Detection
Yes
No Detection
Yes
No ...
Input
...
Decoding
on new PFG
Decoding
on new PFG Decoding
on new PFG
Decoding
on new PFG
Output
Fig. 2. Overall framework for serial BPL decoding with the detection.
PEs on the factor graph. At the
t
-th iteration, for
j= 0, . . . , n
,
Rt
j
and
Lt
j
can be denoted as the
j
-th column of
R
- and
L
-
messages, respectively, where
Rt
i,j
and
Lt
i,j
denote the messages
at the
i
-th bit index of the
j
-th column, respectively. Each PE
propagates R- and L-messages as follows [50].
Lt
i,j =g(Lt1
i,j+1,Lt1
i+2j,j+1 +Rt
i+2j,j , βL),
Lt
i+2j,j =g(Lt1
i,j+1,Rt
i,j , βL) + Lt1
i+2j,j+1,
Rt
i,j+1 =g(Rt
i,j ,Lt1
i+2j,j+1 +Rt
i+2j,j , βR),
Rt
i+2j,j+1 =g(Rt
i,j ,Lt1
i,j+1, βR) + Rt
i+2j,j .
(1)
where we adopt the offset-MS (OMS) equation [20], [51] to
approximate the SP equation for all iterative BP decoders
It can be implemented easily in hardware to approach the
performance of the SP, where
g(a, b, β) = sgn(a)·sgn(b)·
max(min(|a|,|b|)β, 0)
and
[βRβL] = [0.25 0]
. At the
beginning of BP decoding,
R0
0
is initialized as a-priori
+
or
0
according to the bit channel allocation of
F
and
L0
n
is
initialized as a-posterior LLR values from the received signals
y
, i.e.,
ln Pr(yi|xi=0)
Pr(yi|xi=1) ,0iN1
.
R
- and
L
-messages of
other stages on the factor graph are initialized as
0
. When the
maximum number of iteration
Imax
is reached, the HD results
ˆu
are estimated based on the decision LLRs (
RImax1
0+LImax1
0
).
In the following, we omit the iteration index
t= 0
of the initial
LLRs R0
0and L0
nfor brevity.
C. BPL Decoding of Polar Codes
BPL decoding [11] is an efficient algorithm to enhance the
error-correcting performance of BP decoding, which executes
multiple BP decoding procedures on multiple PFGs either
in parallel [11]–[14] or serially [17]. Parallel BPL decoding
instantiates a set
L
of
L
independent BP decoders (each BP
decoder works on a unique PFG), which leads to a poor
hardware utilization since only one result is finally retained.
Alternatively, serial BPL decoding can reuse a single BP
decoder, which is illustrated in Fig. 2. If a BP decoding attempt
fails to pass the detection within
Imax
iterations, serial BPL
decoding activates the decoding on next PFG. Note that due
to the detection in Fig. 2, the miss and error-detection events
for each PFG are introduced, which are denoted as
M
and
D
, respectively.
1
In the following, all BPL decoders are used
in the serial structure unless stated otherwise. With regard to
the PFG selection, previous works have found that the OFG
always yields the best error-correcting performance [14] and the
PFGs which fix more left stages and only permute the right-
1M
represents a wrongly estimated information sequence which passes
the detection, and
D
represents the event that fails to pass the detection. We
generally use CRC detection as the detection strategy in serial BPL decoding.
摘要:

1High-ThroughputFlexibleBeliefPropagationListDecoderforPolarCodesYuqingRen,YifeiShen,LeyuZhang,AndreasToftegaardKristensen,AlexiosBalatsoukas-Stimming,Member,IEEE,AndreasBurg,SeniorMember,IEEE,ChuanZhang,SeniorMember,IEEEAbstract—Owingtoitshighparallelism,beliefpropagation(BP)decodingishighlyamenabl...

展开>> 收起<<
1 High-Throughput Flexible Belief Propagation List Decoder for Polar Codes.pdf

共14页,预览3页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:14 页 大小:3.78MB 格式:PDF 时间:2025-04-30

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 14
客服
关注