
High-Throughput Flexible Belief Propagation List Decoder for Polar Codes

Yuqing Ren, Yifei Shen, Leyu Zhang, Andreas Toftegaard Kristensen, Alexios Balatsoukas-Stimming, Member, IEEE, Andreas Burg, Senior Member, IEEE, Chuan Zhang, Senior Member, IEEE

Abstract—Owing to its high parallelism, belief propagation (BP) decoding is highly amenable to high-throughput implementations and thus represents a promising solution for meeting the ultra-high peak data rate of future communication systems. However, for polar codes, the error-correcting performance of BP decoding is far inferior to that of the widely used CRC-aided successive cancellation list (SCL) decoding algorithm. To close the performance gap to SCL, BP list (BPL) decoding expands the exploration of candidate codewords through multiple permuted factor graphs (PFGs). From an implementation perspective, designing a unified and flexible hardware architecture for BPL decoding that supports various PFGs and code configurations is a major challenge. In this paper, we propose the first hardware implementation of a BPL decoder for polar codes and overcome this challenge by applying a hardware-friendly algorithm that generates flexible permutations on-the-fly. First, we derive the graph selection gain and provide a sequential generation (SG) algorithm to obtain a near-optimal PFG set. We further prove that any permutation can be decomposed into a combination of multiple fixed routings, and we design a low-complexity permutation network to satisfy the decoding schedule. Our BPL decoder not only has a low decoding latency, because decoding and permutation generation are executed in parallel, but also supports an arbitrary list size without any area overhead. Experimental results show that, for length-1024 polar codes with a code rate of one-half, our BPL decoder with 32 PFGs has an error-correcting performance similar to that of SCL with a list size of 4 and achieves a throughput of 25.63 Gbps and an area efficiency of 29.46 Gbps/mm² at SNR = 4.0 dB, which is 1.82× and 4.33× faster than the state-of-the-art BP flip and SCL decoders, respectively.

Index Terms—polar codes, high-throughput, belief propagation list (BPL) decoding, permuted factor graph, permutation, automorphism ensemble, hardware implementation.

I. INTRODUCTION

Polar codes, proposed by Arıkan in [1], have become an integral part of 5G new radio (NR), where they were ratified as the standard codes for the control channels of 5G enhanced mobile broadband (eMBB) scenarios [2]. Along with the invention of polar codes, Arıkan introduced successive cancellation (SC) decoding and belief propagation (BP) decoding. Following the evolution of communication scenarios, both SC and BP decoding led the development of polar decoding algorithms and implementations, which were extended into a series of advanced polar decoders such as SC list (SCL) [3]–[9], BP list (BPL) [10]–[18], and BP flip (BPF) [19]–[22] decoders.

Y. Ren, Y. Shen, A. T. Kristensen, and A. Burg are with the Telecommunications Circuits Laboratory (TCL), École Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland (email: {yuqing.ren, yifei.shen, andreas.kristensen, andreas.burg}@epfl.ch). Corresponding author: Andreas Burg.

Y. Shen, L. Zhang, and C. Zhang are with the LEADS of Southeast University, the National Mobile Communications Research Laboratory, and the Purple Mountain Laboratories, Nanjing 210096, China (email: chzhang@seu.edu.cn).

A. Balatsoukas-Stimming is with the Department of Electrical Engineering, Eindhoven University of Technology, 5600 MB Eindhoven, The Netherlands (email: a.k.balatsoukas.stimming@tue.nl).

While the original SC decoding algorithm can achieve channel capacity at infinite code lengths, it shows poor error-correcting performance at practical finite code lengths. To improve the error-correcting performance of SC decoding, SCL decoding was proposed in [3] to keep a list of up to $L$ candidate codewords. Additionally, when concatenated with cyclic redundancy check (CRC) codes [4], polar codes with SCL decoding outperform low-density parity-check (LDPC) and Turbo codes in terms of error-correcting performance [23]. To satisfy the low-latency and high-throughput requirements of eMBB scenarios, node-based fast SCL decoders [5]–[9] exploit special constituent codes [24]–[28], which avoid traversing the lower stages of the decoding tree and thus significantly reduce the decoding latency compared to conventional bit-wise SCL decoders. The state-of-the-art (SOA) node-based SCL decoder [9] with a list size of $L = 8$ achieves a throughput of more than 2.94 Gbps, which meets the reliability, latency, and throughput requirements of eMBB scenarios. However, when considering the ultra-high peak data rate requirements of future communication systems [29], SC-based decoders become impractical due to the serial processing inherent in these algorithms [5]–[9].

arXiv:2210.13887v2 [cs.IT] 19 Mar 2023

In contrast to SC-based decoding, BP decoding is an inherently parallel algorithm. BP decoding can thus be implemented easily on a multi-stage factor graph in pursuit of a much higher throughput [30]. Additionally, BP decoding has the potential to realize iterative detection and decoding, which achieves better system performance than separate detection and decoding [31], [32] and further raises the interest of academia and industry in BP decoding. Although the error-correcting performance of BP decoding improves as the number of iterations increases, it is still far behind the SCL performance. BPF decoding [19]–[22] and BPL decoding [10]–[18] are two advanced BP algorithms that can approach the performance of SCL by expanding the exploration of candidate codewords. BPF decoding guesses the positions of error-prone bits and sequentially corrects them in additional decoding attempts. Unfortunately, online identification of error-prone bits [20]–[22] through sorting and post-processing of channel messages increases the hardware complexity and degrades the maximum operating frequency [22]. Alternatively, BPL decoding, proposed in [11], tries to decode on multiple permuted factor graphs (PFGs), where the number of possible PFGs is $n!$ ($n = \log_2 N$) for length-$N$ polar codes. Decoding schedules of BPL can be divided into parallel [11]–[14] and serial [17] schedules. In parallel BPL decoding, $L$ independent BP decoders operate in parallel (each BP decoder works on a unique PFG), and the optimal codeword with the minimum Euclidean distance to the received signals is selected from the $L$ identified candidate codewords. However, parallel BPL decoding has very poor hardware utilization, especially for large list sizes. To avoid the high hardware consumption caused by the parallel architecture, the authors of [17] proposed a serial BPL decoding schedule, in which the permutations of the factor graph stages are replaced by shuffling the input log-likelihood ratios (LLRs). This hardware-friendly decoding strategy allows BPL decoding to reuse a single BP decoder at the cost of merely shuffling the input LLRs into a specific order for each PFG.
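To make the $n!$ count and the LLR-shuffling idea of [17] concrete, the following Python (NumPy) sketch builds $\mathbf{G}_N = \mathbf{F}^{\otimes n}$ for $N = 8$, enumerates all $n!$ stage orders, and checks that the index permutation induced by each stage order leaves $\mathbf{G}_N$ unchanged, so that shuffling codeword positions is equivalent to encoding a shuffled input. It assumes that a stage order acts on the binary expansion of each bit index (a standard property of Kronecker-power generators); the helper names `polar_generator` and `index_permutation` are hypothetical, and this is an illustration rather than the exact procedure of [17].

```python
import math
from itertools import permutations

import numpy as np


def polar_generator(n):
    """G_N = F^{(x)n} with kernel F = [[1, 0], [1, 1]] (natural order, no bit reversal)."""
    kernel = np.array([[1, 0], [1, 1]], dtype=np.int64)
    g = np.ones((1, 1), dtype=np.int64)
    for _ in range(n):
        g = np.kron(g, kernel)
    return g


def index_permutation(n, order):
    """Hypothetical convention: bit k of p[i] is bit order[k] of index i."""
    return np.array([sum(((i >> order[k]) & 1) << k for k in range(n))
                     for i in range(1 << n)])


n = 3
G = polar_generator(n)
orders = list(permutations(range(n)))
print(len(orders), math.factorial(n))  # n! stage orders -> 6 6

rng = np.random.default_rng(0)
u = rng.integers(0, 2, size=1 << n)
for order in orders:
    p = index_permutation(n, order)
    # G_N is invariant under the same bit-index permutation of rows and columns ...
    assert np.array_equal(G[np.ix_(p, p)], G)
    # ... so shuffling the codeword equals encoding the shuffled input bits.
    assert np.array_equal((u @ G % 2)[p], u[p] @ G % 2)
print("all stage orders verified")
```

For $N = 1024$ ($n = 10$) the same count gives $10! = 3628800$ PFGs. Under this sketch's convention, the input vector $\mathbf{u}$ is shuffled as well, so the frozen positions have to be permuted together with the channel LLRs.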

To improve the error-correcting performance of BPL decoding, numerous researchers have explored methods of optimizing the PFG selection, including empirical methods [11], [17], [33] and analytical methods [14]–[16]. Notably, the authors of [14] first derived the permutation gain for parallel BPL decoding, which inspired the analytical search for the optimal PFG selection. Regarding hardware implementations, many BP decoders have been presented in [30], [34]–[39]. Compared to the classical single-column BP architectures [30], the SOA double-column bidirectional-propagation architecture [39] instantiates two processing element (PE) arrays and propagates the left-to-right and right-to-left messages simultaneously to improve the throughput. Moreover, the most challenging task for the BPL decoder is the implementation of flexible permutations, since the PFG selection algorithms [14]–[16], [40] are generally dynamic, depending on varying code configurations or channel environments. Even with area-efficient serial decoding, the BPL decoder still needs to support the generation of flexible permutations by shuffling the input LLRs into a specific order for each PFG. A straightforward method is to use the Beneš network [41], [42], which is an optimal non-blocking network that can achieve any arbitrary permutation. However, the design space of permutations is then $N!$ instead of $n!$ for length-$N$ polar codes, and the control signals of the Beneš network are difficult to generate on-the-fly for each PFG. Adopting the Beneš network in the BPL decoder is therefore inefficient. In summary, there are two critical problems for the BPL decoder:

• How to select the optimal PFG set from the $n!$ PFGs for length-$N$ polar codes?
• How to efficiently implement flexible permutations for BPL decoding in hardware?

It is further noteworthy that BPL decoding is a special case of generalized automorphism ensemble (AE) decoding, in which SC, SCL, or BP decoding can be deployed on multiple PFGs to achieve maximum-likelihood (ML) performance for polar or Reed-Muller (RM) codes [43], [44]. Hence, the solutions to these two problems are significant for both BPL and generalized AE decoding.
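The mismatch between the $N!$ permutations a Beneš network can realize and the $n!$ PFGs that are actually needed can be quantified in a few lines of Python. The sketch below assumes the textbook Beneš structure for $N = 2^n$ inputs ($2n - 1$ columns of $N/2$ two-by-two switches, one control bit per switch) and compares it with the information-theoretic minimum number of bits needed to index an arbitrary permutation or a stage order; the function names are hypothetical.

```python
import math


def benes_switch_count(N):
    """2x2 switches in a textbook Benes network with N = 2^n inputs: (2n - 1) columns of N/2."""
    n = N.bit_length() - 1
    if N < 2 or (1 << n) != N:
        raise ValueError("N must be a power of two >= 2")
    return (2 * n - 1) * (N // 2)


def log2_factorial(m):
    """log2(m!) via the log-gamma function, avoiding huge integers."""
    return math.lgamma(m + 1) / math.log(2)


N, n = 1024, 10
print("Benes 2x2 switches (control bits per PFG):", benes_switch_count(N))
print("bits to index any of the N! permutations:", math.ceil(log2_factorial(N)))
print("bits to index any of the n! stage orders:", math.ceil(log2_factorial(n)))
```

For $N = 1024$ this gives 9728 switch settings to be computed for every PFG, whereas $\lceil \log_2 10! \rceil = 22$ bits would suffice to identify a stage order, which illustrates why a general-purpose permutation network is oversized for this task.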

Contributions:

In this paper, we present the first BPL decoder implementation, which solves the two aforementioned problems, i.e., the selection of near-optimal PFG sets and the generation of flexible permutations. Our contributions are as follows:

• We derive the block error probability of serial BPL decoding and present a criterion for determining the best PFG set. Then, we propose a sequential generation (SG) algorithm that can efficiently obtain a near-optimal PFG set. Simulations show that our BPL decoder with $L = 32$ achieves error-correcting performance similar to that of SCL with $L = 4$.
• We propose a hardware-friendly algorithm using low-complexity matrix decomposition to generate flexible permutation routings for all PFGs. To this end, we provide a mathematical model for permutations and demonstrate that the permutation routing of each PFG can be decomposed into a combination of $n-1$ fixed sub-routings. This decomposition can be performed online.
• We present the first hardware architecture of a BPL decoder, based on the double-column bidirectional-propagation scheme [39], that incorporates the aforementioned flexible permutation generator. To improve the throughput of the BPL decoder, we adopt a decoupled strategy that enables BP decoding and permutation generation to be executed simultaneously. Notably, our decoder can increase the list size arbitrarily without any additional area overhead. Synthesis results show that, for $L = 32$, our decoder achieves a throughput of 25.63 Gbps with an area efficiency of 29.46 Gbps/mm² at SNR = 4 dB, which outperforms the SOA BP and BPF decoders [20], [21], [38], [39], [45].
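One simple way to see how a routing for any stage order can be assembled from only $n-1$ fixed sub-routings is sketched below in Python. It assumes that a stage order acts on the bits of each LLR index, takes sub-routing $k$ to be the fixed wiring that swaps index bits $k$ and $k+1$, and decomposes an arbitrary stage order into such adjacent swaps with bubble sort. This decomposition is chosen for simplicity and is not necessarily the matrix decomposition developed in this paper; all function names are hypothetical.

```python
import math
from itertools import permutations


def bit_route(n, order):
    """Gather map for a stage order (assumed convention): bit k of route[i] is bit order[k] of i."""
    return [sum(((i >> order[k]) & 1) << k for k in range(n)) for i in range(1 << n)]


def apply_route(llrs, route):
    """Output position i takes the LLR at source position route[i] (pure wiring)."""
    return [llrs[src] for src in route]


def fixed_subroutings(n):
    """The n - 1 fixed sub-routings; sub-routing k swaps index bits k and k + 1."""
    subs = []
    for k in range(n - 1):
        order = list(range(n))
        order[k], order[k + 1] = order[k + 1], order[k]
        subs.append(bit_route(n, order))
    return subs


def decompose(order):
    """Bubble-sort the stage order; the adjacent swaps, in this order, are the sub-routings to apply."""
    a, swaps = list(order), []
    for end in range(len(a) - 1, 0, -1):
        for k in range(end):
            if a[k] > a[k + 1]:
                a[k], a[k + 1] = a[k + 1], a[k]
                swaps.append(k)
    return swaps


def route_via_subroutings(llrs, order):
    """Shuffle the LLRs for a stage order using only the fixed sub-routings."""
    subs = fixed_subroutings(len(order))
    for k in decompose(order):
        llrs = apply_route(llrs, subs[k])
    return llrs


n = 4
for order in permutations(range(n)):
    llrs = list(range(1 << n))
    assert route_via_subroutings(llrs, order) == apply_route(llrs, bit_route(n, order))
print(math.factorial(n), "stage orders verified with", n - 1, "fixed sub-routings")
```

In a hardware realization of this particular sketch, each sub-routing would be a hard-wired shuffle, and bubble sort needs at most $n(n-1)/2$ of them per stage order; only the sequence of selected sub-routings changes from one PFG to the next.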

The remainder of this paper is organized as follows. Section II reviews the background of polar codes, BP decoding, and BPL decoding. Section III analyzes the permutation gain for serial BPL decoding and presents a graph selection algorithm for a near-optimal PFG set. In Section IV, a hardware-friendly algorithm for generating any permutation is proposed. Section V presents our BPL decoder architecture with several advanced techniques. Section VI provides our implementation results and compares them with the SOA polar decoders. Finally, Section VII concludes this paper.

II. PRELIMINARIES

Notation: Throughout this paper, we use the following symbol definitions. Boldface lowercase letters $\mathbf{u}$ denote vectors, where $u_i$ denotes the $i$-th element of $\mathbf{u}$ and $\mathbf{u}_i^j$ denotes the sub-vector $[u_i\, u_{i+1} \ldots u_j]$, $i \le j$. If $i > j$, $\mathbf{u}_i^j = \emptyset$. Boldface uppercase letters $\mathbf{B}$ denote matrices, where $B_{ij}$ and $\mathbf{B}_j$ denote the element at the $i$-th row and $j$-th column of $\mathbf{B}$ and the $j$-th column of $\mathbf{B}$, respectively. For the factor graph of polar codes with length $N = 2^n$, we use $\pi_o$, $[m_0\, m_1\, m_2 \ldots m_{n-1}]$, and $[0\, 1\, 2 \ldots n-1]$ to represent the original factor graph (OFG), its stages, and its stage order, respectively. Similarly, we use $\pi$, $[m_{\pi_0}\, m_{\pi_1}\, m_{\pi_2} \ldots m_{\pi_{n-1}}]$, and $[\pi_0\, \pi_1\, \pi_2 \ldots \pi_{n-1}]$ to denote any other PFG, its stages, and its stage order, respectively. If $\mathcal{L} = \{\tilde{\pi}_0\, \tilde{\pi}_1 \ldots \tilde{\pi}_{L-1}\}$ is a set of $L$ PFG candidates, $|\mathcal{L}| = L$ denotes its cardinality. Note that all indices related to decoding start from 0. The hard decision function is defined as $\mathrm{HD}(x) = 1$ if $x < 0$ and $\mathrm{HD}(x) = 0$ if $x \ge 0$. We adopt the following parameters for polar codes: $N$ is the code length, $K$ is the number of message bits, $R = K/N$ is the code rate, $P$ is the number of CRC bits, and $K' = K + P$ is the number of information bits with the CRC bits attached. The frozen bit indices are denoted as $\mathcal{F}$ and the unfrozen bit indices form its complement, and we refer to such a code as an $(N, K)$ polar code. As in this work we only consider polar codes that are concatenated with CRC codes, we use the term SCL decoding to refer to CRC-aided SCL decoding for brevity.

Fig. 1. The OFG for length-8 polar codes, where one PE is marked in red and $\mathcal{F} = \{0, 1, 2, 4\}$ is marked in grey.

A. Construction and Encoding of Polar Codes

Given an input bit sequence $\mathbf{u}$, the encoded vector $\mathbf{x}$ is generated by $\mathbf{x} = \mathbf{u} \cdot \mathbf{G}_N$, where $\mathbf{G}_N = \mathbf{F}^{\otimes n}$ denotes the $n$-th Kronecker power of the kernel $\mathbf{F} = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$. Based on the principle of channel polarization [1], the $N$ bits in $\mathbf{u}$ correspond to $N$ coordinated bit channels with different reliabilities, where the $K'$ most reliable bit channels transmit the unfrozen bits (with the CRC attached) and the remaining $N - K'$ bit channels transmit frozen bits, typically set to 0. Note that the metric used to determine the bit channel reliability affects $\mathcal{F}$ and thus influences the performance of polar codes. For 5G NR [46], a universal reliability sequence is applied to formulate $\mathcal{F}$ for uplink (UL) and downlink (DL) channels. In addition, a polar code construction framework tailored to a given decoding algorithm and based on a genetic algorithm (GenAlg) was introduced in [47], where populations of unfrozen sets evolve based on the error-correcting performance of a given decoder.
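The encoding rule $\mathbf{x} = \mathbf{u} \cdot \mathbf{G}_N$ can be sketched in Python (NumPy) in two equivalent ways: as a multiplication by the explicit Kronecker power, and as $n$ butterfly stages in which stage $j$ pairs indices $i$ and $i + 2^j$, mirroring the stages of the factor graph. The example uses the $N = 8$ frozen set $\mathcal{F} = \{0, 1, 2, 4\}$ of Fig. 1; the information bits are arbitrary, and the function names are hypothetical.

```python
import numpy as np


def generator_matrix(n):
    """G_N = F^{(x)n} for the kernel F = [[1, 0], [1, 1]]."""
    kernel = np.array([[1, 0], [1, 1]], dtype=np.int64)
    g = np.ones((1, 1), dtype=np.int64)
    for _ in range(n):
        g = np.kron(g, kernel)
    return g


def encode_butterfly(u):
    """x = u * G_N over GF(2) using n in-place XOR stages (stage j pairs i and i + 2^j)."""
    x = np.array(u, dtype=np.int64) % 2
    N = x.size
    step = 1
    while step < N:
        for i in range(N):
            if (i & step) == 0:         # upper node of a stage-j butterfly
                x[i] ^= x[i + step]     # x_i <- x_i XOR x_{i + 2^j}
        step *= 2
    return x


N, frozen = 8, [0, 1, 2, 4]             # frozen set of Fig. 1
unfrozen = [i for i in range(N) if i not in frozen]
u = np.zeros(N, dtype=np.int64)
u[unfrozen] = [1, 0, 1, 1]              # arbitrary information (+CRC) bits
x_matrix = u @ generator_matrix(3) % 2
x_butterfly = encode_butterfly(u)
print(x_matrix, np.array_equal(x_matrix, x_butterfly))  # [1 0 1 0 0 1 0 1] True
```

The butterfly form needs $(N/2)\log_2 N$ XORs instead of a dense $N \times N$ product, and it is the same stage structure that BP decoding traverses on the factor graph.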

B. BP Decoding of Polar Codes on the Factor Graph

The BP algorithm is a classical iterative algorithm that calculates marginal probabilities with the sum-product (SP) equations on a factor graph [48]. Motivated by BP decoding for RM codes, Arıkan first proposed BP decoding for polar codes on the generator-matrix-based factor graph [49]. The OFG structure with three stages $[m_0\, m_1\, m_2]$ is shown in Fig. 1. Namely, an $(N, K)$ polar code is represented as an $n$-stage factor graph, and each stage has $N/2$ PEs. Two types of LLR messages (left-to-right $\mathbf{R}$ and right-to-left $\mathbf{L}$) are propagated over the PEs on the factor graph.

Fig. 2. Overall framework for serial BPL decoding with the detection.

At the $t$-th iteration, for $j = 0, \ldots, n$, $\mathbf{R}^t_j$ and $\mathbf{L}^t_j$ denote the $j$-th column of the $\mathbf{R}$- and $\mathbf{L}$-messages, respectively, where $R^t_{i,j}$ and $L^t_{i,j}$ denote the messages at the $i$-th bit index of the $j$-th column. Each PE propagates $\mathbf{R}$- and $\mathbf{L}$-messages as follows [50]:











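$$
\begin{aligned}
L^t_{i,j} &= g\big(L^{t-1}_{i,j+1},\; L^{t-1}_{i+2^j,j+1} + R^t_{i+2^j,j},\; \beta_L\big),\\
L^t_{i+2^j,j} &= g\big(L^{t-1}_{i,j+1},\; R^t_{i,j},\; \beta_L\big) + L^{t-1}_{i+2^j,j+1},\\
R^t_{i,j+1} &= g\big(R^t_{i,j},\; L^{t-1}_{i+2^j,j+1} + R^t_{i+2^j,j},\; \beta_R\big),\\
R^t_{i+2^j,j+1} &= g\big(R^t_{i,j},\; L^{t-1}_{i,j+1},\; \beta_R\big) + R^t_{i+2^j,j},
\end{aligned}
\tag{1}
$$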

where we adopt the offset min-sum (OMS) approximation [20], [51] of the SP equation for all iterative BP decoders, since it can be implemented easily in hardware while approaching the performance of SP decoding. Here, $g(a, b, \beta) = \mathrm{sgn}(a) \cdot \mathrm{sgn}(b) \cdot \max(\min(|a|, |b|) - \beta, 0)$ and $[\beta_R\; \beta_L] = [0.25\; 0]$. At the beginning of BP decoding, $\mathbf{R}^0_0$ is initialized with the a-priori value $+\infty$ at the frozen positions given by $\mathcal{F}$, and $\mathbf{L}^0_n$ is initialized with the channel LLRs computed from the received signals $\mathbf{y}$, i.e., $L^0_{i,n} = \ln \frac{\Pr(y_i \mid x_i = 0)}{\Pr(y_i \mid x_i = 1)}$, $0 \le i \le N-1$. The $\mathbf{R}$- and $\mathbf{L}$-messages of the other stages on the factor graph are initialized as 0. When the maximum number of iterations $I_{\max}$ is reached, the HD results $\hat{\mathbf{u}}$ are estimated based on the decision LLRs $\mathbf{R}^{I_{\max}-1}_0 + \mathbf{L}^{I_{\max}-1}_0$. In the following, we omit the iteration index $t = 0$ of the initial LLRs $\mathbf{R}^0_0$ and $\mathbf{L}^0_n$ for brevity.
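The OMS function $g$, a single PE update of (1), the hard decision, and the message initialization described above can be sketched in plain Python as follows. The sketch stores $\mathbf{R}$ and $\mathbf{L}$ as $N \times (n+1)$ nested lists indexed as `[i][j]`, uses $[\beta_R\; \beta_L] = [0.25\; 0]$, and deliberately leaves out the iteration schedule; all function names are hypothetical.

```python
import math

BETA_R, BETA_L = 0.25, 0.0


def sgn(a):
    return (a > 0) - (a < 0)


def g(a, b, beta):
    """Offset min-sum: sgn(a) * sgn(b) * max(min(|a|, |b|) - beta, 0)."""
    return sgn(a) * sgn(b) * max(min(abs(a), abs(b)) - beta, 0.0)


def pe_update(L_i_j1, L_k_j1, R_i_j, R_k_j):
    """One PE of (1) with k = i + 2^j; returns (L_{i,j}, L_{k,j}, R_{i,j+1}, R_{k,j+1})."""
    L_i_j = g(L_i_j1, L_k_j1 + R_k_j, BETA_L)
    L_k_j = g(L_i_j1, R_i_j, BETA_L) + L_k_j1
    R_i_j1 = g(R_i_j, L_k_j1 + R_k_j, BETA_R)
    R_k_j1 = g(R_i_j, L_i_j1, BETA_R) + R_k_j
    return L_i_j, L_k_j, R_i_j1, R_k_j1


def hd(x):
    """HD(x) = 1 if x < 0, else 0."""
    return 1 if x < 0 else 0


def init_messages(channel_llrs, frozen):
    """R_0 = +inf on frozen positions, L_n = channel LLRs, all other messages 0."""
    N = len(channel_llrs)
    n = N.bit_length() - 1
    R = [[0.0] * (n + 1) for _ in range(N)]
    L = [[0.0] * (n + 1) for _ in range(N)]
    for i in range(N):
        if i in frozen:
            R[i][0] = math.inf
        L[i][n] = float(channel_llrs[i])
    return R, L


print(pe_update(2.0, -1.0, 0.5, 3.0))  # (2.0, -0.5, 0.25, 3.25)
```

The offset $\beta_R = 0.25$ only shrinks the magnitude of the $\mathbf{R}$-messages and clips small magnitudes to zero, which is why OMS maps to comparators and subtractors in hardware.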

C. BPL Decoding of Polar Codes

BPL decoding [11] is an efficient algorithm for enhancing the error-correcting performance of BP decoding, which executes multiple BP decoding procedures on multiple PFGs either in parallel [11]–[14] or serially [17]. Parallel BPL decoding instantiates a set of $L$ independent BP decoders (each BP decoder works on a unique PFG), which leads to poor hardware utilization since only one result is finally retained. Alternatively, serial BPL decoding can reuse a single BP decoder, as illustrated in Fig. 2. If a BP decoding attempt fails to pass the detection within $I_{\max}$ iterations, serial BPL decoding activates decoding on the next PFG. Note that, due to the detection in Fig. 2, miss events and error-detection events are introduced for each PFG. In the following, all BPL decoders use the serial structure unless stated otherwise. With regard to the PFG selection, previous works have found that the OFG always yields the best error-correcting performance [14] and the PFGs which fix more left stages and only permute the right-

A miss event represents a wrongly estimated information sequence that passes the detection, and an error-detection event represents a decoding attempt that fails to pass the detection. We generally use CRC detection as the detection strategy in serial BPL decoding.
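The serial schedule of Fig. 2 can be sketched in Python as a loop that decodes on one PFG at a time and stops at the first estimate that passes CRC detection. In this sketch the BP decoder is passed in as a function (a stand-in below), the toy CRC polynomial $x^3 + x + 1$ is only an example, and returning the last attempt when every PFG fails is an assumption of the sketch rather than a rule stated in the text; all function names are hypothetical.

```python
def gf2_remainder(bits, poly):
    """Remainder of the polynomial `bits` divided by `poly` over GF(2), MSB first."""
    r = list(bits)
    for i in range(len(r) - len(poly) + 1):
        if r[i]:
            for k, p in enumerate(poly):
                r[i + k] ^= p
    return r[len(r) - len(poly) + 1:]


def crc_attach(msg, poly):
    """Append P = len(poly) - 1 CRC bits so that the whole word is divisible by poly."""
    return list(msg) + gf2_remainder(list(msg) + [0] * (len(poly) - 1), poly)


def crc_passes(word, poly):
    return not any(gf2_remainder(word, poly))


def serial_bpl(channel_llrs, pfgs, bp_decode, detect):
    """Decode on each PFG in turn; return (estimate, number of attempts)."""
    estimate = None
    for attempt, pfg in enumerate(pfgs, start=1):
        estimate = bp_decode(channel_llrs, pfg)
        if detect(estimate):              # passes detection: stop early
            return estimate, attempt
    return estimate, len(pfgs)            # assumption: keep the last attempt


# Toy usage: a stand-in "decoder" that only succeeds on the third PFG.
POLY = [1, 0, 1, 1]                       # x^3 + x + 1, toy example
sent = crc_attach([1, 0, 0, 1, 1], POLY)  # information bits with CRC attached


def fake_bp_decode(_llrs, pfg):
    word = list(sent)
    if pfg != 2:
        word[0] ^= 1                      # pretend this PFG leaves a bit error
    return word


print(serial_bpl(None, range(4), fake_bp_decode, lambda w: crc_passes(w, POLY)))
# ([1, 0, 0, 1, 1, 1, 0, 0], 3)
```

Because the loop exits at the first estimate that passes the CRC, the number of attempts, and hence the decoding latency, varies from frame to frame.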
