BIT ERROR AND BLOCK ERROR RATE TRAINING FOR ML-ASSISTED COMMUNICATION
Reinhard Wiesmayr*,1, Gian Marti*,1, Chris Dick2, Haochuan Song3, and Christoph Studer1
*equal contribution; 1ETH Zurich, 2NVIDIA, 3Southeast University
E-mail: wiesmayr@iis.ee.ethz.ch, marti@iis.ee.ethz.ch, cdick@nvidia.com, hcsong@seu.edu.cn, studer@ethz.ch
ABSTRACT
Even though machine learning (ML) techniques are being widely used in communications, the question of how to train communication systems has received surprisingly little attention. In this paper, we show that the commonly used binary cross-entropy (BCE) loss is a sensible choice in uncoded systems, e.g., for training ML-assisted data detectors, but may not be optimal in coded systems. We propose new loss functions targeted at minimizing the block error rate, as well as SNR deweighting, a novel method that trains communication systems for optimal performance over a range of signal-to-noise ratios. The utility of the proposed loss functions as well as of SNR deweighting is shown through simulations in NVIDIA Sionna.
1. INTRODUCTION
Machine learning (ML) has revolutionized a large number of fields, including communications. The availability of software frameworks, such as TensorFlow [1] and, recently, NVIDIA Sionna [2], has made implementation and training of ML-assisted communication systems convenient. Existing results in ML-assisted communication systems range from the atomistic improvement of data detectors (e.g., using deep unfolding) [3–6] to model-free learning of end-to-end communication systems [7–9]. Quite surprisingly, little attention has been devoted to the question of how ML-assisted communication systems should be trained. In particular, the choice of the cost function is seldom discussed (see, e.g., the recent overview papers [10, 11]) and—given the similarity between communication and classification—one usually resorts to an empirical cross-entropy (CE) loss [12–17]. The question of training a communication system for good performance over a range of signal-to-noise ratios (SNRs) is another issue that has not been seriously investigated. Systems are usually trained on samples from only one SNR [3, 8], or on samples uniformly drawn from the targeted SNR range [4, 14, 16], apparently without questioning how this may affect performance at different SNRs.
In this paper, we investigate how ML-assisted communication systems should be trained. We first consider the case where the intended goal is to minimize the uncoded bit error rate (BER) and discuss why the empirical binary cross-entropy (BCE) loss is indeed a sensible choice in uncoded systems, e.g., for data detectors in isolation. However, in most practical communication applications, the relevant figure of merit is the (coded) block error rate (BLER), as opposed to the BER, since block errors cause undesirable retransmissions [18, Sec. 9.2], whereas (coded) bit errors themselves are irrelevant.¹ We underpin that minimizing the (coded) BER is not equivalent to minimizing the BLER. This observation calls into question the common practice of training coded systems with loss functions that penalize individual bit errors (such as the empirical BCE), and thus optimize for the (irrelevant) coded BER instead of the BLER. In response, we propose a range of novel loss functions that aim at minimizing the BLER by penalizing bit errors jointly. We also show that training on samples that are uniformly drawn from a target SNR range will focus primarily on the low-SNR region while neglecting high-SNR performance. As a remedy, we propose a new technique called SNR deweighting. We evaluate the impact of the different loss functions as well as of SNR deweighting through simulations in NVIDIA Sionna [2].

A shorter version of this paper has been submitted to the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). All code and simulation scripts to reproduce the results of this paper are available on GitHub: https://github.com/IIP-Group/BLER_Training. The authors thank Oscar Castañeda for comments and suggestions.
2. TRAINING FOR BIT ERROR RATE
ML-assisted communication systems are typically trained with a focus on minimizing the (uncoded) BER [4, 16], under a tacit assumption that the learned system could then be used in combination with a forward error correction (FEC) scheme to ensure reliable communication.² Due to the similarity between detection and classification, the strategy typically consists of (approximately) minimizing the empirical BCE³ on a training set $\mathcal{D}=\{(\mathbf{b}^{(n)},\mathbf{y}^{(n)})\}_{n=1}^{N}$, where $\mathbf{b}=(b_1,\ldots,b_K)$ is the vector of bits of interest (even in uncoded systems, one is interested in multiple bits, e.g., when using higher-order constellations, multiple OFDM subcarriers, or multi-user transmission), $\mathbf{y}\in\mathcal{Y}$ is the channel output, and $n$ is the sample index. In fact, this strategy appears to be so obvious that it is often not motivated—let alone questioned—at all.
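To make this standard recipe concrete, the sketch below trains a small neural-network detector by minimizing an empirical BCE loss on simulated BPSK-over-AWGN data. It is a minimal illustration only: the toy channel model, network architecture, and training hyperparameters are our own assumptions and are not taken from the paper or from Sionna.

```python
import numpy as np
import tensorflow as tf

K = 4          # number of bits per channel use (toy choice)
N = 100_000    # number of training samples
SNR_DB = 5.0   # single training SNR for this illustration

# Toy channel: one BPSK symbol per bit, transmitted over AWGN.
rng = np.random.default_rng(0)
b = rng.integers(0, 2, size=(N, K)).astype(np.float32)          # bits b^(n)
x = 1.0 - 2.0 * b                                               # BPSK symbols
sigma = np.float32(np.sqrt(0.5 * 10.0 ** (-SNR_DB / 10.0)))     # noise standard deviation
y = x + sigma * rng.standard_normal((N, K)).astype(np.float32)  # channel outputs y^(n)

# Detector f: Y -> [0,1]^K; the sigmoid outputs approximate p(b_k = 1 | y).
detector = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(K, activation="sigmoid"),
])

# Minimize the empirical BCE over the training set via stochastic gradient descent.
detector.compile(optimizer="adam", loss=tf.keras.losses.BinaryCrossentropy())
detector.fit(y, b, batch_size=256, epochs=2, verbose=0)

# Hard decisions by thresholding the learned marginals at 1/2.
b_hat = (detector.predict(y[:1000], verbose=0) > 0.5).astype(np.float32)
print("BER estimate on 1000 training samples:", np.mean(b_hat != b[:1000]))
```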
¹ For this reason, physical layer (PHY) quality-of-service is assessed only in terms of BLER (not BER) in 3GPP LTE and other standards. Reference [19] notes that the relation between BER and BLER can be inconsistent.
² The discussion also applies to systems that already include FEC, but we argue in Secs. 1 and 3 that minimizing the coded BER is a category mistake.
³ When we speak of the BCE between vectors, we mean the sum of binary CEs between the individual components as defined in (3), and not the categorical CE between the bit-vector and its estimate (as used, e.g., in [7–9]).
2.1. Minimizing the BCE Learns the Posterior Marginals
An “ML style” justification is to note that the expected BCE between the bit vector $\mathbf{b}$ and its estimate $\mathbf{f}(\mathbf{y})=(f_1,\ldots,f_K)$ can be written as $\sum_{k}\big(H(b_k\,|\,\mathbf{y}) + \mathbb{E}_{\mathbf{y}}\,D(p_{b_k|\mathbf{y}}\,\|\,f_k)\big)$, where $H(\cdot\,|\,\cdot)$ and $D(\cdot\,\|\,\cdot)$ denote the conditional and the relative entropy, respectively. The expected BCE is thus minimized when the estimates $f_k(\mathbf{y})$ equal the true posterior marginals $p_{b_k|\mathbf{y}}$.⁴ Once the posterior is learned, simple thresholding (at $1/2$) results in BER-optimal data detection. The expected BCE is not available, but resorting to an empirical proxy through stochastic gradient descent is so common by now that it is often not even mentioned anymore.
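As a quick numerical sanity check of this claim (our own illustration, with an arbitrarily chosen marginal probability), the expected BCE for a single bit with $p_{b|\mathbf{y}}(1\,|\,\mathbf{y})=0.3$ is indeed minimized when the estimate equals 0.3:

```python
import numpy as np

p = 0.3                                  # true posterior marginal p(b = 1 | y)
f = np.linspace(1e-3, 1 - 1e-3, 999)     # candidate estimates f(y)

# Expected BCE for one bit: -(p*log f + (1-p)*log(1-f)) = H(b|y) + D(p || f).
expected_bce = -(p * np.log(f) + (1 - p) * np.log(1 - f))
print("minimizer:", f[np.argmin(expected_bce)])   # ~0.3, i.e., the true marginal
```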
We now argue explicitly—using the framework of empirical risk minimization (ERM)—that minimizing the empirical (as opposed to the expected) BCE can learn the true posterior marginals. We do not claim that this result is “novel,” but an explicit derivation seems unavailable in the literature. In the ERM framework, one learns a function
$$\hat{\mathbf{f}} = \operatorname*{arg\,min}_{\mathbf{f}\in\mathcal{F}} \, L(\mathbf{f},\mathcal{D}), \qquad (1)$$
where $\mathcal{F} \subseteq \{\mathbf{f}:\mathcal{Y}\to[0,1]^K\}$ is the set of admissible functions $\mathbf{f}=(f_1,\ldots,f_K)$ and
$$L(\mathbf{f},\mathcal{D}) = \sum_{n=1}^{N} \ell_{\mathrm{BCE}}\big(\mathbf{b}^{(n)},\mathbf{f}(\mathbf{y}^{(n)})\big) \qquad (2)$$
is the empirical risk, which here is induced by the BCE loss
$$\ell_{\mathrm{BCE}}(\mathbf{b},\mathbf{f}) = -\sum_{k=1}^{K} \big( b_k\log(f_k) + (1-b_k)\log(1-f_k) \big). \qquad (3)$$
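A direct NumPy transcription of (2) and (3) might look as follows (a minimal sketch; the array shapes and the clipping constant are our own choices for numerical safety):

```python
import numpy as np

def bce_loss(b, f, eps=1e-12):
    """BCE loss (3) between a bit vector b in {0,1}^K and an estimate f in [0,1]^K."""
    f = np.clip(f, eps, 1.0 - eps)   # avoid log(0)
    return -np.sum(b * np.log(f) + (1.0 - b) * np.log(1.0 - f))

def empirical_risk(B, F):
    """Empirical risk (2): sum of the BCE losses over all N training samples.

    B has shape (N, K) and holds the bit vectors b^(n); F has shape (N, K)
    and holds the detector outputs f(y^(n)).
    """
    return sum(bce_loss(b, f) for b, f in zip(B, F))
```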
In principle, the empirical risk would be minimal if
$$\mathbf{f}(\mathbf{y}^{(n)}) = \mathbf{b}^{(n)}, \quad n = 1,\ldots,N. \qquad (4)$$
The optimal $\mathbf{f}$ would therefore make hard decisions on the training data set that—with hindsight—are always right. However, there are a priori no restrictions on how such a function $\mathbf{f}$ responds to an input $\mathbf{y}$ that is not contained in $\mathcal{D}$: we are in danger of overfitting. ERM with a BCE loss may therefore be a reasonable strategy primarily in one of the following two settings: Either $\mathcal{F}$ is “inflexible” or the range $\mathcal{Y}\ni\mathbf{y}$ is “small” compared to $\mathcal{D}$. In either case, (4) cannot be satisfied and overfitting is prevented.⁵ The first case is more relevant in practice but more difficult to analyze. We therefore focus on the second case, which we formalize through the following assumption:
Assumption 1. We assume that $\mathcal{D}$ is large and representative of the underlying posterior marginals $p_{b_k|\mathbf{y}}$ in the sense that, for some $0<\varepsilon<1$ and for all $k$ and all $(b,\mathbf{y})\in\{0,1\}\times\mathcal{Y}$,
$$\bigg|\, p_{b_k|\mathbf{y}}(b=1\,|\,\mathbf{y}) - \frac{1}{|\mathcal{N}(\mathbf{y})|} \sum_{n\in\mathcal{N}(\mathbf{y})} b_k^{(n)} \,\bigg| \le \varepsilon, \qquad (5)$$
where $\mathcal{N}(\mathbf{y}) = \{n\in\{1,\ldots,N\} : \mathbf{y}^{(n)}=\mathbf{y}\}$.
⁴ This assumes that the transmitter is not trainable, so that $H(\mathbf{b}\,|\,\mathbf{y})$ is a constant. See [20] for a discussion that includes trainable transmitters.
⁵ It has been argued that learned systems may also generalize to new inputs even when they achieve perfect accuracy on the training dataset [21, 22]. An investigation of such settings is, however, beyond the scope of this paper.
Proposition 1. Under Ass. 1, ERM with $\mathcal{F}=\{\mathbf{f}:\mathcal{Y}\to[0,1]^K\}$ and BCE loss learns the posterior marginals up to precision $\varepsilon$,
$$\big|\, p_{b_k|\mathbf{y}}(b=1\,|\,\mathbf{y}) - \hat{f}_k(\mathbf{y}) \,\big| \le \varepsilon, \quad \forall\, \mathbf{y}\in\mathcal{Y},\ k=1,\ldots,K. \qquad (6)$$
The proof of this proposition (as well as of all following propositions) is shown in Sec. 7.1.

It should be interesting to translate this result to the case where $\mathcal{Y}$ is uncountable but $\mathcal{F}$ is “inflexible,” or even to the interpolating case described in [21]. We also note that, while the BCE is the most natural and probably most widely used loss in this context, it is by no means the only option. In fact, an analogous version of Prop. 1 holds for the mean square error (MSE) loss $\ell_{\mathrm{MSE}}:\{0,1\}^K\times[0,1]^K\to[0,1],\ (\mathbf{b},\mathbf{f})\mapsto\|\mathbf{b}-\mathbf{f}\|_2^2/K$.

Proposition 2. Under Ass. 1, ERM with $\mathcal{F}=\{\mathbf{f}:\mathcal{Y}\to[0,1]^K\}$ and MSE loss learns the posterior marginals up to precision $\varepsilon$,
$$\big|\, p_{b_k|\mathbf{y}}(b=1\,|\,\mathbf{y}) - \hat{f}_k(\mathbf{y}) \,\big| \le \varepsilon, \quad \forall\, \mathbf{y}\in\mathcal{Y},\ k=1,\ldots,K. \qquad (7)$$
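To see the mechanism behind Prop. 1 at work, the following sketch (our own toy setup with a small discrete observation alphabet, not from the paper) minimizes the empirical BCE separately for each observation value and recovers the empirical bit frequencies, which under Assumption 1 are within $\varepsilon$ of the true marginals:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy joint distribution: observation y in {0, 1, 2}, a single bit b with
# true posterior marginals p(b = 1 | y) chosen arbitrarily for illustration.
p_y = np.array([0.5, 0.3, 0.2])
p_b1_given_y = np.array([0.1, 0.6, 0.9])

N = 200_000
y = rng.choice(3, size=N, p=p_y)
b = (rng.random(N) < p_b1_given_y[y]).astype(float)

for yv in range(3):
    idx = (y == yv)
    # Unconstrained ERM: the minimizer of the per-observation empirical BCE
    # (and likewise of the empirical MSE) is the empirical bit frequency at y.
    grid = np.linspace(1e-3, 1 - 1e-3, 999)
    emp_bce = -(b[idx].mean() * np.log(grid) + (1 - b[idx].mean()) * np.log(1 - grid))
    f_hat = grid[np.argmin(emp_bce)]
    print(f"y={yv}: true p(b=1|y)={p_b1_given_y[yv]:.2f}, "
          f"empirical freq={b[idx].mean():.3f}, ERM minimizer={f_hat:.3f}")
```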
2.2. Posterior vs. Posterior Marginals
We now draw attention to a subtle but conceptually important point: The loss in (3) considers the sum of empirical BCEs between the individual components of $\mathbf{b}$ and $\mathbf{f}$, and we have shown that this loss can be used to learn the posterior marginals $p_{b_k|\mathbf{y}}$, $k=1,\ldots,K$. But this is not equivalent to learning the joint posterior $p_{\mathbf{b}|\mathbf{y}}$, since we do not learn the conditional dependencies between the different bits $b_k$. As a consequence of the summation of the component BCEs, $\mathbf{f}$ approximates the posterior as a product of independent distributions. For an information-theoretic perspective, see also Sec. 7.2.
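A tiny illustration of this point (our own example, not from the paper): for two perfectly correlated bits, the product of the marginals assigns probability mass to bit patterns that are impossible under the joint posterior.

```python
from itertools import product

# Joint posterior over (b1, b2) given some y: the two bits are perfectly correlated.
joint = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}

# Marginals p(b1 = 1 | y) and p(b2 = 1 | y), which is all a per-bit BCE can learn.
p1 = sum(p for (b1, _), p in joint.items() if b1 == 1)   # 0.5
p2 = sum(p for (_, b2), p in joint.items() if b2 == 1)   # 0.5

for b1, b2 in product([0, 1], repeat=2):
    prod = (p1 if b1 else 1 - p1) * (p2 if b2 else 1 - p2)
    print(f"({b1},{b2}): joint={joint[(b1, b2)]:.2f}, product of marginals={prod:.2f}")
# The product assigns 0.25 to (0,1) and (1,0), which have zero posterior probability.
```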
3. TRAINING FOR BLOCK ERROR RATE
3.1. The Difference Between BER and BLER Optimality
Learning to minimize the BLER in (block-)coded systems is not tantamount to learning to minimize the BER in those systems. To see this, consider a (block-)coded system in which the bits $\mathbf{b}=(b_1,\ldots,b_K)$ are encoded into codewords $\mathbf{c}=\mathrm{enc}(\mathbf{b})\in\mathcal{C}$ for reliable data transmission. (In contrast to Sec. 2, we now look at multiple bits from the same data stream.) Optimal (coded) BER is obtained when we decode on the basis of the posterior probabilities $p(b_k\,|\,\mathbf{y})$, which—as we have seen—can be learned, e.g., with a BCE loss function:
$$\hat{b}_k = \operatorname*{arg\,max}_{b_k\in\{0,1\}} \, p_{b_k|\mathbf{y}}(b_k\,|\,\mathbf{y}), \quad k=1,\ldots,K. \qquad (8)$$
Perhaps surprisingly, this need not coincide with BLER-optimal decoding, which is achieved by the decoding rule
$$\hat{\mathbf{b}} = \mathrm{dec}\Big(\operatorname*{arg\,max}_{\mathbf{c}\in\mathcal{C}} \, p_{\mathbf{c}|\mathbf{y}}(\mathbf{c}\,|\,\mathbf{y})\Big), \qquad (9)$$
where $\mathrm{dec}=\mathrm{enc}^{-1}$ is the inverse mapping of the encoder.
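The following sketch makes the gap concrete with a hand-picked toy posterior (our own illustrative numbers, not from the paper): the bitwise rule (8) and the blockwise rule (9) disagree, and the bitwise decision is not even a pattern with nonzero posterior probability.

```python
# Toy joint posterior over 3-bit patterns; only three patterns have nonzero
# posterior probability (e.g., because of a code constraint). Numbers are hand-picked.
posterior = {(0, 0, 0): 0.4, (0, 1, 1): 0.3, (1, 0, 1): 0.3}

# Blockwise MAP, as in rule (9): pick the pattern with the largest joint posterior.
block_map = max(posterior, key=posterior.get)

# Bitwise MAP, as in rule (8): threshold each marginal p(b_k = 1 | y) at 1/2.
marginals = [sum(p for c, p in posterior.items() if c[k] == 1) for k in range(3)]
bit_map = tuple(int(m > 0.5) for m in marginals)

print("blockwise-MAP decision:", block_map)    # (0, 0, 0), BLER-optimal
print("bit marginals p(b_k=1|y):", marginals)  # [0.3, 0.3, 0.6]
print("bitwise-MAP decision:  ", bit_map)      # (0, 0, 1), has zero posterior probability
```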
The reason is as follows: Even though the data bits $\mathbf{b}$ may be independent a priori, their conditional distribution given the channel output, $p_{\mathbf{b}|\mathbf{y}}(\mathbf{b}\,|\,\mathbf{y})$, is in general no longer so,
$$p_{\mathbf{b}|\mathbf{y}}(\mathbf{b}\,|\,\mathbf{y}) \ne \prod_{k=1}^{K} p_{b_k|\mathbf{y}}(b_k\,|\,\mathbf{y}).$$
We have the following result: