
BIT ERROR AND BLOCK ERROR RATE TRAINING FOR ML-ASSISTED COMMUNICATION
Reinhard Wiesmayr⋆,¹, Gian Marti⋆,¹, Chris Dick², Haochuan Song³, and Christoph Studer¹
⋆equal contribution; ¹ETH Zurich, ²NVIDIA, ³Southeast University
E-mail: wiesmayr@iis.ee.ethz.ch, marti@iis.ee.ethz.ch, cdick@nvidia.com, hcsong@seu.edu.cn, studer@ethz.ch
ABSTRACT
Even though machine learning (ML) techniques are being
widely used in communications, the question of how to train
communication systems has received surprisingly little atten-
tion. In this paper, we show that the commonly used binary
cross-entropy (BCE) loss is a sensible choice in uncoded sys-
tems, e.g., for training ML-assisted data detectors, but may not
be optimal in coded systems. We propose new loss functions
targeted at minimizing the block error rate and SNR deweight-
ing, a novel method that trains communication systems for
optimal performance over a range of signal-to-noise ratios.
The utility of the proposed loss functions as well as of SNR
deweighting is shown through simulations in NVIDIA Sionna.
1. INTRODUCTION
Machine learning (ML) has revolutionized a large number of
fields, including communications. The availability of software
frameworks, such as TensorFlow [1] and, recently, NVIDIA
Sionna [2], has made implementation and training of ML-
assisted communication systems convenient. Existing results
in ML-assisted communication systems range from the atomistic
improvement of data detectors (e.g., using deep unfolding) [3–6]
to model-free learning of end-to-end communication systems [7–9].
Quite surprisingly, little attention has been
devoted to the question of how ML-assisted communication
systems should be trained. In particular, the choice of the cost
function is seldom discussed (see, e.g., the recent overview
papers [10,11]) and—given the similarity between communi-
cation and classification—one usually resorts to an empirical
cross-entropy (CE) loss [12–17]. The question of training a
communication system for good performance over a range of
signal-to-noise ratios (SNRs) is another issue that has not been
seriously investigated. Systems are usually trained on samples
from only one SNR [3,8], or on samples uniformly drawn from
the targeted SNR range [4,14,16], apparently without ques-
tioning how this may affect performance for different SNRs.
In this paper, we investigate how ML-assisted communi-
cation systems should be trained. We first consider the case
where the intended goal is to minimize the uncoded bit error
rate (BER) and discuss why the empirical binary cross-entropy
(BCE) loss is indeed a sensible choice in uncoded systems,
e.g., for data detectors in isolation. However, in most practical
communication applications, the relevant figure of merit is the
(coded) block error rate (BLER), as opposed to the BER, since
block errors cause undesirable retransmissions [18, Sec. 9.2],
whereas (coded) bit errors themselves are irrelevant.¹ We
substantiate that minimizing the (coded) BER is not equivalent to
minimizing the BLER: for a fixed number of coded bit errors, the
BLER is lower if these errors are concentrated in few blocks rather
than spread across many. This observation calls into question
the common practice of training coded systems with loss func-
tions that penalize individual bit errors (such as the empirical
BCE), and thus optimize for the (irrelevant) coded BER in-
stead of the BLER. In response, we propose a range of novel
loss functions that aim at minimizing the BLER by penaliz-
ing bit errors jointly. We also show that training on samples
that are uniformly drawn from a target SNR range will focus
primarily on the low-SNR region while neglecting high-SNR
performance. As a remedy, we propose a new technique called
SNR deweighting. We evaluate the impact of the different loss
functions as well as of SNR deweighting through simulations
in NVIDIA Sionna [2].
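To illustrate why training on uniformly drawn SNR samples emphasizes the low-SNR region, the following sketch (not part of the paper's codebase; plain NumPy, assuming BPSK over an AWGN channel with exact posterior LLRs) estimates the average per-bit BCE loss at a few SNR points. The low-SNR samples contribute a much larger share of the total loss than the high-SNR samples.

```python
import numpy as np

# Illustration (assumption: BPSK over AWGN with exact posterior LLRs):
# the empirical BCE loss is dominated by low-SNR training samples.
rng = np.random.default_rng(0)
num_bits = 1_000_000

for snr_db in [0.0, 5.0, 10.0]:
    sigma2 = 10 ** (-snr_db / 10)               # noise variance for unit-energy BPSK
    b = rng.integers(0, 2, num_bits)            # information bits
    x = 1.0 - 2.0 * b                            # BPSK mapping: 0 -> +1, 1 -> -1
    y = x + np.sqrt(sigma2) * rng.standard_normal(num_bits)
    llr = 2.0 * y / sigma2                       # LLR = log p(b=0|y) / p(b=1|y)
    # per-bit BCE in nats: -log p(b|y) = log(1 + exp(-(1-2b) * llr))
    bce = np.logaddexp(0.0, -(1.0 - 2.0 * b) * llr)
    print(f"SNR = {snr_db:4.1f} dB: mean per-bit BCE = {bce.mean():.4f}")
```

Running this shows the mean per-bit BCE shrinking rapidly with SNR, so a loss averaged over uniformly drawn SNRs is driven almost entirely by the low-SNR end of the range.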
2. TRAINING FOR BIT ERROR RATE
ML-assisted communication systems are typically trained with
a focus on minimizing the (uncoded) BER [4, 16], under a
tacit assumption that the learned system could then be used in
combination with a forward error correction (FEC) scheme to
ensure reliable communication.² Due to the similarity between
detection and classification, the strategy typically consists of
(approximately) minimizing the empirical BCE³ on a training
set $\mathcal{D}=\{(\mathbf{b}^{(n)},\mathbf{y}^{(n)})\}_{n=1}^{N}$, where
$\mathbf{b}=(b_1,\ldots,b_K)$ is the vector of bits of interest (even in
uncoded systems, one is interested in multiple bits, e.g., when using
higher-order constellations, multiple OFDM subcarriers, or multi-user
transmission), $\mathbf{y}\in\mathcal{Y}$ is the channel output, and $n$
is the sample index. In
fact, this strategy appears to be so obvious that it is often not
motivated—let alone questioned—at all.
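As an illustration of this standard recipe, the following sketch (hypothetical, not taken from the paper's code; TensorFlow, assuming a trainable detector model that maps a channel output y to one logit per bit) computes the empirical BCE in the sense of footnote 3, i.e., the sum of binary cross-entropies over the K bits, averaged over the training batch.

```python
import tensorflow as tf

def empirical_bce(bits, logits):
    """Empirical BCE: sum of binary cross-entropies over the K bits,
    averaged over the batch (bits and logits have shape [batch, K])."""
    per_bit_ce = tf.nn.sigmoid_cross_entropy_with_logits(
        labels=tf.cast(bits, tf.float32), logits=logits)
    return tf.reduce_mean(tf.reduce_sum(per_bit_ce, axis=-1))

# Hypothetical training step: `detector` is any Keras model producing one
# logit per transmitted bit from the channel output y.
def train_step(detector, optimizer, b, y):
    with tf.GradientTape() as tape:
        logits = detector(y, training=True)   # shape [batch, K]
        loss = empirical_bce(b, logits)
    grads = tape.gradient(loss, detector.trainable_variables)
    optimizer.apply_gradients(zip(grads, detector.trainable_variables))
    return loss
```

The per-bit sigmoid cross-entropy is exactly the binary CE of each component, so summing over the bit dimension yields the vector BCE described above; the detector and optimizer names are placeholders for whatever trainable receiver is being studied.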
A shorter version of this paper has been submitted to the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). All code and simulation scripts to reproduce the results of this paper are available on GitHub: https://github.com/IIP-Group/BLER_Training
The authors thank Oscar Castañeda for comments and suggestions.
¹For this reason, physical layer (PHY) quality-of-service is assessed only in terms of BLER (not BER) in 3GPP LTE and other standards. Reference [19] notes that the relation between BER and BLER can be inconsistent.
²The discussion also applies to systems that already include FEC, but we argue in Secs. 1 and 3 that minimizing the coded BER is a category mistake.
³When we speak of the BCE between vectors, we mean the sum of binary CEs between the individual components as defined in (3), and not the categorical CE between the bit-vector and its estimate (as used, e.g., in [7–9]).