1
Fast Beam Alignment via Pure Exploration in
Multi-armed Bandits
Yi Wei, Zixin Zhong, and Vincent Y. F. Tan, Senior Member, IEEE
Abstract—The beam alignment (BA) problem consists in accu-
rately aligning the transmitter and receiver beams to establish a
reliable communication link in wireless communication systems.
Existing BA methods search the entire beam space to identify
the optimal transmit-receive beam pair. This incurs a significant
latency when the number of antennas is large. In this work, we
develop a bandit-based fast BA algorithm to reduce BA latency
for millimeter-wave (mmWave) communications. Our algorithm
is named Two-Phase Heteroscedastic Track-and-Stop (2PHT&S).
We first formulate the BA problem as a pure exploration problem
in multi-armed bandits in which the objective is to minimize the
required number of time steps given a certain fixed confidence
level. By taking advantage of the correlation structure among
beams that the information from nearby beams is similar and
the heteroscedastic property that the variance of the reward of an
arm (beam) is related to its mean, the proposed algorithm groups
all beams into several beam sets such that the optimal beam set
is first selected and the optimal beam is identified in this set after
that. Theoretical analysis and simulation results on synthetic and
semi-practical channel data demonstrate the clear superiority of
the proposed algorithm vis-`
a-vis other baseline competitors.
Index Terms—Beam alignment, beam selection, mmWave com-
munication, multi-armed bandits.
I. INTRODUCTION
In millimeter-wave (mmWave) communications, the beams
at both the transmitter and receiver are narrow directional.
The beam alignment (BA) problem consists in ensuring that
the transmitter and receiver beams are accurately aligned to
establish a reliable communication link [2], [3]. An optimal
transmitter (receiver) beam or a pair of transmitter and receiver
beam is required to be selected to maximize the overall
capacity, throughput, SNR or diversity. To achieve this goal, a
na¨
ıve exhaustive search method scans all the beam space and
hence causes significant BA latency. Specifically, there are
two fundamental challenges to implement BA: (1) the amount
of additional channel state information (CSI) corresponding
to each beam (pair) is large; (2) the frequency of the channel
measurement [4], [5] is high. These challenges become even
Y. Wei is with the College of Information Science and Electronic
Engineering, Zhejiang University and the Zhejiang Provincial Key Lab-
oratory of Information Processing, Communication and Networking (IP-
CAN), Hangzhou 310027, China (email: 21731133@zju.edu.cn). Z. Zhong
is with the Department of Computing Science, University of Alberta (email:
zzhong10@ualberta.ca). V. Y. F. Tan is with Department of Mathematics and
the Department of Electrical and Computer Engineering, National University
of Singapore (email: vtan@nus.edu.sg).
This work is funded by a Singapore Ministry of Education Tier 1 grant
(A-0009042-01-00), the Fundamental Research Funds for the Central Univer-
sities 226-2022-00195, and the Defense Industrial Technology Development
Program under Grant JCKY2020210B021.
This paper was presented in part at the 2022 International Symposium on
Information Theory [1].
more difficult to overcome when a large number of anten-
nas are employed at both of the transmitter and receiver.
Moreover, the change of antenna orientation, the effect of
transmitter/reciever mobility, and the dynamic nature of the
wireless channel will result in previously found optimal beam
pairs to be suboptimal over time, which further exacerbates
the latency problem [6], [7].
Therefore, the design of fast BA algorithms is of paramount
importance, and has stimulated intense research interest. By
utilizing the sparsity of the mmWave channel, Marzi et al.
[8] incorporated compressed sensing techniques in the BA
problem to reduce the beam measurement complexity. Wang
et al. [9] developed a fast-discovery multi-resolution beam
search method, which first probes the wide beam and continues
to narrow beams until the optimal arm is identified. However,
the beam resolution needs to be adjusted at each step. Besides,
various forms of side information, e.g., location information
[10], out-of-band measurements [11], and dedicated short-
range communication [12], have also been used to reduce the
required number of probing beams.
Due to its inherent ability to balance between exploration
and exploitation, multi-armed bandit (MAB) theory has been
recently utilized in wireless communication field, such as
channel access in cognitive network [13], channel allocation
[14] and adaptive modulation/coding [15], etc., and harnessed
to address the BA problems; see [16]–[22]. In a single-user
MIMO system, the work [16] employed the canonical upper
confidence bound (UCB) bandit algorithm in beam selection,
where the instantaneous full CSI is not required. The work [17]
applied linear Thompson sampling (LTS) to address the beam
selection problem. For mmWave vehicular systems, Sim et
al. [18] developed an online learning algorithm to address the
problem of beam selection with environment-awareness based
on contextual bandit theory. The proposed method explores
different beams over time while accounting for contextual
information (i.e., vehicles’ direction of arrival). Similarly,
Hashemi et al. [19] proposed to maximize the directivity gain
(i.e., received energy) of the BA policy within a time period
using the contextual information (i.e., the inherent correlation
and unimodality properties), and formulate the BA problem
as contextual bandits. Wu et al. [20] provided a method to
quickly and accurately align beams in multi-path channels for
a point-to-point mmwave system, which takes advantage of the
correlation structure among beams such that the information
from nearby beams is extracted to identify the optimal beam,
instead of searching the entire beam space. In [21], Va
et al. used a UCB-based framework to develop the online
learning algorithm for beam pair selection and refinement,
arXiv:2210.12625v1 [cs.IT] 23 Oct 2022