Vision Transformer for Adaptive Image
Transmission over MIMO Channels
Haotian Wu, Yulin Shao, Chenghong Bian, Krystian Mikolajczyk, and Deniz Gündüz
Department of Electrical and Electronic Engineering, Imperial College London, London SW7 2BT, UK
Email:{haotian.wu17, y.shao, c.bian22, k.mikolajczyk, d.gunduz}@imperial.ac.uk
Abstract—This paper presents a vision transformer (ViT) based joint source and channel coding (JSCC) scheme for wireless image transmission over multiple-input multiple-output (MIMO) systems, called ViT-MIMO. The proposed ViT-MIMO architecture, in addition to outperforming separation-based benchmarks, can flexibly adapt to different channel conditions without requiring retraining. Specifically, exploiting the self-attention mechanism of the ViT enables the proposed ViT-MIMO model to adaptively learn the feature mapping and power allocation based on the source image and channel conditions. Numerical experiments show that ViT-MIMO can significantly improve the transmission quality across a large variety of scenarios, including varying channel conditions, making it an attractive solution for emerging semantic communication systems.
Index Terms—Joint source channel coding, vision transformer,
MIMO, image transmission, semantic communications.
I. INTRODUCTION
The design of efficient image communication systems over wireless channels has recently attracted a lot of interest due to the increasing number of Internet-of-things (IoT) and edge intelligence applications [1], [2]. The traditional approach, motivated by Shannon's separation theorem, is to design the source and channel codes independently; however, the separation-based approach is known to be sub-optimal in practice, and becomes particularly limiting in applications with strict latency constraints [3]. Despite the known theoretical benefits, designing practical joint source channel coding (JSCC) schemes has been an ongoing challenge for many decades. Significant progress has been made in this direction in recent years thanks to the introduction of deep neural networks (DNNs) for the design of JSCC schemes [4]–[8]. The first deep learning based JSCC (DeepJSCC) scheme for wireless image transmission was presented in [4], and shown to outperform the concatenation of the state-of-the-art better portable graphics (BPG) image compression algorithm with LDPC codes. It was later extended to transmission with adaptive channel bandwidth in [9], and to transmission over multipath fading channels in [10] and [11].
To the best of our knowledge, all the existing papers on DeepJSCC consider single-antenna transmitters and receivers. While there is a growing literature successfully employing DNNs for various multiple-input multiple-output (MIMO) related tasks, such as detection, channel estimation, or channel state feedback [12]–[16], no previous work has applied DeepJSCC to the more challenging MIMO channels. MIMO systems are known to boost the throughput and spectral efficiency of wireless communications, offering significant improvements in capacity and reliability. JSCC over MIMO channels is studied in [17] from a theoretical perspective. Designing a practical JSCC scheme for MIMO channels is challenging, since the model needs to retrieve coupled signals from different antennas experiencing different channel gains. A limited number of papers focus on DNN-based end-to-end MIMO communication schemes. The first autoencoder-based end-to-end MIMO communication method is introduced in [18]. In [19], the authors establish symbol error rate benchmarks for MIMO channels by evaluating several autoencoder (AE) based models with channel state information (CSI). A singular-value decomposition (SVD) based autoencoder is proposed in [20], achieving state-of-the-art bit error rate performance. However, these MIMO schemes only consider the transmission of bits at a fixed signal-to-noise ratio (SNR) value, ignoring the source signal's semantic context and channel adaptability.
In this paper, we design an end-to-end unified DeepJSCC
scheme for MIMO image transmission. In particular, we
introduce a vision transformer (ViT) based DeepJSCC scheme
for MIMO systems with CSI, called ViT-MIMO. Inspired
by the success of the attention mechanism in the design of
flexible communication schemes [11], [21]–[23], we leverage
the self-attention mechanism of the ViT in wireless image
transmission. Specifically, we represent the channel conditions
with a channel heatmap, and adapt the JSCC encoding and
decoding parameters according to this heatmap. Our method
can learn global attention between the source image and
the channel conditions in all the intermediate layers of the
DeepJSCC encoder and decoder. Intuitively, we expect this
design to simultaneously learn feature mapping and power
allocation based on the source semantics and the channel
conditions. Our main contributions can be listed as follows:
• To the best of the authors' knowledge, ViT-MIMO is the first DeepJSCC-enabled MIMO communication system for image transmission, where a ViT is designed to exploit the contextual semantic features of the image as well as the CSI in a self-attention fashion.
• Numerical results show that our ViT-MIMO model significantly improves the transmission quality over a large range of channel conditions and bandwidth ratios, compared with traditional separate source and channel coding schemes that adopt the BPG image compression algorithm together with capacity-achieving channel codes. In addition, the proposed ViT-MIMO is a flexible end-to-end model that can adapt to varying channel conditions without retraining.
II. SYSTEM MODEL
We consider an $M \times M$ MIMO communication system, where an $M$-antenna transmitter aims to deliver an image $\mathbf{S} \in \mathbb{R}^{h \times w \times 3}$ to an $M$-antenna receiver ($h$ and $w$ denote the height and width of the image, while 3 refers to the R, G, and B color channels). The transmitter encodes the image into a matrix of channel symbols $\mathbf{X} \in \mathbb{C}^{M \times k}$, where $k$ is the number of channel uses. Following the standard definition [5], we denote the bandwidth ratio (i.e., the ratio of channel uses to source symbols) by $R \triangleq k/n$, where $n = 3hw$ is the number of source symbols. The transmitted signal $\mathbf{X}$ is subject to a power constraint $P_s$:
$$\frac{1}{Mk}\|\mathbf{X}\|_F^2 \le P_s, \qquad (1)$$
where $\|\cdot\|_F$ denotes the Frobenius norm, and we set $P_s = 1$ without loss of generality.
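As a minimal illustration (not part of the original paper), the bandwidth ratio and the power normalization implied by (1) can be computed as follows in Python; the function names and tensor shapes are our own.

import numpy as np

def bandwidth_ratio(h, w, k):
    """Bandwidth ratio R = k / n, with n = 3*h*w source symbols."""
    return k / (3 * h * w)

def normalize_power(X, P_s=1.0):
    """Scale complex symbols X (shape M x k) so the average symbol power equals P_s, satisfying (1)."""
    M, k = X.shape
    return X * np.sqrt(P_s * M * k) / np.linalg.norm(X, 'fro')

# Example: a 32x32 RGB image (n = 3072 source symbols) sent over k = 512 channel uses
print(bandwidth_ratio(32, 32, 512))          # R = 1/6
X = (np.random.randn(2, 512) + 1j * np.random.randn(2, 512)) / np.sqrt(2)
X = normalize_power(X)
print(np.linalg.norm(X, 'fro')**2 / X.size)  # ~= 1, i.e., the constraint in (1) with P_s = 1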
The channel model can be written as
$$\mathbf{Y} = \mathbf{H}\mathbf{X} + \mathbf{W}, \qquad (2)$$
where $\mathbf{X} \in \mathbb{C}^{M \times k}$ and $\mathbf{Y} \in \mathbb{C}^{M \times k}$ denote the channel input and output matrices, respectively, while $\mathbf{W} \in \mathbb{C}^{M \times k}$ is the additive white Gaussian noise (AWGN) term whose entries follow an independent and identically distributed (i.i.d.) complex Gaussian distribution with zero mean and variance $\sigma_w^2$, i.e., $\mathbf{W}[i,j] \sim \mathcal{CN}(0, \sigma_w^2)$. The entries of the channel gain matrix $\mathbf{H} \in \mathbb{C}^{M \times M}$ follow an i.i.d. complex Gaussian distribution with zero mean and variance $\sigma_h^2$, i.e., $\mathbf{H}[i,j] \sim \mathcal{CN}(0, \sigma_h^2)$. We consider a slow block-fading channel model, in which the channel matrix $\mathbf{H}$ remains constant for $k$ channel symbols, corresponding to the transmission of one image, and takes an independent realization in the next block. We assume that the CSI is known both at the transmitter (CSIT) and the receiver (CSIR).
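For concreteness, the block-fading channel in (2) can be simulated as in the following sketch, which assumes the i.i.d. complex Gaussian fading and noise statistics defined above; it is our own illustration rather than code from the paper.

import numpy as np

def mimo_channel(X, sigma_h=1.0, sigma_w=0.1, rng=np.random.default_rng()):
    """Apply Y = H X + W for one block: H is fixed over all k channel uses of the image."""
    M, k = X.shape
    # i.i.d. complex Gaussian channel gains, H[i, j] ~ CN(0, sigma_h^2)
    H = (rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))) * sigma_h / np.sqrt(2)
    # AWGN, W[i, j] ~ CN(0, sigma_w^2)
    W = (rng.standard_normal((M, k)) + 1j * rng.standard_normal((M, k))) * sigma_w / np.sqrt(2)
    return H @ X + W, H  # H is also returned, since the receiver has CSIR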
Given the channel output $\mathbf{Y}$, the receiver reconstructs the source image as $\hat{\mathbf{S}} \in \mathbb{R}^{h \times w \times 3}$. We use the peak signal-to-noise ratio (PSNR) as the distortion metric:
$$\mathrm{PSNR} = 10 \log_{10} \frac{\|\mathbf{S}\|_\infty^2}{\mathbb{E}\|\mathbf{S} - \hat{\mathbf{S}}\|_F^2} \ \text{(dB)}, \qquad (3)$$
where $\|\cdot\|_\infty$ is the infinity norm, and the expectation in the denominator is computed over all pixels, i.e., it is the mean squared error (MSE).
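A short sketch of the PSNR computation in (3), assuming 8-bit images so that the peak (infinity-norm) value is 255:

import numpy as np

def psnr(S, S_hat, peak=255.0):
    """PSNR in dB: 10*log10(peak^2 / MSE), with the MSE averaged over all pixels."""
    mse = np.mean((S.astype(np.float64) - S_hat.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)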
As shown in Fig. 1, there are two approaches to solve this problem: the traditional separate source and channel coding scheme, and JSCC.
A. Separate source and channel coding
For the traditional separate scheme, we sequentially perform image compression, channel coding, and modulation to generate the channel input matrix $\mathbf{X}$, the elements of which are constellation symbols with average power normalized to 1.
Fig. 1. Block diagram of the MIMO image transmission system: (a) conventional separate source-channel coding scheme, and (b) deep learning based JSCC scheme.
Specifically, given the CSI, we first decompose the channel matrix by singular-value decomposition (SVD), yielding $\mathbf{H} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H$, where $\mathbf{U} \in \mathbb{C}^{M \times M}$ and $\mathbf{V} \in \mathbb{C}^{M \times M}$ are unitary matrices, and $\boldsymbol{\Sigma} = \mathrm{diag}(s_1, s_2, \ldots, s_M)$ is a diagonal matrix whose singular values are in descending order, i.e., $s_1 \ge \cdots \ge s_M$.
Let us denote the power allocation matrix by $\boldsymbol{\Lambda}$, a diagonal matrix whose $i$-th diagonal element is the power allocation weight for the signal stream of the $i$-th antenna. $\boldsymbol{\Lambda}$ can be derived from standard water-filling algorithms. With power allocation and SVD precoding (i.e., precoding $\boldsymbol{\Lambda}\mathbf{X}$ into $\mathbf{V}\boldsymbol{\Lambda}\mathbf{X}$), (2) can be rewritten as
$$\mathbf{Y} = \mathbf{H}\mathbf{V}\boldsymbol{\Lambda}\mathbf{X} + \mathbf{W} = \mathbf{U}\boldsymbol{\Sigma}\boldsymbol{\Lambda}\mathbf{X} + \mathbf{W}. \qquad (4)$$
Multiplying both sides of (4) by $\mathbf{U}^H$ (MIMO detection) gives
$$\mathbf{X}' = \boldsymbol{\Sigma}\boldsymbol{\Lambda}\mathbf{X} + \mathbf{U}^H\mathbf{W}. \qquad (5)$$
As can be seen, SVD-based precoding converts the MIMO channel into a set of parallel subchannels with different SNRs. In particular, the SNR of the $i$-th subchannel is determined by the $i$-th singular value $s_i$ and the $i$-th power allocation coefficient of $\boldsymbol{\Lambda}$.
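The following sketch illustrates the baseline steps in (4) and (5): SVD of $\mathbf{H}$, a standard capacity-oriented water-filling power allocation (one possible way of deriving $\boldsymbol{\Lambda}$), precoding, and MIMO detection. It is our own illustration, not code from the paper, and the function names are hypothetical.

import numpy as np

def water_filling(s, sigma_w, P_total=1.0):
    """Water-filling power allocation over subchannels with singular values s (descending)."""
    inv_gain = sigma_w ** 2 / s ** 2                 # noise-to-gain ratio per subchannel
    for m in range(len(s), 0, -1):                   # try using the m strongest subchannels
        mu = (P_total + inv_gain[:m].sum()) / m      # water level
        p = mu - inv_gain[:m]
        if p.min() >= 0:
            break
    power = np.zeros_like(s)
    power[:m] = p
    return power

def svd_precode_and_detect(X, H, sigma_w, P_total=1.0):
    """Transmit V Lam X over H and apply U^H at the receiver, cf. (4)-(5)."""
    U, s, Vh = np.linalg.svd(H)                      # H = U diag(s) V^H, s in descending order
    Lam = np.diag(np.sqrt(water_filling(s, sigma_w, P_total)))
    M, k = X.shape
    W = (np.random.randn(M, k) + 1j * np.random.randn(M, k)) * sigma_w / np.sqrt(2)
    Y = H @ (Vh.conj().T @ Lam @ X) + W              # eq. (4)
    return U.conj().T @ Y                            # eq. (5): X' = Sigma Lam X + U^H W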
Given $\mathbf{X}' \in \mathbb{C}^{M \times k}$, the receiver performs demodulation, channel decoding, and source decoding sequentially to reconstruct the source image as $\hat{\mathbf{S}}$.
B. DeepJSCC
In DeepJSCC, we exploit deep learning to parameterize the encoder and decoder functions, which are trained jointly on an image dataset and the channel model in (2). Let us denote the DeepJSCC encoder and decoder by $f_\theta$ and $f_\phi$, respectively, where $\theta$ and $\phi$ denote the network parameters. We have
$$\mathbf{X} = f_\theta(\mathbf{S}, \mathbf{H}, \sigma_w^2). \qquad (6)$$
Unlike the separate source-channel coding scheme, the transmitter does away with explicit power allocation. Instead, we leverage the self-attention mechanism of the ViT so that the feature mapping and power allocation are learned jointly from the source image and the channel conditions.
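As a rough, hypothetical sketch of how an encoder of the form (6) might ingest both the image and the CSI (the actual ViT-MIMO architecture, layer sizes, and channel-heatmap construction are not specified in this excerpt), one could embed image patches, append a CSI token built from $\mathbf{H}$ and $\sigma_w^2$, and process everything jointly with transformer layers:

import torch
import torch.nn as nn

class ToyViTJSCCEncoder(nn.Module):
    """Hypothetical f_theta(S, H, sigma_w^2): maps an image plus CSI to M x k complex symbols."""

    def __init__(self, M=2, patch=8, img=32, dim=256, k_per_patch=8):
        super().__init__()
        self.M, self.patch = M, patch
        self.n_patches = (img // patch) ** 2
        self.patch_embed = nn.Linear(3 * patch * patch, dim)
        # CSI token: real/imag parts of H plus the noise power, projected to the embedding dim
        self.csi_embed = nn.Linear(2 * M * M + 1, dim)
        self.pos = nn.Parameter(torch.zeros(1, self.n_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(dim, 2 * M * k_per_patch)  # real/imag symbol pairs per patch

    def forward(self, S, H, sigma_w2):
        # S: (B, 3, img, img) image, H: (B, M, M) complex CSI, sigma_w2: (B,) noise power
        B, p = S.shape[0], self.patch
        patches = S.unfold(2, p, p).unfold(3, p, p)                 # (B, 3, img/p, img/p, p, p)
        patches = patches.reshape(B, 3, self.n_patches, -1)
        patches = patches.permute(0, 2, 1, 3).reshape(B, self.n_patches, -1)
        tokens = self.patch_embed(patches)                          # (B, N, dim)
        csi = torch.cat([H.real.flatten(1), H.imag.flatten(1), sigma_w2.view(B, 1)], dim=1)
        tokens = torch.cat([self.csi_embed(csi).unsqueeze(1), tokens], dim=1) + self.pos
        z = self.backbone(tokens)[:, 1:]                            # drop the CSI token
        sym = self.head(z).reshape(B, self.M, -1, 2)                # (B, M, k, 2)
        X = torch.complex(sym[..., 0], sym[..., 1])
        # enforce the average power constraint of (1) with P_s = 1
        norm = torch.linalg.vector_norm(X.flatten(1), dim=1).view(B, 1, 1)
        return X * (X[0].numel() ** 0.5) / norm

Because the CSI token participates in every self-attention layer, each patch embedding can attend to the channel conditions throughout the network, which is the intuition behind the joint feature-mapping and power-allocation behaviour described above.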