Vision Transformer for Adaptive Image
Transmission over MIMO Channels
Haotian Wu, Yulin Shao, Chenghong Bian, Krystian Mikolajczyk, and Deniz Gündüz
Department of Electrical and Electronic Engineering, Imperial College London, London SW7 2BT, UK
Email: {haotian.wu17, y.shao, c.bian22, k.mikolajczyk, d.gunduz}@imperial.ac.uk
Abstract—This paper presents a vision transformer (ViT)
based joint source and channel coding (JSCC) scheme for wireless
image transmission over multiple-input multiple-output (MIMO)
systems, called ViT-MIMO. The proposed ViT-MIMO archi-
tecture, in addition to outperforming separation-based bench-
marks, can flexibly adapt to different channel conditions without
requiring retraining. Specifically, exploiting the self-attention
mechanism of the ViT enables the proposed ViT-MIMO model
to adaptively learn the feature mapping and power allocation
based on the source image and channel conditions. Numerical
experiments show that ViT-MIMO can significantly improve the
transmission quality across a large variety of scenarios, including
varying channel conditions, making it an attractive solution for
emerging semantic communication systems.
Index Terms—Joint source channel coding, vision transformer,
MIMO, image transmission, semantic communications.
I. INTRODUCTION
The design of efficient image communication systems over
wireless channels has recently attracted a lot of interest due
to the increasing number of Internet-of-things (IoT) and edge
intelligence applications [1], [2]. The traditional approach, motivated
by Shannon’s separation theorem, is to design source and channel
coding independently; however, the separation-based approach
is known to be sub-optimal in practice, which becomes partic-
ularly limiting in applications that impose strict latency con-
straints [3]. Despite the known theoretical benefits, designing
practical joint source channel coding (JSCC) schemes has been
an ongoing challenge for many decades. Significant progress
has been made in this direction in recent years, thanks to the
introduction of deep neural networks (DNNs) for the design
of JSCC schemes [4]–[8]. The first deep-learning-based JSCC
(DeepJSCC) scheme for wireless image transmission is presented
in [4], where it is shown to outperform the concatenation
of the state-of-the-art Better Portable Graphics (BPG) image
compression algorithm with LDPC codes. It was later extended to
transmission with adaptive channel bandwidth in [9] and to
the transmission over multipath fading channels in [10] and
[11].
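In such DeepJSCC schemes, a DNN encoder maps the image directly to power-constrained channel symbols, and a DNN decoder reconstructs the image from the noisy received symbols. A minimal sketch of that transmission chain, with untrained random linear maps standing in for the learned encoder and decoder (block sizes, bandwidth ratio, and SNR below are illustrative assumptions, not values from this paper):

```python
import numpy as np

rng = np.random.default_rng(1)

n_source, k_channel = 64, 16   # bandwidth ratio k/n = 0.25 (illustrative)
snr_db = 10.0                  # channel SNR in dB (illustrative)

# Stand-ins for the trained DNN encoder/decoder: random linear maps (assumption).
W_enc = rng.standard_normal((k_channel, n_source)) / np.sqrt(n_source)
W_dec = np.linalg.pinv(W_enc)

x = rng.standard_normal(n_source)                 # source block (e.g., image patch)
z = W_enc @ x                                     # joint source-channel encoding
z = z * np.sqrt(k_channel) / np.linalg.norm(z)    # power constraint: ||z||^2 = k

sigma = np.sqrt(10 ** (-snr_db / 10))             # noise std for unit signal power
y = z + sigma * rng.standard_normal(k_channel)    # AWGN channel

x_hat = W_dec @ y                                 # reconstruction at the receiver
```

The learned versions of `W_enc`/`W_dec` are nonlinear DNNs trained end-to-end through the (differentiable) channel, which is what allows the encoder to trade off compression and error protection jointly.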
To the best of our knowledge, all the existing papers on
DeepJSCC consider single-antenna transmitters and receivers.
While there is a growing literature successfully employing
DNNs for various multiple-input multiple-output (MIMO)
related tasks, such as detection, channel estimation, or channel
state feedback [12]–[16], no previous work has so far applied
DeepJSCC to the more challenging MIMO channels. MIMO
systems are known to boost the throughput and spectral
efficiency of wireless communications, providing significant
gains in capacity and reliability. JSCC over MIMO
channels is studied in [17] from a theoretical perspective. It
is challenging to design a practical JSCC scheme for MIMO
channels, where the model needs to retrieve coupled signals
from different antennas experiencing different channel gains.
A limited number of papers focus on DNN-based end-to-end
MIMO communication schemes. The first autoencoder (AE)-based
end-to-end MIMO communication method is introduced in
[18]. In [19], the authors establish symbol error rate bench-
marks for MIMO channels by evaluating several AE-based
models with channel state information (CSI). A singular-value
decomposition (SVD) based autoencoder is proposed in [20]
to achieve state-of-the-art bit error rates. However, these
MIMO schemes consider only the transmission of bits at a
fixed signal-to-noise ratio (SNR), ignoring the semantic content
of the source signal and lacking adaptability to varying channel conditions.
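The SVD-based design mentioned above relies on a classical fact: with CSI at both transmitter and receiver, precoding with the right singular vectors and combining with the left singular vectors turns the MIMO channel into independent parallel subchannels. A minimal NumPy sketch of this diagonalization (antenna counts and symbols are illustrative assumptions; noise is omitted for clarity):

```python
import numpy as np

rng = np.random.default_rng(0)
nt, nr = 2, 2  # transmit/receive antennas (illustrative)

# Rayleigh-fading MIMO channel matrix
H = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)

# SVD: H = U @ diag(s) @ Vh
U, s, Vh = np.linalg.svd(H)

# Symbols to send, one per eigen-subchannel
x = np.array([1 + 1j, -1 + 1j]) / np.sqrt(2)

# Precode with V = Vh^H, pass through the channel, combine with U^H
y = U.conj().T @ (H @ (Vh.conj().T @ x))

# y_i = s_i * x_i: the MIMO link is diagonalized into parallel subchannels
```

Each subchannel gain `s[i]` then determines how much power (or how many source features) a scheme can usefully allocate to that subchannel, which is the kind of mapping the proposed ViT-MIMO learns implicitly rather than by explicit diagonalization.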
In this paper, we design an end-to-end unified DeepJSCC
scheme for MIMO image transmission. In particular, we
introduce a vision transformer (ViT) based DeepJSCC scheme
for MIMO systems with CSI, called ViT-MIMO. Inspired
by the success of the attention mechanism in the design of
flexible communication schemes [11], [21]–[23], we leverage
the self-attention mechanism of the ViT in wireless image
transmission. Specifically, we represent the channel conditions
with a channel heatmap, and adapt the JSCC encoding and
decoding parameters according to this heatmap. Our method
can learn global attention between the source image and
the channel conditions in all the intermediate layers of the
DeepJSCC encoder and decoder. Intuitively, we expect this
design to simultaneously learn feature mapping and power
allocation based on the source semantics and the channel
conditions. Our main contributions can be listed as follows:
• To the best of the authors’ knowledge, ViT-MIMO
is the first DeepJSCC-enabled MIMO communication
system for image transmission, in which a ViT is designed
to exploit the contextual semantic features of the image
together with the CSI in a self-attention fashion.
• Numerical results show that our ViT-MIMO model sig-
nificantly improves the transmission quality over a wide
range of channel conditions and bandwidth ratios, com-
pared with traditional separate source and channel
coding schemes adopting BPG image compression algo-
arXiv:2210.15347v1 [cs.IT] 27 Oct 2022