
MULTI-RATE ADAPTIVE TRANSFORM CODING FOR VIDEO COMPRESSION
Lyndon R. Duong∗
lyndon.duong@nyu.edu
Center for Neural Science, NYU
New York, NY, USA
Bohan Li, Cheng Chen, Jingning Han
{bohanli, chengchen, jingning}@google.com
Open Codecs, Google LLC
Mountain View, CA, USA
ABSTRACT
Contemporary lossy image and video coding standards rely on
transform coding, the process through which pixels are mapped to
an alternative representation to facilitate efficient data compression.
Despite the impressive performance of end-to-end optimized compression
with deep neural networks, the high computational and space
demands of these models have prevented them from superseding
the relatively simple transform coding found in conventional video
codecs. In this study, we propose learned transforms and entropy
coding that may serve either as (non)linear drop-in replacements
for, or as enhancements to, the linear transforms in existing codecs. These trans-
forms can be multi-rate, allowing a single model to operate along the
entire rate-distortion curve. To demonstrate the utility of our frame-
work, we augmented the DCT with learned quantization matrices
and adaptive entropy coding to compress intra-frame AV1 block
prediction residuals. We report substantial BD-rate and perceptual
quality improvements over more complex nonlinear transforms at a
fraction of the computational cost.
Index Terms—video compression, transform coding, entropy
coding
1. INTRODUCTION
Transform coding is an integral component of image and video
coding [1]. In state-of-the-art video standards such as HEVC [2],
VVC [3], and AV1 [4], transform coding is used to map block pre-
diction residuals to a domain in which the statistics of the transform
coefficients facilitate more effective compression. These codecs use
linear transforms such as the discrete cosine transform (DCT) [5]
and the asymmetric discrete sine transform (ADST) [6], due to their
compression efficiency as well as low computational complexity.
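To make the role of the transform concrete, the following is a minimal sketch of a DCT-based transform coding round trip for a single residual block, assuming scipy for the transform; the block size, quantization step, and function names are illustrative, not the integer implementations used in the codecs above.

import numpy as np
from scipy.fft import dctn, idctn

def code_residual_block(residual, step=8.0):
    """Toy transform-coding round trip for one residual block.

    2D type-II DCT, uniform scalar quantization, inverse transform.
    Real codecs use integer transforms and entropy-code the
    quantized coefficients rather than reconstructing immediately.
    """
    coeffs = dctn(residual, norm="ortho")   # forward transform
    q = np.round(coeffs / step)             # quantized symbols (entropy-coded in practice)
    return idctn(q * step, norm="ortho")    # dequantize + inverse transform

block = np.random.randn(8, 8)               # stand-in prediction residual
print(np.abs(block - code_residual_block(block)).max())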
In recent years, impressive results from end-to-end optimized
codecs have pointed to a possible shift away from conventional codecs,
whose designs are largely based on heuristics and hand-engineered
components (see [7, 8] for reviews). Indeed, image compression
competitions based on rate-distortion (R-D) performance are now
dominated by nonlinear machine learning (ML) models [7]. However,
end-to-end ML codecs have yet to be standardized, primarily
because their time and space complexity far exceeds that of
conventional solutions. For example, many ML compression
approaches train an individual model for each point along the
R-D curve, requiring an entirely separate set
of neural network parameters for each R-D trade-off. This not only
dramatically increases the space needed to store such parameters, but
also limits the ability to fine-tune the R-D trade-off.
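Concretely, such codecs are typically trained to minimize a rate-distortion Lagrangian of the form L(θ) = R(θ) + λ D(θ), where R is the expected bit-rate, D is the distortion (e.g., mean squared error), and λ sets the trade-off; the per-point approach described above learns a separate parameter set θ for every value of λ.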
∗Work was performed at Open Codecs, Google LLC.
Fig. 1. Multi-rate model architecture. The analysis/synthesis trans-
forms (left column) can be linear or nonlinear, and use a fixed set of
λ-independent parameters (blue) along with a subset of λ-dependent
parameters to fine-tune the R-D trade-off (pink). A learned hyperprior
network (right column) enables forward-adaptive entropy coding by
conditioning the probability over transform coefficients, p(ŷ|Φ). AC
and AD denote arithmetic coding and decoding, respectively, and orange
boxes indicate quantization operations. See section 3 for details.
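As a rough illustration of how such conditioning determines the rate, the sketch below computes the estimated bit count of quantized coefficients under a conditional Gaussian entropy model, a common choice in hyperprior architectures [7]; the Gaussian assumption, array shapes, and names are illustrative stand-ins for the model described in section 3.

import numpy as np
from scipy.stats import norm

def estimated_bits(y_hat, mu, sigma):
    """Bits needed to code quantized coefficients y_hat when each
    coefficient is modeled as Gaussian with hyperprior-predicted
    mean mu and scale sigma, integrated over the quantization bin."""
    upper = norm.cdf((y_hat - mu + 0.5) / sigma)
    lower = norm.cdf((y_hat - mu - 0.5) / sigma)
    p = np.clip(upper - lower, 1e-9, 1.0)   # probability mass per coefficient
    return float(-np.log2(p).sum())         # total estimated code length

y_hat = np.round(np.random.randn(64) * 3)   # stand-in quantized coefficients
print(estimated_bits(y_hat, mu=np.zeros(64), sigma=np.full(64, 3.0)))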
In this study, we take steps towards addressing these issues, with
our contributions summarized as follows:
1. We trained transforms and conditional entropy models with
an R-D objective to compress intra-frame prediction residuals
collected from the AVM codec [9], the reference software for
the next codec from the Alliance for Open Media. These models
can be used as drop-in replacements for, or augmentations of,
existing transform coding modules in video codecs.
2. We used a family of architectures and a training procedure
that enable multi-rate compression via adaptive gain control
with context-adaptive entropy coding [7], allowing a single
trained model to operate at an arbitrary point along the R-D
curve (see the sketch following this list). This vastly reduces
parameter storage compared to training a separate model for each
R-D trade-off.
3. We augmented the DCT with ML components, yielding sig-
nificant improvements in BD-rate [10] and structural simi-
larity (SSIM) for intra-frame block prediction residuals com-
pared to learned nonlinear transforms of higher complexity.
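To make the multi-rate mechanism of contribution 2 concrete, here is a simplified sketch of λ-dependent gain vectors modulating shared DCT coefficients before quantization; the λ grid, gain values, and log-domain interpolation are assumptions for illustration, not the trained model's parameters.

import numpy as np
from scipy.fft import dctn

LAMBDAS = np.array([0.01, 0.05, 0.25])                        # hypothetical training grid of trade-offs
GAINS = np.stack([np.full(64, g) for g in (0.5, 1.0, 2.0)])   # stand-in learned per-coefficient gains

def encode_block(residual, lam):
    """Quantize DCT coefficients with a λ-dependent gain vector;
    gains for an unseen λ are interpolated between the nearest
    trained values, so one model spans the whole R-D curve."""
    i = np.clip(np.searchsorted(LAMBDAS, lam), 1, len(LAMBDAS) - 1)
    t = (np.log(lam) - np.log(LAMBDAS[i - 1])) / (np.log(LAMBDAS[i]) - np.log(LAMBDAS[i - 1]))
    gain = (1 - t) * GAINS[i - 1] + t * GAINS[i]
    coeffs = dctn(residual, norm="ortho").ravel()
    return np.round(coeffs * gain)                             # symbols passed to the entropy coder

print(encode_block(np.random.randn(8, 8), lam=0.1)[:8])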