Reducing The Mismatch Between Marginal and
Learned Distributions in Neural Video Compression
Muhammet Balcilar
InterDigital, Inc.
Rennes, France
Bharath Bhushan Damodaran
InterDigital, Inc.
Rennes, France
Pierre Hellier
InterDigital, Inc.
Rennes, France
Abstract—During the last four years, we have witnessed the success of end-to-end trainable models for image compression. Compared to decades of incremental handcrafted work, these machine learning (ML) techniques learn all the components of the compression pipeline jointly, which explains their current superiority. However, end-to-end ML models have not yet reached the performance of traditional video codecs such as VVC. Possible explanations can be put forward: lack of data to account for the temporal redundancy, or inefficiency of the latent's density estimation in the neural model. The latter problem can be defined as the discrepancy between the latent's marginal distribution and the learned prior distribution. This mismatch, known as the amortization gap of the entropy model, enlarges the file size of the compressed data. In this paper, we first evaluate the amortization gap for three state-of-the-art ML video compression methods. Second, we propose an efficient and generic method to close the amortization gap, and show that it leads to an improvement of 2% to 5% without impacting reconstruction quality.
Keywords—Neural video compression, Entropy model, Reparameterization.
I. INTRODUCTION
Image and video compression is a fundamental task in image processing, which has become crucial in times of pandemic and ever-increasing video streaming. Thanks to the community's huge efforts over decades, traditional methods (built on linear transformations and heavily optimized handcrafted techniques) have reached the current state-of-the-art rate-distortion (RD) performance and dominate current industrial codec solutions. Alternatively, end-to-end trainable deep models have recently emerged, with promising results. Even though these methods have clearly exceeded many traditional techniques, and have surpassed human capability on some tasks for a few years already, only very recently did they beat the best traditional compression method (VVC, versatile video coding [1]), even in terms of peak signal-to-noise ratio (PSNR), for single-image compression [2]. However, their performance on video compression is still far from VVC, merely on par with the previous generation of traditional codecs (HEVC, high efficiency video coding [3]). In addition to the inefficiency of neural models in capturing temporal redundancy, the mismatch between the test latent's normalized histograms and the learned distributions in the entropy models may be a contributing factor.
Lossy image compression via end-to-end trainable models is a special kind of Variational Autoencoder (VAE) that jointly learns the transformations between data and latent codes and the probability models of these latent codes [2], [4]–[7]. The training objective is a multi-objective optimization problem, known as the RD loss function, where the model is optimized both for reconstruction quality and for the cross-entropy of the latent codes w.r.t. the learned probabilities. These neural image codecs were extended to end-to-end video compression by using two VAEs, one encoding motion information and the other encoding residual information [8]–[12]. As all trainable models suffer from the amortization gap [13] (the learned parameters may be optimal for the entire dataset, but sub-optimal for a given single test instance), neural compression models have a similar issue, and this gap reduces the performance by either enlarging the file size or decreasing the reconstruction quality. A first family of solutions applies post-training on a given single test image/video; some of these methods train just the encoder component of the VAE [14], [15] in order to avoid extra signaling cost, while others fine-tune all parts of the model and add the signaling cost to the loss function [16]. A second family of solutions avoids time-consuming post-training and instead adjusts some parameters of the model. For instance, in [17], only the entropy model's amortization gap in end-to-end image compression is targeted, and an instance-specific reparameterization of the latent distribution was proposed, as sketched below.
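To make the rate term and the entropy model's amortization gap concrete, the following minimal PyTorch sketch (our own illustration; the helper names rate_in_bits and amortization_gap_bits are hypothetical, not from any of the cited works) computes the rate as the cross-entropy of the quantized latent under the learned prior, and the gap as the extra bits paid relative to the empirical entropy of the latent's normalized histogram. For simplicity it assumes a single static prior over discrete symbols; hyperprior models instead predict a conditional prior per latent element.

```python
import torch

def rate_in_bits(y_hat: torch.Tensor, prior_log2p) -> torch.Tensor:
    """Cross-entropy rate of the quantized latent y_hat under the
    learned prior: R = -sum_i log2 p(y_i)."""
    return -prior_log2p(y_hat).sum()

def amortization_gap_bits(y_hat: torch.Tensor, prior_log2p) -> torch.Tensor:
    """Extra bits paid because the amortized prior p differs from the
    marginal (normalized histogram) q of this particular instance.

    -sum_s n_s * log2 q(s) is the rate of an ideal code matched to the
    instance's own marginal, so the difference is ~ N * KL(q || p) >= 0.
    """
    symbols, counts = torch.unique(y_hat, return_counts=True)
    q = counts.float() / counts.sum()                  # empirical marginal
    marginal_bits = -(counts.float() * torch.log2(q)).sum()
    return rate_in_bits(y_hat, prior_log2p) - marginal_bits
```

Dividing this gap by the total rate gives the relative file-size overhead that an instance-specific reparameterization of the prior could, at best, remove.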
This paper extends our previous work [17] to video compression, with some important differences. First, we introduce a general framework that encompasses end-to-end trainable models for video compression. Second, we analyse the amortization gap of all entropy models for different frame types (I, B and P frames) and for different information (motion and residual) in three recent neural video compression methods. Third, we identify the origin of the main performance drop and show how it can be fixed. Last but not least, we show the efficiency of our probability reparameterization method, where the new parameters are written into the file while exploiting their temporal redundancy. According to our results, we decrease the file size of the video by 2% to 5% on average without any effect on reconstruction quality. To the best of our knowledge, this is the first research on closing the amortization gap of neural video compression without post-training.
II. END-TO-END VIDEO COMPRESSION
Although the first end-to-end image compression model [4] used only a factorized entropy model, subsequent papers with hierarchical VAE-based hyperprior entropy models, as in [2], [5]–[7], became standard in neural image compression and now form the backbone of state-of-the-art neural video compression.
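As a rough illustration of the hyperprior idea (a toy sketch with our own layer choices and names, not the exact architecture of [2], [5]–[7]): a hyper-encoder h_a summarizes the latent y into side information z, and a hyper-decoder h_s maps the quantized z back to per-element Gaussian parameters under which the quantized latent is entropy-coded, the probability of each symbol being the Gaussian mass over its quantization bin.

```python
import torch
import torch.nn as nn

class ToyHyperprior(nn.Module):
    """Toy hyperprior entropy model (illustrative, not a cited design).

    h_a: latent y -> side information z (downsampled by 2).
    h_s: quantized z -> per-element mean/log-scale of a Gaussian prior.
    The rate of z itself (coded with a factorized prior) is omitted, and
    hard rounding is used, so this sketch is for rate evaluation only
    (training typically replaces rounding with additive uniform noise).
    """

    def __init__(self, ch: int = 192):
        super().__init__()
        self.h_a = nn.Sequential(
            nn.Conv2d(ch, ch, 3, stride=1, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 5, stride=2, padding=2))
        self.h_s = nn.Sequential(
            nn.ConvTranspose2d(ch, ch, 5, stride=2, padding=2, output_padding=1),
            nn.ReLU(),
            nn.Conv2d(ch, 2 * ch, 3, stride=1, padding=1))

    def forward(self, y: torch.Tensor):
        z_hat = torch.round(self.h_a(y))                 # quantized side info
        mu, log_scale = self.h_s(z_hat).chunk(2, dim=1)  # conditional prior
        y_hat = torch.round(y)
        # P(y_hat) = Gaussian mass over the quantization bin [y_hat +- 0.5].
        prior = torch.distributions.Normal(mu, torch.exp(log_scale))
        p = prior.cdf(y_hat + 0.5) - prior.cdf(y_hat - 0.5)
        rate_bits = -torch.log2(p.clamp_min(1e-9)).sum()
        return y_hat, rate_bits
```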
Let $x$, $\bar{x}$, $\hat{x}$, $\hat{x}_r$ respectively denote the current image, motion