Reducing The Mismatch Between Marginal and
Learned Distributions in Neural Video Compression
Muhammet Balcilar
InterDigital, Inc.
Rennes, France
Bharath Bhushan Damodaran
InterDigital, Inc.
Rennes, France
Pierre Hellier
InterDigital, Inc.
Rennes, France
Abstract—During the last four years, we have witnessed the success of end-to-end trainable models for image compression. Compared to decades of incremental handcrafted work, these machine learning (ML) techniques learn all the components of the compression pipeline jointly, which explains their current superiority. However, end-to-end ML models have not yet reached the performance of traditional video codecs such as VVC. Possible explanations can be put forward: lack of data to account for the temporal redundancy, or inefficiency of the latent's density estimation in the neural model. The latter problem can be defined as the discrepancy between the latent's marginal distribution and the learned prior distribution. This mismatch, known as the amortization gap of the entropy model, enlarges the file size of the compressed data. In this paper, we first evaluate the amortization gap for three state-of-the-art ML video compression methods. Second, we propose an efficient and generic method to close the amortization gap, and show that it leads to an improvement of 2% to 5% without impacting reconstruction quality.
Keywords—Neural video compression, Entropy model, Reparameterization.
I. INTRODUCTION
Image and video compression is a fundamental task in image processing, which has become crucial in times of pandemic and ever-increasing video streaming. Thanks to the community's huge efforts over decades, traditional methods (built on linear transformations and heavily optimized handcrafted techniques) have reached the current state-of-the-art rate-distortion (RD) performance and dominate current industrial codec solutions. Alternatively, end-to-end trainable deep models have recently emerged, with promising results. Even though these methods have clearly exceeded many traditional techniques, and have surpassed human capability on some tasks for a few years already, only very recently did they beat the best traditional compression method (VVC, versatile video coding [1]), even in terms of peak signal-to-noise ratio (PSNR), for single-image compression [2]. However, their performance on video compression is still far from VVC, merely on par with the previous generation of traditional codecs (HEVC, high efficiency video coding [3]). In addition to the inefficiency of neural models in capturing temporal redundancy, the mismatch between the test latent's normalized histograms and the learned distributions in the entropy models may be a contributing factor.
Lossy image compression via end-to-end trainable models is a special kind of Variational Autoencoder (VAE) that jointly learns the transformations between data and latent codes and the probability models of these latent codes [2], [4]–[7]. The training objective is a multi-objective optimization problem, known as the RD loss function, where the model is optimized both for reconstruction quality and for the cross-entropy of the latent codes w.r.t. the learned probabilities. These neural image codecs were extended to end-to-end video compression by using two VAEs, one encoding motion information and the other encoding residual information [8]–[12]. As all trainable models suffer from the amortization gap [13] (the learned parameters may be optimal for the entire dataset, but sub-optimal for a given single test instance), neural compression models have a similar issue, and this gap reduces the performance by either enlarging the file size or decreasing the reconstruction quality. A first family of solutions applies post-training on a given single test image/video; some of these methods train just the encoder component of the VAE [14], [15] in order to avoid extra signaling cost, while others fine-tune all parts of the model and add the signaling cost to the loss function [16]. A second family of solutions avoids time-consuming post-training and instead adjusts some parameters of the model. For instance, in [17], only the entropy model's amortization gap in end-to-end image compression is targeted, and an instance-specific reparameterization of the latent distribution was proposed, as sketched below.
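To make the rate term and the entropy model's amortization gap concrete, the following minimal PyTorch sketch (our own illustration; the helper names rate_in_bits and amortization_gap_bits are hypothetical, not from any of the cited works) computes the rate as the cross-entropy of the quantized latent under the learned prior, and the gap as the extra bits paid relative to the empirical entropy of the latent's normalized histogram. For simplicity it assumes a single static prior over discrete symbols; hyperprior models instead predict a conditional prior per latent element.

```python
import torch

def rate_in_bits(y_hat: torch.Tensor, prior_log2p) -> torch.Tensor:
    """Cross-entropy rate of the quantized latent y_hat under the
    learned prior: R = -sum_i log2 p(y_i)."""
    return -prior_log2p(y_hat).sum()

def amortization_gap_bits(y_hat: torch.Tensor, prior_log2p) -> torch.Tensor:
    """Extra bits paid because the amortized prior p differs from the
    marginal (normalized histogram) q of this particular instance.

    -sum_s n_s * log2 q(s) is the rate of an ideal code matched to the
    instance's own marginal, so the difference is ~ N * KL(q || p) >= 0.
    """
    symbols, counts = torch.unique(y_hat, return_counts=True)
    q = counts.float() / counts.sum()                  # empirical marginal
    marginal_bits = -(counts.float() * torch.log2(q)).sum()
    return rate_in_bits(y_hat, prior_log2p) - marginal_bits
```

Dividing this gap by the total rate gives the relative file-size overhead that an instance-specific reparameterization of the prior could, at best, remove.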
This paper extends our previous work [17] to video compression, with some important differences. First, we introduce a general framework that encompasses end-to-end trainable models for video compression. Second, we analyse the amortization gap of all entropy models for different frame types (I, B and P frames) and for different information (motion and residual) in three recent neural video compression methods. Third, we identify the origin of the main performance drop and show how it can be fixed. Last but not least, we show the efficiency of our probability reparameterization method, where the new parameters are written into the file while exploiting their temporal redundancy. According to our results, we decrease the file size of the video by 2% to 5% on average without any effect on reconstruction quality. To the best of our knowledge, this is the first research on closing the amortization gap of neural video compression without post-training.
II. END-TO-END VIDEO COMPRESSION
Although the first end-to-end image compression model [4] used only a factorized entropy model, subsequent papers with hierarchical VAE-based hyperprior entropy models, as in [2], [5]–[7], became standard in neural image compression and now form the backbone of state-of-the-art neural video compression.
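As a rough illustration of the hyperprior idea (a toy sketch with our own layer choices and names, not the exact architecture of [2], [5]–[7]): a hyper-encoder h_a summarizes the latent y into side information z, and a hyper-decoder h_s maps the quantized z back to per-element Gaussian parameters under which the quantized latent is entropy-coded, the probability of each symbol being the Gaussian mass over its quantization bin.

```python
import torch
import torch.nn as nn

class ToyHyperprior(nn.Module):
    """Toy hyperprior entropy model (illustrative, not a cited design).

    h_a: latent y -> side information z (downsampled by 2).
    h_s: quantized z -> per-element mean/log-scale of a Gaussian prior.
    The rate of z itself (coded with a factorized prior) is omitted, and
    hard rounding is used, so this sketch is for rate evaluation only
    (training typically replaces rounding with additive uniform noise).
    """

    def __init__(self, ch: int = 192):
        super().__init__()
        self.h_a = nn.Sequential(
            nn.Conv2d(ch, ch, 3, stride=1, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 5, stride=2, padding=2))
        self.h_s = nn.Sequential(
            nn.ConvTranspose2d(ch, ch, 5, stride=2, padding=2, output_padding=1),
            nn.ReLU(),
            nn.Conv2d(ch, 2 * ch, 3, stride=1, padding=1))

    def forward(self, y: torch.Tensor):
        z_hat = torch.round(self.h_a(y))                 # quantized side info
        mu, log_scale = self.h_s(z_hat).chunk(2, dim=1)  # conditional prior
        y_hat = torch.round(y)
        # P(y_hat) = Gaussian mass over the quantization bin [y_hat +- 0.5].
        prior = torch.distributions.Normal(mu, torch.exp(log_scale))
        p = prior.cdf(y_hat + 0.5) - prior.cdf(y_hat - 0.5)
        rate_bits = -torch.log2(p.clamp_min(1e-9)).sum()
        return y_hat, rate_bits
```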
Let $x$, $\bar{x}$, $\hat{x}$, $\hat{x}_r$ respectively denote the current image, motion