

PalGAN: Image Colorization with Palette Generative Adversarial Networks

Yi Wang1, Menghan Xia2, Lu Qi3, Jing Shao4, and Yu Qiao1

1Shanghai AI Laboratory 2Tencent AI Lab 3CUHK 4SenseTime Research

{wangyi,qiaoyu}@pjlab.org.cn menghanxyz@gmail.com luqi@cse.cuhk.edu.hk

shaojing@senseauto.com

Fig. 1: Our colorization results. 1st row: inputs, and 2nd row: our predictions.

Abstract. Multimodal ambiguity and color bleeding remain challenging in colorization. To tackle these problems, we propose a new GAN-based colorization approach, PalGAN, integrated with palette estimation and chromatic attention. To circumvent the multimodality issue, we present a new colorization formulation that first estimates a probabilistic palette from the input gray image, then conducts color assignment conditioned on the palette through a generative model. Further, we handle color bleeding with chromatic attention, which studies color affinities by considering both semantic and intensity correlation. In extensive experiments, PalGAN outperforms state-of-the-art methods in quantitative evaluation and visual comparison, delivering notably diverse, contrastive, and edge-preserving appearances. With the palette design, our method enables color transfer between images even with irrelevant contexts.

Keywords: Image Colorization, Generative Adversarial Networks, Attention, Color Transfer

1 Introduction

Colorization means predicting the missing chroma information from a given gray image. It is an interesting and practical task in computer vision, widely used in legacy footage processing [27], color transfer [1,39], and other visual editing applications [3,52]. It is also exploited as a proxy task for self-supervised learning [25], since predicting perceptually natural colors from a given grayscale image heavily relies on scene understanding. However, even when the ground-truth color is available for supervision, it is still very challenging to predict pixel colors from gray images, due to the ill-posed nature of the problem: one grayscale input could correspond to multiple possible color variants.

arXiv:2210.11204v1 [cs.CV] 20 Oct 2022

Most current methods [54,56,26,12,23,38,49,17,3] formulate colorization as a pixel-level regression task and therefore suffer, to varying degrees, from the multimodal nature of the problem. With large-scale training data and end-to-end learning models, they can conveniently learn color distribution priors, e.g., greenish tones for vegetation, human skin colors, etc. However, when it comes to objects with inherent color ambiguity (e.g., human clothes, cars, and other man-made stuff), these approaches tend to predict brownish average colors. To tackle such multimodality, some works [54,56,24] formulate color prediction as pixel-level color classification, which allows multiple colors to be assigned to each pixel based on posterior probability. Unfortunately, these methods suffer from regional color inconsistency due to the independent pixel-wise sampling mechanism. In this regard, sequential modeling [12,23] only partially alleviates the sampling issue, because the unidirectional sequential dependence of 2D flattened pixel primitives causes error accumulation and hinders learning efficiency.
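The averaging behavior of pixel-level regression described above can be illustrated with a toy calculation. The sketch below is not taken from the paper: the two ab values are made-up numbers standing in for a "red car" and a "blue car" mode, and the point is only that the constant minimizing squared error is the mean of the modes, which has lower chroma (is less saturated) than either plausible answer.

```python
import numpy as np

# Hypothetical toy data: the ab chroma observed for the same gray "car" pixel
# across a training set. The values are illustrative, not from the paper.
red_ab = np.array([60.0, 40.0])
blue_ab = np.array([20.0, -60.0])
targets = np.stack([red_ab] * 50 + [blue_ab] * 50)  # two equally likely modes


def l2_optimal_prediction(samples):
    """Return the constant prediction that minimizes mean squared error.

    For squared error this is the sample mean.
    """
    return samples.mean(axis=0)


def chroma(ab):
    """Chroma magnitude in the ab plane; smaller means less saturated."""
    return float(np.linalg.norm(ab))


pred = l2_optimal_prediction(targets)
print("regression output:", pred)
print("chroma of modes:", chroma(red_ab), chroma(blue_ab))
print("chroma of regression output:", chroma(pred))
```

The regression output lies between the two modes and is duller than both, which is the washed-out effect that classification-based and palette-based formulations try to avoid.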

Apart from the multimodal issue, color bleeding is another common issue in colorization, caused by inaccurate identification of semantic boundaries. To suppress such visual artifacts, most works [54,56,26,38,49,17,3] resort to generative adversarial networks (GANs) to encourage the generated chroma distribution to be indistinguishable from that of real-life color images. Currently, no special algorithms or modules for deep models have been proposed to improve this aspect, even though it considerably affects visual pleasantness.

To avoid modeling the color multimodality pixel-wise, we propose a new colorization framework, PalGAN, that predicts pixel colors in a coarse-to-fine paradigm. The key idea is to first predict the global palette probability (e.g., a palette histogram) from the grayscale image. This prediction does not collapse into a single specific colorization solution but represents a certain color distribution of the potential color variants. Then, the uncertainty about the per-pixel color assignment is modeled with a generative model in the GAN framework, conditioned on the grayscale image and the palette histogram. Therefore, multiple colorization results can be obtained by changing the palette histogram input.
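To make the notion of a palette histogram concrete, here is a minimal sketch of one way such a global color summary could be computed from a chroma map. The bin count, the ab value range, and the function name `palette_histogram` are assumptions made for illustration; this section does not specify how PalGAN parameterizes its palette.

```python
import numpy as np


def palette_histogram(ab, bins=16, ab_range=(-128.0, 128.0)):
    """Normalized 2D histogram over the ab plane for an (h, w, 2) chroma map.

    `bins` and `ab_range` are illustrative choices, not values from the paper.
    """
    a = ab[..., 0].ravel()
    b = ab[..., 1].ravel()
    hist, _, _ = np.histogram2d(a, b, bins=bins, range=[ab_range, ab_range])
    total = hist.sum()
    # Guard against an empty histogram (all values outside ab_range).
    return hist / total if total > 0 else hist


# Usage on a random chroma map (a stand-in for the ab channels of a real image).
rng = np.random.default_rng(0)
ab = rng.uniform(-100.0, 100.0, size=(32, 32, 2))
p = palette_histogram(ab)
print(p.shape, p.sum())
```

Because the histogram discards spatial layout, it describes which colors appear and in what proportion without fixing where they go, which is why it can also be taken from an unrelated reference image.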

To guarantee color assignment with semantic correctness and regional consistency, we study color affinities through a proposed chromatic attention module. It explicitly aligns color affinity with both semantics and low-level characteristics. Structurally, chromatic attention includes global interaction and local delineation. The former enables global context utilization for color inference by using semantic features in the attention mechanism. The latter preserves regional details by mapping the gray input to color through a local affine transformation, which is explicitly parameterized by the correlation between the gray input and the color feature. Experiments illustrate the effectiveness of our method. It achieves impressive visual results (Fig. 1) and quantitative superiority over state-of-the-art approaches on ImageNet [9] and COCO-Stuff [5]. Our method also works well with a user-specified palette histogram from a reference image, which may even have no content correlation with the input grayscale image. So, by nature, our method supports diverse coloring results with certain controllability. Our code and pretrained models are available at https://github.com/shepnerd/PalGAN.

Generally, our contributions are three-fold: i) We propose a new colorization framework, PalGAN, that decomposes colorization into palette estimation and pixel-wise assignment. It effectively circumvents the challenges of color ambiguity and regional homogeneity, and supports diverse and controllable colorization by nature. ii) We explore the less-touched color affinities and propose an effective module named chromatic attention. It considers both semantic and local detail correspondence and applies such correlations to color generation. It alleviates notable color bleeding effects. iii) Our method notably surpasses state-of-the-art methods in perceptual quality (FID [16] and LPIPS [55]). It is known that there exists a trade-off between perceptual and fidelity results in multiple low-level tasks. We argue that perceptual effects matter more than fidelity, as colorization aims to produce realistic colorized results rather than restore pixel-wise colors identical to the ground truth. Regardless, with proper tuning our method can achieve the best performance in both fidelity (PSNR and SSIM) and perceptual metrics.

2 Related Work

2.1 Colorization

User Guided Colorization. Some early works [7,8,18,29,39,47,21,6,34] in colorization turn to a reference image and transfer its color statistics to the given gray one. With the prevalence of deep learning, such color transfer is characterized in neural feature space to introduce semantic consistency [15]. These works perform decently when the reference and the input share similar semantics. Their applicability is limited by the reference retrieval quality, especially when handling complicated scenes.

Besides reference images, several systems require users to give sufficient local color hints (usually in scribble form) before colorizing inputs [27,37,52,21]. These approaches then propagate the given colors based on their local affinities. In addition, some attempts [3] explore other modalities, such as language, to specify which colors are used and how they are distributed.

Learning-based Colorization. This line of work [54,56,10,17,24,19] produces colorful images from gray inputs alone, learning a pixel-to-pixel mapping. Large-scale datasets are exploited in a self-supervised fashion, converting colorful pictures to gray ones for pair-wise training. Iizuka et al. [17] utilize image-level labels to associate predicted colors with global semantics, using a global-and-local convolutional neural network. Larsson et al. [24] and Zhang et al. [54] introduce pixel-level color distribution matching by classification, alleviating color imbalance and multimodal outputs. Besides, extra input hints are integrated into learning systems by simulation in [56], providing automatic and semi-automatic ways to colorize images. Recently, transformer architectures have been explored for this task, considering their expressiveness in non-local modeling [23].

Fig. 2: Our colorization system framework. (Diagram labels: EGD, forward flow, Conv, CA, PalNorm, Palette, Concatenation.)

Some works explicitly exploit additional priors from pretrained models for colorization. Su et al. [38] leverage instance-level annotations (e.g., instance bounding boxes and classes) by using an off-the-shelf detector. This lets the colorization model focus on color rendering without the need to recognize high-level semantics. In addition to such pretrained discriminative models, pretrained generative ones can also be exploited to improve the diversity of colorization. Wu et al. [49] incorporate a generative color prior from a pretrained BigGAN [4] to help a deep model produce diverse colored results. They design an extra encoder to project the given gray image into a latent code, then estimate colorful images from BigGAN. With such primary predictions, they further refine the color results using the intermediate features in BigGAN. Afifi et al. [1] propose employing a pretrained StyleGAN [20] for image recoloring, where color is controlled by histogram features.

2.2 GAN-based Image-to-image Translation

Image-to-image translation aims to learn the transformation between input and output images. Colorization can be formulated as an instance of this task and handled by Generative Adversarial Networks [11] (GAN) based approaches [19,41,35,30,44]. They employ an adversarial loss that learns to discriminate between real and generated images, and then minimize this loss by updating the generator so that the produced results look realistic [57,28,31,50,36,51,45,46,42].
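As a reminder of what such an adversarial objective looks like, the sketch below writes out the standard non-saturating GAN losses in NumPy. This is the textbook formulation, shown only for orientation; it is an assumption that it resembles the losses used by the cited methods, and it is not claimed to be the loss PalGAN itself uses. The inputs are discriminator output probabilities in (0, 1), and the function names are hypothetical.

```python
import numpy as np


def discriminator_loss(d_real, d_fake, eps=1e-8):
    """Binary cross-entropy for the discriminator.

    d_real: discriminator probabilities on real images (should approach 1).
    d_fake: discriminator probabilities on generated images (should approach 0).
    """
    real_term = np.log(d_real + eps).mean()
    fake_term = np.log(1.0 - d_fake + eps).mean()
    return float(-(real_term + fake_term))


def generator_loss(d_fake, eps=1e-8):
    """Non-saturating generator loss: small when the discriminator is fooled."""
    return float(-np.log(d_fake + eps).mean())


# A confident discriminator versus one that cannot tell real from generated.
confident = discriminator_loss(np.array([0.9, 0.8]), np.array([0.1, 0.2]))
confused = discriminator_loss(np.array([0.5, 0.5]), np.array([0.5, 0.5]))
print(confident, confused)
print(generator_loss(np.array([0.9])), generator_loss(np.array([0.1])))
```

The two losses pull in opposite directions on the same discriminator outputs, which is what drives generated colors toward the distribution of real color images.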

3 Method

PalGAN aims to colorize grayscale images. It formulates colorization as a palette prediction and assignment problem. Compared with directly learning the pixel-to-pixel mapping from gray to color, as adopted by most learning-based methods, this disentangled formulation not only brings empirical colorization improvements (Section 4), but also enables us to manipulate global color distributions by adjusting or regularizing palettes.

For PalGAN, the input is a grayscale image (i.e., the luminance channel of a color image) $L \in \mathbb{R}^{h \times w \times 1}$, and the output is the estimated chromatic map $C \in \mathbb{R}^{h \times w \times 2}$, which is used as the complementary ab channels together with $L$ in the CIE Lab color space. PalGAN consists of a palette generator $T_E$, a palette assignment generator $T_G$, and a color discriminator $D$. In inference, only $T_E$ and $T_G$ are employed. The whole framework is given in Fig. 2.
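The inference data flow described above can be sketched as follows, mainly to make the tensor shapes explicit. The function `colorize` and the two dummy generators are hypothetical stand-ins written for this illustration; they are not the trained networks or the API of the released code, and the 16 x 16 palette shape is an assumption.

```python
import numpy as np


def colorize(L, palette_generator, assignment_generator):
    """Two-stage inference: estimate a palette, then assign per-pixel chroma.

    L: (h, w, 1) luminance channel.
    Returns an (h, w, 3) Lab image built from L and the predicted ab map C.
    """
    palette = palette_generator(L)           # role of T_E: gray -> palette
    C = assignment_generator(L, palette)     # role of T_G: (gray, palette) -> ab
    if C.shape != L.shape[:2] + (2,):
        raise ValueError("chromatic map must have shape (h, w, 2)")
    return np.concatenate([L, C], axis=-1)   # stack L with ab along channels


# Hypothetical placeholders for the trained networks (shape checking only).
def dummy_palette_generator(L, bins=16):
    return np.full((bins, bins), 1.0 / (bins * bins))  # uniform palette


def dummy_assignment_generator(L, palette):
    h, w, _ = L.shape
    return np.zeros((h, w, 2))  # zero chroma, i.e. the image stays gray


L = np.linspace(0.0, 100.0, 8 * 8).reshape(8, 8, 1)
lab = colorize(L, dummy_palette_generator, dummy_assignment_generator)
print(lab.shape)
```

Swapping the palette passed to the assignment step, for example with one computed from a reference image, is the point in this flow where different colorizations of the same input would arise.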
