Guiding Users to Where to Give Color Hints for Efficient Interactive Sketch Colorization via Unsupervised Region Prioritization

Youngin Cho*1, Junsoo Lee*2, Soyoung Yang1, Juntae Kim3, Yeojeong Park1, Haneol Lee4, Mohammad Azam Khan1, Daesik Kim2, Jaegul Choo1

1KAIST AI  2NAVER WEBTOON AI  3Korea University  4UNIST  (*Equal contribution)

{choyi0521, sy yang, indigopyj, azamkhan, jchoo}@kaist.ac.kr
{junsoolee93, daesik.kim}@webtoonscorp.com
kjt7889@korea.ac.kr
haneollee@unist.ac.kr

arXiv:2210.14270v1 [cs.CV] 25 Oct 2022
Figure 1: Results of our proposed model on human-face and comics datasets. Each column of (a)-(c) indicates the order of interactions as the i-th priority. (a) visualizes the masked regions that our model suggests at the i-th step. Given a region as in (a), users select its representative color, and the region is filled with the selected color. (c) shows intermediate colorization results for the accumulated color hints shown in (b).
Abstract
Existing deep interactive colorization models have focused on ways to utilize various types of interactions, such as point-wise color hints, scribbles, or natural-language texts, as methods to reflect a user's intent at runtime. However, another approach, which actively informs the user of the most effective regions to give hints for sketch image colorization, has been under-explored. This paper proposes a novel model-guided deep interactive colorization framework that reduces the required amount of user interaction by prioritizing the regions in a colorization model. Our method, called GuidingPainter, prioritizes the regions where the model most needs a color hint, rather than relying solely on the user's manual decision on where to give a color hint. In our extensive experiments, we show that our approach outperforms existing interactive colorization methods in terms of conventional metrics, such as PSNR and FID, and reduces the required amount of interaction.
1. Introduction
The colorization task in computer vision has received considerable attention recently, since it can be widely applied in content creation. Most content creation starts with drawn or sketch images; these can be produced within a reasonable amount of time, but fully colorizing them is a labor-intensive task. For this reason, the ability to automatically colorize sketch images has significant potential value. However, automatic sketch image colorization is still challenging for the following reasons: (i) the information provided by an input sketch image is extremely limited compared to colored images or even gray-scale ones, and (ii) there can be multiple possible outcomes for a given sketch image without any conditional input, which tends to degrade model performance and introduce bias toward the dominant colors in the dataset.
To alleviate these issues, conditional image colorization methods take partial hints in addition to the input image and attempt to generate a realistic output image that reflects the context of the given hints. Several studies have leveraged user interactions as a form of user-given conditions to the model, assuming that users would provide a desired color value for a region as a point-wise color hint [40] or a scribble [28, 3]. Although these approaches have made remarkable progress, nontrivial limitations remain. First, existing approaches do not address the issue of estimating the semantic regions that indicate how far the user-given color hints should spread, so the colorization model tends to require many user hints to produce a desirable output. Second, for every interaction at test time, users are still expected to provide the local position of each color hint by pointing out the region of interest (RoI), which increases the user's effort and time commitment. Lastly, since existing approaches typically obtain color hints at randomized locations at training time, the discrepancy between the intervention mechanisms of the training and test phases needs to be addressed.
In this work, we propose a novel model-guided framework for the interactive colorization of a sketch image, called GuidingPainter. A key idea behind our work is to make the model actively seek out regions where color hints would be most useful, which can significantly improve the efficiency of the interactive colorization process. To this end, GuidingPainter consists of two modules: an active-guidance module and a colorization module. Although the colorization module works similarly to previous methods, our main contribution is the hint generation mechanism in the active-guidance module. The active-guidance module (Sections 3.2-3.3) (i) divides the input image into multiple semantic regions and (ii) ranks them in decreasing order of the estimated model gain when each region is colorized (Fig. 1(a)).
Since it is extremely expensive to obtain groundtruth segmentation labels, let alone their prioritization, we explore a simple yet effective approach that identifies meaningful regions in order of their priority without any manually annotated labels. In our active guidance mechanism (Section 3.3), GuidingPainter learns such regions by intentionally differentiating the frequency of usage for each channel obtained from the segmentation network. We also conduct a toy experiment (Section 4.5) to understand this mechanism and to verify the validity of our approach. We propose several loss terms, e.g., a smoothness loss and a total variance loss, to improve colorization quality in our framework (Section 3.5), and analyze their effectiveness both quantitatively and qualitatively (Section 4.6). Note that the only action required of users in our framework is to select one representative color for each region the model provides based on the estimated priorities (Fig. 1(b)). Afterwards, the colorization network (Section 3.4) generates a high-quality colorized output by taking the given sketch image and the color hints (Fig. 1(c)).
In summary, our contributions are threefold:

- We propose a novel model-guided deep image colorization framework, which prioritizes regions of a sketch image in order of the colorization model's interest.
- GuidingPainter can learn to discover meaningful regions for colorization and arrange them by priority using only the groundtruth colorized image, without additional manual supervision.
- We demonstrate that our framework can be applied to a variety of datasets by comparing it against previous interactive colorization approaches in terms of various metrics, including our proposed evaluation protocol.
2. Related Work
2.1. Deep Image Colorization
Existing deep image colorization methods, which utilize deep neural networks for colorization, can be divided into automatic and conditional approaches, depending on whether conditions are involved. Automatic image colorization models [39, 29, 36, 1] take a gray-scale or sketch image as input and generate a colorized image. CIC [39] proposed a fully automatic colorization model using convolutional neural networks (CNNs), and Su et al. [29] further improved the model by extracting the features of objects in the input image. Despite the substantial performance of automatic colorization models, a nontrivial amount of user intervention is still required in practice.

Conditional image colorization models attempt to resolve these limitations by taking reference images [16] or user interactions [40, 3, 38, 34, 37] as additional input. For example, Zhang et al. [40] allowed users to input point-wise color hints in real time, and AlacGAN [3] utilized stroke-based user hints by extracting semantic feature maps. Although these studies show that results improve with user hints, they generally require a large number of user interactions.
2.2. Interactive Image Generation
Beyond the colorization task, user interaction is utilized in numerous computer vision tasks, such as image generation and image segmentation. In image generation, research has actively explored various user interactions as additional input to GANs. A variety of GAN models employ image-related features from users to generate user-driven images [7, 17] and face images [26, 12, 31, 15, 30]. Several models generate and edit images via natural-language text [35, 23, 42, 2]. In image segmentation, to improve the details of segmentation results, recent models have utilized dots [27, 20] and texts [9] from users. Although we surveyed a wide scope of interactive deep learning models beyond sketch image colorization, to the best of our knowledge there is no work directly related to our approach. Therefore, a deep learning-based guidance system for the interactive process can be viewed as a promising but under-explored approach.
[Figure 2: schematic of the active-guidance module (segmentation network with ST-Gumbel discretization and hint generation) and the colorization module; panels (a)-(e) trace the hint generation steps.]
Figure 2: Hint generation process of our proposed GuidingPainter model. The segmentation network and the hint generation function render the colored hints $C$ and the condition mask $M$. Based on the guidance results, our colorization network colorizes the sketch image. The example illustrates the hint generation process in the training phase, where $N_h = 3$ and $N_c = 4$. First, the groundtruth image is copied $N_c$ times to consider each color segment at each interaction step. After element-wise multiplication with the guided regions, (a) averages the colors to decide a representative color for each guided region. To restrict the number of hints, we mask out the segments whose iteration step is larger than $N_h$; the masked results are shown in (b). Based on (a) and (b), our module generates the colored condition for each segment in (c). In (d), we combine them into one partially-colorized image $C$. (e) operates in the same manner as (d) and generates the condition mask $M$.
3. Proposed Approach
3.1. Problem Setting
The goal of the interactive colorization task is to train networks to generate a colored image $\hat{Y} \in \mathbb{R}^{3 \times H \times W}$ by taking as input a sketch image $X \in \mathbb{R}^{1 \times H \times W}$ along with user-provided partial hints $U$, where $H$ and $W$ indicate the height and width of the target image, respectively. The user-provided partial hints are defined as a pair $U = (C, M)$, where $C \in \mathbb{R}^{3 \times H \times W}$ is a sparse tensor with RGB values, and $M \in \{0,1\}^{1 \times H \times W}$ is a binary mask indicating the regions in which the color hints are provided. Our training framework consists of two networks and one function: a segmentation network $f$ (Section 3.2), a colorization network $g$ (Section 3.4), and a hint generation function $h$ (Section 3.3), all trained in an end-to-end manner.
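To make these shapes concrete, here is a minimal PyTorch sketch of the data layout (the tensor names and the added batch dimension are our own illustration, not the authors' released code):

```python
import torch

H, W = 256, 256                  # height and width of the target image
X = torch.rand(1, 1, H, W)       # sketch image, 1 channel
C = torch.zeros(1, 3, H, W)      # sparse tensor of RGB hint values
M = torch.zeros(1, 1, H, W)      # binary mask of hinted regions
U = torch.cat([C, M], dim=1)     # partial hints U = (C, M), 4 channels
```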
3.2. Segmentation Network
The purpose of the segmentation network $f(\cdot)$ is to divide the sketch input $X$ into several semantic regions, each of which is expected to be painted in a single color, i.e.,

$$S = f(X; \theta_f), \qquad (1)$$

where $S = (S_1, S_2, \ldots, S_{N_c}) \in \{0,1\}^{N_c \times H \times W}$, $S_i$ is the $i$-th guided region, and $N_c$ denotes the maximum number of hints. Specifically, $f$ is an encoder-decoder network with skip connections based on the U-Net [10] architecture, chosen to preserve the spatial details of given objects.

Since each guided region will be painted with a single color, we have to segment the output of the U-Net in a discrete form while retaining the advantages of end-to-end learning. To this end, after obtaining an output tensor $S_{\mathrm{logit}} \in \mathbb{R}^{N_c \times H \times W}$ from the U-Net, we discretize $S_{\mathrm{logit}}$ by applying the straight-through (ST) Gumbel estimator [11, 19] across the channel dimension to obtain $S$ as a differentiable approximation. The result satisfies $\sum_{i=1}^{N_c} S_i(j) = 1$, where $S_i(j)$ denotes the $i$-th scalar value of the $j$-th position vector, i.e., every pixel is contained in exactly one guided region: $S_i(j) = 1$ indicates that the $j$-th pixel belongs to the $i$-th guided region, and $S_i(j) = 0$ indicates that it does not.
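As a concrete illustration, this channel-wise discretization can be sketched with PyTorch's built-in `gumbel_softmax`, which implements the straight-through estimator when `hard=True` (a minimal sketch under our own naming; the paper does not specify its exact implementation):

```python
import torch
import torch.nn.functional as F

def discretize(s_logit: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Discretize U-Net logits of shape (B, Nc, H, W) into one-hot segments S.

    With hard=True, gumbel_softmax returns one-hot samples in the forward
    pass but backpropagates through the soft probabilities (ST-Gumbel), so
    every pixel is assigned to exactly one guided region while the whole
    operation stays differentiable.
    """
    return F.gumbel_softmax(s_logit, tau=tau, hard=True, dim=1)
```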
3.3. Hint Generation
The hint generation function $h(\cdot)$ is a non-parametric function that simulates $U$ based on $S$, a colored image $Y$, and the number of hints $N_h$, i.e.,

$$U = h(S, Y, N_h). \qquad (2)$$

To this end, we first randomly sample $N_h$ from a bounded distribution, similar to a geometric distribution, formulated as

$$G(N_h = i) = \begin{cases} (1-p)^i\, p & \text{if } i = 0, 1, \ldots, N_c - 1 \\ (1-p)^{N_c} & \text{if } i = N_c, \end{cases} \qquad (3)$$

where $p < 1$ is a hyperparameter indicating the probability that the user stops adding a hint on each trial. We set $N_c = 30$ and $p = 0.125$ in the following experiments.
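Sampling from this truncated geometric distribution amounts to repeated stop-or-continue trials; a short sketch (our own helper, with illustrative names):

```python
import torch

def sample_num_hints(nc: int = 30, p: float = 0.125) -> int:
    """Sample N_h from the bounded geometric distribution of Eq. (3).

    On each trial the user adds one more hint with probability (1 - p)
    and stops with probability p; the count is truncated at Nc, giving
    P(N_h = i) = (1 - p)^i * p for i < Nc and P(N_h = Nc) = (1 - p)^Nc.
    """
    n_h = 0
    while n_h < nc and torch.rand(()).item() >= p:
        n_h += 1
    return n_h
```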
Step 1: building masked segments $\tilde{S}$. Given $N_h$, we construct a mask vector $m \in \{0,1\}^{N_c}$ whose elements follow the rule

$$m_i = \begin{cases} 1 & \text{if } i \le N_h \\ 0 & \text{otherwise}, \end{cases} \qquad (4)$$

where $m_i$ denotes the $i$-th scalar value of the vector $m$. Afterwards, we obtain masked segments $\tilde{S} \in \mathbb{R}^{N_c \times H \times W}$ by element-wise multiplying the $i$-th element of $m$ with the $i$-th channel of $S$ as

$$\tilde{S}_i = m_i S_i, \qquad (5)$$

where $S_i, \tilde{S}_i \in \mathbb{R}^{1 \times H \times W}$ denote the $i$-th channels of $S$ and $\tilde{S}$, respectively.
Step 2: building hint maps $C$. The goal of this step is to find the representative color value of the activated region in each segment $\tilde{S}_i$, and then to fill the corresponding region with this color. To this end, we calculate a mean RGB color $\bar{c}_i \in \mathbb{R}^3$ as

$$\bar{c}_i = \begin{cases} \frac{1}{N_p} \sum_{j}^{HW} S_i(j) \odot Y(j) & \text{if } 1 \le N_p \\ 0 & \text{otherwise}, \end{cases} \qquad (6)$$

where $N_p = \sum_j S_i(j)$ indicates the number of activated pixels of the $i$-th segment, $\odot$ denotes element-wise multiplication, i.e., the Hadamard product, after each element of $S_i$ is broadcast to the RGB channels of $Y$, and both $S_i(j)$ and $Y(j)$ denote the $j$-th position vector of each map. Finally, we obtain hint maps $C \in \mathbb{R}^{3 \times H \times W}$ as

$$C = \sum_{i=1}^{N_c} \bar{c}_i \odot \tilde{S}_i, \qquad (7)$$

where $\bar{c}_i$ is repeated along the spatial axes to match the shape of $\tilde{S}_i \in \mathbb{R}^{1 \times H \times W}$, similar to Eq. (5), and $\tilde{S}_i$ is broadcast along the channel axis to match the shape of $\bar{c}_i \in \mathbb{R}^3$, as in Eq. (6). To indicate the region of the given hints, we simply obtain a condition mask $M \in \mathbb{R}^{1 \times H \times W}$ as

$$M = \sum_{i=1}^{N_c} \tilde{S}_i. \qquad (8)$$

Eventually, the output of this module is $U = C \oplus M \in \mathbb{R}^{4 \times H \times W}$, where $\oplus$ denotes channel-wise concatenation. Fig. 2 illustrates the overall scheme of the hint generation process. At inference time, we can create $U$ in a similar manner but without an explicit groundtruth image: a sketch image is all we need to produce $\tilde{S}$, and we obtain $C$ and $M$ by assigning a color to each $S_i$ for $i = 1, 2, \ldots, N_h$.
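Putting Eqs. (4)-(8) together, training-time hint generation reduces to a masked per-segment color average. A vectorized sketch (reusing the shapes above; this is our own rendering, not the authors' released code):

```python
import torch

def generate_hints(S: torch.Tensor, Y: torch.Tensor, n_h: int) -> torch.Tensor:
    """Simulate U from segments S (B, Nc, H, W) and a color image Y (B, 3, H, W).

    Eqs. (4)-(5): keep only the first n_h segments.
    Eq. (6):      mean RGB color of each segment's activated pixels.
    Eqs. (7)-(8): paint each kept segment with its mean color; union mask.
    """
    b, nc, h, w = S.shape
    m = (torch.arange(1, nc + 1, device=S.device) <= n_h).float()
    S_masked = S * m.view(1, nc, 1, 1)                       # Eq. (5)

    n_p = S.sum(dim=(2, 3)).clamp(min=1)                     # pixels per segment
    color_sum = torch.einsum('bnhw,bchw->bnc', S, Y)         # broadcast over RGB
    c_bar = color_sum / n_p.unsqueeze(-1)                    # Eq. (6); 0 if empty

    C = torch.einsum('bnc,bnhw->bchw', c_bar, S_masked)      # Eq. (7)
    M = S_masked.sum(dim=1, keepdim=True)                    # Eq. (8)
    return torch.cat([C, M], dim=1)                          # U in (B, 4, H, W)
```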
To understand how the hint generation module works, recall that $N_h$ is randomly sampled from the bounded geometric distribution $G$ (Eq. (3)) per mini-batch at training time. Since the probability that $i \le N_h$ is higher than the probability that $j \le N_h$ for $i < j$, $S_i$ is activated more frequently than $S_j$ while training the model. Hence, we can expect the following effects from this module: (i) $N_h$ determines how many segments, counted from the first channel of $S$, are used, as computed in Eqs. (4)-(5); this mechanism therefore encourages the segmentation network $f(\cdot)$ to place relatively important and uncertain regions at the forward indices of $S$. Section 4.5 shows that this module behaves as expected. (ii) We can provide richer information to the subsequent colorization network $g(\cdot)$ than previous approaches, without requiring additional labels at training time or extra interactions at test time, which helps generate better results even with fewer hints than the baselines (Section 4.3).
3.4. Colorization Network
The colorization network $g(\cdot)$ aims to generate a colored image $\hat{Y}$ by taking all the information obtained from the previous steps, i.e., a sketch image $X$, guided regions $S$, and partial hints $U$, as

$$\hat{Y} = g(X, S, U; \theta_g). \qquad (9)$$

The reason for using the segments as input is to provide information about the color relationships that the segmentation network infers. In order to capture the context of the input and to preserve the spatial information of the sketch image, our colorization network also adopts the U-Net architecture, the same as the segmentation network. We then apply a hyperbolic tangent activation function to normalize the output tensor of the U-Net.
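The three components then compose into one differentiable training step; schematically (a hedged sketch reusing the helpers above, with hypothetical `seg_net` and `color_net` standing in for the two U-Nets):

```python
# One training-time forward pass through GuidingPainter, reusing
# discretize, sample_num_hints, and generate_hints from the sketches above.
S = discretize(seg_net(X))           # Eq. (1): guided regions from sketch X
n_h = sample_num_hints()             # Eq. (3): random number of hints
U = generate_hints(S, Y, n_h)        # Eqs. (2), (4)-(8): simulated hints
Y_hat = torch.tanh(color_net(torch.cat([X, S, U], dim=1)))   # Eq. (9)
```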