various methodologies, with solutions ranging from canonical steps approximated
by invertible functions [2] and recovering RAW data in the CIE-XYZ space from
sRGB images [1] to a novel modular and differentiable ISP model with
interpretable parameters that is capable of end-to-end learning [3], among many
other approaches [1,2,3,9,12,13]. Given the inherent advantages of RAW data,
the task of reconstructing RAW data from RGB images has become increasingly
relevant, especially since RAW data is often unavailable due to factors such
as memory constraints or data storage processes that discard the RAW.
However, RAW data reconstruction remains a novel area of research with
complexities and limitations that are yet to be fully addressed. For instance,
as noted by Conde et al. [3], approximations of real-world ISPs using inverse
functions degrade in performance when a large portion of the RGB image is
close to overexposure. Overexposure mask fusion, a novel component of our
pipeline, specifically addresses this issue by mapping overexposed and
non-overexposed pixels separately and fusing the two results together using
an overexposure mask.
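
The fusion step described above can be illustrated with a minimal NumPy sketch. It is not the exact formulation used in our pipeline: the soft-threshold form of the mask, the threshold value of 0.9, and the names `overexposure_mask` and `fuse` are assumptions made for illustration, and the two constant arrays stand in for the outputs of the networks that map non-overexposed and overexposed pixels.

```python
import numpy as np


def overexposure_mask(rgb, threshold=0.9):
    """Soft overexposure mask for an RGB image with values in [0, 1].

    Assumed form: 0 where the brightest channel is below `threshold`,
    rising linearly to 1 at full saturation. Returns shape (H, W, 1).
    """
    brightest = rgb.max(axis=-1, keepdims=True)
    return np.clip((brightest - threshold) / (1.0 - threshold), 0.0, 1.0)


def fuse(raw_normal, raw_over, mask):
    """Blend two RAW estimates per pixel according to the mask."""
    return mask * raw_over + (1.0 - mask) * raw_normal


# Toy input: a 2x2 RGB image with a single saturated pixel.
rgb = np.zeros((2, 2, 3))
rgb[0, 0] = 1.0

# Placeholders for the two network outputs (hypothetical constant values).
raw_normal = np.full((2, 2, 3), 0.2)  # estimate for well-exposed pixels
raw_over = np.full((2, 2, 3), 0.8)    # estimate for overexposed pixels

mask = overexposure_mask(rgb)
fused = fuse(raw_normal, raw_over, mask)
```

In this toy case the saturated pixel takes its value from `raw_over` and the remaining pixels from `raw_normal`. A soft mask is used here rather than a binary one so that the transition between the two estimates is gradual.
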
Among the various AIM challenges addressing different research problems [6],
the AIM Reversed ISP Challenge [4] tasked competing teams with reconstructing
RAW data from RGB images. Our methodology was among the top solutions in this
challenge and was therefore evaluated as a state-of-the-art solution to this
novel inverse problem. By generating demosaiced RAW from the ground-truth
Bayer data using Demosaic Net [5] and mapping from RGB to demosaiced RAW, we
enable the use of perceptual losses. With our novel overexposure mask fusion
methodology, our pipeline addresses the issue of overexposed pixels mentioned
by Conde et al. [3]. Most notably, the pipeline yields a significant
improvement in fidelity measures even though every neural network within it
is a U-Net [10]. Furthermore, our methodology can incorporate other proposed
state-of-the-art solutions involving end-to-end learning after slight
modifications that let them map from RGB images to demosaiced RAW images. For
instance, the model proposed by Conde et al. [3] can be integrated with our
refinement pipeline through small modifications such as removing the final
mosaic step and generating demosaiced RAW ground-truth images for training so
that a perceptual loss can be used. To the best of our knowledge, we propose
the first generalizable, multi-step refinement process that enhances the
performance of other reversed ISPs while addressing the issue of overexposure.
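
To make the role of the mosaic step concrete, the following sketch shows one way a three-channel demosaiced RAW image can be re-sampled into a packed four-channel Bayer representation. The RGGB layout, the packed (H/2, W/2, 4) output format, and the function name `mosaic_rggb` are assumptions made for illustration; the actual pattern depends on the sensor. A model that ends in such a step can be adapted by omitting it, so that its output remains a demosaiced RAW image to which a perceptual loss can be applied.

```python
import numpy as np


def mosaic_rggb(demosaiced):
    """Sample an (H, W, 3) demosaiced RAW image on an assumed RGGB grid.

    Returns an (H/2, W/2, 4) array whose channels are the R, G, G, B
    sites of each 2x2 Bayer cell. H and W are assumed to be even.
    """
    r = demosaiced[0::2, 0::2, 0]   # top-left site of each cell: red
    g1 = demosaiced[0::2, 1::2, 1]  # top-right site: green
    g2 = demosaiced[1::2, 0::2, 1]  # bottom-left site: green
    b = demosaiced[1::2, 1::2, 2]   # bottom-right site: blue
    return np.stack([r, g1, g2, b], axis=-1)


# Toy 4x4 demosaiced image with distinct values at every location.
demo = np.arange(4 * 4 * 3, dtype=np.float32).reshape(4, 4, 3)
packed = mosaic_rggb(demo)
```

Each output channel keeps only one colour sample per 2x2 cell, which is why training against demosaiced ground truth requires this step to be removed rather than inverted.
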
2 Related Works
Works such as [7,11,12] have addressed the task of mapping from RAW data to
RGB images, modeling the camera ISP. Schwartz et al. [11] propose a full
end-to-end deep learning model of the ISP, which has been demonstrated to be
capable of generating visually compelling RGB images from RAW data. Ignatov et
al. [7] propose another end-to-end deep learning solution based on a novel
PyNET CNN architecture, and Xing et al. [12] designed an invertible ISP that is
capable of generating visually pleasing RGB images from RAW data as well as