or inpainting techniques remove a selected region from
the image and fill the space with new pixel values estimated
from the background [37]. Image enhancement exploits
a wide collection of local manipulations, such as sharpen-
ing, brightness adjustment, etc. Each of the broader cate-
gories can be further divided into more fine-grained forgery
types. For example, Gaussian blurring or JPEG compres-
sion may be applied to the tampered region before commit-
ting splicing or copy-move forgery. Recently, more general-
purpose image forgery localization methods have been pro-
posed, which can detect or localize more than one forgery
type, such as RGB-N Net [41], Manipulation Tracing Net-
work (ManTraNet) [37], Spatial Pyramid Attention Net-
work (SPAN) [22], etc.
These general image forgery detection or localization
methods usually rely on different forgery clues or footprints
left by the forgery operation, such as JPEG artifacts [27, 1],
edge inconsistency [32, 39], noise pattern [13, 38], cam-
era model [31], EXIF inconsistency [23], etc., to detect or
localize forgery. Table 1 of [37] summarizes the major existing
forgery localization methods and the forgery clues each
method focuses on. For example, [2] employs LSTM-based
patch comparison to focus on edge inconsistency between
the tampered patches and authentic patches. CAT-Net [26]
leverages DCT coefficients to focus on resampling clues.
However, training models to focus on specific forgery
clues has a major disadvantage: such a model can only
detect forgery if that particular footprint is prominent in
the forged image. This is problematic because, in real life,
different manipulation techniques leave behind a wide variety
of forgery clues, so focusing on any specific clue is
suboptimal. For example,
a method that relies on edge inconsistency will not perform
well on a forged image where the boundary between the
untampered and manipulated regions is smooth. Similarly,
a method that relies on resampling features will struggle
when the same JPEG compression has been applied several
times to both the untampered and manipulated regions.
Another major disadvantage of existing methods is that
they are trained with cross-entropy loss without additional
constraints. Recently, [40] noted that traditional
cross-entropy-based methods assume that all instances
within a category should be close in feature distribution,
ignoring the unique information of each sample. Thus,
cross-entropy loss encourages the model to extract similar
features for all samples of the same category. This may
be helpful for classifying or segmenting datasets such as
ImageNet or Cityscapes, where objects of the same category
should have similar features. In image forgery localization,
however, extracting similar features for all tampered regions
in the dataset is not optimal, because different manipulation
operations leave behind different forgery footprints.
Hence, without additional constraints, a common
cross-entropy-based framework is prone to overfitting on
specific forgery patterns [28], which hampers generalization.
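To make the critique concrete, the unconstrained objective discussed above can be sketched as a plain per-pixel binary cross-entropy over a predicted tamper map (a minimal numpy illustration of ours, not code from any cited method; the function name is an assumption):

```python
import numpy as np

def pixel_cross_entropy(probs, mask, eps=1e-12):
    """Plain per-pixel binary cross-entropy over a predicted tamper map.

    probs: (H, W) predicted probability that each pixel is tampered.
    mask:  (H, W) ground truth, 1 = tampered, 0 = authentic.
    Every tampered pixel in every training image is pushed toward the
    same target; no term accounts for per-image forgery clues.
    """
    probs = np.clip(probs, eps, 1 - eps)  # avoid log(0)
    return float(-(mask * np.log(probs)
                   + (1 - mask) * np.log(1 - probs)).mean())
```

Because the loss decomposes over pixels with a single shared target per class, it implicitly asks the encoder to produce one common notion of "tampered" features across the whole dataset.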
Taking these limitations into consideration, we propose
a novel forgery localization method named Contrastive
Forgery Localization Network (CFL-Net), based on the
recently proposed contrastive loss [24]. Our method relies
on the general assumption underlying forged-region
localization that feature statistics, e.g., color, intensity,
and noise, differ between the untampered and manipulated
regions [22], irrespective of the forgery type. In this
paper, we focus on leveraging this difference
in the feature space to aid in image forgery localization via
contrastive loss. Specifically, our model learns a mapping
into a feature space where the features of the untampered
and manipulated regions are well separated and dispersed
for each image. Thus, our method does not hinge on any
specific forgery clue. Moreover, because the contrastive
loss is computed per sample, our method treats the forgery
clues of each sample individually, which helps generalization.
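The per-image separation idea can be sketched with a small numpy example in the spirit of a supervised contrastive (InfoNCE-style) term computed within one image; the function name, temperature value, and exact form are our own assumptions for illustration, not the paper's loss:

```python
import numpy as np

def per_image_contrastive_loss(feats, mask, tau=0.1):
    """Contrast tampered vs. authentic pixel features within ONE image.

    feats: (N, D) pixel embeddings from a single image.
    mask:  (N,) region labels, 1 = tampered, 0 = authentic
           (each region needs at least 2 pixels).
    Same-region pixels are pulled together and the two regions pushed
    apart, without prescribing what "tampered" features must look like.
    """
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = feats @ feats.T / tau                    # scaled cosine similarity
    np.fill_diagonal(sim, -np.inf)                 # exclude self-pairs
    logits = sim - sim.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    pos = mask[:, None] == mask[None, :]           # same-region pairs
    np.fill_diagonal(pos, False)
    # mean log-probability of positive pairs per anchor, averaged over anchors
    per_anchor = np.where(pos, log_prob, 0.0).sum(axis=1) / pos.sum(axis=1)
    return float(-per_anchor.mean())
```

Since the positives and negatives are drawn from the same image, the loss only asks that the two regions of that image be distinguishable, matching the assumption that forgery clues differ from sample to sample.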
Our main contributions are summarized as follows:
• We propose a novel image forgery localization method
called CFL-Net. Our method leverages the difference
in feature distribution between the untampered and manipulated
regions of each image and does not focus on specific
forgery footprints. Hence, our method is better suited
to detecting real-life forgeries.
• We address the problem of training with cross-entropy
loss without any additional constraints for general-purpose
image forgery localization. We incorporate a contrastive
loss and tailor it specifically to this problem.
• We perform extensive experiments on benchmark ma-
nipulation datasets to show that our method out-
performs several existing image forgery localization
methods.
2. Related Works
2.1. Image Forgery Localization
Image forgery methods are concerned with forgery classification
or localization. Classification predicts whether an
image is forged or non-forged, whereas forgery localization
locates the forged region as well. The latter is a
segmentation task.
In the pre-deep-learning era, methods used hand-crafted
features such as local noise analysis [16, 10], CFA artifacts
[15], JPEG compression [5], etc. Recent works usually
use deep-learning-based methods in conjunction with these
forgery traces to localize forged regions. Bappy et al. [2]
exploit the edge-inconsistency trace using an LSTM to localize
forgery. The work was later improved in [3], where the