
2 Complexity Factors
Figure 2: Complexity in appearance and geometry for objects and scenes (top row: Object #1, Object #2, Object #3; bottom row: Scene #1, Scene #2, Scene #3).
As illustrated in the top row of Figure 2, an individual object,
represented by a set of color pixels painted within a mask,
can vary significantly given different types of appearance and
geometric shape. A specific scene, represented by a set of
objects placed within an image, can also differ vastly given
different types of relative appearance and geometric layout
between objects, as illustrated in the bottom row. Unarguably,
such variation and complexity of appearance and geometry at
both the object level and the scene level directly affect humans' ability
to precisely separate all objects. Naturally, the performance
of unsupervised segmentation models is also expected to be
influenced by this variation. In this regard, we carefully define
the following two groups of factors to quantitatively describe
the complexity of different datasets.
2.1 Object-level Complexity Factors
For a specific object, all of its information can be described by its appearance and geometry. We therefore
define the following two factors to measure the complexity of appearance and geometry, respectively.
Notably, both factors are invariant to the object scale.
• Object Color Gradient: This factor measures how frequently the appearance changes
within the object mask. In particular, given the RGB image and mask of an object, we first
convert RGB into grayscale and then apply a Sobel filter [51] to compute the horizontal
and vertical gradients for each pixel within the mask. The final gradient value is obtained by averaging over
all object pixels. Note that the object boundary pixels are removed to avoid interference from the
background. Numerically, the higher this factor is, the more complex the texture and/or lighting effects
of the object are, and therefore the harder it is likely to be to segment.
• Object Shape Concavity: This factor is designed to evaluate how irregular the object boundary
is. Particularly, given an object (binary) mask, denoted as $M_{obj} \in \mathbb{R}^{H \times W}$, we firstly find the
smallest convex polygon mask ($M_{cvx} \in \mathbb{R}^{H \times W}$) that surrounds the object mask using an existing
algorithm [19], and then the object shape concavity value is computed as: $1 - \sum M_{obj} / \sum M_{cvx}$.
Clearly, the higher this factor is, the more irregular the object shape is, and the harder the object is to segment.
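The two object-level factors can be sketched in Python with NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The grayscale weights, the unnormalized 3×3 Sobel kernels, the combination of horizontal and vertical responses into a gradient magnitude, the removal of boundary pixels by a one-pixel 4-neighbour erosion, and the construction of the convex mask from pixel centers with a monotone-chain hull (rather than the algorithm of [19]) are all assumptions. All function names are hypothetical.

```python
import numpy as np


def sobel_magnitude(gray):
    """Gradient magnitude from unnormalized 3x3 Sobel kernels (assumed form)."""
    p = np.pad(gray.astype(np.float64), 1, mode="edge")
    # Horizontal response: right column minus left column, weights 1-2-1.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (
        p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    # Vertical response: bottom row minus top row, weights 1-2-1.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (
        p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return np.hypot(gx, gy)


def erode(mask):
    """Keep only pixels whose four direct neighbours are also in the mask."""
    p = np.pad(mask, 1, mode="constant", constant_values=False)
    return p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]


def object_color_gradient(rgb, mask):
    """Mean Sobel gradient magnitude over the non-boundary object pixels."""
    # Assumed grayscale conversion (ITU-R BT.601 luma weights).
    gray = rgb[..., :3].astype(np.float64) @ np.array([0.299, 0.587, 0.114])
    interior = erode(mask.astype(bool))
    if not interior.any():
        return 0.0  # assumed fallback for objects with no interior pixels
    return float(sobel_magnitude(gray)[interior].mean())


def convex_hull_mask(mask):
    """Pixels whose centers lie inside the convex hull of the mask's pixel centers."""
    ys, xs = np.nonzero(mask)
    pts = sorted(set(zip(xs.tolist(), ys.tolist())))

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(points):
        chain = []
        for q in points:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], q) <= 0:
                chain.pop()
            chain.append(q)
        return chain[:-1]

    hull = half_hull(pts) + half_hull(pts[::-1])  # monotone chain
    yy, xx = np.mgrid[:mask.shape[0], :mask.shape[1]]
    # The bounding-box clip handles degenerate (collinear) hulls.
    inside = (xx >= xs.min()) & (xx <= xs.max()) & (yy >= ys.min()) & (yy <= ys.max())
    for (x0, y0), (x1, y1) in zip(hull, hull[1:] + hull[:1]):
        inside &= (x1 - x0) * (yy - y0) - (y1 - y0) * (xx - x0) >= 0
    return inside


def shape_concavity(mask):
    """1 - sum(M_obj) / sum(M_cvx) for a non-empty binary mask."""
    m = mask.astype(bool)
    if not m.any():
        raise ValueError("mask must contain at least one object pixel")
    return float(1.0 - m.sum() / convex_hull_mask(m).sum())
```

Under these assumptions, a uniformly colored object has a color gradient of zero and a filled rectangle has a concavity of zero. Values from a different Sobel normalization or hull rasterization would differ by a constant factor or by boundary pixels, respectively.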
2.2 Scene-level Complexity Factors
For a specific image, in addition to the object-level complexity, the spatial and appearance relationships
between all objects can also incur extra difficulty for segmentation. We define the following two
factors to quantify the complexity of relative appearance and geometry between objects in an image.
• Inter-object Color Similarity: This factor assesses the appearance similarity between all
objects in the same image. Specifically, we first calculate the average color of each object, and
then compute the pair-wise Euclidean distances between object colors, obtaining a $K \times K$ matrix where $K$
is the number of objects. The average color distance is calculated by averaging the matrix
excluding diagonal entries, and the final inter-object color similarity is computed as:
$1 - \text{average color distance} / (255 \times \sqrt{3})$. Intuitively, the higher this factor is, the more similar all objects appear
to be, the less distinctive each object is, and the harder it is to separate the objects.
• Inter-object Shape Variation: This factor measures the relative geometric diversity between
all objects in the image. We first calculate the diagonal length of the bounding box of each object,
and then compute the pair-wise absolute differences between all object diagonal lengths, obtaining a $K \times K$
matrix. The final inter-object shape variation is the average of the matrix excluding diagonal
entries. The higher this factor is, the more diverse and imbalanced the object sizes within an image are,
and therefore segmenting both gigantic and tiny objects is likely more challenging.
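The two scene-level factors can be sketched in Python with NumPy in a similar way. This is a minimal sketch under stated assumptions rather than the authors' code: the function names are hypothetical, the bounding-box side lengths are measured inclusively in pixels (max minus min plus one), the diagonal is left in raw pixel units, and an image is assumed to contain at least two objects so that off-diagonal entries exist.

```python
import numpy as np


def _offdiag_mean(matrix):
    """Average of a K x K matrix excluding its diagonal entries."""
    k = matrix.shape[0]
    if k < 2:
        raise ValueError("at least two objects are required")
    return float(matrix[~np.eye(k, dtype=bool)].mean())


def inter_object_color_similarity(rgb, masks):
    """1 - (average pair-wise color distance) / (255 * sqrt(3))."""
    # Average RGB color of each object, shape (K, 3).
    colors = np.stack([rgb[m.astype(bool)].mean(axis=0) for m in masks])
    # Pair-wise Euclidean distances between object colors, shape (K, K).
    dist = np.linalg.norm(colors[:, None, :] - colors[None, :, :], axis=-1)
    return 1.0 - _offdiag_mean(dist) / (255.0 * np.sqrt(3.0))


def bbox_diagonal(mask):
    """Diagonal length of the object's axis-aligned bounding box, in pixels."""
    ys, xs = np.nonzero(mask)
    height = ys.max() - ys.min() + 1  # inclusive extent (assumption)
    width = xs.max() - xs.min() + 1
    return float(np.hypot(height, width))


def inter_object_shape_variation(masks):
    """Average pair-wise absolute difference of bounding-box diagonals."""
    diag = np.array([bbox_diagonal(m) for m in masks])
    diff = np.abs(diag[:, None] - diag[None, :])
    return _offdiag_mean(diff)
```

The normalizer $255 \times \sqrt{3}$ is the largest possible Euclidean distance between two 8-bit RGB colors (black versus white), so the similarity lies in $[0, 1]$: a pure black object next to a pure white one gives 0, and identically colored objects give 1.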
By capturing the appearance and geometry at both the object and scene levels, the four factors are
designed to quantify the complexity of objects and images. For illustration, Figure 3 shows sample
images for the four factors at different values. The higher the values, the more complex the objects