Exploring CNN-based models for image’s aesthetic score prediction

with using ensemble

Ying Dai

Iwate Prefectural University, Takizawa, Japan

dai@iwate-pu.ac.jp

Abstract:

In this paper, we propose a framework for constructing two types of automatic image aesthetics assessment (IAA) models with different CNN architectures and for improving the performance of image aesthetic score (AS) prediction by ensembling them. Moreover, the attention regions of the models on the images are extracted to analyze their consistency with the subjects in the images. The experimental results verify that the proposed method is effective for improving AS prediction. Furthermore, it is found that the AS classification models trained on the XiheAA dataset [25] seem to learn the latent photography principles, although it cannot be said that they learn the aesthetic sense.

Keywords:

Aesthetic score prediction; CNN architecture; Ensemble; Photography composition principle; Attention region

1. Introduction

Automatic image aesthetics assessment (IAA) can be applied to a variety of tasks, such as image recommendation, image retrieval, photo management, and product design (cooking). In [1], the authors give an experimental survey of research in this field. Besides discussing the main contributions of the reviewed approaches, that survey systematically evaluates deep learning settings that are useful for developing a robust deep model for aesthetic scoring. Early efforts in IAA focus on extracting hand-crafted features designed according to known photographic principles, for example, the rule of thirds, color harmony, and global image layout [2-5]. With the advance of convolutional neural networks (CNNs), recent methods aim to map image aesthetics to different types of tasks using CNNs, mainly including high/low quality classification, aesthetic score prediction, and prediction of the score distribution [6-11]. Although some achievements have been obtained, state-of-the-art research further introduces the attention mechanism and the layout-aware graph convolutional network into IAA, so as to improve the performance of aesthetic score prediction.
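As an illustration of the kind of hand-crafted feature used by the early methods, the following is a minimal sketch of a rule-of-thirds measure. It is not taken from [2-5]; the function name `rule_of_thirds_distance`, the choice of the subject center as input, and the normalization by the image diagonal are assumptions made only for this example.

```python
import math

def rule_of_thirds_distance(cx, cy, width, height):
    """Normalized distance from a subject center (cx, cy) to the nearest
    rule-of-thirds intersection of a width x height image.

    Hypothetical feature for illustration: 0 means the subject sits exactly
    on an intersection; larger values mean it is farther away.
    """
    # The four intersections of the lines at 1/3 and 2/3 along each axis.
    points = [(width * i / 3, height * j / 3) for i in (1, 2) for j in (1, 2)]
    diagonal = math.hypot(width, height)
    nearest = min(math.hypot(cx - px, cy - py) for px, py in points)
    return nearest / diagonal

# Subject placed on an intersection versus subject placed at the image center.
on_point = rule_of_thirds_distance(200, 100, 600, 300)
centered = rule_of_thirds_distance(300, 150, 600, 300)
```

A feature of this kind captures only one composition guideline, which is one reason such designed features were later replaced by learned CNN features.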

In [12], a multi-patch aggregation method for image aesthetic assessment that preserves the original aspect ratio is proposed. This goal is achieved by resorting to an attention-based mechanism that adaptively adjusts the weight of each patch of the image. In [13], the authors propose a gated peripheral–foveal convolutional neural network. It is a double-subnet neural network: the former subnet aims to encode the holistic information and provide the attended regions, and the latter aims to extract fine-grained features on these key regions. Then, a gated information fusion network is employed for the image aesthetic prediction. In [14], the authors propose a novel multimodal recurrent attention CNN, which combines the visual information with the text information. This method employs a recurrent attention network to focus on some key regions to extract visual features. In [29, 30], the contributions of different regions at the object level to aesthetics are adaptively predicted. However, our preliminary experiments indicate that feeding the weighted key regions to a CNN to train the IAA model degrades the prediction performance, because aesthetic assessment is influenced by the holistic information in the image. Weakening some regions degrades the information available for aesthetic assessment.
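The general idea of attention-based patch weighting described above can be sketched as follows. This is not the method of [12]; it assumes, purely for illustration, that each patch already has a scalar aesthetic score and a scalar attention logit, and that the weights are obtained with a softmax. The names `softmax` and `aggregate_patch_scores` are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_patch_scores(patch_scores, attention_logits):
    """Attention-weighted average of per-patch scores.

    patch_scores and attention_logits are assumed to be lists of equal length,
    one entry per image patch.
    """
    weights = softmax(attention_logits)
    return sum(w * s for w, s in zip(weights, patch_scores))

# Equal logits give a plain average; a large logit lets one patch dominate.
uniform = aggregate_patch_scores([4.0, 6.0, 8.0], [0.0, 0.0, 0.0])
focused = aggregate_patch_scores([4.0, 6.0, 8.0], [0.0, 0.0, 5.0])
```

The second call shows the effect discussed in the text: once the weights concentrate on one region, the other patches contribute almost nothing to the result, so holistic information is largely discarded.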

In [31], a hierarchical layout-aware graph convolutional network is introduced to capture layout information for unified IAA. However, although there is a strong correlation between image layout and perceived image quality, the image layout is neither a sufficient condition nor a necessary condition for determining the image’s aesthetic quality. In fact, several typical failure cases presented in Figure 5 of [31] confirm this statement. Some pictures appear to have good layouts that seem to meet the rule of thirds and are predicted to have a high rating, whereas the ground truths (GT) of these images are low ratings. Conversely, one picture seems not to meet the photography composition principles and is assigned a low rating, whereas its GT is a high rating.

Generally, modeling IAA is a supervised learning task. Most research utilizes the aesthetic labels of the images in public photo datasets, such as CUHK-PQ [1] or AVA [28], to train the model. However, these aesthetic data are mostly labeled by amateurs. Whether the labeled data embody the latent principles of aesthetics is not clear, so whether the IAA models trained on these datasets are meaningful is also unclear. To make the labeled data embody the aesthetic principles of photography, the author of [25] establishes a photo dataset called XiheAA, which is scored by an experienced photographer, on the assumption that experienced photographers have a higher ability to reflect the latent principles of aesthetics when they assess photos. These labeled images are used to train the IAA model. However, the IAA data exhibit a highly skewed score distribution. To address this imbalance issue, the author of [25] proposes a method of repetitive self-revised learning (RSRL) that retrains the CNN-based aesthetic score prediction model repetitively by transfer learning, so as to improve the performance of the imbalanced classification caused by the overconcentrated distribution of the scores. Moreover, in [32], the author focuses on CNN-based RSRL to explore suitable metrics for establishing an optimal model of IAA. Further, the learned feature maps of the model are utilized to define the first fixation perspective (FFP) and the assessment interest region (AIR), so as to analyze whether the aesthetic features are learned by the optimal model. Although several experiments show the effectiveness of RSRL on imbalanced classification, how to construct an aesthetic score prediction model that really embodies the aesthetic principles of IAA is not addressed.

In photography, it is known that two important elements in assessing a photograph are the subject and the holistic composition. One standard for a good photograph is that the image should achieve attention-subject consistency. Inspired by this knowledge, we propose a framework for constructing two types of IAA models with different CNN architectures, improving the performance of the image’s AS prediction by the ensemble, and analyzing the consistency of the subject with the attention regions of the models. The contributions of the paper are summarized as follows.



- Besides fine-tuning the pretrained models, a new CNN architecture that could embody the holistic composition of the image is designed. Based on this architecture, models with different architectural parameters are trained on the XiheAA dataset [25] to predict the image’s aesthetic score.

- The performances of the above models are evaluated, and an ensemble method that aggregates two models is proposed to improve the performance of the AS prediction.

- The feature maps of the models for the images are analyzed. It is found that, for images predicted to have high aesthetic scores, the attention regions of the models are often consistent with the subjects of the images and follow simple photography composition guidelines, such as visual balance and the rule of thirds; for images predicted to have low aesthetic scores, the opposite holds. This is the case whether or not the predictions are correct. It indicates that the models trained on XiheAA seem to learn the latent photography composition principles, but it cannot be said that they learn the aesthetic sense.
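To make the idea of aggregating two models concrete, the following is a minimal sketch of one common ensembling scheme. The aggregation rule actually used in this paper is not described in this section, so the sketch assumes a weighted average of the two models' class-probability vectors. The function names, the mixing weight `weight_a`, the three score classes, and the representative scores in `class_scores` are all hypothetical.

```python
def ensemble_probs(probs_a, probs_b, weight_a=0.5):
    """Weighted average of two class-probability vectors, renormalized.

    probs_a and probs_b are assumed to be the softmax outputs of two AS
    classification models over the same ordered score classes.
    """
    if len(probs_a) != len(probs_b):
        raise ValueError("both models must predict the same score classes")
    mixed = [weight_a * pa + (1.0 - weight_a) * pb
             for pa, pb in zip(probs_a, probs_b)]
    total = sum(mixed)
    return [p / total for p in mixed]

def predicted_class(probs):
    """Index of the most probable score class."""
    return max(range(len(probs)), key=probs.__getitem__)

def expected_score(probs, class_scores):
    """Probability-weighted mean of the representative class scores."""
    return sum(p * s for p, s in zip(probs, class_scores))

# Hypothetical outputs of two models over low / middle / high score classes.
class_scores = [2.0, 5.0, 8.0]
mixed = ensemble_probs([0.6, 0.3, 0.1], [0.2, 0.3, 0.5])
label = predicted_class(mixed)
score = expected_score(mixed, class_scores)
```

Averaging probabilities rather than hard labels lets a confident model outweigh an uncertain one, which is one reason this scheme is a frequent baseline for combining classifiers.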

2. Related work

Image Aesthetics Assessment (IAA)

Besides the research mentioned in the Introduction, other mainstream research on IAA is as follows.

In [15], the authors propose a unified algorithm, based on pairwise comparison, to solve three problems of image aesthetic assessment: score regression, binary classification, and personalized aesthetics. The model for personalized regression is trained on the FLICKERAES dataset [16]. However, the ground truth score was set to the mean of five workers’ scores. Accordingly, whether the predicted score embodies inherently personal aesthetics is not clear.

On the other hand, some researchers aim at extracting and analyzing aesthetic features to find their relation to aesthetic assessment. In [17], the authors present an in-depth analysis of the deep models and the learned features for image aesthetic assessment from various viewpoints. In particular, the analysis is based on transfer learning between image classification and aesthetic classification. The authors find that the learned features for aesthetic classification are largely different from those for image classification; i.e., the former account for color and overall harmony, while the latter focus on texture and local information. However, whether this finding is universal needs to be validated further.

In [18], besides extracting deep CNN features, five algorithms for handcrafted extracting aesthetic
