Exploring CNN-based models for image’s aesthetic score prediction
with using ensemble
Ying Dai
Iwate Prefectural University, Takizawa, Japan
dai@iwate-pu.ac.jp
Abstract:
In this paper, we propose a framework for constructing two types of automatic image aesthetics
assessment (IAA) models with different CNN architectures and for improving the performance of
image aesthetic score (AS) prediction by an ensemble. Moreover, the attention regions of the models
are extracted from the images to analyze their consistency with the subjects in the images. The
experimental results verify that the proposed method is effective for improving AS prediction.
Moreover, it is found that the AS classification models trained on the XiheAA dataset [25] seem to
learn the latent photography principles, although it cannot be said that they learn the aesthetic sense.
Keywords:
Aesthetic score prediction; CNN architecture; Ensemble; Photography composition
principle; Attention region
1. Introduction
Automatic image aesthetics assessment (IAA) can be applied to a variety of tasks, such as image
recommendation, image retrieval, photo management, and product design (cooking). In [1], the authors
give an experimental survey of research in this field. Besides discussing the main contributions of
the reviewed approaches, the survey systematically evaluates deep learning settings that are useful
for developing a robust deep model for aesthetic scoring. Early efforts in IAA focus on extracting
hand-crafted features designed according to known photographic principles, for example, the rule of
thirds, color harmony, and global image layout [2-5]. With the advance of convolutional neural
networks (CNNs), recent methods aim to map image aesthetics to different types of tasks using CNNs,
mainly high/low quality classification, aesthetic score prediction, and prediction of the score
distribution [6-11]. Although some achievements have been obtained, state-of-the-art research
incorporates attention mechanisms and layout-aware graph convolutional networks into IAA so as to
further improve the performance of aesthetic score prediction.
In [12], a multi-patch aggregation method for image aesthetic assessment that preserves the original
aspect ratio is proposed. The method achieves this goal with an attention-based mechanism that
adaptively adjusts the weight of each patch of the image. In [13], the authors propose a gated
peripheral–foveal convolutional neural network, which is a double-subnet neural network. The former
subnet aims to encode the holistic information and provide the attended regions. The latter aims to
extract fine-grained features on these key regions. Then, a gated information fusion network is
employed for the image aesthetic prediction. In [14], the authors propose a novel multimodal recurrent
attention CNN, which combines the visual information with the text information. This method employs
a recurrent attention network to focus on some key regions to extract visual features. In [29, 30],
the contributions of different regions at the object level to aesthetics are adaptively predicted.
However, our preliminary experiments indicate that feeding the weighted key regions to a CNN to train
the IAA model degrades the prediction performance, because the aesthetic assessment is influenced by
holistic information in the image. Weakening some regions therefore degrades the information
available for aesthetic assessment.
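To make the idea of attention-weighted patch aggregation concrete, the following is a minimal sketch, not the method of [12]. It assumes that each patch already has a predicted score and a scalar attention logit; the function names and the example values are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_patch_scores(patch_scores, attention_logits):
    """Weighted average of per-patch scores using softmax attention weights."""
    if not patch_scores or len(patch_scores) != len(attention_logits):
        raise ValueError("need one attention logit per patch score")
    weights = softmax(attention_logits)
    return sum(w * s for w, s in zip(weights, patch_scores))


# Hypothetical per-patch aesthetic scores for one image.
patch_scores = [4.0, 6.0, 8.0]

# Equal logits reduce the aggregation to a plain mean of the patch scores.
uniform = aggregate_patch_scores(patch_scores, [0.0, 0.0, 0.0])

# A large logit on the last patch lets that patch dominate the image score.
focused = aggregate_patch_scores(patch_scores, [0.0, 0.0, 5.0])
```

With equal logits the result is the plain mean of the patch scores. A strongly peaked attention pushes the image score toward a single patch, which illustrates how weighting key regions can suppress the holistic information discussed above.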
In [31], a hierarchical layout-aware graph convolutional network is used to capture layout
information for unified IAA. However, although there is a strong correlation between image layout
and perceived image quality, the image layout is neither a sufficient nor a necessary condition for
determining the image’s aesthetic quality. In fact, several typical failure cases presented in
Figure 5 of [31] confirm this statement. Some pictures appear to have good layouts that seem to meet
the rule of thirds and are predicted to have a high rating, but the ground truths (GT) of these
images are low ratings. Conversely, one picture seems not to meet the photography composition
principles and is assigned a low rating, but its GT is a high rating.
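As a small illustration of what "meeting the rule of thirds" can mean numerically, the sketch below measures how far a subject's center lies from the nearest of the four rule-of-thirds intersection points. The function name, the normalization by the image diagonal, and the example coordinates are assumptions made for illustration; they are not taken from [31].

```python
import math


def thirds_distance(cx, cy, width, height):
    """Distance from (cx, cy) to the nearest rule-of-thirds intersection,
    normalized by the image diagonal so that images of any size are comparable."""
    points = [(width * i / 3, height * j / 3) for i in (1, 2) for j in (1, 2)]
    diagonal = math.hypot(width, height)
    nearest = min(math.hypot(cx - px, cy - py) for px, py in points)
    return nearest / diagonal


# Hypothetical 300 x 300 image.
on_point = thirds_distance(100, 100, 300, 300)   # subject exactly on an intersection
centered = thirds_distance(150, 150, 300, 300)   # subject in the middle of the frame
```

A small distance only says that the layout follows the guideline. As the failure cases above show, it does not by itself determine the aesthetic rating.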
Generally, modeling IAA is a supervised learning problem. Most of the research uses the aesthetic
labels of images in public photo datasets, such as CUHK-PQ [1] or AVA [28], to train the model.
However, these aesthetic data are labeled almost entirely by amateurs. Whether the labels embody the
latent principles of aesthetics is not clear, so whether the IAA models trained on these datasets
are meaningful is also unclear. To make the labeled data embody the aesthetic principles of photos,
the author of [25] establishes a photo dataset called XiheAA, which is scored by an experienced
photographer, on the assumption that experienced photographers have a higher ability to reflect the
latent principles of aesthetics when they assess photos. These labeled images are used to train the
IAA model. However, the scores exhibit a highly skewed distribution. To address this imbalance issue
in aesthetic assessment, the author proposes in that paper a method of repetitive self-revised
learning (RSRL) that repeatedly retrains the CNN-based aesthetic score prediction model by transfer
learning, so as to improve the performance of the imbalanced classification caused by the
over-concentrated distribution of the scores. Moreover, in [32], the author focuses on CNN-based
RSRL to explore suitable metrics for establishing an optimal model of IAA. Further, the learned
feature maps of the model are used to define the first fixation perspective (FFP) and the assessment
interest region (AIR), so as to analyze whether aesthetic features are learned by the optimal model.
Although several experiments show the effectiveness of RSRL on imbalanced classification, how to
construct an aesthetic score prediction model that really embodies the aesthetic principles of IAA
is not addressed.
In photography, it is known that two important elements in assessing a photograph are the subject
and the holistic composition. One standard for a good photograph is that the image should achieve
attention-subject consistency. Inspired by this knowledge, we propose a framework for constructing
two types of IAA models with different CNN architectures, improving the performance of the image’s
AS prediction by an ensemble, and analyzing the consistency of the subject with the attention
regions of the models. The contributions of the paper are summarized as follows.
Besides fine-tuning the pretrained models, a new CNN architecture that could embody the
holistic composition of the image is designed. Based on this architecture, models with different
architectural parameters are trained on the XiheAA dataset [25] to predict the image’s aesthetic score.
The performances of the above models are evaluated, and an ensemble method that aggregates two
models is proposed to improve the performance of the AS prediction.
The feature maps of the models for the images are analyzed. It is found that, whether or not the
predictions are correct, the attention regions of the models are often consistent with the subjects
of the images and follow simple photography composition guidelines, such as visual balance and the
rule of thirds, when the images are predicted to have high aesthetic scores, and the opposite
holds otherwise. This indicates that the models trained on XiheAA seem to learn the latent
photography composition principles, but it cannot be said that they have learned the aesthetic sense.
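As a rough illustration of aggregating two models, the sketch below averages the class-probability outputs of two hypothetical AS classifiers and takes the most probable class. The weighted-averaging rule, the parameter `weight_a`, and the example probabilities are assumptions for illustration only; the ensemble rule actually used in this paper is not specified in this section.

```python
def ensemble_probs(probs_a, probs_b, weight_a=0.5):
    """Weighted average of two class-probability vectors, renormalized to sum to 1."""
    if len(probs_a) != len(probs_b) or not probs_a:
        raise ValueError("both models must score the same, non-empty set of classes")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    mixed = [weight_a * a + (1.0 - weight_a) * b for a, b in zip(probs_a, probs_b)]
    total = sum(mixed)
    return [m / total for m in mixed]


def predict_class(probs):
    """Index of the most probable score class."""
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical outputs of two models over three score classes (low, middle, high).
model_a = [0.10, 0.60, 0.30]   # e.g. a fine-tuned pretrained model
model_b = [0.05, 0.25, 0.70]   # e.g. a model with a different architecture

combined = ensemble_probs(model_a, model_b)
label_a = predict_class(model_a)
label_ensemble = predict_class(combined)
```

In this example the first model alone selects the middle class, while the averaged distribution selects the high class, which shows how a second model can change the final decision.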
2. Related work
Image Aesthetics Assessment (IAA)
Besides the research mentioned in the Introduction, other mainstream research on IAA is as follows.
In [15], the authors propose a unified algorithm based on pairwise comparison to solve three
problems of image aesthetic assessment: score regression, binary classification, and personalized
aesthetics. The model for personalized regression is trained on the FLICKERAES dataset [16].
However, the ground truth score was set to the mean of five workers’ scores. Accordingly, whether the
predicted score embodies inherently personal aesthetics is not clear.
On the other hand, some researchers aim to extract and analyze aesthetic features to find their
relation to the aesthetic assessment. In [17], the authors present an in-depth analysis of deep
models and the learned features for image aesthetic assessment from various viewpoints. In
particular, the analysis is based on transfer learning between image classification and aesthetics
classification. The authors find that the learned features for aesthetic classification differ
largely from those for image classification; i.e., the former account for color and overall harmony,
while the latter focus on texture and local information. However, whether this finding is universal
needs to be validated further.
In [18], besides extracting deep CNN features, five algorithms for handcrafted extracting aesthetic