Automatic Neural Network Hyperparameter Optimization for
Extrapolation: Lessons Learned from Visible and Near-Infrared
Spectroscopy of Mango Fruit
Matthew Dirks (corresponding author, mcdirks@cs.ubc.ca), David Poole (poole@cs.ubc.ca)
University of British Columbia, 2366 Main Mall, Vancouver, British Columbia, Canada
Abstract
Neural networks are configured by choosing an architecture and hyperparameter values; doing so often
involves expert intuition and hand-tuning to find a configuration that extrapolates well without overfitting.
This paper considers automatic methods for configuring a neural network that extrapolates in time for the
domain of visible and near-infrared (VNIR) spectroscopy. In particular, we study the effect of (a) selecting
samples for validating configurations and (b) using ensembles.
Most of the time, models are built from the past to predict the future. To encourage the neural network model to extrapolate, we consider validating model configurations on samples that are shifted in time in the same way the test set is. We experiment with three validation set choices: (1) a random sample of 1/3 of the non-test data (the technique used in previous work), (2) the latest 1/3 of samples (sorted by time), and (3) a semantically meaningful subset of the data. Hyperparameter optimization relies on the validation set to estimate test-set error, but neural network variance obfuscates the true error value. Ensemble averaging, computing the average across many neural networks, can reduce the variance of prediction errors.
To test these methods, we conduct a comprehensive study of a held-out 2018 harvest season of mango fruit given VNIR spectra from 3 prior years. We find that ensembling reduces the variance and improves the accuracy of the state-of-the-art model. Furthermore, hyperparameter optimization experiments, with and without ensemble averaging and with each validation set choice, show that when ensembling is combined with using the latest 1/3 of samples as the validation set, a neural network configuration is found automatically that is on par with the state-of-the-art.
Keywords: Extrapolation, Convolutional Neural Network, Ensemble Averaging, Hyperparameter
Optimization, Automated Machine Learning
1. Introduction
This paper considers how to automatically configure neural network hyperparameters such that the resulting network extrapolates in time for visible and near-infrared (VNIR) spectroscopy. Hyperparameter optimization (HPO) is a significant undertaking. Neural networks are configured by choosing an architecture (such as the number of layers) and hyperparameter values (such as the learning rate), all of which may be optimized at once during HPO. Even when using state-of-the-art Bayesian optimization software, HPO still involves many decisions and intuitions (some of which are explained in a recent tutorial [1]). This paper is about further streamlining the process of hyperparameter optimization so that it can be done automatically, without overfitting, and in a manner that mimics an expertly-tuned model.
A dataset is partitioned into test and non-test samples. Given the non-test samples, the goal is to build a predictor that works best on the test set. The test set is only used to evaluate final models. If a neural network is trained on all non-test data, it will overfit. To avoid overfitting, the non-test data is partitioned into calibration and validation sets (in the machine learning literature, these are often called training and development sets). The calibration set is used to train the model and the validation set is used as a proxy for the test set.
Neural network hyperparameters are chosen to minimize prediction error on the validation set (in this context, it is sometimes called a tuning set). HPO may overfit the validation set, and the best method to combat this is an open area of research [2]. To avoid overfitting, a combination of expert intuition and hand-tuning is often used. One approach, recently studied for chemometrics [3], is to find stable optima, where the RMSE on the validation set (as a function of the hyperparameters) lies in a wide basin that changes little under slight perturbations, rather than a narrow one [2]. We take a complementary approach: encourage the model to extrapolate through the choice of validation samples used in hyperparameter optimization.
Extrapolating in time is often difficult because the future differs from the past. The mango dataset of Anderson et al. [4, 5] is a good example: given spectra from 3 years, the goal is to predict dry matter (DM) content in the next year. Thus, we want a neural network configuration that does not overfit the past but extrapolates well to the future. In previous work, the validation set is 1/3 of the non-test data, sampled randomly. We test two alternatives intended to avoid overfitting in HPO and encourage the neural network model to extrapolate: first, using the latest 1/3 of samples (sorted by time); second, using a semantically meaningful subset [6], specifically the latest harvest season (2017). A sketch of the three resulting splits follows.
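To make these choices concrete, here is a minimal sketch of the three splits, assuming the non-test samples sit in a pandas DataFrame with "date" and "season" columns; these column names are illustrative, not taken from the released dataset.

```python
# Minimal sketch of the three validation-set choices (illustrative columns).
import pandas as pd

def split_validation(df: pd.DataFrame, choice: str, seed: int = 0):
    """Split non-test data into (calibration, validation) sets."""
    n_val = len(df) // 3
    if choice == "random":      # (1) random 1/3, as in previous work
        val = df.sample(n=n_val, random_state=seed)
    elif choice == "latest":    # (2) the latest 1/3, sorted by time
        val = df.sort_values("date").tail(n_val)
    elif choice == "season":    # (3) semantically meaningful subset:
        val = df[df["season"] == 2017]  # the latest harvest season
    else:
        raise ValueError(f"unknown choice: {choice}")
    cal = df.drop(val.index)
    return cal, val
```

The latter two choices make the validation set resemble the test set's forward shift in time, which is the property the HPO objective should reward.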
Due to the stochastic nature of training algorithms, neural networks have different weights and different errors each time they are trained. We report the distributions of RMSE scores in order to fairly evaluate each method. The variance of errors is also problematic for HPO because the prediction error on the validation set is an estimate of how well the model will perform on the test set and in deployment; a poor estimate leads to a suboptimal neural network configuration.
We investigate using ensembles to reduce the variance of the validation-set error during hyperparameter optimization. Ensembles (of many kinds) have been shown to improve accuracy, reduce variance, and improve robustness to domain shift [7, 8, 9]. Specifically, we obtain an ensemble by randomly re-initializing a neural network and re-training it a number of times [7, 10]; this reduces the portion of the variance that is due to random initialization. A sketch of this averaging follows.
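As a concrete illustration, the following sketch averages predictions over several randomly re-initialized and re-trained networks. The regressor is a small scikit-learn stand-in rather than the paper's CNN, since the averaging logic itself is model-agnostic.

```python
# Minimal sketch of ensemble averaging by random re-initialization.
import numpy as np
from sklearn.neural_network import MLPRegressor

def ensemble_rmse(X_cal, y_cal, X_val, y_val, n_members=10):
    """Train n_members networks from different random initializations,
    average their predictions, and score the ensemble on the validation set."""
    member_preds = []
    for seed in range(n_members):
        net = MLPRegressor(hidden_layer_sizes=(36, 18, 12), max_iter=1000,
                           random_state=seed)  # fresh random init per member
        net.fit(X_cal, y_cal)
        member_preds.append(net.predict(X_val))
    avg_pred = np.mean(member_preds, axis=0)  # averaging cancels init noise
    return np.sqrt(np.mean((avg_pred - y_val) ** 2))
```

Because each member differs only in its random initialization (and mini-batch order), averaging their predictions yields a lower-variance estimate of validation error for the HPO loop to consume.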
To test these methods, we perform a comprehensive study of a held-out 2018 harvest season of mango fruit given VNIR spectra from 3 prior years [4]. We conduct hyperparameter optimization for each choice of validation set and compare HPO with and without ensemble averaging. The results of this study shed light on reproducible and automated practices for configuring and training neural networks for spectroscopy; these results can inform practitioners what steps to take in building their own models to make predictions for future samples.
2. Methodologies
2.1. Data set
Visible and near-infrared (VNIR) spectra of mango fruit from four harvest seasons (2015, 2016, 2017, and 2018) are publicly available [11]. The spectral bands range from 300 to 1100 nm at approximately 3.3 nm intervals [4]. Near-infrared spectroscopy allows for non-invasive assessment of fruit quality. In this case, the prediction target is the percentage of dry matter (DM) content. DM % is an index of total carbohydrates and indicates the quality of mango fruit [4].
Mishra and Passos [3] make a number of modifications to the mango fruit dataset (available online; see the dataset URL below), specifically: (1) only a subset (684–990 nm, 3.3 nm intervals) of the available spectral bands is used, (2) outliers have been removed from non-test-set samples, (3) chemometric pre-processing techniques were applied and concatenated together, and (4) each feature is standardized separately. Standardizing a feature entails subtracting the mean of its distribution from each value and dividing by the standard deviation, i.e., z = (x − μ)/σ. Each sample in the dataset consists of DM %, as the target to predict, and the concatenation of the following 6 vectors (each with 103 elements); a code sketch of this preprocessing is given below:
1. The raw spectrum.
2. The first derivative of the smoothed spectrum (smoothing uses a Savitzky–Golay filter with a window size of 13).
3. The second derivative of the smoothed spectrum.
4. The standardized spectrum (Standard Normal Variate, or SNV, of the spectrum).
5. The first derivative of the smoothed SNV spectrum.
6. The second derivative of the smoothed SNV spectrum.

Dataset URL: https://github.com/dario-passos/DeepLearning_for_VIS-NIR_Spectra/raw/master/notebooks/Tutorial_on_DL_optimization/datasets/mango_dm_full_outlier_removed2.mat

[Figure 1: Neural network architecture and hyperparameter search space. An input spectrum feeds a convolutional layer with 1 to 13 kernels of width 3 to 29, followed by 1 to 4 fully-connected layers; the first fully-connected layer has 4 to 96 units and each subsequent fully-connected layer is half the size of the previous one. L2 regularization is applied to kernels and weights. The output is DM %. Hyperparameters are shown in blue in the original figure.]
2.2. Baseline Neural Network
The state-of-the-art prediction model for this dataset is a convolutional neural network (CNN), model "B" by Mishra and Passos [3], which we refer to as CNNB. This model serves as the baseline against which we compare our results. Its hyperparameters were optimized through a combination of multiple stages of grid search, expert intuition based on experience, and a careful analysis of overfitting. For details, readers are referred to the original paper [3], but we summarize the architecture and main hyperparameters here. The CNNB architecture (which is similar to Figure 1) consists of a convolutional layer (1 kernel of width 21, stride 1) followed by three fully-connected layers (of sizes 36, 18, and 12) with exponential linear unit (ELU) activations. Kernel and fully-connected weights are initialized by the He normal initialization method and regularized with an L2 penalty with a coefficient of 0.0055. Training proceeds for up to 750 epochs with mini-batches of size 128 and stops early when the validation loss has not improved for 50 epochs. The learning rate (LR) starts at 0.005 and halves each time the validation loss stops improving for 25 epochs, until the minimum LR (1e-6) is reached. A sketch of this configuration follows.
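The following sketch reconstructs this configuration, assuming a TensorFlow/Keras implementation; the authors' exact code may differ (for example, the activation on the convolutional layer is assumed to be ELU here, and restoring the best weights on early stopping is an assumption).

```python
# Minimal sketch of the CNN_B configuration summarized above (assumptions noted).
import tensorflow as tf
from tensorflow.keras import layers, regularizers, callbacks

def build_cnn_b(input_len=618):  # 6 concatenated vectors x 103 features
    reg = regularizers.l2(0.0055)  # L2 penalty on kernels and weights
    inputs = tf.keras.Input(shape=(input_len, 1))
    x = layers.Conv1D(filters=1, kernel_size=21, strides=1, activation="elu",
                      kernel_initializer="he_normal", kernel_regularizer=reg)(inputs)
    x = layers.Flatten()(x)
    for units in (36, 18, 12):  # three fully-connected layers, ELU activations
        x = layers.Dense(units, activation="elu", kernel_initializer="he_normal",
                         kernel_regularizer=reg)(x)
    outputs = layers.Dense(1)(x)  # DM % prediction
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.005), loss="mse")
    return model

# Training schedule: up to 750 epochs, mini-batches of 128; halve the learning
# rate when validation loss plateaus for 25 epochs (floor 1e-6), stop after 50.
cnn_b_callbacks = [
    callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5,
                                patience=25, min_lr=1e-6),
    callbacks.EarlyStopping(monitor="val_loss", patience=50,
                            restore_best_weights=True),
]
# model.fit(X_cal, y_cal, validation_data=(X_val, y_val),
#           epochs=750, batch_size=128, callbacks=cnn_b_callbacks)
```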
The weights of the neural network are optimized by stochastic gradient descent using the Adam algorithm. Since training is stochastic (weights are initialized randomly and mini-batching randomly shuffles the data between epochs), the weights of the network may converge to any one of many possible settings. In the results, we report the distribution of errors obtained by randomly initializing and re-training, which is fairer than reporting a single sample from the distribution of errors; this improves reproducibility and reveals more about the neural network's performance [12]. A sketch of this evaluation procedure follows.
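The following sketch illustrates reporting an error distribution over repeated re-training rather than a single score; it builds on the build_cnn_b and cnn_b_callbacks sketch above, and the number of runs is illustrative.

```python
# Sketch: RMSE distribution over repeated random re-initialization and re-training.
import numpy as np
import tensorflow as tf

def rmse_distribution(X_cal, y_cal, X_val, y_val, X_test, y_test, n_runs=30):
    rmses = []
    for run in range(n_runs):
        tf.keras.utils.set_random_seed(run)  # fresh random initialization
        model = build_cnn_b()
        model.fit(X_cal, y_cal, validation_data=(X_val, y_val), epochs=750,
                  batch_size=128, callbacks=cnn_b_callbacks, verbose=0)
        pred = model.predict(X_test, verbose=0).ravel()
        rmses.append(float(np.sqrt(np.mean((pred - y_test) ** 2))))
    return np.mean(rmses), np.std(rmses), rmses
```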
2.3. Hyperparameter Search Space and Optimization
Neural networks can take on many different architectures, each with many possible hyperparameters. Any specific assignment of all the hyperparameters is referred to as a configuration. In this study, the problem of neural architecture search is treated as a set of additional hyperparameters, optimized in conjunction with the other hyperparameters [13]; an illustrative encoding of such a search space is sketched below.
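As an illustration, the architectural ranges of Figure 1 can be encoded as follows. The key names are ours, and a real HPO loop would use Bayesian optimization rather than the uniform sampling shown here.

```python
# Illustrative encoding of the Figure 1 search space as integer ranges; one
# "configuration" is a specific assignment to every key.
import numpy as np

SEARCH_SPACE = {
    "n_kernels":      (1, 13),  # number of convolution kernels
    "kernel_width":   (3, 29),
    "n_fc_layers":    (1, 4),   # each layer half the size of the previous
    "first_fc_units": (4, 96),
    # plus training hyperparameters such as learning rate and L2 coefficient
}

def sample_configuration(rng: np.random.Generator) -> dict:
    """Draw one architecture configuration uniformly at random."""
    return {name: int(rng.integers(low, high + 1))
            for name, (low, high) in SEARCH_SPACE.items()}

# Example: cfg = sample_configuration(np.random.default_rng(0))
```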
The space of possible architectures used in our hyperparameter search is based on typical architectures used in spectroscopy applications [14, 15,