
is used to train the model and the validation set is
used as a proxy of the test set.
Neural network hyperparameters are chosen to
minimize prediction error on the validation set
(sometimes called a tuning set in this context).
HPO can overfit the validation set, and how best
to combat this remains an open area of research [2].
In practice, a combination of expert intuition and
hand-tuning is often used to avoid such overfitting.
One approach, recently studied for chemometrics [3],
is to seek stable optima: minima where the validation
RMSE, viewed as a function of the hyperparameters, is
wide (changing little under slight perturbations)
rather than narrow [2]. We take a complementary
approach: we encourage the model to extrapolate
through the choice of validation samples used in
hyperparameter optimization.
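The preference for wide optima can be sketched numerically. In this minimal sketch, `val_rmse` is a hypothetical function mapping a hyperparameter value to validation RMSE (not the actual models from this study), and the perturbation size is an assumption:

```python
def optimum_width(val_rmse, h_best, eps=0.05):
    """Probe how flat the validation-RMSE surface is around a
    hyperparameter optimum: a small spread means a wide (stable) optimum."""
    perturbed = [val_rmse(h_best * (1 + d)) for d in (-eps, 0.0, eps)]
    return max(perturbed) - min(perturbed)

# Toy RMSE surfaces: a wide quadratic optimum vs. a narrow one, both at h = 1.
wide = lambda h: 1.0 + 0.1 * (h - 1.0) ** 2
narrow = lambda h: 1.0 + 100.0 * (h - 1.0) ** 2
print(optimum_width(wide, 1.0) < optimum_width(narrow, 1.0))  # the wide optimum varies less
```

A stable-optimum criterion would prefer the hyperparameter setting whose RMSE spread under perturbation is smaller.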
Extrapolating in time is often difficult because
the future differs from the past. The mango dataset
of Anderson et al. [4, 5] is a good example: using
spectra from 3 years, the goal is to predict dry
matter (DM) content in the following year. We
therefore want a neural network configuration that
does not overfit the past but extrapolates well to
the future. In previous work, the validation set
was 1/3 of the non-test data, sampled randomly. We
test two alternatives that aim to avoid overfitting
in HPO and encourage the neural network to
extrapolate: first, we use the latest 1/3 of samples
(sorted by time); second, we use a semantically
meaningful subset [6], specifically the latest
harvest season (2017).
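The three validation-set choices can be sketched as index selections. The toy arrays below stand in for the non-test samples; the season labels, sample count, and seed are assumptions for illustration only:

```python
import numpy as np

n = 9
order = np.arange(n)  # acquisition order, a proxy for time
season = np.array([2015] * 4 + [2016] * 3 + [2017] * 2)

rng = np.random.default_rng(0)

# (a) Random 1/3 of the non-test samples (as in previous work).
random_val = rng.choice(n, size=n // 3, replace=False)

# (b) The latest 1/3 of samples, sorted by time.
time_val = np.sort(order)[-(n // 3):]

# (c) A semantically meaningful subset: the latest harvest season.
season_val = order[season == season.max()]
```

Note that (b) and (c) need not coincide: the latest season may hold more or fewer samples than a fixed 1/3 split.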
Due to the stochastic nature of training algorithms,
neural networks end up with different weights, and
therefore different errors, each time they are
trained. To evaluate each method fairly, we report
the distribution of RMSE scores. This variance is
also problematic for HPO: the prediction error on
the validation set is an estimate of how well the
model will perform on the test set and in
deployment, and a noisy estimate leads to a
suboptimal neural network configuration.
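The seed-to-seed spread of validation error can be illustrated as follows. Here `train_and_score` is a toy stand-in for training a network, where the returned RMSE depends on the random seed; the noise model is an assumption, not the networks from this study:

```python
import numpy as np

def train_and_score(seed):
    """Toy stand-in for one training run: the validation residuals,
    and hence the RMSE, depend on the random initialization (the seed)."""
    rng = np.random.default_rng(seed)
    residuals = rng.normal(loc=0.8, scale=0.1, size=100)  # pretend errors
    return float(np.sqrt(np.mean(residuals ** 2)))

# Re-train with many seeds and report the resulting RMSE distribution.
scores = [train_and_score(s) for s in range(20)]
print(f"RMSE over 20 seeds: mean={np.mean(scores):.3f}, std={np.std(scores):.3f}")
```

Reporting the full distribution, rather than a single run, is what allows a fair comparison between configurations.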
We investigate using ensembles to reduce the
variance of validation-set error during hyperparam-
eter optimization. Ensembles (of many kinds) have
been shown to improve accuracy, reduce variance,
and improve robustness to domain shift [7, 8, 9].
Specifically, we obtain an ensemble by randomly
re-initializing and re-training a neural network
several times and averaging the resulting
predictions [7, 10]; this reduces the portion of
the variance that is due to random initialization.
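Why averaging over re-initializations stabilizes the validation-error estimate can be seen in a small sketch. The `predict` function below is a toy stand-in for one trained network whose predictions carry seed-dependent noise; the noise scale, ensemble size, and repeat count are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
y_val = rng.normal(size=200)  # hypothetical validation targets

def predict(seed):
    """Toy stand-in for one trained network: predictions are the targets
    plus noise that depends on the random initialization (the seed)."""
    noise = np.random.default_rng(seed).normal(scale=0.5, size=y_val.shape)
    return y_val + noise

def rmse(pred):
    return float(np.sqrt(np.mean((pred - y_val) ** 2)))

# Validation RMSE of a single model vs. an ensemble of M re-initialized
# models whose predictions are averaged, repeated over many seeds.
M = 10
single = [rmse(predict(s)) for s in range(50)]
ensembles = [rmse(np.mean([predict(s * M + m) for m in range(M)], axis=0))
             for s in range(50)]
print(np.std(ensembles) < np.std(single))  # the ensemble estimate varies less
```

Averaging M independent initializations shrinks the initialization-driven noise in the predictions, so the validation RMSE both drops and fluctuates less from run to run, giving HPO a steadier signal.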
To test these methods, we perform a comprehensive
study on a held-out 2018 harvest season of mango
fruit, given VNIR spectra from 3 prior years [4].
We conduct hyperparameter optimization for each
choice of validation set and compare HPO with and
without ensemble averaging. The results of this
study shed light on reproducible and automated
practices for configuring and training neural
networks for spectroscopy; they can inform
practitioners about the steps to take when building
their own models to make predictions for future
samples.
2. Methodologies
2.1. Data set
Visible and near-infrared (VNIR) spectra of
mango fruit from four harvest seasons (2015, 2016,
2017, and 2018) are publicly available [11]. The
spectral bands range 300 −1100 nm with approx-
imately 3.3 nm intervals [4]. Near infrared spec-
troscopy allows for non-invasive assessment of fruit
quality. In this case, the prediction target is
percent dry matter (DM) content; DM % is an index
of total carbohydrates and an indicator of mango
fruit quality [4].
Mishra and Passos [3] made a number of modifications
to the mango fruit dataset (available online1),
specifically: (1) only a subset (684–990 nm, 3.3 nm
intervals) of the available spectral bands is used,
(2) outliers were removed from non-test-set samples,
(3) chemometric pre-processing techniques were
applied and their outputs concatenated, and (4) each
feature is standardized separately. Standardizing a
feature entails subtracting the mean of its
distribution from each value and then dividing by
the standard deviation of the distribution. Each
sample in the dataset consists of DM %, as the
target to predict, and the concatenation of 6
vectors (each with 103 elements), which are:
1. The raw spectrum
2. The first derivative of the smoothed spectrum
(smoothing uses a Savitzky–Golay filter with
window size of 13).
3. The second derivative of the smoothed spectrum.
1https://github.com/dario-passos/DeepLearning_for_VIS-NIR_Spectra/raw/master/notebooks/Tutorial_on_DL_optimization/datasets/mango_dm_full_outlier_removed2.mat