
is used to train the model and the validation set is
used as a proxy of the test set.
Neural network hyperparameters are chosen to
minimize prediction error on the validation set
(sometimes called a tuning set in this context).
HPO can overfit the validation set, and how best
to combat this remains an open area of research [2].
In practice, a combination of expert intuition and
hand-tuning is often used to avoid such overfitting.
One approach, recently studied for chemometrics [3],
is to seek stable optima: minima where the validation
RMSE, viewed as a function of the hyperparameters, is
wide (changing little under slight perturbations)
rather than narrow [2]. We take a complementary
approach: we encourage the model to extrapolate
through the choice of validation samples used in
hyperparameter optimization.
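The preference for wide optima can be sketched numerically. In this minimal sketch, `val_rmse` is a hypothetical function mapping a hyperparameter value to validation RMSE (not the actual models from this study), and the perturbation size is an assumption:

```python
def optimum_width(val_rmse, h_best, eps=0.05):
    """Probe how flat the validation-RMSE surface is around a
    hyperparameter optimum: a small spread means a wide (stable) optimum."""
    perturbed = [val_rmse(h_best * (1 + d)) for d in (-eps, 0.0, eps)]
    return max(perturbed) - min(perturbed)

# Toy RMSE surfaces: a wide quadratic optimum vs. a narrow one, both at h = 1.
wide = lambda h: 1.0 + 0.1 * (h - 1.0) ** 2
narrow = lambda h: 1.0 + 100.0 * (h - 1.0) ** 2
print(optimum_width(wide, 1.0) < optimum_width(narrow, 1.0))  # the wide optimum varies less
```

A stable-optimum criterion would prefer the hyperparameter setting whose RMSE spread under perturbation is smaller.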
Extrapolating in time is often difficult because
the future differs from the past. The mango dataset
of Anderson et al. [4, 5] is a good example: using
spectra from 3 years, the goal is to predict dry
matter (DM) content in the following year. We
therefore want a neural network configuration that
does not overfit the past but extrapolates well to
the future. In previous work, the validation set
was 1/3 of the non-test data, sampled randomly. We
test two alternatives that aim to avoid overfitting
in HPO and encourage the neural network to
extrapolate: first, we use the latest 1/3 of samples
(sorted by time); second, we use a semantically
meaningful subset [6], specifically the latest
harvest season (2017).
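The three validation-set choices can be sketched as index selections. The toy arrays below stand in for the non-test samples; the season labels, sample count, and seed are assumptions for illustration only:

```python
import numpy as np

n = 9
order = np.arange(n)  # acquisition order, a proxy for time
season = np.array([2015] * 4 + [2016] * 3 + [2017] * 2)

rng = np.random.default_rng(0)

# (a) Random 1/3 of the non-test samples (as in previous work).
random_val = rng.choice(n, size=n // 3, replace=False)

# (b) The latest 1/3 of samples, sorted by time.
time_val = np.sort(order)[-(n // 3):]

# (c) A semantically meaningful subset: the latest harvest season.
season_val = order[season == season.max()]
```

Note that (b) and (c) need not coincide: the latest season may hold more or fewer samples than a fixed 1/3 split.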
Due to the stochastic nature of training algorithms,
neural networks end up with different weights, and
therefore different errors, each time they are
trained. To evaluate each method fairly, we report
the distribution of RMSE scores. This variance is
also problematic for HPO: the prediction error on
the validation set is an estimate of how well the
model will perform on the test set and in
deployment, and a noisy estimate leads to a
suboptimal neural network configuration.
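The seed-to-seed spread of validation error can be illustrated as follows. Here `train_and_score` is a toy stand-in for training a network, where the returned RMSE depends on the random seed; the noise model is an assumption, not the networks from this study:

```python
import numpy as np

def train_and_score(seed):
    """Toy stand-in for one training run: the validation residuals,
    and hence the RMSE, depend on the random initialization (the seed)."""
    rng = np.random.default_rng(seed)
    residuals = rng.normal(loc=0.8, scale=0.1, size=100)  # pretend errors
    return float(np.sqrt(np.mean(residuals ** 2)))

# Re-train with many seeds and report the resulting RMSE distribution.
scores = [train_and_score(s) for s in range(20)]
print(f"RMSE over 20 seeds: mean={np.mean(scores):.3f}, std={np.std(scores):.3f}")
```

Reporting the full distribution, rather than a single run, is what allows a fair comparison between configurations.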
We investigate using ensembles to reduce the
variance of validation-set error during hyperparam-
eter optimization. Ensembles (of many kinds) have
been shown to improve accuracy, reduce variance,
and improve robustness to domain shift [7, 8, 9].
Specifically, we obtain an ensemble by randomly
re-initializing and re-training a neural network
several times and averaging the resulting
predictions [7, 10]; this reduces the portion of
the variance that is due to random initialization.
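Why averaging over re-initializations stabilizes the validation-error estimate can be seen in a small sketch. The `predict` function below is a toy stand-in for one trained network whose predictions carry seed-dependent noise; the noise scale, ensemble size, and repeat count are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
y_val = rng.normal(size=200)  # hypothetical validation targets

def predict(seed):
    """Toy stand-in for one trained network: predictions are the targets
    plus noise that depends on the random initialization (the seed)."""
    noise = np.random.default_rng(seed).normal(scale=0.5, size=y_val.shape)
    return y_val + noise

def rmse(pred):
    return float(np.sqrt(np.mean((pred - y_val) ** 2)))

# Validation RMSE of a single model vs. an ensemble of M re-initialized
# models whose predictions are averaged, repeated over many seeds.
M = 10
single = [rmse(predict(s)) for s in range(50)]
ensembles = [rmse(np.mean([predict(s * M + m) for m in range(M)], axis=0))
             for s in range(50)]
print(np.std(ensembles) < np.std(single))  # the ensemble estimate varies less
```

Averaging M independent initializations shrinks the initialization-driven noise in the predictions, so the validation RMSE both drops and fluctuates less from run to run, giving HPO a steadier signal.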
To test these methods, we perform a comprehensive
study on a held-out 2018 harvest season of mango
fruit, given VNIR spectra from 3 prior years [4].
We conduct hyperparameter optimization for each
choice of validation set and compare HPO with and
without ensemble averaging. The results of this
study shed light on reproducible and automated
practices for configuring and training neural
networks for spectroscopy; they can inform
practitioners about the steps to take when building
their own models to make predictions for future
samples.
2. Methodologies
2.1. Data set
Visible and near-infrared (VNIR) spectra of
mango fruit from four harvest seasons (2015, 2016,
2017, and 2018) are publicly available [11]. The
spectral bands range 300 −1100 nm with approx-
imately 3.3 nm intervals [4]. Near infrared spec-
troscopy allows for non-invasive assessment of fruit
quality. In this case, the prediction target is
percent dry matter (DM) content; DM % is an index
of total carbohydrates and an indicator of mango
fruit quality [4].
Mishra and Passos [3] made a number of modifications
to the mango fruit dataset (available online1),
specifically: (1) only a subset (684–990 nm, 3.3 nm
intervals) of the available spectral bands is used,
(2) outliers were removed from non-test-set samples,
(3) chemometric pre-processing techniques were
applied and their outputs concatenated, and (4) each
feature is standardized separately. Standardizing a
feature entails subtracting the mean of its
distribution from each value and then dividing by
the standard deviation of the distribution. Each
sample in the dataset consists of DM %, as the
target to predict, and the concatenation of 6
vectors (each with 103 elements), which are:
1. The raw spectrum
2. The first derivative of the smoothed spectrum
(smoothing uses a Savitzky–Golay filter with
window size of 13).
3. The second derivative of the smoothed spectrum.
1https://github.com/dario-passos/DeepLearning_for_VIS-NIR_Spectra/raw/master/notebooks/Tutorial_on_DL_optimization/datasets/mango_dm_full_outlier_removed2.mat