ing multiple models with randomized initializations and use as the final model the one which achieved the best performance on the validation set... (Björne and Salakoski, 2018)
The test results are derived from the 1-best random seed on the validation set. (Kuncoro et al., 2020)
2.2 Safe use: Ensemble creation
Ensemble methods are an effective way of combining multiple machine-learning models to make better predictions (Rokach, 2010). A common approach to creating neural network ensembles is to train the same architecture with different random seeds, and have the resulting models vote (Perrone and Cooper, 1995). For example:
In order to improve the stability of the RNNs, we ensemble five distinct models, each initialized with a different random seed. (Nicolai et al., 2017)
Our model is composed of the ensemble of 8 single models. The hyperparameters and the training procedure used in each single model are the same except the random seed. (Yang and Wang, 2019)
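As a concrete illustration (not drawn from any of the papers quoted above), a seed-based ensemble with majority voting might be sketched as follows, where train_fn and predict_fn are hypothetical stand-ins for whatever training and inference code a given architecture uses:

    from collections import Counter

    def train_seed_ensemble(train_fn, train_data, seeds=(13, 42, 271, 1009, 31337)):
        # Train one copy of the same architecture per (arbitrary) random seed.
        return [train_fn(train_data, random_seed=s) for s in seeds]

    def ensemble_predict(predict_fn, models, example):
        # Each ensemble member votes; the most frequent label wins.
        votes = [predict_fn(model, example) for model in models]
        return Counter(votes).most_common(1)[0][0]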
2.3 Safe use: Sensitivity analysis
Sometimes it is useful to demonstrate how sensitive a neural network architecture is to a particular hyperparameter. For example, Santurkar et al. (2018) show that batch normalization makes neural network architectures less sensitive to the learning rate hyperparameter. Similarly, it may be useful to show how sensitive neural network architectures are to their random seed hyperparameter. For example:
We next (§3.3) examine the expected variance in attention-produced weights by initializing multiple training sequences with different random seeds... (Wiegreffe and Pinter, 2019)
Our model shows a lower standard deviation on each task, which means our model is less sensitive to random seeds than other models. (Hua et al., 2021)
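As an illustration of such a sensitivity analysis (a sketch, not the procedure of either paper quoted above), one can train the same architecture under several seeds and report the spread of the resulting validation scores; train_fn and eval_fn are hypothetical stand-ins:

    import statistics

    def seed_sensitivity(train_fn, eval_fn, train_data, val_data, seeds=range(10)):
        # Train the same architecture once per random seed and score each run.
        scores = [eval_fn(train_fn(train_data, random_seed=s), val_data)
                  for s in seeds]
        # A lower standard deviation indicates lower sensitivity to the seed.
        return statistics.mean(scores), statistics.stdev(scores)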
2.4 Risky use: Single fixed seed
NLP articles sometimes pick a single fixed random seed, claiming that this is done to improve consistency or replicability. For example:
An arbitrary but fixed random seed was used for each run to ensure reproducibility... (Le and Fokkens, 2018)
For consistency, we used the same set of hyperparameters and a fixed random seed across all experiments. (Lin et al., 2020)
Why is this risky? First, fixing the random seed does not guarantee replicability. For example, the TensorFlow library has a history of producing different results given the same random seeds, especially on GPUs (Two Sigma, 2017; Kanwar et al., 2021). Second, not optimizing the random seed hyperparameter has the same drawbacks as not optimizing any other hyperparameter: the reported performance will underestimate what the architecture could achieve with an optimized seed.
What should one do instead? The random seed should be optimized like any other hyperparameter. Dodge et al. (2020), for example, show that doing so leads to simpler models exceeding the published results of more complex state-of-the-art models on multiple GLUE tasks (Wang et al., 2018). The space of hyperparameters explored (and thus the number of random seeds explored) can be restricted to match the available compute resources with techniques such as random hyperparameter search (Bergstra and Bengio, 2012), where n hyperparameter settings are sampled from the space of all hyperparameter settings (with random seeds treated the same as all other hyperparameters). In an extremely resource-limited scenario, random search might select only a single value of some hyperparameter (such as the random seed), which might be acceptable given the constraints, but should probably be accompanied by an explicit acknowledgement of the risk of underestimating performance.
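As a sketch of this recipe (with hypothetical train_fn and eval_fn, and a purely illustrative search space), random search over a joint space that includes the random seed might look like:

    import random

    # Illustrative search space; the random seed is just another dimension.
    SEARCH_SPACE = {
        "learning_rate": [1e-5, 3e-5, 1e-4],
        "dropout": [0.1, 0.3, 0.5],
        "random_seed": list(range(10_000)),
    }

    def random_search(train_fn, eval_fn, train_data, val_data, space=SEARCH_SPACE, n=20):
        # Sample n settings uniformly from the joint space and keep the best
        # configuration according to validation performance.
        rng = random.Random()
        best_score, best_config = float("-inf"), None
        for _ in range(n):
            config = {name: rng.choice(values) for name, values in space.items()}
            score = eval_fn(train_fn(train_data, **config), val_data)
            if score > best_score:
                best_score, best_config = score, config
        return best_config, best_score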
2.5 Risky use: Performance comparison
It is a good idea to compare not just the point estimate of a single model's performance, but distributions of model performance, as comparing performance distributions results in more reliable conclusions (Reimers and Gurevych, 2017; Dodge et al., 2019; Radosavovic et al., 2020). However, it has sometimes been suggested that such distributions can be obtained by training the same architecture and varying only the random seed. For example:
We re-ran both implementations multiple times, each time only changing the