We need to talk about random seeds
Steven Bethard
University of Arizona
bethard@arizona.edu

arXiv:2210.13393v1 [cs.CL] 24 Oct 2022
Abstract
Modern neural network libraries all take as a hyperparameter a random seed, typically used to determine the initial state of the model parameters. This opinion piece argues that there are some safe uses for random seeds: as part of the hyperparameter search to select a good model, creating an ensemble of several models, or measuring the sensitivity of the training algorithm to the random seed hyperparameter. It argues that some uses for random seeds are risky: using a fixed random seed for “replicability” and varying only the random seed to create score distributions for performance comparison. An analysis of 85 recent publications from the ACL Anthology finds that more than 50% contain risky uses of random seeds.
1 Introduction
Modern neural network libraries all take as a hyperparameter a random seed, a number that is used to initialize a pseudorandom number generator. That generator is typically used to determine the initial state of model parameters, but may also affect optimization (and inference) in other ways, such as selecting which units to mask under dropout, or selecting which instances of the training data go into each minibatch during gradient descent. Like any hyperparameter, neural network random seeds can have a large or small impact on model performance depending on the specifics of the architecture and the data. Thus, it is important to optimize the random seed hyperparameter as we would any other hyperparameter, such as learning rate or regularization strength.
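As an illustration (not tied to any particular library), a single seed fixes the pseudorandom stream from which initialization, dropout masks, and minibatch order are all drawn. The sketch below uses only Python's standard library, and all names in it are hypothetical:

```python
import random

def seeded_training_randomness(seed, n_params=4, n_units=4, n_batches=4):
    """Show three places a training run consumes the seed's pseudorandom stream."""
    rng = random.Random(seed)  # one generator, seeded once
    # 1. Initial parameter values (e.g., a uniform initialization).
    init = [rng.uniform(-0.1, 0.1) for _ in range(n_params)]
    # 2. A dropout mask: which units are zeroed on this step.
    mask = [rng.random() < 0.5 for _ in range(n_units)]
    # 3. The minibatch order for one epoch of gradient descent.
    order = list(range(n_batches))
    rng.shuffle(order)
    return init, mask, order

# The same seed reproduces all three; a different seed changes them,
# and hence changes the entire training trajectory.
assert seeded_training_randomness(0) == seeded_training_randomness(0)
assert seeded_training_randomness(0) != seeded_training_randomness(1)
```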
Such tuning is especially important with the pretrained transformer architectures currently popular in NLP (BERT, Devlin et al., 2019; RoBERTa, Liu et al., 2019; etc.), which are quite sensitive to their random seeds (Risch and Krestel, 2020; Dodge et al., 2020; Mosbach et al., 2021). Several solutions to this problem have been proposed, including specific optimizer setups (Mosbach et al., 2021), ensemble methods (Risch and Krestel, 2020), and explicitly tuning the random seed like other hyperparameters (Dodge et al., 2020).
The NLP community thus has some awareness of the problems that random seeds present, but it is inconsistent in its approaches to solving those problems. The remainder of this opinion piece first presents a taxonomy of different ways that neural network random seeds are used in the NLP community, explaining which uses are safe and which are risky. It then reviews 85 articles published in the ACL Anthology, categorizing their random seed uses based on the taxonomy. This analysis finds that more than 50% of the articles include risky uses of random seeds, suggesting that the NLP community still needs a broader discussion about how we approach random seeds.
2 A taxonomy of random seed uses
This section highlights five common uses of neural network random seeds in the NLP community, and categorizes them as either safe or risky.
2.1 Safe use: Model selection
The random seed is a hyperparameter of a neural network architecture that determines where in the model parameter space optimization should begin. It may also affect optimization by determining the order of minibatches in gradient descent, or through mechanisms like dropout’s random sampling of unit activations. As the random seed is a hyperparameter, it can and should be optimized just as other hyperparameters are. Unlike some other hyperparameters, there is no intuitive explanation of why one random seed would be better or worse than another, so the typical strategy is to try a number of randomly selected seeds. For example:
Instead, we compensate for the inherent randomness of the network by training multiple models with randomized initializations and use as the final model the one which achieved the best performance on the validation set. . . (Björne and Salakoski, 2018)

The test results are derived from the 1-best random seed on the validation set. (Kuncoro et al., 2020)
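This seed-selection strategy can be sketched as follows, where `train_and_validate` is a hypothetical stand-in for a full training run that returns a validation score:

```python
import random

def train_and_validate(seed):
    # Hypothetical placeholder for a real training run; here the
    # "validation score" is just a deterministic function of the seed.
    return random.Random(seed).random()

def select_best_seed(n_trials=10, meta_seed=12345):
    # No seed is intuitively better than another, so sample candidate
    # seeds at random, train once per seed, and keep the seed whose
    # model scored best on the validation set.
    meta_rng = random.Random(meta_seed)
    candidates = [meta_rng.randrange(2**31) for _ in range(n_trials)]
    scores = {s: train_and_validate(s) for s in candidates}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Test-set results would then be reported only for the model trained with the selected seed.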
2.2 Safe use: Ensemble creation
Ensemble methods are an effective way of combining multiple machine-learning models to make better predictions (Rokach, 2010). A common approach to creating neural network ensembles is to train the same architecture with different random seeds, and have the resulting models vote (Perrone and Cooper, 1995). For example:
In order to improve the stability of the RNNs, we ensemble five distinct models, each initialized with a different random seed. (Nicolai et al., 2017)

Our model is composed of the ensemble of 8 single models. The hyperparameters and the training procedure used in each single model are the same except the random seed. (Yang and Wang, 2019)
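A minimal sketch of such a seed-based ensemble, with `predict` as a hypothetical stand-in for a trained model's prediction function:

```python
import random
from collections import Counter

def predict(seed, example):
    # Hypothetical single model: a label that depends deterministically
    # on the seed it was trained with and on the input example.
    return random.Random(f"{seed}:{example}").choice(["pos", "neg"])

def ensemble_predict(seeds, example):
    # Same architecture, different random seeds; the ensemble's
    # prediction is a hard majority vote over the members' labels.
    votes = Counter(predict(s, example) for s in seeds)
    return votes.most_common(1)[0][0]
```

An odd number of seeds avoids ties under two-class hard voting; soft voting over predicted probabilities is a common alternative.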
2.3 Safe use: Sensitivity analysis
Sometimes it is useful to demonstrate how sensitive a neural network architecture is to a particular hyperparameter. For example, Santurkar et al. (2018) show that batch normalization makes neural network architectures less sensitive to the learning rate hyperparameter. Similarly, it may be useful to show how sensitive neural network architectures are to their random seed hyperparameter. For example:
We next (§3.3) examine the expected variance in attention-produced weights by initializing multiple training sequences with different random seeds. . . (Wiegreffe and Pinter, 2019)

Our model shows a lower standard deviation on each task, which means our model is less sensitive to random seeds than other models. (Hua et al., 2021)
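Such a sensitivity analysis amounts to reporting the spread of scores across seeds. A minimal sketch, with `train_and_score` as a hypothetical placeholder for training and evaluating one model:

```python
import random
from statistics import mean, stdev

def train_and_score(seed):
    # Hypothetical placeholder for training with a given seed and
    # evaluating on held-out data; scores land in [0.80, 0.85].
    return 0.80 + 0.05 * random.Random(seed).random()

def seed_sensitivity(seeds):
    # Train one model per seed; the standard deviation of the scores
    # measures how sensitive the architecture is to its random seed.
    scores = [train_and_score(s) for s in seeds]
    return mean(scores), stdev(scores)
```

Reporting both the mean and the standard deviation lets readers judge whether a claimed improvement exceeds seed-induced variation.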
2.4 Risky use: Single fixed seed
NLP articles sometimes pick a single fixed random seed, claiming that this is done to improve consistency or replicability. For example:

An arbitrary but fixed random seed was used for each run to ensure reproducibility. . . (Le and Fokkens, 2018)

For consistency, we used the same set of hyperparameters and a fixed random seed across all experiments. (Lin et al., 2020)
Why is this risky? First, fixing the random seed does not guarantee replicability. For example, the TensorFlow library has a history of producing different results given the same random seeds, especially on GPUs (Two Sigma, 2017; Kanwar et al., 2021). Second, not optimizing the random seed hyperparameter has the same drawbacks as not optimizing any other hyperparameter: the reported performance will underestimate what the architecture is capable of with an optimized model.
What should one do instead? The random seed should be optimized as any other hyperparameter. Dodge et al. (2020), for example, show that doing so leads to simpler models exceeding the published results of more complex state-of-the-art models on multiple GLUE tasks (Wang et al., 2018). The space of hyperparameters explored (and thus the number of random seeds explored) can be restricted to match the availability of compute resources with techniques such as random hyperparameter search (Bergstra and Bengio, 2012), where n hyperparameter settings are sampled from the space of all hyperparameter settings (with random seeds treated the same as all other hyperparameters). In an extremely resource-limited scenario, random search might select only a single value of some hyperparameter (such as random seed), which might be acceptable given the constraints, but should probably be accompanied by an explicit acknowledgement of the risks of underestimating performance.
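Random search over a joint space that includes the seed might be sketched as follows; the particular hyperparameters and ranges below are illustrative assumptions, not prescriptions:

```python
import random

def sample_hyperparameters(n, meta_seed=0):
    # Random search (Bergstra and Bengio, 2012): draw n settings from
    # the joint hyperparameter space, with the random seed treated the
    # same as every other hyperparameter.
    meta_rng = random.Random(meta_seed)
    return [{
        # Learning rate sampled log-uniformly over [1e-5, 1e-3].
        "learning_rate": 10 ** meta_rng.uniform(-5, -3),
        "dropout": meta_rng.uniform(0.0, 0.5),
        "seed": meta_rng.randrange(2**31),
    } for _ in range(n)]
```

Each sampled setting is trained once, and the best configuration on validation data is selected; shrinking n is then an explicit, budget-driven decision.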
2.5 Risky use: Performance comparison
It is a good idea to compare not just the point estimate of a single model’s performance, but distributions of model performance, as comparing performance distributions results in more reliable conclusions (Reimers and Gurevych, 2017; Dodge et al., 2019; Radosavovic et al., 2020). However, it has sometimes been suggested that such distributions can be obtained by training the same architecture and varying only the random seed. For example:

We re-ran both implementations multiple times, each time only changing the. . .