
which is estimated using the reparametrization trick [13] that allows the use of regular gradient computation.
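To make this concrete, the following minimal sketch (our own illustration, not the implementation of [13]) draws a Gaussian weight as $w = \mu + \sigma \odot \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$, so that gradients of the loss reach the variational parameters through the sample:

```python
import torch

# Variational parameters of a factorized Gaussian posterior over one weight tensor.
mu = torch.zeros(4, 3, requires_grad=True)
rho = torch.zeros(4, 3, requires_grad=True)   # sigma = softplus(rho) keeps the scale positive

def sample_weight() -> torch.Tensor:
    sigma = torch.nn.functional.softplus(rho)
    eps = torch.randn_like(sigma)             # eps ~ N(0, I), independent of the parameters
    return mu + sigma * eps                   # reparametrized sample: differentiable in mu and rho

w = sample_weight()
loss = (w ** 2).sum()                         # stand-in for the data-dependent part of the loss
loss.backward()                               # regular backprop delivers gradients to mu and rho
```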
Variational Neural Networks (VNNs) [14] can be considered in the same group as MCD and BBB from the Bayesian Model Averaging perspective, where sampled models can lie in the same loss-basin and be similar, i.e., describing the problem from the same point of view, as explained in [2]. VNNs consider a Gaussian distribution over each layer, which is parameterized by the outputs of the corresponding sub-layers. Hypermodels [6] consider an additional hypermodel $\theta = g_\nu(z)$ to generate the parameters of a base model $f_\theta(x)$ using a random variable $z \sim \mathcal{N}(0, I)$ as an input to the hypermodel.
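As a rough sketch of this idea (the layer sizes and the two-layer hypernetwork below are our own illustrative choices, not those of [6]), a hypermodel maps a noise sample $z$ to the full parameter vector of a base layer:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperLinear(nn.Module):
    """Hypermodel g_nu(z) that outputs the weights of a single base linear layer f_theta (sketch)."""

    def __init__(self, in_features: int, out_features: int, z_dim: int = 8):
        super().__init__()
        self.in_features, self.out_features, self.z_dim = in_features, out_features, z_dim
        n_params = out_features * in_features + out_features   # weight matrix + bias of the base layer
        self.g = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(), nn.Linear(64, n_params))

    def forward(self, x: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        theta = self.g(z)                                       # base-model parameters generated from z
        split = self.out_features * self.in_features
        W = theta[:split].view(self.out_features, self.in_features)
        b = theta[split:]
        return F.linear(x, W, b)                                # base layer applied with generated weights

layer = HyperLinear(in_features=16, out_features=4)
z = torch.randn(layer.z_dim)                                    # z ~ N(0, I)
y = layer(torch.randn(2, 16), z)                                # a new z yields a new base model
```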
Deep Ensembles [8] have a better uncertainty quality than
all other discussed methods and can be viewed as a BNN with
a Categorical distribution over weights, with the ideal num-
ber of weight samples being equal to the number of networks
in the ensemble. The addition of prior untrained models to
Deep Ensembles, as described in [8], improves the uncer-
tainty quality of the network. Deep Sub-Ensembles [15] split
the neural network into two parts, where the first part contains
only a single trunk network, and the second part is a regular
Deep Ensemble network operating on features generated by
the trunk network. This reduces the memory and computational load compared to Deep Ensembles and provides a trade-off between uncertainty quality and resource requirements. Batch Ensembles [16] optimize Deep Ensembles by using all weights in a single matrix operation and using the Hadamard product instead of matrix multiplication, which increases inference speed and reduces memory usage.
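The following sketch illustrates the Hadamard-product idea behind Batch Ensembles (a simplified rank-1 parameterization under our own assumptions; the exact formulation in [16] may differ in details such as initialization and how members are grouped within a batch):

```python
import torch
import torch.nn as nn

class BatchEnsembleLinear(nn.Module):
    """Shared weight W plus per-member rank-1 factors r_i, s_i, combined with Hadamard products."""

    def __init__(self, in_features: int, out_features: int, n_members: int):
        super().__init__()
        self.W = nn.Parameter(torch.randn(out_features, in_features) * 0.02)  # shared by all members
        self.r = nn.Parameter(torch.ones(n_members, out_features))            # output-side factors
        self.s = nn.Parameter(torch.ones(n_members, in_features))             # input-side factors
        self.b = nn.Parameter(torch.zeros(n_members, out_features))

    def forward(self, x: torch.Tensor, member: torch.Tensor) -> torch.Tensor:
        # Equivalent to applying W_i = W * (r_i s_i^T) per input, but computed with
        # element-wise products around a single shared matrix multiplication.
        x = x * self.s[member]            # (batch, in_features)
        y = x @ self.W.t()                # one shared matmul for the whole batch
        return y * self.r[member] + self.b[member]

layer = BatchEnsembleLinear(16, 4, n_members=3)
members = torch.randint(0, 3, (8,))       # one ensemble member index per input in the batch
out = layer(torch.randn(8, 16), members)
```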
Kushibar et al. [17] propose to use a single deterministic network that generates different outputs through multiple early-exit branches [18] from the same network, and to compute the variance of those outputs. Contrary to this approach, which uses a single deterministic network, we propose Layer Ensembles in the context of Bayesian Neural Networks by considering an independent Categorical distribution over the weights of each layer, as described in the following section.
3. LAYER ENSEMBLES
In this section, we first provide a mathematical definition of the Layer Ensembles structure. Then, we provide the intuition behind Layer Ensembles and the relations between Layer Ensembles and other network structures. Furthermore, we show how the inference of Layer Ensembles can be optimized by reusing common layer outputs. Finally, we propose a method for selecting the best sample combinations based on an uncertainty quality metric. This metric is computed using the Epistemic Neural Networks (ENNs) [9] experiments on a synthetic dataset with ground-truth uncertainty values, as described in detail in Section 3.2.
We consider a neural network $F(x, w)$ with $N$ layers, which takes $x$ as input and is parameterized by the weights $w$.
Fig. 1: Example structures of (a) Deep Ensembles and (b) Layer Ensembles for a 3-layer network with 3 ensembles ($N = 3$, $K = 3$), and (c) two samples of a Layer Ensemble network with common first two layer options. While the memory structure remains identical, Layer Ensembles have many more options for sampling, which can be optimized by considering the common layers in the samples. Layer Ensembles with common layers earlier in the architecture lead to faster inference (c).
A Deep Ensemble network is formed by $K$ identically structured networks, each formed by $N$ layers, and the weights of each network, i.e., $w_i$, $i \in [1, K]$, are trained independently. We formulate Layer Ensembles as a stochastic neural network $F(x, w)$ with $N$ layers $\mathrm{LE}_i(x, w^i_q)$, $i \in [1, K]$, $q \in [1, N]$, and $K$ weight options for each layer:

$$w^i_q \sim \mathrm{Categorical}(K). \tag{1}$$
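A minimal sketch of this definition is given below (the fully-connected layers, ReLU activations, and the uniform Categorical distribution are our own illustrative assumptions, not the configuration used in the experiments): each of the $N$ layers stores $K$ weight options, and a network sample draws one option index per layer.

```python
import torch
import torch.nn as nn

class LayerEnsemble(nn.Module):
    """N layers with K weight options each; a network sample picks one option per layer (sketch)."""

    def __init__(self, dims: list, K: int = 3):
        super().__init__()
        # options[q][i] is weight option i of layer q: the same K*N weight sets a Deep Ensemble stores.
        self.options = nn.ModuleList(
            nn.ModuleList(nn.Linear(dims[q], dims[q + 1]) for _ in range(K))
            for q in range(len(dims) - 1))
        self.K = K

    def sample_indices(self) -> list:
        # One Categorical(K) draw per layer, assumed uniform over the K options here.
        return torch.randint(0, self.K, (len(self.options),)).tolist()

    def forward(self, x: torch.Tensor, indices: list) -> torch.Tensor:
        for q, i in enumerate(indices):
            x = self.options[q][i](x)
            if q < len(self.options) - 1:     # illustrative ReLU between hidden layers
                x = torch.relu(x)
        return x

net = LayerEnsemble(dims=[16, 32, 32, 4], K=3)   # N = 3 layers, K = 3 options per layer
idx = net.sample_indices()                        # e.g. [2, 0, 1] selects one network sample
y = net(torch.randn(8, 16), idx)
```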
This results in the same memory structure as for Deep Ensembles, with $KN$ weight sets for an ensemble of $K$ networks, each formed by $N$ layers. However, Layer Ensembles allow for connections between the layers of different weight sets by sampling different layer options to form a network in the ensemble. This greatly increases the number of possible weight samples, and the sampled networks can contain identical subnetworks, which can be exploited to speed up the inference of a set of samples.
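One way to exploit such identical subnetworks, sketched below under our own assumptions (not necessarily the scheduling used later in the paper), is to cache the activations of every already-computed prefix of layer options, so that samples sharing their leading layers reuse those outputs:

```python
import torch

def forward_samples(layers, samples, x):
    """Evaluate several sampled networks, computing each shared leading-layer prefix only once.
    `layers[q][i]` is option i of layer q (any callables); `samples` is a list of index tuples."""
    cache = {(): x}                                # activations keyed by the prefix of sampled indices
    outputs = []
    for sample in samples:
        prefix = ()
        h = x
        for q, i in enumerate(sample):
            nxt = prefix + (i,)
            if nxt not in cache:                   # only prefixes not seen before are computed
                cache[nxt] = layers[q][i](cache[prefix])
            h = cache[nxt]
            prefix = nxt
        outputs.append(h)
    return torch.stack(outputs)

# Example: 2 layers with 2 options each; the two samples share layer-0 option 0,
# so that layer is evaluated only once for both samples.
layers = [[torch.nn.Linear(16, 32), torch.nn.Linear(16, 32)],
          [torch.nn.Linear(32, 4), torch.nn.Linear(32, 4)]]
out = forward_samples(layers, samples=[(0, 0), (0, 1)], x=torch.randn(8, 16))
```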
The intuition behind Layer Ensembles is to define an en-
semble for each layer. These ensembles are independent and can each have a different number of members, leading to a highly flexible network structure. If a single ran-
dom variable controls all layer-wise ensembles, and they have
an identical number of members, this leads to the well-known
Deep Ensemble neural network. Fig. 1 illustrates how the
same memory structure of the ensembles is used in Deep En-
sembles (Fig. 1a) and Layer Ensembles (Fig. 1b).
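This special case can be seen directly in how the layer-wise indices are drawn (a small illustration with uniform sampling, which we assume here for simplicity):

```python
import torch

N, K = 3, 3                                        # 3 layers, 3 weight options per layer

# Layer Ensembles: an independent Categorical(K) draw for every layer.
layer_ensemble_sample = torch.randint(0, K, (N,))  # e.g. tensor([2, 0, 1])

# Deep Ensembles as a special case: a single draw controls all layer-wise ensembles,
# so every layer uses the option that belongs to the same ensemble member.
member = torch.randint(0, K, (1,))
deep_ensemble_sample = member.expand(N)            # e.g. tensor([1, 1, 1])
```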
Training of Layer Ensembles is done by using a regular
loss function and averaging over the outputs of different net-
work samples. The number of network samples for Deep Ensembles is usually equal to the number of networks $K$, meaning that all the ensemble members are used for inference. The same
strategy is not required for Layer Ensembles, as each layer
option can be included in multiple networks. This means that