the weight-space uncertainty/ensembles can be a limiting assumption, since the architecture choice also contributes to the epistemic (model-form) uncertainty of the prediction. The importance of architecture choice over other considerations in Bayesian neural networks has been rightly highlighted in [11].
On the other hand, Neural Architecture Search (NAS) has received tremendous attention recently because of its promise to democratize machine learning and enable the learning of custom, data-specific neural architectures. The most popular approaches in this context are reinforcement learning [12], Bayesian optimization [13], and evolutionary optimization [14], but these typically incur a large computational overhead. More recently, a differentiable neural architecture search framework, DARTS [15], was proposed that adopts a continuous relaxation of the categorical space to facilitate architecture search through gradient-based optimizers. Distribution-based learning of architecture parameters has recently been explored in DrNAS [16], BayesNAS [17], and BaLeNAS [18] to avoid the suboptimal exploration observed with deterministic optimization [18] by introducing stochasticity and encouraging exploration. However, these works were tasked with learning a point estimate of the architecture and weights rather than uncertainty quantification, ensembling, or robustness.
In this work, we develop Unified probabilistic architecture and weight ensembling Neural Architecture Search (UraeNAS) to improve the accuracy and robustness of neural network models.
We employ a distribution learning approach to differentiable NAS, which allows us to move beyond
ad hoc architecture selection and point estimation of architecture parameters to treat them as random
variables and estimate their distributions. This property of distribution learning of architectures, when
combined with the Bayesian formulation of neural network weights, allows us to characterize the
full epistemic uncertainty arising from the modeling choices of neural networks. With UraeNAS,
we are able to generate rich samples/ensembles from the joint distribution of the architecture and
weight parameters, which provides significant improvement in uncertainty/calibration, accuracy, and
robustness in both in-distribution and out-of-distribution scenarios compared to deterministic models
and weight ensemble models.
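As a concrete illustration of this ensembling idea, the following sketch averages predictive probabilities over joint samples of architecture and weight parameters; the helpers build_model, arch_dist, and weight_dist are hypothetical placeholders for illustration, not the actual UraeNAS implementation.

import torch

@torch.no_grad()
def ensemble_predict(build_model, arch_dist, weight_dist, x, num_samples=10):
    # Illustrative joint ensemble: average the predictive softmax over samples
    # (theta, w) drawn from the learned architecture and weight distributions.
    # build_model, arch_dist, and weight_dist are hypothetical helpers.
    probs = 0.0
    for _ in range(num_samples):
        theta = arch_dist.sample()        # one architecture sample
        w = weight_dist.sample()          # one weight sample
        member = build_model(theta, w)    # instantiate this ensemble member
        probs = probs + torch.softmax(member(x), dim=-1)
    return probs / num_samples            # ensemble predictive distribution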
2 Unified probabilistic architecture and weight ensembling NAS
2.1 Distributional formulation of differentiable NAS
In the differentiable NAS setup, the neural network search space is designed by repeatedly stacking building blocks called cells [12, 15, 16]. The cells can be normal cells or reduction cells. Normal cells maintain the spatial resolution of the inputs, and reduction cells halve the spatial resolution but double the number of channels. Different neural network architectures are generated by changing the basic cell structure. Each cell is represented by a directed acyclic graph with $N$ ordered nodes and $E$ edges. The feature maps are denoted by $x^{(j)}$, $0 \le j \le N-1$, and each edge corresponds to an operation $o^{(i,j)}$. The feature map for each node is given by $x^{(j)} = \sum_{i<j} o^{(i,j)}(x^{(i)})$, with $x^{(0)}$ and $x^{(1)}$ fixed to be the outputs of the previous two cells. The final output of each cell is obtained by concatenating the outputs of the intermediate nodes, that is, $(x^{(2)}, x^{(3)}, \ldots, x^{(N-1)})$.
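This cell computation can be summarized by the short PyTorch sketch below; the class name Cell and the make_op factory are illustrative choices under the definitions above, not code from the referenced implementations.

import torch
import torch.nn as nn

class Cell(nn.Module):
    # Illustrative DARTS-style cell: node j sums the operations applied to all
    # earlier nodes i < j, and the cell output concatenates the intermediate nodes.
    def __init__(self, num_nodes, channels, make_op):
        super().__init__()
        self.num_nodes = num_nodes
        # one operation o(i,j) per directed edge (i, j) with i < j
        self.ops = nn.ModuleDict({
            f"{i}->{j}": make_op(channels)
            for j in range(2, num_nodes) for i in range(j)
        })

    def forward(self, s0, s1):
        # x(0) and x(1) are fixed to the outputs of the previous two cells
        states = [s0, s1]
        for j in range(2, self.num_nodes):
            # x(j) = sum_{i<j} o(i,j)(x(i))
            states.append(sum(self.ops[f"{i}->{j}"](states[i]) for i in range(j)))
        # concatenate the intermediate nodes x(2), ..., x(N-1) along the channel axis
        return torch.cat(states[2:], dim=1)

For instance, Cell(num_nodes=6, channels=16, make_op=lambda c: nn.Conv2d(c, c, 3, padding=1)) builds a toy cell whose edges all apply a 3x3 convolution that preserves the spatial resolution.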
The operation selection problem is inherently discrete in nature. However, continuous relaxation of the discrete space [15] leads to continuous architecture mixing weights ($\hat{o}^{(i,j)}(x) = \sum_{o \in \mathcal{O}} \theta_o^{(i,j)} \, o(x)$) that can be learned through gradient-based optimization. The transformed operation $\hat{o}^{(i,j)}$ is a weighted average of the operations selected from a finite candidate space $\mathcal{O}$. The input features are denoted by $x$, and $\theta_o^{(i,j)}$ represents the weight of operation $o$ for the edge $(i,j)$. The operation mixing weights $\theta^{(i,j)} = (\theta_1^{(i,j)}, \theta_2^{(i,j)}, \ldots, \theta_{|\mathcal{O}|}^{(i,j)})$ belong to a probability simplex, i.e., $\sum_{o \in \mathcal{O}} \theta_o^{(i,j)} = 1$. Throughout this paper, we use the terms architecture parameters and operation mixing weights interchangeably.
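A minimal sketch of this relaxation for a single edge is given below; following the softmax parameterization of DARTS [15], unconstrained logits are mapped onto the probability simplex, and the class name MixedOp is illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    # Illustrative continuous relaxation of one edge (i, j):
    # o_hat(x) = sum over o in O of theta_o * o(x), with theta on the simplex.
    def __init__(self, candidate_ops):
        super().__init__()
        self.ops = nn.ModuleList(candidate_ops)   # finite candidate space O
        self.alpha = nn.Parameter(1e-3 * torch.randn(len(candidate_ops)))  # unconstrained logits

    def forward(self, x):
        # the softmax yields operation mixing weights that sum to one
        theta = F.softmax(self.alpha, dim=-1)
        return sum(t * op(x) for t, op in zip(theta, self.ops))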
NAS as Bi-level Optimization: With a differentiable architecture search (DAS) formulation, NAS can be posed as a bi-level optimization problem over neural network weights $w$ and architecture parameters $\theta$ [15] in the following manner:

$$\min_{\theta} \; \mathcal{L}_{\mathrm{val}}(w^{*}(\theta), \theta) \quad \text{s.t.} \quad w^{*} \in \arg\min_{w} \; \mathcal{L}_{\mathrm{train}}(w, \theta) \qquad (1)$$
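To make the bi-level formulation concrete, the sketch below performs one first-order approximation step in the style of DARTS [15]: the weights $w$ take a gradient step on the training loss, then the architecture parameters $\theta$ take a gradient step on the validation loss. The function name and arguments (model, criterion, w_opt, theta_opt) are assumptions for illustration, with w_opt and theta_opt being optimizers over the weight and architecture parameter groups, respectively.

def search_step(model, criterion, w_opt, theta_opt, train_batch, val_batch):
    # One first-order step for the bi-level problem in Eq. (1): the network
    # weights w descend L_train, then the architecture parameters theta descend L_val.
    x_tr, y_tr = train_batch
    x_val, y_val = val_batch

    # inner problem: update w on the training loss with theta held fixed
    w_opt.zero_grad()
    criterion(model(x_tr), y_tr).backward()
    w_opt.step()

    # outer problem: update theta on the validation loss with w held fixed
    theta_opt.zero_grad()
    criterion(model(x_val), y_val).backward()
    theta_opt.step()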
However, it was observed in recent works [16] that optimizing directly over architecture parameters can lead to overfitting due to insufficient exploration of the architecture space. To alleviate this, different DAS strategies were employed [16, 19, 20]. Among them, the most versatile is the distribution learning approach [16], in which the architecture parameters are sampled from a distribution such as the Dirichlet distribution, $\theta^{(i,j)} \overset{\mathrm{iid}}{\sim} \mathrm{Dirichlet}(\beta^{(i,j)})$, that can inherently satisfy the simplex constraint