Neural Graphical Models
Harsh Shrivastava & Urszula Chajewska
Microsoft Research, Redmond, USA
Contact: {hshrivastava,urszc}@microsoft.com
Abstract.
Probabilistic Graphical Models are often used to understand the dynamics of a system. They can model relationships between features (nodes) and the underlying distribution. Theoretically, these models can represent very complex dependency functions, but in practice simplifying assumptions are often made due to computational limitations associated with graph operations. In this work we introduce Neural Graphical Models (NGMs), which attempt to represent complex feature dependencies at reasonable computational cost. Given a graph of feature relationships and corresponding samples, we capture the dependency structure between the features along with their complex function representations by using a neural network as a multi-task learning framework. We provide efficient learning, inference and sampling algorithms. NGMs can fit generic graph structures, including directed, undirected and mixed-edge graphs, and support mixed input data types. We present empirical studies that show NGMs' capability to represent Gaussian graphical models, perform inference analysis on lung cancer data and extract insights from real-world infant mortality data provided by the CDC.
Software: https://github.com/harshs27/neural-graphical-models
Keywords: Probabilistic Graphical Models, Deep learning, Learning representations
1 Introduction
Graphical models are a powerful tool to analyze data. They can represent the relationships between features and provide underlying distributions that model functional dependencies between them [20, 15]. Learning, inference and sampling are the operations that make such graphical models useful for domain exploration. Learning, in a broad sense, consists of fitting the distribution function parameters from data. Inference is the procedure of answering queries in the form of conditional distributions with one or more observed variables. Sampling is the ability to draw samples from the underlying distribution defined by the graphical model. One of the common bottlenecks of graphical model representations is the high computational complexity of one or more of these procedures. In particular, various graphical models place restrictions on the set of distributions or the types of variables in the domain. Some graphical models work with continuous variables only (or categorical variables only) or place restrictions
on the graph structure (e.g., that continuous variables cannot be parents of
categorical variables in a DAG). Other restrictions affect the set of distributions
the models are capable of representing, e.g., limiting them to multivariate Gaussians.
For wide adoption of graphical models, the following properties are desired:
- Rich representations of complex underlying distributions.
- Ability to simultaneously handle various input types such as categorical, continuous, images and embedding representations.
- Efficient algorithms for learning, inference and sampling.
- Support for various representations: directed, undirected, mixed-edge graphs.
- Access to the learned underlying distributions for analysis.
In this work, we propose Neural Graphical Models (NGMs) that satisfy the aforementioned desiderata in a computationally efficient way. NGMs accept a feature dependency structure that can be given by an expert or learned from data. The dependency structure may have the form of a graph with clearly defined semantics (e.g., a Bayesian network graph or a Markov network graph) or an adjacency matrix. Note that the graph may be either directed or undirected. Based on this dependency structure, NGMs learn to represent the probability function over the domain using a deep neural network. The parameterization of such a network can be learned from data efficiently, with a loss function that jointly optimizes adherence to the given dependency structure and fit to the data. Probability functions represented by NGMs are free of the common restrictions inherent in other PGMs. They also support efficient inference and sampling.
2 Related works
Probabilistic Graphical Models (PGMs) aim to learn the underlying joint distribution from which the input data is sampled. Often, inducing an independence graph structure between the features helps make learning the distribution computationally feasible. In cases where this independence graph structure is provided by a domain expert, the problem of fitting PGMs reduces to learning distributions over this graph. Alternatively, many methods have traditionally been used to jointly learn the structure as well as the parameters [12, 35, 15, 23], and they have been widely applied to analyze data in many domains [2, 6, 7, 26, 25, 1].
Recently, many interesting deep learning based approaches for DAG recovery have been proposed [43, 44, 16, 42]. These works primarily focus on structure learning, but technically they are learning a Probabilistic Graphical Model. They depend on the existing algorithms developed for Bayesian networks for the inference and sampling tasks. A parallel line of work combining graphical models with deep learning comprises Bayesian deep learning approaches: Variational AutoEncoders, Boltzmann Machines etc. [17, 13, 40]. These deep learning models have significantly more parameters than traditional Bayesian networks, which makes them less suitable for datasets with a small number of samples. Using these deep graphical models for downstream tasks is computationally expensive and often impedes their adoption.
We would be remiss not to mention the technical similarities NGMs have with some recent research works. We found "Learning sparse nonparametric DAGs" [44] to be the closest in terms of representation ability. In one of their versions, they model each independence structure with a different neural network (MLP). However, their criterion for modeling feature independence differs from NGMs': they zero out the weights of the corresponding row in the first layer of the neural network to induce independence between the input and output features. This formulation prevents them from sharing the NNs across different factors. Second, the path norm formulation in [16], which uses the product of NN weights to capture input-to-output connectivity, is similar to NGMs. They used the path norm to parametrize the DAG constraint for continuous optimization, while [33, 34] used it within an unrolled algorithm framework to learn sparse gene regulatory networks.
Methods that model conditional independence graphs [10, 3, 30, 29, 28, 27] are a type of graphical model based on an underlying multivariate Gaussian distribution. Probabilistic Circuits [21], Conditional Random Fields or Markov Networks [36] and some other PGM formulations like [39, 38, 41, 19] are popular. These PGMs often make simplifying assumptions about the underlying distributions and place restrictions on the accepted input data types. Real-world input data often consist of mixed datatypes (real, categorical, text, images etc.) and are challenging for existing graphical model formulations to handle.
3 Neural Graphical Models
We propose a new Probabilistic Graphical Model type, called Neural Graphical Models (NGMs), and describe the associated learning, inference and sampling algorithms. Our model accepts all input types and avoids placing any restrictions on the form of underlying distributions.
Problem setting: We consider input data $X$ that has $M$ samples, with each sample consisting of $D$ features. An example is gene expression data, where we have a matrix of microarray expression values (samples) and genes (features). In the medical domain, we can have a mix of continuous and categorical data describing a patient's health. We are also provided a graph $G$, which can be directed, undirected or have mixed-edge types, and which represents our belief about the feature dependency relationships (in a probabilistic sense). Such graphs are often provided by experts and include inductive biases and domain knowledge about the underlying system functions. In cases where the graph is not provided, we make use of state-of-the-art algorithms to recover DAGs or CI graphs (see Sec. 2). The NGM input is the tuple $(X, G)$.
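To make this input format concrete, the following is a minimal sketch (assuming Python with numpy and pandas; the feature names and values are illustrative only, not taken from the paper's datasets) of how the tuple $(X, G)$ could be assembled:

import numpy as np
import pandas as pd

# X: M samples x D features; mixed datatypes are allowed (continuous + categorical here).
X = pd.DataFrame({
    "age":        [63, 41, 57, 70],            # continuous
    "smoker":     ["yes", "no", "no", "yes"],   # categorical
    "tumor_size": [2.1, 0.0, 1.4, 3.8],         # continuous
})

# G: D x D adjacency matrix over the same features, provided by an expert or
# recovered by a structure-learning algorithm (DAG or CI-graph recovery, Sec. 2).
features = list(X.columns)
G = pd.DataFrame(np.array([[0, 1, 1],
                           [1, 0, 1],
                           [1, 1, 0]]),
                 index=features, columns=features)   # symmetric => undirected

ngm_input = (X, G)   # the (X, G) tuple consumed by NGM learning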
3.1 Representation
Fig. 1 shows a sample recovered graph and how we view the value of each feature as a function of the values of its neighbors. For directed graphs, each feature's value is represented as a function of its Markov blanket in the graph.
Fig. 1: Graphical view of NGMs: The input graph $G$ (undirected) for given input data $X \in \mathbb{R}^{M \times D}$. Each feature $x_i = f_i(\mathrm{Nbrs}(x_i))$ is a function of the neighboring features. For a DAG, the functions between features are defined by the Markov blanket relationship $x_i = f_i(\mathrm{MB}(x_i))$. The adjacency matrix (right) represents the associated dependency structures.
We use the graph $G$ to understand the domain's dependency structure, but ignore any potential parametrization associated with it.
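For illustration, the neighbor set $\mathrm{Nbrs}(x_i)$ and the Markov blanket $\mathrm{MB}(x_i)$ can be read off directly from an adjacency matrix. The sketch below uses Python with numpy and our own helper names, not the authors' code:

import numpy as np

def neighbors(adj, i):
    """Nbrs(x_i) for an undirected graph, given a symmetric 0/1 adjacency matrix."""
    return set(np.flatnonzero(adj[i]).tolist())

def markov_blanket(dag_adj, i):
    """MB(x_i) = parents + children + co-parents of children, for a DAG where
    dag_adj[p, c] = 1 encodes an edge p -> c."""
    parents  = set(np.flatnonzero(dag_adj[:, i]).tolist())
    children = set(np.flatnonzero(dag_adj[i, :]).tolist())
    co_parents = set()
    for c in children:
        co_parents |= set(np.flatnonzero(dag_adj[:, c]).tolist())
    return (parents | children | co_parents) - {i}

# Undirected chain x0 - x1 - x2:
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
print(neighbors(A, 1))        # {0, 2}

# DAG x0 -> x2 <- x1:
B = np.array([[0, 0, 1],
              [0, 0, 1],
              [0, 0, 0]])
print(markov_blanket(B, 0))   # {1, 2}: child x2 and co-parent x1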
We introduce a neural view, which is another way of representing $G$, as shown in Fig. 2. The neural networks used are multi-layer perceptrons with appropriate input and output dimensions that represent graph connections in NGMs. We denote a NN with $L$ layers, weights $W = \{W_1, W_2, \cdots, W_L\}$ and biases $B = \{b_1, b_2, \cdots, b_L\}$ as $f_{W,B}(\cdot)$, with the non-linearity not mentioned explicitly.
We experimented with multiple non-linearities and found that ReLU fits well with our framework. Applying the NN to the input $X$ evaluates the following mathematical expression: $f_{W,B}(X) = \mathrm{ReLU}(W_L \cdot (\cdots (W_2 \cdot \mathrm{ReLU}(W_1 \cdot X + b_1) + b_2) \cdots) + b_L)$. The dimensions of the weights and biases are chosen such that the numbers of neural network input and output units are both equal to $D$, with the hidden layer dimension $H$ remaining a design choice. In experiments, we start with $H = 2D$ and subsequently adjust the dimensions based on the validation loss. The product of the weights of the neural network, $S_{nn} = \prod_{l=1}^{L} |W_l| = |W_1| \times |W_2| \times \cdots \times |W_L|$, where $|W|$ computes the absolute value of each element in $W$, gives us the path dependencies between the input and the output units. For shorthand, we denote $S_{nn} = \Pi_i |W_i|$. If $S_{nn}[x_i, x_o] = 0$, then the output unit $x_o$ is independent of the input unit $x_i$. Increasing the number of layers and the hidden dimensions of the NNs provides richer dependency function complexity.
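To make the path-product check concrete, here is a small sketch (assuming PyTorch; the two-layer architecture and dimensions are illustrative choices, not the released implementation) that builds an MLP with $D$ input/output units and hidden size $H = 2D$, and computes $S_{nn} = \Pi_i |W_i|$ to test input-output independence:

import torch
import torch.nn as nn

D, H = 5, 10                       # D features, hidden dimension H = 2*D
mlp = nn.Sequential(
    nn.Linear(D, H), nn.ReLU(),    # W1, b1
    nn.Linear(H, D),               # W2, b2
)

def path_product(model):
    """S_nn = |W1| x |W2| x ... x |WL|. S_nn[i, o] == 0 means no path connects
    input unit x_i to output unit x_o, i.e. x_o is independent of x_i."""
    S = None
    for layer in model:
        if isinstance(layer, nn.Linear):
            W = layer.weight.detach().abs()    # nn.Linear stores shape (out, in)
            S = W if S is None else W @ S
    return S.T                                 # index as S_nn[input, output]

S_nn = path_product(mlp)
print(S_nn.shape)                  # torch.Size([5, 5])
print(bool(S_nn[0, 3] == 0))       # True only if x_3 is independent of x_0
# After training with the structure loss of Sec. 3.2, thresholding S_nn
# (e.g. S_nn > tau) recovers the learned dependency structure.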
Initially, the NN is fully connected. Some of the connections will be dropped
during training, as the associated weights are zeroed out. We can view the
resulting NN as a glass-box model (indicating transparency), since we can discover
functional dependencies by analyzing paths from input to output.
3.2 Learning
Using the rich and compact functional representation achieved by the neural view, the learning task is to fit the neural networks to achieve the desired dependency structure $S$ (encoded by the input graph $G$), along with fitting the regression to the input data $X$.
Fig. 2: Neural view of NGMs: NN as a multitask learning architecture capturing non-linear dependencies for the features of the undirected graph in Fig. 1. If there is a path from an input feature to an output feature, that indicates a dependency between them. The dependency matrix between the input and output of the NN reduces to the matrix product operation $S_{nn} = \Pi_i |W_i| = |W_1| \times |W_2|$. Note that not all the zeroed-out weights of the MLP (in black dashed lines) are shown, for the sake of clarity.
Given the input data $X$, we want to learn the functions described by the NGM graphical view (Fig. 1). These can be obtained by solving the multiple regression problems shown in the neural view (Fig. 2). We achieve this by using the neural view as a multi-task learning framework. The goal is to find the set of parameters $W$ that minimizes the loss expressed as the distance from $X^k$ to $f_W(X^k)$ (averaged over all samples $k$), while maintaining the dependency structure provided in the input graph $G$. We can define the regression operation as follows:
$$\arg\min_{W,B} \; \sum_{k=1}^{M} \left\lVert X^k - f_{W,B}(X^k) \right\rVert_2^2 \quad \text{s.t.} \quad \left( \Pi_{i=1}^{L} |W_i| \right) \odot S^c = 0 \qquad (1)$$
where we introduced a soft graph constraint. Here, $S^c$ represents the complement of the matrix $S$, which essentially replaces 0 by 1 and vice-versa. $A \odot B$ denotes the Hadamard operator, which performs an element-wise matrix multiplication between the same-dimension matrices $A, B$. Including the constraint as a Lagrangian term with an $\ell_1$ penalty and a constant $\lambda$ that acts as a tradeoff between fitting the regression and matching the graph dependency structure, we get the following optimization formulation:
$$\arg\min_{W,B} \; \sum_{k=1}^{M} \left\lVert X^k - f_{W,B}(X^k) \right\rVert_2^2 + \lambda \log \left\lVert \left( \Pi_{i=1}^{L} |W_i| \right) \odot S^c \right\rVert_1 \qquad (2)$$
In our implementation, the individual weights are normalized using the $\ell_2$-norm before taking the product. We normalize the regression loss and the structure loss terms and apply appropriate scaling to the input data features.
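A minimal sketch of the resulting training objective (assuming PyTorch; the exact normalization granularity, the $\lambda$ value and the optimizer settings are our assumptions, not the released training code) could look as follows:

import torch
import torch.nn as nn

def structure_penalty(model, S_complement, eps=1e-8):
    """|| (prod_l |W_l|) Hadamard S^c ||_1, with each |W_l| l2-normalized
    before taking the product (as described above)."""
    prod = None
    for layer in model:
        if isinstance(layer, nn.Linear):
            W = layer.weight.abs()
            W = W / (W.norm(p=2) + eps)           # l2-normalize the weight matrix
            prod = W if prod is None else W @ prod
    return (prod.T * S_complement).sum()           # both factors are non-negative => l1

def ngm_loss(model, X, S_complement, lam):
    """Regression term of Eq. (2) plus lambda * log of the soft-graph penalty."""
    mse = ((X - model(X)) ** 2).sum(dim=1).mean()
    return mse + lam * torch.log(structure_penalty(model, S_complement) + 1e-8)

# Usage sketch, with mlp and S_complement (D x D, 1 exactly where G forbids a
# dependency) defined elsewhere, and X an M x D float tensor:
# optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-3)
# loss = ngm_loss(mlp, X, S_complement, lam=0.1)
# optimizer.zero_grad(); loss.backward(); optimizer.step()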
Proximal Initialization strategy: To get a good initialization for the NN parameters $W$ and $\lambda$, we implement the following procedure. We solve