Out of Distribution Reasoning by Weakly-Supervised Disentangled Logic Variational Autoencoder

Zahra Rahiminasab
School of Computer Science
and Engineering
Nanyang Technological University
Singapore, Singapore
rahi0004@e.ntu.edu.sg
Michael Yuhas
Energy Research Institute
Nanyang Technological University
Singapore, Singapore
michaelj004@ntu.edu.sg
Arvind Easwaran
School of Computer Science
and Engineering
Nanyang Technological University
Singapore, Singapore
arvinde@ntu.edu.sg
Abstract—Out-of-distribution (OOD) detection, i.e., finding test
samples derived from a different distribution than the training
set, as well as reasoning about such samples (OOD reasoning),
are necessary to ensure the safety of results generated by
machine learning models. Recently there have been promising
results for OOD detection in the latent space of variational
autoencoders (VAEs). However, without disentanglement, VAEs
cannot perform OOD reasoning. Disentanglement ensures a one-
to-many mapping between generative factors of OOD (e.g., rain
in image data) and the latent variables to which they are encoded.
Previous literature has focused on weakly-supervised
disentanglement for simple datasets with known and independent
generative factors. In practice, however, achieving full
disentanglement through weak supervision is impossible for complex datasets,
such as Carla, with unknown and abstract generative factors. As
a result, we propose an OOD reasoning framework that learns
a partially disentangled VAE to reason about complex datasets.
Our framework consists of three steps: partitioning data based
on observed generative factors, training a VAE as a logic tensor
network that satisfies disentanglement rules, and run-time OOD
reasoning. We evaluate our approach on the Carla dataset and
compare the results against three state-of-the-art methods. We
found that our framework outperformed these methods in terms
of disentanglement and end-to-end OOD reasoning.
Index Terms—Out-of-distribution reasoning, Weakly-
supervised disentanglement, Variational autoencoder, Logic
tensor network
I. INTRODUCTION
Since machine learning models are frequently used in
safety-critical applications such as autonomous driving, it
is important to identify whether the results generated by
machine learning models are safe. It has been shown that the
distribution of training and test samples can be different [1]
and as a result, it is important to identify test samples derived
from a different distribution than the training distribution, i.e.,
out of distribution (OOD) samples. In addition, identifying the
reason behind OOD behavior (OOD reasoning) can help to
provide a safe-fail mechanism to prevent or alleviate damage
in safety-critical applications.
(This work is supported by MoE, Singapore, Tier-2 grant number MOE2019-T2-2-040.)
Consider a machine learning (ML) model that controls an
autonomous vehicle (AV). This model receives image data
and outputs steering and throttle set points. If the model’s
training set was gathered in urban environments with no
precipitation, both rural roads and rainy weather would be
OOD. However, the risk mitigation actions for each generative
factor (background, weather) should also be different, and
detecting that a sample is OOD is insufficient to ensure safety.
For example, samples coming from the rural road should return
control to a human driver if the hazards of rural operations
have not been sufficiently analyzed. Conversely, rainy samples
should trigger the switch to a more conservative ML model
to compensate for reduced traction if this poses less risk than
returning control to an unaware human driver.
For identifying OOD samples, different models such as a
variational autoencoder (VAE) [2] can be used. A VAE learns
a data distribution by encoding data in a lower-dimensional
representation (latent space) and regenerating original sam-
ples from encoded representations. In general, there are two
approaches for identifying OOD samples with VAEs. In the
first approach, the sample likelihood is calculated in the
output space of a VAE, and samples with lower likelihood are
identified as OOD samples. However, in [3], it is shown that
OOD samples can get a higher likelihood than in-distribution
samples; therefore, OOD detection in output space is unreli-
able. One solution for this problem is using the latent space of
a VAE rather than output space and comparing distributions
of a test sample and training samples in latent space [4, 5, 6].
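The latent-space approach can be sketched with a simple score: encode a test sample, then measure how far its approximate posterior sits from the training prior. The KL-based score below is a minimal illustration under diagonal-Gaussian assumptions, not the exact detector of [4, 5, 6]; the function names and threshold are ours.

```python
import numpy as np

def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

def is_ood(mu, logvar, threshold):
    """Flag samples whose encoded posterior drifts too far from the prior."""
    return kl_to_standard_normal(mu, logvar) > threshold

# An in-distribution-like encoding (posterior close to the prior) ...
mu_id, logvar_id = np.zeros((1, 4)), np.zeros((1, 4))
# ... versus an encoding pushed far from the prior.
mu_far = np.full((1, 4), 3.0)
```

In practice the threshold would be calibrated on held-out in-distribution data (e.g., a high percentile of training-set scores).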
OOD reasoning focuses on finding the source of OOD
behavior by analyzing one-to-many maps between generative
factors of data and corresponding latent dimensions that en-
code them. Generative factors are specific data characteristics
essential for data reconstruction, such as the rain intensity in
an image. Disentanglement of latent space is the process of
establishing such one-to-many maps between given generative
factors and their corresponding latent dimensions.
arXiv:2210.09959v1 [cs.LG] 18 Oct 2022
However,
without inductive bias in the data or model, learning disen-
tangled latent space for a VAE is theoretically impossible [7].
Therefore, we should train the VAE with a degree of super-
vision on data or apply inductive bias to the model to learn
disentangled latent space.
In complex datasets such as the Carla dataset [8], generative
factors are defined at a more abstract level. In addition, not
all generative factors are known, and the domain of observed
generative factors can be continuous. Therefore, providing
labels based on generative factors for each image can be
expensive. As a result, using match pairing [9] for a complex
dataset is more practical. However, structuring the latent space
of VAE without labels and just based on partitions for more
than one generative factor can be challenging. As shown
in [10], the disentanglement performance can decrease when
changes in other factors affect the learned distribution for a
given factor.
Although theoretically achieving full disentanglement with
weak supervision is possible, in practice, based on the size of
latent space, incomplete knowledge about generative factors,
level of abstraction in defining generative factors, etc., total
disentanglement may not be achieved. For example, if a fixed
number of latent dimensions is selected for a rain generative
factor, the selected dimensions may not capture all the
information about that factor, even though they primarily
encode the rain information. Therefore, a mechanism is needed
to learn partial disentanglement for complex datasets.
Logic tensor networks (LTNs) [11] distill knowledge into the
network weights during training, guided by a set of rules. For
this purpose, the loss function is defined over these rules, and
training optimizes the network parameters to maximize rule
satisfaction (equivalently, to minimize the unsatisfaction of the
loss rule). Since LTN uses
first-order fuzzy logic semantics (real-logic), the rules can be
satisfied partially. Therefore, LTNs’ characteristics make them
suitable for defining partial disentanglement.
Currently, OOD detection and reasoning approaches [4]
try to achieve partial disentanglement through model-based
inductive bias. However, they cannot guarantee the mapping
between generative factors and specific latent dimensions.
Our contribution: To solve the aforementioned issues, we
propose an OOD reasoning framework that consists of three
phases: data partitioning, training OOD reasoners, and run-
time OOD reasoning. Data partitions are formed based on ob-
served values for generative factors, and OOD reasoners (latent
dimensions of a weakly-disentangled VAE) are designed with
match pairing supervision and LTN. Using the LTN version
of a VAE allows us to define disentanglement formally based
on given data partition samples. Inspired by [9] we define the
adaptation and isolation rules for achieving disentanglement.
The adaptation rule ensures that the change in generative
factor values is reflected in the distribution learned for the
corresponding latent dimensions. The isolation rule guarantees
that the change in a given factor is only reflected in its corre-
sponding latent dimensions. Since LTN uses first-order fuzzy
logic semantics (real-logic), adaptation and isolation rules can
be partially satisfied. As a result, the VAE can achieve a
proper level of disentanglement even when latent space size
is small, and some generative factors are not observed during
training. Finally, we use the corresponding dimension for a
given factor to identify OOD samples based on a given factor.
We show the effect of defined constraints on disentanglement
by visualization, and also mutual information [12]. We also
show that our approach achieves an AUROC of 0.98 and 0.89
on the Carla dataset for rain and city reasoners, respectively.
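To make the adaptation and isolation rules concrete, the sketch below phrases them as fuzzy predicates in [0, 1] over the encoded means of a matched pair (two samples differing only in the given factor). This is a minimal formulation for illustration, not the framework's exact rules; the Gaussian-kernel similarity measure and all function names are our assumptions.

```python
import numpy as np

def similarity(a, b):
    """Fuzzy 'distributions unchanged' predicate in [0, 1]; 1.0 means identical."""
    return float(np.exp(-np.mean((a - b) ** 2)))

def adaptation(mu_a, mu_b, dims):
    """A change in the factor should move its designated latent dimensions."""
    return 1.0 - similarity(mu_a[dims], mu_b[dims])

def isolation(mu_a, mu_b, dims):
    """All other latent dimensions should stay (approximately) unchanged."""
    other = [i for i in range(len(mu_a)) if i not in dims]
    return similarity(mu_a[other], mu_b[other])

# Matched pair: only the factor's designated dimension (here, dim 0) differs.
mu_a = np.zeros(4)
mu_b = np.array([3.0, 0.0, 0.0, 0.0])
```

Because both predicates return degrees of truth rather than Booleans, an LTN can trade them off and satisfy them partially, which is exactly what partial disentanglement requires.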
II. BACKGROUND
Our framework builds on three concepts, which we introduce in this section.
Variational autoencoder: A VAE is a machine learning
model formed by two attached neural networks: the encoder
and the decoder. Given an input x, encoder qφ(z|x)with
parameters φmaps the input to latent representation z. The
decoder pθ(x|z)with parameters θregenerates data from z
representation. Equation 1 shows the ELBO loss of a VAE.
loss =Eqφ(z|x)[log pθ(x|z)] KL(qφ(z|x)||p(z)) (1)
The first and second terms are reconstruction loss and
regularization losses, respectively. The reconstruction loss
ensures that the distribution learned for data reflects the main
factors required for data reconstruction. The regularization loss
evaluates the similarity between the learned distribution and
the prior distribution by using KL-divergence between learned
and prior (usually standard Gaussian distribution N(0,1))
distributions [13].
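As a minimal numerical sketch of Equation 1, the snippet below computes the per-sample negative ELBO for a diagonal-Gaussian encoder with a Gaussian (MSE) reconstruction term, together with the reparameterization trick used to sample $z$. The function names and the choice of reconstruction likelihood are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps with eps ~ N(0, I), keeping sampling differentiable."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def negative_elbo(x, x_recon, mu, logvar):
    """-ELBO = reconstruction loss + KL(q_phi(z|x) || N(0, I))."""
    recon = np.sum((x - x_recon) ** 2, axis=-1)  # Gaussian NLL up to constants
    kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)
    return recon + kl
```

Training minimizes this quantity, which is equivalent to maximizing the ELBO of Equation 1.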
Logic tensor network: A logic tensor network (LTN)
uses logical constraints to distill knowledge about data and
a model in model weights during training. The knowledge
is formally described by first-order fuzzy logic named real
logic. Real logic uses a set of functions, predicates, variables,
and constants to form logical terms. These elements, alongside
operators such as negation, conjunction, etc., form the syntax
of real logic. The fuzzy semantic is defined for real logic
so that the rules can be satisfied partially. The operators
are semantically defined based on product real logic. Table I
summarizes the operator definitions; in this table, $a$, $b$,
and $a_1, \ldots, a_n$ are predicates with values in $[0,1]$.
Learning for a logic tensor network is the process of finding
a set of parameters that maximizes the satisfaction of rules,
or minimizes the satisfaction of a loss rule defined in real
logic [11].
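Although Table I is not reproduced here, the product real-logic connectives it refers to can be sketched as follows. The universal quantifier shown uses the p-mean-error aggregator, one common choice in LTN implementations; the table itself may differ, so treat this as an assumption.

```python
import numpy as np

def neg(a):            # negation: ¬a
    return 1.0 - a

def conj(a, b):        # conjunction a ∧ b: product t-norm
    return a * b

def disj(a, b):        # disjunction a ∨ b: probabilistic sum
    return a + b - a * b

def implies(a, b):     # implication a → b: Reichenbach form
    return 1.0 - a + a * b

def forall(truth_values, p=2):
    """Universal quantifier as the p-mean-error aggregator over truth values."""
    v = np.asarray(truth_values, dtype=float)
    return 1.0 - float(np.mean((1.0 - v) ** p)) ** (1.0 / p)
```

Because every connective maps $[0,1]$ inputs to a $[0,1]$ output, a rule built from them can be satisfied to a degree rather than strictly, which is what allows the disentanglement rules to hold partially.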
Weakly supervised disentanglement: Disentanglement is
defined by two concepts: consistency and restrictiveness.
Given a generative factor $s$ encoded in dimensions $i \in I$ of
the latent space, consistency means that changes in the
distributions of dimensions outside the specified set $I$ do not
affect the given factor $s$. Restrictiveness means that the other
factors $s' \in S \setminus \{s\}$ are immune to changes in the
distributions of the specified dimensions ($i \in I$) that encode
the generative factor $s$ [9]. We can use different levels of weak
supervision to attain disentanglement, such as