BELIEF in Dependence Leveraging Atomic Linearity in Data Bits for Rethinking Generalized Linear Models Benjamin Brown Kai Zhang Xiao-Li Meng

BELIEF in Dependence: Leveraging Atomic Linearity
in Data Bits for Rethinking Generalized Linear Models
Benjamin Brown, Kai Zhang, Xiao-Li Meng
December 5, 2023
Abstract
Two linearly uncorrelated binary variables must also be independent because non-linear dependence
cannot manifest with only two possible states. This inherent linearity is the atom of dependency
constituting any complex form of relationship. Inspired by this observation, we develop a framework called
binary expansion linear effect (BELIEF) for understanding arbitrary relationships with a binary
outcome. Models from the BELIEF framework are easily interpretable because they describe the association
of binary variables in the language of linear models, yielding convenient theoretical insight and striking
Gaussian parallels. With BELIEF, one may study generalized linear models (GLM) through transparent
linear models, providing insight into how the choice of link affects modeling. For example, setting a GLM
interaction coefficient to zero does not necessarily lead to the kind of no-interaction model assumption
as understood under their linear model counterparts. Furthermore, for a binary response, maximum
likelihood estimation for GLMs paradoxically fails under complete separation, when the data are most
discriminative, whereas BELIEF estimation automatically reveals the perfect predictor in the data that
is responsible for complete separation. We explore these phenomena and provide related theoretical
results. We also provide a preliminary empirical demonstration of some theoretical results.
Keywords: binary expansion, distribution-free, multi-resolution models, nonparametric statistics.
Benjamin Brown is a Ph.D. student (E-mail: brownb1@live.unc.edu), Department of Statistics and Operations Research,
University of North Carolina at Chapel Hill, Chapel Hill, NC 27599.
Kai Zhang is an Associate Professor (E-mail: zhangk@email.unc.edu), Department of Statistics and Operations Research,
University of North Carolina at Chapel Hill, Chapel Hill, NC 27599.
Xiao-Li Meng is Whipple V. N. Jones Professor of Statistics (E-mail: meng@stat.harvard.edu), Department of Statistics,
Harvard University, Cambridge, MA 02138.
arXiv:2210.10852v2 [math.ST] 4 Dec 2023
1 Nonparametric Modeling Through Data Bits
1.1 Taking Advantage of an Inherent Linearity
There are two kinds of classical scientists: those who believe their models, and those who model their
belief. As such, misspecification is the fundamental gremlin of statistical modeling: incorrect models tempt
practitioners with mathematical elegance while quietly belying reality. Statistics and data science have long
sought modeling strategies free of unduly restrictive assumptions, spanning from traditional settings (e.g.,
McCullagh and Nelder, 2019; Hastie et al., 2009) to more recent efforts (e.g., Lei et al., 2018; Buja et al.,
2019; Barber, 2020; Gupta et al., 2020; Barber et al., 2021; Li et al., 2022).
This paper reports an effort to derive a general modeling theory using the framework of binary expansion
statistics (BEStat) (Zhang, 2019; Zhang et al., 2021), which leads to multi-resolution linear models for a
binary outcome under a fully nonparametric setting. Without any assumption on their joint distribution,
random variables can be effectively decomposed into data bits. These data bits can be regarded as the
atoms of information from both statistical and computer science perspectives. By constructing models and
formulating inference directly from the bits, this framework provides additional theoretical insight on the
binary world and suggests a new approach for analyzing generalized linear models (GLMs), as well as an
alternative modeling strategy.
To understand complex forms of dependency, we begin by studying the simplest form of dependency: the
dependency between two binary variables. Consider two dependent Rademacher variables $A$ and $B$, which
take values $\pm 1$ with equal probability. Trivially, because $E[B \mid A]$ can only take two states depending on
the value of $A$, it follows that
$$E[B \mid A] = \beta_0 + \beta_1 A, \tag{1.1}$$
which is intrinsically a linear model with slopes $\beta_0, \beta_1 \in \mathbb{R}$. Moreover, because $E[A] = E[B] = 0$, we must have
$\beta_0 = 0$, while $\beta_1 = \mathrm{Cov}(A, B)/\mathrm{Var}(A) = E[AB]$ because $\mathrm{Var}(A) = 1$. Furthermore, since the conditional
distribution $P(B \mid A)$ is determined by its mean given in (1.1), $A$ and $B$ are independent if and only if
$\mathrm{Cov}(A, B) = 0$. The linearity in this atomic case inspires us to think about the possibility of modeling any
form of association through binary variables.
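As a quick numerical check of (1.1), the sketch below simulates two dependent Rademacher variables and compares the least squares fit with the two conditional means. The sample size, the seed, and the value `rho = 0.3` for $E[AB]$ are assumptions chosen only for illustration; they do not come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
rho = 0.3  # assumed value of E[AB], for illustration only

# A is Rademacher; B equals A with probability (1 + rho) / 2,
# which keeps B Rademacher and gives E[AB] = rho.
A = rng.choice([-1, 1], size=n)
agree = rng.random(n) < (1 + rho) / 2
B = np.where(agree, A, -A).astype(float)

# Least squares fit of B on (1, A).
X = np.column_stack([np.ones(n), A])
beta_hat, *_ = np.linalg.lstsq(X, B, rcond=None)

# Empirical conditional means of B at the two states of A.
cond_mean_plus = B[A == 1].mean()
cond_mean_minus = B[A == -1].mean()

print("beta_hat:", beta_hat)
print("E[B | A = +1], E[B | A = -1]:", cond_mean_plus, cond_mean_minus)
```

Because $A$ has only two states, the fitted line passes exactly through both conditional means, so nothing is lost by restricting attention to linear functions of $A$.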
While our above calculations assume fixed symmetric marginals, linearity is nonetheless inherent in
the binary nature of $A$ and $B$, because any non-linearity requires more than two states to reveal. We
emphasize that the key observation of linearity is completely general. It is useful to recognize that this
atomic linearity carries to arbitrarily many binary predictors to construct a saturated linear model of the
conditional probability. Moreover, the slopes, which we term BELIEF coefficients in the context of a binary
probability model, are unique when the second moment matrix of the binary predictors is positive definite
and can be estimated through the least squares algorithm, as we show in Theorem 2.1. Whenever this
uniqueness does not hold, we say that the predictors have a degenerate distribution, and we discuss the
situation in depth in Section 3.2.
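The role of least squares and of the second moment matrix can be illustrated with a small simulation. This is a minimal sketch under assumed values: the joint law of the two predictors, the coefficient vector `beta_true`, the sample size, and the seed are all hypothetical and serve only to show the mechanics.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

# Two dependent binary predictors coded as +1 / -1 (a hypothetical joint law).
A1 = rng.choice([-1, 1], size=n)
A2 = np.where(rng.random(n) < 0.7, A1, -A1)

# Hypothetical coefficients for P(B = 1 | A1, A2) in the saturated linear model.
beta_true = np.array([0.5, 0.2, -0.1, 0.15])
X = np.column_stack([np.ones(n), A1, A2, A1 * A2])
prob = X @ beta_true                 # the four cell probabilities lie in [0.05, 0.75]
B_is_one = (rng.random(n) < prob).astype(float)

# Empirical second moment matrix of the predictors; a strictly positive
# smallest eigenvalue indicates positive definiteness.
second_moment = X.T @ X / n
min_eig = np.linalg.eigvalsh(second_moment).min()

# Least squares estimate of the saturated linear model.
beta_hat, *_ = np.linalg.lstsq(X, B_is_one, rcond=None)
fitted = X @ beta_hat

print("smallest eigenvalue:", min_eig)
print("beta_hat:", beta_hat)
```

In this saturated setting the fitted values coincide with the observed cell proportions, so they stay within [0, 1] without any explicit constraint.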
Besides indicating the strength and direction of the dependence, the slopes from this intrinsic linearity
are also useful in specifying dependence structures in the joint distribution. For example, there is a direct
correspondence between the conditional independence and multiplicative subgroups of binary predictors with
nonzero slopes, as shown in Theorem 2.5. This connection provides a basis for regularization in estimation
and prediction problems, where screening of variables (Fan and Lv, 2008; He et al., 2013; Zhang, 2017) or
interactions (Fan et al., 2015; Thanei et al., 2018) are well understood. Moreover, the boundedness of binary
variables and their slopes (Theorem 2.1) facilitates the application of machine learning and high-dimensional
statistical methods (Bühlmann and van de Geer, 2011).
Unlike the simple linear model in (1.1), a GLM connects slopes with the response by way of a (typically
nonlinear) link function. These slopes are estimated through maximum likelihood estimation (MLE), and
the specific choice of link function in the likelihood dictates how a practitioner interprets model coefficients.
For instance, in the famed logistic model for a binary response, a slope describes the effect of the
corresponding predictor on the log-odds. In contrast, interpretation of BELIEF coefficients occurs directly on
the level of cell probabilities. Well-known in econometrics, a related model called the linear probability
model (LPM) achieves similar interpretability by expressing the conditional response probability directly
as a linear function of the predictors. However, this approach does not consider the binary expansion of
explanatory variables. Indeed, for sufficiently extreme predictor values, the response probability falls
outside [0,1], which is recognized as a substantial drawback compared to GLMs (Angrist and Pischke, 2009;
Wooldridge, 2010). Via binary expansion into data bits, BELIEF achieves linearity while also guaranteeing
valid response probabilities. We study the connections between GLMs and BELIEF in Section 4.1.
In addition, the linearity illustrated in (1.1) is evocative of classical Gaussian linear models. The Gaussian
analogy turns out to be a recurring theme in the study of binary random variables, as the binary world shares
many familiar Gaussian properties, while differing in unintuitive ways. Table 1 summarizes the comparison
of properties between binary and Gaussian variables that we will establish in this article.
Going beyond binary variables, we can use binary expansion to approximate uniform variables to an
arbitrary accuracy, and hence to approximate any (continuous) variable via the probability integral
transformation. This fact is summarized in the following lemma in Zhang et al. (2021).
Lemma 1.1. Let $U = (U_1, U_2, \dots, U_p)$ be a random vector supported within $[-1, 1]^p$. There exists a
sequence of binary random variables $\{A_{j,d}\}$, $j = 1, 2, \dots, p$, $d = 1, 2, \dots, D$, which take only values $-1$ and
$1$, such that $\max_{1 \le j \le p} |U_j - U_{j,D}| \to 0$ almost surely as $D \to \infty$, where $U_{j,D} = \sum_{d=1}^{D} A_{j,d}/2^d$.
Table 1: A comparison of distributional and inferential properties for binary and Gaussian linear models.
Binary  Gaussian  Property
Y       Y         Independence is equivalent to uncorrelation (Theorem 2.1)
Y       Y         Conditional expectation is a linear equation with slopes $\beta$ (Theorem 2.1)
Y       Y         Slopes that are zero relate to conditional independence (Theorem 2.5)
Y       Y         Least squares $\hat{\beta}$ is the MLE (asymp. normal + efficient) (Theorem 3.2)
Y       Y         $\hat{\beta}$ is unbiased whenever it exists (Theorem 3.6)
N       Y         Existence of residuals that are independent of $\hat{\beta}$ in general (supplementary material)
N       Y         KL-divergence is free of marginal information, in a sense described in the supplementary material
Y       N         Slopes are within a compact convex set (Theorem 2.1)
By Lemma 1.1, for any random variable $U$, the first $D$ data bits $\{A_d\}$, $d = 1, \dots, D$, form a filtration to
approximate the distribution of $U$, and $\sigma_D = \sigma(A_1, \dots, A_D)$ is the $\sigma$-field summarizing all information up
to depth $D$ in the binary expansion. Hence $D$ is a resolution level, as in the multi-resolution framework of
Li and Meng (2021). Because of the aforementioned inherent linearity, when $U$ is used as a predictor for a
binary response, there is an intrinsic equation expressing the conditional expectation of the response as a
linear function of the binary variables in $\sigma_D$. We are thus able to approximate the dependency between $U$
and the binary response by extracting the hidden linearity through the binary expansion approach.
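The approximation in Lemma 1.1 can be made concrete with a short sketch. The function `binary_expansion` below is a hypothetical helper, and the greedy sign rule it uses is one simple way to produce bits in $\{-1, 1\}$ with $U_D = \sum_{d=1}^{D} A_d/2^d$; it is an assumption made for illustration and not necessarily the construction used in Zhang et al. (2021).

```python
import numpy as np

def binary_expansion(u, depth):
    """Greedy bits A_d in {-1, +1} with U_D = sum_d A_d / 2**d, for u in [-1, 1]."""
    u = np.atleast_1d(np.asarray(u, dtype=float))
    residual = u.copy()
    bits = np.empty((u.size, depth))
    for d in range(1, depth + 1):
        a = np.where(residual >= 0, 1.0, -1.0)   # sign of what is left to explain
        bits[:, d - 1] = a
        residual = residual - a / 2**d           # |residual| <= 2**(-d) after this step
    return bits, u - residual                    # u - residual equals U_D

rng = np.random.default_rng(1)
u = rng.uniform(-1, 1, size=1000)
for D in (1, 4, 8, 16):
    bits, u_D = binary_expansion(u, D)
    print(D, np.max(np.abs(u - u_D)))
```

Under this rule the worst-case error after $D$ bits is at most $2^{-D}$, which is the sense in which $D$ acts as a resolution level.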
By combining the linear dependency of binary variables and the binary expansion approximation of the
distribution, there is a general distribution-free modeling strategy built upon atomic linearity in data bits,
as we show in Section 4.3. We thus refer to this modeling framework as the binary expansion linear effect
(BELIEF).
1.2 Revisiting GLMs with BELIEF
For the better part of a century, GLMs have been a prevalent tool for modeling binary outcomes, as
summarized in McCullagh and Nelder (2019). Popular methods such as logistic and probit regressions work
well when class probabilities are monotone in the predictors but struggle otherwise. The log-linear model is
another useful GLM for contingency tables, where the linearity is an assumption over log cell probabilities.
To showcase the BELIEF framework as applied to GLMs, we begin with an illustrative example. Let
$A_1$, $A_2$, and $B$ be binary random variables taking values $\pm 1$. A GLM linking $B$ to $A_1, A_2$ is given by
$$P(B = 1 \mid A_1, A_2) = g^{-1}(\gamma_0 + \gamma_1 A_1 + \gamma_2 A_2 + \gamma_{12} A_1 A_2), \tag{1.2}$$
where $g$ is a chosen link function. The familiar expression in (1.2) represents a saturated GLM, in the sense
that all possible predictor interactions are included, as in, for example, Wahba et al. (1995).
Alternatively, without any link function or assumption, one can show
$$P(B = 1 \mid A_1, A_2) = \beta_0 + \beta_1 A_1 + \beta_2 A_2 + \beta_{12} A_1 A_2 \tag{1.3}$$
for some slopes $\beta = (\beta_0, \beta_1, \beta_2, \beta_{12})$. Here, $\beta$ is constrained to $0 \le H_4 \beta \le 1$ (elementwise), where $H_4$ is the $4 \times 4$
Hadamard matrix according to Sylvester's construction. The equation (1.3) expresses the conditional
distribution of $B$ as a multilinear function of the predictors directly. We refer to $\beta$ as the vector of BELIEF
coefficients, in contrast to the GLM coefficients $\gamma$ for link function $g$.
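To see how the coefficients in (1.3) arise from cell probabilities, the following sketch builds the $4 \times 4$ design matrix with columns $1, A_1, A_2, A_1A_2$ and solves for $\beta$. The cell probabilities in `p` are hypothetical values chosen for illustration. The design matrix built here matches Sylvester's $H_4$ up to the ordering of its rows.

```python
import numpy as np
from itertools import product

# The four configurations of (A1, A2), coded as +1 / -1:
# (1, 1), (1, -1), (-1, 1), (-1, -1).
configs = np.array(list(product([1, -1], repeat=2)))
A1, A2 = configs[:, 0], configs[:, 1]

# Design matrix with columns 1, A1, A2, A1*A2. Its columns are orthogonal,
# so H4.T @ H4 = 4 * I and the inverse is H4.T / 4.
H4 = np.column_stack([np.ones(4), A1, A2, A1 * A2])

# Hypothetical cell probabilities P(B = 1 | A1, A2), one per configuration.
p = np.array([0.9, 0.2, 0.4, 0.7])

# Coefficients (beta_0, beta_1, beta_2, beta_12) reproducing p exactly.
beta = H4.T @ p / 4

print("beta:", beta)
print("H4 @ beta:", H4 @ beta)
```

Any vector `p` with entries in [0, 1] yields a valid `beta` this way, and conversely `H4 @ beta` must land in $[0, 1]^4$, which is the elementwise constraint on the coefficients.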
At heart, our main insights for GLMs rely on the comparison between (1.2) and (1.3). Globally, neither
(1.2) nor (1.3) restricts the joint distribution of $(A_1, A_2, B)$. As long as $(A_1, A_2, B)$ takes the maximum of
$2^3 = 8$ possible values with positive probability, then for any joint distribution on $(A_1, A_2, B)$, there exists
exactly one $\gamma$ satisfying (1.2) and exactly one $\beta$ satisfying (1.3), forming a bijection $\beta \leftrightarrow \gamma$.
However, the story changes for connecting individual components of $\gamma$ with those of $\beta$. For example, the
null hypothesis $H_0: \gamma_{12} = 0$ is not equivalent to $H_0: \beta_{12} = 0$, that is, to the statement that there is no interaction effect between
$A_1$ and $A_2$ governing the conditional distribution of $B$. Furthermore, no choice of link function resolves this
issue: the statement $(\gamma_{12} = 0 \Leftrightarrow \beta_{12} = 0)$ implies that $g$ itself is linear, as discussed in Section 4.1.
Indeed, the unintuitive relationship between link functions and interaction terms is known to be problematic
in practice (Berry et al., 2010; Rainey, 2016a).
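A small numerical example makes the mismatch between the two interaction coefficients visible. The sketch assumes the logistic link and uses hypothetical coefficient vectors; it converts between $\gamma$ and $\beta$ through the four cell probabilities.

```python
import numpy as np
from itertools import product

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    return np.log(p / (1.0 - p))

configs = np.array(list(product([1, -1], repeat=2)))
A1, A2 = configs[:, 0], configs[:, 1]
X = np.column_stack([np.ones(4), A1, A2, A1 * A2])   # X.T @ X = 4 * I

# Direction 1: logistic model with gamma_12 = 0 (hypothetical coefficients).
gamma = np.array([1.0, 1.0, 1.0, 0.0])
p_from_gamma = expit(X @ gamma)
beta_from_gamma = X.T @ p_from_gamma / 4

# Direction 2: linear probability structure with beta_12 = 0 (hypothetical).
beta = np.array([0.6, 0.2, 0.1, 0.0])
p_from_beta = X @ beta                               # cell probabilities 0.9, 0.7, 0.5, 0.3
gamma_from_beta = X.T @ logit(p_from_beta) / 4

print("beta_12 when gamma_12 = 0:", beta_from_gamma[3])
print("gamma_12 when beta_12 = 0:", gamma_from_beta[3])
```

In both directions the interaction coefficient that was set to zero in one parameterization is nonzero in the other, even though each pair of vectors describes the same conditional distribution.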
Such differences between $\beta$ and $\gamma$ can go only so far, as both vectors must respect probabilistic properties
that are invariant to the model representation. For example, $\beta_1 = \beta_2 = 0$ if and only if $\gamma_1 = \gamma_2 = 0$, because
both conditions are equivalent to $B \perp (A_1, A_2) \mid A_1 A_2$. In Section 2, we will show in Theorem 2.5 that these
essential similarities can be captured by a group structure on the predictors.
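The invariance just described can also be checked numerically. In this sketch, which again assumes the logistic link and a hypothetical coefficient vector, the cell probabilities depend on $(A_1, A_2)$ only through the product $A_1A_2$, and the corresponding main effects vanish in both parameterizations.

```python
import numpy as np
from itertools import product

def logit(p):
    return np.log(p / (1.0 - p))

configs = np.array(list(product([1, -1], repeat=2)))
A1, A2 = configs[:, 0], configs[:, 1]
X = np.column_stack([np.ones(4), A1, A2, A1 * A2])

beta = np.array([0.5, 0.0, 0.0, 0.3])   # hypothetical, with beta_1 = beta_2 = 0
p = X @ beta                             # depends on (A1, A2) only through A1 * A2
gamma = X.T @ logit(p) / 4               # logistic coefficients for the same distribution

print("p:", p)
print("gamma:", gamma)
```

Because the link is applied cell by cell, any function of the cell probabilities inherits their dependence on $A_1A_2$ alone, which is why the zero pattern in the main effects carries over.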
We would like to remark that binary variables are often coded as 0/1 in traditional GLM settings. Here,
we adopt the Rademacher $-1/1$ coding for mathematical convenience. With this arithmetically symmetric coding,
multiplication of two binary variables corresponds to a group operation on the set of binary variables and their
interactions, which we relate to statistical modeling properties. Moreover, this coding does not sacrifice any
generality in the breadth of binary models that may be represented. For general statistical interpretations,
the choice of coding will be content dependent; see Cox (1972) and McCullagh (2000).
Aside from its theoretical value as a tool for model comparison, BELIEF suggests a modeling strategy
in its own right. Since the conditional expectation of a binary outcome determines its distribution, BELIEF
enjoys many theoretically optimal properties by directly estimating this crucial quantity with least squares.