BELIEF in Dependence: Leveraging Atomic Linearity in Data Bits for Rethinking Generalized Linear Models

Benjamin Brown∗, Kai Zhang†, Xiao-Li Meng‡

December 5, 2023

Abstract

Two linearly uncorrelated binary variables must also be independent because non-linear dependence cannot manifest with only two possible states. This inherent linearity is the atom of dependency constituting any complex form of relationship. Inspired by this observation, we develop a framework called binary expansion linear effect (BELIEF) for understanding arbitrary relationships with a binary outcome. Models from the BELIEF framework are easily interpretable because they describe the association of binary variables in the language of linear models, yielding convenient theoretical insight and striking Gaussian parallels. With BELIEF, one may study generalized linear models (GLMs) through transparent linear models, providing insight into how the choice of link affects modeling. For example, setting a GLM interaction coefficient to zero does not necessarily lead to the kind of no-interaction assumption understood under its linear model counterpart. Furthermore, for a binary response, maximum likelihood estimation for GLMs paradoxically fails under complete separation, when the data are most discriminative, whereas BELIEF estimation automatically reveals the perfect predictor in the data that is responsible for complete separation. We explore these phenomena and provide related theoretical results. We also provide a preliminary empirical demonstration of some theoretical results.

Keywords: binary expansion, distribution-free, multi-resolution models, nonparametric statistics.

∗Benjamin Brown is a Ph.D. student (E-mail: brownb1@live.unc.edu), Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599.

†Kai Zhang is an Associate Professor (E-mail: zhangk@email.unc.edu), Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599.

‡Xiao-Li Meng is Whipple V. N. Jones Professor of Statistics (E-mail: meng@stat.harvard.edu), Department of Statistics, Harvard University, Cambridge, MA 02138.

arXiv:2210.10852v2 [math.ST] 4 Dec 2023

1 Nonparametric Modeling Through Data Bits

1.1 Taking Advantage of an Inherent Linearity

There are two kinds of classical scientists: those who believe their models, and those who model their belief. As such, misspecification is the fundamental gremlin of statistical modeling: incorrect models tempt practitioners with mathematical elegance while quietly belying reality. Statistics and data science have long sought modeling strategies free of unduly restrictive assumptions, spanning from traditional settings (e.g., McCullagh and Nelder, 2019; Hastie et al., 2009) to more recent efforts (e.g., Lei et al., 2018; Buja et al., 2019; Barber, 2020; Gupta et al., 2020; Barber et al., 2021; Li et al., 2022).

This paper reports an effort to derive a general modeling theory using the framework of binary expansion statistics (BEStat) (Zhang, 2019; Zhang et al., 2021), which leads to multi-resolution linear models for a binary outcome under a fully nonparametric setting. Without any assumption on their joint distribution, random variables can be effectively decomposed into data bits. These data bits can be regarded as the atoms of information from both statistical and computer science perspectives. By constructing models and formulating inference directly from the bits, this framework provides additional theoretical insight into the binary world and suggests both a new approach for analyzing generalized linear models (GLMs) and an alternative modeling strategy.

To understand complex forms of dependency, we begin by studying the simplest form of dependency: the dependency between two binary variables. Consider two dependent Rademacher variables $A$ and $B$, which take values $\pm 1$ with equal probability. Trivially, because $E[B \mid A]$ can take only two values depending on the value of $A$, it follows that

$$E[B \mid A] = \beta_0 + \beta_1 A, \qquad (1.1)$$

which is intrinsically a linear model with slopes $\beta_0, \beta_1 \in \mathbb{R}$. Moreover, because $E[B] = 0$, we must have $\beta_0 = 0$, while $\beta_1 = \mathrm{Cov}(A, B)/\mathrm{Var}(A) = E[AB]$ because $\mathrm{Var}(A) = 1$. Furthermore, since the conditional distribution $P(B \mid A)$ is determined by its mean given in (1.1), $A$ and $B$ are independent if and only if $\mathrm{Cov}(A, B) = 0$. The linearity in this atomic case inspires us to think about the possibility of modeling any form of association through binary variables.
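A minimal numerical check of this atomic case, assuming (for illustration only) the symmetric joint law $P(A = a, B = b) = (1 + \rho ab)/4$ with $|\rho| \le 1$, which gives both variables Rademacher marginals and $E[AB] = \rho$. The helper names are hypothetical. Exact fractions avoid rounding, so the line through the two conditional means, the identity $\beta_1 = E[AB]$, and the independence check all hold with equality.

```python
from fractions import Fraction
from itertools import product

STATES = (-1, 1)


def joint_pmf(rho):
    """Assumed symmetric law: P(A=a, B=b) = (1 + rho*a*b)/4, Rademacher marginals."""
    rho = Fraction(rho)
    return {(a, b): (1 + rho * a * b) / 4 for a, b in product(STATES, repeat=2)}


def marginal(pmf, axis):
    """Marginal pmf of A (axis=0) or B (axis=1)."""
    return {s: sum(p for cell, p in pmf.items() if cell[axis] == s) for s in STATES}


def conditional_mean_line(pmf):
    """Return (beta0, beta1) with E[B | A=a] = beta0 + beta1 * a.

    A has only two states, so its two conditional means always lie on one line.
    """
    p_a = marginal(pmf, 0)
    cond = {a: sum(b * pmf[(a, b)] for b in STATES) / p_a[a] for a in STATES}
    return (cond[1] + cond[-1]) / 2, (cond[1] - cond[-1]) / 2


def cross_moment(pmf):
    """E[AB], equal to Cov(A, B) here because E[A] = E[B] = 0."""
    return sum(a * b * p for (a, b), p in pmf.items())


def is_independent(pmf):
    """Check whether the joint pmf factorizes into the product of its marginals."""
    p_a, p_b = marginal(pmf, 0), marginal(pmf, 1)
    return all(pmf[(a, b)] == p_a[a] * p_b[b] for a, b in pmf)


pmf = joint_pmf(Fraction(3, 10))
beta0, beta1 = conditional_mean_line(pmf)
print(beta0, beta1, cross_moment(pmf))  # 0 3/10 3/10
```

Setting `rho` to zero makes the joint pmf factorize, while any nonzero `rho` breaks factorization, mirroring the statement that zero covariance and independence coincide here.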

While our above calculations assume fixed symmetric marginals, linearity is nonetheless inherent in the binary nature of $A$ and $B$, because any non-linearity requires more than two states to reveal itself. We emphasize that this key observation of linearity is completely general. In particular, this atomic linearity carries over to arbitrarily many binary predictors, yielding a saturated linear model of the conditional probability. Moreover, the slopes, which we term BELIEF coefficients in the context of a binary probability model, are unique when the second moment matrix of the binary predictors is positive definite, and they can be estimated by least squares, as we show in Theorem 2.1. Whenever this uniqueness does not hold, we say that the predictors have a degenerate distribution; we discuss this situation in depth in Section 3.2.
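A minimal sketch of such a least squares fit, assuming $p = 3$ Rademacher predictors and a simulated 0/1 response whose success probability is an arbitrary, non-monotone function of the cells (the data-generating table and all helper names are hypothetical). The design has one column per product over a subset of predictors, the empty product being the intercept. When every cell is observed, the second moment matrix is positive definite and the fitted values coincide with the empirical cell proportions.

```python
import itertools

import numpy as np


def interaction_design(A):
    """Columns are products over every subset of the p +/-1 columns of A.

    The empty subset gives the intercept, so there are 2**p columns in total.
    """
    p = A.shape[1]
    subsets = [s for r in range(p + 1) for s in itertools.combinations(range(p), r)]
    cols = [np.prod(A[:, list(s)], axis=1) if s else np.ones(len(A)) for s in subsets]
    return np.column_stack(cols).astype(float), subsets


def fit_belief(A, y):
    """Least squares fit of the saturated linear model for a 0/1 response y."""
    X, subsets = interaction_design(A)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(subsets, beta)), X @ beta


def simulate(n=4000, p=3, seed=0):
    """Hypothetical data: P(y = 1 | cell) is an arbitrary, non-monotone table."""
    rng = np.random.default_rng(seed)
    A = rng.choice([-1, 1], size=(n, p))
    prob = 0.2 + 0.6 * ((A.prod(axis=1) == 1) & (A[:, 0] == 1)) + 0.1 * (A[:, 1] == -1)
    y = (rng.random(n) < prob).astype(float)
    return A, y


A, y = simulate()
coef, fitted = fit_belief(A, y)
X, _ = interaction_design(A)
# Positive definite second moment matrix when every one of the 2**p cells is observed.
print(np.linalg.eigvalsh(X.T @ X / len(y)).min() > 0)
```

Because the saturated fit reproduces cell proportions, its fitted values stay within $[0, 1]$ without any constraint being imposed during estimation.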

Besides indicating the strength and direction of the dependence, the slopes from this intrinsic linearity are also useful in specifying dependence structures in the joint distribution. For example, there is a direct correspondence between conditional independence and multiplicative subgroups of binary predictors with nonzero slopes, as shown in Theorem 2.5. This connection provides a basis for regularization in estimation and prediction problems, where the screening of variables (Fan and Lv, 2008; He et al., 2013; Zhang, 2017) or interactions (Fan et al., 2015; Thanei et al., 2018) is well understood. Moreover, the boundedness of binary variables and their slopes (Theorem 2.1) facilitates the application of machine learning and high-dimensional statistical methods (Bühlmann and van de Geer, 2011).

Unlike the simple linear model in (1.1), a GLM connects the slopes with the response by way of a (typically nonlinear) link function. These slopes are estimated through maximum likelihood estimation (MLE), and the specific choice of link function in the likelihood dictates how a practitioner interprets the model coefficients. For instance, in the famed logistic model for a binary response, a slope describes the effect of the corresponding predictor on the log-odds. In contrast, BELIEF coefficients are interpreted directly on the level of cell probabilities. A related model well known in econometrics, the linear probability model (LPM), achieves similar interpretability by expressing the conditional response probability directly as a linear function of the predictors. However, the LPM does not consider the binary expansion of the explanatory variables, so for sufficiently extreme predictor values the fitted response probability can fall outside $[0, 1]$, which is recognized as a substantial drawback compared to GLMs (Angrist and Pischke, 2009; Wooldridge, 2010). Via binary expansion into data bits, BELIEF achieves linearity while also guaranteeing valid response probabilities. We study the connections between GLMs and BELIEF in Section 4.1.
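A toy comparison of the two linear fits, under assumptions made only for illustration: a predictor $X$ uniform on $[-1, 1]$ (so that the binary expansion applies directly, without a probability integral transformation), a step-shaped success probability, and a BELIEF-style fit on the first three data bits of $X$ produced by a greedy signed-digit expansion. All helper names are hypothetical. The LPM line, fitted on the raw predictor, leaves $[0, 1]$ near the ends of the range, whereas the saturated fit on the data bits returns cell proportions.

```python
import itertools

import numpy as np


def data_bits(u, depth):
    """First `depth` +/-1 digits of u in [-1, 1], so that u ~ sum_d a_d / 2**d."""
    r = np.asarray(u, dtype=float).copy()
    bits = np.empty(r.shape + (depth,), dtype=int)
    for d in range(1, depth + 1):
        a = np.where(r >= 0, 1, -1)
        bits[..., d - 1] = a
        r = r - a / 2**d  # remainder now satisfies |r| <= 2**-d
    return bits


def saturated_design(A):
    """Intercept plus products over every nonempty subset of the +/-1 columns."""
    p = A.shape[1]
    cols = [np.ones(len(A))]
    for r in range(1, p + 1):
        for s in itertools.combinations(range(p), r):
            cols.append(np.prod(A[:, list(s)], axis=1))
    return np.column_stack(cols).astype(float)


def least_squares_fit(X, y):
    """Fitted values of an ordinary least squares regression of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta


rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(-1.0, 1.0, n)
prob = np.where(x > 0, 0.98, 0.02)  # hypothetical step-shaped P(y = 1 | x)
y = (rng.random(n) < prob).astype(float)

lpm_fitted = least_squares_fit(np.column_stack([np.ones(n), x]), y)
belief_fitted = least_squares_fit(saturated_design(data_bits(x, depth=3)), y)

print(lpm_fitted.min(), lpm_fitted.max())        # the LPM line extends below 0 and above 1
print(belief_fitted.min(), belief_fitted.max())  # cell proportions stay within [0, 1]
```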

In addition, the linearity illustrated in (1.1) is evocative of classical Gaussian linear models. The Gaussian analogy turns out to be a recurring theme in the study of binary random variables, as the binary world shares many familiar Gaussian properties while differing in unintuitive ways. Table 1 summarizes the comparison of properties between binary and Gaussian variables that we establish in this article.

Going beyond binary variables, we can use binary expansion to approximate uniform variables to arbitrary accuracy, and hence to approximate any (continuous) variable via the probability integral transformation. This fact is summarized in the following lemma from Zhang et al. (2021).

Lemma 1.1. Let $U = (U_1, U_2, \ldots, U_p)^\top$ be a random vector supported within $[-1, 1]^p$. There exists a sequence of binary random variables $\{A_{j,d}\}$, $j = 1, 2, \ldots, p$, $d = 1, 2, \ldots, D$, which take only the values $-1$ and $1$, such that $\max_{1 \le j \le p} |U_j - U_{j,D}| \to 0$ almost surely as $D \to \infty$, where $U_{j,D} = \sum_{d=1}^{D} A_{j,d}/2^d$.

Table 1: A comparison of distributional and inferential properties for binary and Gaussian linear models.

Binary | Gaussian | Property
Y | Y | Independence is equivalent to zero correlation (Theorem 2.1)
Y | Y | Conditional expectation is a linear equation with slopes $\beta$ (Theorem 2.1)
Y | Y | Slopes that are zero relate to conditional independence (Theorem 2.5)
Y | Y | Least squares $\hat\beta$ is the MLE (asymptotically normal and efficient) (Theorem 3.2)
Y | Y | $\hat\beta$ is unbiased whenever it exists (Theorem 3.6)
N | Y | Existence of residuals that are independent of $\hat\beta$ in general (supplementary material)
N | Y | KL-divergence is free of marginal information, in a sense described in the supplementary material
Y | N | Slopes are within a compact convex set (Theorem 2.1)

By Lemma 1.1, for any random variable $U$, the first $D$ data bits $\{A_d\}$, $d = 1, \ldots, D$, generate a filtration that approximates the distribution of $U$, and $\sigma_D = \sigma(A_1, \ldots, A_D)$ is the $\sigma$-field summarizing all information up to depth $D$ in the binary expansion. Hence $D$ is a resolution level, as in the multi-resolution framework of Li and Meng (2021). Because of the aforementioned inherent linearity, when $U$ is used as a predictor for a binary response, there is an intrinsic equation expressing the conditional expectation of the response as a linear function of the binary variables in $\sigma_D$. We are thus able to approximate the dependency between $U$ and the binary response by extracting the hidden linearity through the binary expansion approach.
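For a single scalar, the expansion in Lemma 1.1 can be sketched with a greedy signed-digit rule: take the sign of the current remainder as the next bit, then subtract its contribution. This particular construction and the function names are assumptions for illustration. Each step halves the worst-case error, so $|u - U_D| \le 2^{-D}$, and the bits at depth $D$ extend those at depth $D - 1$, in line with the nested $\sigma$-fields $\sigma_1 \subseteq \sigma_2 \subseteq \cdots$. The last helper applies the probability integral transformation $u = 2\Phi(x) - 1$ for a standard normal $x$, written with `math.erf`.

```python
import math


def data_bits(u, depth):
    """First `depth` data bits (values -1 or 1) of u in [-1, 1]."""
    if not -1.0 <= u <= 1.0:
        raise ValueError("u must lie in [-1, 1]")
    bits, remainder = [], u
    for d in range(1, depth + 1):
        a = 1 if remainder >= 0 else -1
        bits.append(a)
        remainder -= a / 2**d  # afterwards |remainder| <= 2**-d
    return bits


def from_bits(bits):
    """U_D = sum_{d=1}^{D} A_d / 2**d."""
    return sum(a / 2**d for d, a in enumerate(bits, start=1))


def normal_to_unit_interval(x):
    """Probability integral transformation to [-1, 1]: 2 * Phi(x) - 1 = erf(x / sqrt(2))."""
    return math.erf(x / math.sqrt(2.0))


u = normal_to_unit_interval(0.8)
for depth in (1, 2, 4, 8):
    bits = data_bits(u, depth)
    print(depth, bits, abs(u - from_bits(bits)))
```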

By combining the linear dependency of binary variables with the binary expansion approximation of the distribution, we obtain a general distribution-free modeling strategy built upon atomic linearity in data bits, as we show in Section 4.3. We thus refer to this modeling framework as the binary expansion linear effect (BELIEF).

1.2 Revisiting GLMs with BELIEF

For the better part of a century, GLMs have been a prevalent tool for modeling binary outcomes, as summarized in McCullagh and Nelder (2019). Popular methods such as logistic and probit regression work well when class probabilities are monotone in the predictors but struggle otherwise. The log-linear model is another useful GLM for contingency tables, where linearity is assumed on the scale of the log cell probabilities.

To showcase the BELIEF framework as applied to GLMs, we begin with an illustrative example. Let $A_1$, $A_2$, and $B$ be binary random variables taking values $\pm 1$. A GLM linking $B$ to $(A_1, A_2)$ is given by

$$P(B = 1 \mid A_1, A_2) = g^{-1}(\gamma_0 + \gamma_1 A_1 + \gamma_2 A_2 + \gamma_{12} A_1 A_2), \qquad (1.2)$$

where $g$ is a chosen link function. The familiar expression in (1.2) represents a saturated GLM, in the sense that all possible predictor interactions are included, as in, for example, Wahba et al. (1995).

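Alternatively, without any link function or assumption, one can show that

$$P(B = 1 \mid A_1, A_2) = \beta_0 + \beta_1 A_1 + \beta_2 A_2 + \beta_{12} A_1 A_2 \qquad (1.3)$$

for some slopes $\beta = (\beta_0, \beta_1, \beta_2, \beta_{12})^\top$. Here, $\beta$ is constrained to $\|H_4 \beta\|_\infty \le 1$, where $H_4$ is the $4 \times 4$ Hadamard matrix according to Sylvester's construction: the entries of $H_4 \beta$ are the values of (1.3) at the four cells $(A_1, A_2) \in \{\pm 1\}^2$, and since these values are probabilities, each entry in fact lies in $[0, 1]$. Equation (1.3) directly expresses the conditional distribution of $B$ as a multilinear function of the predictors. We refer to $\beta$ as the vector of BELIEF coefficients, in contrast to the GLM coefficients $\gamma$ for link function $g$.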

At heart, our main insights for GLMs rely on the comparison between (1.2) and (1.3). Globally, neither (1.2) nor (1.3) restricts the joint distribution of $(A_1, A_2, B)$. As long as $(A_1, A_2, B)$ takes all $2^3 = 8$ possible values with positive probability, for any joint distribution on $(A_1, A_2, B)$ there exists exactly one $\gamma$ satisfying (1.2) and exactly one $\beta$ satisfying (1.3), forming a bijection $\beta \leftrightarrow \gamma$.
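A small sketch of the two parameterizations of the saturated $2 \times 2$ case, assuming the logit link for $g$ (chosen for illustration) and ordering the cells $(A_1, A_2)$ as $(1,1), (-1,1), (1,-1), (-1,-1)$. In this order the rows $(1, a_1, a_2, a_1 a_2)$ form Sylvester's $H_4$, so $H_4\beta$ lists the four cell values of (1.3) and $H_4^{-1} = H_4^\top/4$. The cell probabilities and function names are hypothetical. Mapping a probability table to $\beta$ and to $\gamma$ and back recovers the same table, which is the bijection described above.

```python
import math

# Cells (a1, a2) with a1 varying fastest; rows (1, a1, a2, a1*a2) then form Sylvester's H4.
CELLS = [(1, 1), (-1, 1), (1, -1), (-1, -1)]
H4 = [[1, a1, a2, a1 * a2] for a1, a2 in CELLS]


def cell_values(coef):
    """Evaluate c0 + c1*a1 + c2*a2 + c12*a1*a2 at each cell, i.e. H4 @ coef."""
    return [sum(h * c for h, c in zip(row, coef)) for row in H4]


def coefficients(values):
    """Invert cell_values using H4^{-1} = H4^T / 4."""
    return [sum(H4[i][j] * values[i] for i in range(4)) / 4 for j in range(4)]


def logit(p):
    return math.log(p / (1.0 - p))


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def belief_coefficients(probs):
    """beta in (1.3): linear in the cell probabilities."""
    return coefficients(probs)


def logistic_coefficients(probs):
    """gamma in (1.2) with g = logit: linear in the cell log-odds."""
    return coefficients([logit(p) for p in probs])


probs = [0.9, 0.4, 0.3, 0.65]  # hypothetical P(B = 1 | A1, A2) for each cell in CELLS
beta = belief_coefficients(probs)
gamma = logistic_coefficients(probs)
print([round(b, 4) for b in beta])
print([round(g, 4) for g in gamma])
```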

However, the story changes when connecting individual components of $\gamma$ with those of $\beta$. For example, the null hypothesis $H_0: \gamma_{12} = 0$ is not equivalent to $H_0: \beta_{12} = 0$, which states that there is no interaction effect between $A_1$ and $A_2$ governing the conditional distribution of $B$. Furthermore, no choice of link function resolves this issue: the statement ($\gamma_{12} = 0 \iff \beta_{12} = 0$) implies that $g$ itself is linear, as discussed in Section 4.1. Indeed, the unintuitive relationship between link functions and interaction terms is known to be problematic in practice (Berry et al., 2010; Rainey, 2016a).

Such differences between $\beta$ and $\gamma$ can go only so far, as both vectors must respect probabilistic properties that are invariant to the model representation. For example, $\beta_1 = \beta_2 = 0$ if and only if $\gamma_1 = \gamma_2 = 0$, because both conditions are equivalent to $B \perp\!\!\!\perp (A_1, A_2) \mid A_1 A_2$. In Section 2, we show in Theorem 2.5 that these essential similarities can be captured by a group structure on the predictors.
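The contrast between interaction coefficients and the invariant main-effect condition can be checked numerically, assuming the logit link and hypothetical parameter values, with cells $(A_1, A_2)$ ordered $(1,1), (-1,1), (1,-1), (-1,-1)$. A logistic model with $\gamma_{12} = 0$ but $\gamma_0 \neq 0$ yields a nonzero $\beta_{12}$, and a BELIEF table with $\beta_{12} = 0$ yields a nonzero $\gamma_{12}$. In contrast, a table that depends on $(A_1, A_2)$ only through $A_1 A_2$ has $\beta_1 = \beta_2 = 0$ and $\gamma_1 = \gamma_2 = 0$ simultaneously. (Tables symmetric about one half, such as $\gamma_0 = 0$, can make both interaction coefficients vanish together, which is why the example uses nonzero intercepts.)

```python
import math

# Cells (a1, a2); each row (1, a1, a2, a1*a2) evaluates a saturated linear predictor.
CELLS = [(1, 1), (-1, 1), (1, -1), (-1, -1)]
ROWS = [[1, a1, a2, a1 * a2] for a1, a2 in CELLS]


def cell_values(coef):
    return [sum(r * c for r, c in zip(row, coef)) for row in ROWS]


def coefficients(values):
    # ROWS is a Hadamard matrix, so its inverse is its transpose divided by 4.
    return [sum(ROWS[i][j] * values[i] for i in range(4)) / 4 for j in range(4)]


def logit(p):
    return math.log(p / (1.0 - p))


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def beta_from_gamma(gamma):
    """BELIEF coefficients of the table implied by a logistic model (1.2)."""
    return coefficients([expit(t) for t in cell_values(gamma)])


def gamma_from_beta(beta):
    """Logistic coefficients of the table implied by a BELIEF model (1.3)."""
    return coefficients([logit(p) for p in cell_values(beta)])


# gamma_12 = 0 does not give beta_12 = 0, and beta_12 = 0 does not give gamma_12 = 0.
print(beta_from_gamma([1.0, 1.0, 1.0, 0.0])[3])  # about -0.06
print(gamma_from_beta([0.4, 0.2, 0.1, 0.0])[3])  # about -0.126

# A table depending on (A1, A2) only through A1*A2: main effects vanish on both scales.
probs = [0.2 if a1 * a2 == 1 else 0.7 for a1, a2 in CELLS]
print(coefficients(probs)[1:3], coefficients([logit(p) for p in probs])[1:3])  # zero up to rounding
```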

We remark that binary variables are often coded as 0/1 in traditional GLM settings. Here, we adopt the Rademacher $-1/1$ coding for mathematical convenience. With this arithmetically symmetric coding, multiplication of two binary variables corresponds to a group operation on the set of binary variables and their interactions, which we relate to statistical modeling properties. Moreover, this coding does not sacrifice any generality in the breadth of binary models that may be represented. For general statistical interpretations, the choice of coding will be context dependent; see Cox (1972) and McCullagh (2000).
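A short check of the group structure behind this coding, assuming the map $a = (-1)^x$ from a 0/1 bit $x$ to a Rademacher value (one of the two possible sign conventions, chosen here so that products correspond to XOR). Under this map, multiplying $\pm 1$ variables is addition modulo 2 of the 0/1 bits, and because $A_j^2 = 1$, the product of two interactions $\prod_{j \in S} A_j$ and $\prod_{j \in T} A_j$ is the interaction indexed by the symmetric difference of $S$ and $T$. The function names are hypothetical.

```python
from itertools import product


def to_rademacher(x):
    """Map a 0/1 bit to +/-1 via a = (-1)**x (assumed sign convention)."""
    return 1 - 2 * x


def interaction(values, subset):
    """Value of prod_{j in subset} A_j; the empty subset gives the constant 1."""
    out = 1
    for j in subset:
        out *= values[j]
    return out


# Multiplication of +/-1 codes matches XOR of the underlying 0/1 bits.
for x, y in product((0, 1), repeat=2):
    assert to_rademacher(x) * to_rademacher(y) == to_rademacher(x ^ y)

# Interactions form a group under multiplication: S times T is the symmetric difference S ^ T.
p = 3
subsets = [frozenset(j for j in range(p) if (mask >> j) & 1) for mask in range(2**p)]
for values in product((-1, 1), repeat=p):
    for S in subsets:
        for T in subsets:
            assert interaction(values, S) * interaction(values, T) == interaction(values, S ^ T)
print("group structure verified for", len(subsets), "interactions")
```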

Aside from its theoretical value as a tool for model comparison, BELIEF suggests a modeling strategy in its own right. Since the conditional expectation of a binary outcome determines its distribution, BELIEF enjoys many theoretically optimal properties by directly estimating this crucial quantity with least squares.
