
Orthogonal Nonnegative Matrix Factorization with Sparsity Constraints
Salar Basiri1, Alisina Bayati1, Srinivasa M. Salapaka1

1University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA; Emails: {sbasiri2, abayati2, salapaka}@illinois.edu. We acknowledge the support of the National Aeronautics and Space Administration under Grant NASA 80NSSC22M0070 for this work.
Abstract— This article presents a novel approach to solving the sparsity-constrained Orthogonal Nonnegative Matrix Factorization (SCONMF) problem, which requires decomposing a nonnegative data matrix into the product of two lower-rank nonnegative matrices, $X = WH$, where the mixing matrix $H$ has orthogonal rows ($HH^\top = I$) while also satisfying an upper bound on the number of nonzero elements in each row. By reformulating SCONMF as a capacity-constrained facility-location problem (CCFLP), the proposed method naturally integrates nonnegativity, orthogonality, and sparsity constraints. Specifically, our approach combines a control-barrier-function (CBF) based framework, used for dynamic optimal control design problems, with a maximum-entropy-principle-based framework, used for facility-location problems, to enforce these constraints while ensuring robust factorization. Additionally, this work introduces a quantitative approach for determining the true rank of $W$ or $H$, equivalent to the number of true features, a critical aspect in ONMF applications where the number of features is unknown. Simulations on various datasets demonstrate significantly improved factorizations with low reconstruction errors (up to 150 times smaller than those of existing methods) while strictly satisfying all constraints, outperforming methods that struggle to balance accuracy and constraint adherence.
Index Terms— Pattern Recognition, Learning, Optimization
I. INTRODUCTION
In various machine learning, data science, and signal
processing applications involving large datasets, identifying
common features across data members and quantifying their
weights is critical. For instance, in compressing facial image
collections, it is essential to identify typical facial features and
the extent of their occurrence in each face. Since data in fields
like computer vision and bioinformatics is often nonnegative,
Nonnegative Matrix Factorization (NMF) is a powerful tool
for such tasks. NMF decomposes a nonnegative matrix into
two lower-rank nonnegative matrices: one capturing key
features and the other quantifying their contributions. The
nonnegativity constraint enhances interpretability, making
NMF especially effective for inherently nonnegative data
types, including images [1], [2], audio recordings [3], [4], biomedical data [5], [6], [7], spectrometry representations [8], and other data types [9]. Unlike Singular Value Decomposition
(SVD) and Principal Component Analysis (PCA), which allow
negative values and rely on orthogonality, NMF produces part-
based, interpretable representations. This makes it particularly
useful for applications where negative components lack real-
world meaning.
Given a large nonnegative data matrix $X \in \mathbb{R}^{d \times n}_{+}$, NMF seeks to approximate it as the product of two low-rank nonnegative matrices, $X \approx WH$, where $W \in \mathbb{R}^{d \times k}_{+}$ is the feature matrix and $H \in \mathbb{R}^{k \times n}_{+}$ is the mixing matrix. The rank $k$ (with $k \ll \min(n, d)$) determines the number of features. Each column of $W$ represents a basis feature in the original data space, while each column of $H$ encodes the contribution of these features to the corresponding data point. Specifically, the $\ell$th column of $X$ is approximated as the weighted sum of features $\sum_{s=1}^{k} h_{s\ell} w_s$. The quality of the approximation is typically measured by the Frobenius norm $\|X - WH\|_F$.
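As a concrete illustration of this setup, the short sketch below factorizes a synthetic nonnegative matrix with scikit-learn's generic NMF solver and reports the Frobenius reconstruction error. It is a minimal example of the unconstrained problem only, not the SCONMF method developed in this article; the dimensions and random data are arbitrary assumptions.

```python
# Minimal NMF illustration (generic solver, not this article's method).
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
d, n, k = 50, 200, 5                          # data dimension, samples, rank
X = rng.random((d, k)) @ rng.random((k, n))   # synthetic nonnegative data

model = NMF(n_components=k, init="nndsvd", max_iter=500)
W = model.fit_transform(X)                    # feature matrix, d x k
H = model.components_                         # mixing matrix,  k x n

# Column X[:, l] is approximated by the weighted sum  sum_s H[s, l] * W[:, s].
print("Frobenius error:", np.linalg.norm(X - W @ H, "fro"))
```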
Various additional constraints have been imposed on the matrices $W$ and $H$ in the literature. One important constraint is orthogonality: in the single-orthogonality case, either the rows of $H$ or the columns of $W$ must be orthogonal, while in the more restrictive double-orthogonality case, both must be orthogonal simultaneously. Certain applications, such as those in signal processing and bioinformatics, specifically require orthogonal features [10], [5]. Another widely used constraint is sparsity, which refers to enforcing a large number of zero or near-zero elements in either the feature or mixing matrix. This constraint enhances interpretability and improves computational efficiency [11]. While enforcing
both orthogonality and nonnegativity naturally induces some
level of sparsity, certain applications require a predefined
minimum sparsity level while maintaining these constraints
[12].
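A small numerical sketch of how these two constraints can be checked for a candidate mixing matrix follows; the hand-picked $H$ is an illustrative assumption, not taken from any cited method.

```python
# Checking row orthogonality and per-row l0 sparsity of a mixing matrix H.
import numpy as np

H = np.array([[0.8, 0.6, 0.0, 0.0],          # unit-norm rows with disjoint
              [0.0, 0.0, 1.0, 0.0]])         # supports, hence orthonormal

# Row orthogonality: H H^T should equal the identity matrix.
print(np.linalg.norm(H @ H.T - np.eye(H.shape[0]), "fro"))  # 0.0

# Sparsity: number of nonzero entries in each row.
print(np.count_nonzero(H, axis=1))                          # [2 1]
```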
Various algorithms have been developed to solve the Orthogonal NMF (ONMF) problem, including methods based on iterative projection updates [13], [14], [15], gradient descent [16], and multiplicative gradient projections [17], [18]. Additionally, some approaches frame ONMF as a clustering problem [19] or propose EM-like algorithms [20].
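To convey the flavor of such iterative schemes, the sketch below implements one commonly used Ding-style multiplicative-update variant in the spirit of [17], [18]; the exact update rules differ across those works, this specific form is an assumption, and it is not the algorithm proposed in this article. The $H$ update tends to drive $HH^\top$ toward the identity but does not enforce orthogonality exactly.

```python
# Illustrative ONMF via multiplicative updates (a Ding-style variant).
import numpy as np

def onmf_multiplicative(X, k, iters=500, eps=1e-10, seed=0):
    rng = np.random.default_rng(seed)
    d, n = X.shape
    W = rng.random((d, k))
    H = rng.random((k, n))
    for _ in range(iters):
        # Standard multiplicative update for the feature matrix W.
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
        # Orthogonality-promoting multiplicative update for H.
        H *= np.sqrt((W.T @ X) / ((H @ H.T) @ (W.T @ X) + eps))
    return W, H

X = np.random.default_rng(1).random((30, 100))   # synthetic nonnegative data
W, H = onmf_multiplicative(X, k=4)
print("recon error:", np.linalg.norm(X - W @ H, "fro"))
print("orthogonality gap:", np.linalg.norm(H @ H.T - np.eye(4), "fro"))
```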
In the broader context of NMF, promoting sparsity has led to the incorporation of various penalty functions into the objective function as regularization terms. Most studies focus on $\ell_1$, $\ell_{1/2}$, or mixed $\ell_1/\ell_2$-norm constraints to enforce sparsity [21], [22], [12], while comparatively less attention has been given to the $\ell_0$-"norm" measure [23]. The $\ell_0$-"norm" of a vector $z \in \mathbb{R}^n$ counts its nonzero elements, that is, $\|z\|_{\ell_0} = \sum_{i=1}^{n} \mathbb{I}(z_i \neq 0)$, where $\mathbb{I}$ is the indicator function. While not a proper norm due to its lack of homogeneity, it is widely used in the literature to quantify sparsity.
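The tiny check below, on an arbitrarily chosen vector, evaluates this definition and illustrates the homogeneity failure noted above: scaling $z$ leaves $\|z\|_{\ell_0}$ unchanged.

```python
# The l0 "norm" counts nonzeros; it is invariant to scaling.
import numpy as np

def l0_norm(z, tol=0.0):
    # sum_i I(|z_i| > tol); a tol > 0 treats near-zero entries as zeros
    return int(np.sum(np.abs(z) > tol))

z = np.array([0.0, 3.0, 0.0, -1.5, 2.0])
print(l0_norm(z))       # 3
print(l0_norm(2 * z))   # still 3: ||2z||_0 != 2||z||_0, so not homogeneous
```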
To our knowledge, no existing work addresses the NMF problem while simultaneously enforcing both $\ell_0$-sparsity and orthogonality constraints. Current methods cannot impose distinct sparsity bounds on each feature individually. Moreover, only a few approaches ensure orthogonality, often at the cost of reconstruction accuracy or computational efficiency. Additionally, most existing methods are highly sensitive to the initialization of $W$ and $H$ and typically require the rank $k$ to be fixed in advance.