Orthogonal Nonnegative Matrix Factorization with Sparsity Constraints
Salar Basiri¹, Alisina Bayati¹, Srinivasa M. Salapaka¹
Abstract: This article presents a novel approach to solving the sparsity-constrained Orthogonal Nonnegative Matrix Factorization (SCONMF) problem, which requires decomposing a non-negative data matrix into the product of two lower-rank non-negative matrices, $X = WH$, where the mixing matrix $H$ has orthogonal rows ($HH^\top = I$), while also satisfying an upper bound on the number of nonzero elements in each row. By reformulating SCONMF as a capacity-constrained facility-location problem (CCFLP), the proposed method naturally integrates non-negativity, orthogonality, and sparsity constraints. Specifically, our approach integrates a control-barrier-function (CBF) based framework used for dynamic optimal control design problems with a maximum-entropy-principle-based framework used for facility location problems to enforce these constraints while ensuring robust factorization. Additionally, this work introduces a quantitative approach for determining the true rank of $W$ or $H$, equivalent to the number of true features, a critical aspect in ONMF applications where the number of features is unknown. Simulations on various datasets demonstrate significantly improved factorizations with low reconstruction errors (up to 150 times smaller than those of existing methods) while strictly satisfying all constraints, outperforming existing methods that struggle with balancing accuracy and constraint adherence.
Index Terms: Pattern Recognition, Learning, Optimization
I. INTRODUCTION
In various machine learning, data science, and signal
processing applications involving large datasets, identifying
common features across data members and quantifying their
weights is critical. For instance, in compressing facial image
collections, it is essential to identify typical facial features and
the extent of their occurrence in each face. Since data in fields
like computer vision and bioinformatics is often nonnegative,
Nonnegative Matrix Factorization (NMF) is a powerful tool
for such tasks. NMF decomposes a nonnegative matrix into
two lower-rank nonnegative matrices: one capturing key
features and the other quantifying their contributions. The
nonnegativity constraint enhances interpretability, making
NMF especially effective for inherently nonnegative data
types, including images [1], [2], audio recordings [3], [4], biomedical data [5], [6], [7], spectrometry representations [8], and other data types [9]. Unlike Singular Value Decomposition
(SVD) and Principal Component Analysis (PCA), which allow
negative values and rely on orthogonality, NMF produces part-
based, interpretable representations. This makes it particularly
useful for applications where negative components lack real-
world meaning.
¹University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA; Emails: {sbasiri2, abayati2, salapaka}@illinois.edu. We acknowledge the support of the National Aeronautics and Space Administration under Grant NASA 80NSSC22M0070 for this work.

Given a large nonnegative data matrix $X \in \mathbb{R}^{d \times n}_+$, NMF seeks to approximate it as the product of two low-rank nonnegative matrices, $X \approx WH$, where $W \in \mathbb{R}^{d \times k}_+$ is the feature matrix and $H \in \mathbb{R}^{k \times n}_+$ is the mixing matrix. The rank $k$ (with $k \ll \min(n, d)$) determines the number of features. Each column of $W$ represents a basis feature in the original data space, while each column of $H$ encodes the contribution of these features to the corresponding data point. Specifically, the $\ell$-th column of $X$ is approximated as a weighted sum of features given by $\sum_{s=1}^{k} h_{s\ell} w_s$. The quality of the approximation is typically measured using the Frobenius norm $\|X - WH\|_F$.
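To make the notation concrete, the following minimal NumPy sketch (our own illustration, not code from the paper) constructs a random nonnegative factorization with the shapes above and evaluates the Frobenius reconstruction error $\|X - WH\|_F$.

```python
import numpy as np

# Illustrative sketch: shapes follow X in R^{d x n}_+, W in R^{d x k}_+,
# H in R^{k x n}_+, with k << min(n, d).
d, n, k = 50, 200, 5
rng = np.random.default_rng(0)

W_true = rng.random((d, k))          # nonnegative feature matrix
H_true = rng.random((k, n))          # nonnegative mixing matrix
X = W_true @ H_true                  # noiseless nonnegative data matrix

def frobenius_error(X, W, H):
    """||X - W H||_F, the distance measure used throughout the paper."""
    return np.linalg.norm(X - W @ H, ord="fro")

print(frobenius_error(X, W_true, H_true))   # essentially zero for the generating pair
```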
Various additional constraints have been imposed on the matrices $W$ and $H$ in the literature. One important constraint is orthogonality, where in the single-orthogonality case, either $H$ (rows) or $W$ (columns) must be orthogonal. In the more restrictive double-orthogonality case, both matrices must have orthogonal rows and columns, respectively. Certain applications, such as those in signal processing and bioinformatics, specifically require orthogonal features [10], [5]. Another widely used constraint is sparsity, which refers to enforcing a large number of zero or near-zero elements in either the feature or the mixing matrix. This constraint enhances interpretability and improves computational efficiency [11]. While enforcing both orthogonality and nonnegativity naturally induces some level of sparsity, certain applications require a predefined minimum sparsity level while maintaining these constraints [12].
Various algorithms have been developed to solve the Orthogonal NMF (ONMF) problem, including methods based on iterative projection updates [13], [14], [15], gradient descent [16], and multiplicative gradient projections [17], [18]. Additionally, some approaches frame ONMF as a clustering problem [19] or propose EM-like algorithms [20]. In the broader context of NMF, promoting sparsity has led to the incorporation of various penalty functions into the objective function as regularization terms. Most studies focus on $\ell_1$, $\ell_{1/2}$, or mixed $\ell_1/\ell_2$-norm constraints to enforce sparsity [21], [22], [12], while comparatively less attention has been given to the $\ell_0$-"norm" measure [23]. The $\ell_0$-"norm" of a vector $z \in \mathbb{R}^n$ counts its nonzero elements, that is, $\|z\|_0 = \sum_{i=1}^{n} \mathbb{I}(z_i \neq 0)$, where $\mathbb{I}$ is the indicator function. While not a proper norm due to its lack of homogeneity, it is widely used in the literature to quantify sparsity.
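Numerically, this measure is just a per-row count of nonzero entries; the short sketch below (our own helper, with an optional tolerance for near-zero values) computes $c_j = \|H_{j:}\|_0$ for every row of a mixing matrix, the quantity that the sparsity constraints later bound.

```python
import numpy as np

def l0_per_row(H, tol=0.0):
    """Number of nonzero entries in each row of H, i.e. ||H_{j:}||_0.
    A small tolerance can be used to treat near-zero entries as zero."""
    return np.sum(np.abs(H) > tol, axis=1)

H = np.array([[0.0, 2.0, 0.0, 1.0],
              [3.0, 0.0, 0.0, 0.0]])
print(l0_per_row(H))        # -> [2 1]
```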
To our knowledge, no existing work addresses the NMF problem while simultaneously enforcing both $\ell_0$-sparsity and orthogonality constraints. Current methods cannot impose distinct sparsity bounds for each feature individually. Moreover, only a few approaches ensure orthogonality, often at the cost of reconstruction accuracy or computational efficiency. Additionally, most existing methods are highly sensitive to the initialization of $W$ and $H$ and typically require the number $k$ of features to be fixed a priori. However, computational time, reconstruction error, and their trade-offs are directly influenced by the choice of $k$, making it a crucial yet often restrictive parameter.
In this paper, we propose a mathematical framework for solving the ONMF problem while enforcing an $\ell_0$-"norm" sparsity constraint. Our approach is flexible and accommodates scenarios where the number of features is unknown or needs to be determined adaptively. Our key insight is to reinterpret sparsity-constrained ONMF (SCONMF) as a Capacity-Constrained Facility Location Problem (CCFLP), a class of NP-hard Facility Location Problems. In a CCFLP, each facility has a limited capacity that restricts the number or demand of consumers it can serve. The goal is to determine optimal facility locations and assignments to minimize overall costs, such as transportation distance or operational expenses, while ensuring that no facility exceeds its capacity. SCONMF is a CCFLP in which the columns of $X$ represent consumer locations, the columns of $W$ (feature vectors) represent facility locations, and the mixing matrix $H$ encodes the assignments.
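To ground this analogy, the sketch below (our own illustration; squared Euclidean distance is an assumed choice of transportation cost) builds the facility-location cost matrix induced by this mapping: entry $(j, \ell)$ is the cost of serving consumer $x_\ell$ (a column of $X$) from facility $w_j$ (a column of $W$).

```python
import numpy as np

def facility_cost_matrix(X, W):
    """Cost of serving consumer l (column x_l of X) from facility j (column w_j of W),
    using squared Euclidean distance as an illustrative transportation cost.
    Returns a (k, n) matrix with entry (j, l) = ||x_l - w_j||^2."""
    x_sq = np.sum(X**2, axis=0)                 # (n,) squared norms of data columns
    w_sq = np.sum(W**2, axis=0)                 # (k,) squared norms of feature columns
    return w_sq[:, None] - 2.0 * (W.T @ X) + x_sq[None, :]

# Example: 4 consumers in R^2 served from 2 candidate facility locations;
# facility 0 is cheap for the first two consumers, facility 1 for the last two.
X = np.array([[0.0, 1.0, 4.0, 5.0],
              [0.0, 0.0, 1.0, 1.0]])
W = np.array([[0.5, 4.5],
              [0.0, 1.0]])
print(facility_cost_matrix(X, W))
```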
We leverage a Maximum-Entropy-Principle (MEP) based Deterministic Annealing (DA) algorithm, widely used in data compression and pattern recognition [24], to solve FLPs. DA uses an iterative annealing process where, at each iteration, every consumer is associated with all facilities via a probability distribution. This distribution, along with the facility locations, is obtained by solving a relaxed optimization problem, using the previous iteration's solution as an improved initial guess. In initial iterations, high-entropy distributions ensure equitable associations across facilities, reducing sensitivity to initialization; in later iterations, as the entropy decreases, the distributions harden until each consumer is assigned to a single facility. However, standard DA does not handle the capacity inequality constraints in CCFLPs. To overcome this, we reformulate the static relaxed CCFLP (and thus the SCONMF problem) at each iteration as a constrained dynamic control problem. We demonstrate that solutions to this dynamic problem satisfy the Karush-Kuhn-Tucker (KKT) conditions of the original CCFLP. To solve the constrained dynamic problem efficiently, we employ a Control Barrier Functions (CBF) based framework [25], [26]. This adaptation allows us to compute facility locations and probability distributions at each iteration, thereby extending DA to handle sparsity-constrained ONMF problems.
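For readers unfamiliar with DA, the sketch below shows a single step of the standard, unconstrained MEP-based iteration (Gibbs associations followed by a weighted facility update); it is our own illustration and deliberately omits the capacity handling that this paper adds through the CBF-based dynamic reformulation.

```python
import numpy as np

def da_step(X, W, beta):
    """One plain deterministic-annealing step (no capacity constraints).
    X: (d, n) consumer columns, W: (d, k) facility columns, beta: inverse temperature.
    Small beta -> near-uniform associations; large beta -> hard assignments."""
    # squared-distance cost between every consumer and facility, shape (k, n)
    cost = (np.sum(W**2, axis=0)[:, None]
            - 2.0 * (W.T @ X)
            + np.sum(X**2, axis=0)[None, :])
    cost -= cost.min(axis=0, keepdims=True)               # stabilize the exponentials
    P = np.exp(-beta * cost)
    P /= P.sum(axis=0, keepdims=True)                     # Gibbs associations p(j | l)
    W_new = (X @ P.T) / np.maximum(P.sum(axis=1), 1e-12)  # weighted centroid update
    return W_new, P

# Annealing: call da_step repeatedly while increasing beta geometrically,
# warm-starting each call with the previous facility locations W.
```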
We assert that posing SCONMF as a CCFLP and solving it through the MEP provides remarkable advantages, such as guaranteeing nonnegativity and orthogonality while maintaining invariance to initialization. Furthermore, our method evolves hierarchically, enabling determination of the true number of features. In initial iterations, all $k$ features are identical, but as the iterations proceed, distinct features emerge and the reconstruction error $\|X - WH\|_F^2$ decreases. In [27], we showed (in the FLP context) that beyond a certain $k$, the reduction in reconstruction error for every additional distinct feature becomes relatively negligible. This true number $k$ is determinable from our algorithm and also corresponds to the number of distinct features that persists over the widest range of reconstruction error values [27].
In simulations on synthetic and standard datasets, our
algorithm outperformed other ONMF methods by achieving
state-of-the-art reconstruction error, full orthogonality, and
higher sparsity—all while improving computational efficiency.
For example, on synthetic data, our method achieved a
reconstruction error over 150 times smaller than the average
of other methods, exhibited complete orthogonality, attained
the highest sparsity, and ran the fastest. Additionally, it
delivered up to 175% higher sparsity compared to competing
approaches.
II. PROBLEM FORMULATION
Denote by $\mathcal{D}^k_+$ the set of $k \times k$ diagonal matrices with positive diagonal entries, and by $\mathcal{P}^k$ the set of $k \times k$ permutation matrices (square matrices with exactly one entry of 1 in each row and column, and 0 elsewhere). Define the set of generalized permutation matrices $\mathcal{G}^k_+ := \{DP \mid D \in \mathcal{D}^k_+,\, P \in \mathcal{P}^k\}$. Consider the data matrix $X \in \mathbb{R}^{d \times n}_+$ that we want to approximate by the product of two nonnegative matrices $W \in \mathbb{R}^{d \times k}_+$ and $H \in \mathbb{R}^{k \times n}_+$. The ONMF poses the following optimization problem:
$$\min_{W,\,H}\; D(X, WH) \quad \text{s.t.} \quad HH^\top = I_k; \quad W_{ij}, H_{ij} \geq 0, \tag{1}$$

where $I_k$ is the $k \times k$ identity matrix and $D(X, WH)$ is the distance function (representing the reconstruction error), often chosen to be the squared Frobenius norm $\|X - WH\|_F^2$. The orthogonality constraint may be replaced by $W^\top W = I_k$.
Remark 1: Without the orthogonality constraint $HH^\top = I_k$ in (1), if a solution $(\bar{W}, \bar{H})$ exists, there exists a large number of square matrices $B \in \mathcal{B}$ ($B$ needs to be square to preserve the inner dimension $k$) such that any other pair of the form $(\bar{W}B^{-1}, B\bar{H})$ is also a solution to the problem, as long as the nonnegativity of the factors is preserved ($\mathcal{G}^k_+ \subset \mathcal{B}$). However, this degree of freedom is removed in the presence of the orthogonality constraint $HH^\top = I_k$, and $B$ is restricted to lie in $\mathcal{P}^k$, making the solution unique up to permutation. For a proof, see Proposition 1 in [13].
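The scaling-and-permutation ambiguity described in Remark 1 is easy to verify numerically; the sketch below is our own illustration (not taken from [13]) using a random generalized permutation matrix $B = DP$.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, k = 6, 10, 3
W = rng.random((d, k))
H = rng.random((k, n))
X = W @ H

# Generalized permutation matrix B = D P (positive diagonal times permutation)
D = np.diag(rng.uniform(0.5, 2.0, size=k))
P = np.eye(k)[rng.permutation(k)]
B = D @ P

W2, H2 = W @ np.linalg.inv(B), B @ H      # alternative factor pair (W B^{-1}, B H)
print(np.allclose(X, W2 @ H2))            # True: identical reconstruction
print((W2 >= 0).all(), (H2 >= 0).all())   # True, True: nonnegativity preserved
```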
For ONMF, the constraint $HH^\top = I_k$ is often overly restrictive, as it is sufficient for the rows of $H$ to be orthogonal without requiring unit length. Therefore, a relaxed formulation is commonly used, replacing $HH^\top = I$ with $HH^\top \in \mathcal{D}^k_+$, which typically results in lower reconstruction errors compared to the original ONMF formulation (1).
Moreover, we can introduce sparsity constraints on $H$, requiring the number of nonzero values in each row, denoted by $c_j := \|H_{j:}\|_0$, to be no larger than $\bar{c}_j \in \mathbb{N}$. Therefore, we can formulate the SCONMF problem as follows:

$$\min_{\Theta}\, \min_{W, \hat{H}}\; D(X, W\hat{H}\Theta) \quad \text{s.t.} \quad \hat{H}\hat{H}^\top = I_k,\; \Theta \in \mathcal{D}^n_+, \tag{2}$$
$$\|\hat{H}_{j:}\|_0 \leq \bar{c}_j, \quad W_{ij}, \hat{H}_{ij} \geq 0,$$
where in this formulation $H = \hat{H}\Theta$ and $HH^\top \in \mathcal{D}^k_+$. Note that since $\Theta \in \mathcal{D}^n_+$, we have $\|H_{j:}\|_0 = \|\hat{H}_{j:}\|_0$, and hence the sparsity constraint can equivalently be imposed on $\hat{H}$. In the following theorem, we discuss the uniqueness of the solutions of (2).
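As a quick numerical check of the claim above (our own sketch, not part of the paper's development): right-multiplying $\hat{H}$ by a positive diagonal $\Theta$ only rescales its columns by positive factors, so the row-wise $\ell_0$ counts of $H = \hat{H}\Theta$ and $\hat{H}$ coincide.

```python
import numpy as np

rng = np.random.default_rng(2)
k, n = 3, 8
H_hat = rng.random((k, n)) * (rng.random((k, n)) > 0.6)   # sparse nonnegative H-hat
Theta = np.diag(rng.uniform(0.1, 5.0, size=n))            # Theta in D^n_+
H = H_hat @ Theta                                          # H = H-hat Theta

l0 = lambda M: np.sum(M != 0, axis=1)                      # ||M_{j:}||_0 per row
print(np.array_equal(l0(H), l0(H_hat)))                    # True: zero pattern preserved
```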