
Orthogonal Nonnegative Matrix Factorization with Sparsity Constraints
Salar Basiri1, Alisina Bayati1, Srinivasa M. Salapaka1

1University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA; Emails: {sbasiri2, abayati2, salapaka}@illinois.edu. We acknowledge the support of the National Aeronautics and Space Administration under Grant NASA 80NSSC22M0070 for this work.
Abstract— This article presents a novel approach to solving the sparsity-constrained Orthogonal Nonnegative Matrix Factorization (SCONMF) problem, which requires decomposing a nonnegative data matrix into the product of two lower-rank nonnegative matrices, $X = WH$, where the mixing matrix $H$ has orthogonal rows ($HH^\top = I$) while also satisfying an upper bound on the number of nonzero elements in each row. By reformulating SCONMF as a capacity-constrained facility-location problem (CCFLP), the proposed method naturally integrates nonnegativity, orthogonality, and sparsity constraints. Specifically, our approach combines a control-barrier-function (CBF) based framework, used for dynamic optimal control design problems, with a maximum-entropy-principle-based framework, used for facility-location problems, to enforce these constraints while ensuring robust factorization. Additionally, this work introduces a quantitative approach for determining the true rank of $W$ or $H$, equivalent to the number of true features, a critical aspect in ONMF applications where the number of features is unknown. Simulations on various datasets demonstrate significantly improved factorizations with low reconstruction errors (up to 150 times smaller than those of existing methods) while strictly satisfying all constraints, outperforming methods that struggle to balance accuracy and constraint adherence.
Index Terms— Pattern Recognition, Learning, Optimization
I. INTRODUCTION
In various machine learning, data science, and signal
processing applications involving large datasets, identifying
common features across data members and quantifying their
weights is critical. For instance, in compressing facial image
collections, it is essential to identify typical facial features and
the extent of their occurrence in each face. Since data in fields
like computer vision and bioinformatics is often nonnegative,
Nonnegative Matrix Factorization (NMF) is a powerful tool
for such tasks. NMF decomposes a nonnegative matrix into
two lower-rank nonnegative matrices: one capturing key
features and the other quantifying their contributions. The
nonnegativity constraint enhances interpretability, making
NMF especially effective for inherently nonnegative data
types, including images [1], [2], audio recordings [3], [4], biomedical data [5], [6], [7], spectrometry representations [8], and other data types [9]. Unlike Singular Value Decomposition
(SVD) and Principal Component Analysis (PCA), which allow
negative values and rely on orthogonality, NMF produces part-
based, interpretable representations. This makes it particularly
useful for applications where negative components lack real-
world meaning.
Given a large nonnegative data matrix $X \in \mathbb{R}^{d \times n}_{+}$, NMF seeks to approximate it as the product of two low-rank nonnegative matrices, $X \approx WH$, where $W \in \mathbb{R}^{d \times k}_{+}$ is the feature matrix and $H \in \mathbb{R}^{k \times n}_{+}$ is the mixing matrix. The rank $k$ (with $k \ll \min(n, d)$) determines the number of features. Each column of $W$ represents a basis feature in the original data space, while each column of $H$ encodes the contribution of these features to the corresponding data point. Specifically, the $\ell$th column of $X$ is approximated as the weighted sum of features $\sum_{s=1}^{k} h_{s\ell} w_s$. The quality of the approximation is typically measured by the Frobenius norm $\|X - WH\|_F$.
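As a concrete illustration of this setup, the short sketch below factorizes a synthetic nonnegative matrix with scikit-learn's generic NMF solver and reports the Frobenius reconstruction error. It is a minimal example of the unconstrained problem only, not the SCONMF method developed in this article; the dimensions and random data are arbitrary assumptions.

```python
# Minimal NMF illustration (generic solver, not this article's method).
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
d, n, k = 50, 200, 5                          # data dimension, samples, rank
X = rng.random((d, k)) @ rng.random((k, n))   # synthetic nonnegative data

model = NMF(n_components=k, init="nndsvd", max_iter=500)
W = model.fit_transform(X)                    # feature matrix, d x k
H = model.components_                         # mixing matrix,  k x n

# Column X[:, l] is approximated by the weighted sum  sum_s H[s, l] * W[:, s].
print("Frobenius error:", np.linalg.norm(X - W @ H, "fro"))
```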
Various additional constraints have been imposed on the matrices $W$ and $H$ in the literature. One important constraint is orthogonality: in the single-orthogonality case, either the rows of $H$ or the columns of $W$ must be orthogonal, while in the more restrictive double-orthogonality case, both must be orthogonal simultaneously. Certain applications, such as those in signal processing and bioinformatics, specifically require orthogonal features [10], [5]. Another widely used constraint is sparsity, which refers to enforcing a large number of zero or near-zero elements in either the feature or mixing matrix. This constraint enhances interpretability and improves computational efficiency [11]. While enforcing
both orthogonality and nonnegativity naturally induces some
level of sparsity, certain applications require a predefined
minimum sparsity level while maintaining these constraints
[12].
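A small numerical sketch of how these two constraints can be checked for a candidate mixing matrix follows; the hand-picked $H$ is an illustrative assumption, not taken from any cited method.

```python
# Checking row orthogonality and per-row l0 sparsity of a mixing matrix H.
import numpy as np

H = np.array([[0.8, 0.6, 0.0, 0.0],          # unit-norm rows with disjoint
              [0.0, 0.0, 1.0, 0.0]])         # supports, hence orthonormal

# Row orthogonality: H H^T should equal the identity matrix.
print(np.linalg.norm(H @ H.T - np.eye(H.shape[0]), "fro"))  # 0.0

# Sparsity: number of nonzero entries in each row.
print(np.count_nonzero(H, axis=1))                          # [2 1]
```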
Various algorithms have been developed to solve the Orthogonal NMF (ONMF) problem, including methods based on iterative projection updates [13], [14], [15], gradient descent [16], and multiplicative gradient projections [17], [18]. Additionally, some approaches frame ONMF as a clustering problem [19] or propose EM-like algorithms [20].
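To convey the flavor of such iterative schemes, the sketch below implements one commonly used Ding-style multiplicative-update variant in the spirit of [17], [18]; the exact update rules differ across those works, this specific form is an assumption, and it is not the algorithm proposed in this article. The $H$ update tends to drive $HH^\top$ toward the identity but does not enforce orthogonality exactly.

```python
# Illustrative ONMF via multiplicative updates (a Ding-style variant).
import numpy as np

def onmf_multiplicative(X, k, iters=500, eps=1e-10, seed=0):
    rng = np.random.default_rng(seed)
    d, n = X.shape
    W = rng.random((d, k))
    H = rng.random((k, n))
    for _ in range(iters):
        # Standard multiplicative update for the feature matrix W.
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
        # Orthogonality-promoting multiplicative update for H.
        H *= np.sqrt((W.T @ X) / ((H @ H.T) @ (W.T @ X) + eps))
    return W, H

X = np.random.default_rng(1).random((30, 100))   # synthetic nonnegative data
W, H = onmf_multiplicative(X, k=4)
print("recon error:", np.linalg.norm(X - W @ H, "fro"))
print("orthogonality gap:", np.linalg.norm(H @ H.T - np.eye(4), "fro"))
```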
In the broader context of NMF, promoting sparsity has led to the incorporation of various penalty functions into the objective function as regularization terms. Most studies focus on $\ell_1$, $\ell_{1/2}$, or mixed $\ell_1/\ell_2$-norm constraints to enforce sparsity [21], [22], [12], while comparatively less attention has been given to the $\ell_0$-"norm" measure [23]. The $\ell_0$-"norm" of a vector $z \in \mathbb{R}^n$ counts its nonzero elements, that is, $\|z\|_{\ell_0} = \sum_{i=1}^{n} \mathbb{I}(z_i \neq 0)$, where $\mathbb{I}$ is the indicator function. While not a proper norm due to its lack of homogeneity, it is widely used in the literature to quantify sparsity.
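The tiny check below, on an arbitrarily chosen vector, evaluates this definition and illustrates the homogeneity failure noted above: scaling $z$ leaves $\|z\|_{\ell_0}$ unchanged.

```python
# The l0 "norm" counts nonzeros; it is invariant to scaling.
import numpy as np

def l0_norm(z, tol=0.0):
    # sum_i I(|z_i| > tol); a tol > 0 treats near-zero entries as zeros
    return int(np.sum(np.abs(z) > tol))

z = np.array([0.0, 3.0, 0.0, -1.5, 2.0])
print(l0_norm(z))       # 3
print(l0_norm(2 * z))   # still 3: ||2z||_0 != 2||z||_0, so not homogeneous
```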
To our knowledge, no existing work addresses the NMF problem while simultaneously enforcing both $\ell_0$-sparsity and orthogonality constraints. Current methods cannot impose distinct sparsity bounds on each feature individually. Moreover, only a few approaches ensure orthogonality, often at the cost of reconstruction accuracy or computational efficiency. Additionally, most existing methods are highly sensitive to the initialization of $W$ and $H$ and typically require the rank $k$ to be fixed in advance.