Bisparse Blind Deconvolution through Hierarchical Sparse
Recovery
Axel Flinth1, Ingo Roth2,3, and Gerhard Wunder3
1Department of Mathematics and Mathematical Statistics, Umeå University, Sweden
2Quantum Centre, Technology Innovation Institute, Abu Dhabi, UAE
3Dahlem Center for Complex Quantum Systems, Freie Universität Berlin, Germany
November 12, 2024
Abstract
The hierarchical sparsity framework, and in particular the HiHTP algorithm, has been successfully
applied to many relevant communication engineering problems recently, particularly when the signal
space is hierarchically structured. In this paper, the applicability of the HiHTP algorithm for solving
the bi-sparse blind deconvolution problem is studied. The bi-sparse blind deconvolution setting here
consists of recovering $h$ and $b$ from the knowledge of $h \ast (Qb)$, where $Q$ is some linear operator, and both $h$ and $b$ are assumed to be sparse. The approach rests upon lifting the problem to a linear one and then applying HiHTP through the hierarchical sparsity framework. Then, for a Gaussian draw of the random matrix $Q$, it is theoretically shown that an $s$-sparse $h \in \mathbb{K}^\mu$ and a $\sigma$-sparse $b \in \mathbb{K}^n$ can with high probability be recovered when $\mu \gtrsim s\sigma \log(s)^2 \log(\mu) \log(\mu n) + s\sigma \log(n)$.
Keywords: Blind deconvolution, sparsity, restricted isometry property.
1 Introduction
Consider an underdetermined system of linear equations, i.e., $y = Hx$ for an $H \in \mathbb{K}^{m,n}$ with $m \ll n$ ($\mathbb{K}$ denotes either $\mathbb{R}$ or $\mathbb{C}$). Such a system of course does not have a unique solution. However, if one assumes some structure of the ground truth $x_0$, it can still be recovered. This is the compressed sensing paradigm [17]. The original example of such a structure is sparsity, i.e., the assumption that only a few entries $x_0(i)$ of the ground truth are non-zero. In many applications, such as communication engineering [43], there is even more prior knowledge on the sparsity, i.e., the signals are hierarchically sparse.
Definition 1 ([36]). Let $\mu, n, s, \sigma \in \mathbb{N}$. A tensor
$$w = \sum_{k \in [\mu]} e_k \otimes w_k \in \mathbb{K}^\mu \otimes \mathbb{K}^n$$
is $(s, \sigma)$-sparse if
• at most $s$ of the blocks $w_k \in \mathbb{K}^n$ are non-zero,
• each block $w_k$ is $\sigma$-sparse.
Here, $\{e_k\}_{k \in [\mu]}$ denotes the standard basis of $\mathbb{K}^\mu$, with entries $e_k(j) = 1$ for $k = j$ and $e_k(j) = 0$ otherwise. We will also simply refer to $w$ as hierarchically sparse.
In slightly different terms, we can think of a hierarchically sparse signal as a signal subdivided into $\mu$ fixed groups, each of size $n$. Each such group is either zero or $\sigma$-sparse, and in addition, at most $s$ groups are non-zero. See Figure 1 for an illustration.
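To make the definition concrete, the following minimal NumPy sketch (our own illustration; the function is_hisparse is not from any cited work) checks whether a vector, viewed as $\mu$ blocks of size $n$, is $(s, \sigma)$-sparse:

```python
import numpy as np

def is_hisparse(w, mu, n, s, sigma):
    """Check whether w (length mu*n, viewed as mu blocks of size n) is
    (s, sigma)-sparse: at most s non-zero blocks, each sigma-sparse."""
    blocks = w.reshape(mu, n)
    nonzero = [blk for blk in blocks if np.any(blk != 0)]
    return (len(nonzero) <= s
            and all(np.count_nonzero(blk) <= sigma for blk in nonzero))

# A (3, 3)-sparse example in the spirit of Figure 1: five blocks of size
# six, three of which are non-zero with three non-zero entries each.
w = np.zeros((5, 6))
w[0, [0, 2, 5]] = 1.0
w[2, [1, 3, 4]] = 1.0
w[4, [0, 1, 2]] = 1.0
print(is_hisparse(w.ravel(), 5, 6, 3, 3))  # True
```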
Algorithm 1 (HiHTP)
Require: Measurement operator $A : \mathbb{K}^\mu \otimes \mathbb{K}^n \to \mathbb{K}^m$, measurement vector $y$, block column sparsity $(s, \sigma)$
1: $w^0 = 0$
2: repeat
3: Calculate the support $\Omega^{k+1}$ of the best approximation of $w^k + A^*(y - Aw^k)$ in the set of $(s, \sigma)$-sparse vectors.
4: $w^{k+1} = \operatorname{argmin}_{z \in \mathbb{K}^\mu \otimes \mathbb{K}^n} \{\, \|y - Az\| : \operatorname{supp}(z) \subseteq \Omega^{k+1} \,\}$
5: until stopping criterion is met at $\tilde{k} = k$
6: return $(s, \sigma)$-sparse vector $w^{\tilde{k}}$
Figure 1: Two tensors which are both 9-sparse. However, only the left one is hierarchically sparse – it is
(3,3)-sparse.
1.1 HiHTP
To recover a structured signal from linear measurements, so-called hard thresholding pursuit (HTP) algorithms can be applied. In short, HTP alternates between gradient descent steps on a least-squares problem and projections onto the set of structured signals. Originally developed for sparse vectors [15], it was generalized to a considerably more inclusive model, under the moniker of model-based compressed sensing, in [5]. Applying the concept to hierarchically sparse signals yields the hierarchical hard thresholding pursuit, HiHTP (Algorithm 1) [36].
The main feature of HiHTP that distinguishes it from the generic model-based compressed sensing algorithm is that the projection step (line 3 in Algorithm 1) can be performed efficiently: to calculate the best $(s, \sigma)$-sparse approximation of a tensor $w$, one first calculates the best $\sigma$-sparse approximation of each block $w_k$, which simply entails selecting the $\sigma$ largest entries of each block. One then selects the $s$ of these $\sigma$-sparse proxies with the largest $\ell_2$-norms. The computational complexity of such a projection is $O(\mu n)$, which is the same complexity as the projection onto the set of sparse signals. In addition, the first phase of approximations can be highly parallelized, which is not the case for the projection onto sparse signals.
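The following NumPy sketch illustrates this two-stage projection (a minimal sketch under our own naming; np.argpartition could replace the sorts to avoid logarithmic factors):

```python
import numpy as np

def project_hisparse(w, mu, n, s, sigma):
    """Best (s, sigma)-sparse approximation of w, viewed as mu blocks of
    size n: keep the sigma largest-magnitude entries of each block, then
    keep the s surviving blocks with the largest l2-norms."""
    blocks = w.reshape(mu, n).copy()
    for k in range(mu):
        # sigma-sparse proxy: zero all but the sigma largest entries
        blocks[k, np.argsort(np.abs(blocks[k]))[:n - sigma]] = 0.0
    # keep the s proxies with the largest l2-norms, zero the rest
    blocks[np.argsort(np.linalg.norm(blocks, axis=1))[:mu - s]] = 0.0
    return blocks.ravel()
```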
As for any model-based compressed sensing algorithm [5], HiHTP converges to a structured signal under a restricted isometry condition. The standard restricted isometry constants are defined for sparse signals,
$$\delta_s(A) = \sup_{\substack{w\ s\text{-sparse} \\ \|w\| \le 1}} \left|\, \|Aw\|^2 - \|w\|^2 \,\right|,$$
and the hierarchically sparse version is defined similarly:
Definition 2 ([36]). Let $A : \mathbb{K}^\mu \otimes \mathbb{K}^n \to \mathbb{K}^m$. The $(s, \sigma)$-HiRIP constant of $A$ is given by
$$\delta_{(s,\sigma)}(A) = \sup_{\substack{w\ (s,\sigma)\text{-sparse} \\ \|w\| \le 1}} \left|\, \|Aw\|^2 - \|w\|^2 \,\right|.$$
Theorem 1 ([36, Theorem 1]). If $\delta_{3s,2\sigma} < 1/3$, the sequence $w^k$ defined by HiHTP for $y = Aw_0 + e$, with an $(s, \sigma)$-sparse ground truth $w_0$ and error vector $e \in \mathbb{K}^m$, satisfies
$$\|w^k - w_0\| \le \rho^k \|w^0 - w_0\| + \tau \|e\|,$$
with $\rho = \left(2\delta_{3s,2\sigma}/(1 - \delta_{2s,2\sigma}^2)\right)^{1/2} < 1$ and $\tau = 5.15/(1 - \rho)$. In particular, when the measurements $y$ are noise-free, $w^k$ converges towards $w_0$ at a linear rate.
We will, deliberately imprecisely, say that an operator $A$ has the HiRIP if its HiRIP constant is small enough, for large enough values of $s$ and $\sigma$, to ensure the stable and robust recovery of any $(s, \sigma)$-sparse vector, as in Theorem 1.
In a series of papers [36, 35, 14, 12, 43, 40], the authors of this article, among others, have shown that a number of important classes of operators have the HiRIP.
Gaussian operators. Operators $A : \mathbb{K}^\mu \otimes \mathbb{K}^n \to \mathbb{K}^m$ whose matrix representations have i.i.d. Gaussian entries have the HiRIP with high probability when $m \gtrsim s \log(\mu) + s\sigma \log(n)$, where $\gtrsim$ refers to an inequality up to a universal multiplicative constant.
Kronecker products. If $A \in \mathbb{K}^{m_0,\mu}$ and $B \in \mathbb{K}^{m_1,n}$ have $s$- and $\sigma$-RIP constants $\delta_s(A)$ and $\delta_\sigma(B)$, respectively, the Kronecker product $A \otimes B : \mathbb{K}^\mu \otimes \mathbb{K}^n \to \mathbb{K}^{m_0} \otimes \mathbb{K}^{m_1}$ has an $(s, \sigma)$-HiRIP constant smaller than $\delta_s + \delta_\sigma + \delta_s \delta_\sigma$.
Hierarchical operators. A hierarchical measurement operator is an operator $H : \mathbb{K}^\mu \otimes \mathbb{K}^n \to \mathbb{K}^{m_0} \otimes \mathbb{K}^{m_1}$ defined as a mixture of matrices $(B_\ell)_{\ell \in [\mu]}$, $B_\ell \in \mathbb{K}^{m_1,n}$, as follows:
$$H(w)_k = \sum_{\ell \in [\mu]} a_{k\ell} B_\ell w_\ell$$
for some matrix $A \in \mathbb{K}^{m_0,\mu}$; a numerical sketch of such an operator is given after this list. If $A$ has $s$-RIP constant $\delta_s(A)$, and all $B_\ell$ have $\sigma$-RIP constants smaller than $\delta$, the $(s, \sigma)$-HiRIP constant of $H$ is not larger than $\delta_s(A) + \delta + \delta_s(A) \cdot \delta$.
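For concreteness, here is a small NumPy sketch of applying such a hierarchical operator; all dimensions and names are our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, n, m0, m1 = 6, 5, 4, 3

A = rng.standard_normal((m0, mu))        # mixing matrix A in K^{m0 x mu}
Bs = rng.standard_normal((mu, m1, n))    # one B_l in K^{m1 x n} per block

def hierarchical_op(w):
    """H(w)_k = sum_l a_{k,l} B_l w_l for w in K^mu tensor K^n."""
    blocks = w.reshape(mu, n)
    Bw = np.stack([Bs[l] @ blocks[l] for l in range(mu)])  # shape (mu, m1)
    return (A @ Bw).ravel()              # element of K^{m0} tensor K^{m1}

w = rng.standard_normal(mu * n)
print(hierarchical_op(w).shape)          # (12,) = (m0 * m1,)
```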
The purpose of this article is to prove that a new class of operators has the $(s, \sigma)$-HiRIP: namely, blind deconvolution operators.
1.2 Blind deconvolution
The blind deconvolution problem is the problem of determining a filter $h \in \mathbb{K}^\mu$ and a signal $x \in \mathbb{K}^\mu$ from their (circular) convolution, i.e.,
$$\mathbb{K}^\mu \ni y = h \ast x, \qquad y(\ell) = \sum_{k \in [\mu]} h(k)\, x(\ell - k),$$
where the index $\ell - k$ is understood modulo $\mu$.
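Numerically, the circular convolution can be evaluated directly from this definition or, via the convolution theorem, with the FFT. A minimal NumPy self-check, assuming real-valued signals:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 8
h, x = rng.standard_normal(mu), rng.standard_normal(mu)

# y(l) = sum_k h(k) x(l - k mod mu), straight from the definition
y_direct = np.array([sum(h[k] * x[(l - k) % mu] for k in range(mu))
                     for l in range(mu)])
# the same convolution via the convolution theorem
y_fft = np.fft.ifft(np.fft.fft(h) * np.fft.fft(x)).real

print(np.allclose(y_direct, y_fft))  # True
```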
It is standard to lift this bilinear recovery problem to a linear one (a detailed derivation will be presented later). The resulting problem is severely underdetermined – the $\mu$ entries of $y$ will not be enough to uniquely identify the $\mu^2$-dimensional $h \otimes x$. Even the fact that $h \otimes x$ is a rank-one tensor is not enough – additional structural assumptions on $h$ and $x$ are necessary. In this paper, we will assume that $h$ is $s$-sparse, and that $x$ can be written as $x = Qb$ for a known matrix $Q \in \mathbb{K}^{\mu,n}$ and a $\sigma$-sparse vector $b \in \mathbb{K}^n$. Hence, we are confronted with the bi-sparse version of the blind deconvolution problem.
These assumptions can be motivated using the following simple model of mobile communications: Alice and Bob are communicating with each other. Bob would like to send a message $b \in \mathbb{K}^n$ to Alice. To do so, he first linearly encodes his message in a signal $Qb \in \mathbb{K}^\mu$ and then sends it over a wireless channel to Alice. Due to scattering in the environment, Alice will not receive $Qb$, but rather a superposition of several time- and phase-shifted versions of the signal $Qb$ – i.e., she will observe the convolution $y = h \ast (Qb)$ of $Qb$ with a filter $h \in \mathbb{K}^\mu$ modelling the impulse response of the environment. Since the scattered signal typically reaches the receiver antenna along a few distinct paths, it is reasonable to assume that the filter $h$ is sparse [4]. The message $b$ can, on the other hand, be rendered sparse by system design.
Using the lifting approach, the blind deconvolution problem then effectively turns into a recovery problem for the tensor $h \otimes b$. Importantly, if both $h$ and $b$ are sparse, $h \otimes b$ is hierarchically sparse: the blocks of $h \otimes b$ are given by $h(k)b$, $k \in [\mu]$ – they are all $\sigma$-sparse, and only the $s$ values of $k$ for which $h(k) \neq 0$ correspond to non-zero blocks. This makes it natural to apply the HiHTP algorithm to the problem.
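The following NumPy sketch (our own, with illustrative dimensions) verifies this numerically: the bilinear measurement $h \ast (Qb)$ coincides with a fixed linear map applied to $w = h \otimes b$, whose $k$-th block is $h(k)b$:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, n, s, sigma = 16, 12, 2, 3

# s-sparse filter h and sigma-sparse message b
h = np.zeros(mu)
h[rng.choice(mu, s, replace=False)] = rng.standard_normal(s)
b = np.zeros(n)
b[rng.choice(n, sigma, replace=False)] = rng.standard_normal(sigma)
Q = rng.standard_normal((mu, n)) / np.sqrt(mu)

# bilinear measurement y = h * (Qb) (circular convolution via FFT)
y = np.fft.ifft(np.fft.fft(h) * np.fft.fft(Q @ b)).real

# lifted, linear view: w = h tensor b is (s, sigma)-sparse, and
# y(l) = sum_{k,j} Q[(l-k) mod mu, j] * w[k*n + j]
w = np.kron(h, b)
y_lifted = np.array([sum(Q[(l - k) % mu, j] * w[k * n + j]
                         for k in range(mu) for j in range(n))
                     for l in range(mu)])
print(np.allclose(y, y_lifted))  # True
```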
This was done by a subset of the authors in [42], even adding more layers of hierarchy (see Section ??) due to multiple antennas and users, with good empirical success. It was also applied in [40, 41] as a component of a secure communication scheme. In neither of those papers, however, were theoretical guarantees provided. Providing such guarantees is essential in the security context: without them, one cannot, e.g., establish the provable correctness of the scheme in [40, 41].
1.3 The main result
Let us present a somewhat simplified version of the main result of this article.
Main Result. Consider the operator $C$ corresponding to the blind deconvolution measurement $(h, b) \mapsto h \ast (Qb)$, where $Q \in \mathbb{K}^{\mu,n}$ is a Gaussian matrix. If
$$\mu \gtrsim s\sigma \log(s)^2 \log(\mu) \log(\mu n) + s\sigma \log(n),$$
then $C$ with high probability has the $(s, \sigma)$-HiRIP.
Disregarding log-terms, the blind deconvolution operators hence have the HiRIP under the same sample complexity assumptions as the considerably simpler Gaussian ones. Counting the degrees of freedom needed to describe an $(s, \sigma)$-sparse vector, this sample complexity is moreover essentially optimal.
Remark 1. It is important to note that, with regard to the bisparse recovery problem, a sampling complexity of $\mu \gtrsim s\sigma$ measurements is far from optimal. Counting degrees of freedom in the two vectors $h$ and $b$, one would instead hope for $s + \sigma$ measurements to be enough. In other words, $h \otimes b$ has more structure than only being $(s, \sigma)$-sparse, which one intuitively would like to take advantage of. However, if one does not make additional assumptions on $h$ and $b$, a sample complexity of $s\sigma \cdot \operatorname{polylog}(\mu, n)$ is actually the state of the art for feasible algorithms [18]. We will give a detailed explanation of this in Section 2.
1.4 Extensions
A feature of the hierarchical approach to sparse deconvolution problems is that it readily generalizes to more complex settings, e.g., those arising in multi-user communications. In this paper, we also discuss how to solve a combined deconvolution and demixing problem using the hierarchically sparse framework: given $M$ measurements of the form
$$y_q = \sum_{p \in [N]} d_{q,p}\, h_p \ast (Q_p b_p),$$
where $d_{q,p} \in \mathbb{K}$, $q \in [M]$, $p \in [N]$, are 'mixing factors', recover the collection of $N$ filter-signal pairs $(h_p, b_p)$, $p \in [N]$. This problem appears, e.g., when users are separated both through their spatial angles and their multipath delays, as in [43]. Under the assumption that only $S$ of the filter-message pairs are non-zero, we can interpret this again as a hierarchical sparse recovery problem (albeit now with three levels of hierarchy). Using recent results from [14], we show that, under the same assumptions on the individual $Q_p$ as above, HiHTP can recover all signals and all filters from $M \gtrsim S \log(N)$ mixtures.
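A short NumPy sketch of this multi-user forward model; the dimensions, the Gaussian draws, and all names are our own illustrative choices (sparsity of the $h_p$ and $b_p$ is omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(2)
mu, n, N, M = 32, 16, 4, 8

hs = rng.standard_normal((N, mu))                   # filters h_p
bs = rng.standard_normal((N, n))                    # messages b_p
Qs = rng.standard_normal((N, mu, n)) / np.sqrt(mu)  # known code books Q_p
D = rng.standard_normal((M, N))                     # mixing factors d_{q,p}

def circconv(a, c):
    """Circular convolution of two real vectors via the FFT."""
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(c)).real

# M mixed observations y_q = sum_p d_{q,p} h_p * (Q_p b_p)
ys = np.stack([sum(D[q, p] * circconv(hs[p], Qs[p] @ bs[p]) for p in range(N))
               for q in range(M)])
print(ys.shape)  # (8, 32) = (M, mu)
```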
1.5 Outline of the paper
We begin our paper with a detailed literature review of different computational approaches to the blind deconvolution problem in Section 2. In Section 3, we give a formal introduction to our approach and our assumptions, and formally state the main result, Theorem 2; the multi-user setup is also briefly discussed there. Section 4 is devoted in its entirety to proving the main result. In Section 5, we perform a small numerical experiment to validate our theory.
Notation. Most of the notation is either standard or will be introduced at first use. For a vector $v \in \mathbb{K}^n$, we write $v(i)$ for its $i$-th entry. We use $\|\cdot\|$ to denote the Euclidean norm for both vectors and tensors, $\|\cdot\|_F$ for the Frobenius norm of a matrix, and $\|\cdot\|_{2 \to 2}$ for the induced $\ell_2 \to \ell_2$ operator norm of a matrix.
2 Literature Review
In this section, we give a literature review of previous approaches to the blind deconvolution problem in
general, and sparse blind deconvolution in particular. The approaches can broadly be divided into two
categories:
• As was described in the introduction, one can identify the bilinear map $\gamma(h, b) = h \ast Qb$ on $\mathbb{K}^\mu \times \mathbb{K}^n$ with a linear map $C$ on $\mathbb{K}^\mu \otimes \mathbb{K}^n$. One can then minimize
$$K : \mathbb{K}^\mu \otimes \mathbb{K}^n \to \mathbb{R}, \quad w \mapsto K(w) = \mathcal{K}(y, C(w), w),$$
where $\mathcal{K}$ refers to some suitable loss function. This procedure is generally referred to as lifting, and we will therefore refer to this class of methods as lifted methods. HiHTP is within this class.
• One can also directly minimize a function of the form
$$L : \mathbb{K}^\mu \times \mathbb{K}^n \to \mathbb{R}, \quad (h, b) \mapsto L(h, b) = \mathcal{L}(y, h \ast (Qb), h, b),$$
defined in terms of a suitable loss function $\mathcal{L}$. Let us call such methods direct.
We will give a brief overview of the relevant results from the literature about methods from both categories. Throughout the discussion, it is instructive to remember that the optimal sample complexity (i.e., the number of observations needed for injectivity) is given by $\mu + n$ in the dense setting and $s + \sigma$ in the sparse setting [20, 26].
Lifted approaches. As for lifted approaches, it is crucial to understand that the lifted ground truth signal $h_0 \otimes b_0$, or equivalently its matrix form $h_0 b_0^*$, where $b_0^*$ denotes the Hermitian transpose of $b_0$, simultaneously enjoys two structures: it has rank one and it is sparse (in a structured manner). Each of these structures can individually be exploited using convex regularizers – if we take the low-rank property as the crucial one, the nuclear norm is the canonical regularizer [2, 28, 19], whereas if the sparsity is taken as the crucial property, the $\ell_1$-norm [27] or its relative, the $\ell_{1,2}$-norm [13], should be used. While the convex approaches are stable, mathematically pleasing and globally solvable by off-the-shelf algorithms, they are computationally intensive and, thus, often impractical. Furthermore, recovery guarantees are only known for generic subspace models. Under the assumption of known supports, the nuclear norm approach recovers the ground truth when $\mu \gtrsim \max(s, \sigma) \log(\mu)^2$ [28, 19, 9]. However, not assuming known supports, that guarantee degenerates to $\mu \gtrsim \max(\mu, n) \log(\mu)^2$. The sparse models ($\ell_1$ and $\ell_{1,2}$) recover the ground truth with high probability when $\mu \gtrsim s\sigma \log(sn) \log(\mu)^2$, under the assumption that the support of $h$ is known [27, 13]. If the support of $h$ is not known, the guarantee again degenerates, to $\mu \gtrsim \mu\sigma \log(n) \log(\mu)^2$.
There is also a natural non-convex way to solve the lifted problem: gradient descent projected onto the set of (bi-)sparse and low-rank matrices. As is thoroughly discussed in [16], there is, however, no practical algorithm available to compute this projection. The bi-sparse unit-rank projection is known as the sparse PCA problem, which has been shown to be NP-hard in the worst case, as well as under approximation and on average [29, 8, 7]. A canonical way to circumvent this obstacle is to alternate between projections onto the two sets. This approach is, for instance, investigated in [11]. There, a local convergence guarantee is presented under optimal sample complexity, using the considerably simpler measurement model of a Gaussian linear map without the structure of the convolution. A global guarantee is not provided.
In this context, [3] should also be mentioned – there, the authors obtain global convergence in just two alternation steps, however by assuming a nested measurement structure tailor-made for the jointly low-rank and sparse setting, which is often not given.
Finally, in [1], a different but related problem was considered: the case where several convolutions $y_i = h \ast (Q_i b_i)$, $i \in [N]$, are independently observed. Importantly, the authors do not assume that the support of $h$ is known, only that $h$ is sparse in a basis $B$ with a certain incoherence. However, they instead assume that the supports of the $b_i$ are known, and that the $Q_i$ are independent Gaussian matrices. Under those assumptions, they achieve recovery (through a nuclear norm minimization approach) of both $h$ and all $b_i$ already when $\mu \gtrsim (\sigma + s \log(s)^2) \log(\mu N)^4$ and $N \gtrsim \log(N\mu)$. If we drop the assumption of known supports of the $b_i$, the guarantee again degenerates, to $\mu \gtrsim (\mu + s \log(s)) \log(\mu s)^2$.
Direct approaches. Among direct approaches, alternating minimization, or sparse power factorization, is probably the most prominent. It was introduced in [31] as a means for solving the so-called phase retrieval problem and later adapted to blind deconvolution in [22, 23]. Alternating minimization generally refers to taking turns in solving the following two problems,
$$\underset{h \text{ (sparse)}}{\operatorname{minimize}}\ L(h, b) \qquad \text{and} \qquad \underset{b \text{ (sparse)}}{\operatorname{minimize}}\ L(h, b).$$
Since the convolution is linear in each argument, each subproblem is effectively a classical compressed sensing problem and can be solved using a number of techniques, e.g., iterative hard thresholding [15] or CoSaMP [30].
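To make the structure of the scheme concrete, here is a rough NumPy sketch (our own, not the algorithm analyzed in [22, 23]): with $b$ fixed, $y = \operatorname{circ}(Qb)h$ is a sparse linear problem in $h$, and with $h$ fixed, $y = \operatorname{circ}(h)Qb$ is one in $b$, so each half-step runs a few iterations of iterative hard thresholding. The careful initialization and step-size choices that the actual guarantees rely on are omitted.

```python
import numpy as np

def hard_threshold(v, k):
    """Keep the k largest-magnitude entries of v, zero the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def circulant(c):
    """Circulant matrix of c, so that circulant(c) @ v = c * v (circular)."""
    mu = len(c)
    return np.array([[c[(i - j) % mu] for j in range(mu)] for i in range(mu)])

def alternating_minimization(y, Q, s, sigma, outer=25, inner=10, step=0.1):
    """Alternate IHT half-steps on the two linear subproblems."""
    mu, n = Q.shape
    h = hard_threshold(y.copy(), s)      # crude initialization
    b = np.zeros(n)
    for _ in range(outer):
        B = circulant(h) @ Q             # y = B b with h fixed
        for _ in range(inner):
            b = hard_threshold(b + step * B.T @ (y - B @ b), sigma)
        A = circulant(Q @ b)             # y = A h with b fixed
        for _ in range(inner):
            h = hard_threshold(h + step * A.T @ (y - A @ h), s)
    return h, b

# usage: h_hat, b_hat = alternating_minimization(y, Q, s=5, sigma=5)
```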
As for the question of the success of alternating minimization, the authors of [22, 23] give a recovery guarantee in a special setting: first, they use a generic subspace model, where $h$ has a representation $h = Bg$ with some $s$-sparse vector $g$, rather than being sparse itself, and $B$ is assumed to be a matrix with random Gaussian entries. The measurement matrix $Q$ is also assumed to be Gaussian. An additional