Bisparse Blind Deconvolution through Hierarchical Sparse
Recovery
Axel Flinth1, Ingo Roth2,3, and Gerhard Wunder3
1Department of Mathematics and Mathematical Statistics, Umeå University, Sweden
2Quantum Centre, Technology Innovation Institute, Abu Dhabi, UAE
3Dahlem Center for Complex Quantum Systems, Freie Universität Berlin, Germany
November 12, 2024
Abstract
The hierarchical sparsity framework, and in particular the HiHTP algorithm, has been successfully
applied to many relevant communication engineering problems recently, particularly when the signal
space is hierarchically structured. In this paper, the applicability of the HiHTP algorithm for solving
the bi-sparse blind deconvolution problem is studied. The bi-sparse blind deconvolution setting here
consists of recovering $h$ and $b$ from the knowledge of $h \ast (Qb)$, where $Q$ is some linear operator, and both $h$ and $b$ are assumed to be sparse. The approach rests upon lifting the problem to a linear one and then applying HiHTP through the hierarchical sparsity framework. Then, for a Gaussian draw of the random matrix $Q$, it is theoretically shown that an $s$-sparse $h \in \mathbb{K}^\mu$ and a $\sigma$-sparse $b \in \mathbb{K}^n$ can with high probability be recovered when $\mu \gtrsim s\sigma \log(s)^2 \log(\mu) \log(\mu n) + s\sigma \log(n)$.
Keywords: Blind deconvolution, sparsity, restricted isometry property.
1 Introduction
Consider an underdetermined system of linear equations, i.e., $y = Hx$ for an $H \in \mathbb{K}^{m,n}$ with $m \ll n$ ($\mathbb{K}$ denotes either $\mathbb{R}$ or $\mathbb{C}$). Such a system of course does not have a unique solution. However, if one assumes some structure of the ground truth $x_0$, it can still be recovered. This is the compressed sensing paradigm [17]. The original example of such a structure is sparsity, i.e., the assumption that only a few entries $x_0(i)$ of the ground truth are non-zero. In many applications, such as communication engineering [43], there is even more prior knowledge on the sparsity, i.e., the signals are hierarchically sparse.
Definition 1 ([36]). Let $\mu, n, s, \sigma \in \mathbb{N}$. A tensor
$$w = \sum_{k \in [\mu]} e_k \otimes w_k \in \mathbb{K}^\mu \otimes \mathbb{K}^n$$
is $(s, \sigma)$-sparse if
• at most $s$ of the blocks $w_k \in \mathbb{K}^n$ are non-zero,
• each block $w_k$ is $\sigma$-sparse.
Here, $\{e_k\}_{k \in [\mu]}$ denotes the standard basis of $\mathbb{K}^\mu$, with entries $e_k(j) = 1$ for $k = j$ and $e_k(j) = 0$ otherwise. We will also simply refer to $w$ as hierarchically sparse.
In slightly different terms, we can think of a hierarchically sparse signal as a signal subdivided into $\mu$ fixed groups, each of size $n$. Each such group is either zero or $\sigma$-sparse, and in addition, at most $s$ groups are non-zero. See Figure 1 for an illustration.
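To make the definition concrete, the following minimal NumPy sketch (our own illustration; the function is_hisparse is not from any cited work) checks whether a vector, viewed as $\mu$ blocks of size $n$, is $(s, \sigma)$-sparse:

```python
import numpy as np

def is_hisparse(w, mu, n, s, sigma):
    """Check whether w (length mu*n, viewed as mu blocks of size n) is
    (s, sigma)-sparse: at most s non-zero blocks, each sigma-sparse."""
    blocks = w.reshape(mu, n)
    nonzero = [blk for blk in blocks if np.any(blk != 0)]
    return (len(nonzero) <= s
            and all(np.count_nonzero(blk) <= sigma for blk in nonzero))

# A (3, 3)-sparse example in the spirit of Figure 1: five blocks of size
# six, three of which are non-zero with three non-zero entries each.
w = np.zeros((5, 6))
w[0, [0, 2, 5]] = 1.0
w[2, [1, 3, 4]] = 1.0
w[4, [0, 1, 2]] = 1.0
print(is_hisparse(w.ravel(), 5, 6, 3, 3))  # True
```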
Algorithm 1 (HiHTP)
Require: Measurement operator $A : \mathbb{K}^\mu \otimes \mathbb{K}^n \to \mathbb{K}^m$, measurement vector $y$, block column sparsity $(s, \sigma)$
1: $w^0 = 0$
2: repeat
3: Calculate the support $\Omega^{k+1}$ of the best approximation of $w^k + A^*(y - Aw^k)$ in the set of $(s, \sigma)$-sparse vectors.
4: $w^{k+1} = \operatorname{argmin}_{z \in \mathbb{K}^\mu \otimes \mathbb{K}^n} \{\, \|y - Az\| : \operatorname{supp}(z) \subseteq \Omega^{k+1} \,\}$
5: until stopping criterion is met at $\tilde{k} = k$
6: return $(s, \sigma)$-sparse vector $w^{\tilde{k}}$
Figure 1: Two tensors which are both 9-sparse. However, only the left one is hierarchically sparse – it is
(3,3)-sparse.
1.1 HiHTP
To recover a structured signal from linear measurements, so-called hard thresholding pursuit (HTP) algorithms can be applied. In short, HTP alternates between gradient descent steps on a least-squares problem and projections onto the set of structured signals. Originally developed for sparse vectors [15], it was generalized to a considerably more inclusive model, under the moniker of model-based compressed sensing, in [5]. Applying the concept to hierarchically sparse signals yields the hierarchical hard thresholding pursuit, HiHTP (Algorithm 1) [36].
The main feature of HiHTP that distinguishes it from the generic model-based compressed sensing algorithm is that the projection step (line 3 in Algorithm 1) can be performed efficiently: to calculate the best $(s, \sigma)$-sparse approximation of a tensor $w$, one first calculates the best $\sigma$-sparse approximation of each block $w_k$, which simply entails selecting the $\sigma$ largest entries of each block. One then selects the $s$ of these $\sigma$-sparse proxies with the largest $\ell_2$-norms. The computational complexity of such a projection is $O(\mu n)$, which is the same complexity as the projection onto the set of sparse signals. In addition, the first phase of approximations can be highly parallelized, which is not the case for the projection onto sparse signals.
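The following NumPy sketch illustrates this two-stage projection (a minimal sketch under our own naming; np.argpartition could replace the sorts to avoid logarithmic factors):

```python
import numpy as np

def project_hisparse(w, mu, n, s, sigma):
    """Best (s, sigma)-sparse approximation of w, viewed as mu blocks of
    size n: keep the sigma largest-magnitude entries of each block, then
    keep the s surviving blocks with the largest l2-norms."""
    blocks = w.reshape(mu, n).copy()
    for k in range(mu):
        # sigma-sparse proxy: zero all but the sigma largest entries
        blocks[k, np.argsort(np.abs(blocks[k]))[:n - sigma]] = 0.0
    # keep the s proxies with the largest l2-norms, zero the rest
    blocks[np.argsort(np.linalg.norm(blocks, axis=1))[:mu - s]] = 0.0
    return blocks.ravel()
```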
As for any model-based compressed sensing algorithm [5], HiHTP converges to a structured signal under a restricted isometry condition. The standard restricted isometry constants are defined for sparse signals,
$$\delta_s(A) = \sup_{\substack{w\ s\text{-sparse} \\ \|w\| \le 1}} \left|\, \|Aw\|^2 - \|w\|^2 \,\right|,$$
and the hierarchically sparse version is defined similarly:
Definition 2 ([36]). Let $A : \mathbb{K}^\mu \otimes \mathbb{K}^n \to \mathbb{K}^m$. The $(s, \sigma)$-HiRIP constant of $A$ is given by
$$\delta_{(s,\sigma)}(A) = \sup_{\substack{w\ (s,\sigma)\text{-sparse} \\ \|w\| \le 1}} \left|\, \|Aw\|^2 - \|w\|^2 \,\right|.$$
Theorem 1 ([36, Theorem 1]). If $\delta_{3s,2\sigma} < 1/3$, the sequence $w^k$ defined by HiHTP for $y = Aw_0 + e$, with an $(s, \sigma)$-sparse ground truth $w_0$ and error vector $e \in \mathbb{K}^m$, satisfies
$$\|w^k - w_0\| \le \rho^k \|w^0 - w_0\| + \tau \|e\|,$$
with $\rho = \left(2\delta_{3s,2\sigma}/(1 - \delta_{2s,2\sigma}^2)\right)^{1/2} < 1$ and $\tau = 5.15/(1 - \rho)$. In particular, when the measurements $y$ are noise-free, $w^k$ converges towards $w_0$ at a linear rate.
We will, deliberately imprecisely, say that an operator $A$ has the HiRIP if its HiRIP constant is small enough, for large enough values of $s$ and $\sigma$, to ensure the stable and robust recovery of any $(s, \sigma)$-sparse vector, as in Theorem 1.
In a series of papers [36, 35, 14, 12, 43, 40], the authors of this article, among others, have shown that a number of important classes of operators have the HiRIP.
Gaussian operators. Operators $A : \mathbb{K}^\mu \otimes \mathbb{K}^n \to \mathbb{K}^m$ whose matrix representations have i.i.d. Gaussian entries have the HiRIP with high probability when $m \gtrsim s \log(\mu) + s\sigma \log(n)$, where $\gtrsim$ refers to an inequality up to a universal multiplicative constant.
Kronecker products. If $A \in \mathbb{K}^{m_0,\mu}$ and $B \in \mathbb{K}^{m_1,n}$ have $s$- and $\sigma$-RIP constants $\delta_s(A)$ and $\delta_\sigma(B)$, respectively, the Kronecker product $A \otimes B : \mathbb{K}^\mu \otimes \mathbb{K}^n \to \mathbb{K}^{m_0} \otimes \mathbb{K}^{m_1}$ has an $(s, \sigma)$-HiRIP constant smaller than $\delta_s + \delta_\sigma + \delta_s \delta_\sigma$.
Hierarchical operators. A hierarchical measurement operator is an operator $H : \mathbb{K}^\mu \otimes \mathbb{K}^n \to \mathbb{K}^{m_0} \otimes \mathbb{K}^{m_1}$ defined as a mixture of matrices $(B_\ell)_{\ell \in [\mu]}$, $B_\ell \in \mathbb{K}^{m_1,n}$, as follows:
$$H(w)_k = \sum_{\ell \in [\mu]} a_{k\ell} B_\ell w_\ell$$
for some matrix $A \in \mathbb{K}^{m_0,\mu}$; a numerical sketch of such an operator is given after this list. If $A$ has $s$-RIP constant $\delta_s(A)$, and all $B_\ell$ have $\sigma$-RIP constants smaller than $\delta$, the $(s, \sigma)$-HiRIP constant of $H$ is not larger than $\delta_s(A) + \delta + \delta_s(A) \cdot \delta$.
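For concreteness, here is a small NumPy sketch of applying such a hierarchical operator; all dimensions and names are our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, n, m0, m1 = 6, 5, 4, 3

A = rng.standard_normal((m0, mu))        # mixing matrix A in K^{m0 x mu}
Bs = rng.standard_normal((mu, m1, n))    # one B_l in K^{m1 x n} per block

def hierarchical_op(w):
    """H(w)_k = sum_l a_{k,l} B_l w_l for w in K^mu tensor K^n."""
    blocks = w.reshape(mu, n)
    Bw = np.stack([Bs[l] @ blocks[l] for l in range(mu)])  # shape (mu, m1)
    return (A @ Bw).ravel()              # element of K^{m0} tensor K^{m1}

w = rng.standard_normal(mu * n)
print(hierarchical_op(w).shape)          # (12,) = (m0 * m1,)
```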
The purpose of this article is to prove that a new class of operators has the $(s, \sigma)$-HiRIP: namely, blind deconvolution operators.
1.2 Blind deconvolution
The blind deconvolution problem is the problem of determining a filter $h \in \mathbb{K}^\mu$ and a signal $x \in \mathbb{K}^\mu$ from their (circular) convolution, i.e.,
$$\mathbb{K}^\mu \ni y = h \ast x, \qquad y(\ell) = \sum_{k \in [\mu]} h(k)\, x(\ell - k),$$
where the index $\ell - k$ is understood modulo $\mu$.
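Numerically, the circular convolution can be evaluated directly from this definition or, via the convolution theorem, with the FFT. A minimal NumPy self-check, assuming real-valued signals:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 8
h, x = rng.standard_normal(mu), rng.standard_normal(mu)

# y(l) = sum_k h(k) x(l - k mod mu), straight from the definition
y_direct = np.array([sum(h[k] * x[(l - k) % mu] for k in range(mu))
                     for l in range(mu)])
# the same convolution via the convolution theorem
y_fft = np.fft.ifft(np.fft.fft(h) * np.fft.fft(x)).real

print(np.allclose(y_direct, y_fft))  # True
```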
It is standard to lift this bilinear recovery problem to a linear one (a detailed derivation will be presented later). The resulting problem is severely underdetermined – the $\mu$ entries of $y$ will not be enough to uniquely identify the $\mu^2$-dimensional $h \otimes x$. Even the fact that $h \otimes x$ is a rank-one tensor is not enough – additional structural assumptions on $h$ and $x$ are necessary. In this paper, we will assume that $h$ is $s$-sparse, and that $x$ can be written as $x = Qb$ for a known matrix $Q \in \mathbb{K}^{\mu,n}$ and a $\sigma$-sparse vector $b \in \mathbb{K}^n$. Hence, we are confronted with the bi-sparse version of the blind deconvolution problem.
These assumptions can be motivated using the following simple model of mobile communications: Alice and Bob are communicating with each other. Bob would like to send a message $b \in \mathbb{K}^n$ to Alice. To do so, he first linearly encodes his message in a signal $Qb \in \mathbb{K}^\mu$ and then sends it over a wireless channel to Alice. Due to scattering in the environment, Alice will not receive $Qb$, but rather a superposition of several time- and phase-shifted versions of the signal $Qb$ – i.e., she will observe the convolution $y = h \ast (Qb)$ of $Qb$ with a filter $h \in \mathbb{K}^\mu$ modelling the impulse response of the environment. Since the scattered signal typically reaches the receiver antenna along a few distinct paths, it is reasonable to assume that the filter $h$ is sparse [4]. The message $b$ can, on the other hand, be rendered sparse by system design.
Using the lifting approach, the blind deconvolution problem then effectively turns into a recovery problem for the tensor $h \otimes b$. Importantly, if both $h$ and $b$ are sparse, $h \otimes b$ is hierarchically sparse: the blocks of $h \otimes b$ are given by $h(k)b$, $k \in [\mu]$ – they are all $\sigma$-sparse, and only the $s$ values of $k$ for which $h(k) \neq 0$ correspond to non-zero blocks. This makes it natural to apply the HiHTP algorithm to the problem.
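The following NumPy sketch (our own, with illustrative dimensions) verifies this numerically: the bilinear measurement $h \ast (Qb)$ coincides with a fixed linear map applied to $w = h \otimes b$, whose $k$-th block is $h(k)b$:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, n, s, sigma = 16, 12, 2, 3

# s-sparse filter h and sigma-sparse message b
h = np.zeros(mu)
h[rng.choice(mu, s, replace=False)] = rng.standard_normal(s)
b = np.zeros(n)
b[rng.choice(n, sigma, replace=False)] = rng.standard_normal(sigma)
Q = rng.standard_normal((mu, n)) / np.sqrt(mu)

# bilinear measurement y = h * (Qb) (circular convolution via FFT)
y = np.fft.ifft(np.fft.fft(h) * np.fft.fft(Q @ b)).real

# lifted, linear view: w = h tensor b is (s, sigma)-sparse, and
# y(l) = sum_{k,j} Q[(l-k) mod mu, j] * w[k*n + j]
w = np.kron(h, b)
y_lifted = np.array([sum(Q[(l - k) % mu, j] * w[k * n + j]
                         for k in range(mu) for j in range(n))
                     for l in range(mu)])
print(np.allclose(y, y_lifted))  # True
```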
This was done by a subset of the authors in [42], even adding more layers of hierarchy (see Section ??) due to multiple antennas and users, with good empirical success. It was also applied in [40, 41] as a component of a secure communication scheme. In neither of those papers, however, were theoretical guarantees provided. Providing such guarantees is essential in the security context: without them, one cannot, e.g., establish the provable correctness of the scheme in [40, 41].
1.3 The main result
Let us present a somewhat simplified version of the main result of this article.
Main Result. Consider the operator $C$ corresponding to the blind deconvolution measurement $(h, b) \mapsto h \ast (Qb)$, where $Q \in \mathbb{K}^{\mu,n}$ is a Gaussian matrix. If
$$\mu \gtrsim s\sigma \log(s)^2 \log(\mu) \log(\mu n) + s\sigma \log(n),$$
then $C$ with high probability has the $(s, \sigma)$-HiRIP.
Disregarding log-terms, the blind deconvolution operators hence have the HiRIP under the same sample complexity assumptions as the considerably simpler Gaussian ones. Counting the degrees of freedom needed to describe an $(s, \sigma)$-sparse vector, this sample complexity is moreover essentially optimal.
Remark 1. It is important to note that, with regard to the bisparse recovery problem, a sampling complexity of $\mu \gtrsim s\sigma$ measurements is far from optimal. Counting degrees of freedom in the two vectors $h$ and $b$, one would instead hope for $s + \sigma$ measurements to be enough. In other words, $h \otimes b$ has more structure than only being $(s, \sigma)$-sparse, which one intuitively would like to take advantage of. However, if one does not make additional assumptions on $h$ and $b$, a sample complexity of $s\sigma \cdot \operatorname{polylog}(\mu, n)$ is actually the state of the art for feasible algorithms [18]. We will give a detailed explanation of this in Section 2.
1.4 Extensions
A feature of the hierarchical approach to sparse deconvolution problems is that it readily generalizes to more complex settings, e.g., those arising in multi-user communications. In this paper, we also discuss how to solve a combined deconvolution and demixing problem using the hierarchically sparse framework: given $M$ measurements of the form
$$y_q = \sum_{p \in [N]} d_{q,p}\, h_p \ast (Q_p b_p),$$
where $d_{q,p} \in \mathbb{K}$, $q \in [M]$, $p \in [N]$, are 'mixing factors', recover the collection of $N$ filter-signal pairs $(h_p, b_p)$, $p \in [N]$. This problem appears, e.g., when users are separated both through their spatial angles and their multipath delays, as in [43]. Under the assumption that only $S$ of the filter-message pairs are non-zero, we can interpret this again as a hierarchical sparse recovery problem (albeit now with three levels of hierarchy). Using recent results from [14], we show that, under the same assumptions on the individual $Q_p$ as above, HiHTP can recover all signals and all filters from $M \gtrsim S \log(N)$ mixtures.
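A short NumPy sketch of this multi-user forward model; the dimensions, the Gaussian draws, and all names are our own illustrative choices (sparsity of the $h_p$ and $b_p$ is omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(2)
mu, n, N, M = 32, 16, 4, 8

hs = rng.standard_normal((N, mu))                   # filters h_p
bs = rng.standard_normal((N, n))                    # messages b_p
Qs = rng.standard_normal((N, mu, n)) / np.sqrt(mu)  # known code books Q_p
D = rng.standard_normal((M, N))                     # mixing factors d_{q,p}

def circconv(a, c):
    """Circular convolution of two real vectors via the FFT."""
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(c)).real

# M mixed observations y_q = sum_p d_{q,p} h_p * (Q_p b_p)
ys = np.stack([sum(D[q, p] * circconv(hs[p], Qs[p] @ bs[p]) for p in range(N))
               for q in range(M)])
print(ys.shape)  # (8, 32) = (M, mu)
```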
1.5 Outline of the paper
We begin our paper with a detailed literature review of different computational approaches to the blind deconvolution problem in Section 2. In Section 3, we give a formal introduction to our approach and our assumptions, and formally state the main result, Theorem 2; the multi-user setup is also briefly discussed there. Section 4 is devoted in its entirety to proving the main result. In Section 5, we perform a small numerical experiment to validate our theory.
Notation. Most of the notation is either standard or will be introduced at first use. For a vector $v \in \mathbb{K}^n$, we write $v(i)$ for its $i$-th entry. We use $\|\cdot\|$ to denote the Euclidean norm for both vectors and tensors, $\|\cdot\|_F$ for the Frobenius norm of a matrix, and $\|\cdot\|_{2 \to 2}$ for the induced $\ell_2 \to \ell_2$ operator norm of a matrix.
2 Literature Review
In this section, we give a literature review of previous approaches to the blind deconvolution problem in
general, and sparse blind deconvolution in particular. The approaches can broadly be divided into two
categories:
• As was described in the introduction, one can identify the bilinear map $\gamma(h, b) = h \ast Qb$ on $\mathbb{K}^\mu \times \mathbb{K}^n$ with a linear map $C$ on $\mathbb{K}^\mu \otimes \mathbb{K}^n$. One can then minimize
$$K : \mathbb{K}^\mu \otimes \mathbb{K}^n \to \mathbb{R}, \quad w \mapsto K(w) = \mathcal{K}(y, C(w), w),$$
where $\mathcal{K}$ refers to some suitable loss function. This procedure is generally referred to as lifting, and we will therefore refer to this class of methods as lifted methods. HiHTP is within this class.
• One can also directly minimize a function of the form
$$L : \mathbb{K}^\mu \times \mathbb{K}^n \to \mathbb{R}, \quad (h, b) \mapsto L(h, b) = \mathcal{L}(y, h \ast (Qb), h, b),$$
defined in terms of a suitable loss function $\mathcal{L}$. Let us call such methods direct.
We will give a brief overview of the relevant results from the literature about methods from both categories. Throughout the discussion, it is instructive to remember that the optimal sample complexity (i.e., the number of observations needed for injectivity) is given by $\mu + n$ in the dense setting and $s + \sigma$ in the sparse setting [20, 26].
Lifted approaches. As for lifted approaches, it is crucial to understand that the lifted ground truth signal $h_0 \otimes b_0$, or equivalently its matrix form $h_0 b_0^*$, where $b_0^*$ denotes the Hermitian transpose of $b_0$, simultaneously enjoys two structures: it has rank one and it is sparse (in a structured manner). Each of these structures can individually be exploited using convex regularizers – if we take the low-rank property as the crucial one, the nuclear norm is the canonical regularizer [2, 28, 19], whereas if the sparsity is taken as the crucial property, the $\ell_1$-norm [27] or its relative, the $\ell_{1,2}$-norm [13], should be used. While the convex approaches are stable, mathematically pleasing and globally solvable by off-the-shelf algorithms, they are computationally intensive and, thus, often impractical. Furthermore, recovery guarantees are only known for generic subspace models. Under the assumption of known supports, the nuclear norm approach recovers the ground truth when $\mu \gtrsim \max(s, \sigma) \log(\mu)^2$ [28, 19, 9]. However, not assuming known supports, that guarantee degenerates to $\mu \gtrsim \max(\mu, n) \log(\mu)^2$. The sparse models ($\ell_1$ and $\ell_{1,2}$) recover the ground truth with high probability when $\mu \gtrsim s\sigma \log(sn) \log(\mu)^2$, under the assumption that the support of $h$ is known [27, 13]. If the support of $h$ is not known, the guarantee again degenerates, to $\mu \gtrsim \mu\sigma \log(n) \log(\mu)^2$.
There is also a natural non-convex way to solve the lifted problem: gradient descent projected onto the set of (bi-)sparse and low-rank matrices. As is thoroughly discussed in [16], there is, however, no practical algorithm available to compute this projection. The bi-sparse unit-rank projection is known as the sparse PCA problem, which has been shown to be NP-hard in the worst case, as well as under approximation and on average [29, 8, 7]. A canonical way to circumvent this obstacle is to alternate between projections onto the two sets. This approach is, for instance, investigated in [11]. There, a local convergence guarantee is presented under optimal sample complexity, using the considerably simpler measurement model of a Gaussian linear map without the structure of the convolution. A global guarantee is not provided.
In this context, [3] should also be mentioned – there, the authors obtain global convergence in just two alternation steps, however by assuming a nested measurement structure tailor-made for the jointly low-rank and sparse setting, which is often not given.
Finally, in [1], a different but related problem was considered: the case where several convolutions $y_i = h \ast (Q_i b_i)$, $i \in [N]$, are independently observed. Importantly, the authors do not assume that the support of $h$ is known, only that $h$ is sparse in a basis $B$ with a certain incoherence. However, they instead assume that the supports of the $b_i$ are known, and that the $Q_i$ are independent Gaussian matrices. Under those assumptions, they achieve recovery (through a nuclear norm minimization approach) of both $h$ and all $b_i$ already when $\mu \gtrsim (\sigma + s \log(s)^2) \log(\mu N)^4$ and $N \gtrsim \log(N\mu)$. If we drop the assumption of known supports of the $b_i$, the guarantee again degenerates, to $\mu \gtrsim (\mu + s \log(s)) \log(\mu s)^2$.
Direct approaches. Among direct approaches, alternating minimization, or sparse power factorization, is probably the most prominent. It was introduced in [31] as a means for solving the so-called phase retrieval problem and later adapted to blind deconvolution in [22, 23]. Alternating minimization generally refers to taking turns in solving the following two problems,
$$\underset{h \text{ (sparse)}}{\operatorname{minimize}}\ L(h, b) \qquad \text{and} \qquad \underset{b \text{ (sparse)}}{\operatorname{minimize}}\ L(h, b).$$
Since the convolution is linear in each argument, each subproblem is effectively a classical compressed sensing problem and can be solved using a number of techniques, e.g., iterative hard thresholding [15] or CoSaMP [30].
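To make the structure of the scheme concrete, here is a rough NumPy sketch (our own, not the algorithm analyzed in [22, 23]): with $b$ fixed, $y = \operatorname{circ}(Qb)h$ is a sparse linear problem in $h$, and with $h$ fixed, $y = \operatorname{circ}(h)Qb$ is one in $b$, so each half-step runs a few iterations of iterative hard thresholding. The careful initialization and step-size choices that the actual guarantees rely on are omitted.

```python
import numpy as np

def hard_threshold(v, k):
    """Keep the k largest-magnitude entries of v, zero the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def circulant(c):
    """Circulant matrix of c, so that circulant(c) @ v = c * v (circular)."""
    mu = len(c)
    return np.array([[c[(i - j) % mu] for j in range(mu)] for i in range(mu)])

def alternating_minimization(y, Q, s, sigma, outer=25, inner=10, step=0.1):
    """Alternate IHT half-steps on the two linear subproblems."""
    mu, n = Q.shape
    h = hard_threshold(y.copy(), s)      # crude initialization
    b = np.zeros(n)
    for _ in range(outer):
        B = circulant(h) @ Q             # y = B b with h fixed
        for _ in range(inner):
            b = hard_threshold(b + step * B.T @ (y - B @ b), sigma)
        A = circulant(Q @ b)             # y = A h with b fixed
        for _ in range(inner):
            h = hard_threshold(h + step * A.T @ (y - A @ h), s)
    return h, b

# usage: h_hat, b_hat = alternating_minimization(y, Q, s=5, sigma=5)
```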
As for the question of the success of alternating minimization, the authors of [22, 23] give a recovery guarantee in a special setting: first, they use a generic subspace model, where $h$ has a representation $h = Bg$ with some $s$-sparse vector $g$, rather than being sparse itself, and $B$ is assumed to be a matrix with random Gaussian entries. The measurement matrix $Q$ is also assumed to be Gaussian. An additional