[EMZ21], signal interpolation [SSWZ22], distance oracles [DSWZ22], generative adversarial networks [XZZ18], training neural networks [LSS+20, BPSW21, SYZ21, SZZ21, HSWZ22], matrix completion [GSYZ23], matrix sensing [DLS23b, QSZ23], attention scheme inspired regression [LSZ23, DLS23a, LSX+23, GSY23b, GMS23], sparsification of attention matrices [DMS23], discrepancy minimization [DSW22], the dynamic tensor regression problem [RSZ22], John Ellipsoid computation [SYYZ22], NLP tasks [LSW+20], and total least squares regression [DSWY19].
Differential Privacy. First introduced in [DKM+06], differential privacy has played an important role in providing theoretical privacy guarantees for numerous algorithms [DR14], for example, robustly learning a mixture of Gaussians [KSSU19], hypothesis selection [BKSW21], hyperparameter selection [MSH+22], convex optimization [KLZ22], first-order methods [SVK21], and mean estimation [KSU20, HKM22]. Techniques from differential privacy are also widely studied and applied in machine learning [CM08, WM10, JE19, TF20], deep neural networks [ACG+16, BPS19], computer vision [ZYCW20, LWAFF21, TKP19], natural language processing [YDW+21, WK18], large language models [GSY23a, YNB+22], label protection in split learning [YSY+22], multiple data release [WYY+22], federated learning [SYY+22], and peer review [DKWS22]. Recent works also show that robust statistical estimators imply differential privacy [HKMN23].
Projection Maintenance Data Structure. The design of efficient projection maintenance data structures is a core step in many optimization problems, such as linear programming [Vai89b, CLS19, Son19, LSZ19, Bra20, SY21, JSWZ21, BLSS20, Ye20, DLY21, GS22], the cutting plane method [Vai89a, JLSW20], integral minimization [JLSZ23], empirical risk minimization [LSZ19, QSZZ23], semidefinite programming [JLSW20, JKL+20, HJS+22b, HJS+22a, GS22], dynamic least-squares regression [JPW23], large language models [BSZ23], and sum-of-squares optimization [JNW22].
3 Preliminaries & Problem Formulation
In Section 3.1, we introduce the basic notation that we use throughout the remainder of the paper. In Section 3.2, we describe the data structure design problem studied in this paper.
3.1 Notations and Basic Definitions.
For any integer $n > 0$, let $[n]$ denote the set $\{1, 2, \cdots, n\}$. Let $\Pr[\cdot]$ denote probability and $\mathbb{E}[\cdot]$ denote expectation. We use $\|x\|_2$ to denote the $\ell_2$ norm of a vector $x$. We use $\mathcal{N}(\mu, \sigma^2)$ to denote the Gaussian distribution with mean $\mu$ and variance $\sigma^2$. We use $\widetilde{O}(f(n))$ to denote $O(f(n) \cdot \operatorname{poly}\log(f(n)))$. We use $\mathcal{T}_{\mathrm{mat}}(m, n, k)$ to denote the time of multiplying an $m \times n$ matrix with an $n \times k$ matrix. We denote by $\omega \approx 2.38$ the current matrix multiplication exponent, i.e., $\mathcal{T}_{\mathrm{mat}}(n, n, n) = n^{\omega}$. We denote by $\alpha \approx 0.31$ the dual exponent of matrix multiplication.
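As a quick reference, these conventions can be summarized in display form. The sup-characterization of the dual exponent $\alpha$ below is not stated explicitly in the text above; it is included here only as an informal reminder of the usual convention for rectangular matrix multiplication, not as a definition taken from this paper.
\begin{align*}
\widetilde{O}(f(n)) & := O\big(f(n) \cdot \operatorname{poly}\log(f(n))\big), \\
\mathcal{T}_{\mathrm{mat}}(n, n, n) & = n^{\omega}, \qquad \omega \approx 2.38, \\
\alpha & := \sup\big\{ a \geq 0 \;:\; \mathcal{T}_{\mathrm{mat}}(n, n^{a}, n) = n^{2 + o(1)} \big\}, \qquad \alpha \approx 0.31.
\end{align*}
In particular, for any $a \leq \alpha$, multiplying an $n \times n^{a}$ matrix with an $n^{a} \times n$ matrix takes essentially quadratic time, which is the regime typically exploited in the projection maintenance literature cited above.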
We use $\|A\|$ and $\|A\|_F$ to denote the spectral norm and the Frobenius norm of matrix $A$, respectively. We use $A^\top$ to denote the transpose of matrix $A$. We use $I_m$ to denote the identity matrix of size $m \times m$. For $\alpha$ being a vector or matrix, we use $\|\alpha\|_0$ to denote the number of nonzero entries of $\alpha$. Given a real square matrix $A$, we use $\lambda_{\max}(A)$ and $\lambda_{\min}(A)$ to denote its largest and smallest eigenvalues, respectively. Given a real matrix $A$, we use $\sigma_{\max}(A)$ and $\sigma_{\min}(A)$ to denote its largest and smallest singular values, respectively. We use $A^{-1}$ to denote the matrix inverse of matrix $A$. For a square matrix $A$, we use $\operatorname{tr}[A]$ to denote the trace of $A$. We use $b$ and