Sketching Meets Differential Privacy:
Fast Algorithm for Dynamic Kronecker Projection Maintenance∗
Zhao Song† Xin Yang‡ Yuanyuan Yang§ Lichen Zhang¶
Abstract
Projection maintenance is one of the core data structure tasks. Efficient data structures for projection maintenance have led to recent breakthroughs in many convex programming algorithms. In this work, we further extend this framework to the Kronecker product structure. Given a constraint matrix $A$ and a positive semi-definite matrix $W \in \mathbb{R}^{n\times n}$ with a sparse eigenbasis, we consider the task of maintaining the projection in the form of $B^\top(BB^\top)^{-1}B$, where $B = A(W \otimes I)$ or $B = A(W^{1/2} \otimes W^{1/2})$. At each iteration, the weight matrix $W$ receives a low rank change and we receive a new vector $h$. The goal is to maintain the projection matrix and answer the query $B^\top(BB^\top)^{-1}Bh$ with good approximation guarantees. We design a fast dynamic data structure for this task that is robust against an adaptive adversary. Following the beautiful and pioneering work of [Beimel, Kaplan, Mansour, Nissim, Saranurak and Stemmer, STOC'22], we use tools from differential privacy to reduce the randomness required by the data structure and further improve the running time.
∗A preliminary version of this paper appeared at ICML 2023.
†zsong@adobe.com. Adobe Research.
‡yx1992@cs.washington.edu. University of Washington. Supported in part by NSF grant No. CCF-2006359.
§yyangh@cs.washington.edu. University of Washington. Supported by NSF grant No. CCF-2045402 and NSF grant No. CCF-2019844.
¶lichenz@mit.edu. MIT. Supported by NSF grant No. CCF-1955217 and NSF grant No. CCF-2022448.
1 Introduction
Projection maintenance is one of the most important data structure problems in recent years. Many convex optimization algorithms that give the state-of-the-art running time heavily rely on an efficient and robust projection maintenance data structure [CLS19, LSZ19, JLSW20, JKL+20, HJS+22b]. Let $B \in \mathbb{R}^{m\times n}$, and consider the projection matrix $P = B^\top(BB^\top)^{-1}B$. The projection maintenance task asks for a data structure with the following guarantees: it can preprocess and compute an initial projection. At each iteration, $B$ receives a low rank or sparse change, and the data structure needs to update $B$ to reflect these changes. It will then be asked to approximately compute the matrix-vector product between the updated $P$ and an online vector $h$. For example, in linear programming, one sets $B = A\sqrt{W}$, where $A \in \mathbb{R}^{m\times n}$ is the constraint matrix and $W$ is the diagonal slack matrix. At each iteration, $W$ receives relatively small perturbations. The data structure then needs to output an approximation to $\sqrt{W}A^\top(AWA^\top)^{-1}A\sqrt{W}h$, for an online vector $h \in \mathbb{R}^n$.
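To make the object concrete, here is a minimal NumPy sketch (ours, purely illustrative; all names are assumptions) that forms $P = B^\top(BB^\top)^{-1}B$ for the linear programming instantiation $B = A\sqrt{W}$ and applies it to a query vector:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 20, 100                          # m constraints, n variables, m < n

A = rng.standard_normal((m, n))         # constraint matrix
w = rng.uniform(0.5, 2.0, size=n)       # diagonal slack, W = diag(w)

B = A * np.sqrt(w)                      # B = A sqrt(W), broadcasting over columns
P = B.T @ np.linalg.solve(B @ B.T, B)   # P = B^T (B B^T)^{-1} B

h = rng.standard_normal(n)
print(P @ h)                            # i.e., sqrt(W) A^T (A W A^T)^{-1} A sqrt(W) h

# sanity check: P is an orthogonal projection, so P^2 = P and P = P^T
assert np.allclose(P @ P, P) and np.allclose(P, P.T)
```

The point of the data structure problem is precisely to avoid recomputing $P$ from scratch (an $O(n^\omega)$-type cost here) every time $W$ changes slightly.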
In this work, we consider a specific type of projection maintenance problem. Concretely, our matrix is $B = A(W \otimes I)$ or $B = A(W^{1/2} \otimes W^{1/2})$, where $W \in \mathbb{R}^{n\times n}$ is a positive semidefinite matrix and $A \in \mathbb{R}^{m\times n^2}$ is a matrix whose $i$-th row is the vectorization of an $n\times n$ matrix. We call the problem of maintaining this kind of matrix the Dynamic Kronecker Product Projection Maintenance Problem. Maintaining the Kronecker product projection matrix has important implications for solving semi-definite programs using the interior point method [JKL+20, HJS+22b, HJS+22a, GS22]. Specifically, one has $m$ constraint matrices $A_1, \ldots, A_m \in \mathbb{R}^{n\times n}$, and $A$ is constructed by taking the vectorization of each $A_i$ as its rows. The matrix $W$ is typically the complementary slack of the program. In many cases, the constraint matrices $A_1, \ldots, A_m$ are simultaneously diagonalizable, meaning that there exists a matrix $P$ such that $P^\top A_i P$ is diagonal for every $i$. This leads to the matrix $W$ also being simultaneously diagonalizable, which implies potentially faster algorithms for this kind of SDP [JL16]. We study this setting in a more abstract and general fashion: suppose $A_1, \ldots, A_m$ are fixed, and the sequence of update matrices $W^{(0)}, W^{(1)}, \ldots, W^{(T)} \in \mathbb{R}^{n\times n}$ share the same eigenbasis.
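For intuition on the Kronecker structure: with column-major vectorization, the $i$-th row of $B = A(W \otimes I)$ equals $\mathrm{vec}(A_i W)^\top$, so $B$ can be formed from $m$ products of $n\times n$ matrices without ever materializing the $n^2\times n^2$ matrix $W \otimes I$. A quick NumPy check of this identity (a sketch of ours, not code from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 4
W = rng.standard_normal((n, n)); W = W @ W.T      # symmetric PSD weight matrix
As = [rng.standard_normal((n, n)) for _ in range(m)]

vec = lambda X: X.flatten(order="F")              # column-major vectorization
A = np.stack([vec(Ai) for Ai in As])              # A in R^{m x n^2}, rows = vec(A_i)

# direct: materialize W kron I (an n^2 x n^2 matrix) and multiply
B_direct = A @ np.kron(W, np.eye(n))

# implicit: row i of A (W kron I) equals vec(A_i W)
B_implicit = np.stack([vec(Ai @ W) for Ai in As])

assert np.allclose(B_direct, B_implicit)
```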
The data structure design problem can be decomposed into two parts: (1) how to update the projection matrix quickly, and (2) how to answer the query efficiently. For the update portion, we leverage the fact that the updates are relatively small. Hence, by updating the inverse part of the projection in a lazy fashion, we can give a fast update algorithm.
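One standard ingredient for such lazy updates is the Woodbury identity, which refreshes an inverse after a low-rank change far more cheaply than re-inverting. Below is a minimal NumPy sketch under our own naming, not the paper's actual update routine: it maintains $M = S^{-1}$ under the update $S \mapsto S + UCV^\top$.

```python
import numpy as np

def woodbury_update(M, U, C, V):
    """Given M = S^{-1}, return (S + U C V^T)^{-1} via the Woodbury identity.

    For U, V of shape (m, k), the cost is O(m^2 k) plus a k x k inversion,
    instead of the O(m^3) cost of re-inverting from scratch.
    """
    MU = M @ U                                          # (m, k)
    core = np.linalg.inv(np.linalg.inv(C) + V.T @ MU)   # (k, k)
    return M - MU @ core @ (V.T @ M)

# toy check: rank-1 perturbation of S = B B^T
rng = np.random.default_rng(0)
B = rng.standard_normal((50, 200))
S = B @ B.T
M = np.linalg.inv(S)
u = rng.standard_normal((50, 1))
M_new = woodbury_update(M, u, np.eye(1), u)             # inverse of S + u u^T
assert np.allclose(M_new, np.linalg.inv(S + u @ u.T))
```

In the lazy scheme, small changes are buffered and only folded into the inverse (via such low-rank corrections, or a full recomputation) once the accumulated change is large enough; the amortized cost is what the analysis tracks.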
For the query portion, the problem is similar to the Online Matrix-Vector Multiplication Problem [HKNS15, LW17, CKL18], but with a changing matrix to multiply. To speed up this process, prior works either use importance sampling to sparsify the vector $h^{(t)}$, or use sketching matrices to reduce the dimension. For the latter, the idea is to prepare multiple sketching matrices beforehand and batch-multiply them with the initial projection matrix $P^{(0)}$. Let $R \in \mathbb{R}^{n^2\times Tb}$ denote the batched sketching matrices, where $b$ is the sketching dimension of each matrix; one prepares the matrix $PR$. At each query phase, one only needs to use one sketching matrix $R^{(t)}$ and compute $(P(R^{(t)})^\top) R^{(t)} h$. By computing this product from right to left, we effectively reduce the running time from $O(n^4)$ to $O(n^2 b)$. One significant drawback of this approach is that if the number of iterations $T$ is large, then the preprocessing and update phases become less efficient due to the sheer number of sketching matrices.
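A minimal sketch of the right-to-left evaluation (our own toy code, with a Gaussian sketch standing in for whatever sketch family is actually used): precompute $PR^\top$ once, then each query only touches an $n^2 \times b$ matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
N, b = 400, 25            # N plays the role of n^2; b is the sketching dimension

P = rng.standard_normal((N, N)); P = P @ P.T    # stand-in for the maintained matrix
R = rng.standard_normal((b, N)) / np.sqrt(b)    # one Gaussian sketching matrix

PRt = P @ R.T             # precomputed once, cost O(N^2 b)

h = rng.standard_normal(N)
approx = PRt @ (R @ h)    # query cost O(N b): right-to-left evaluation
exact = P @ h             # query cost O(N^2)

# E[R^T R] = I, so the sketched answer is an unbiased estimate of P h
print(np.linalg.norm(approx - exact) / np.linalg.norm(exact))
```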
We observe that the fundamental reason that applying one uniform sketch for all iterations fails is the dependence between the current output and all previous inputs: when using the projection maintenance data structure in an iterative process, the new input query is typically formed by a combination of the previous outputs. This means that our data structure should be robust against an adaptive adversary. Such an adversary can infer the randomness from observing the output of the data structure and design new inputs to the data structure. Prior works combat this issue by using a uniformly-chosen sketching matrix that won't be used again.
To make our data structure robust against an adaptive adversary while reducing the number of sketches used, we adapt the differential privacy framework of the fantastic work [BKM+22]. Given a data structure that outputs a real number and works against an oblivious adversary, the pioneering work [BKM+22] proves that it suffices to use $\widetilde{O}(\sqrt{T})$ data structures instead of $T$ to handle an adaptive adversary, while the runtime only blows up by a polylogarithmic factor. However, their result is not immediately useful for our applications, since we need an approximate vector with $n^2$ entries. We generalize their result to $n^2$ dimensions by applying the strong composition theorem, which gives rise to $\widetilde{O}(n\sqrt{T})$ data structures. While not directly applicable to the SDP problem, we hope the differential privacy framework we develop could be useful for applications where $n < \sqrt{T}$, i.e., problems that require a large number of iterations. Due to the tightness of strong composition [KOV15], we conjecture that our result is essentially tight. If one wants to remove the $n$ dependence in the number of sketches, one might need to resort to much more sophisticated machinery, such as differentially private mean estimation in nearly-linear time. We further abstract the result as a generic set query data structure, in which the number of sketches required scales with the number of coordinates one wants to output.
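The following toy sketch conveys only the flavor of the [BKM+22]-style reduction, not its actual mechanism: maintain several independent oblivious copies and answer each query with a noisy aggregate of a random subsample, so that the adversary's observations reveal little about any single copy's randomness. All class names, constants, and the choice of noisy median are our own illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

class AdaptiveWrapper:
    """Toy illustration: k independent oblivious copies, each query answered
    by a Laplace-noised median of a random subsample of the copies."""

    def __init__(self, oblivious_copies, sample_size, noise_scale):
        self.copies = oblivious_copies      # list of callables h -> float
        self.sample_size = sample_size
        self.noise_scale = noise_scale

    def query(self, h):
        idx = rng.choice(len(self.copies), size=self.sample_size, replace=False)
        answers = np.array([self.copies[i](h) for i in idx])
        # noisy aggregation limits how much the output leaks about any one copy
        return np.median(answers) + rng.laplace(scale=self.noise_scale)

# toy usage: each "copy" estimates <p, h> with its own independent sketch
p = rng.standard_normal(100)
def make_copy():
    S = rng.standard_normal((20, 100)) / np.sqrt(20)
    Sp = S @ p
    return lambda h: float(Sp @ (S @ h))

ds = AdaptiveWrapper([make_copy() for _ in range(32)], sample_size=8, noise_scale=0.05)
print(ds.query(rng.standard_normal(100)))
```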
Nevertheless, we develop a primal-dual framework based on lazy updates and amortization that improves upon the current state-of-the-art general SDP solver [HJS+22b] for simultaneously diagonalizable constraints under a wide range of parameters.
We start by defining notation to simplify the discussion.
Definition 1.1 (Time complexity for preprocessing, update and query). Let $B^{(0)}, B^{(1)}, \ldots, B^{(T)} \in \mathbb{R}^{m\times n}$ be an online sequence of matrices and $h^{(1)}, \ldots, h^{(T)} \in \mathbb{R}^n$ be an online sequence of vectors.

• We define $\mathcal{T}_{\mathrm{prep}}(m, n, s, b)$ as the preprocessing time of a dynamic data structure with input matrix $B \in \mathbb{R}^{m\times n}$, sketching dimension $b$ and number of sketches $s$.

• We define $\mathcal{T}_{\mathrm{update}}(m, n, s, b, \varepsilon)$ as the update time of a dynamic data structure with update matrix $B^{(t)} \in \mathbb{R}^{m\times n}$, sketching dimension $b$ and number of sketches $s$. This operation should update the projection to a $(1\pm\varepsilon)$ spectral approximation.

• We define $\mathcal{T}_{\mathrm{query}}(m, n, b, \varepsilon, \delta)$ as the query time of a dynamic data structure with query vector $h \in \mathbb{R}^n$, sketching dimension $b$ and number of sketches $s$. This operation should output a vector $\widetilde{h}^{(t)} \in \mathbb{R}^n$ such that for any $i \in [n]$, $|\widetilde{h}^{(t)}_i - (P^{(t)}h^{(t)})_i| \le \varepsilon \|h^{(t)}\|_2$ with probability at least $1 - \delta$.
Now, we turn to the harder problem of maintaining a Kronecker product projection matrix. While the update matrices are positive semi-definite instead of diagonal, we note that they still share the same eigenbasis. Essentially, the updates are a sequence of diagonal eigenvalue matrices, and we can leverage techniques from prior works [CLS19, LSZ19, SY21] with lazy updates. Our data structure also relies on sketches to speed up the query phase. We present an informal version of our result below.
Theorem 1.2. Let $A \in \mathbb{R}^{m\times n^2}$ and $B = A(W \otimes I)$ or $B = A(W^{1/2} \otimes W^{1/2})$. Let $R \in \mathbb{R}^{sb\times n^2}$ be a batch of $s$ sketching matrices, each with sketching dimension $b$. Given a sequence of online matrices $W^{(1)}, \ldots, W^{(T)} \in \mathbb{R}^{n\times n}$, where $W^{(t)} = U\Lambda^{(t)}U^\top$, and online vectors $h^{(1)}, \ldots, h^{(T)} \in \mathbb{R}^{n^2}$, the data structure has the following operations:

• Init: the data structure preprocesses and generates an initial projection matrix in time $mn^\omega + m^\omega + \mathcal{T}_{\mathrm{mat}}(m, m, n^2) + \mathcal{T}_{\mathrm{mat}}(n^2, n^2, sb)$.

• Update(W): the data structure updates and maintains a projection $\widetilde{P}$ such that
$(1-\varepsilon)\cdot P \preceq \widetilde{P} \preceq (1+\varepsilon)\cdot P$,
where $P$ is the projection matrix updated by $W$. Moreover, if
$\sum_{i=1}^{n} (\ln \lambda_i(W) - \ln \lambda_i(W^{\mathrm{old}}))^2 \le C^2$,
the expected time per call of Update(W) is
$C^2 \cdot \max\{n^{f(a,c)+\omega-2.5} + n^{f(a,c)-a/2},\; n^{f(a,c)+\omega-4.5}\cdot sb + n^{f(a,c)-2-a/2}\cdot sb\}$.

• Query(h): the data structure outputs $\widetilde{P} R_l^\top R_l \cdot h$. If $\mathrm{nnz}(U) = O(n^{1.5+a/2})$, then it takes time $n^{3+a} + n^2 b$.

Here, $a \in (0,1)$ is a parameter that can be chosen and $f(a, c) \in [4,5)$ is a function defined as in Def. B.14.
For the sake of discussion, let us consider the parameter setting
$m = n^2$, $b = \widetilde{O}(n^{1.5})$, $\omega = 2$, $f(a,c) = 4$, $T = m^{1/4}$
and $C^2 = O(1)$. Then the running time simplifies to

• Preprocessing in $m^\omega$ time.
• Update in $m^{1.75} + m^{2-a/4}$ time.
• Query in $m^{1.5+a/2}$ time.

Since there are $m^{1/4}$ iterations in total, as long as we ensure $a < 1$, we obtain an algorithm with an overall runtime of $m^\omega + o(m^{2+1/4})$, which presents an improvement over [HJS+22b].
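As a quick sanity check on the exponent arithmetic above (our own transcription of the stated per-phase costs, all in exponents of $m$ with $a$ free):

```python
# Parameter setting m = n^2, omega = 2, T = m^(1/4); all costs as exponents of m.
for a in [0.25, 0.5, 0.75, 0.99]:
    update_exp = max(1.75, 2 - a / 4)              # per-iteration update cost
    query_exp = 1.5 + a / 2                        # per-iteration query cost
    total_exp = 0.25 + max(update_exp, query_exp)  # m^(1/4) iterations in total
    print(f"a={a}: total exponent {total_exp:.4f} (below 2.25: {total_exp < 2.25})")
```

For every $a \in (0,1)$ the total exponent stays strictly below $2.25$, matching the claimed $o(m^{2+1/4})$ bound, and it approaches $2.25$ as $a \to 1$, which is why the constraint $a < 1$ is needed.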
Roadmap. In Section 2, we present related work on sketching, differential privacy and projection maintenance. In Section 3, we provide the basic notation of this paper, preliminaries on Kronecker product calculations, and the formulation of the data structure design problem. In Section 4, we provide an overview of the techniques used in this paper. In Section 5, we provide the description of our algorithm, the running time analysis, and the proof of correctness of the Kronecker product maintenance data structure. In Section 6, we present the definition of the set query problem, the set query estimation data structure, and its analysis of correctness and approximation guarantee.
2 Related Work
Sketching. Sketching is a fundamental tool and has many applications in machine learning and beyond, such as linear regression, low-rank approximation [CW13, NN13, MM13, BW14, RSW16, SWZ17, ALS+18, SWZ19], distributed problems [WZ16, BWZ16], reinforcement learning [WZD+20], projected gradient descent [XSS21], tensor decomposition [SWZ19], clustering [EMZ21], signal interpolation [SSWZ22], distance oracles [DSWZ22], generative adversarial networks [XZZ18], training neural networks [LSS+20, BPSW21, SYZ21, SZZ21, HSWZ22], matrix completion [GSYZ23], matrix sensing [DLS23b, QSZ23], attention scheme inspired regression [LSZ23, DLS23a, LSX+23, GSY23b, GMS23], sparsification of attention matrices [DMS23], discrepancy minimization [DSW22], the dynamic tensor regression problem [RSZ22], John Ellipsoid computation [SYYZ22], NLP tasks [LSW+20], and total least squares regression [DSWY19].
Differential Privacy. First introduced in [DKM+06], differential privacy has been playing an important role in providing theoretical privacy guarantees for numerous algorithms [DR14], for example, robustly learning a mixture of Gaussians [KSSU19], hypothesis selection [BKSW21], hyperparameter selection [MSH+22], convex optimization [KLZ22], first-order methods [SVK21] and mean estimation [KSU20, HKM22]. Techniques from differential privacy are also widely studied and applied in machine learning [CM08, WM10, JE19, TF20], deep neural networks [ACG+16, BPS19], computer vision [ZYCW20, LWAFF21, TKP19], natural language processing [YDW+21, WK18], large language models [GSY23a, YNB+22], label protection in split learning [YSY+22], multiple data release [WYY+22], federated learning [SYY+22] and peer review [DKWS22]. Recent works also show that robust statistical estimators imply differential privacy [HKMN23].
Projection Maintenance Data Structure. The design of efficient projection maintenance data structures is a core step in many optimization problems, such as linear programming [Vai89b, CLS19, Son19, LSZ19, Bra20, SY21, JSWZ21, BLSS20, Ye20, DLY21, GS22], the cutting plane method [Vai89a, JLSW20], integral minimization [JLSZ23], empirical risk minimization [LSZ19, QSZZ23], semidefinite programming [JLSW20, JKL+20, HJS+22b, HJS+22a, GS22], dynamic least-squares regression [JPW23], large language models [BSZ23], and sum-of-squares optimization [JNW22].
3 Preliminaries & Problem Formulation
In Section 3.1, we introduce the basic notations that we will use in the remainder of the paper. In
Section 3.2, we describe the data structure design problem of our paper.
3.1 Notations and Basic Definitions.
For any integer $n > 0$, let $[n]$ denote the set $\{1, 2, \cdots, n\}$. Let $\Pr[\cdot]$ denote probability and $\mathbb{E}[\cdot]$ denote expectation. We use $\|x\|_2$ to denote the $\ell_2$ norm of a vector $x$. We use $\mathcal{N}(\mu, \sigma^2)$ to denote the Gaussian distribution with mean $\mu$ and variance $\sigma^2$. We use $\widetilde{O}(f(n))$ to denote $O(f(n)\cdot \mathrm{poly}\log(f(n)))$. We use $\mathcal{T}_{\mathrm{mat}}(m, n, k)$ to denote the time for multiplying a matrix of dimension $m\times n$ with a matrix of dimension $n\times k$. We denote by $\omega \approx 2.38$ the current matrix multiplication exponent, i.e., $\mathcal{T}_{\mathrm{mat}}(n, n, n) = n^\omega$. We denote by $\alpha \approx 0.31$ the dual exponent of matrix multiplication.

We use $\|A\|$ and $\|A\|_F$ to denote the spectral norm and the Frobenius norm of a matrix $A$, respectively. We use $A^\top$ to denote the transpose of matrix $A$. We use $I_m$ to denote the identity matrix of size $m\times m$. For $\alpha$ being a vector or matrix, we use $\|\alpha\|_0$ to denote the number of nonzero entries of $\alpha$. Given a real square matrix $A$, we use $\lambda_{\max}(A)$ and $\lambda_{\min}(A)$ to denote its largest and smallest eigenvalues, respectively. Given a real matrix $A$, we use $\sigma_{\max}(A)$ and $\sigma_{\min}(A)$ to denote its largest and smallest singular values, respectively. We use $A^{-1}$ to denote the matrix inverse of $A$. For a square matrix $A$, we use $\mathrm{tr}[A]$ to denote the trace of $A$.