[EMZ21], signal interpolation [SSWZ22], distance oracles [DSWZ22], generative adversarial networks [XZZ18], training neural networks [LSS+20, BPSW21, SYZ21, SZZ21, HSWZ22], matrix completion [GSYZ23], matrix sensing [DLS23b, QSZ23], attention scheme inspired regression [LSZ23, DLS23a, LSX+23, GSY23b, GMS23], sparsification of attention matrices [DMS23], discrepancy minimization [DSW22], the dynamic tensor regression problem [RSZ22], John Ellipsoid computation [SYYZ22], NLP tasks [LSW+20], and total least squares regression [DSWY19].
Differential Privacy. First introduced in [DKM+06], differential privacy has played an important role in providing theoretical privacy guarantees for numerous algorithms [DR14], for example, robustly learning a mixture of Gaussians [KSSU19], hypothesis selection [BKSW21], hyperparameter selection [MSH+22], convex optimization [KLZ22], first-order methods [SVK21], and mean estimation [KSU20, HKM22]. Techniques from differential privacy are also widely studied and applied in machine learning [CM08, WM10, JE19, TF20], deep neural networks [ACG+16, BPS19], computer vision [ZYCW20, LWAFF21, TKP19], natural language processing [YDW+21, WK18], large language models [GSY23a, YNB+22], label protection in split learning [YSY+22], multiple data release [WYY+22], federated learning [SYY+22], and peer review [DKWS22]. Recent works also show that robust statistical estimators imply differential privacy [HKMN23].
Projection Maintenance Data Structure. The design of efficient projection maintenance data structures is a core step in many optimization problems, such as linear programming [Vai89b, CLS19, Son19, LSZ19, Bra20, SY21, JSWZ21, BLSS20, Ye20, DLY21, GS22], the cutting plane method [Vai89a, JLSW20], integral minimization [JLSZ23], empirical risk minimization [LSZ19, QSZZ23], semidefinite programming [JLSW20, JKL+20, HJS+22b, HJS+22a, GS22], dynamic least-squares regression [JPW23], large language models [BSZ23], and sum-of-squares optimization [JNW22].
3 Preliminaries & Problem Formulation
In Section 3.1, we introduce the basic notation that we use throughout the remainder of the paper. In Section 3.2, we describe the data structure design problem studied in this paper.
3.1 Notations and Basic Definitions.
For any integer $n > 0$, let $[n]$ denote the set $\{1, 2, \cdots, n\}$. Let $\Pr[\cdot]$ denote probability and $\mathbb{E}[\cdot]$ denote expectation. We use $\|x\|_2$ to denote the $\ell_2$ norm of a vector $x$. We use $\mathcal{N}(\mu, \sigma^2)$ to denote the Gaussian distribution with mean $\mu$ and variance $\sigma^2$. We use $\widetilde{O}(f(n))$ to denote $O(f(n) \cdot \operatorname{poly}\log(f(n)))$. We use $\mathcal{T}_{\mathrm{mat}}(m, n, k)$ to denote the time of multiplying an $m \times n$ matrix with an $n \times k$ matrix. We denote by $\omega \approx 2.38$ the current matrix multiplication exponent, i.e., $\mathcal{T}_{\mathrm{mat}}(n, n, n) = n^{\omega}$. We denote by $\alpha \approx 0.31$ the dual exponent of matrix multiplication.
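As a quick reference, these conventions can be summarized in display form. The sup-characterization of the dual exponent $\alpha$ below is not stated explicitly in the text above; it is included here only as an informal reminder of the usual convention for rectangular matrix multiplication, not as a definition taken from this paper.
\begin{align*}
\widetilde{O}(f(n)) & := O\big(f(n) \cdot \operatorname{poly}\log(f(n))\big), \\
\mathcal{T}_{\mathrm{mat}}(n, n, n) & = n^{\omega}, \qquad \omega \approx 2.38, \\
\alpha & := \sup\big\{ a \geq 0 \;:\; \mathcal{T}_{\mathrm{mat}}(n, n^{a}, n) = n^{2 + o(1)} \big\}, \qquad \alpha \approx 0.31.
\end{align*}
In particular, for any $a \leq \alpha$, multiplying an $n \times n^{a}$ matrix with an $n^{a} \times n$ matrix takes essentially quadratic time, which is the regime typically exploited in the projection maintenance literature cited above.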
We use $\|A\|$ and $\|A\|_F$ to denote the spectral norm and the Frobenius norm of matrix $A$, respectively. We use $A^\top$ to denote the transpose of matrix $A$. We use $I_m$ to denote the identity matrix of size $m \times m$. For $\alpha$ being a vector or matrix, we use $\|\alpha\|_0$ to denote the number of nonzero entries of $\alpha$. Given a real square matrix $A$, we use $\lambda_{\max}(A)$ and $\lambda_{\min}(A)$ to denote its largest and smallest eigenvalues, respectively. Given a real matrix $A$, we use $\sigma_{\max}(A)$ and $\sigma_{\min}(A)$ to denote its largest and smallest singular values, respectively. We use $A^{-1}$ to denote the matrix inverse of matrix $A$. For a square matrix $A$, we use $\operatorname{tr}[A]$ to denote the trace of $A$. We use $b$ and