Faster Linear Algebra for Distance Matrices
Piotr Indyk
MIT
indyk@mit.edu
Sandeep Silwal
MIT
silwal@mit.edu
Abstract
The distance matrix of a dataset $X$ of $n$ points with respect to a distance function $f$ represents all pairwise distances between points in $X$ induced by $f$. Due to their wide applicability, distance matrices and related families of matrices have been the focus of many recent algorithmic works. We continue this line of research and take a broad view of algorithm design for distance matrices with the goal of designing fast algorithms, which are specifically tailored for distance matrices, for fundamental linear algebraic primitives. Our results include efficient algorithms for computing matrix-vector products for a wide class of distance matrices, such as the $\ell_1$ metric for which we get a linear runtime, as well as an $\Omega(n^2)$ lower bound for any algorithm which computes a matrix-vector product for the $\ell_\infty$ case, showing a separation between the $\ell_1$ and $\ell_\infty$ metrics. Our upper bound results, in conjunction with recent works on the matrix-vector query model, have many further downstream applications, including the fastest algorithm for computing a relative error low-rank approximation for the distance matrices induced by the $\ell_1$ and $\ell_2^2$ functions and the fastest algorithm for computing an additive error low-rank approximation for the $\ell_2$ metric, in addition to applications for fast matrix multiplication among others. We also give algorithms for constructing distance matrices and show that one can construct an approximate $\ell_2$ distance matrix in time faster than the bound implied by the Johnson-Lindenstrauss lemma.
1 Introduction
Given a set of $n$ points $X = \{x_1, \ldots, x_n\}$, the distance matrix of $X$ with respect to a distance function $f$ is defined as the $n \times n$ matrix $A$ satisfying $A_{i,j} = f(x_i, x_j)$. Distance matrices are ubiquitous objects arising in various applications, ranging from learning image manifolds [TSL00, WS06], signal processing [SY07], and biological analysis [HS93] to non-linear dimensionality reduction [Kru64, Kru78, TSL00, CC08], to name a few¹. Unfortunately, explicitly computing and storing $A$ requires at least $\Omega(n^2)$ time and space. Such complexities are prohibitive for scaling to large datasets.
A silver lining is that in many settings, the matrix $A$ is not explicitly required. Indeed, in many applications it suffices to compute some underlying function or property of $A$, such as the eigenvalues and eigenvectors of $A$ or a low-rank approximation of $A$. Thus an algorithm designer can hope to use the special geometric structure encoded by $A$ to design faster algorithms tailored for such tasks.
Therefore, it is not surprising that many recent works explicitly take advantage of the underlying geometric
structure of distance matrices, and other related families of matrices, to design fast algorithms (see Section
1.2 for a thorough discussion of prior works). In this work, we continue this line of research and take a broad
view of algorithm design for distance matrices. Our main motivating question is the following:
Can we design algorithms for fundamental linear algebraic primitives which are specifically tailored
for distance matrices and related families of matrices?
We make progress towards the motivating question by studying three of the most fundamental primitives
in algorithmic linear algebra. Specifically:
¹We refer the reader to the survey [DPRV15] for a more thorough discussion of applications of distance matrices.
1. We study upper and lower bounds for computing matrix-vector products for a wide array of distance matrices,
2. We give algorithms for multiplying distance matrices faster than general matrices, and,
3. We give fast algorithms for constructing distance matrices.
1.1 Our Results
We now describe our contributions in more detail.
1. We study upper and lower bounds for constructing matrix-vector queries for a wide array of distance matrices.
A matrix-vector query algorithm accepts a vector $z$ as input and outputs the vector $Az$. There is substantial motivation for studying such queries. Indeed, there is now a rich literature on fundamental linear algebra algorithms in the “matrix-free” or “implicit” model. These algorithms only assume access to the underlying matrix via matrix-vector queries. Some well-known algorithms in this model include the power method for computing eigenvalues and the conjugate gradient method for solving a system of linear equations. For many fundamental functions of $A$, nearly optimal bounds in terms of the number of queries have been achieved [MM15, BHSW20, BCW22]. Furthermore, having access to matrix-vector queries also allows the simulation of any randomized sketching algorithm, a well-studied algorithmic paradigm in its own right [Woo14]. This is because randomized sketching algorithms operate on the matrix $\Pi A$ or $A\Pi$, where $\Pi$ is a suitably chosen random matrix, such as a Gaussian matrix. Typically, $\Pi$ is chosen so that the sketches $\Pi A$ or $A\Pi$ have significantly smaller row or column dimension compared to $A$. If $A$ is symmetric, we can easily acquire both types of matrix sketches via a small number of matrix-vector queries.
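For illustration, here is a minimal sketch of the power method in this model, where `matvec` is an illustrative callback computing $z \mapsto Az$ (the interface is ours, not specified in the paper):

```python
import numpy as np

def power_method(matvec, n, iters=100, seed=0):
    """Approximate the top eigenpair of a symmetric n x n matrix that is
    accessible only through `matvec: z -> Az` (the matrix-free model)."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = matvec(v)                 # the only access to the matrix
        v = w / np.linalg.norm(w)
    return v @ matvec(v), v           # Rayleigh quotient, eigenvector
```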
Therefore, creating efficient versions of matrix-vector queries for distance matrices automatically lends itself to many further downstream applications. We remark that our algorithms can access the set of input points but do not explicitly create the distance matrix. A canonical example of our upper bound results is the construction of matrix-vector queries for the function $f(x, y) = \|x - y\|_p^p$.
Theorem 1.1. Let $p \ge 1$ be an integer. Suppose we are given a dataset of $n$ points $X = \{x_1, \ldots, x_n\} \subset \mathbb{R}^d$. $X$ implicitly defines the matrix $A_{i,j} = \|x_i - x_j\|_p^p$. Given a query $z \in \mathbb{R}^n$, we can compute $Az$ exactly in time $O(ndp)$. If $p$ is odd, we also require $O(nd \log n)$ preprocessing time.
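For illustration, the following NumPy sketch realizes the $p = 1$ case along the lines suggested by the theorem: sort each coordinate once in $O(nd \log n)$ preprocessing, then answer each query in $O(nd)$ using prefix sums. The interface and variable names are illustrative rather than taken from the paper.

```python
import numpy as np

class L1MatVec:
    """Exact products Az for A[i, j] = ||x_i - x_j||_1 without forming A.
    Preprocessing: O(nd log n) (one argsort per coordinate).
    Each query: O(nd)."""

    def __init__(self, X):
        self.X = np.asarray(X, dtype=float)        # n x d point set
        self.perm = np.argsort(self.X, axis=0)     # sorted order per coordinate

    def matvec(self, z):
        out = np.zeros(self.X.shape[0])
        for t in range(self.X.shape[1]):
            p = self.perm[:, t]
            xs, zs = self.X[p, t], z[p]            # coordinate t in sorted order
            Z, W = np.cumsum(zs), np.cumsum(zs * xs)
            # Splitting sum_j z_j |x_i(t) - x_j(t)| into the points below and
            # above x_i(t) in sorted order gives, at sorted position k:
            #   x_i(t) * (2 Z[k] - Z[-1]) - 2 W[k] + W[-1]
            out[p] += xs * (2 * Z - Z[-1]) - 2 * W + W[-1]
        return out

# Sanity check against the explicitly constructed matrix:
# X = np.random.randn(50, 3); z = np.random.randn(50)
# A = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
# assert np.allclose(L1MatVec(X).matvec(z), A @ z)
```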
We give similar guarantees for a wide array of functions $f$, and we refer the reader to Table 1, which summarizes our matrix-vector query upper bound results. Note that some of the functions $f$ we study in Table 1 do not necessarily induce a metric in the strict mathematical sense (for example, the function $f(x, y) = \|x - y\|_2^2$ does not satisfy the triangle inequality). Nevertheless, we still refer to such functions under the broad umbrella term of “distance functions” for ease of notation. We always explicitly state the function $f$ we are referring to.

Crucially, most of our bounds have a linear dependency on $n$, which allows for scalable computation as the size of the dataset $X$ grows. Our upper bounds are optimal in many cases; see Theorem A.13.
Combining our upper bound results with optimized matrix-free methods, immediate corollaries of our results include faster algorithms for eigenvalue and singular value computations and low-rank approximations. Low-rank approximation is of special interest as it has been widely studied for distance matrices; for low-rank approximation, our bounds outperform prior results for specific distance functions. For example, for the $\ell_1$ and $\ell_2^2$ cases (and for PSD matrices in general), [BCW20] showed that a rank-$k$ approximation can be found in time $\tilde{O}(ndk/\varepsilon + nk^{\omega-1}/\varepsilon^{\omega-1})$. This bound has extra $\mathrm{poly}(1/\varepsilon)$ overhead compared to our bound stated in Table 2. The work of [IVWW19] has a worse $\mathrm{poly}(k, 1/\varepsilon)$ overhead for an additive error approximation for the $\ell_2$ case. See Section 1.2 for further discussion of prior works. The downstream applications of matrix-vector queries are summarized in Table 2.
We also study fundamental limits for any upper bound algorithms. In particular, we show that no algorithm can compute a matrix-vector query for general inputs for the $\ell_\infty$ metric in subquadratic time, assuming a standard complexity-theoretic assumption called the Strong Exponential Time Hypothesis (SETH) [IP01, IPZ01].

| Function | $f(x, y)$ | Preprocessing | Query Time | Reference |
|---|---|---|---|---|
| $\ell_p^p$ for $p$ even | $\lVert x - y \rVert_p^p$ | — | $O(ndp)$ | Thms. A.1 / A.3 |
| $\ell_p^p$ for $p$ odd | $\lVert x - y \rVert_p^p$ | $O(nd \log n)$ | $O(ndp)$ | Thms. 2.2 / A.4 |
| Mixed $\ell_\infty$ | $\max_{i,j} \lvert x_i - y_j \rvert$ | $O(nd \log n)$ | $O(n^2)$ | Thm. A.5 |
| Mahalanobis distance² | $x^T M y$ | $O(nd^2)$ | $O(nd)$ | Thm. A.6 |
| Polynomial kernel | $\langle x, y \rangle^p$ | — | $O(ndp)$ | Thm. A.7 |
| Total variation distance | $\mathrm{TV}(x, y)$ | $O(nd \log n)$ | $O(nd)$ | Thm. A.8 |
| KL divergence | $D_{\mathrm{KL}}(x \Vert y)$ | — | $O(nd)$ | Thm. A.2 |
| Symmetric divergence | $D_{\mathrm{KL}}(x \Vert y) + D_{\mathrm{KL}}(y \Vert x)$ | — | $O(nd)$ | Thm. A.9 |
| Cross entropy | $H(x, y)$ | — | $O(nd)$ | Thm. A.9 |
| Hellinger distance² | $\sum_{i=1}^{d} \sqrt{x(i)\,y(i)}$ | — | $O(nd)$ | Thm. A.10 |

Table 1: A summary of our results for exact matrix-vector queries.
Theorem 1.2. For any $\alpha > 0$ and $d = \omega(\log n)$, any algorithm for exactly computing $Az$ for any input $z$, where $A$ is the $\ell_\infty$ distance matrix, requires $\Omega(n^{2-\alpha})$ time (assuming SETH).
This shows a separation between the functions listed in Table 1 and the $\ell_\infty$ metric. Surprisingly, we can answer approximate matrix-vector queries for $\ell_\infty$ substantially faster.
Theorem 1.3. Suppose $X \subseteq \{0, 1, \ldots, O(1)\}^d$. We can compute $By$ in time $O\big(n \cdot d^{O(\sqrt{d}\,\log(d/\varepsilon))}\big)$, where $\|A - B\|_\infty \le \varepsilon$.
To put the above result into context, the lower bound of Theorem 1.2 holds for point sets in $\{0, 1, 2\}^d$ in $d = \omega(\log n)$ dimensions. In contrast, if we relax to an approximation guarantee, we can obtain a subquadratic-time algorithm for $d$ up to $\Theta(\log^2(n)/\log\log(n))$.
Finally, we provide a general understanding of the limits of our upper bound techniques. In Theorem B.1, we show that essentially the only $f$ for which our upper bound techniques apply are those with a “linear structure” after a suitable transformation. We refer to Appendix Section B for details.
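For illustration (our own example, not a statement of Theorem B.1): the KL divergence row of Table 1 exhibits exactly such linear structure, since $A_{i,j} = \sum_t x_i(t)\log\big(x_i(t)/x_j(t)\big)$ decomposes as $A = c\mathbf{1}^T - XL^T$, with $c_i = \sum_t x_i(t)\log x_i(t)$ and $L_{j,t} = \log x_j(t)$, giving an $O(nd)$ matrix-vector product:

```python
import numpy as np

def kl_matvec(X, z):
    """Az for A[i, j] = KL(x_i || x_j), where rows of X are strictly positive
    probability vectors. Uses A = c 1^T - X L^T, which turns the naive
    O(n^2 d) product into O(nd)."""
    L = np.log(X)                     # L[j, t] = log x_j(t)
    c = np.sum(X * L, axis=1)         # c_i = sum_t x_i(t) log x_i(t)
    return c * z.sum() - X @ (L.T @ z)
```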
2. We give algorithms for multiplying distance matrices faster than general matrices.
Fast matrix-vector queries also automatically imply fast matrix multiplication, since matrix multiplication can be reduced to a series of matrix-vector queries. For concreteness, if $f$ is the $\ell_p^p$ function which induces $A$, and $B$ is any $n \times n$ matrix, we can compute $AB$ in time $O(n^2 dp)$. This is substantially faster than the general matrix multiplication bound of $n^\omega \approx n^{2.37}$. We also give an improvement of this result for the case where we are multiplying two distance matrices arising from $\ell_2^2$. See Table 2 for a summary.
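The reduction is simply one matrix-vector query per column of $B$; a minimal sketch, assuming a `matvec` callback as above:

```python
import numpy as np

def multiply_via_queries(matvec, B):
    """A @ B using one matvec query per column of B. With a T-time query
    this costs O(Tn); e.g. T = O(ndp) for l_p^p gives O(n^2 dp) overall."""
    return np.column_stack([matvec(B[:, j]) for j in range(B.shape[1])])
```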
3. We give fast algorithms for constructing distance matrices.
Finally, we give fast algorithms for constructing approximate distance matrices. To establish some context, recall the classical Johnson-Lindenstrauss (JL) lemma, which (roughly) states that a random projection of a dataset $X \subset \mathbb{R}^d$ of size $n$ onto a dimension of size $O(\log n)$ approximately preserves all pairwise distances [JL84]. A common application of this lemma is to instantiate the $\ell_2$ distance matrix. A naive algorithm which computes the distance matrix after performing the JL projection requires approximately $O(n^2 \log n)$ time. Surprisingly, we show that the JL lemma is not tight with respect to creating an approximate $\ell_2$ distance matrix; we show that one can construct the $\ell_2$ distance matrix in an asymptotically better runtime.
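For context, the naive JL-based baseline looks as follows (a sketch; the target dimension and its constant are illustrative):

```python
import numpy as np

def jl_l2_distance_matrix(X, eps, seed=0):
    """Naive baseline: Gaussian JL projection to k = O(log n / eps^2)
    dimensions, then all pairwise l_2 distances on the sketch.
    Time: O(ndk) for the projection plus O(n^2 k) = O(n^2 log n) for
    fixed eps to form the matrix."""
    n, d = X.shape
    k = int(np.ceil(8 * np.log(n) / eps**2))       # illustrative constant
    G = np.random.default_rng(seed).standard_normal((d, k)) / np.sqrt(k)
    Y = X @ G                                      # n x k sketch
    sq = (Y * Y).sum(axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2 * Y @ Y.T, 0.0)
    return np.sqrt(D2)              # (1 +/- eps)-approximate distances w.h.p.
```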
| Problem | $f(x, y)$ | Our Runtime | Prior Work |
|---|---|---|---|
| $(1+\varepsilon)$ relative error rank-$k$ low-rank approximation | $\ell_1$, $\ell_2^2$ | $\tilde{O}\big(ndk/\varepsilon^{1/3} + nk^{\omega-1}/\varepsilon^{(\omega-1)/3}\big)$, Theorem C.4 | $\tilde{O}\big(ndk/\varepsilon + nk^{\omega-1}/\varepsilon^{\omega-1}\big)$ [BCW20] |
| Additive error $\varepsilon\lVert A\rVert_F$ rank-$k$ low-rank approximation | $\ell_2$ | $\tilde{O}\big(ndk/\varepsilon^{1/3} + nk^{\omega-1}/\varepsilon^{(\omega-1)/3}\big)$, Theorem C.6 | $\tilde{O}\big(nd \cdot \mathrm{poly}(k, 1/\varepsilon)\big)$ [IVWW19] |
| $(1+\varepsilon)$ relative error rank-$k$ low-rank approximation | Any in Table 1 | $\tilde{O}\big(Tk/\varepsilon^{1/3} + nk^{\omega-1}/\varepsilon^{(\omega-1)/3}\big)$, Theorem C.7 | $\tilde{O}\big(n^2dk/\varepsilon^{1/3} + nk^{\omega-1}/\varepsilon^{(\omega-1)/3}\big)$ [BCW22] |
| $(1\pm\varepsilon)$ approximation to top $k$ singular values | Any in Table 1 | $\tilde{O}\big(Tk/\varepsilon^{1/2} + nk^2/\varepsilon + k^3/\varepsilon^{3/2}\big)$, Theorem C.8 | $\tilde{O}\big(n^2dk/\varepsilon^{1/2} + nk^2/\varepsilon + k^3/\varepsilon^{3/2}\big)$ [MM15] |
| Multiply distance matrix $A$ with any $B \in \mathbb{R}^{n \times n}$ | Any in Table 1 | $O(Tn)$, Lemma C.9 | $O(n^\omega)$ |
| Multiply two distance matrices $A$ and $B$ | $\ell_2^2$ | $O(n^2 d^{\omega-2})$, Lemma C.11 | $O(n^\omega)$ |

Table 2: Applications of our matrix-vector query results. $T$ denotes the matrix-vector query time, given in Table 1. $\omega \approx 2.37$ is the matrix multiplication constant [AW21].
Theorem 1.4 (Informal; see Theorem D.5). We can compute an $n \times n$ matrix $B$ such that each $(i, j)$ entry $B_{ij}$ of $B$ satisfies $(1-\varepsilon)\|x_i - x_j\|_2 \le B_{ij} \le (1+\varepsilon)\|x_i - x_j\|_2$, in time $O\big(\varepsilon^{-2} n^2 \log^2(\varepsilon^{-1}\log n)\big)$.
Our result can be viewed as the natural runtime bound which would follow if the JL lemma implied an embedding dimension bound of $O(\mathrm{poly}(\log\log n))$. While this is impossible, as it would imply an exponential improvement over the JL bound, which is tight [LN17], we achieve our speedup by carefully reusing distance calculations via tools from metric compression [IRW17]. Our results also extend to the $\ell_1$ distance matrix; see Theorem D.5 for details.
Notation. Our dataset will be the $n$ points $X = \{x_1, \ldots, x_n\} \subset \mathbb{R}^d$. For points in $X$, we denote by $x_i(j)$ the $j$th coordinate of point $x_i$ for clarity. For all other vectors $v$, $v_i$ denotes the $i$th coordinate. We are interested in matrices of the form $A_{i,j} = f(x_i, x_j)$ for $f: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$, which measures the similarity between any pair of points. $f$ might not necessarily be a distance function, but we use the terminology “distance function” for ease of notation. We will always explicitly state the function $f$ as needed. $\omega \approx 2.37$ denotes the matrix multiplication constant, i.e., the exponent of $n$ in the time required to compute the product of two $n \times n$ matrices [AW21].
1.2 Related Works
Matrix-Vector Product Queries. Our work can be understood as part of a long line of classical works on the matrix-free or implicit model, as well as the active recent line of works on the matrix-vector query model. Many widely used linear algebraic algorithms, such as the power method, the Lanczos algorithm [Lan50], conjugate gradient descent [S+94], and Wiedemann's coordinate recurrence algorithm [Wie86], to name a few, all fall into this paradigm. Recent works such as [MM15, BHSW20, BCW22] have succeeded in precisely nailing down the query complexity of these classical algorithms, in addition to various other algorithmic tasks such as low-rank approximation [BCW22], trace estimation [MMMW21], and other linear-algebraic functions [SWYZ21b, RWZ20]. There is also a rich literature on query-based algorithms in other contexts with the goal of minimizing the number of queries used. Examples include graph queries [Gol17], distribution queries [Can20], and constraint-based queries [ES20] in property testing, inner product queries in compressed sensing [EK12], and quantum queries [LSZ21, CHL21].

Most prior works on query-based models assume black-box access to matrix-vector queries. While this is a natural model which allows for the design of non-trivial algorithms and lower bounds, it is not always clear how such queries can be implemented. In contrast, the focus of our work is not on obtaining query complexity bounds, but rather on complementing prior works by creating efficient matrix-vector queries for a natural class of matrices.
Subquadratic Algorithms for Distance Matrices. Most work on subquadratic algorithms for distance matrices has focused on the problem of computing a low-rank approximation. [BW18, IVWW19] both obtain an additive error low-rank approximation applicable to all distance matrices. These works only assume access to the entries of the distance matrix, whereas we assume we also have access to the underlying dataset. [BCW20] study the problem of computing a low-rank approximation of PSD matrices, also with sample access to the entries of the matrix. Their results extend to low-rank approximation for the $\ell_1$ and $\ell_2^2$ distance matrices, in addition to other more specialized metrics such as spherical metrics. Table 2 lists the runtime comparisons between their results and ours.

Practically, the algorithm of [IVWW19] is the easiest to implement and has outstanding empirical performance. We note that we can easily simulate their algorithm, with no overall asymptotic runtime overhead, using $O(\log n)$ matrix-vector queries. Indeed, their algorithm proceeds by sampling rows of the matrix according to their $\ell_2^2$ value and then post-processing these rows. The sampling probabilities only need to be accurate up to a factor of two. We can acquire these sampling probabilities by performing $O(\log n)$ matrix-vector queries which sketch the rows onto dimension $O(\log n)$ and preserve all row norms up to a factor of two with high probability, due to the Johnson-Lindenstrauss lemma [JL84]. This procedure only incurs an additional runtime of $O(T \log n)$, where $T$ is the time required to perform a matrix-vector query.
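A minimal sketch of this simulation, again assuming only a `matvec` callback and with illustrative constants:

```python
import numpy as np

def approx_row_norms(matvec, n, seed=0):
    """Estimate every row's l_2 norm of a symmetric n x n matrix A using
    O(log n) matvec queries: row i of A @ G is a JL sketch of row i of A,
    so its norm preserves ||A_i||_2 up to a constant factor w.h.p."""
    k = int(np.ceil(4 * np.log(n)))                 # illustrative constant
    G = np.random.default_rng(seed).standard_normal((n, k)) / np.sqrt(k)
    AG = np.column_stack([matvec(G[:, j]) for j in range(k)])
    return np.linalg.norm(AG, axis=1)               # ~ row norms of A
```

Sampling rows proportionally to the squares of these estimates then reproduces the sampling step of [IVWW19] within the factor-of-two slack it tolerates.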
The paper [ILLP04] shows that the exact $\ell_1$ distance matrix can be created in time $O(n^{(\omega+3)/2}) \approx n^{2.69}$ in the case of $d = n$, which is asymptotically faster than the naive bound of $O(n^2 d) = O(n^3)$. In contrast, we focus on creating (entry-wise) approximate distance matrices for all values of $d$.
We also compare to the paper of [ACSS20]. In summary, their main upper bounds are approximation algorithms, while we mainly focus on exact algorithms. Concretely, they study matrix-vector products for matrices of the form $A_{i,j} = f(\|x_i - x_j\|_2^2)$ for some function $f: \mathbb{R} \to \mathbb{R}$. They present results on approximating the matrix-vector product of $A$ where the approximation error is additive. They also consider a wide range of $f$, including polynomials and other kernels, but the input to $f$ is always the squared $\ell_2$ distance. In contrast, we also present exact algorithms, i.e., with no approximation errors. For example, one of our main upper bounds is an exact algorithm when $A_{i,j} = \|x_i - x_j\|_1$ (see Table 1 for the full list). Since it is possible to approximately embed the $\ell_1$ distance into $\ell_2^2$, their methods could be used to derive approximate algorithms for $\ell_1$, but not exact ones. Furthermore, we also study a wide variety of other distance functions, such as $\ell_\infty$ and $\ell_p^p$ (and others listed in Table 1), which are not studied in Alman et al. In terms of technique, the main upper bound technique of Alman et al. is to expand $f(\|x_i - x_j\|_2^2)$ and approximate the resulting quantity via a polynomial. This is related to our upper bound results for $\ell_p^p$ for even $p$, where we also use polynomials. However, our results are exact, while theirs are approximate. Our $\ell_1$ upper bound technique is orthogonal to the polynomial approximation techniques used in Alman et al. We also employ polynomial techniques to give upper bounds for the approximate $\ell_\infty$ distance function, which is not studied in Alman et al. Lastly, Alman et al. also focus on the Laplacian matrix of the weighted graph represented by the distance matrix, and on associated problems such as spectral sparsification and Laplacian system solving. In contrast, we study different problems, including low-rank approximation, eigenvalue estimation, and the task of initializing an approximate distance matrix. We do not consider the distance matrix as a graph or consider the associated Laplacian matrix.
It is also easy to verify the “folklore” fact that for a Gram matrix $AA^T$, we can compute $AA^T v$ in time $O(nd)$ if $A \in \mathbb{R}^{n \times d}$, by computing $A^T v$ first and then $A(A^T v)$. Our upper bound for the $\ell_2^2$ function can be reduced to this folklore fact by noting that $\|x - y\|_2^2 = \|x\|_2^2 + \|y\|_2^2 - 2\langle x, y \rangle$. Thus the $\ell_2^2$ matrix can be decomposed into two rank-one components, due to the terms $\|x\|_2^2$ and $\|y\|_2^2$, and a Gram matrix, due to the term $\langle x, y \rangle$. This decomposition of the $\ell_2^2$ matrix is well-known (see Section 2 in [DPRV15]). Hence, a matrix-vector query for the $\ell_2^2$ distance matrix can be computed in $O(nd)$ time.
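A minimal sketch of this reduction:

```python
import numpy as np

def sq_l2_matvec(X, z):
    """Az for A[i, j] = ||x_i - x_j||_2^2 in O(nd) time via the decomposition
    A = s 1^T + 1 s^T - 2 X X^T, where s_i = ||x_i||_2^2."""
    s = (X * X).sum(axis=1)                    # squared norms, length n
    return s * z.sum() + np.full(len(z), s @ z) - 2 * X @ (X.T @ z)
```

The even-$p$ bounds in Table 1 can be seen as a generalization of this idea, expanding $(x_i(t) - x_j(t))^p$ into polynomial terms that again separate into per-point factors.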