Faster Linear Algebra for Distance Matrices
Piotr Indyk
MIT
indyk@mit.edu
Sandeep Silwal
MIT
silwal@mit.edu
Abstract
The distance matrix of a dataset $X$ of $n$ points with respect to a distance function $f$ represents all pairwise distances between points in $X$ induced by $f$. Due to their wide applicability, distance matrices and related families of matrices have been the focus of many recent algorithmic works. We continue this line of research and take a broad view of algorithm design for distance matrices with the goal of designing fast algorithms, which are specifically tailored for distance matrices, for fundamental linear algebraic primitives. Our results include efficient algorithms for computing matrix-vector products for a wide class of distance matrices, such as the $\ell_1$ metric for which we get a linear runtime, as well as an $\Omega(n^2)$ lower bound for any algorithm which computes a matrix-vector product for the $\ell_\infty$ case, showing a separation between the $\ell_1$ and $\ell_\infty$ metrics. Our upper bound results, in conjunction with recent works on the matrix-vector query model, have many further downstream applications, including the fastest algorithm for computing a relative error low-rank approximation for the distance matrices induced by the $\ell_1$ and $\ell_2^2$ functions and the fastest algorithm for computing an additive error low-rank approximation for the $\ell_2$ metric, in addition to applications for fast matrix multiplication among others. We also give algorithms for constructing distance matrices and show that one can construct an approximate $\ell_2$ distance matrix in time faster than the bound implied by the Johnson-Lindenstrauss lemma.
1 Introduction
Given a set of $n$ points $X = \{x_1, \ldots, x_n\}$, the distance matrix of $X$ with respect to a distance function $f$ is defined as the $n \times n$ matrix $A$ satisfying $A_{i,j} = f(x_i, x_j)$. Distance matrices are ubiquitous objects arising in various applications, ranging from learning image manifolds [TSL00, WS06], signal processing [SY07], and biological analysis [HS93] to non-linear dimensionality reduction [Kru64, Kru78, TSL00, CC08], to name a few¹. Unfortunately, explicitly computing and storing $A$ requires at least $\Omega(n^2)$ time and space. Such complexities are prohibitive for scaling to large datasets.
A silver lining is that in many settings, the matrix $A$ is not explicitly required. Indeed, in many applications it suffices to compute some underlying function or property of $A$, such as the eigenvalues and eigenvectors of $A$ or a low-rank approximation of $A$. Thus an algorithm designer can hope to use the special geometric structure encoded by $A$ to design faster algorithms tailored for such tasks.
Therefore, it is not surprising that many recent works explicitly take advantage of the underlying geometric
structure of distance matrices, and other related families of matrices, to design fast algorithms (see Section
1.2 for a thorough discussion of prior works). In this work, we continue this line of research and take a broad
view of algorithm design for distance matrices. Our main motivating question is the following:
Can we design algorithms for fundamental linear algebraic primitives which are specifically tailored
for distance matrices and related families of matrices?
We make progress towards the motivating question by studying three of the most fundamental primitives
in algorithmic linear algebra. Specifically:
¹We refer the reader to the survey [DPRV15] for a more thorough discussion of applications of distance matrices.
1. We study upper and lower bounds for computing matrix-vector products for a wide array of distance matrices,
2. We give algorithms for multiplying distance matrices faster than general matrices, and,
3. We give fast algorithms for constructing distance matrices.
1.1 Our Results
We now describe our contributions in more detail.
1. We study upper and lower bounds for constructing matrix-vector queries for a wide array of distance matrices.
A matrix-vector query algorithm accepts a vector $z$ as input and outputs the vector $Az$. There is substantial motivation for studying such queries. Indeed, there is now a rich literature on fundamental linear algebra algorithms in the “matrix-free” or “implicit” model. These algorithms only assume access to the underlying matrix via matrix-vector queries. Some well-known algorithms in this model include the power method for computing eigenvalues and the conjugate gradient method for solving a system of linear equations. For many fundamental functions of $A$, nearly optimal bounds in terms of the number of queries have been achieved [MM15, BHSW20, BCW22]. Furthermore, having access to matrix-vector queries also allows the simulation of any randomized sketching algorithm, a well-studied algorithmic paradigm in its own right [Woo14]. This is because randomized sketching algorithms operate on the matrix $\Pi A$ or $A\Pi$, where $\Pi$ is a suitably chosen random matrix, such as a Gaussian matrix. Typically, $\Pi$ is chosen so that the sketches $\Pi A$ or $A\Pi$ have significantly smaller row or column dimension compared to $A$. If $A$ is symmetric, we can easily acquire both types of matrix sketches via a small number of matrix-vector queries.
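For illustration, here is a minimal sketch of the power method in this model, where `matvec` is an illustrative callback computing $z \mapsto Az$ (the interface is ours, not specified in the paper):

```python
import numpy as np

def power_method(matvec, n, iters=100, seed=0):
    """Approximate the top eigenpair of a symmetric n x n matrix that is
    accessible only through `matvec: z -> Az` (the matrix-free model)."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = matvec(v)                 # the only access to the matrix
        v = w / np.linalg.norm(w)
    return v @ matvec(v), v           # Rayleigh quotient, eigenvector
```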
Therefore, creating efficient versions of matrix-vector queries for distance matrices automatically lends itself to many further downstream applications. We remark that our algorithms can access the set of input points but do not explicitly create the distance matrix. A canonical example of our upper bound results is the construction of matrix-vector queries for the function $f(x, y) = \|x - y\|_p^p$.
Theorem 1.1. Let $p \ge 1$ be an integer. Suppose we are given a dataset of $n$ points $X = \{x_1, \ldots, x_n\} \subset \mathbb{R}^d$. $X$ implicitly defines the matrix $A_{i,j} = \|x_i - x_j\|_p^p$. Given a query $z \in \mathbb{R}^n$, we can compute $Az$ exactly in time $O(ndp)$. If $p$ is odd, we also require $O(nd \log n)$ preprocessing time.
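For illustration, the following NumPy sketch realizes the $p = 1$ case along the lines suggested by the theorem: sort each coordinate once in $O(nd \log n)$ preprocessing, then answer each query in $O(nd)$ using prefix sums. The interface and variable names are illustrative rather than taken from the paper.

```python
import numpy as np

class L1MatVec:
    """Exact products Az for A[i, j] = ||x_i - x_j||_1 without forming A.
    Preprocessing: O(nd log n) (one argsort per coordinate).
    Each query: O(nd)."""

    def __init__(self, X):
        self.X = np.asarray(X, dtype=float)        # n x d point set
        self.perm = np.argsort(self.X, axis=0)     # sorted order per coordinate

    def matvec(self, z):
        out = np.zeros(self.X.shape[0])
        for t in range(self.X.shape[1]):
            p = self.perm[:, t]
            xs, zs = self.X[p, t], z[p]            # coordinate t in sorted order
            Z, W = np.cumsum(zs), np.cumsum(zs * xs)
            # Splitting sum_j z_j |x_i(t) - x_j(t)| into the points below and
            # above x_i(t) in sorted order gives, at sorted position k:
            #   x_i(t) * (2 Z[k] - Z[-1]) - 2 W[k] + W[-1]
            out[p] += xs * (2 * Z - Z[-1]) - 2 * W + W[-1]
        return out

# Sanity check against the explicitly constructed matrix:
# X = np.random.randn(50, 3); z = np.random.randn(50)
# A = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
# assert np.allclose(L1MatVec(X).matvec(z), A @ z)
```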
We give similar guarantees for a wide array of functions $f$, and we refer the reader to Table 1, which summarizes our matrix-vector query upper bound results. Note that some of the functions $f$ we study in Table 1 do not necessarily induce a metric in the strict mathematical sense (for example, the function $f(x, y) = \|x - y\|_2^2$ does not satisfy the triangle inequality). Nevertheless, we still refer to such functions under the broad umbrella term of “distance functions” for ease of notation. We always explicitly state the function $f$ we are referring to.

Crucially, most of our bounds have a linear dependency on $n$, which allows for scalable computation as the size of the dataset $X$ grows. Our upper bounds are optimal in many cases; see Theorem A.13.
Combining our upper bound results with optimized matrix-free methods, immediate corollaries of our results include faster algorithms for eigenvalue and singular value computations and low-rank approximations. Low-rank approximation is of special interest as it has been widely studied for distance matrices; for low-rank approximation, our bounds outperform prior results for specific distance functions. For example, for the $\ell_1$ and $\ell_2^2$ cases (and for PSD matrices in general), [BCW20] showed that a rank-$k$ approximation can be found in time $\tilde{O}(ndk/\varepsilon + nk^{\omega-1}/\varepsilon^{\omega-1})$. This bound has extra $\mathrm{poly}(1/\varepsilon)$ overhead compared to our bound stated in Table 2. The work of [IVWW19] has a worse $\mathrm{poly}(k, 1/\varepsilon)$ overhead for an additive error approximation for the $\ell_2$ case. See Section 1.2 for further discussion of prior works. The downstream applications of matrix-vector queries are summarized in Table 2.
We also study fundamental limits for any upper bound algorithms. In particular, we show that no algorithm can compute a matrix-vector query for general inputs for the $\ell_\infty$ metric in subquadratic time, assuming a standard complexity-theoretic assumption called the Strong Exponential Time Hypothesis (SETH) [IP01, IPZ01].

| Function | $f(x, y)$ | Preprocessing | Query Time | Reference |
|---|---|---|---|---|
| $\ell_p^p$ for $p$ even | $\lVert x - y \rVert_p^p$ | — | $O(ndp)$ | Thms. A.1 / A.3 |
| $\ell_p^p$ for $p$ odd | $\lVert x - y \rVert_p^p$ | $O(nd \log n)$ | $O(ndp)$ | Thms. 2.2 / A.4 |
| Mixed $\ell_\infty$ | $\max_{i,j} \lvert x_i - y_j \rvert$ | $O(nd \log n)$ | $O(n^2)$ | Thm. A.5 |
| Mahalanobis distance² | $x^T M y$ | $O(nd^2)$ | $O(nd)$ | Thm. A.6 |
| Polynomial kernel | $\langle x, y \rangle^p$ | — | $O(ndp)$ | Thm. A.7 |
| Total variation distance | $\mathrm{TV}(x, y)$ | $O(nd \log n)$ | $O(nd)$ | Thm. A.8 |
| KL divergence | $D_{\mathrm{KL}}(x \Vert y)$ | — | $O(nd)$ | Thm. A.2 |
| Symmetric divergence | $D_{\mathrm{KL}}(x \Vert y) + D_{\mathrm{KL}}(y \Vert x)$ | — | $O(nd)$ | Thm. A.9 |
| Cross entropy | $H(x, y)$ | — | $O(nd)$ | Thm. A.9 |
| Hellinger distance² | $\sum_{i=1}^{d} \sqrt{x(i)\,y(i)}$ | — | $O(nd)$ | Thm. A.10 |

Table 1: A summary of our results for exact matrix-vector queries.
Theorem 1.2. For any $\alpha > 0$ and $d = \omega(\log n)$, any algorithm for exactly computing $Az$ for any input $z$, where $A$ is the $\ell_\infty$ distance matrix, requires $\Omega(n^{2-\alpha})$ time (assuming SETH).
This shows a separation between the functions listed in Table 1 and the $\ell_\infty$ metric. Surprisingly, we can answer approximate matrix-vector queries for $\ell_\infty$ substantially faster.
Theorem 1.3. Suppose $X \subseteq \{0, 1, \ldots, O(1)\}^d$. We can compute $By$ in time $O\big(n \cdot d^{O(\sqrt{d}\,\log(d/\varepsilon))}\big)$, where $\|A - B\|_\infty \le \varepsilon$.
To put the above result into context, the lower bound of Theorem 1.2 holds for point sets in $\{0, 1, 2\}^d$ in $d = \omega(\log n)$ dimensions. In contrast, if we relax to an approximation guarantee, we can obtain a subquadratic-time algorithm for $d$ up to $\Theta(\log^2(n)/\log\log(n))$.
Finally, we provide a general understanding of the limits of our upper bound techniques. In Theorem B.1, we show that essentially the only $f$ for which our upper bound techniques apply are those with a “linear structure” after a suitable transformation. We refer to Appendix Section B for details.
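For illustration (our own example, not a statement of Theorem B.1): the KL divergence row of Table 1 exhibits exactly such linear structure, since $A_{i,j} = \sum_t x_i(t)\log\big(x_i(t)/x_j(t)\big)$ decomposes as $A = c\mathbf{1}^T - XL^T$, with $c_i = \sum_t x_i(t)\log x_i(t)$ and $L_{j,t} = \log x_j(t)$, giving an $O(nd)$ matrix-vector product:

```python
import numpy as np

def kl_matvec(X, z):
    """Az for A[i, j] = KL(x_i || x_j), where rows of X are strictly positive
    probability vectors. Uses A = c 1^T - X L^T, which turns the naive
    O(n^2 d) product into O(nd)."""
    L = np.log(X)                     # L[j, t] = log x_j(t)
    c = np.sum(X * L, axis=1)         # c_i = sum_t x_i(t) log x_i(t)
    return c * z.sum() - X @ (L.T @ z)
```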
2. We give algorithms for multiplying distance matrices faster than general matrices.
Fast matrix-vector queries also automatically imply fast matrix multiplication, since matrix multiplication can be reduced to a series of matrix-vector queries. For concreteness, if $f$ is the $\ell_p^p$ function which induces $A$, and $B$ is any $n \times n$ matrix, we can compute $AB$ in time $O(n^2 dp)$. This is substantially faster than the general matrix multiplication bound of $n^\omega \approx n^{2.37}$. We also give an improvement of this result for the case where we are multiplying two distance matrices arising from $\ell_2^2$. See Table 2 for a summary.
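The reduction is simply one matrix-vector query per column of $B$; a minimal sketch, assuming a `matvec` callback as above:

```python
import numpy as np

def multiply_via_queries(matvec, B):
    """A @ B using one matvec query per column of B. With a T-time query
    this costs O(Tn); e.g. T = O(ndp) for l_p^p gives O(n^2 dp) overall."""
    return np.column_stack([matvec(B[:, j]) for j in range(B.shape[1])])
```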
3. We give fast algorithms for constructing distance matrices.
Finally, we give fast algorithms for constructing approximate distance matrices. To establish some context, recall the classical Johnson-Lindenstrauss (JL) lemma, which (roughly) states that a random projection of a dataset $X \subset \mathbb{R}^d$ of size $n$ onto a dimension of size $O(\log n)$ approximately preserves all pairwise distances [JL84]. A common application of this lemma is to instantiate the $\ell_2$ distance matrix. A naive algorithm which computes the distance matrix after performing the JL projection requires approximately $O(n^2 \log n)$ time. Surprisingly, we show that the JL lemma is not tight with respect to creating an approximate $\ell_2$ distance matrix; we show that one can construct the $\ell_2$ distance matrix in an asymptotically better runtime.
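For context, the naive JL-based baseline looks as follows (a sketch; the target dimension and its constant are illustrative):

```python
import numpy as np

def jl_l2_distance_matrix(X, eps, seed=0):
    """Naive baseline: Gaussian JL projection to k = O(log n / eps^2)
    dimensions, then all pairwise l_2 distances on the sketch.
    Time: O(ndk) for the projection plus O(n^2 k) = O(n^2 log n) for
    fixed eps to form the matrix."""
    n, d = X.shape
    k = int(np.ceil(8 * np.log(n) / eps**2))       # illustrative constant
    G = np.random.default_rng(seed).standard_normal((d, k)) / np.sqrt(k)
    Y = X @ G                                      # n x k sketch
    sq = (Y * Y).sum(axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2 * Y @ Y.T, 0.0)
    return np.sqrt(D2)              # (1 +/- eps)-approximate distances w.h.p.
```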
| Problem | $f(x, y)$ | Our Runtime | Prior Work |
|---|---|---|---|
| $(1+\varepsilon)$ relative error rank-$k$ low-rank approximation | $\ell_1$, $\ell_2^2$ | $\tilde{O}\big(ndk/\varepsilon^{1/3} + nk^{\omega-1}/\varepsilon^{(\omega-1)/3}\big)$, Theorem C.4 | $\tilde{O}\big(ndk/\varepsilon + nk^{\omega-1}/\varepsilon^{\omega-1}\big)$ [BCW20] |
| Additive error $\varepsilon\lVert A\rVert_F$ rank-$k$ low-rank approximation | $\ell_2$ | $\tilde{O}\big(ndk/\varepsilon^{1/3} + nk^{\omega-1}/\varepsilon^{(\omega-1)/3}\big)$, Theorem C.6 | $\tilde{O}\big(nd \cdot \mathrm{poly}(k, 1/\varepsilon)\big)$ [IVWW19] |
| $(1+\varepsilon)$ relative error rank-$k$ low-rank approximation | Any in Table 1 | $\tilde{O}\big(Tk/\varepsilon^{1/3} + nk^{\omega-1}/\varepsilon^{(\omega-1)/3}\big)$, Theorem C.7 | $\tilde{O}\big(n^2dk/\varepsilon^{1/3} + nk^{\omega-1}/\varepsilon^{(\omega-1)/3}\big)$ [BCW22] |
| $(1\pm\varepsilon)$ approximation to top $k$ singular values | Any in Table 1 | $\tilde{O}\big(Tk/\varepsilon^{1/2} + nk^2/\varepsilon + k^3/\varepsilon^{3/2}\big)$, Theorem C.8 | $\tilde{O}\big(n^2dk/\varepsilon^{1/2} + nk^2/\varepsilon + k^3/\varepsilon^{3/2}\big)$ [MM15] |
| Multiply distance matrix $A$ with any $B \in \mathbb{R}^{n \times n}$ | Any in Table 1 | $O(Tn)$, Lemma C.9 | $O(n^\omega)$ |
| Multiply two distance matrices $A$ and $B$ | $\ell_2^2$ | $O(n^2 d^{\omega-2})$, Lemma C.11 | $O(n^\omega)$ |

Table 2: Applications of our matrix-vector query results. $T$ denotes the matrix-vector query time, given in Table 1. $\omega \approx 2.37$ is the matrix multiplication constant [AW21].
Theorem 1.4 (Informal; see Theorem D.5). We can compute an $n \times n$ matrix $B$ such that each $(i, j)$ entry $B_{ij}$ of $B$ satisfies $(1-\varepsilon)\|x_i - x_j\|_2 \le B_{ij} \le (1+\varepsilon)\|x_i - x_j\|_2$, in time $O\big(\varepsilon^{-2} n^2 \log^2(\varepsilon^{-1}\log n)\big)$.
Our result can be viewed as the natural runtime bound which would follow if the JL lemma implied an embedding dimension bound of $O(\mathrm{poly}(\log\log n))$. While this is impossible, as it would imply an exponential improvement over the JL bound, which is tight [LN17], we achieve our speedup by carefully reusing distance calculations via tools from metric compression [IRW17]. Our results also extend to the $\ell_1$ distance matrix; see Theorem D.5 for details.
Notation. Our dataset will be the $n$ points $X = \{x_1, \ldots, x_n\} \subset \mathbb{R}^d$. For points in $X$, we denote by $x_i(j)$ the $j$th coordinate of point $x_i$ for clarity. For all other vectors $v$, $v_i$ denotes the $i$th coordinate. We are interested in matrices of the form $A_{i,j} = f(x_i, x_j)$ for $f: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$, which measures the similarity between any pair of points. $f$ might not necessarily be a distance function, but we use the terminology “distance function” for ease of notation. We will always explicitly state the function $f$ as needed. $\omega \approx 2.37$ denotes the matrix multiplication constant, i.e., the exponent of $n$ in the time required to compute the product of two $n \times n$ matrices [AW21].
1.2 Related Works
Matrix-Vector Product Queries. Our work can be understood as part of a long line of classical works on the matrix-free or implicit model, as well as the active recent line of works on the matrix-vector query model. Many widely used linear algebraic algorithms, such as the power method, the Lanczos algorithm [Lan50], conjugate gradient descent [S+94], and Wiedemann's coordinate recurrence algorithm [Wie86], to name a few, all fall into this paradigm. Recent works such as [MM15, BHSW20, BCW22] have succeeded in precisely nailing down the query complexity of these classical algorithms, in addition to various other algorithmic tasks such as low-rank approximation [BCW22], trace estimation [MMMW21], and other linear-algebraic functions [SWYZ21b, RWZ20]. There is also a rich literature on query-based algorithms in other contexts with the goal of minimizing the number of queries used. Examples include graph queries [Gol17], distribution queries [Can20], and constraint-based queries [ES20] in property testing, inner product queries in compressed sensing [EK12], and quantum queries [LSZ21, CHL21].

Most prior works on query-based models assume black-box access to matrix-vector queries. While this is a natural model which allows for the design of non-trivial algorithms and lower bounds, it is not always clear how such queries can be implemented. In contrast, the focus of our work is not on obtaining query complexity bounds, but rather on complementing prior works by creating efficient matrix-vector queries for a natural class of matrices.
Subquadratic Algorithms for Distance Matrices. Most work on subquadratic algorithms for distance matrices has focused on the problem of computing a low-rank approximation. [BW18, IVWW19] both obtain an additive error low-rank approximation applicable to all distance matrices. These works only assume access to the entries of the distance matrix, whereas we assume we also have access to the underlying dataset. [BCW20] study the problem of computing a low-rank approximation of PSD matrices, also with sample access to the entries of the matrix. Their results extend to low-rank approximation for the $\ell_1$ and $\ell_2^2$ distance matrices, in addition to other more specialized metrics such as spherical metrics. Table 2 lists the runtime comparisons between their results and ours.

Practically, the algorithm of [IVWW19] is the easiest to implement and has outstanding empirical performance. We note that we can easily simulate their algorithm, with no overall asymptotic runtime overhead, using $O(\log n)$ matrix-vector queries. Indeed, their algorithm proceeds by sampling rows of the matrix according to their $\ell_2^2$ value and then post-processing these rows. The sampling probabilities only need to be accurate up to a factor of two. We can acquire these sampling probabilities by performing $O(\log n)$ matrix-vector queries which sketch the rows onto dimension $O(\log n)$ and preserve all row norms up to a factor of two with high probability, due to the Johnson-Lindenstrauss lemma [JL84]. This procedure only incurs an additional runtime of $O(T \log n)$, where $T$ is the time required to perform a matrix-vector query.
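A minimal sketch of this simulation, again assuming only a `matvec` callback and with illustrative constants:

```python
import numpy as np

def approx_row_norms(matvec, n, seed=0):
    """Estimate every row's l_2 norm of a symmetric n x n matrix A using
    O(log n) matvec queries: row i of A @ G is a JL sketch of row i of A,
    so its norm preserves ||A_i||_2 up to a constant factor w.h.p."""
    k = int(np.ceil(4 * np.log(n)))                 # illustrative constant
    G = np.random.default_rng(seed).standard_normal((n, k)) / np.sqrt(k)
    AG = np.column_stack([matvec(G[:, j]) for j in range(k)])
    return np.linalg.norm(AG, axis=1)               # ~ row norms of A
```

Sampling rows proportionally to the squares of these estimates then reproduces the sampling step of [IVWW19] within the factor-of-two slack it tolerates.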
The paper [ILLP04] shows that the exact $\ell_1$ distance matrix can be created in time $O(n^{(\omega+3)/2}) \approx n^{2.69}$ in the case of $d = n$, which is asymptotically faster than the naive bound of $O(n^2 d) = O(n^3)$. In contrast, we focus on creating (entry-wise) approximate distance matrices for all values of $d$.
We also compare to the paper of [ACSS20]. In summary, their main upper bounds are approximation algorithms, while we mainly focus on exact algorithms. Concretely, they study matrix-vector products for matrices of the form $A_{i,j} = f(\|x_i - x_j\|_2^2)$ for some function $f: \mathbb{R} \to \mathbb{R}$. They present results on approximating the matrix-vector product of $A$ where the approximation error is additive. They also consider a wide range of $f$, including polynomials and other kernels, but the input to $f$ is always the squared $\ell_2$ distance. In contrast, we also present exact algorithms, i.e., with no approximation errors. For example, one of our main upper bounds is an exact algorithm when $A_{i,j} = \|x_i - x_j\|_1$ (see Table 1 for the full list). Since it is possible to approximately embed the $\ell_1$ distance into $\ell_2^2$, their methods could be used to derive approximate algorithms for $\ell_1$, but not exact ones. Furthermore, we also study a wide variety of other distance functions, such as $\ell_\infty$ and $\ell_p^p$ (and others listed in Table 1), which are not studied in Alman et al. In terms of technique, the main upper bound technique of Alman et al. is to expand $f(\|x_i - x_j\|_2^2)$ and approximate the resulting quantity via a polynomial. This is related to our upper bound results for $\ell_p^p$ for even $p$, where we also use polynomials. However, our results are exact, while theirs are approximate. Our $\ell_1$ upper bound technique is orthogonal to the polynomial approximation techniques used in Alman et al. We also employ polynomial techniques to give upper bounds for the approximate $\ell_\infty$ distance function, which is not studied in Alman et al. Lastly, Alman et al. also focus on the Laplacian matrix of the weighted graph represented by the distance matrix, and on associated problems such as spectral sparsification and Laplacian system solving. In contrast, we study different problems, including low-rank approximation, eigenvalue estimation, and the task of initializing an approximate distance matrix. We do not consider the distance matrix as a graph or consider the associated Laplacian matrix.
It is also easy to verify the “folklore” fact that for a Gram matrix $AA^T$, we can compute $AA^T v$ in time $O(nd)$ if $A \in \mathbb{R}^{n \times d}$, by computing $A^T v$ first and then $A(A^T v)$. Our upper bound for the $\ell_2^2$ function can be reduced to this folklore fact by noting that $\|x - y\|_2^2 = \|x\|_2^2 + \|y\|_2^2 - 2\langle x, y \rangle$. Thus the $\ell_2^2$ matrix can be decomposed into two rank-one components, due to the terms $\|x\|_2^2$ and $\|y\|_2^2$, and a Gram matrix, due to the term $\langle x, y \rangle$. This decomposition of the $\ell_2^2$ matrix is well-known (see Section 2 in [DPRV15]). Hence, a matrix-vector query for the $\ell_2^2$ distance matrix can be computed in $O(nd)$ time.
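A minimal sketch of this reduction:

```python
import numpy as np

def sq_l2_matvec(X, z):
    """Az for A[i, j] = ||x_i - x_j||_2^2 in O(nd) time via the decomposition
    A = s 1^T + 1 s^T - 2 X X^T, where s_i = ||x_i||_2^2."""
    s = (X * X).sum(axis=1)                    # squared norms, length n
    return s * z.sum() + np.full(len(z), s @ z) - 2 * X @ (X.T @ z)
```

The even-$p$ bounds in Table 1 can be seen as a generalization of this idea, expanding $(x_i(t) - x_j(t))^p$ into polynomial terms that again separate into per-point factors.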