
More precisely, the problem is to estimate the means $\mu_1, \dots, \mu_k \in \mathbb{R}^d$ up to an error $\epsilon$ in time $\mathrm{poly}(k, 1/\epsilon, d)$, with as small a separation $\Delta$ as possible. There has been a long line of work on this problem, which we survey in Section 1.3.
Recently, Regev and Vijayaraghavan [RV17] showed that a separation $\Delta = \Omega(\sqrt{\log k})$ is strictly necessary when the dimension is $d = \Omega(\log k)$. Two natural questions arise immediately. First, is a separation of $\Delta = \sqrt{\log k}$ sufficient when $d = \Omega(\log k)$? Although the original work of [RV17] showed that it is information-theoretically possible, an efficient algorithm was only recently developed by [LL22] (who show nearly tight results). The second main question is to determine the optimal separation in low dimensions, i.e., when $d = o(\log k)$. Previously, the exact separation necessary was unknown even in $O(1)$ dimensions. In this paper, we settle the second question and give nearly optimal bounds on the separation necessary in low dimensions (see Figure 1 for more details).
1.1 Overview of Results
We begin with a few definitions. A point set $\{x_1, x_2, \dots, x_k\}$ is called $\Delta$-separated if $\|x_j - x_{j'}\|_2 \ge \Delta$ for any $j \ne j'$. We say that a Gaussian mixture is $\Delta$-separated (or has separation $\Delta$) if the means of its components are $\Delta$-separated. Two point sets $\{u_1, \dots, u_k\}$ and $\{v_1, \dots, v_k\}$ are $\epsilon$-close if for some permutation $\sigma$ over $[k]$, $\|u_j - v_{\sigma(j)}\|_2 \le \epsilon$ holds for every $j$.
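As a small illustration (ours, not part of the paper), the following Python sketch checks both definitions directly; the function names are hypothetical, and the $\epsilon$-closeness check brute-forces over all $k!$ permutations, which is only meant to make the definitions concrete.

    import itertools
    import numpy as np

    def is_separated(points, delta):
        """Return True if the point set is Delta-separated: every pair of
        distinct points is at Euclidean distance at least delta."""
        pts = np.asarray(points, dtype=float)
        return all(np.linalg.norm(pts[i] - pts[j]) >= delta
                   for i, j in itertools.combinations(range(len(pts)), 2))

    def are_close(us, vs, eps):
        """Return True if {u_1,...,u_k} and {v_1,...,v_k} are eps-close: some
        permutation sigma matches every u_j to v_{sigma(j)} within distance eps.
        Brute force over permutations; a min-cost bipartite matching would
        scale better for large k."""
        us, vs = np.asarray(us, dtype=float), np.asarray(vs, dtype=float)
        k = len(us)
        return any(all(np.linalg.norm(us[j] - vs[sigma[j]]) <= eps
                       for j in range(k))
                   for sigma in itertools.permutations(range(k)))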
Our main result is an algorithm that efficiently learns the parameters of a mixture of $k$ spherical Gaussians under separation $\Delta \approx \frac{d}{\sqrt{\log k}} \cdot \mathrm{poly}(\log\log k)$ in $d = O(\log k / \log\log k)$ dimensions. In the low-dimensional regime, this separation is strictly smaller than $\min\{\sqrt{\log k}, \sqrt{d}\}$, the smallest separation under which previous algorithms could provably learn the parameters in $\mathrm{poly}(k)$ time.
Theorem 1.1 (Upper bound, informal). Let $P$ be a uniform mixture of $k$ spherical Gaussians in $d = O\!\left(\frac{\log k}{\log\log k}\right)$ dimensions with separation $\Delta = \Omega\!\left(\frac{d\sqrt{\log((\log k)/d)}}{\sqrt{\log k}}\right)$. There is a $\mathrm{poly}(k)$-time algorithm that, given samples from $P$, outputs $k$ vectors that are w.h.p. $\epsilon$-close to the true means for $\epsilon = O(\Delta/\sqrt{d})$.
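To get a feel for the separation requirement, one can instantiate the bound at the two ends of the allowed dimension range (an informal back-of-the-envelope calculation). For constant $d = O(1)$ we have $\log((\log k)/d) = \Theta(\log\log k)$, so a separation of order $\sqrt{(\log\log k)/\log k} = o(1)$ already suffices, far below $\min\{\sqrt{\log k}, \sqrt{d}\} = \Theta(1)$. At the largest allowed dimension $d = \Theta(\log k/\log\log k)$, we have $(\log k)/d = \Theta(\log\log k)$, and a separation of order $\sqrt{\log k}\cdot\sqrt{\log\log\log k}/\log\log k$ suffices, which is smaller than $\sqrt{\log k}$ by roughly a $\log\log k$ factor.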
See Theorem 2.1 and Remark 2.2 for a more formal statement of our algorithmic result, which holds for a wider range of the separation $\Delta$ and the accuracy parameter $\epsilon$. Our learning algorithm is provably correct for arbitrarily small $\Delta, \epsilon > 0$ (possibly with a longer runtime), whereas for most previous algorithms there is a breakdown point $\Delta^*$ such that the algorithm is not known to work when $\Delta < \Delta^*$. Two exceptions are the algorithms of [MV10, BS10], both of which allow an arbitrarily small separation but run in $e^{\Omega(k)}$ time. We also remark that the runtime of our algorithm scales as $\tilde{O}(k^3) \cdot f(d, \Delta, \epsilon)$, and is thus fixed-parameter tractable in the parameters $d$, $\Delta$ and $\epsilon$.¹
We complement Theorem 1.1 with an almost-matching lower bound, showing that a separation of order $d/\sqrt{\log k}$ is necessary for efficient parameter learning in low dimensions.
Theorem 1.2 (Lower bound, informal). For $d = O\!\left(\frac{\log k}{\log\log k}\right)$ and $\Delta = o\!\left(\frac{d}{\sqrt{\log k}}\right)$, there are two mixtures of $k$ spherical Gaussians in $\mathbb{R}^d$ such that: (1) both have separation $\Delta$; (2) their means are not $(\Delta/2)$-close; and (3) the total variation (TV) distance between them is $k^{-\omega(1)}$. In particular, no algorithm drawing $\mathrm{poly}(k)$ samples can distinguish the two mixtures, and hence no such algorithm can estimate the means of every $\Delta$-separated mixture up to error $\Delta/4$.
¹While we focus on the uniform-weight case for simplicity, Theorem 1.1 can be easily extended to the setting where each weight is in $[1/(Ck), C/k]$ for some constant $C > 1$.