One issue with k-d trees is that their hierarchical nature means that they
work best with recursively formulated traversal techniques; but recursive for-
mulations can be problematic on highly parallel architectures such as, for ex-
ample, GPUs, FPGAs, dedicated hardware units, etc: true recursion—where
functions can call themselves—is often hard or impossible to realize on such
architectures, and is instead typically replaced with iterative algorithms that
emulate recursion through a manually managed stack of not-yet-traversed sub-
trees. Such “manual” stack based techniques do indeed avoid truly recursive
function calls—and thus work just fine on GPUs—but can still cause several
issues: for example, maintaining a stack per live thread (of which there will
typically be many thousands) can require large amounts of temporary memory;
it can lead to a significant amount of memory accesses (since such stacks are
register-indirectly accessed they will typically end up in device memory); they
can lead to hard-to-track errors if the reserved stack size isn’t large enough; etc.
In this paper, we describe a stack-free traversal algorithm for left-balanced
k-d trees that does not require any stack at all, and instead only requires two
integer values (for current and last traversed node IDs, respectively) to track all
traversal state. In particular, our algorithm can handle both simple unordered
traversals as well as the more channeling ordered kind where the traversal order
of two children depends on which side of the parent’s partitioning plane the
query point is located on. Our algorithm borrows from ideas we had previously
developed for bounding volume hierarchies [HDWS11], and relies on using a state
machine-like approach that, in each step, derives the respectively next node to
be traversed from some relatively simple logic that only needs to know the
current node that is currently traversed, and the one that was traversed in the
previous step. We describe the core algorithm, provide some simple pseudo-code
that realizes it, and present some performance data for a CUDA implementation
of various variations of find-closest-point (fcp) and k-nearest-neighbor (kNN)
queries that use this algorithm.
2 Related Work
k-d Trees (sometimes also referred to as kd-trees) are hierarchical spatial search
trees used to encode k-dimensional data points [Sam90, Knu97, Wik]. Though
the same term in graphics is also sometimes used to refer to a different kind of
spatial subdivision tree (where inner code describe arbitrary partitioning planes
and data are only stored in the leaves [WH06] ) for this paper we explicitly only
consider the former kind, in which the data to be encoded are k-dimensional
points, where points are stored also at inner nodes, and where partitioning
planes are defined by the coordinates of the points stored at inner nodes (i.e.,
partitioning planes always pass through data points; see Figure 1).
Though k-d trees do not necessarily have to be balanced, one particularly
interesting type of k-d trees are those that are left-balanced and complete, since
these allow for storing the tree without any pointers or other explicit topology
data. In graphics, these became famous through photon mapping [Jen01], in
2