
Algorithm 1 Primary Functions
1: class Node (router ∈ ℝ^d, boundary ∈ ℝ, left: Node, right: Node, M ⊆ ℝ^d × ℝ):
2: Initialize: root ← Node()
3: Initialize: w ← random vector in ℝ^d
4: Initialize: c ← leaf capacity
5: function QUERY(x ∈ ℝ^d)
6:     n ← root
7:     while n is not leaf: n ← n.left if ⟨n.router, x⟩ ≤ n.boundary else n.right
8:     return arg min_{(x_m, y_m) ∈ n.M} PREDICTSCORER(x, x_m)
9: end function
10: function LEARN(x ∈ ℝ^d, y ∈ ℝ)
11:     n ← root
12:     while n is not leaf: n ← n.left if ⟨n.router, x⟩ ≤ n.boundary else n.right
13:     UPDATESCORER(x, y)
14:     n.M ← n.M ∪ {(x, y)}    ▷ Insert memory into leaf.
15:     if |n.M| ≥ c then
16:         SPLITLEAF(n)
17:     end if
18: end function
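To make the routing and insertion logic concrete, the following is a minimal Python sketch of Algorithm 1. It is a reading aid, not the authors' implementation; the class names and the lowercase helpers predict_scorer, update_scorer, and split_leaf (standing in for PREDICTSCORER, UPDATESCORER, and SPLITLEAF) are hypothetical, and sketches of each helper appear later in this section.

import numpy as np

class Node:
    """A tree node; internal nodes route, leaf nodes hold memories (n.M)."""
    def __init__(self):
        self.router = None    # top eigenvector, internal nodes only
        self.boundary = None  # routing threshold, internal nodes only
        self.left = None
        self.right = None
        self.M = []           # list of (x, y) memory pairs, leaves only

    def is_leaf(self):
        return self.left is None

class EMT:
    def __init__(self, d, leaf_capacity):
        self.root = Node()
        self.w = np.random.randn(d)  # scorer weights w
        self.c = leaf_capacity       # leaf capacity c

    def _route(self, x):
        # Lines 6-7 / 11-12: descend until a leaf is reached.
        n = self.root
        while not n.is_leaf():
            n = n.left if np.dot(n.router, x) <= n.boundary else n.right
        return n

    def query(self, x):
        # Line 8: return the value of the lowest-scoring (most similar)
        # memory; assumes at least one memory has been learned.
        n = self._route(x)
        xm, ym = min(n.M, key=lambda m: predict_scorer(self.w, x, m[0]))
        return ym

    def learn(self, x, y):
        n = self._route(x)
        update_scorer(self.w, n, x, y)  # Line 13: update scorer before insertion
        n.M.append((x, y))              # Line 14: insert memory into leaf
        if len(n.M) >= self.c:          # Lines 15-17: split a full leaf
            split_leaf(n)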
Memory. EMT represents memories as a mapping from keys to values, M := ℝ^d → ℝ, where d is the dimensionality of the context x_t, and y_t ∈ ℝ corresponds to observed feedback. A query to the EMT requires an x_t and returns a previously observed value ŷ ∈ ℝ from its bank of memories. EMT learning, which updates both the underlying data structure and the scoring mechanism, requires a full (x_t, y_t) observation pair.
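As a brief usage illustration (continuing the hypothetical sketch above), learning stores a full (x_t, y_t) pair while querying needs only x_t:

emt = EMT(d=4, leaf_capacity=32)
x_t = np.array([0.1, 0.5, -0.2, 0.9])
emt.learn(x_t, 1.0)      # store the (x_t, y_t) observation
y_hat = emt.query(x_t)   # retrieves the stored 1.0 (identical contexts score 0)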
Data structure. The underlying data structure used by the EMT is a binary tree. The tree is composed of a set of nodes, N, each of which is either an internal node or a leaf node. If a node n ∈ N is internal it possesses two child nodes (n.left ∈ N and n.right ∈ N), a decision boundary n.boundary ∈ ℝ, and a top eigenvector n.router ∈ ℝ^d. The decision boundary and top eigenvector, as discussed later, are used to route queries through the tree. If a node n is a leaf node then it instead possesses a finite set of memories n.M ⊆ ℝ^d × ℝ. In addition to the nodes and routers, EMT also possesses a single global scoring function responsible for selecting which memory to return given a query and a leaf node. This function is parameterised by the weight vector w ∈ ℝ^d.
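The sketch below shows how a full leaf might be converted into an internal node, matching the description above: the router is taken as the top eigenvector of the leaf's centered memory keys. The use of the median projection as the boundary is an assumption made here for a balanced split; the paper's exact threshold may differ.

def split_leaf(n):
    # Stack the memory keys and center them.
    X = np.stack([x for x, _ in n.M])
    Xc = X - X.mean(axis=0)
    # Top eigenvector of the covariance, via SVD of the centered keys.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    n.router = vt[0]
    proj = X @ n.router
    n.boundary = np.median(proj)  # assumed threshold: median projection
    n.left, n.right = Node(), Node()
    for (x, y), p in zip(n.M, proj):
        (n.left if p <= n.boundary else n.right).M.append((x, y))
    n.M = []                      # n is now an internal node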
Learn. Newly acquired information is stored during the learning operation (Line 10 in Algorithm 1), which takes an observation (x_t, y_t) and traverses the tree until a suitable insertion leaf is identified (Line 14 in Algorithm 1). Immediately before insertion, the scorer weights w are updated so that the scores of the insertion leaf's memories improve with respect to the observed x_t and y_t (Line 13 in Algorithm 1). Updating before insertion ensures the new memory cannot be selected as its own best match during the update.
Scorer. EMT's scorer, w, ranks candidate memories at query time. The scorer supports both predictions (see Line 12 of Algorithm 2) and updates (see Line 1 of Algorithm 2). Intuitively, the scorer can be thought of as a dissimilarity metric, assigning small values to similar query pairs and large values to dissimilar query pairs.
A pair of identical query vectors is guaranteed to result in a score of 0, satisfying the self-consistency property mentioned earlier. This is achieved for any two x_1 and x_2 by applying the scorer to the coordinate-wise absolute difference between the pair, z ← {|(x_1)_i − (x_2)_i|}_{i=1}^d, which is all zeros if the two contexts are identical. Predictions are then made by linearly regressing this vector onto the scorer's weights and clipping the prediction ⟨w, z⟩ so that the minimum is 0. The clipping operation ensures that predictions will be 0 even when weights are negative.
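A sketch of this prediction step, with predict_scorer a hypothetical name standing in for PREDICTSCORER in Algorithm 1:

def predict_scorer(w, x1, x2):
    z = np.abs(x1 - x2)            # z_i = |(x_1)_i - (x_2)_i|; all zeros when x1 == x2
    return max(0.0, np.dot(w, z))  # clip at 0 so identical contexts always score 0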
Updating the scoring function is done via a ranking loss, which considers (1) (x_best, y_best), the memory-reward pair that would currently be retrieved by the scorer, and (2) (x_alt, y_alt), an alternative memory in the same leaf whose reward is most similar to the current observation (omitting the retrieved memory x_best). If y_alt gives a better prediction of the observed reward than y_best, we adjust the scorer to decrease the learned dissimilarity between x_alt and the current target x_t. Conversely, if the retrieved y_best is closer to the observed reward, we do the opposite, emphasizing the similarity between x_t and x_best and making x_t and x_alt more dissimilar. Specifically, this is done by taking a gradient step with respect to an L2 loss that encourages more similar pairs to have a score near 0 and more dissimilar pairs to have a score near 1.
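A sketch of this update under stated assumptions: a plain SGD step with a hypothetical learning rate is used here, and the paper's exact optimizer and step size may differ.

def update_scorer(w, n, x_t, y_t, lr=0.01):
    if len(n.M) < 2:
        return  # need both a retrieved memory and an alternative
    # (1) The memory the scorer would currently retrieve for x_t.
    x_best, y_best = min(n.M, key=lambda m: predict_scorer(w, x_t, m[0]))
    # (2) An alternative memory whose reward is closest to y_t.
    alts = [m for m in n.M if m[0] is not x_best]
    x_alt, y_alt = min(alts, key=lambda m: abs(m[1] - y_t))
    if abs(y_alt - y_t) < abs(y_best - y_t):
        # The alternative predicts y_t better: pull (x_t, x_alt) together
        # (target score 0) and push (x_t, x_best) apart (target score 1).
        preferred, other = x_alt, x_best
    else:
        preferred, other = x_best, x_alt
    for target, xm in ((0.0, preferred), (1.0, other)):
        z = np.abs(x_t - xm)
        err = np.dot(w, z) - target  # gradient of the L2 loss 0.5*(⟨w,z⟩ - target)^2
        w -= lr * err * z            # in-place SGD step on the shared weights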