EIGEN MEMORY TREES
A PREPRINT
Mark Rucker1, Jordan T. Ash2, John Langford2, Paul Mineiro2, and Ida Momennejad2
1University of Virginia
2Microsoft Research
November 1, 2022
ABSTRACT
This work introduces the Eigen Memory Tree (EMT), a novel online memory model for sequential
learning scenarios. EMTs store data at the leaves of a binary tree and route new samples through the
structure using the principal components of previous experiences, facilitating efficient (logarithmic)
access to relevant memories. We demonstrate that EMT outperforms existing online memory
approaches, and provide a hybridized EMT-parametric algorithm that enjoys drastically improved
performance over purely parametric methods with nearly no downsides. Our findings are validated
using 206 datasets from the OpenML repository in both bounded and infinite memory budget
situations.
1 Introduction
A sequential learning framework (also known as online or incremental learning (Hoi et al., 2021; Losing et al., 2018))
considers a setting in which data instances $x_t \in \mathbb{R}^d$ arrive incrementally. After each instance, the agent is required
to make a decision from a set of $|\mathcal{A}|$ possibilities, $a_t \in \mathcal{A}$. The agent then receives scalar feedback $y_t$ regarding the
quality of the action, and the goal is for the agent to learn a mapping from $x_t$ to $a_t$ that maximizes the sum of all
observed $y_t$.
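For concreteness, the following minimal Python sketch (not from the paper) illustrates this interaction protocol; the synthetic linear-reward environment and the uniformly random agent are hypothetical stand-ins.

import numpy as np

rng = np.random.default_rng(0)
d, num_actions, horizon = 5, 3, 100
true_weights = rng.normal(size=(num_actions, d))   # hidden reward model (illustrative)

total_reward = 0.0
for t in range(horizon):
    x_t = rng.normal(size=d)                  # context x_t in R^d arrives
    a_t = int(rng.integers(num_actions))      # agent picks one of |A| actions
    y_t = float(true_weights[a_t] @ x_t)      # scalar feedback for the chosen action
    total_reward += y_t                       # goal: maximize the sum of observed y_t
    # a learning agent would update its x_t -> a_t mapping from (x_t, a_t, y_t) here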
This general paradigm accommodates a wide array of well-studied machine learning scenarios. For example, in online
supervised learning, $\mathcal{A}$ is a set of labels: the agent is required to predict a label for each $x_t$, and the feedback $y_t$
indicates the quality of the prediction.
In a contextual bandit or reinforcement learning setting, $x_t$ acts as a context or state, $a_t$ is an action, and $y_t$ corresponds
to a reward provided by the environment. Contextual bandits have proven useful in a wide variety of settings; their
properties are well studied (Langford & Zhang, 2007), and they have numerous theoretical and real-world
applications (Bouneffouf et al., 2020).
Regardless of the particulars of the learning scenario, a primary consideration is sample complexity. That is, how can
we obtain the highest-performing model given a fixed interaction budget? This often arises when agents only receive
feedback corresponding to the chosen action $a_t$, i.e., partial feedback. Here, after an interaction with the environment,
the agent does not get access to what the best action in hindsight would have been. As a consequence, learners in a
partial-feedback setting need to explore different actions even for a fixed $x_t$ in order to discover optimal behavior.
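As an aside, a common way to handle this exploration requirement is an epsilon-greedy rule; the short sketch below is illustrative only and is not the exploration strategy prescribed by this paper.

import numpy as np

def epsilon_greedy(estimates: np.ndarray, epsilon: float, rng: np.random.Generator) -> int:
    """Pick the currently best-looking action most of the time, a random one otherwise."""
    if rng.random() < epsilon:
        return int(rng.integers(len(estimates)))   # explore: try a non-greedy action
    return int(np.argmax(estimates))               # exploit: act on current estimates

rng = np.random.default_rng(0)
action = epsilon_greedy(np.array([0.1, 0.5, 0.3]), epsilon=0.1, rng=rng)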
Recent work in reinforcement learning has demonstrated that episodic memory mechanisms can facilitate more effi-
cient learning (Lengyel & Dayan, 2007; Blundell et al., 2016; Pritzel et al., 2017; Hansen et al., 2018; Lin et al., 2018;
Zhu et al., 2020). Episodic memory (Tulving, 1972) refers to memory of specific past experiences (e.g., what did I
have for breakfast yesterday). This is in contrast to semantic memory, which generalizes across many experiences
(e.g., what is my favorite meal for breakfast). Semantic memory is functionally closer to parametric approaches to
learning, which also rely on generalizations while discarding information about specific events or items.
This paper investigates the use of episodic memory for accelerating learning in sequential problems. We introduce
Eigen Memory Trees (EMT), a model that stores past observations in the leaves of a binary tree. Each leaf contains
experiences that are somewhat similar to each other, and the EMT is structured such that new samples are routed
through the tree based on the statistical properties of previously encountered data. When the EMT is queried with a
new observation, this property affords an efficient way to compare it with only the most relevant memories. A learned
"scoring" function $w$ is used to identify the most salient memory in the leaf to be used for decision making.

Figure 1: A schematic of the Eigen Memory Tree (EMT) algorithm. The outer boxes indicate the two operations supported by EMT, Learn and Query. The inner boxes show the high-level subroutines that occur to accomplish the outer operation.

Table 1: Cells indicate the number of datasets in which the row algorithm beats the column algorithm with statistical significance. The stacked algorithm we propose (PEMT) has the most wins in each column as indicated by bold.

             PEMT   EMT   PCMT   CMT   Parametric
PEMT           —    167     87   196      110
EMT           31      —     42   177       44
PCMT           4    156      —   192       75
CMT            7     11     10     —       17
Parametric     8    151     10   184        —
Specifically, this work:

• introduces the Eigen Memory Tree, an efficient tool for storing, accessing, and comparing memories to current observations (see Figure 1);

• shows that the EMT gives drastically improved performance over comparable episodic memory data structures, and sometimes even outperforms parametric approaches that have no explicit memory mechanism;

• proposes a simple EMT-parametric combination (PEMT), which outperforms both purely parametric and EMT methods with nearly no downsides (see Table 1).
In the following section, we introduce the Eigen Memory Tree and overview the algorithms required for storing and
retrieving memories. A schematic of EMT and its high-level algorithms can be seen in Figure 1. With this in mind,
Section 3 describes related work.
Section 4 follows with an exhaustive set of experiments, demonstrating the superiority of EMT to previous episodic
memory models and motivating the EMT-parametric (PEMT) method, a simple and powerful hybrid strategy for ob-
taining competitive performance on sequential learning problems. Importantly, we show that the PEMT performance
advantage holds even when it is constrained to have a fixed limit on the number of memories it is allowed to store.
All experiments here consider the contextual bandit setting, but EMTs are applicable in broader domains as well.
This section continually references Table 1, identifying which methods outperform which other methods by a sta-
tistically significant amount over all datasets and replicates. We consider a substantial number of datasets from the
OpenML (Vanschoren et al., 2014) repository.
Section 5 summarizes our findings, discusses limitations, and overviews directions for future work.
2 Eigen Memory Tree
As with episodic memory, EMT is structured around storing and retrieving exact memories. We formalize this notion
as the self-consistency property: if a memory has been previously Inserted, then a Query with the same key should
return the previously inserted value. The self-consistency property encodes the assumption that the optimal memory
to return, if possible, is an exact match for the current observation.
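To make the property concrete, the toy check below uses a hypothetical dict-backed ExactMemory stand-in (not EMT itself): after Learn(x, y), a Query with the identical key must return y.

import numpy as np

class ExactMemory:
    """Minimal stand-in that satisfies self-consistency by exact-match lookup."""
    def __init__(self):
        self._store = {}
    def learn(self, x: np.ndarray, y: float) -> None:
        self._store[x.tobytes()] = y
    def query(self, x: np.ndarray):
        return self._store.get(x.tobytes())

memory = ExactMemory()
x = np.array([0.2, -1.0, 3.5])
memory.learn(x, 1.0)
assert memory.query(x.copy()) == 1.0   # exact key returns the previously inserted value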
EMT is a memory model with four key characteristics: (1) self-consistency, (2) incremental memory growth, (3)
incremental query improvement via supervised feedback, and (4) sub-linear computational complexity with respect to
the number of memories in the tree. As we discuss in the literature review below, this combination of characteristics
separates EMT from previous approaches.
Algorithm 1 Primary Functions
1: class Node(router ∈ ℝ^d, boundary ∈ ℝ, left: Node, right: Node, M ⊂ ℝ^d × ℝ)
2: Initialize: root ← Node()
3: Initialize: w ← random vector in ℝ^d
4: Initialize: c ← leaf capacity
5: function QUERY(x ∈ ℝ^d)
6:     n ← root
7:     while n is not leaf: n ← n.left if ⟨n.router, x⟩ ≤ n.boundary else n.right
8:     return argmin_{(x_m, y_m) ∈ n.M} PREDICTSCORER(x, x_m)
9: end function
10: function LEARN(x ∈ ℝ^d, y ∈ ℝ)
11:     n ← root
12:     while n is not leaf: n ← n.left if ⟨n.router, x⟩ ≤ n.boundary else n.right
13:     UPDATESCORER(x, y)
14:     n.M ← n.M ∪ {(x, y)}    ▷ Insert memory into leaf.
15:     if |n.M| ≥ c then
16:         SPLITLEAF(n)
17:     end if
18: end function
Memory. EMT represents memories as a mapping from keys to values, $M := \mathbb{R}^d \to \mathbb{R}$, where $d$ is the dimensionality of the context $x_t$, and $y_t \in \mathbb{R}$ corresponds to observed feedback. A query to the EMT requires an $x_t$ and returns a previously observed value $\hat{y} \in \mathbb{R}$ from its bank of memories. EMT learning, which updates both the underlying data structure and the scoring mechanism, requires a full observation pair $(x_t, y_t) \in M$.
Data structure. The underlying data structure used by the EMT is a binary tree. The tree is composed of a set of nodes, $\mathcal{N}$, each of which is either an internal node or a leaf node. If a node $n \in \mathcal{N}$ is internal, it possesses two child nodes ($n.\text{left} \in \mathcal{N}$ and $n.\text{right} \in \mathcal{N}$), a decision boundary $n.\text{boundary} \in \mathbb{R}$, and a top eigenvector $n.\text{router} \in \mathbb{R}^d$. The decision boundary and top eigenvector, as discussed later, are used to route queries through the tree. If a node $n$ is a leaf node, it instead possesses a finite set of memories $n.M \subset \mathbb{R}^d \times \mathbb{R}$. In addition to the nodes and routers, EMT also possesses a single global scoring function responsible for selecting which memory to return given a query and leaf node. This function is parameterized by the weight vector $w \in \mathbb{R}^d$.
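A minimal Python sketch of this structure is given below; the Node fields mirror Algorithm 1, while the class itself, route_to_leaf, and the dataclass layout are illustrative assumptions rather than the reference implementation.

from dataclasses import dataclass, field
from typing import List, Optional, Tuple
import numpy as np

@dataclass
class Node:
    router: Optional[np.ndarray] = None    # top eigenvector (internal nodes only)
    boundary: float = 0.0                  # decision boundary (internal nodes only)
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    memories: List[Tuple[np.ndarray, float]] = field(default_factory=list)  # leaf contents

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None

def route_to_leaf(root: Node, x: np.ndarray) -> Node:
    """Follow the <router, x> <= boundary decisions from the root down to a leaf."""
    n = root
    while not n.is_leaf():
        n = n.left if float(np.dot(n.router, x)) <= n.boundary else n.right
    return n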
Learn. Newly acquired information is stored during the learning operation (Line 10 in Algorithm 1), which takes an observation pair $(x_t, y_t)$ and traverses the tree until a suitable insertion leaf is identified (Line 14 in Algorithm 1). Immediately before insertion, the scoring function weights $w$ are updated such that the scores of the insertion leaf's memories improve with respect to the observed $x_t$ and $y_t$ (Line 13 in Algorithm 1).
Scorer. EMT's scorer, $w$, ranks candidate memories at query time. The scorer supports both predictions (see Line 12 of Algorithm 2) and updates (see Line 1 of Algorithm 2). Intuitively, the scorer can be thought of as a dissimilarity metric, assigning small values to similar query pairs and large values to dissimilar query pairs.

A pair of identical query vectors is guaranteed to result in a score of $0$, satisfying the self-consistency property mentioned earlier. This is achieved for any two $x_1$ and $x_2$ by applying the scorer to the coordinate-wise absolute difference between the pair, $z \leftarrow \{|(x_1)_i - (x_2)_i|\}_{i=1}^{d}$, which is all zeros if the two contexts are identical. Predictions are then made by linearly regressing this vector onto the scorer's weights, and clipping the prediction $\langle w, z \rangle$ such that the minimum is $0$. The clipping operation ensures that predictions are never negative, even when weights are negative.
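The prediction step can be summarized by the following short sketch, which assumes a NumPy weight vector w; the function name predict_scorer is ours, mirroring PREDICTSCORER in Algorithm 1.

import numpy as np

def predict_scorer(w: np.ndarray, x_query: np.ndarray, x_memory: np.ndarray) -> float:
    """Dissimilarity score: linear regression on |x_query - x_memory|, clipped at zero."""
    z = np.abs(x_query - x_memory)        # all zeros when the two contexts are identical
    return max(0.0, float(np.dot(w, z)))  # clip so the score is never negative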
Updating the scoring function is done via a ranking loss, which considers (1) $(x_{\text{best}}, y_{\text{best}})$, the memory-reward pair that the scorer would currently retrieve, and (2) $(x_{\text{alt}}, y_{\text{alt}})$, an alternative memory in the same leaf whose reward is most similar to the current observation (omitting the retrieved memory $x_{\text{best}}$). If $y_{\text{alt}}$ gives a better prediction of the observed reward than $y_{\text{best}}$, we adjust the scorer to decrease the learned dissimilarity between $x_{\text{alt}}$ and the current target $x_t$. Alternatively, if the retrieved memory $y_{\text{best}}$ is closer to the observed reward, we do the opposite, emphasizing the similarity between $x_t$ and $x_{\text{best}}$ and making $x_t$ and $x_{\text{alt}}$ more dissimilar. Specifically, this is done by taking a gradient step with respect to an $L_2$ loss that encourages more similar pairs to have a score near $0$ and more dissimilar pairs to have a score near $1$.
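A hedged sketch of one such update step is shown below; the learning rate, the use of the unclipped linear score inside the gradient, and the exact preference rule are assumptions for illustration rather than the paper's precise update.

import numpy as np

def update_scorer(w, x_t, y_t, best, alt, lr=0.1):
    """One squared-loss step: push the preferred pair's score toward 0, the other toward 1."""
    (x_best, y_best), (x_alt, y_alt) = best, alt
    # Prefer whichever memory's stored reward is closer to the observed feedback y_t.
    prefer_alt = abs(y_alt - y_t) < abs(y_best - y_t)
    targets = [(x_alt, 0.0), (x_best, 1.0)] if prefer_alt else [(x_alt, 1.0), (x_best, 0.0)]
    for x_mem, target in targets:
        z = np.abs(x_t - x_mem)                  # coordinate-wise absolute difference
        pred = float(np.dot(w, z))               # unclipped linear score (clip applies at query time)
        w = w - lr * 2.0 * (pred - target) * z   # gradient step on (pred - target)^2
    return w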