Group Distributionally Robust Reinforcement Learning with
Hierarchical Latent Variables
Mengdi Xu1, Peide Huang1, Yaru Niu1, Visak Kumar2, Jielin Qiu1
Chao Fang2, Kuan-Hui Lee2, Xuewei Qi, Henry Lam3, Bo Li4, Ding Zhao1
Abstract
One key challenge for multi-task Reinforce-
ment learning (RL) in practice is the absence
of task indicators. Robust RL has been ap-
plied to deal with task ambiguity, but may re-
sult in over-conservative policies. To balance
the worst-case (robustness) and average perfor-
mance, we propose Group Distributionally Ro-
bust Markov Decision Process (GDR-MDP), a
flexible hierarchical MDP formulation that en-
codes task groups via a latent mixture model.
GDR-MDP identifies the optimal policy that
maximizes the expected return under the worst-
possible qualified belief over task groups within
an ambiguity set. We rigorously show that
GDR-MDP’s hierarchical structure improves dis-
tributional robustness by adding regularization
to the worst possible outcomes. We then de-
velop deep RL algorithms for GDR-MDP for
both value-based and policy-based RL meth-
ods. Extensive experiments on Box2D control
tasks, MuJoCo benchmarks, and Google foot-
ball platforms show that our algorithms outper-
form classic robust training algorithms across
diverse environments in terms of robustness
under belief uncertainties. Demos are avail-
able on our project page (https://sites.
google.com/view/gdr-rl/home).
1 Introduction
Reinforcement learning (RL) has demonstrated extraordi-
nary capabilities in sequential decision-making, even for
handling multiple tasks [1,2,3,4]. With policies condi-
tioned on accurate task-specific contexts, RL agents could
1Mengdi Xu, Peide Huang, Yaru Niu, Jielin Qiu and Ding Zhao
are with Carnegie Mellon University.
2Visak Kumar, Chao Fang and Kuan-Hui Lee are with Toyota Re-
search Institute (TRI).
3Henry Lam is with Columbia University.
4Bo Li is with University of Illinois Urbana-Champaign.
perform better than ones without access to context infor-
mation [5,6]. However, one key challenge for contextual
decision-making is that, in real deployments, RL agents
may only have incomplete information about the task to
solve. In principle, agents could adaptively infer the la-
tent context with data collected across an episode, and prior
knowledge about tasks [7,8,9]. However, the context es-
timates may be inaccurate [10,11] due to limited interac-
tions, poorly constructed inference models, or intentionally
injected adversarial perturbations. Blindly trusting the in-
ferred context and performing context-dependent decision-
making may lead to significant performance drops or catas-
trophic failures in safety-critical situations. Therefore, in
this work, we are motivated to study the problem of robust
decision-making under the task estimate uncertainty.
Prior works about robust RL involve optimizing over the
worst-case qualified elements within one uncertainty set
[12,13]. Such a robust criterion, which assumes the worst-possible outcome, may lead to overly conservative policies or
even training instabilities [14,15,16]. For instance, an au-
tonomous agent trained with robust methods may always
assume the human driver is aggressive regardless of re-
cent interactions and wait until the road is clear, conse-
quently blocking the traffic. Therefore, balancing the ro-
bustness against task estimate uncertainties and the per-
formance when conditioned on the task estimates is still
an open problem. We provide one solution to address the
above problem by modeling the commonly existing simi-
larities between tasks under distributionally robust Markov
Decision Process (MDP) formulations.
Each task is typically represented by a unique combina-
tion of parameters or a multi-dimensional context in multi-
task RL. We argue that some parameters are more impor-
tant than others in terms of affecting the environment dy-
namics model and thus tasks can be properly clustered into
mixtures according to the more crucial parameters as in
Figure 1(a) and (b). However, existing robust MDP for-
mulations [12] lack the capacity to model task groups, or
equivalently, task subpopulations. Thus the effect of task
subpopulations on the policy’s robustness is unexplored. In
this paper, we show that the task subpopulations help bal-
ance the worst-case performance (robustness) and the average performance under certain conditions (Section 5.2).

Figure 1: Illustrative examples of modeling tasks with a flat latent structure that uses one distribution over all tasks, as in (a), and a hierarchical latent structure that clusters tasks into different mixtures, as in (b). The graphical model with the hierarchical latent structure, shared by GDR-MDP and HLMDP, is shown in (c). At episode n, a mixture $z_n$ is first sampled from a prior distribution $w$. An MDP $m$ is then sampled according to $\mu_{z_n}(m)$ and governs the dynamics of the $n$-th episode.
In contrast to prior work [10] that leverages point estimates
of latent contexts, we take a probabilistic point of view and
represent the task subpopulation estimate with a belief dis-
tribution. Holding a belief of the task subpopulation, which
is the high-level latent variable, helps leverage the prior dis-
tributional information of task similarities. It also naturally
copes with distributionally robust optimization by optimiz-
ing w.r.t. the worst-possible belief distribution within an
ambiguity set. We consider an adaptive setting in line with
system identification methods [17], where the belief is ini-
tialized as a uniform distribution and then updated during
one episode. Our problem formulation is related to ambiguity modeling [18], inspired by humans' bounded rationality in approximating and handling distributions, which has
been studied in behavioral economics [19,20] yet has not
been widely acknowledged in RL.
We highlight our main contributions as follows:
1. We formulate Hierarchical-Latent MDP (HLMDP)
(Section 4), which utilizes a mixture model over
MDPs to encode task subpopulations. HLMDP has a
high-level latent variable z as the mixture, and a low-level m to represent tasks (Figure 1(c)).
2. We introduce the Group Distributionally Robust
MDP (GDR-MDP) in Section 5 to handle the over-conservativeness problem, which formulates the robust-
ness w.r.t. the ambiguity of the adaptive belief b(z)
over mixtures. GDR-MDP builds on distributionally
robust optimization [21,22] and HLMDP to leverage
rich distributional information.
3. We show the convergence property of GDR-MDP in
the infinite-horizon case. We find that the hierarchical
latent structure helps restrict the worst-possible out-
come within the ambiguity set and thus helps generate
less conservative policies with higher optimal values.
4. We design robust deep RL training algorithms based
on GDR-MDP by injecting perturbations to beliefs
stored in the data buffer. We empirically evalu-
ate in three environments, including robotic control
tasks and Google Research Football tasks. Our results demonstrate that our proposed algorithms outperform baselines in terms of robustness to belief noise (an illustrative sketch of the belief-perturbation step follows this list).
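As a purely illustrative sketch of the belief-perturbation idea in the last contribution, the snippet below perturbs beliefs stored alongside transitions before they are consumed by a belief-conditioned policy or critic. The buffer layout, the function names, and the bounded perturbation with simplex renormalization are assumptions for exposition, not the exact procedure of our training algorithms.

```python
# Illustrative only: names and the projection scheme are assumptions,
# not the paper's exact algorithm.
import numpy as np

rng = np.random.default_rng(0)

def perturb_belief(belief, xi=0.1):
    """Apply a bounded perturbation to a stored belief over Z mixtures.

    The perturbed belief stays within an L-infinity ball of radius xi around
    the stored belief and is renormalized onto the probability simplex.
    """
    noise = rng.uniform(-xi, xi, size=belief.shape)
    perturbed = np.clip(belief + noise, 0.0, 1.0)
    return perturbed / perturbed.sum()

# Sketch of a training step: the policy/critic consumes (state, belief) pairs
# sampled from the buffer, with the stored belief perturbed before use.
buffer = [dict(state=np.zeros(4), belief=np.array([0.25, 0.25, 0.5]),
               action=0, reward=1.0)]
batch = [dict(item, belief=perturb_belief(item["belief"])) for item in buffer]
```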
2 Related Work
Robust RL and Distributionally Robust RL. RL's vulnerability to uncertainties has attracted substantial efforts to de-
sign proper robust MDP formulations accounting for un-
certainties in MDP components [12,13,23,24,25,26].
Existing robust deep RL algorithms [27,28,29,30,31,24]
are shown to generate robust policies with promising re-
sults in practice. However, it is also known that robust
RL that optimizes over the worst-possible elements in the
uncertainty set may generate over-conservative policies by
trading average performance for robustness and may even
lead to training instabilities [16]. In contrast, distribution-
ally robust RL [32,33,34,35,36,37,38,39] assumes
that the distribution of uncertain components (such as tran-
sition models) is partially/indirectly observable. It builds
on distributionally robust optimization [21,22] which op-
timizes over the worst possible distribution within the am-
biguity set. Compared with common robust methods, dis-
tributionally robust RL embeds prior probabilistic informa-
tion and generates less conservative policies with carefully
calibrated ambiguity sets [32]. We aim to propose distri-
butionally robust RL formulations and training algorithms
to handle task estimate uncertainties while maintaining a
trade-off between robustness and performance.
One relevant work is the recently proposed distributionally
robust POMDP [37] which maintains a belief over states
and finds the worst possible transition model distribution
within an ambiguity set. We instead hold a belief over
task mixtures and find the worst possible belief distribu-
tion. [38] also maintains a belief distribution over tasks but
models tasks with a flat latent structure. Moreover, [38]
achieves robustness by optimizing at test-time, while we
aim to design robust training algorithms to save computa-
tion during deployment.
RL with Task Estimate Uncertainty. Inferring the la-
tent task as well as utilizing the estimates in decision-
making have been explored under the framework of
Bayesian-adaptive MDPs [40,41,42,43,17]. Our work
is similar to Bayesian-adaptive MDPs in terms of updating
a belief distribution with Bayesian update rules, but we fo-
cus on the robustness against task estimate uncertainties at
the same time. The closest work to our research is [10],
which optimizes a conditional value-at-risk objective and
maintains an uncertainty set centered on a context point es-
timate. Instead, we maintain an ambiguity set over beliefs
and further consider the presence of task subpopulations.
[11] also considers the uncertainties in belief estimates but
with a flat latent task structure.
Multi-task RL. Learning a suite of tasks with an RL
agent has been studied under different frameworks [3,44],
such as Latent MDP [45], Multi-model MDP [5], Con-
textual MDP [46], and Hidden Parameter MDP [47], among others [48]. Our proposed HLMDP builds on the Latent MDP
[45] which contains a finite number of MDPs, each accom-
panied by a weight. In contrast to Latent MDP utilizing
a flat structure to model each MDP’s probability, HLMDP
leverages a rich hierarchical model to cluster MDPs into a
finite number of mixtures. In addition, HLMDP is a spe-
cial yet important subclass of POMDP [49]. It treats the
latent task mixture that the current environment belongs to
as the unobservable variable. HLMDP resembles the re-
cently proposed Hierarchical Bayesian Bandit [50] model
but focuses on more complex MDP settings.
3 Preliminary
This section introduces Latent MDP and the adaptive belief
setting, both serving as building blocks for our proposed
HLMDP (Section 4) and GDR-MDP (Section 5).
Latent MDP. An episodic Latent MDP [45] is specified by a tuple $(\mathcal{M}, T, \mathcal{S}, \mathcal{A}, \mu)$. $\mathcal{M}$ is a set of MDPs with cardinality $|\mathcal{M}| = M$. Here $T$, $\mathcal{S}$, and $\mathcal{A}$ are the shared episode length (planning horizon), state space, and action space, respectively. $\mu$ is a categorical distribution over MDPs with $\sum_{m=1}^{M} \mu(m) = 1$. Each MDP $\mathcal{M}_m \in \mathcal{M}$, $m \in [M]$, is a tuple $(T, \mathcal{S}, \mathcal{A}, P_m, R_m, \nu_m)$, where $P_m$ is the transition probability, $R_m$ is the reward function, and $\nu_m$ is the initial state distribution.
Latent MDP assumes that at the beginning of each episode, one MDP from the set $\mathcal{M}$ is sampled based on $\mu(m)$. It aims to find a policy $\pi$ that maximizes the accumulated expected return by solving $\max_{\pi} \sum_{m=1}^{M} \mu(m)\, \mathbb{E}^{\pi}_{m}\left[\sum_{t=1}^{T} r_t\right]$, where $\mathbb{E}_{m}[\cdot]$ denotes $\mathbb{E}_{P_m, R_m}[\cdot]$.
The Adaptive Belief Setting. In general, a belief distribution contains the probability of each possible MDP that the current environment belongs to. The adaptive belief setting [5] holds a belief distribution that is dynamically updated with streaming interaction observations and prior knowl-
edge about the MDPs. In practice, prior knowledge may
be acquired by rule-based policies or data-driven learning
methods. For example, it is possible to pre-train in sim-
ulated complete information scenarios or exploit unsuper-
vised learning methods based on online collected data [51].
There also exist multiple choices for updating the belief,
such as applying the Bayesian rule as in POMDPs [49] and
representing beliefs with deep recurrent neural nets [52].
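As an illustration of the latter option, here is a minimal sketch of a recurrent belief estimator that maps an interaction history to a distribution over latent mixtures. The architecture, input featurization, and layer sizes are assumptions for exposition rather than the design used in [52] or in this paper.

```python
# Hypothetical recurrent belief encoder: a GRU reads (s, a, r) features over time
# and outputs a softmax belief over Z latent mixtures.
import torch
import torch.nn as nn

class RecurrentBeliefEncoder(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, num_mixtures: int, hidden: int = 64):
        super().__init__()
        self.gru = nn.GRU(obs_dim + act_dim + 1, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_mixtures)

    def forward(self, transitions: torch.Tensor) -> torch.Tensor:
        # transitions: (batch, time, obs_dim + act_dim + 1) of (s, a, r) features.
        out, _ = self.gru(transitions)
        logits = self.head(out[:, -1])        # belief from the last hidden state
        return torch.softmax(logits, dim=-1)  # b_t(z), a distribution over mixtures

encoder = RecurrentBeliefEncoder(obs_dim=4, act_dim=2, num_mixtures=3)
belief = encoder(torch.zeros(1, 10, 7))       # shape (1, 3), sums to one
```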
4 Hierarchical Latent MDP
In realistic settings, tasks share similarities, and task sub-
populations are common. Although different MDP formulations have been proposed to solve multi-task RL, task relationships are in general overlooked. To fill this gap, we
first propose Hierarchical Latent MDP (HLMDP), which
utilizes a hierarchical mixture model to represent distribu-
tions over MDPs. Moreover, we consider the adaptive be-
lief setting to leverage prior information about tasks.
Definition 1 (Hierarchical Latent MDPs). An episodic HLMDP is defined by a tuple $(\mathcal{Z}, \mathcal{M}, T, \mathcal{S}, \mathcal{A}, w)$. $\mathcal{Z}$ denotes a set of Latent MDPs with $|\mathcal{Z}| = Z$. $\mathcal{M}$ is a set of MDPs with cardinality $|\mathcal{M}| = M$ shared by the different Latent MDPs. $T$, $\mathcal{S}$, and $\mathcal{A}$ are the shared episode length (planning horizon), state space, and action space, respectively. Each Latent MDP $\mathcal{Z}_z \in \mathcal{Z}$, $z \in [Z]$, consists of the set of joint MDPs $\{\mathcal{M}_m\}_{m=1}^{M}$ and their weights $\mu_z$ satisfying $\sum_{m=1}^{M} \mu_z(m) = 1$. $w$ is the categorical distribution over Latent MDPs with $\sum_{z=1}^{Z} w(z) = 1$.
We provide a graphical model of HLMDP in Figure 1(c).
HLMDP assumes that at the beginning of each episode, the environment first samples a Latent MDP $z \sim w(z)$ and then samples an MDP $m \sim \mu_z(m)$. HLMDP encodes task
similarity information via the mixture model, and thus con-
tains richer task information than Latent MDP proposed in
[45]. For instance, we could always find one Latent MDP
for each HLMDP. However, there may exist infinitely many
corresponding HLMDPs given one Latent MDP.
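For concreteness, the two-level generative process above can be sketched as follows; the values of $w$ and $\mu_z$ are toy numbers chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy HLMDP with Z = 2 mixtures over M = 3 shared MDPs (numbers are assumptions).
w = np.array([0.6, 0.4])                  # prior over Latent MDPs (mixtures)
mu = np.array([[0.7, 0.2, 0.1],           # mu_z(m): mixture-conditional MDP weights
               [0.1, 0.3, 0.6]])

def sample_episode_task():
    """Hierarchical sampling at the start of an episode: z ~ w, then m ~ mu_z."""
    z = rng.choice(len(w), p=w)           # sample the task group (mixture)
    m = rng.choice(mu.shape[1], p=mu[z])  # sample the MDP within that group
    return z, m                           # MDP m governs this episode's dynamics

z, m = sample_episode_task()
```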
HLMDP in Adaptive Belief Setting. When solving
multi-task RL problems, the adaptive setting has been shown to help generate higher-performing policies [5] than the non-adaptive one, since it leverages prior knowledge about the transition model as well as online-collected
data tailored to the unseen environment. Hence we are mo-
tivated to formulate HLMDP in the adaptive belief setting.
HLMDP maintains a belief distribution $b(z)$ over task groups to model the probability that the current environment belongs to each group $z$. At the beginning of each episode, we initialize the belief distribution with a uniform distribution $b_0$. We use Bayes' rule to update beliefs based on interactions and a prior knowledge base. Note that the knowledge base may not be accurate enough and may lead to inaccurate belief updates. At timestep $t$, we get the next belief estimate $b_{t+1}$ with the state estimation function $SE$:
$$ SE(b_t, s_t)(j) = \frac{b_t(j)\, L(j)}{\sum_{i \in [Z]} b_t(i)\, L(i)}, \quad \forall j \in [Z], \qquad (1) $$
where $L(j)$ denotes the likelihood of the latest observation under mixture $j$, computed with the prior knowledge base. Under the adaptive belief setting, HLMDP aims to
find an optimal policy $\bar{\pi}^{\star}$ within a history-dependent policy class $\Pi$, under which the discounted expected cumulative reward is maximized as in Equation 2. Following general notations in POMDPs, we denote the history at time $t$ as $h_t = (s_0, a_1, s_1, \ldots, s_{t-1}, a_{t-1}, s_t) \in \mathcal{H}_t$, containing state-action pairs $(s, a)$. At timestep $t$, we use both the observed state $s_t$ and the inferred belief distribution $b_t(z)$ as the sufficient statistics for history $h_t$.
$$ \bar{V}^{\star} = \max_{\pi \in \Pi} \; \mathbb{E}_{b_{0:T}(z)} \mathbb{E}_{\mu_z(m)} \mathbb{E}^{\pi}_{m} \left[ \sum_{t=1}^{T} \gamma^{t} r_t \right], \qquad (2) $$
where $r_t$ denotes the reward received at step $t$, and $b_0(z)$ is the initial belief at timestep 0.
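For completeness, a minimal sketch of the belief update in Equation 1 is given below. The per-mixture likelihood stands in for the prior knowledge base and is an assumption; in practice it could come from learned or simulated dynamics models marginalized over $\mu_z(m)$.

```python
import numpy as np

def update_belief(belief, likelihoods):
    """One step of the Bayes update in Equation 1.

    belief:      b_t(z), shape (Z,), current belief over mixtures.
    likelihoods: L(z), shape (Z,), likelihood of the latest observation under
                 each mixture (from an assumed dynamics/knowledge model).
    Returns b_{t+1}(z) = b_t(z) L(z) / sum_i b_t(i) L(i).
    """
    posterior = belief * likelihoods
    total = posterior.sum()
    if total <= 0.0:          # degenerate case: keep the previous belief
        return belief
    return posterior / total

b0 = np.full(3, 1.0 / 3.0)    # uniform initial belief over Z = 3 mixtures
b1 = update_belief(b0, likelihoods=np.array([0.02, 0.10, 0.01]))
```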
5 Group Distributionally Robust MDP
The belief update function in Equation 1 may not be accu-
rate, which motivates robust decision-making under belief
estimate errors. In this section, we introduce Group Dis-
tributionally Robust MDP (GDR-MDP) which models
task groups and considers robustness against the belief am-
biguity. We then study the convergence property of GDR-
MDP in the infinite-horizon case in Section 5.1. We find
that GDR-MDP’s hierarchical structure helps restrict the
worst-possible value within the ambiguity set and provide
the robustness guarantee in Section 5.2.
Definition 2 (General Ambiguity Sets). Let $\Delta_k$ be a $k$-simplex. Considering a categorical belief distribution $b \in \Delta_k$, a general ambiguity set without special structure is defined as $\mathcal{C} \subseteq \Delta_k$, containing all possible distributions for $b$.
Definition 3 (Group Distributionally Robust MDP). An episodic GDR-MDP is defined by an 8-tuple $(\mathcal{C}, \mathcal{Z}, \mathcal{M}, T, \mathcal{S}, \mathcal{A}, w, SE)$. $\mathcal{C}$ is a general belief ambiguity set. $T, \mathcal{S}, \mathcal{A}, \mathcal{M}, \mathcal{Z}, w$ are elements of an episodic HLMDP as in Definition 1. $SE: \Delta_{Z-1} \times \mathcal{S} \to \Delta_{Z-1}$ is the belief updating rule. GDR-MDP aims to find a policy $\pi^{\star} \in \Pi$ that obtains the following optimal value:
$$ V^{\star} = \max_{\pi \in \Pi} \min_{\hat{b}_{0:T} \in \mathcal{C} \subseteq \Delta_{Z-1}} \mathbb{E}_{\hat{b}_{0:T}(z)} \mathbb{E}_{\mu_z(m)} \mathbb{E}^{\pi}_{m} \left[ \sum_{t=1}^{T} \gamma^{t} r_t \right], \qquad (3) $$
where $\mathcal{C} \subseteq \Delta_{Z-1}$ is a general ambiguity set tailored to beliefs over Latent MDPs in the set $\mathcal{Z}$.
GDR-MDP naturally balances robustness and performance
by leveraging distributionally robust formulation and rich
distributional information. In contrast to HLMDP, which
maximizes the expected return over the nominal adaptive belief
distribution (Equation 2), GDR-MDP aims to maximize
the expected return under the worst-possible beliefs within
an ambiguity set $\mathcal{C} \subseteq \Delta_{Z-1}$. Moreover, GDR-MDP opti-
mizes over fewer optimization variables than when directly
perturbing MDP model parameters or states. It resem-
bles the group distributionally robust optimization problem
in supervised learning [53,54] but focuses on sequential
decision-making in dynamic environments.
5.1 Convergence in Infinite-horizon Case
With general ambiguity sets (as in Definition 2), calculating
the optimal policy is intractable [33,39]. We propose a
belief-wise ambiguity set that satisfies b-rectangularity to facilitate solving the proposed GDR-MDP.
Assumption 1 (b-rectangularity). We assume a belief-wise ambiguity set, $\tilde{\mathcal{C}} := \bigotimes_{b \in \Delta_{Z-1}} \mathcal{C}_b$, where $\bigotimes$ represents the Cartesian product. $b$ serves as the nominal distribution of the ambiguity set $\mathcal{C}_b$.
More concretely, the b-rectangularity assumption decouples the ambiguity sets associated with different beliefs. When
conditioned on beliefs at each timestep, the minimization
loop selects the worst-case realization unrelated to other
timesteps. The b-rectangularity assumption is motivated
by the s-rectangularity first introduced in [23], which helps
reduce a robust MDP formulation to an MDP formulation
and get rid of the time-inconsistency problem [55]. Ambi-
guity sets beyond rectangularity have recently been explored in [56,57], which we leave for future work.
With b-rectangular ambiguity sets, we derive Bellman
equations to solve Equation 3 with dynamic programming.
Detailed proofs are in Appendix Section B.1.
Proposition 1 (Group Distributionally Robust Bellman Equation). Define the distributionally robust value of an arbitrary policy $\pi$ as follows, where $b_{t+1} = SE(b_t, s_t)$:
$$ V^{\pi}_{t}(b_t, s_t) = \min_{\hat{b}_{t:T} \in \mathcal{C}_{b_{t:T}}} \mathbb{E}_{\hat{b}_{t:T}(z)} \mathbb{E}_{\mu_z(m)} \mathbb{E}^{\pi_{t:T}}_{m} \left[ \sum_{n=t}^{T} \gamma^{n-t} r_n \,\middle|\, b_t, s_t \right]. $$
The Group Distributionally Robust Bellman expectation equation is
$$ V^{\pi}_{t}(b_t, s_t) = \min_{\hat{b}_{t} \in \mathcal{C}_{b_t}} \mathbb{E}_{\hat{b}_{t}(z)} \mathbb{E}_{\mu_z(m)} \mathbb{E}_{\pi_t} \Big[ \mathbb{E}_{R_m}[r_t] + \gamma \sum_{s_{t+1}} P_m(s_{t+1} \mid s_t, a_t)\, V^{\pi}_{t+1}(b_{t+1}, s_{t+1}) \Big]. \qquad (4) $$
Lemma 1 (Contraction Mapping). Let $\mathcal{V}$ be a set of real-valued bounded functions on $\Delta_{Z-1} \times \mathcal{S}$. $\mathcal{L}: \mathcal{V} \to \mathcal{V}$ refers to the Bellman operator defined as
$$ \mathcal{L}V(b, s) = \max_{\pi \in \Pi} \min_{\hat{b} \in \mathcal{C}_b} \mathbb{E}_{\hat{b}(z)} \mathbb{E}_{\mu_z(m)} \mathbb{E}_{\pi} \Big[ \mathbb{E}_{R_m}[r] + \gamma \sum_{s'} P_m(s' \mid s, a)\, V(SE(b, s), s') \Big]. \qquad (5) $$
$\mathcal{L}$ is a $\gamma$-contraction operator on the complete metric space $(\mathcal{V}, \|\cdot\|_{\infty})$. That is, given $U, V \in \mathcal{V}$, $\|\mathcal{L}U - \mathcal{L}V\|_{\infty} \leq \gamma \|U - V\|_{\infty}$.
Theorem 1 (Convergence in Infinite-horizon Case). Define $V(b, s)$ as the infinite-horizon value function. For all $b \in \mathcal{B}$ and $s \in \mathcal{S}$, $V(b, s)$ is the unique solution to $\mathcal{L}V(b, s) = V(b, s)$, and $\lim_{t \to \infty} \mathcal{L}V_t(b, s) = \mathcal{L}V(b, s)$ uniformly in $\|\cdot\|_{\infty}$.
By repeatedly applying the contraction operator in
Lemma 1, the value function will converge to a unique
fixed point, which corresponds to the optimal value by the Banach fixed-point theorem [58].
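To illustrate how a single application of the operator in Equation 4 could be computed on a small finite instance, the sketch below performs one backup at a given (belief, state) pair. The inner minimization over a sup-norm ball intersected with the simplex (in the spirit of Definition 4) is solved as a linear program, and the outer step maximizes over actions. The tabular layout, the deterministic action choice, and the assumption that next-step values are already evaluated at the updated belief are simplifications, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import linprog

def worst_case_belief(b_nominal, utilities, xi):
    """Inner loop: min_{b in simplex, |b - b_nominal|_inf <= xi}  b . utilities."""
    Z = len(b_nominal)
    bounds = [(max(0.0, p - xi), min(1.0, p + xi)) for p in b_nominal]
    res = linprog(c=utilities, A_eq=np.ones((1, Z)), b_eq=[1.0], bounds=bounds)
    return res.fun

def robust_backup(b, s, R, P, mu, V_next, gamma=0.99, xi=0.1):
    """One group distributionally robust Bellman backup at a (belief, state) pair.

    R[m, s, a]     : reward of MDP m at (s, a)
    P[m, s, a, s'] : transition probability of MDP m
    mu[z, m]       : mixture-conditional MDP weights mu_z(m)
    V_next[s']     : next-step values, assumed already evaluated at SE(b, s)
    """
    num_actions = R.shape[2]
    q_values = np.empty(num_actions)
    for a in range(num_actions):
        per_mdp = R[:, s, a] + gamma * P[:, s, a, :] @ V_next  # backup per MDP m
        per_mixture = mu @ per_mdp                             # E_{mu_z(m)}[.] per group z
        q_values[a] = worst_case_belief(b, per_mixture, xi)    # worst belief over groups
    return q_values.max()                                      # greedy outer maximization

# Tiny assumed instance: Z = 2 mixtures, M = 2 MDPs, 2 states, 2 actions.
rng = np.random.default_rng(0)
R = rng.random((2, 2, 2))
P = rng.random((2, 2, 2, 2))
P /= P.sum(axis=-1, keepdims=True)
mu = np.array([[0.9, 0.1], [0.2, 0.8]])
value = robust_backup(b=np.array([0.5, 0.5]), s=0, R=R, P=P, mu=mu, V_next=np.zeros(2))
```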
5.2 Robustness Guarantee of GDR-MDP
This section shows how GDR-MDP’s hierarchical task
structure and the distributionally robust formulation help
balance performance and robustness. We compare the op-
timal value of GDR-MDP denoted as VGDR(π?
GDR), with
three different robust formulations. Group Robust MDP is
a robust version of GDR-MDP with its optimal value de-
noted as VGR(π?
GR). Distributionally Robust MDP holds
a belief over MDPs without the hierarchical task structure
whose optimal value denoted as VDR(π?
DR). Robust MDP
is a robust version of Distributionally Robust MDP, de-
noted as VR(π?
R).π?
·denote optimal policies under differ-
ent formulations. We achieve the comparison by studying
how maintaining beliefs over mixtures affects the worst-
possible outcome of the inner minimization problem and
the resulting RL policy.
We study the worst-possible value via the relationships
between ambiguity sets projected to the space of beliefs
over MDPs. We first define a discrepancy-based ambigu-
ity set that is widely used in existing DRO formulations
[59,60,61].
Definition 4 (Ambiguity set with total variation distance). Consider a discrepancy-based ambiguity set defined via the total variation distance. Formally, the ambiguity set is
$$ \mathcal{C}_{\nu_{\mathcal{X}}, d_{TV}}(X) = \Big\{ \nu'(X) : \sup_{X \in \mathcal{X}} |\nu'(X) - \nu_{\mathcal{X}}(X)| \leq \xi \Big\}, $$
where $X \in \mathcal{X}$ is the support, $\nu_{\mathcal{X}}$ is the nominal distribution over $\mathcal{X}$, and $\xi$ is the ambiguity set's size.
To achieve a fair comparison, we keep the adversary's budget $\xi$ the same when perturbing the belief over task groups $z$ and over tasks $m$, which corresponds to different forms of model misspecification when there is a hierarchical latent structure over tasks.
Theorem 2 (Values of different robust formulations). Let $U_m(\pi) = \mathbb{E}^{\pi}_{m}\left[\sum_{t=1}^{T} \gamma^{t} r_t\right]$. Let $\mathcal{C}_{b(m), d_{TV}}(m)$ and $\mathcal{C}_{b(z), d_{TV}}(z)$ denote the ambiguity sets for beliefs over tasks $m$ and groups $z$, respectively. $b(m)$ and $b(z)$ satisfy $b(m) = \sum_{z \in [Z]} \mu_z(m) b(z)$ and are the nominal distributions. For any history-dependent policy $\pi \in \Pi$, its value functions under the different robust formulations are:
$$ V_{GDR}(\pi) = \min_{\hat{b}(z) \in \mathcal{C}_{b(z), d_{TV}}(z)} \mathbb{E}_{\hat{b}(z)} \mathbb{E}_{\mu_z(m)} [U_m(\pi)], $$
$$ V_{GR}(\pi) = \min_{z \in [Z]} \mathbb{E}_{\mu_z(m)} [U_m(\pi)], $$
$$ V_{DR}(\pi) = \min_{\hat{b}(m) \in \mathcal{C}_{b(m), d_{TV}}(m)} \mathbb{E}_{\hat{b}(m)} [U_m(\pi)], $$
$$ V_{R}(\pi) = \min_{m \in [M]} U_m(\pi). $$
We have that the following inequalities hold: $V_{GDR}(\pi) \geq V_{GR}(\pi) \geq V_{R}(\pi)$ and $V_{GDR}(\pi) \geq V_{DR}(\pi)$.
Theorem 2 shows that with a nontrivial ambiguity set, the distributionally robust formulation in GDR-MDP helps regularize the worst-possible value when compared with robust ones, including the group robust (GR) and task robust (R) formulations. It also shows that GDR-MDP's hierarchical structure further helps restrict the effect of the adversary, resulting in higher values than the distributionally robust formulation with a flat latent structure (DR). To get Theorem 2, we first find that when projecting the $\xi$-ambiguity set for $b(z)$ to the space of $b(m)$, the resulting ambiguity set is a subset of the $\xi$-ambiguity set for $b(m)$. Proofs are detailed in Appendix Section B.2. Our setting is different from [62], which states that DRO is a generalization of point-wise attacks. The key difference is that when the adversary perturbs $b(m)$, we omit the expectation over the mixtures under $b(z)$.

Figure 2: Hierarchical Latent Bandit examples. (a), (b), and (c) show the graphical model, the relationship between ambiguity sets, and the different robust formulations' optimal values for an example with two groups and two unique tasks. (d) shows the relationship between ambiguity sets for an example with two groups and three unique tasks.
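To make the orderings in Theorem 2 tangible, the following numerical check uses an assumed two-group, two-task instance in the spirit of Figure 2(a); the numbers are illustrative and not taken from the paper. With two atoms, the worst case over the sup-norm ball intersected with the simplex is attained at an endpoint of the feasible interval, so it can be evaluated directly.

```python
import numpy as np

# Assumed toy instance (illustrative numbers only).
U = np.array([1.0, 0.0])                 # per-task returns U_m(pi) for a fixed policy
mu = np.array([[0.9, 0.1],               # mu_z(m) for group z = 1
               [0.1, 0.9]])              # mu_z(m) for group z = 2
b_z = np.array([0.5, 0.5])               # nominal belief over groups
b_m = mu.T @ b_z                         # induced nominal belief over tasks
xi = 0.2                                 # the same adversary budget for both sets

def worst_two_atom(b, u, xi):
    """Worst-case expectation over {b' : |b' - b|_inf <= xi, b' in the 2-simplex}."""
    lo = max(0.0, b[0] - xi, 1.0 - b[1] - xi)
    hi = min(1.0, b[0] + xi, 1.0 - b[1] + xi)
    return min(np.array([p, 1.0 - p]) @ u for p in (lo, hi))

V_GDR = worst_two_atom(b_z, mu @ U, xi)  # perturb the belief over groups
V_GR = (mu @ U).min()                    # worst single group
V_DR = worst_two_atom(b_m, U, xi)        # perturb the belief over tasks (flat structure)
V_R = U.min()                            # worst single task
# Here V_GDR = 0.34 >= V_GR = 0.1 >= V_R = 0.0, and V_GDR = 0.34 >= V_DR = 0.3.
```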
Theorem 3 (Optimal values of different robust formulations). Let $\pi^{\star}_{\cdot}$ denote the converged optimal policy for each robust formulation. We have $V_{GDR}(\pi^{\star}_{GDR}) \geq V_{GR}(\pi^{\star}_{GR}) \geq V_{R}(\pi^{\star}_{R})$ and $V_{GDR}(\pi^{\star}_{GDR}) \geq V_{DR}(\pi^{\star}_{DR})$.
Based on Theorem 2, we can compare the optimal values
for different robust formulations. Theorem 3 shows that imposing the ambiguity set on beliefs over mixtures helps gen-
erate less conservative policies with higher optimal values
at convergence compared with other robust formulations.
Illustration Examples in Figure 2. We provide two hier-
archical latent bandit examples in Figure 2. The first ex-
ample shown in Figure 2(a) has two latent groups with
different weights over two unique MDPs. (b) shows the
ambiguity sets of the example in (a). The orange sets de-
note the ξ-ambiguity sets for the beliefs over mixtures and
MDPs. The green set denotes the ambiguity set projected
from the ξ-ambiguity set for belief distributions over mix-
tures. We show that the mapped set is a subset of the orig-
inal ξ-ambiguity set for the MDP belief distributions. (c)
shows the optimal policy and value of different robust for-
mulations for the example in (a). Our proposed GDR has
the potential to get a less conservative policy with higher
returns than other robust baselines. (d) follows the same
notations in (b) but corresponds to an example with three
possible MDPs. (b) and (d) together show that the hier-
archical structure helps regularize the adversary’s strength.
The detailed procedure for getting the optimal policies is
shown in Appendix A.
6 Algorithms
To solve the proposed GDR-MDP, we propose novel robust
deep RL algorithms (summarized in Algorithm 2 and Algo-