Machine-learning-assisted Monte Carlo fails at sampling computationally hard problems

Simone Ciarella,¹ Jeanne Trinquier,²,¹ Martin Weigt,² and Francesco Zamponi¹

¹ Laboratoire de Physique de l'Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, F-75005 Paris, France
² Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative LCQB, F-75005 Paris, France

(Dated: March 13, 2023)
Several strategies have been recently proposed in order to improve Monte Carlo sampling efficiency
using machine learning tools. Here, we challenge these methods by considering a class of problems
that are known to be exponentially hard to sample using conventional local Monte Carlo at low
enough temperatures. In particular, we study the antiferromagnetic Potts model on a random graph,
which reduces to the coloring of random graphs at zero temperature. We test several machine-
learning-assisted Monte Carlo approaches, and we find that they all fail. Our work thus provides
good benchmarks for future proposals for smart sampling algorithms.
I. INTRODUCTION
A. Motivations
Sampling from a given target probability distribution P_t(σ_1, ···, σ_N) over N degrees of freedom can become extremely hard when N is large. A universal (i.e. system-independent) strategy for sampling consists in starting from a random configuration of σ = {σ_i}_{i=1,···,N}, and generating a local Monte Carlo Markov Chain (MCMC) by sequentially proposing an update of one of the σ_i, and accepting or rejecting it with a proper probability (e.g. Metropolis-Hastings), until convergence [1]. However, for large N, the convergence time of the MCMC can grow exponentially in N, because of non-trivial long-range correlations that make local decorrelation extremely hard [2].
A solution to this problem consists in identifying the proper set of correlated variables, and proposing global updates of such variables together, in such a way as to speed up convergence [3]. However, this process is not universal, because it relies on the proper identification of system-dependent correlations, which is not always possible. For instance, in disordered systems such as spin glasses, the nature of correlated domains is extremely elusive and proper global moves are not easy to identify [4, 5]. Another approach, which has been particularly successful in atomistic models of glasses, consists in unconstraining some degrees of freedom, evolving them, and constraining them back [6–10], but again it is model-specific. Alternative proposals based on a renormalization group approach [11, 12] also rely on the identification of system-dependent collective variables.
A recently developed line of research, see e.g. [13–20], proposed to solve the problem in an elegant and universal way, by machine learning proper MCMC moves. In a nutshell, the idea is to learn an auxiliary probability distribution P_a(σ), which (i) can be sampled efficiently (e.g. linearly in N) and (ii) provides a good approximation of the target probability. Then, the hope is to use the auxiliary distribution to propose smart MCMC moves. Using this strategy with autoregressive architectures that ensure efficient sampling, some authors found convergence speedups [13, 14, 16], but others found less promising results [20].

* These authors contributed equally. Email: simone.ciarella@ens.fr, jeanne.trinquier@ens.fr
In order to make these studies more systematic, and really assess the performance of the method, it is important to have good benchmarks, i.e. problems that are guaranteed to be really hard to sample by local MCMC. In the early 90s, the very same problem had to be faced to assess the performance of local search algorithms that looked for solutions of optimization or satisfiability problems [21]. In that case, the problem of generating good benchmarks was solved by introducing an ensemble of random instances of the problem under study [21–24]. It was later shown, both numerically and analytically, that these random optimization/satisfiability problems require a time scaling exponentially in N for proper sampling at low enough temperatures in certain regions of parameter space [2]. Hence, they provide very good benchmarks for sampling algorithms. Yet, the recent attempts to apply machine learning methods to speed up sampling have not considered these benchmarks.
In this paper, we consider a prototypical hard-to-sample random problem, namely the coloring of random graphs, and we show that all the proposed methods fail to solve it. Our results confirm that this class of problems is a real challenge for sampling methods, even when assisted by smart machine-learned moves. The model investigated in [20] possibly belongs to this class. In addition, we discuss some practical issues such as mode-collapse in learning the auxiliary model, which happens when the target probability distribution has multiple peaks and the auxiliary model only learns one (or a subset) of them.
arXiv:2210.11145v3 [cond-mat.dis-nn] 10 Mar 2023
B. State of the art
Before proceeding, we provide a short review of the papers that motivated our study. Because the field is evolving rapidly, this does not aim to be an exhaustive review, and despite our best efforts, it is possible that we missed some relevant references.
Ref. [25] considered the general problem of whether a target probability distribution P_t can be approximated by a simpler one P_a, in particular by considering the Kullback-Leibler (KL) divergence

    D_KL(P_t||P_a) = ⟨ log [ P_t(σ) / P_a(σ) ] ⟩_{P_t} .    (1)

If this quantity is proportional to N for N → ∞, then P_t(σ)/P_a(σ) is typically exponential in N, and as a result samples proposed from P_a are very unlikely to be accepted in P_t. A small D_KL(P_t||P_a)/N (ideally vanishing for N → ∞) therefore seems to be a necessary condition for a good auxiliary probability, and it provides a quantitative measure of condition (ii) above. Ref. [25] suggested, by using small disordered systems (N ≲ 20), that there might be a phase transition, for N → ∞, separating a phase where D_KL(P_t||P_a)/N vanishes identically from a phase where it is positive.
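To make this criterion concrete, the sketch below estimates D_KL(P_t||P_a)/N from samples of P_t, in a toy case of two independent-spin distributions where the exact value is known in closed form. The specific distributions are our own choice for illustration, not taken from Ref. [25]; for real models, the log-probabilities would come from the Hamiltonian and the learned auxiliary model.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_prob_independent(sigma, p_up):
    """Log-probability of spin configurations under an independent-spin model,
    where p_up[i] is the probability that spin i equals +1."""
    probs = np.where(sigma == 1, p_up, 1.0 - p_up)
    return np.log(probs).sum(axis=-1)

N = 50
p_target = np.full(N, 0.7)   # target P_t: biased independent spins
p_aux = np.full(N, 0.5)      # auxiliary P_a: uniform independent spins

# Draw M samples from P_t and estimate Eq. (1):
# D_KL(P_t||P_a) = < log P_t(sigma) - log P_a(sigma) >_{P_t}.
M = 20000
samples = np.where(rng.random((M, N)) < p_target, 1, -1)
dkl_per_spin = (log_prob_independent(samples, p_target)
                - log_prob_independent(samples, p_aux)).mean() / N

# Exact value per spin: 0.7 log(0.7/0.5) + 0.3 log(0.3/0.5) ~ 0.082.
exact = 0.7 * np.log(0.7 / 0.5) + 0.3 * np.log(0.3 / 0.5)
assert abs(dkl_per_spin - exact) < 0.01
```

Here D_KL/N stays finite as N grows, so samples from P_a would be rejected at a rate exponential in N: even this mild per-spin mismatch is fatal for large systems.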
Ref. [13] proposed, more specifically, to use autoregressive models as tractable architectures for P_a. In these architectures, P_a is represented using Bayes' rule,

    P_a(σ) = P_a^1(σ_1) P_a^2(σ_2|σ_1) ··· P_a^N(σ_N|σ_{N−1}, ···, σ_1) .    (2)

Each term P_a^i is then approximated by a neural network, which takes {σ_1, ···, σ_{i−1}} as input and gives as output P_a^i, i.e. the probability of σ_i conditioned on the input. Such a representation of P_a, also called Masked Autoencoder for Distribution Estimation (MADE) [26], allows for very efficient sampling, because one can first sample σ_1, then σ_2 given σ_1, and so on, in a time scaling as the sum of the computational complexities of evaluating each of the P_a^i, which is typically polynomial in N for reasonable architectures. Hence, this scheme satisfies condition (i) above. The simplest choice for such a neural network is a linear layer followed by a softmax activation function. Ref. [13] showed that, using such an architecture, several statistical models could be well approximated, and the Boltzmann distribution of a Sherrington-Kirkpatrick (SK) spin glass model (with N = 20) could be efficiently sampled. Note that the model in Ref. [13] was trained by a variational procedure, which minimizes D_KL(P_a||P_t) instead of D_KL(P_t||P_a). This method is computationally very efficient as it only requires an average over P_a, which can be sampled easily, instead of P_t, but it is prone to mode-collapse (see Sec. II for details). Moreover, this work was limited to quite small N.
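The factorization of Eq. (2) and the resulting ancestral sampling can be sketched in a few lines. The minimal model below (our own simplification, not the architecture of Ref. [13]) is a shallow, zero-depth MADE for binary spins: a masked linear layer followed by a logistic function, which is the two-state special case of linear-plus-softmax.

```python
import numpy as np

rng = np.random.default_rng(1)

class ShallowMADE:
    """Minimal shallow (zero-depth) autoregressive model for N binary spins.
    Each conditional P(sigma_i = +1 | sigma_{<i}) is a logistic function of a
    masked linear layer; the strictly-lower-triangular mask on W enforces the
    autoregressive property by construction."""

    def __init__(self, N):
        self.N = N
        mask = np.tril(np.ones((N, N)), k=-1)      # row i sees only spins j < i
        self.W = 0.1 * rng.standard_normal((N, N)) * mask
        self.b = np.zeros(N)

    def conditionals(self, sigma):
        """P(sigma_i = +1 | sigma_{<i}) at every site i of a configuration."""
        return 1.0 / (1.0 + np.exp(-(self.W @ sigma + self.b)))

    def sample(self):
        """Ancestral sampling: draw sigma_1, then sigma_2 given sigma_1, etc.
        Also accumulates log P_a(sigma), needed later for MCMC acceptance."""
        sigma = np.zeros(self.N)
        log_p = 0.0
        for i in range(self.N):
            p1 = 1.0 / (1.0 + np.exp(-(self.W[i] @ sigma + self.b[i])))
            sigma[i] = 1.0 if rng.random() < p1 else -1.0
            log_p += np.log(p1 if sigma[i] == 1 else 1.0 - p1)
        return sigma, log_p
```

For q-state Potts variables the logistic is replaced by a softmax over q outputs per site, and deeper MADEs insert appropriately masked hidden layers; neither change affects the sampling logic.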
Following up on Ref. [13], Ref. [14] considered as target probability the Boltzmann distribution of a two-dimensional (2d) Edwards-Anderson (EA) spin glass model at various temperatures T, and used a Neural Autoregressive Distribution Estimator (NADE) [27], which is a variation of the MADE meant to reduce the number of parameters. Furthermore, the model was trained using a different scheme from Ref. [13], called sequential tempering, which tries to minimize D_KL(P_t||P_a), thus preventing mode-collapse. To this aim, at first, a sample from P_t is generated at high temperature, which is easy, and used to learn P_a. Then, the temperature is slightly reduced and smart MCMC sampling is performed using the P_a learned at the previous step, to generate a new sample from P_t, which is then used in the next step. If P_a remains a good approximation to P_t and MCMC sampling is efficient, this strategy ensures a correct minimization of D_KL(P_t||P_a). This was shown to be the case in Ref. [14], down to low temperatures for a 2d EA model of up to N = 225 spins.
Ref. [15] introduced a different scheme for learning P_a. This adaptive scheme combines local MCMC moves with smart P_a-assisted MCMC moves, together with an online training of P_a. It was successfully tested, using a different architecture for P_a (called normalizing flows), on problems with two stable states separated by a high free energy barrier. Note that normalizing flows can be equivalently interpreted as autoregressive models [28–30]. Ref. [16] also demonstrated the effectiveness of smart assisted MCMC moves in a 2d Ising model and an Ising-like frustrated plaquette model.
Several other groups [17–20] investigated a problem related to sampling, namely that of simulated annealing [31] for finding ground states of optimization problems. This is an a priori slightly easier problem, because simulated annealing does not need to equilibrate at all temperatures to find a solution [32, 33]. In these works, simulated annealing moves were once again assisted by machine learning. Ref. [17] tested their procedure on the 2d EA and SK models, and Ref. [18] considered 2d, 3d, and 4d EA models. However, while finding the exact ground state of the SK and EA (for d ≥ 3) models is hard, in practice for not too large random instances the problem can be solved by a proper implementation of standard simulated annealing [34], and the scaling of these methods with system size remains poorly investigated. Ref. [19] considered the graph coloring problem, which is the zero-temperature version of the benchmark problem we propose to use in this work, and found that a Graph Neural Network (GNN) can propose moves that allow one to efficiently find a proper coloring, with performance comparable to (but not outperforming) state-of-the-art local search algorithms. Additionally, GNNs have been shown to be successful at solving discrete combinatorial problems [35], but they do not provide much advantage over classical greedy algorithms, and sometimes they can even show worse performance [36, 37]. Finally, Ref. [20] showed that the machine-learning-assisted simulated annealing scheme does not work on a glassy problem with a rough energy landscape.
These works provided a series of inspiring ideas to improve sampling in disordered systems via machine-learned smart MCMC moves. Yet, the question of whether machine learning can really speed up sampling in problems that are exponentially hard to sample via local MCMC remains open. This wide class of systems includes many problems of interest, such as optimization problems (e.g. random SAT or random graph coloring) [38] and mean-field glass-forming materials [39–42].

FIG. 1. Sketches of the autoregressive architectures used in this work.
C. Summary
In this work, we test machine-learning-assisted MCMC
in what is considered to be a prototypical hard-to-sample
model, namely the coloring of random graphs [
21
,
43
,
44
].
Before doing that, we also tested and reproduced previous
results in simpler cases.
The models we consider are:

(1) The mean-field ferromagnetic problem, usually called the Curie-Weiss (CW) model, to gain some analytical insight into the different ways of training the auxiliary model.

(2) A two-dimensional Edwards-Anderson spin glass (2d EA) model. We consider this as an 'easy' problem (because, for instance, its ground state can be found in polynomial time), and we use it to reproduce previous results, compare different architectures, and gain insight on the role of some hyperparameters.

(3) The coloring (COL) of a random graph, which at finite temperature becomes an antiferromagnetic Potts model. In the proper range of parameters, this problem is proven to be exponentially hard to sample via local MCMC [2, 44], and we use it as a benchmark to understand whether smart MCMC can improve the sampling efficiency.
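The benchmark Hamiltonian of model (3) is simple to state: the antiferromagnetic Potts energy counts monochromatic edges, so the zero-energy configurations are exactly the proper colorings. The sketch below assumes an Erdős–Rényi graph of average degree c for concreteness (the precise ensemble and parameters used in our experiments are our own illustration here, not a specification).

```python
import numpy as np

rng = np.random.default_rng(2)

def random_graph_edges(N, c):
    """Erdos-Renyi random graph on N vertices with average degree c: each of
    the N(N-1)/2 possible edges is present with probability c/(N-1)."""
    return [(i, j) for i in range(N) for j in range(i + 1, N)
            if rng.random() < c / (N - 1)]

def potts_energy(colors, edges):
    """Antiferromagnetic Potts Hamiltonian H = sum over edges (i,j) of
    delta(sigma_i, sigma_j): each monochromatic edge costs 1, so H = 0
    exactly when 'colors' is a proper coloring of the graph."""
    return sum(1 for i, j in edges if colors[i] == colors[j])

# A triangle: a proper 3-coloring has zero energy, while a monochromatic
# assignment pays one unit per edge.
triangle = [(0, 1), (1, 2), (0, 2)]
assert potts_energy([0, 1, 2], triangle) == 0
assert potts_energy([0, 0, 0], triangle) == 3
```

At inverse temperature β the Boltzmann weight e^{−βH} then penalizes each monochromatic edge by a factor e^{−β}, recovering graph coloring in the β → ∞ limit.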
Any machine learning model that satisfies the autoregressive property can be trained and used as an auxiliary distribution to propose smart moves. However, on the one hand, for complex problems, shallow or simple models might not be expressive enough to accurately learn the target distribution. On the other hand, if a problem can be easily solved by a simpler model, there is no need to employ complex deep architectures. In this paper we used several standard architectures, illustrated in Fig. 1 and detailed in the SI:

- The MADE [13], which is an autoregressive deep neural network; when its depth is equal to zero, it corresponds to a 'shallow' or single-layer autoregressive model.

- The NADE [14, 27], which corresponds to a MADE with additional constraints on the parameters, with different depths and numbers of hidden units. This architecture has proven to be effective in image detection [45], filtering [46] and quantum systems [47].

- For the coloring, because neither the MADE nor the NADE perform well, we also tested an autoregressive GNN that we called Graph Autoregressive Distribution Estimator (GADE), and a non-symmetric MADE (called ColoredMADE).
Finally, we tried several strategies to learn the auxiliary model:

(I) Maximum likelihood: we generate a sample from the target distribution, and we use it to train the auxiliary model by maximum likelihood. While this is not a technique to generate samples from P_t, because one needs the samples to begin with, it is the best way to test whether a given architecture for P_a is expressive enough.

(II) Variational: we minimize the KL divergence D_KL(P_a||P_t), which also corresponds to the variational free energy of the auxiliary model when considered as an approximation of the true one [13].

(III) Sequential tempering: we train the auxiliary model at a higher temperature T, then use it to generate samples at lower T, and use the new samples to re-train the auxiliary model, and so on [14, 15].
The core of our work is the application of methods (II) and (III) to attempt sampling at low temperatures, for which local MCMC does not decorrelate fast enough. For the 2d EA model, we find that basically all the techniques and architectures perform well down to very low temperatures, although (II) is more prone to mode-collapse. We confirm that machine-learned MCMC moves can provide a speedup in this case [14]. For the COL problem, we find that none of these methods work well, even at moderately high temperatures located within the paramagnetic phase of the model.
II. METHODS
We consider a specific application of the general scheme discussed in Sec. I, in which we want to approximate the Boltzmann distribution associated to the 'true' Hamiltonian H(σ) at a fixed inverse temperature β = 1/T (the target),

    P_t(σ) = P_B(σ) = e^{−βH(σ)} / Z ,    (3)

with an autoregressive (AR) network, i.e. P_a(σ) = P_AR(σ). In most cases, sampling is easy at small β and becomes harder and harder as β is increased. We now discuss different possible strategies to learn a proper P_AR(σ).
A. Maximum likelihood
If a sufficiently large sample of configurations {σ_m}_{m=1,···,M} is available, independently and identically sampled from P_B(σ), it is possible to train the model by maximizing the probability of the sample according to the AR model itself, i.e. to use the maximum likelihood method.

Assuming that the AR model is specified by a set of parameters θ, we maximize the likelihood of the observed data, defined as

    L(θ) = ∏_{m=1}^{M} P_AR(σ_m|θ) .    (4)

Equivalently, if P_emp(σ) = (1/M) ∑_{m=1}^{M} δ_{σ,σ_m} is the empirical distribution of the sample, we minimize

    D(θ) = D_KL(P_emp||P_AR) = −log M − (1/M) ∑_{m=1}^{M} log P_AR(σ_m|θ) .    (5)

The optimal parameters θ̂ are given by

    θ̂ = argmax_θ L(θ) = argmin_θ D(θ) .    (6)

The gradient of D(θ) can be computed analytically in terms of the σ_m, because log P_AR(σ|θ) is given explicitly (or by back-propagation) as a function of θ at fixed σ, and the training can thus be performed efficiently.
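For the simplest conceivable AR model, a product distribution with no couplings, the maximum-likelihood problem of Eqs. (4)–(6) even has a closed-form solution, which makes the logic easy to check numerically. The toy example below is our own illustration, not a model from this work.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy sample: M configurations of N binary spins drawn from a biased product law.
N, M = 10, 5000
p_true = np.linspace(0.2, 0.8, N)
sample = (rng.random((M, N)) < p_true).astype(float)   # 1.0 = spin up

# For a product ("zero-coupling autoregressive") model with parameters
# theta_i = P(sigma_i = up), the likelihood of Eq. (4) factorizes and the
# maximum-likelihood solution is the empirical 'up' frequency at each site.
theta_hat = sample.mean(axis=0)

# Equivalently, theta_hat minimizes D(theta) = D_KL(P_emp||P_AR), Eq. (5):
# any perturbation away from theta_hat increases the negative log-likelihood.
def neg_log_likelihood(theta):
    eps = 1e-12   # guard against log(0)
    return -(sample * np.log(theta + eps)
             + (1 - sample) * np.log(1 - theta + eps)).sum() / M

assert neg_log_likelihood(theta_hat) <= neg_log_likelihood(theta_hat + 0.05)
assert np.abs(theta_hat - p_true).max() < 0.05
```

For an actual MADE or NADE no closed form exists, and one instead performs gradient ascent on the log-likelihood of Eq. (5), with the gradient obtained by back-propagation as stated above.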
Note that if P_B(σ) has multiple peaks, for sufficiently large M all these peaks are represented in the empirical distribution with the correct weights. Hence, when performing maximum likelihood to learn P_AR(σ), the learned model should be able to represent all the peaks of P_B(σ), provided the AR model has enough free parameters, i.e. is expressive enough. Then, mode-collapse will not occur.

Obviously, the maximum-likelihood approach relies on the quality of the initial sample, which has to be representative of the true distribution. Such a sample is by definition difficult to obtain for the really hard sampling problems that we want to solve. Moreover, if we were able to obtain such a sample by conventional means, there would be no need for any smart MCMC scheme. Nevertheless, the maximum-likelihood approach constitutes a reliable and effective way to test whether a specific AR architecture is capable of learning the complexity of a specific problem. We will thus use this scheme, in cases where standard sampling from P_B(σ) is possible, to test the expressive quality of our AR architectures.
B. Variational approach
Ref. [13] proposes to bypass the need for sampling from P_B(σ) by using a variational approach. Instead of minimizing D_KL(P_B||P_AR) or its approximation D_KL(P_emp||P_AR), as in Sec. II A, we want here to minimize D_KL(P_AR||P_B) = ∑_σ P_AR(σ) log [P_AR(σ)/P_B(σ)], or equivalently [13] the variational free energy:

    βF[P_AR] = ∑_σ P_AR(σ) (βH(σ) + log P_AR(σ)) = βF[P_B] + D_KL(P_AR||P_B) .    (7)

As is well known in statistical mechanics, because the KL divergence is positive, F[P_AR] is minimized when P_AR = P_B and D_KL(P_AR||P_B) = 0; otherwise it provides an upper bound to the true free energy.

The gradient with respect to the parameters θ that define the AR model can be written as an expectation value over the AR model itself [13],

    β∇_θ F[P_AR] = ⟨ Q(σ) ∇_θ log P_AR(σ) ⟩_{P_AR} ,   Q(σ) = βH(σ) + log P_AR(σ) .    (8)
The learning can then be done by gradient descent on F[P_AR], sampling from the AR model to estimate the gradient via Eq. (8). We used, as a condition to stop the gradient descent, that the variance of Q(σ) over batches of generated data is smaller than a given threshold. Indeed, if the AR distribution is exactly the Boltzmann one, then Q(σ) is a constant. Reciprocally, if the variance of Q(σ) is zero, then the AR distribution is proportional to the Boltzmann distribution whenever P_AR(σ) > 0, but not necessarily over all possible σ, due to mode-collapse. More specifically, if we have mode-collapse, then P_AR(σ) = P_B(σ)/Z′ in some regions of the space of σ (typically around some of the peaks of P_B) and P_AR(σ) = 0 elsewhere, where the proportionality constant Z′ < 1 represents the total probability covered by P_AR(σ) in P_B(σ). We then obtain D_KL(P_AR||P_B) = −log Z′ > 0, because the regions where P_AR(σ) = 0 do not contribute to the sum. While this solution has a larger KL divergence than the optimal one (P_AR = P_B, D_KL(P_AR||P_B) = 0), it can be a local minimum of D_KL(P_AR||P_B) in which the variationally-trained AR model can be trapped, thus learning only a part of the landscape.
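The variance-of-Q stopping criterion can be checked directly on a toy model where the Boltzmann distribution factorizes, so the exact P_AR is known. The example below (our own illustration) verifies that Q(σ) of Eq. (8) is constant, equal to −log Z, when P_AR = P_B, and fluctuates for a mismatched auxiliary model; note that in this single-peaked toy case the zero-variance caveat about mode-collapse does not arise.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy target: N independent spins in a field, H(sigma) = -h * sum_i sigma_i,
# whose exact Boltzmann marginal is P(sigma_i = +1) = e^{beta h} / (2 cosh(beta h)).
N, beta, h = 20, 1.0, 0.5
p_exact = np.exp(beta * h) / (2.0 * np.cosh(beta * h))

def sample_Q(p_up, M):
    """Draw M configurations from the product model with marginal p_up and
    return Q(sigma) = beta*H(sigma) + log P_AR(sigma) for each one (Eq. (8))."""
    sigma = np.where(rng.random((M, N)) < p_up, 1.0, -1.0)
    energy = -h * sigma.sum(axis=1)
    log_p = np.where(sigma == 1, np.log(p_up), np.log(1.0 - p_up)).sum(axis=1)
    return beta * energy + log_p

# At the exact solution P_AR = P_B, Q(sigma) = -log Z is the same for every
# sample, so its variance vanishes (up to floating-point rounding), which is
# the stopping criterion. A mismatched model gives a clearly nonzero variance.
Q_opt = sample_Q(p_exact, 1000)
Q_bad = sample_Q(0.5, 1000)
assert np.var(Q_opt) < 1e-18
assert np.var(Q_bad) > 1e-3
```

In a real training run the same quantity Q(σ) also weights the score function ∇_θ log P_AR(σ) in the gradient estimator of Eq. (8), so it is available at no extra cost.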
C. Local versus global MCMC
Standard local MCMC usually consists in selecting a variable at random and then proposing a random (or semi-random) change. If the MC moves respect microscopic detailed balance, the MCMC is guaranteed to converge to the correct Boltzmann distribution. This can be achieved, e.g., by accepting/rejecting the proposed MC moves following the Metropolis rule, where the acceptance probability of a move from configuration σ_old to σ_new is defined as:

    Acc[σ_old → σ_new] = min( 1, P_B(σ_new) / P_B(σ_old) ) .    (9)

We call this scheme local MCMC because each move consists in a change of a single degree of freedom.

In contrast, we can sample from our autoregressive model P_AR(σ) to generate a new proposed configuration σ_new. It is still useful to respect microscopic detailed balance in order to ensure convergence to equilibrium, and for this reason the replacement σ_old → σ_new is accepted with probability

    Acc[σ_old → σ_new] = min( 1, [P_B(σ_new) × P_AR(σ_old)] / [P_B(σ_old) × P_AR(σ_new)] ) .    (10)

Note that, because σ_new is generated from scratch by the AR model, it is in most cases completely different from σ_old; hence the resulting move is non-local, and this is why this scheme is called global MCMC.
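One global MCMC step implementing the acceptance rule of Eq. (10) can be sketched as follows; we work with log-probabilities for numerical stability, and the function names and the two-state sanity check are our own illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

def global_mcmc_step(sigma_old, log_pb, log_par, propose):
    """One global MCMC step, Eq. (10): draw sigma_new from the AR model and
    accept with probability min(1, PB(new) PAR(old) / (PB(old) PAR(new))).
    log_pb and log_par return log-probabilities (up to a common constant);
    propose() returns (sigma_new, log PAR(sigma_new))."""
    sigma_new, log_par_new = propose()
    log_acc = (log_pb(sigma_new) + log_par(sigma_old)
               - log_pb(sigma_old) - log_par_new)
    if np.log(rng.random()) < min(0.0, log_acc):
        return sigma_new, True
    return sigma_old, False

# Sanity check on a two-state toy model: if PAR == PB exactly, every move is
# accepted and the chain produces i.i.d. samples from PB.
log_p = np.log(np.array([0.3, 0.7]))
def propose():
    s = int(rng.random() < 0.7)
    return s, log_p[s]

state, n_acc, visits = 0, 0, np.zeros(2)
for _ in range(5000):
    state, accepted = global_mcmc_step(state, lambda s: log_p[s],
                                       lambda s: log_p[s], propose)
    n_acc += int(accepted)
    visits[state] += 1

assert n_acc == 5000                      # perfect auxiliary model: 100% acceptance
assert abs(visits[1] / 5000 - 0.7) < 0.05
```

The acceptance rate of this step is the central efficiency diagnostic used throughout the paper: it degrades exponentially with the mismatch D_KL between P_B and P_AR.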
We also note that this global MCMC scheme is very similar to importance sampling, in which M i.i.d. samples σ_m are generated from P_AR(σ), and then reweighted by W(σ_m) = P_B(σ_m)/P_AR(σ_m) to compute averages. However, the formulation in terms of a MCMC is convenient to compare with local MCMC, to monitor efficiency via the acceptance rate (which is morally equivalent to a participation ratio in importance sampling), and to perform smart protocols (e.g. sequential tempering) during the MCMC dynamics [14, 15]. This is why we stick to this formulation in this paper.

The reweighting factor W(σ) = P_B(σ)/P_AR(σ), which appears both in importance sampling and in Eq. (10), is the crucial quantity for the efficiency of the global MCMC scheme. If W(σ) is typically exponential in N, then its fluctuations are too wild and moves are almost never accepted. The KL divergence is precisely the average of log W(σ), either over the Boltzmann or over the AR distribution, and if it is too large (in particular, growing linearly in N) the method is doomed to failure.
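The collapse caused by wild fluctuations of W(σ) can be monitored through the participation ratio of the importance weights, often called the effective sample size (ESS). The sketch below is our own illustration with log-normal toy weights, standing in for the distribution of log W over AR samples.

```python
import numpy as np

rng = np.random.default_rng(6)

def effective_sample_size(log_w):
    """Participation ratio (sum w)^2 / sum(w^2) of importance weights
    W = PB/PAR, computed from log-weights in a numerically stable way."""
    log_w = log_w - log_w.max()          # unknown normalization constants cancel
    w = np.exp(log_w)
    return w.sum() ** 2 / (w * w).sum()

M = 1000
# If PAR == PB, all weights are equal and the ESS equals the sample size.
assert np.isclose(effective_sample_size(np.zeros(M)), M)

# Weak fluctuations of log W (small D_KL) leave the ESS close to M; strong
# fluctuations (D_KL growing with N) collapse it to O(1), i.e. almost all
# the weight concentrates on a handful of samples.
ess_good = effective_sample_size(rng.normal(0.0, 0.1, size=M))
ess_bad = effective_sample_size(rng.normal(0.0, 3.0, size=M))
assert ess_good > 0.9 * M
assert ess_bad < 100
```

When log W fluctuates with a standard deviation growing like √N or faster, the ESS (and equivalently the global MCMC acceptance rate) decays exponentially with N, which is the quantitative content of the "doomed to failure" statement above.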
D. Sequential tempering
Sequential tempering is a technique used in Ref. [14] to learn the AR probability at a larger β using data from a lower β, which can be convenient because collecting data becomes harder upon increasing β. The first step consists in collecting a sample via local MCMC at low β, where sampling is easy, and then training an AR model to reproduce this sample by maximum likelihood (Sec. II A). Next, in order to create a new sample at β + δβ, we use global MCMC by proposing moves with the previous AR model, at the new temperature. The acceptance rule then becomes:

    Acc[σ_old → σ_new] = min( 1, [e^{−(β+δβ)H(σ_new)} P_AR(σ_old)] / [e^{−(β+δβ)H(σ_old)} P_AR(σ_new)] ) .    (11)

We then learn a new AR model from the new sample by maximum likelihood, and iterate until either we reach the temperature of interest, or the convergence time of the global MCMC exceeds some fixed threshold, indicating a failure of the training procedure. A related adaptive global MCMC scheme has been introduced in Ref. [15] and is detailed in the SI.
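The full loop can be sketched end to end on a toy model of independent spins in a field, for which both the maximum-likelihood retraining and the initial low-β sample are exact and cheap. The schedule, model, and parameters below are our own illustration, not the setting of Ref. [14].

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy target: N independent spins in a field h, H(sigma) = -h * sum_i sigma_i.
# The exact Boltzmann marginal at inverse temperature b is
# P(sigma_i = +1) = 1 / (1 + exp(-2*b*h)), and the magnetization is tanh(b*h).
N, h = 30, 1.0
p_marg = lambda b: 1.0 / (1.0 + np.exp(-2.0 * b * h))

def ar_logp(sigma, theta):
    """log P_AR(sigma) for a product AR model with marginals theta = P(up)."""
    return np.where(sigma == 1, np.log(theta), np.log(1 - theta)).sum(axis=1)

# Step 0: an equilibrium sample at small beta (exact for this toy model),
# playing the role of the initial local-MCMC sample.
beta, dbeta, M, n_sweeps = 0.1, 0.1, 2000, 20
sample = np.where(rng.random((M, N)) < p_marg(beta), 1.0, -1.0)

while beta < 0.999:
    # 'Maximum likelihood' retraining of the product AR model: the MLE of
    # theta is just the empirical frequency of up spins at each site.
    theta = np.clip((sample.mean(axis=0) + 1.0) / 2.0, 1e-3, 1.0 - 1e-3)
    beta += dbeta
    for _ in range(n_sweeps):
        # Global MCMC at the new temperature with the acceptance rule (11):
        # proposals are drawn from the AR model trained at the previous beta.
        new = np.where(rng.random((M, N)) < theta, 1.0, -1.0)
        log_acc = (beta * h * (new.sum(axis=1) - sample.sum(axis=1))
                   + ar_logp(sample, theta) - ar_logp(new, theta))
        accept = np.log(rng.random(M)) < np.minimum(0.0, log_acc)
        sample[accept] = new[accept]

# After annealing to beta = 1, the sample matches the exact magnetization.
assert abs(sample.mean() - np.tanh(beta * h)) < 0.05
```

In this factorized toy problem every δβ step keeps the acceptance rate high, so the iteration reaches the target temperature; the failure mode studied in this paper is precisely that, for hard problems like the coloring, the acceptance collapses at some intermediate β and the iteration stalls.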
E. Evaluation of the AR model
Once the learning is completed, one can use several observables to evaluate the quality of the AR model. By completion of the learning we mean convergence of the gradient ascent in maximum likelihood (Sec. II A), convergence of the gradient descent on the variational free energy (Sec. II B), or reaching the target temperature with a high enough acceptance rate in sequential tempering (Sec. II D).

As a first check, we can use the AR model to estimate thermodynamic observables (energy, entropy, correlations) of the true Hamiltonian. If the AR model has a lower entropy than the true one, the AR model is probably suffering from mode-collapse. Another interesting observable is the KL divergence, either D_KL(P_AR||P_B) or D_KL(P_B||P_AR), which measures how well the AR model approximates the target one, and more quantitatively provides the average of the reweighting factor, as discussed in Sec. II C. A more easily accessible quality measure consists in generating samples with the AR model, then evolving them with local MCMC and checking whether the energy remains constant and the correlation functions remain time-translationally invariant [6, 41], as expected if the initial configuration is a good equilibrium one.
A very important measure of the quality of the AR model is the acceptance rate of the global MCMC, as a high acceptance rate indicates that the AR model describes well the true model, at least in the region explored by the global MCMC. Yet, the AR model could be mode-collapsed and still keep a high acceptance rate, because the global MCMC would then only explore the region on which the AR model concentrates its probability.