
this flow. In particular, if our goal is to guarantee $\min_{0\le s\le t} I_{\mathrm{Stein}}(\rho_s|\pi)\le\varepsilon$, result (2) says that we need to take $t \ge D_{\mathrm{KL}}(\rho_0|\pi)/\varepsilon$.
Unfortunately, and this is the key motivation for our work, the initial KL divergence $D_{\mathrm{KL}}(\rho_0|\pi)$ can be very large. Indeed, it can be proportional to the underlying dimension, which is highly problematic in high-dimensional regimes. Salim et al. (2021) and Sun et al. (2022) have recently derived an iteration complexity bound for the infinite-particle SVGD method; however, similarly to the time complexity of the continuous flow, their bound depends on $D_{\mathrm{KL}}(\rho_0|\pi)$.
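For instance, for isotropic Gaussian initialization and target,
$$D_{\mathrm{KL}}\big(\mathcal{N}(0,\sigma^2 I_d)\,\big|\,\mathcal{N}(0,I_d)\big) \;=\; \frac{d}{2}\left(\sigma^2 - 1 - \log\sigma^2\right),$$
which grows linearly with the dimension $d$ whenever $\sigma^2 \neq 1$, so the time bound above scales with $d$ as well.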
1.1 SUMMARY OF CONTRIBUTIONS
In this paper, we design a family of continuous-time flows, which we call β-SVGD flows, by combining importance weights with the kernelized gradient flow of the KL-divergence. Surprisingly, we prove that the time for such a flow to converge to the equilibrium distribution π, in the sense that $\min_{0\le s\le t} I_{\mathrm{Stein}}(\rho_s|\pi)\le\varepsilon$ with $(\rho_s)_{s=0}^{t}$ generated along the β-SVGD flow, can be bounded by $\frac{-1}{\varepsilon\beta(\beta+1)}$ when β ∈ (−1, 0). This indicates that importance weights can potentially accelerate SVGD. Building on this, we design the β-SVGD method based on a discretization of the β-SVGD flow and provide a descent lemma for its population-limit version. Some simple experiments in Appendix D verify our predictions.
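To give a concrete sense of scale, taking β = −1/2, which minimizes the bound over β ∈ (−1, 0), yields
$$\frac{-1}{\varepsilon\,\beta(\beta+1)}\,\bigg|_{\beta=-1/2} \;=\; \frac{4}{\varepsilon},$$
a quantity that, in contrast to $D_{\mathrm{KL}}(\rho_0|\pi)/\varepsilon$, depends neither on the initialization $\rho_0$ nor on the dimension.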
We summarize our contributions as follows:
• A new family of flows. We construct a family of continuous-time flows for which we coin the name β-SVGD flows. These flows do not arise from a time re-parameterization of the SVGD flow, since their trajectories are different; nor can they be seen as the kernelized gradient flows of the Rényi divergence.
• Convergence rates. When β → 0, this recovers the kernelized gradient flow of the KL-divergence (the SVGD flow); when β ∈ (−1, 0), the convergence rate of the β-SVGD flows improves significantly on that of the SVGD flow whenever $D_{\mathrm{KL}}(\rho_0|\pi)$ is large. Under a Stein Poincaré inequality, we derive an exponential convergence rate of the 2-Rényi divergence along the 1-SVGD flow. The Stein Poincaré inequality is shown to be weaker than the Stein log-Sobolev inequality; however, as with the Stein log-Sobolev inequality, it is not clear to us when it holds.
• Algorithm. We design the β-SVGD algorithm based on a discretization of the β-SVGD flow and derive a descent lemma for the population-limit β-SVGD; an illustrative sketch of an importance-weighted update is given after this list.
• Experiments. Finally, we run simple experiments to illustrate the advantages of β-SVGD with negative β. The simulation results corroborate our theory.
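The following is a minimal illustrative sketch of one importance-weighted SVGD step in the spirit of β-SVGD. It is not the exact construction of our algorithm: the kernel choice, bandwidth, and in particular the crude kernel density estimate used to form the weights $(\pi/\rho)^{\beta}$ are simplifying assumptions made only for illustration.

import numpy as np

def rbf_kernel_and_grad(X, h=1.0):
    """RBF kernel matrix K[j, i] = k(x_j, x_i) and its gradient w.r.t. x_j."""
    diffs = X[:, None, :] - X[None, :, :]        # (n, n, d): diffs[j, i] = x_j - x_i
    sq_dists = np.sum(diffs ** 2, axis=-1)       # (n, n)
    K = np.exp(-sq_dists / (2.0 * h ** 2))       # (n, n)
    grad_K = -diffs * K[..., None] / h ** 2      # (n, n, d): d k(x_j, x_i) / d x_j
    return K, grad_K

def beta_svgd_step(X, log_pi, grad_log_pi, beta=-0.5, step=1e-2, h=1.0):
    """One importance-weighted SVGD update (illustrative sketch only).

    X            : (n, d) array of particles
    log_pi       : callable, (n, d) -> (n,)   unnormalized log-density of the target
    grad_log_pi  : callable, (n, d) -> (n, d) score of the target
    beta         : exponent of the importance weights; beta = 0 recovers plain SVGD
    """
    K, grad_K = rbf_kernel_and_grad(X, h)

    # Importance weights w_j proportional to (pi(x_j) / rho_hat(x_j))^beta,
    # where rho_hat is a crude kernel density estimate of the current particles.
    # This estimator is an assumption made for the sketch, not the paper's choice.
    log_rho_hat = np.log(K.mean(axis=1) + 1e-12)
    log_w = beta * (log_pi(X) - log_rho_hat)
    w = np.exp(log_w - np.max(log_w))
    w = w / np.sum(w)                            # normalized; uniform (1/n) when beta = 0

    # Weighted kernelized gradient direction: each particle x_j contributes
    # w_j * [ k(x_j, x_i) * grad log pi(x_j) + grad_{x_j} k(x_j, x_i) ].
    scores = grad_log_pi(X)                      # (n, d)
    phi = np.sum(w[:, None, None] * (K[:, :, None] * scores[:, None, :] + grad_K), axis=0)
    return X + step * phi

Setting beta = 0 makes the weights uniform, so the update reduces to the standard SVGD direction; for a quick sanity check one can take a standard Gaussian target with log_pi = lambda X: -0.5 * np.sum(X**2, axis=1) and grad_log_pi = lambda X: -X.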
1.2 RELATED WORKS
The SVGD sampling technique was first presented in the seminal work of Liu & Wang (2016). Since then, a number of SVGD variants have been proposed. The following is a partial list: Newton-type SVGD (Detommaso et al., 2018), stochastic SVGD (Gorham et al., 2020), mirrored SVGD (Shi et al., 2021), random-batch SVGD (Li et al., 2020) and matrix-kernel SVGD (Wang et al., 2019). Theoretical understanding of SVGD is still limited to the population-limit setting. The first work to demonstrate the convergence of SVGD in the population limit was by Liu (2017); Korba et al. (2020) then derived a similar descent lemma for the population-limit SVGD using a different approach. However, their results relied on path information and were thus not self-contained. To provide a clean analysis, Salim et al. (2021) assumed a Talagrand $T_1$ inequality for the target distribution π and gave the first iteration complexity analysis in terms of the dimension d. Following the work of Salim et al. (2021), Sun et al. (2022) derived a descent lemma for the population-limit SVGD under a non-smooth potential V.
In this paper, we consider a family of generalized divergences, the Rényi divergences, together with SVGD with importance weights. For these two themes, we mention a few (non-exhaustive) related results. Wang et al. (2018) proposed to use the f-divergence, where f is a convex function, instead of the KL-divergence in the variational inference problem; Yu et al. (2020) also considered variational inference with