
2 Statistical learning for ψ-weakly dependent processes
The goal is to construct a learner $h \in \mathcal{H}$ such that, for any $t \in \mathbb{Z}$, $h(X_t)$ is on average "close" to $Y_t$; that is, a learner which achieves a small averaged risk. The empirical risk (with respect to the training sample) of a hypothesis $h$ is given by
\[
\widehat{R}_n(h) = \frac{1}{n} \sum_{i=1}^{n} \ell\big(h(X_i), Y_i\big).
\]
In the sequel, we set $\ell(h, z) = \ell\big(h(x), y\big)$ for all $z = (x, y) \in \mathcal{X} \times \mathcal{Y}$ and $h : \mathcal{X} \to \mathcal{Y}$. The setting considered here covers many commonly used situations: regression estimation, classification (pattern recognition when $\mathcal{Y}$ is finite), prediction in autoregressive models (we can take $X_t = (Y_{t-1}, \cdots, Y_{t-k})$ and $\mathcal{X} = \mathcal{Y}^k$ for some $k \in \mathbb{N}$), and autoregressive models with exogenous covariates.
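To fix ideas, the following minimal Python sketch computes the empirical risk in the autoregressive situation just described; the helper names (make_ar_features, empirical_risk) and the choice of squared loss are illustrative assumptions for this example, not objects taken from the paper.
\begin{verbatim}
import numpy as np

def squared_loss(y_pred, y_true):
    # Illustrative choice for the loss ell; the text only assumes a generic loss.
    return (y_pred - y_true) ** 2

def make_ar_features(y, k):
    # Build pairs (X_t, Y_t) with X_t = (Y_{t-1}, ..., Y_{t-k}) and Y_t the next value.
    X = np.array([y[t - k:t][::-1] for t in range(k, len(y))])
    return X, y[k:]

def empirical_risk(h, X, Y, loss=squared_loss):
    # \hat{R}_n(h) = (1/n) * sum_i loss(h(X_i), Y_i)
    return float(np.mean([loss(h(x), y_i) for x, y_i in zip(X, Y)]))

rng = np.random.default_rng(0)
y = rng.standard_normal(200)      # toy stationary series
X, Y = make_ar_features(y, k=2)
h = lambda x: 0.5 * x[0]          # one fixed Lipschitz predictor in H
print(empirical_risk(h, X, Y))
\end{verbatim}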
Consider a target (with respect to $\mathcal{H}$) function $h_{\mathcal{H}}$ (assumed to exist), given by
\[
h_{\mathcal{H}} = \underset{h \in \mathcal{H}}{\operatorname{argmin}}\, R(h);
\]
and the empirical target
\[
\widehat{h}_n = \underset{h \in \mathcal{H}}{\operatorname{argmin}}\, \widehat{R}_n(h). \qquad (1.1)
\]
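As a hedged illustration of (1.1), the sketch below computes the empirical target over a finite (hence trivially searchable) class of linear, Lipschitz predictors; the grid-based class is an assumption made purely for the example.
\begin{verbatim}
import numpy as np
from itertools import product

def erm(X, Y, hypotheses, loss):
    # Empirical risk minimizer \hat{h}_n over a finite set of candidate hypotheses.
    risks = [np.mean([loss(h(x), y_i) for x, y_i in zip(X, Y)]) for h in hypotheses]
    return hypotheses[int(np.argmin(risks))]

# Illustrative class H: linear predictors h_theta(x) = <theta, x>, theta on a grid.
grid = np.linspace(-1.0, 1.0, 11)
hypotheses = [(lambda x, th=np.array(t): x @ th) for t in product(grid, repeat=2)]

rng = np.random.default_rng(1)
y = rng.standard_normal(300)
X = np.array([y[t - 2:t][::-1] for t in range(2, len(y))])  # X_t = (Y_{t-1}, Y_{t-2})
h_hat = erm(X, y[2:], hypotheses, loss=lambda p, v: (p - v) ** 2)
\end{verbatim}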
We focus on the empirical risk minimization (ERM) principle and aim to study the relevance of the estimation of $h_{\mathcal{H}}$ by $\widehat{h}_n$. The capacity of $\widehat{h}_n$ to approximate $h_{\mathcal{H}}$ is known as the generalization capability of the ERM algorithm. This generalization capability is assessed by studying how close $R(\widehat{h}_n)$ is to $R(h_{\mathcal{H}})$. The deviation between $R(\widehat{h}_n)$ and $R(h_{\mathcal{H}})$ is the generalization error of the algorithm. When $R(\widehat{h}_n) - R(h_{\mathcal{H}}) = o_P(1)$, the ERM algorithm is said to be consistent within the hypothesis class $\mathcal{H}$.
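A classical step behind such consistency statements, recalled here for the reader, bounds the generalization error by a uniform deviation over $\mathcal{H}$: since $\widehat{R}_n(\widehat{h}_n) \le \widehat{R}_n(h_{\mathcal{H}})$ by the definition of $\widehat{h}_n$,
\[
R(\widehat{h}_n) - R(h_{\mathcal{H}})
= \big[R(\widehat{h}_n) - \widehat{R}_n(\widehat{h}_n)\big]
+ \big[\widehat{R}_n(\widehat{h}_n) - \widehat{R}_n(h_{\mathcal{H}})\big]
+ \big[\widehat{R}_n(h_{\mathcal{H}}) - R(h_{\mathcal{H}})\big]
\le 2 \sup_{h \in \mathcal{H}} \big| \widehat{R}_n(h) - R(h) \big|,
\]
so that uniform convergence of $\widehat{R}_n$ to $R$ over $\mathcal{H}$ suffices for consistency.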
The study of a learning algorithm includes the calibration of a bound on the generalization error for any fixed $n$ (a non-asymptotic property) and the investigation of consistency (an asymptotic property). There exist several important contributions in the literature devoted to statistical learning for dependent observations, with various types of dependence structure. See, among other papers, [25], [26], [36], [28], [35], [18], [23] for some developments under mixing conditions and [39], [37], [38], [33] for some results for Markov chains. [1] considered prediction of time series under a θ-weak dependence condition and established convergence rates using the PAC-Bayesian approach. See also [6], [22], [24] for some Bernstein-type inequalities for τ-mixing processes and some advances in time series forecasting within the statistical learning paradigm. However, most of the above works are developed under a mixing condition, or for time series prediction only, or do not consider a general setting that includes pattern recognition, regression estimation, time series prediction, etc.
In this new contribution, we consider a general learning framework where the observations $D_n = \{Z_1 = (X_1, Y_1), \cdots, Z_n = (X_n, Y_n)\}$ form a trajectory of a ψ-weakly dependent process $\{Z_t = (X_t, Y_t),\, t \in \mathbb{Z}\}$ with values in a Banach space $\mathcal{Z} = \mathcal{X} \times \mathcal{Y}$. The following issues are addressed.
(i) Consistency of the ERM algorithm. We establish the consistency of the ERM algorithm within any space $\mathcal{H}$ of Lipschitz predictors. In comparison with the existing works, let us stress that the ψ-weak dependence structure considered here is a more general concept, and it is well known that many weakly dependent processes do not fulfill the mixing conditions; see for instance [10].
(ii) Generalization bounds and convergence rates. When $\mathcal{X} \subset \mathbb{R}^d$ (with $d \in \mathbb{N}$), $\mathcal{Y} \subset \mathbb{R}$, and $\mathcal{H}$ is a subset of a Hölder space $\mathcal{C}^s$ for some $s > 0$, generalization bounds are derived and the learning rate is provided. This rate is close to the usual $O(n^{-1/2})$ when $s \gg d$.