conventional formulation, we will impose a martingale constraint on the joint distribution of the empirical data and the resulting adversarially perturbed data.
Why do we believe that the martingale constraint makes sense as a regularization technique? It turns out that two random variables $X$ and $\bar{X}$ form a martingale, in the sense that $\mathbb{E}[\bar{X} \mid X] = X$, if and only if the distribution of $\bar{X}$ dominates that of $X$ in convex order [30]. In this sense, the adversary $\bar{X}$ will have higher dispersion, in a non-parametric sense, than the observed data $X$, but in a suitably constrained way so that the average locations are preserved. We believe this novel constrained OT-DRO regularization can potentially help combat overly conservative solutions; see [16]. Moreover, by allowing a small amount of violation of the martingale property, we can control the regularization properties of this constraint, thus obtaining a natural interpolation towards the conventional OT-DRO formulation and potentially improved regularization performance. We point out that related optimal transport problems with martingale constraints have been studied in robust mathematical finance [1, 8].
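For concreteness, this equivalence (a form of Strassen's theorem; see [30] for the precise statement we rely on) can be written as
\[
\exists\ \text{a coupling of } X \text{ and } \bar{X} \text{ with } \mathbb{E}[\bar{X} \mid X] = X
\quad\Longleftrightarrow\quad
\mathbb{E}[\phi(X)] \le \mathbb{E}[\phi(\bar{X})] \ \text{ for all convex } \phi .
\]
Taking $\phi$ linear shows that the means agree, and taking $\phi(x) = \|x - \mathbb{E}[X]\|^2$ shows that $\bar{X}$ has weakly larger variance, which is exactly the dispersion statement above.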
Consider, for example, the linear regression setting with exact martingale constraints, which means that for any given observed data point, the conditional expectation of the additive perturbation under the worst-case joint distribution equals zero. Surprisingly, we show that the resulting martingale DRO model is exactly equivalent to ridge regression [18] with Tikhonov regularization. To the best of our knowledge, this paper is the first work to interpret Tikhonov regularization from a DRO perspective, showing that it is distributionally robust in a precise non-parametric sense. In stark contrast, it is well known that the conventional OT-based DRO model (without the martingale constraint) is equivalent to the regularized square-root regression problem [2].
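Schematically, writing $\mathcal{B}_\rho(\widehat{P}_n)$ for the OT ambiguity ball of radius $\rho$ around the empirical distribution and $\|\cdot\|_*$ for the norm dual to the transport cost (the notation and constants here are illustrative on our part; the precise statements appear later in the paper), the conventional OT-DRO model satisfies
\[
\min_{\beta}\ \sup_{Q \in \mathcal{B}_\rho(\widehat{P}_n)} \mathbb{E}_Q\big[(Y - \beta^\top X)^2\big]
\;=\; \min_{\beta}\ \Big( \sqrt{\mathbb{E}_{\widehat{P}_n}\big[(Y - \beta^\top X)^2\big]} + \sqrt{\rho}\,\|\beta\|_* \Big)^2 ,
\]
whereas imposing the exact martingale constraint yields an objective of the ridge form
\[
\min_{\beta}\ \mathbb{E}_{\widehat{P}_n}\big[(Y - \beta^\top X)^2\big] + \lambda(\rho)\,\|\beta\|_2^2 ,
\]
with $\lambda(\rho)$ determined by the ambiguity radius.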
Therefore, introducing an additional power in the norm regularization (i.e., converting square-root regression to Tikhonov regularization) can be translated into adding martingale constraints on the adversarial perturbations, thus reducing the adversary's power. A natural question that arises here is whether we can interpolate between the conventional DRO model and Tikhonov regularization, and further improve upon both.
We provide a comprehensive and positive answer to this question in this paper. The key idea is to relax the equality constraint on the conditional expectation of the adversarial perturbation, thus allowing a small violation of the martingale property to gain more flexibility in the uncertainty set. This idea leads to another novel model, termed the perturbed martingale DRO in the sequel. Intuitively, if the relaxation is sufficiently loose, the perturbed martingale DRO model reduces to the conventional DRO model, which is formally equivalent to allowing an infinite amount of violation of the martingale constraint. By contrast, if no violation is allowed, the perturbed martingale DRO model automatically reduces to its exact counterpart, namely Tikhonov regularization. As a result, we are able to introduce a new class of regularizers via interpolation between the conventional DRO model and Tikhonov regularization.
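As a minimal numerical sketch of the two endpoints of this interpolation (illustrative only: the objectives below follow the schematic forms above, and the radius rho is an arbitrary choice rather than the paper's calibration):

import numpy as np

# Illustrative endpoints of the interpolation (not the paper's formal result).
def sqrt_regression_objective(beta, X, y, rho):
    # Conventional OT-DRO limit: regularized square-root regression (cf. [2]).
    rmse = np.sqrt(np.mean((y - X @ beta) ** 2))
    return (rmse + np.sqrt(rho) * np.linalg.norm(beta)) ** 2

def ridge_objective(beta, X, y, rho):
    # Exact-martingale limit: ridge (Tikhonov) regression.
    return np.mean((y - X @ beta) ** 2) + rho * np.linalg.norm(beta) ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print(sqrt_regression_objective(beta_hat, X, y, rho=0.1))
print(ridge_objective(beta_hat, X, y, rho=0.1))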
Furthermore, this interpolation also extends to a broad class of nonlinear learning models. Building on our analysis of the linear regression setting, the developed martingale DRO model also provides a new, principled adversarial training procedure for deep neural networks. Extensive experiments demonstrate the effectiveness of the proposed perturbed martingale DRO model for both linear regression and deep neural network training under the adversarial setting.
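To make the connection to adversarial training concrete, here is a hypothetical PyTorch sketch, ours rather than the paper's algorithm, of one way a relaxed zero-conditional-mean (martingale-style) constraint on perturbations could be folded into a PGD-style training step; the batch-mean penalty, the weight gamma, and the step sizes are all illustrative assumptions:

import torch
import torch.nn as nn

def perturbed_martingale_step(model, loss_fn, x, y, eps=0.1, step=0.01,
                              gamma=10.0, pgd_steps=5):
    # Inner maximization over additive perturbations delta, with a batch-mean
    # penalty as a crude surrogate for the relaxed martingale property
    # E[delta | x] = 0 (our simplification, not the paper's exact constraint).
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(pgd_steps):
        adv_loss = loss_fn(model(x + delta), y)
        penalty = gamma * delta.mean(dim=0).pow(2).sum()
        grad, = torch.autograd.grad(adv_loss - penalty, delta)
        delta = (delta + step * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    # Outer minimization: train on the perturbed batch.
    return loss_fn(model(x + delta.detach()), y)

model = nn.Sequential(nn.Linear(5, 16), nn.ReLU(), nn.Linear(16, 1))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
x, y = torch.randn(32, 5), torch.randn(32, 1)
loss = perturbed_martingale_step(model, nn.MSELoss(), x, y)
opt.zero_grad()
loss.backward()
opt.step()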
We summarize our main contributions as follows:
• We reveal a new hidden connection: Tikhonov regularization is optimal-transport robust when exact martingale constraints (i.e., convex order between the adversary and the empirical data) are imposed.
• Building upon this finding, we develop a new perturbed martingale DRO model, which not only provides a unified viewpoint of existing regularization techniques but also leads to a new class of robust regularizers.
• We introduce an easy-to-implement computational approach to capitalize on these theoretical benefits in practice, in both linear regression and neural network training under the adversarial setting.
• As a byproduct, the strong duality theorem proved in this paper, which serves as our main technical tool, applies to a wider spectrum of problems and is of independent interest.
2 Preliminaries
Let us introduce some basic definitions and concepts in preparation for the subsequent analysis.