
InfoOT: Information Maximizing Optimal Transport
Ching-Yao Chuang¹, Stefanie Jegelka¹, David Alvarez-Melis²
Abstract
Optimal transport aligns samples across distribu-
tions by minimizing the transportation cost be-
tween them, e.g., the geometric distances. Yet,
it ignores coherence structure in the data such as
clusters, does not handle outliers well, and can-
not integrate new data points. To address these
drawbacks, we propose InfoOT, an information-
theoretic extension of optimal transport that max-
imizes the mutual information between domains
while minimizing geometric distances. The re-
sulting objective can still be formulated as a (gen-
eralized) optimal transport problem, and can be
efficiently solved by projected gradient descent.
This formulation yields a new projection method
that is robust to outliers and generalizes to unseen
samples. Empirically, InfoOT improves the qual-
ity of alignments across benchmarks in domain
adaptation, cross-domain retrieval, and single-cell
alignment. The code is available at
https://github.com/chingyaoc/InfoOT.
1. Introduction
Optimal Transport (OT) provides a general framework with
a strong theoretical foundation to compare probability dis-
tributions based on the geometry of their underlying spaces
(Villani, 2009). Besides its fundamental role in mathematics,
OT has increasingly received attention in machine learning
due to its wide range of applications in domain adaptation
(Courty et al., 2017; Redko et al., 2019; Xu et al., 2020),
generative modeling (Arjovsky et al., 2017; Bousquet et al.,
2017), representation learning (Ozair et al., 2019; Chuang
et al., 2022), and generalization bounds (Chuang et al., 2021).
The development of efficient algorithms (Cuturi, 2013; Peyré
et al., 2016) has significantly accelerated the adoption of
optimal transport in these applications.
¹ MIT CSAIL, Cambridge, MA, USA. ² Microsoft Research,
Cambridge, MA, USA. Correspondence to: Ching-Yao Chuang
<cychuang@mit.edu>.
Proceedings of the 40th International Conference on Machine
Learning, Honolulu, Hawaii, USA. PMLR 202, 2023. Copyright
2023 by the author(s).
Computationally, the discrete formulation of OT seeks a
matrix, also called a transportation plan, that minimizes the
total geometric transportation cost between two sets of sam-
ples drawn from the source and target distributions. The
transportation plan implicitly defines (soft) correspondences
across these samples, but provides no mechanism to relate
newly-drawn data points. Aligning these requires solving a
new OT problem from scratch. This limits the applicability
of OT, e.g., to streaming settings where the samples arrive in
sequence, or very large datasets where we can only solve OT
on a subset. In this case, the current solution cannot be used
on future data. To overcome this fundamental constraint,
a line of work proposes to directly estimate a mapping,
the pushforward from source to target, that minimizes the
transportation cost (Perrot et al., 2016; Seguy et al., 2017).
Nevertheless, the quality of the resulting mapping depends
heavily on the complexity of the mapping function (Galanti
et al., 2021).
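To make the discrete formulation concrete, the following is a minimal sketch of computing a transportation plan between two small sample sets. It assumes entropy-regularized OT solved with Sinkhorn iterations in the spirit of Cuturi (2013), which only approximates the unregularized plan; the function name `sinkhorn_plan`, the regularization strength `reg`, the squared-distance cost, and the toy data are illustrative assumptions rather than anything specified in this paper.

```python
import math

def sinkhorn_plan(cost, a, b, reg=0.05, n_iters=500):
    """Approximate OT plan between histograms a and b (hypothetical helper).

    cost[i][j] is the cost of moving mass from source i to target j.
    Entropic regularization is an assumption of this sketch.
    """
    n, m = len(a), len(b)
    # Gibbs kernel: K[i][j] = exp(-cost[i][j] / reg)
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns so the marginals approach a and b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Toy 1-D example: two source and two target samples with uniform weights.
source = [0.0, 1.0]
target = [0.1, 1.2]
cost = [[(x - y) ** 2 for y in target] for x in source]
plan = sinkhorn_plan(cost, a=[0.5, 0.5], b=[0.5, 0.5])

for row in plan:
    print([round(p, 4) for p in row])
```

The returned plan is a matrix indexed by the given samples only: entry (i, j) is the mass moved from source sample i to target sample j. Nothing in it says where a newly drawn source point should go, which is the limitation described above.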
OT could also yield alignments that ignore the intrinsic
coherence structure of the data. In particular, by relying
exclusively on pairwise geometric distances, two nearby
source samples could be mapped to disparate target samples,
as in Figure 1, which is undesirable in some settings. For
instance, when applying OT for domain adaptation, source
samples with the same class should ideally be mapped to
similar target samples. To mitigate this, prior work has
sought to impose structural priors on the OT objective, e.g.,
via submodular cost functions (Alvarez-Melis et al., 2018)
or a Gromov-Wasserstein regularizer (Vayer et al., 2018b;a).
However, these methods still suffer from sensitivity to outliers
(Mukherjee et al., 2021) and imbalanced data (Hsu et al., 2015;
Tan et al., 2020).
This work presents Information Maximization Optimal
Transport (InfoOT), an information-theoretic extension of
the optimal transport problem that generalizes the usual
formulation by infusing it with global structure in the form
of mutual information. In particular, InfoOT seeks alignments
that maximize mutual information, an information-theoretic
measure of dependence, between domains. To do
so, we treat the pairs selected by the transportation plan as
samples drawn from the joint distribution and estimate the
mutual information with kernel density estimation based
on the paired samples (Moon et al., 1995). Interestingly,
this results in an OT problem where the cost is the log ra-
arXiv:2210.03164v2 [cs.LG] 29 May 2023