
methods for extracting shared and individual sources from data. Maneshi et al. [2016] propose a heuristic way of using FastICA for this task without discussing the identifiability of the results; Long et al. [2020] suggest applying ICA to each view separately, followed by a statistical analysis that separates the individual from the shared sources; Lukic et al. [2002] exploit temporal correlations rather than the non-Gaussianity of the sources and are thus not applicable in the context we consider.
A common tool for analyzing multi-view data is canonical correlation analysis (CCA), initially proposed by Hotelling [1936]. It finds projections of two datasets that maximize the correlation between the projected variables. Gaussian-CCA [Bach et al., 2005], kernelized [Bach et al., 2002] and deep learning [Andrew et al., 2013] formulations of the classical CCA problem aim to recover shared latent sources of variation from the multiple views. There are extensions of CCA that model the observed variables as a linear combination of group-specific and dataset-specific latent variables, estimated with Bayesian inference methods [Klami et al., 2013] or, in an exponential-family formulation, with MCMC inference [Virtanen, 2010]. However, most of them assume that the latent sources are Gaussian or non-linearly related to the observed data [Wang et al., 2016] and thus lack identifiability results.
Existing non-linear multi-view versions such as [Tian et al., 2020; Federici et al., 2020] cannot recover both shared and individual signals across multiple measurements and do not guarantee the identifiability of the proposed generative models.
There are identifiable deep non-linear versions of ICA (e.g., [Hyvärinen et al., 2019]) that could be employed for this task. However, their assumptions for achieving identifiability are often hard to satisfy in real-life applications, especially in biomedical domains with low-data regimes.
7 EXPERIMENTS
Model Implementation and Training.
We used the Python library pytorch [Paszke et al., 2017] to implement our method. We model each view with a separate unmixing matrix. To impose orthogonality constraints on the unmixing matrices, we made use of the geotorch library, an extension of pytorch [Lezcano-Casado, 2019]. The gradient-based method applied for training is L-BFGS. Before running any of the ICA-based methods (ours or the baselines), we whiten every single view with PCA to speed up computation. We estimate the mixing matrix up to scale (due to the whitening) and permutation (see Sections 3 and 4). To force the algorithm to output the shared sources in the same order across all views, we initialize the unmixing matrices by means of CCA. This works because the CCA weights are orthogonal matrices and the transformed views' components are paired and ordered across views. For all conducted experiments, we fixed the parameter λ from Equation 3 to 1.
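As an illustration, the following is a minimal sketch of this optimization setup, assuming two views with illustrative dimensions; the objective is a placeholder rather than the actual loss from Equation 3, and the random inputs stand in for PCA-whitened data.

```python
import torch
import geotorch

n_views, dim, n_samples = 2, 20, 1000

# One unmixing matrix per view; geotorch constrains each to be orthogonal.
unmixings = torch.nn.ModuleList(
    [torch.nn.Linear(dim, dim, bias=False) for _ in range(n_views)]
)
for layer in unmixings:
    geotorch.orthogonal(layer, "weight")

# In practice the views are PCA-whitened and the weights are initialized
# from CCA; random data stands in here.
views = [torch.randn(n_samples, dim) for _ in range(n_views)]

def objective():
    # Placeholder objective: in the actual method this would be the loss
    # from Equation 3 with lambda = 1.
    return sum(unmixings[i](v).abs().mean() for i, v in enumerate(views))

params = [p for layer in unmixings for p in layer.parameters()]
optimizer = torch.optim.LBFGS(params, line_search_fn="strong_wolfe")

def closure():
    optimizer.zero_grad()
    loss = objective()
    loss.backward()
    return loss

for _ in range(10):  # a few L-BFGS steps
    optimizer.step(closure)
```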
Figure 2: Comparison of ShIndICA (this paper) to ShICA, Infomax, GroupICA, MultiViewICA and ShICA-ML. The datasets come from two views with 100 sources in total and a sample size of 1000. We vary the true number of shared sources from 10 to 100 (x-axis), which is assumed to be known to the user before training. We compute the Amari distance (y-axis) between the estimated unmixing matrices and the ground truth (the lower the better) in each case. ShIndICA consistently outperforms all baselines.
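The Amari distance reported in Figure 2 follows its standard definition; a minimal sketch is given below. It vanishes exactly when the product of the estimated unmixing matrix and the true mixing matrix is a scaled permutation, i.e., when the sources are recovered up to the indeterminacies discussed in Sections 3 and 4.

```python
import numpy as np

def amari_distance(W, A):
    """Amari distance between estimated unmixing W and true mixing A.

    Zero if and only if W @ A is a scaled permutation matrix.
    """
    P = np.abs(W @ A)
    rows = (P / P.max(axis=1, keepdims=True)).sum(axis=1) - 1
    cols = (P / P.max(axis=0, keepdims=True)).sum(axis=0) - 1
    return (rows.sum() + cols.sum()) / (2 * P.shape[0])
```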
Baselines Implementation.
We compare ShIndICA to the standard single-view ICA method Infomax [Ablin et al., 2018]. To adapt it to the multi-view setting, we run Infomax on each view separately and then apply the Hungarian algorithm [Kuhn, 1955] to match components from different views based on their cross-correlation (see the sketch below). For the shared response model settings, ShIndICA is compared to related methods such as MultiViewICA [Richard et al., 2020], ShICA and ShICA-ML [Richard et al., 2021], and GroupICA as proposed by Richard et al. [2020]. The latter involves a two-step pre-processing procedure: first whitening the data in the single views and then reducing the dimensionality of the joint views. For the data integration experiment we use a method based on partial least squares estimation, closely related to CCA, that extracts between-view correlated components and view-specific ones. This method is provided by the OmicsPLS R package [Bouhaddani et al., 2018] and was developed specifically for the integration of omics data. We refer to this method as PLS.
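The component-matching step for the Infomax baseline could look like the following minimal sketch; the normalization and the use of absolute correlations (to absorb sign indeterminacy) are illustrative assumptions, not the exact baseline code.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_components(S1, S2):
    """Reorder the rows of S2 (sources x samples) to align with S1 by
    maximizing absolute cross-correlation via the Hungarian algorithm."""
    S1c = (S1 - S1.mean(1, keepdims=True)) / S1.std(1, keepdims=True)
    S2c = (S2 - S2.mean(1, keepdims=True)) / S2.std(1, keepdims=True)
    C = np.abs(S1c @ S2c.T) / S1.shape[1]  # cross-correlation matrix
    # The Hungarian algorithm minimizes cost, so negate the correlations.
    row_ind, col_ind = linear_sum_assignment(-C)
    return S2[col_ind]
```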
7.1 SYNTHETIC EXPERIMENTS
Data Simulation.
We simulated the sources from the Laplace distribution with density proportional to exp(−|x|/2), and the mixing matrices are sampled with normally distributed entries with mean 1 and standard deviation 0.1. The realizations of the observed views are obtained according to the proposed model. In the different scenarios described below we vary the noise distribution. We conducted each experiment 50 times and report error bars based on these repetitions in all figures where applicable. Additional experiments are provided in Appendix D.2.
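A minimal sketch of this generative setup follows; the shared/individual split and the dimensions are illustrative, and the additive noise term (which varies per scenario) is omitted. The Laplace scale of 2 corresponds to the stated density exp(−|x|/2).

```python
import numpy as np

rng = np.random.default_rng(0)
n_views, n_shared, n_individual, n_samples = 2, 5, 5, 1000
dim = n_shared + n_individual

# Shared sources are common to all views; individual sources are per view.
shared = rng.laplace(scale=2.0, size=(n_shared, n_samples))

views = []
for _ in range(n_views):
    individual = rng.laplace(scale=2.0, size=(n_individual, n_samples))
    sources = np.vstack([shared, individual])
    # Mixing matrix with N(1, 0.1^2) entries, as described above.
    A = rng.normal(loc=1.0, scale=0.1, size=(dim, dim))
    views.append(A @ sources)  # noise term omitted; it varies per scenario
```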