2013), molecular biology (Olivon et al., 2018), especially in single-cell transcriptomics (Kobak and
Berens, 2019), among others.
However, the wide availability and functional diversity of data visualization methods also bring
forth new challenges to data analysts and practitioners (Nonato and Aupetit, 2018; Espadoto et al.,
2019). On the one hand, it is critically important to determine among the extensive list which
visualization method is most suitable and reliable for embedding a given dataset. In fact, even
for a single visualization method, such as t-SNE or UMAP, oftentimes there are multiple tuning
parameters to be determined by the users, and different tuning parameters may lead to distinct
visualizations (Kobak and Linderman, 2021; Cai and Ma, 2021). Thus, for a given dataset, selecting
the most suitable visualization method, along with its tuning parameters, calls for a method
that provides a quantitative and objective assessment of different visualizations of the dataset. On
the other hand, as different methods are usually based on distinct ideas and heuristics, they would
generate qualitatively diverse visualizations of a dataset, each containing important features about
the data that are possibly unique to the visualization method. Meanwhile, due to the noisiness and high
dimensionality of many real-world datasets, their low-dimensional visualizations necessarily contain
distortions from the underlying true structures, which again may vary from one visualization to
another. It is therefore of substantial practical interest to combine strengths and reach a consensus
among multiple data visualizations, in order to obtain an even better “meta-visualization” of the
data that captures the most information and is least susceptible to the distortions. Naturally, a
meta-visualization would also save practitioners from painstakingly selecting a single visualization
method among many.
In this paper, we propose an efficient spectral approach for simultaneously assessing and combin-
ing multiple data visualizations produced by diverse dimension reduction/visualization algorithms,
allowing for different settings of tuning parameters for individual algorithms. Specifically, the
proposed method takes as input a collection of visualizations, or low-dimensional embeddings of
a dataset, hereafter referred to as “candidate visualizations,” and summarizes each visualization by
a normalized pairwise-distance matrix among the samples. With respect to each sample in the
dataset, we construct a comparison matrix from these normalized distance matrices, characterizing
the local concordance between each pair of candidate visualizations. Based on eigen-decomposition
of the comparison matrices, we propose a quantitative measure, referred to as the “visualization
eigenscore,” that quantifies the relative performance of the candidate visualizations in a sample-wise
manner, reflecting their local concordance with the underlying low-dimensional structure contained
in the data. To obtain a meta-visualization, the candidate visualizations are combined
into a meta-distance matrix, defined as a row-wise weighted average of those normalized distance
matrices, using the corresponding eigenscores as the weights. The meta-distance matrix is then
used to produce a meta-visualization, based on an existing method such as UMAP or kPCA, which
is shown to be more reliable and more informative than the individual candidate visualizations.
Our method is schematically summarized in Figure 1 and Algorithm 1, and detailed in Section 2.1.
The resulting meta-visualization reflects a joint perspective aggregating various aspects of the
data that are oftentimes captured separately by individual candidate visualizations.
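To make the pipeline above concrete, the following is a minimal NumPy sketch of one plausible reading of it, not the reference implementation. Several details are assumptions, since the precise definitions are deferred to Section 2.1: each sample's distance row is normalized to unit Euclidean norm, the per-sample comparison matrix is taken to be the matrix of inner products (cosine similarities) between those rows across candidates, the eigenscores are the absolute entries of its leading eigenvector, and the weights are rescaled to sum to one. The names `pairwise_distances` and `meta_distance` are hypothetical.

```python
import numpy as np

def pairwise_distances(X):
    # Euclidean distances between all pairs of rows of X (n x d).
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def meta_distance(embeddings):
    """embeddings: list of K arrays, each of shape (n, d_k).

    Returns (scores, meta): scores is n x K (one eigenscore per sample
    and candidate), meta is the n x n meta-distance matrix.
    """
    # Stack the K candidate distance matrices: shape (K, n, n).
    D = np.stack([pairwise_distances(X) for X in embeddings])
    # Assumed normalization: scale each sample's distance row to unit norm.
    norms = np.linalg.norm(D, axis=2, keepdims=True)
    D = D / np.where(norms > 0, norms, 1.0)

    K, n, _ = D.shape
    scores = np.empty((n, K))
    meta = np.empty((n, n))
    for i in range(n):
        rows = D[:, i, :]              # K x n: sample i as seen by each candidate
        S = rows @ rows.T              # K x K comparison matrix (assumed: cosine)
        _, vecs = np.linalg.eigh(S)    # eigenvalues in ascending order
        u = np.abs(vecs[:, -1])        # leading eigenvector as eigenscores
        scores[i] = u
        meta[i] = (u / u.sum()) @ rows # row-wise weighted average of distances
    return scores, meta
```

Because each row is weighted separately, the matrix returned by this sketch is in general not symmetric; a downstream method that expects a symmetric distance matrix may need `(meta + meta.T) / 2` or a similar adjustment, which is again an assumption rather than a step stated in the text.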
Numerically, through extensive simulations and analysis of multiple real-world datasets with
diverse underlying structures, we show the effectiveness of the proposed eigenscores in assessing
and ranking a collection of candidate visualizations, and demonstrate the superiority of the final
meta-visualization over all the candidate visualizations in terms of identification and characteri-
zation of these structural patterns. To achieve a deeper understanding of the proposed method,
we also develop a formal statistical framework that rigorously justifies the proposed scoring and
meta-visualization method, providing theoretical insights into the fundamental principles behind the
empirical success of the method, along with its proper interpretation and practical guidance.