New and simplified manual controls for projection and slice tours,
with application to exploring classification boundaries in high
dimensions
Ursula Laa^a, Alex Aumann^b, Dianne Cook^c, German Valencia^b
^a Institute of Statistics, University of Natural Resources and Life Sciences, Vienna; ^b School of
Physics and Astronomy, Monash University; ^c Department of Econometrics and Business
Statistics, Monash University
ARTICLE HISTORY
Compiled October 12, 2022
ABSTRACT
This paper describes new user controls for examining high-dimensional data using
low-dimensional linear projections and slices. A user can interactively change the
contribution of a given variable to a low-dimensional projection, which is useful for
exploring the sensitivity of structure to particular variables. The user can also
interactively shift the center of a slice, for example, to explore how structure changes
in local subspaces. A Mathematica package and example notebooks are
provided, which contain functions enabling the user to experiment with these new
manual controls, with one specifically for exploring regions and boundaries produced
by classification models. The advantages of Mathematica are its linear algebra
capabilities and interactive cursor location controls. Some limited implementation has
also been made available in the R package tourr.
KEYWORDS
data visualisation; grand tour; statistical computing; statistical graphics;
multivariate data; dynamic graphics
1. Introduction
From a statistical perspective 3D is a rare data dimension, so unlike in most 3D
rotation computer graphics applications, the more useful methods for data analysis need
to work for arbitrary dimension. A good approach is to show projections from an
arbitrary-dimensional space to create dynamic data visualizations called tours. Tours
involve views of high-dimensional (p) data with low-dimensional (d) projections. In
his original paper on the grand tour, Asimov (1985) provided several algorithms for
tour paths that could theoretically show the viewer the data from all sides. Prior to
Asimov’s work, there were numerous preparatory developments including Fisherkeller,
Friedman, and Tukey (1974)’s PRIM-9. PRIM-9 had user-controlled rotations on
coordinate axes, allowing one to manually tour through low-dimensional projections. (A
video illustrating the capabilities is available through the video library of the ASA
Statistical Graphics Section (2022).) Steering manually through all possible projections
is impossible, unlike Asimov’s tours, which allow one to quickly see many different projections.
CONTACT Ursula Laa. Email: ursula.laa@boku.ac.at, Alex Aumann. Email: aaum0002@student.monash.edu,
Dianne Cook. Email: dicook@monash.edu, German Valencia. Email: german.valencia@monash.edu
arXiv:2210.05228v1 [stat.CO] 11 Oct 2022
After Asimov there have been many tour developments, which are summarized in Lee
et al. (2021).
One such direction of work develops the ideas from PRIM-9, to provide manual
control of a tour. Cook and Buja (1997) describe controls for 1D (or 2D) projections,
respectively in a 2D (or 3D) manipulation space, allowing the user to select any variable
axis, and rotate it into, or out of, or around the projection through horizontal, vertical,
oblique, radial or angular changes in value. Spyrison and Cook (2020) refined this
algorithm and implemented it to generate radial tour animation sequences.
Manual controls are especially useful for assessing the sensitivity of structure to
particular elements of the projection, and this is useful in several settings. In exploratory
data analysis, where one sees clusters in a projection, one may ask whether some
variables can be removed from the projection without affecting the clustering. For
interpreting models, one can reduce or increase a variable’s contribution to examine the
variable importance. Having the user interact with a projection is extremely valuable
for understanding high-dimensional data. However, the existing algorithms have two
problems: (1) the pre-processing step of creating a manipulation space overly complicates the
algorithm, and (2) extending them to higher-dimensional control is difficult.
Another potentially useful manual control is to allow the user to choose the position
of the center of a slice. The slice tour was introduced in Laa, Cook, and Valencia
(2020). It operates by converting the projection plane into a slice, by removing or
de-emphasizing points that are further than a fixed orthogonal distance from the plane.
The projection plane is usually thought of as passing through the center of the data.
Manual control would allow the user to change the position of the center point, by
shifting it along a coordinate axis, while keeping the orientation of the projection plane
fixed. The purpose would be to explore whether and how the shape of the data, in the space
orthogonal to the projection, changes as one moves away from the center. It would also
allow the user to interactively decide on the thickness of the slice.
This paper explains the new manual controls for projection and slice tours. The next
section describes the new algorithm for manual control, for both projections and slices.
The use of these methods is illustrated to compare and contrast boundaries constructed
by different classifiers. The software section describes a Mathematica package that is
used for the application, and describes the interactive environment that would be
desirable within R as new technology becomes available. The paper is accompanied
by an appendix with more details and adjustments to the manual controls, and three
Mathematica notebooks that can be used to reproduce the application.
2. How to construct a manual tour
A manual tour allows the user to alter the coefficients of one (or more) variables
contributing to a $d$-dimensional projection. The initial ingredients are an orthonormal
basis ($A_{p \times d}$) defining the projection of the data, and a variable id ($m \in \{1, ..., p\}$)
specifying which coefficient will be changed. A method to update the values of the
component ($m$th row of $A_{p \times d}$) of the controlled variable $V_m$ is then needed.
2.1. Existing methods
The methods for updating component values in Cook and Buja (1997) (and utilized
in Spyrison and Cook (2020)) are prescribed primarily for a 2D projection, to take
advantage of (then) newly developed 3D trackball controls made available for computer
gaming. The first step was to construct a 3D manipulation space from a 2D
projection. In this space, the coefficient of the controlled variable ranges between -1 and 1.
Movements of a cursor are recorded and converted into changes in the values of $V_m$,
thus changing the displayed 2D projection. The algorithm also provided constraints
to horizontal, vertical, radial or angular motions only. The construction of the
manipulation space overly complicates the manual controls, especially when considering
possible techniques that will apply to arbitrary $d$.
2.2. A new simpler and broadly applicable approach
The new approach emerged from experiments on the tour using the linear algebra
capabilities, and relatively new interactive graphics interface, available in Mathematica
(Wolfram Research, Inc. 2022). The components corresponding to $V_m$ are directly
controlled by cursor movement, which updates row $m$ of $A$. The updated matrix is
then orthonormalised.
2.2.1. Algorithm
1. Provide $A$ and $m$. (Note that $m$ could also be chosen automatically as the
component that is closest to the cursor position.)
2. Change the values in row $m$. For example, $d = 2$ gives
$$A = [a_1 \; a_2] = \begin{bmatrix} a_{11} & a_{12} \\ \vdots & \vdots \\ a_{m1} & a_{m2} \\ \vdots & \vdots \\ a_{p1} & a_{p2} \end{bmatrix}.$$
A large change in these values would correspond to making a large jump from
the current projection. Small changes would correspond to tracking a cursor,
making small jumps from the current projection.
3. Orthonormalise $A$, using Gram-Schmidt.
i. Normalise $a_1$ and $a_2$.
ii. $a_2 = a_2 - (a_1 \cdot a_2) \, a_1$.
This algorithm will produce the changes to a projection as illustrated in Figure 1.
The controlled variable, $V_m$, corresponds to the black line, and sequential changes to
row $m$ of $A$ can be seen to roughly follow a specified position (orange dot). Changes
in the other components happen as a result of the orthonormalisation, but are
uncontrolled.
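The update can be sketched in a few lines of plain Python for the $d = 2$ case. This is only an illustration under stated assumptions, not the Mathematica implementation that accompanies the paper: the function name `manual_update`, the list-of-rows representation of $A$, and the 0-based row index are all hypothetical choices, and the sketch adds a final renormalisation of the second column so that the returned basis is exactly orthonormal.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(u, v))


def normalise(u):
    """Scale u to unit length (assumes u is not the zero vector)."""
    norm = math.sqrt(dot(u, u))
    return [x / norm for x in u]


def manual_update(A, m, new_row):
    """Set row m of the p x 2 basis A to new_row, then re-orthonormalise.

    A is a list of p rows [a_k1, a_k2]; m is a 0-based row index
    (the text uses a 1-based m). Returns a new p x 2 list of rows.
    """
    rows = [list(r) for r in A]
    rows[m] = list(new_row)            # step 2: overwrite the controlled row
    a1 = normalise([r[0] for r in rows])
    a2 = normalise([r[1] for r in rows])
    proj = dot(a1, a2)                 # step 3 ii: remove the a1 component from a2
    a2 = [y - proj * x for x, y in zip(a1, a2)]
    a2 = normalise(a2)                 # assumption: final renormalisation of a2
    return [[x, y] for x, y in zip(a1, a2)]


# Hypothetical example: p = 4, starting from the first two coordinate axes,
# and dragging the third variable to the position (0.3, 0.2).
A = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
A_new = manual_update(A, 2, (0.3, 0.2))
```

In this example the controlled row of `A_new` ends up near, but not exactly at, the requested position, which is the behaviour the text describes as roughly following the cursor.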
2.3. Refinements to enforce exact position
The problem with the new simple method is that it is not faithful to the precise
values for $V_m$ because the orthonormalisation will change them. Even though these
changes are for the most part imperceptible, one may wish to avoid them. There are
numerous ways that this can be enforced, a few of which are detailed in the Appendix. These
primarily differ in how the remaining variables are adjusted during orthonormalisation.
Figure 1. Sequence of projections where the contribution of one controlled variable (black) is changed using
unconstrained orthonormalisation. The dot (orange) indicates the chosen values for the controlled variable. It
can be seen that the actual axis does not precisely match the chosen position, but it is close.
2.4. Manual control for slices
To better explore the space we combine the manual controls for the projection with
manual controls for slicing. A slice is a section of the data that is defined by a
projection, a center point that anchors it in the high-dimensional space, and the slice
thickness $h$ (Laa, Cook, and Valencia 2020). A data point is inside the slice if its
orthogonal distance from the projection plane (passing through the center point) is
below the thickness $h$. This orthogonal distance is computed in terms of the component
that is normal to the projection plane. For $x_i$ a $p$-dimensional data point and $c$ the
center point (in the same $p$-dimensional space) we compute the orthogonal distance as
$$v_i^2 = \| x_i' - c' \|^2 = x_i'^2 + c'^2 - 2 \, x_i' \cdot c', \qquad (1)$$
with $c' = c - (c \cdot a_1) a_1 - (c \cdot a_2) a_2$, $x_i' = x_i - (x_i \cdot a_1) a_1 - (x_i \cdot a_2) a_2$ and $a_k$, $k = 1, 2 \, (= d)$
denoting the columns of the projection matrix, $A = (a_1, a_2)$.
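As a minimal sketch of the slice membership test in equation (1), the following plain Python computes the component of a point orthogonal to the projection plane and compares its distance from the shifted center with $h$. The helper names `orthogonal_component` and `in_slice` are hypothetical and are not taken from the Mathematica package or from tourr; the columns `a1` and `a2` are assumed to be orthonormal.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(u, v))


def orthogonal_component(x, a1, a2):
    """Return x' = x - (x.a1) a1 - (x.a2) a2 for orthonormal a1, a2."""
    s1, s2 = dot(x, a1), dot(x, a2)
    return [xi - s1 * u - s2 * v for xi, u, v in zip(x, a1, a2)]


def in_slice(x, c, a1, a2, h):
    """True if the orthogonal distance v_i of x from the plane through c is below h."""
    xp = orthogonal_component(x, a1, a2)
    cp = orthogonal_component(c, a1, a2)
    v2 = sum((s - t) ** 2 for s, t in zip(xp, cp))  # v_i^2 = ||x' - c'||^2
    return math.sqrt(v2) < h


# Hypothetical example in p = 4 with the plane spanned by the first two axes.
a1 = [1.0, 0.0, 0.0, 0.0]
a2 = [0.0, 1.0, 0.0, 0.0]
c = [0.0, 0.0, 0.0, 0.0]
near = in_slice([5.0, 5.0, 0.1, 0.1], c, a1, a2, h=0.2)  # far in-plane, close off-plane
far = in_slice([0.0, 0.0, 0.3, 0.0], c, a1, a2, h=0.2)   # off-plane distance exceeds h
```

Only the off-plane part of a point matters here, so a point can lie far from the center within the projection plane and still belong to the slice.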
2.4.1. Shifting the center
A natural starting point is to place $c$ in the center of the data distribution, but shifting
it away from the mean can provide additional insights. When there is a single direction
orthogonal to the projection plane ($p = d + 1$), we can pick a sequence of center points $c$ in steps
along that direction to move the slice and fully cover the data space. This no longer
works in higher-dimensional spaces; instead, we can pick one direction and
shift the slice along its component orthogonal to the projection plane.
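The single-orthogonal-direction case can be illustrated with a toy example in plain Python. The data, the step size and the helper names `off_plane` and `count_in_slice` are all assumptions made for this sketch: twenty points sit on five layers along the third axis, the projection plane is spanned by the first two axes, and the center is stepped along the third axis so that successive slices cover the whole data set.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(u, v))


def off_plane(x, basis):
    """Component of x orthogonal to the plane spanned by an orthonormal basis."""
    out = list(x)
    for a in basis:
        s = dot(x, a)
        out = [o - s * ai for o, ai in zip(out, a)]
    return out


def count_in_slice(points, c, basis, h):
    """Number of points whose orthogonal distance from the plane through c is below h."""
    cp = off_plane(c, basis)
    n = 0
    for x in points:
        xp = off_plane(x, basis)
        if sum((s - t) ** 2 for s, t in zip(xp, cp)) < h * h:
            n += 1
    return n


# Toy data (assumed): four points on each of five layers along the third axis.
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
layers = [-1.0, -0.5, 0.0, 0.5, 1.0]
points = [[x, y, z] for z in layers for x in (-1.0, 1.0) for y in (-1.0, 1.0)]

# Step the center along the single orthogonal direction.
centers = [[0.0, 0.0, z] for z in layers]
counts = [count_in_slice(points, c, basis, 0.25) for c in centers]

# Shifting the center within the projection plane leaves the slice unchanged.
same = count_in_slice(points, [3.0, -2.0, 0.0], basis, 0.25)
```

Each step picks up exactly one layer in this construction, and moving the center inside the projection plane has no effect, since only the orthogonal component of $c$ enters the distance.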
2.4.2. Changing the thickness
In addition it is also useful to interactively change the slice thickness $h$ (also called
the slice radius), in particular to find the preferred value for exploring the input data.
For guidance, the estimates of the number of points inside the slice as a function of
the original sample size $N$ and the number of dimensions $p$ from Laa et al. (2022)
can be used: in the case of a uniform distribution inside a sphere of radius $R$, a slice with
thickness $h$ will contain $N_S$ points, with
$$N_S(h, p, R, N) = \frac{N}{2} \left( \frac{h}{R} \right)^{p-2} \left( p - (p - 2) \left( \frac{h}{R} \right)^2 \right). \qquad (2)$$
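To get a feel for how quickly slices empty out as $p$ grows, the estimate can be evaluated directly. The sketch below reads equation (2) as $N_S = \frac{N}{2} (h/R)^{p-2} \left( p - (p-2)(h/R)^2 \right)$ for a 2D projection plane; the function name `expected_points_in_slice` is a hypothetical choice and the sample values are arbitrary.

```python
def expected_points_in_slice(h, p, R, N):
    """Expected number of points in a slice of thickness h around a 2D plane,
    for N points uniform inside a p-dimensional sphere of radius R."""
    ratio = h / R
    return 0.5 * N * ratio ** (p - 2) * (p - (p - 2) * ratio ** 2)


# Arbitrary illustrative values: N = 1000 points, R = 1, h = 0.5.
n_full = expected_points_in_slice(1.0, 5, 1.0, 1000)  # h = R: the slice is the whole sphere
n_p3 = expected_points_in_slice(0.5, 3, 1.0, 1000)
n_p6 = expected_points_in_slice(0.5, 6, 1.0, 1000)
```

With $h = R$ the slice contains every point, and for fixed $h/R < 1$ the expected count drops as $p$ increases, which is why the thickness usually needs to be tuned interactively.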