
New and simplified manual controls for projection and slice tours, with application to exploring classification boundaries in high dimensions

Ursula Laa (a), Alex Aumann (b), Dianne Cook (c), German Valencia (b)

(a) Institute of Statistics, University of Natural Resources and Life Sciences, Vienna; (b) School of Physics and Astronomy, Monash University; (c) Department of Econometrics and Business Statistics, Monash University

ARTICLE HISTORY

Compiled October 12, 2022

ABSTRACT

This paper describes new user controls for examining high-dimensional data using low-dimensional linear projections and slices. A user can interactively change the contribution of a given variable to a low-dimensional projection, which is useful for exploring the sensitivity of structure to particular variables. The user can also interactively shift the center of a slice, for example, to explore how structure changes in local subspaces. A Mathematica package and example notebooks are provided, containing functions that let the user experiment with these new manual controls, including one specifically for exploring regions and boundaries produced by classification models. The advantages of Mathematica are its linear algebra capabilities and its interactive cursor location controls. A limited implementation is also available in the R package tourr.

KEYWORDS

data visualisation; grand tour; statistical computing; statistical graphics; multivariate data; dynamic graphics

1. Introduction

From a statistical perspective, data with exactly three dimensions are rare, so unlike most 3D rotation applications in computer graphics, useful methods for data analysis need to work in arbitrary dimensions. A good approach is to show projections from an arbitrary-dimensional space, creating dynamic data visualizations called tours. Tours show high-dimensional (p) data through low-dimensional (d) projections. In his original paper on the grand tour, Asimov (1985) provided several algorithms for tour paths that could, in theory, show the viewer the data from all sides. Asimov's work was preceded by numerous preparatory developments, including PRIM-9 (Fisherkeller, Friedman, and Tukey 1974). PRIM-9 had user-controlled rotations on coordinate axes, allowing one to manually tour through low-dimensional projections. (A video illustrating its capabilities is available through the video library of the ASA Statistical Graphics Section (2022).) Manually steering through all possible projections is impossible, whereas Asimov's tours allow one to quickly see many different projections. Many tour developments since Asimov are summarized in Lee et al. (2021).

CONTACT Ursula Laa. Email: ursula.laa@boku.ac.at, Alex Aumann. Email: aaum0002@student.monash.edu, Dianne Cook. Email: dicook@monash.edu, German Valencia. Email: german.valencia@monash.edu

arXiv:2210.05228v1 [stat.CO] 11 Oct 2022

One direction of work builds on the ideas of PRIM-9 to provide manual control of a tour. Cook and Buja (1997) describe controls for 1D (or 2D) projections in a 2D (or 3D) manipulation space, respectively, allowing the user to select any variable axis and rotate it into, out of, or around the projection through horizontal, vertical, oblique, radial or angular changes in value. Spyrison and Cook (2020) refined this algorithm and implemented it to generate radial tour animation sequences.

Manual controls are especially useful for assessing the sensitivity of structure to particular elements of the projection, which matters in many settings. In exploratory data analysis, when clusters are visible in a projection, one may ask whether some variables can be removed from the projection without affecting the clustering. When interpreting models, one can reduce or increase a variable's contribution to examine its importance. Letting the user interact with a projection is therefore extremely valuable for understanding high-dimensional data. However, the existing algorithms have two problems: (1) the pre-processing step of creating a manipulation space overly complicates the algorithm, and (2) extending them to higher-dimensional control is difficult.

Another potentially useful manual control is to allow the user to choose the position of the center of a slice. The slice tour was introduced in Laa, Cook, and Valencia (2020). It operates by converting the projection plane into a slice, removing or de-emphasizing points that are further than a fixed orthogonal distance from the plane. The projection plane is usually thought of as passing through the center of the data. Manual control would allow the user to change the position of the center point by shifting it along a coordinate axis, while keeping the orientation of the projection plane fixed. The purpose would be to explore whether and how the shape of the data in the space orthogonal to the projection changes as one moves away from the center. It would also allow the user to interactively decide on the thickness of the slice.

This paper explains the new manual controls for projection and slice tours. The next section describes the new algorithm for manual control, for both projections and slices. The use of these methods is then illustrated by comparing and contrasting the boundaries constructed by different classifiers. The software section describes the Mathematica package used for the application, and the interactive environment that would be desirable within R as new technology becomes available. The paper is accompanied by an appendix with more details and adjustments to the manual controls, and by three Mathematica notebooks that can be used to reproduce the application.

2. How to construct a manual tour

A manual tour allows the user to alter the coefficients of one (or more) variables contributing to a $d$-dimensional projection. The initial ingredients are an orthonormal basis $A_{p\times d}$ defining the projection of the data, and a variable index $m \in \{1, \ldots, p\}$ specifying which coefficients will be changed. A method is then needed to update the values of the components of the controlled variable $V_m$, that is, the $m$th row of $A_{p\times d}$.
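To make these ingredients concrete, here is a minimal pure-Python sketch (not the authors' Mathematica code) that stores $A$ as a list of $p$ rows with $d$ entries each, so that row $m$ holds the coefficients of the controlled variable $V_m$. The helper names `column`, `is_orthonormal` and `project` are hypothetical, and indices are 0-based, unlike $m \in \{1, \ldots, p\}$ in the text.

```python
def column(A, k):
    """Column k of A, i.e. the projection direction a_k, as a list."""
    return [row[k] for row in A]


def is_orthonormal(A, tol=1e-9):
    """True if the d columns of A have unit length and are mutually orthogonal."""
    d = len(A[0])
    for k in range(d):
        for i in range(d):
            ip = sum(x * y for x, y in zip(column(A, k), column(A, i)))
            target = 1.0 if k == i else 0.0
            if abs(ip - target) > tol:
                return False
    return True


def project(x, A):
    """d-dimensional coordinates of the p-dimensional point x, i.e. x^T A."""
    return [sum(xj * aj for xj, aj in zip(x, column(A, k))) for k in range(len(A[0]))]


# Example: p = 3, d = 2, projection onto the first two variable axes.
A = [[1.0, 0.0],
     [0.0, 1.0],
     [0.0, 0.0]]
m = 2  # hypothetical choice: the third variable is the controlled one
print(is_orthonormal(A))             # True
print(project([0.5, -1.0, 2.0], A))  # [0.5, -1.0]
print(A[m])                          # [0.0, 0.0]: V_m currently contributes nothing
```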

2.1. Existing methods

The methods for updating component values in Cook and Buja (1997) (and used in Spyrison and Cook (2020)) are prescribed primarily for a 2D projection, to take advantage of the (then) newly developed 3D trackball controls made available for computer gaming. The first step was to construct a 3D manipulation space from a 2D projection. In this space, the coefficient of the controlled variable ranges between -1 and 1. Movements of a cursor are recorded and converted into changes in the values of $V_m$, thus changing the displayed 2D projection. The algorithm also provided constraints restricting changes to horizontal, vertical, radial or angular motions only. The construction of the manipulation space overly complicates the manual controls, especially when seeking techniques that apply to arbitrary $d$.

2.2. A new simpler and broadly applicable approach

The new approach emerged from experimenting with the tour using the linear algebra capabilities and the relatively new interactive graphics interface available in Mathematica (Wolfram Research, Inc. 2022). The components corresponding to $V_m$ are controlled directly by cursor movement, which updates row $m$ of $A$. The updated matrix is then orthonormalised.

2.2.1. Algorithm

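1. Provide $A$ and $m$. (Note that $m$ could also be chosen automatically as the component that is closest to the cursor position.)

2. Change the values in row $m$; for example, if $d = 2$ this gives

$$
A^* = [\mathbf{a}_1^* \; \mathbf{a}_2^*] =
\begin{pmatrix}
a_{11} & a_{12} \\
\vdots & \vdots \\
a_{m1}^* & a_{m2}^* \\
\vdots & \vdots \\
a_{p1} & a_{p2}
\end{pmatrix}
$$

A large change in these values corresponds to making a large jump from the current projection. Small changes correspond to tracking a cursor, making small jumps from the current projection.

3. Orthonormalise $A^*$ using Gram-Schmidt:

i. Normalise $\mathbf{a}_1^*$ and $\mathbf{a}_2^*$.

ii. Set $\mathbf{a}_2^* = \mathbf{a}_2^* - (\mathbf{a}_1^* \cdot \mathbf{a}_2^*)\,\mathbf{a}_1^*$.

iii. Normalise $\mathbf{a}_2^*$ again.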
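The steps above can be sketched in Python (standard library only) as follows. This is an illustrative reimplementation under stated assumptions, not the authors' Mathematica code: `manual_step` is a hypothetical name, rows are 0-indexed, and for $d > 2$ step 3 is generalised to ordinary Gram-Schmidt over all $d$ columns, which reduces to steps i-iii when $d = 2$.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def normalise(u):
    """Scale u to unit length."""
    n = math.sqrt(dot(u, u))
    if n == 0:
        raise ValueError("cannot normalise a zero vector")
    return [x / n for x in u]


def manual_step(A, m, new_row):
    """Hypothetical manual-control update.

    A is a p x d orthonormal basis stored as p rows of length d, m is the
    0-based index of the controlled variable V_m, and new_row holds the d
    coefficients chosen via the cursor. Returns a new orthonormal basis;
    A itself is not modified.
    """
    p, d = len(A), len(A[0])
    # Step 2: copy A and overwrite row m to obtain A*.
    A_star = [list(row) for row in A]
    A_star[m] = list(new_row)
    # Extract the columns a*_1, ..., a*_d.
    cols = [[A_star[j][k] for j in range(p)] for k in range(d)]
    # Step 3i: normalise every column.
    cols = [normalise(c) for c in cols]
    # Steps 3ii-iii: remove components along earlier columns, then renormalise.
    for k in range(1, d):
        for i in range(k):
            ip = dot(cols[i], cols[k])
            cols[k] = [x - ip * y for x, y in zip(cols[k], cols[i])]
        cols[k] = normalise(cols[k])
    # Convert the columns back to p rows of length d.
    return [[cols[k][j] for k in range(d)] for j in range(p)]


# Usage: p = 3, d = 2, starting from the projection onto the first two axes,
# request coefficients (0.3, 0.4) for the third variable (m = 2, 0-based).
A = [[1.0, 0.0],
     [0.0, 1.0],
     [0.0, 0.0]]
m = 2
B = manual_step(A, m, [0.3, 0.4])
print([round(v, 3) for v in B[m]])  # [0.287, 0.343]
```

In this example the requested coefficients for $V_m$ are (0.3, 0.4), but after orthonormalisation row $m$ is approximately (0.287, 0.343): close to the request but not identical, which is the behaviour discussed in Section 2.3.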

This algorithm produces changes to a projection as illustrated in Figure 1. The controlled variable, $V_m$, corresponds to the black line, and sequential changes to row $m$ of $A$ can be seen to roughly follow a specified position (orange dot). Changes in the other components happen as a result of the orthonormalisation, but are uncontrolled.

2.3. Reﬁnements to enforce exact position

The problem with the new simple method is that it is not faithful to the precise values chosen for $V_m$, because the orthonormalisation changes them. Although these changes are for the most part imperceptible, one may wish to avoid them, and there are numerous ways to enforce this; a few are detailed in the Appendix. They differ primarily in how the remaining variables are adjusted during orthonormalisation.

Figure 1. Sequence of projections in which the contribution of one controlled variable (black) is changed using unconstrained orthonormalisation. The orange dot indicates the chosen values for the controlled variable. The actual axis does not precisely match the chosen position, but it is close.

2.4. Manual control for slices

To better explore the space, we combine the manual controls for the projection with manual controls for slicing. A slice is a section of the data defined by a projection, a center point that anchors it in the high-dimensional space, and the slice thickness $h$ (Laa, Cook, and Valencia 2020). A data point is inside the slice if its orthogonal distance from the projection plane (passing through the center point) is below the thickness $h$. This distance is computed from the components of the data point and of the center point that are orthogonal to the projection plane. For a $p$-dimensional data point $\mathbf{x}_i$ and the center point $\mathbf{c}$ (in the same $p$-dimensional space), we compute the squared orthogonal distance as

$$
\|\mathbf{x}_i' - \mathbf{c}'\|^2 = \|\mathbf{x}_i'\|^2 + \|\mathbf{c}'\|^2 - 2\,\mathbf{x}_i' \cdot \mathbf{c}', \qquad (1)
$$

with $\mathbf{c}' = \mathbf{c} - (\mathbf{c}\cdot\mathbf{a}_1)\mathbf{a}_1 - (\mathbf{c}\cdot\mathbf{a}_2)\mathbf{a}_2$, $\mathbf{x}_i' = \mathbf{x}_i - (\mathbf{x}_i\cdot\mathbf{a}_1)\mathbf{a}_1 - (\mathbf{x}_i\cdot\mathbf{a}_2)\mathbf{a}_2$, and $\mathbf{a}_k$, $k = 1, 2\ (= d)$, denoting the columns of the projection matrix $A = (\mathbf{a}_1, \mathbf{a}_2)$.
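Equation (1) and the membership rule translate directly into code. The following is a minimal pure-Python sketch, assuming the columns $\mathbf{a}_k$ are passed as a list `basis` of orthonormal $p$-vectors; the helper names `orthogonal_part`, `orthogonal_distance` and `in_slice` are hypothetical and are not taken from the authors' Mathematica package or from tourr. "Below the thickness" is read as a strict inequality.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def orthogonal_part(x, basis):
    """x' = x - sum_k (x . a_k) a_k for orthonormal directions a_k in basis."""
    out = list(x)
    for a in basis:
        ip = dot(x, a)
        out = [o - ip * ak for o, ak in zip(out, a)]
    return out


def orthogonal_distance(x, c, basis):
    """Distance from x to the projection plane through c, using Eq. (1)."""
    xp = orthogonal_part(x, basis)
    cp = orthogonal_part(c, basis)
    sq = dot(xp, xp) + dot(cp, cp) - 2 * dot(xp, cp)
    return math.sqrt(max(sq, 0.0))  # clip tiny negative round-off


def in_slice(x, c, basis, h):
    """True if x lies strictly within distance h of the plane through c."""
    return orthogonal_distance(x, c, basis) < h


# Usage: p = 4, projection onto the first two axes, center at the origin.
basis = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
c = [0.0, 0.0, 0.0, 0.0]
x = [5.0, -3.0, 0.3, 0.4]
print(orthogonal_distance(x, c, basis))  # about 0.5
print(in_slice(x, c, basis, 0.6), in_slice(x, c, basis, 0.4))  # True False
```

One plausible reason to use the expanded form of Eq. (1) (our reading, not stated in the text) is that $\|\mathbf{x}_i'\|^2$ does not depend on $\mathbf{c}$, so it could be cached while the center is moved.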

2.4.1. Shifting the center

A natural starting point is to place $\mathbf{c}$ at the center of the data distribution, but shifting it away from the mean can provide additional insight. When there is only a single direction orthogonal to the projection plane, we can pick a sequence of center points $\mathbf{c}$ in steps along that direction, moving the slice so that it covers the full data space. This no longer works in higher-dimensional spaces, where there are several orthogonal directions; instead, we can pick one direction and shift the slice along its component orthogonal to the projection plane.
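A minimal sketch of this shift in pure Python, assuming an orthonormal `basis` for the projection plane and a user-chosen `direction` (for example a coordinate axis); `shift_center` is a hypothetical helper, not part of the authors' package. Because only the orthogonal component of the direction is used, the projected position of the center ($\mathbf{c}\cdot\mathbf{a}_k$) stays fixed and only the slice moves.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def shift_center(c, direction, basis, t):
    """Move the slice center c by distance t along the unit-length part of
    `direction` that is orthogonal to the plane spanned by the orthonormal
    vectors in `basis`; c . a_k is unchanged for every a_k in basis."""
    u = list(direction)
    for a in basis:
        ip = dot(direction, a)
        u = [ui - ip * ai for ui, ai in zip(u, a)]
    n = math.sqrt(dot(u, u))
    if n < 1e-12:
        raise ValueError("direction lies inside the projection plane")
    return [ci + t * ui / n for ci, ui in zip(c, u)]


# Usage: p = 3, d = 2, so there is a single orthogonal direction (the third
# axis); stepping t sweeps the slice through the data.
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
c0 = [0.0, 0.0, 0.0]
for t in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(shift_center(c0, [1.0, 1.0, 1.0], basis, t))
```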

2.4.2. Changing the thickness

It is also useful to interactively change the slice thickness $h$ (also called the slice radius), in particular to find the preferred value for exploring the input data. As guidance, one can use the estimate from Laa et al. (2022) of the number of points inside the slice as a function of the original sample size $N$ and the number of dimensions $p$: for a uniform distribution inside a sphere of radius $R$, a slice with thickness $h$ contains approximately $N_S$ points, with

$$
N_S(h, p, R, N) = \frac{N}{2}\left(\frac{h}{R}\right)^{p-2}\left(p - (p-2)\frac{h^2}{R^2}\right). \qquad (2)
$$
