Review of Clustering Methods for Functional Data

Mimi Zhang1,3, Andrew Parnell2,3

1School of Computer Science and Statistics, Trinity College Dublin, Ireland

2Hamilton Institute, Maynooth University, Ireland

3I-Form Advanced Manufacturing Research Centre, Science Foundation Ireland, Ireland

Abstract

Functional data clustering aims to identify heterogeneous morphological patterns in the continuous functions underlying the discrete measurements/observations. Applications of functional data clustering have appeared in many publications across various fields of science, including but not limited to biology, (bio)chemistry, engineering, environmental science, medical science, psychology, and social science. The phenomenal growth in the application of functional data clustering indicates the urgent need for a systematic approach to developing efficient clustering methods and scalable algorithmic implementations. On the other hand, there is abundant literature on the cluster analysis of time series, trajectory data, spatio-temporal data, etc., which are all related to functional data. Therefore, an overarching structure of existing functional data clustering methods will enable the cross-pollination of ideas across various research fields. We here conduct a comprehensive review of original clustering methods for functional data. We propose a systematic taxonomy that explores the connections and differences among the existing functional data clustering methods and relates them to the conventional multivariate clustering methods. The structure of the taxonomy is built on three main attributes of a functional data clustering method and therefore is more reliable than existing categorizations. The review aims to bridge the gap between the functional data analysis community and the clustering community and to generate new principles for functional data clustering.

Keywords: curve registration, dependent functional data, multivariate functional data, shape analysis

arXiv:2210.00847v1 [stat.ME] 3 Oct 2022

1 Introduction

With the advancement of data-collection technology, a wide range of industry and business sectors are now able to collect functional data. According to Ramsay and Silverman [1], a functional datum is not an individual value but rather a set of measurements/observations along a continuum that, taken together, are to be regarded as a single entity. Functional data come in many forms, but their defining quality is that they consist of functions, often but not always curves. For example, spectroscopic techniques obtain spectral information by probing each sample with electromagnetic radiation that varies over a range of wavelengths, and hence the calculated absorption coefficient is a function of wavelength. The set of absorption coefficients obtained by probing a sample at different wavelengths is one data unit. Another example of functional data is an fMRI time series, consisting of a time series of 3D images of the living human brain, where each 3D image consists of a large number of voxels (3D pixels). For example, the prevalent BOLD fMRI detects the blood-oxygen-level-dependent signal that reflects changes in deoxyhemoglobin, driven by localized changes in brain blood flow and blood oxygenation. Each 3D image is a functional datum (or, equivalently, a random field). Paradigmatic formats of functional data include time series, trajectories, spatio-temporal data, etc. However, the term “functional” is not the defining quality of time series, trajectories, or spatio-temporal data. Ansari et al. [2] classified spatio-temporal data into five types, according to which certain types of spatio-temporal data are not functional data. Apart from the difference in the definitions of data format, the main difference is in the focus of statistical analysis: the focus of functional data analysis is on analyzing relations among the random elements, rather than properties of individual random elements.
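To make the notion of a functional datum as "a single entity" concrete, the following minimal sketch stores one curve as its observation grid plus measurements and lets it be evaluated anywhere on the observed continuum. The class name `FunctionalDatum`, the use of linear interpolation, and the spectrum values are all illustrative assumptions, not anything prescribed by the works cited above.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionalDatum:
    """One data unit: measurements along a continuum, treated as one curve."""
    grid: tuple    # strictly increasing argument values (at least two)
    values: tuple  # measurement recorded at each grid point

    def __call__(self, t):
        """Evaluate the curve at t by linear interpolation (an assumed smoother)."""
        if not self.grid[0] <= t <= self.grid[-1]:
            raise ValueError("t lies outside the observed continuum")
        # Index of the right end of the interval containing t.
        j = min(bisect_right(self.grid, t), len(self.grid) - 1)
        t0, t1 = self.grid[j - 1], self.grid[j]
        w = (t - t0) / (t1 - t0)
        return (1 - w) * self.values[j - 1] + w * self.values[j]


# Hypothetical absorption spectra observed on different wavelength grids (nm).
spectrum_a = FunctionalDatum(grid=(400.0, 500.0, 600.0), values=(0.10, 0.30, 0.20))
spectrum_b = FunctionalDatum(grid=(400.0, 450.0, 600.0), values=(0.12, 0.25, 0.18))

# Both data units can be compared at a common wavelength even though
# they were never measured on a common grid.
print(spectrum_a(450.0), spectrum_b(450.0))
```

The point of the design is that the unit of analysis is the whole curve: two samples measured at different arguments remain comparable, which a plain vector of measurements would not allow.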

While functional data analysis has received attention from statisticians since the 1980s, there has been very little advancement in the area of functional data clustering. Within the two databases Scopus and Web of Science, we found only about 100 articles on developing clustering methods for functional data.¹ Moreover, nearly all documented methods tackle only the functional-data part of the problem, not the clustering part of the problem. For example, many studies mainly concern extracting a tabular-data proxy for functional data, ignoring the synergy between the feature-learning (a.k.a. representation-learning) step and the clustering step. The main objective of our review is to develop an overarching structure of existing functional data clustering methods, which highlights the similarities and differences among them and their connections with conventional multivariate clustering methods. We point to a few good references that give excellent coverage of state-of-the-art clustering methods for relevant data types (i.e., time series, trajectory data, and spatio-temporal data). We also suggest a new methodological framework that addresses the primary deficiency in the current tandem approach. The review will also help connect the machine learning and computer science communities with the challenges and opportunities in analyzing functional data.

¹ In the appendix, we give the details on the identification of relevant literature and the article selection process. We also provide a table that classifies the reviewed articles according to our taxonomy.
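The tandem approach mentioned above, in which a tabular proxy is first extracted from each curve and then clustered independently, can be sketched as follows. This is only a toy illustration under stated assumptions: the truncated Fourier basis, the two synthetic cluster shapes, the noise level, and the hand-written k-means with a farthest-point start are all choices made here for the example and are not taken from any reviewed method.

```python
import math
import random


def fourier_features(values, n_harmonics=2):
    """Feature extraction: project one curve, sampled on the uniform grid
    t_i = i/n, onto a truncated Fourier basis and return the coefficients.
    On this grid the basis is orthogonal, so projection equals least squares."""
    n = len(values)
    feats = [sum(values) / n]  # constant term
    for k in range(1, n_harmonics + 1):
        s = sum(y * math.sin(2 * math.pi * k * i / n) for i, y in enumerate(values))
        c = sum(y * math.cos(2 * math.pi * k * i / n) for i, y in enumerate(values))
        feats += [2 * s / n, 2 * c / n]
    return feats


def sq_dist(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points, k, n_iter=50):
    """Clustering step: Lloyd's k-means with a deterministic farthest-point start."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = []
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Synthetic sample paths: two hypothetical cluster shapes plus Gaussian noise.
rng = random.Random(0)
n = 64
grid = [i / n for i in range(n)]
shapes = [lambda t: math.sin(2 * math.pi * t), lambda t: math.cos(4 * math.pi * t)]
truth = [0] * 10 + [1] * 10
curves = [[shapes[g](t) + rng.gauss(0, 0.1) for t in grid] for g in truth]

features = [fourier_features(c) for c in curves]  # tabular proxy: 20 rows x 5 columns
labels = kmeans(features, k=2)
print(labels)
```

Note that `fourier_features` never sees the clustering objective and `kmeans` never sees the curves; this separation is exactly the lack of synergy between the two steps that the paragraph above points out.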

Figure 1 depicts the tandem approach adopted in the current practice of functional data clustering, and Figure 2 illustrates our taxonomy. Functional data clustering methods can be categorized (Tier 1 categorization) according to whether the clustering method is applied to the extracted tabular data (i.e., in a finite-dimensional space) or to the estimated smooth functions (i.e., in an infinite-dimensional space). Then within each major category, clustering methods can be further categorized (Tier 2 categorization) according to the definition of (dis)similarity, the definition of cluster, and/or algorithmic features. In particular, in the upper pipeline of Figure 1, clustering methods can be classified into “hierarchical clustering”, “model-based clustering”, “centroid-based clustering”, “density-based clustering”, “spectral clustering”, etc. In the bottom pipeline, clustering methods can be classified into “subspace clustering”, “nonparametric Bayesian”, “density-based clustering”, “new (dis)similarity”, etc. Finally, in the Tier 3 categorization, clustering methods are grouped according to the way they deal with phase variation and/or amplitude variation. In the random-effects category, phase variation and amplitude variation are characterized by a few random parameters in the function expression; for example, $y = y(at + b)$, where $t$ is the argument, and the random parameters $a$ and $b$ are to capture the phase variation. In the (non)parametric category, the time-warping functions admit either a parametric model or a nonparametric model. In the equivalence-relation category, two functions are equivalent if they can be transformed to each other by, e.g., a linear time-warping function. Our three-tier categorization provides a well-conceived and useful taxonomy in that it frames the three defining features of functional data clustering methods: dimensionality reduction, clustering strategy, and curve registration.

[Figure 1 diagram. Upper line: sample paths -> (step 1) -> smooth functions -> (step 2) -> tabular data -> (step 3) -> clustering. Bottom line: sample paths -> (step 1) -> smooth functions -> (step 2*) -> clustering.]

Figure 1: Functional data clustering methods can be categorized into two major groups, according to whether the clustering method is applied to the extracted tabular data (steps 1, 2 and 3) or to the estimated smooth functions (step 1 and step 2*). In the upper line approach, cluster analysis is performed in a finite-dimensional space, while in the bottom line approach, cluster analysis is performed in an infinite-dimensional space.

Figure 2: The three-tier categorization of existing functional data clustering methods. The first tier categorization concerns the dimension of the direct input to a clustering method, the second tier categorization is based on the characteristics of the clustering method, and the third tier categorization is to highlight the different strategies that deal with phase variation and/or amplitude variation. Methods highlighted in green and blue constitute the vast majority of the literature and are respectively reviewed in Section 3 and Section 4. Methods highlighted in grey explicitly address the phase variation and/or amplitude variation in their clustering methods and are reviewed in Section 7.
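The role of linear time warping in the Tier 3 categories can be illustrated with a small numerical sketch. Here a curve is generated as the template evaluated at $at + b$, and the warp is recovered by a brute-force grid search; once aligned, the two curves coincide even though their raw distance is large. The Gaussian-bump template, the true values $a = 1.2$ and $b = -0.2$, the search grids, and the grid-search strategy itself are all assumptions made for this example; actual registration methods reviewed in the literature are considerably more refined.

```python
import math


def template(t):
    """Hypothetical common shape: a Gaussian bump centred at 0.5."""
    return math.exp(-((t - 0.5) ** 2) / 0.02)


def linear_warp(f, a, b):
    """Return the curve t -> f(a * t + b), i.e. f under a linear time warp."""
    return lambda t: f(a * t + b)


def l2_distance(f, g, grid):
    """Discretized L2 distance between two curves on a uniform grid."""
    return math.sqrt(sum((f(t) - g(t)) ** 2 for t in grid) / len(grid))


grid = [i / 200 for i in range(201)]          # evaluation points on [0, 1]
observed = linear_warp(template, 1.2, -0.2)   # same shape, different phase

# Candidate warp parameters (rounded so the grid holds clean decimal values).
a_values = [round(0.8 + 0.05 * i, 2) for i in range(13)]   # 0.80, ..., 1.40
b_values = [round(-0.3 + 0.05 * i, 2) for i in range(13)]  # -0.30, ..., 0.30

# Brute-force registration: pick the warp that best aligns template to observed.
a_hat, b_hat = min(
    ((a, b) for a in a_values for b in b_values),
    key=lambda ab: l2_distance(linear_warp(template, *ab), observed, grid),
)

raw = l2_distance(template, observed, grid)
aligned = l2_distance(linear_warp(template, a_hat, b_hat), observed, grid)
print(a_hat, b_hat, raw, aligned)
```

Under the equivalence-relation view, `template` and `observed` would be treated as the same element because a linear warp maps one onto the other; under the random-effects view, $a$ and $b$ would instead be modeled as random parameters of each curve.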

There have been a few attempts at devising taxonomic categories for functional data clustering methods. The short survey given by Jacques and Preda [3] classifies a few conventional functional data clustering methods into three categories. Chamroukhi and Nguyen [4] reviewed a few articles that differ in the way of extracting tabular data but all apply the model-based clustering technique to the extracted tabular data. Cheam and Fredette [5] reviewed a few functional data clustering methods and categorized them according to whether they allow amplitude variation and/or phase variation within clusters. We note that, while a few functional data clustering methods explicitly deal with phase variation, the majority of functional data clustering methods adopt the convention that phase variation, whether relevant or not to the clustering problem, will be identified in the pre-processing step. Hence, the categories provided by [5] are too broad to guide future work. By contrast, our three-tier categorization is considerably more informative. Moreover, none of the above surveys is as comprehensive as this review. Ullah and Finch [6] conducted a systematic overview of applications of functional data analysis, covering all articles published between 1995 and 2010. Cuevas [7] provided a good survey of the current theory and statistics of functional data analysis. Finally, while there is limited literature in the field of functional data clustering, there is abundant literature on clustering time series, trajectory data, or spatio-temporal data. Readers are referred to the following recent surveys for cross-pollination of insights and ideas: Zheng [8] for trajectory data, Aghabozorgi et al. [9] for time series, and Atluri et al. [10], Ansari et al. [2] and Wang et al. [11] for spatio-temporal data.

The novelty of functional data clustering obliges us to start by clarifying the terminology in Section 2. The majority of the functional data clustering methods are explained in Sections 3 and 4, while Sections 5 and 6 are respectively dedicated to the clustering methods for vector-valued functional data and dependent functional data, which are two demanding tasks in this field. All the methods reviewed in Sections 3 to 6 belong to the “pre-processing” category in the Tier 3 categorization. Only a few articles, reviewed in Section 7, explicitly address the phase variation problem in their clustering methods. We conclude our review by presenting in Section 8 a new methodological framework that aims at maximizing the synergy among the sequential steps in a functional data clustering method. The layout of our overview in each section is consistent with the hierarchy of our taxonomy. However, we may explain an original work and its follow-up or relevant works together, to avoid repeating the problem context and to provide an integrated view. Table 2 in the appendix delineates the classification of all the reviewed publications according to our taxonomy.

2 Preliminaries

The notion “random function” is a natural generalization of the notion “random variable”. Let $\mathcal{T}$ denote a compact set in a topological space of dimension $d$ ($\geq 1$). For example, $\mathcal{T}$ can be an interval or a manifold. A random function $Y$ is defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and takes values in an infinite-dimensional space $\mathcal{Y}$. Most theoretical developments require
