
or with low sampling rates) trajectories, since deep learning models excel at capturing distinctive data features.
t2vec [11] uses an RNN-based sequence-to-sequence model to learn trajectory embeddings, over which similarity is then computed. It uses a spatial proximity-aware loss that encodes the spatial distance between trajectories. E2DTC [14] leverages t2vec
as the backbone encoder for trajectory clustering. It adds
two loss functions to capture the similarity between trajec-
tories from the same cluster. TrjSR [12] captures the spatial
pattern of trajectories by converting trajectories into images.
CSTRM [13] uses vanilla self-attention as its trajectory encoder
and proposes a multi-view hinge loss to capture both point-
level and trajectory-level similarities between trajectories. It
generates positive trajectory pairs using two augmentation
methods, i.e., point shifting and point masking, which are
empirically shown to be sub-optimal in Section V.
Our model is a learned trajectory similarity measure. It aims
to address the limitations of the existing learned measures in
effectiveness and efficiency as discussed in Section I.
Contrastive learning. Contrastive learning [16], [17], [30]–
[38] is a self-supervised learning technique. Its core idea is to
maximize the agreement between the learned representations
of similar objects (i.e., positive pairs) while minimizing that
between dissimilar objects (i.e., negative pairs). The positive
and the negative sample pairs are generated from an input dataset, and no labeled (supervised) data is needed. Once
trained, the representation generation model (i.e., a backbone
encoder) can be connected to downstream models, to generate
object representations for downstream learning tasks (e.g.,
classification). A few studies introduce contrastive learning
into spatial problems, such as traffic flow prediction [39].
Self-attention models. Self-attention-based models [29], [40]–[42] learn the correlation between every pair of elements of an input sequence. Studies such as T3S and CSTRM have adopted self-attention for trajectory similarity measurement. Both models adopt the vanilla
multi-head self-attention encoder [29], while we propose a
dual-feature self-attention-based encoder which can capture
trajectory features from two levels of granularity and thus
generate more robust embeddings.
III. SOLUTION OVERVIEW
We consider a trajectory $T$ as a sequence of points recording discrete locations of the movement of some entity, denoted by $T = [p_1, p_2, \ldots, p_{|T|}]$, where $p_i$ is the $i$-th point on $T$ and $|T|$ denotes the number of points on $T$. A point $p_i$ is represented by its coordinates in a Euclidean space, i.e., $p_i = (x_i, y_i)$.
Problem statement. Given a set of trajectories, we aim to learn a trajectory encoder $F: T \rightarrow \mathbf{h}$ that maps a trajectory $T$ to a $d$-dimensional embedding vector $\mathbf{h} \in \mathbb{R}^d$. The distance between the learned embeddings of two trajectories should be negatively correlated to the similarity between the two trajectories (we use the $L_1$ distance in the experiments).
Model overview. Fig. 2 shows an overview of our TrajCL
model. The model follows the dual-branch structure of a strong
contrastive learning framework, MoCo [16]. Our technical contributions lie in the design of the learning modules highlighted in red in Fig. 2, which are detailed in the next section.
Given an input trajectory $T$, it first goes through a trajectory augmentation module to generate two different trajectory views (i.e., variants) of $T$, denoted as $\tilde{T}$ and $\tilde{T}'$, respectively. We propose four augmentation methods that emphasize different features of a trajectory (Section IV-A). The augmentation process operates on $T$ directly, and hence no additional manual data labeling effort is needed.
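As a toy illustration of view generation, the sketch below uses point masking (one of the augmentations discussed for CSTRM above); the function name and mask ratio are assumptions, and the four augmentations actually used by TrajCL are defined in Section IV-A.

import random

def point_masking(T, mask_ratio=0.3):
    """Drop a random subset of points of trajectory T (a list of
    (x, y) tuples) to produce one augmented view."""
    kept = [p for p in T if random.random() > mask_ratio]
    return kept if kept else T  # never return an empty view

# Two stochastic applications of an augmentation yield the two views:
# T_view, T_view_prime = point_masking(T), point_masking(T)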
The generated views $\tilde{T}$ and $\tilde{T}'$ are fed into pointwise trajectory feature enrichment layers to generate pointwise features beyond just the coordinates, which reflect the key characteristics of $\tilde{T}$ and $\tilde{T}'$ (Section IV-B). We represent the enriched features by two types of embeddings, the structural feature embedding and the spatial feature embedding, for each point in $\tilde{T}$ (and $\tilde{T}'$). These embeddings encode pointwise structural and spatial features, and they form a structural embedding matrix $\mathbf{T}$ ($\mathbf{T}'$) and a spatial embedding matrix $\mathbf{S}$ ($\mathbf{S}'$).
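In terms of shapes, the enrichment step can be pictured as follows; the layer names are placeholders, and the actual feature designs are given in Section IV-B.

def enrich(T_view, structural_layer, spatial_layer):
    """Map an n-point view to the two embedding matrices.

    T_view           : tensor of shape (n, 2), the (x, y) coordinates
    structural_layer : produces one structural embedding per point
    spatial_layer    : produces one spatial embedding per point
    Returns T_mat and S_mat, each of shape (n, d).
    """
    T_mat = structural_layer(T_view)  # structural embedding matrix
    S_mat = spatial_layer(T_view)     # spatial embedding matrix
    return T_mat, S_mat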
Then, we input $(\mathbf{T}, \mathbf{S})$ and $(\mathbf{T}', \mathbf{S}')$ into trajectory backbone encoders $F$ and $F'$ to obtain embeddings $\mathbf{h}$ and $\mathbf{h}'$ for $\tilde{T}$ and $\tilde{T}'$, respectively (Section IV-C). Our backbone encoders are adapted from the Transformer [29], and they encode the structural and spatial features of trajectories into the embeddings.
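A vanilla Transformer encoder, shown below as a stand-in, conveys the basic encoding step; the additive fusion of the two matrices and the mean pooling are simplifying assumptions, and TrajCL's dual-feature self-attention encoder in Section IV-C replaces them.

import torch
import torch.nn as nn

d = 128  # embedding dimension (illustrative value)
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d, nhead=8, batch_first=True),
    num_layers=2,
)

def encode(T_mat, S_mat):
    """Encode one trajectory view into a single embedding h of size d."""
    x = (T_mat + S_mat).unsqueeze(0)   # (1, n, d); naive feature fusion
    out = backbone(x)                  # (1, n, d)
    return out.mean(dim=1).squeeze(0)  # pool the n points into one vector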
Next, $\mathbf{h}$ and $\mathbf{h}'$ go through two projection heads $P$ and $P'$ (which are fully connected layers of the same structure) to be mapped into lower-dimensional vectors $\mathbf{z}$ and $\mathbf{z}'$, respectively:

$$\mathbf{z} = P(\mathbf{h}) = (\mathrm{FC} \circ \mathrm{ReLU} \circ \mathrm{FC})(\mathbf{h}) \quad (1)$$
Here, FC denotes a fully connected layer, ReLU denotes the ReLU activation function, and $\circ$ denotes function composition. We omit the equation for $P'$ as it is the same. Such projections have been shown to improve the embedding quality [17], [30].
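Eq. (1) translates directly into a two-layer head; the layer widths below are illustrative assumptions. Composition applies right to left, so the inner FC runs first.

import torch.nn as nn

d, d_proj = 128, 32  # input and output widths (assumed values)

# Projection head P of Eq. (1): z = (FC ∘ ReLU ∘ FC)(h).
P = nn.Sequential(
    nn.Linear(d, d),       # inner FC
    nn.ReLU(),
    nn.Linear(d, d_proj),  # outer FC, yields the lower-dimensional z
)
# z = P(h); P' is a separate head with the same structure.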
Model training. Following previous contrastive learning models, we use the InfoNCE [43] loss for model training. We use $\mathbf{z}$ and $\mathbf{z}'$ as a pair of positive samples, as they both come from variants of $T$ and are supposed to be similar in the learned latent space. The embeddings (except $\mathbf{z}'$) from projection head $P'$ in the current and recent past training batches are used as negative samples of $\mathbf{z}$. The InfoNCE loss $\mathcal{L}$ maximizes the agreement between positive samples and minimizes that between negative samples:
$$\mathcal{L}(T) = -\log \frac{\exp(\mathrm{sim}(\mathbf{z}, \mathbf{z}')/\tau)}{\exp(\mathrm{sim}(\mathbf{z}, \mathbf{z}')/\tau) + \sum_{j=1}^{|Q_{neg}|} \exp(\mathrm{sim}(\mathbf{z}, \mathbf{z}_j^-)/\tau)} \quad (2)$$
Here, $\mathrm{sim}$ is the cosine similarity. $\tau$ is a temperature parameter that controls the contribution of the negative samples [44].
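A minimal sketch of Eq. (2) for a single anchor follows; the temperature value is a common default rather than the paper's setting, and the code uses the standard cross-entropy reformulation of InfoNCE with the positive at index 0.

import torch
import torch.nn.functional as F

def info_nce(z, z_pos, z_negs, tau=0.07):
    """InfoNCE loss of Eq. (2) for one anchor embedding.

    z      : (d,)   projection of one view
    z_pos  : (d,)   projection of the other view (the positive)
    z_negs : (k, d) negative samples drawn from the queue Q_neg
    """
    pos = F.cosine_similarity(z, z_pos, dim=0) / tau                 # scalar
    negs = F.cosine_similarity(z.unsqueeze(0), z_negs, dim=1) / tau  # (k,)
    logits = torch.cat([pos.unsqueeze(0), negs]).unsqueeze(0)        # (1, k+1)
    # -log(exp(pos) / (exp(pos) + Σ exp(neg))) equals cross-entropy
    # with class 0 (the positive) as the target.
    return F.cross_entropy(logits, torch.zeros(1, dtype=torch.long))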
We use a queue $Q_{neg}$ of a fixed size (an empirical parameter) to store negative samples. The queue includes the embeddings from $P'$ in recent batches, to enlarge the negative sample pool, since more negative samples help produce more robust embeddings [16], [17]. To reuse negative samples from recent batches, the parameters of $F'$ and $P'$ should change smoothly between batches. We follow the momentum update [16] procedure to satisfy this requirement:
$$\Theta_{F'} = m\,\Theta_{F'} + (1-m)\,\Theta_F; \quad \Theta_{P'} = m\,\Theta_{P'} + (1-m)\,\Theta_P \quad (3)$$

Here, $\Theta$ denotes model parameters and $m \in [0, 1)$ is the momentum coefficient.
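In code, the momentum update of Eq. (3) is an in-place exponential moving average over parameters; m = 0.999 below follows MoCo [16] and is not necessarily the value used in our experiments.

import torch

@torch.no_grad()
def momentum_update(online, target, m=0.999):
    """Apply Eq. (3): Θ_target ← m·Θ_target + (1 − m)·Θ_online.

    `online` is F (or P); `target` is the slowly moving F' (or P').
    """
    for p_t, p_o in zip(target.parameters(), online.parameters()):
        p_t.data.mul_(m).add_((1.0 - m) * p_o.data)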