Reachability Embeddings: Scalable Self-Supervised Representation Learning from Spatiotemporal Motion Trajectories for Multimodal Computer Vision
Swetava Ganguli, C. V. Krishnakumar Iyer, Vipul Pandey
Apple
{swetava,cvk,vipul}@apple.com
1 Introduction
Figure 1: Generation of reachability embeddings. [Figure labels: GPS record; deduced trajectory; absorption transition; emission transition; embedding $h(s) \in \mathbb{R}^{d_R}$ for node $s$; nodes, zoom-24 tiles, and pixels. In the resulting $w \times h \times c$ tensor, each pixel is a zoom-24 tile, and each number along the channel dimension corresponds to a meaningful real number associated with that tile.]
Graphs are natural data structures for representing geospatial datasets (e.g., road networks, point clouds, 3D object meshes) with natural definitions of nodes and edges. Instead of hand-engineering task-specific and domain-specific features for nodes in graphs, recent methods [2, 6, 8] have focused on automatically learning low-dimensional feature vector representations called node embeddings. In parallel, self-supervised learning has been an area of active and promising research. Self-supervised representation learning techniques utilize large datasets without semantic annotations to learn meaningful, universal features that can be conveniently transferred to solve a wide variety of downstream supervised tasks. In [4], we propose a novel self-supervised method for learning representations of geographic locations from observed, unlabeled GPS trajectories; these representations can be used by themselves or combined with other image-like data modalities to solve downstream geospatial computer vision tasks. A spatial proximity-preserving graph representation of the earth's surface is inferred from observed mobility trajectories and is used to cast the geospatial representation learning task as one of learning self-supervised node embeddings. Reachability embeddings serve as task-agnostic feature representations of geographic locations. Using reachability embeddings as pixel representations for five different downstream geospatial tasks, cast as supervised semantic segmentation problems, we quantitatively demonstrate that reachability embeddings are semantically meaningful representations and result in a 4–23% gain in performance, as measured by the area under the precision-recall curve (AUPRC), when compared to baseline models that use pixel representations (called Local Aggregate Representations (LAR)) that do not account for the spatial connectivity between tiles [10]. Reachability embeddings transform sequential, spatiotemporal motion trajectory data into semantically meaningful image-like tensor representations that can be combined (multimodal fusion) with other data modalities (on machine learning platforms such as Trinity [7]) that are or can be transformed into image-like tensor representations (e.g., RGB imagery, graph embeddings of road networks, passively collected imagery like SAR, etc.) to facilitate multimodal learning in geospatial computer vision. Multimodal computer vision is critical for training machine learning models for geospatial feature detection to keep a geospatial mapping service up-to-date in real time and can significantly improve user experience and, above all, user safety.
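As an illustration of how image-like tensor representations can be combined, the following minimal sketch fuses modalities by concatenating them along the channel dimension. Channel concatenation is an assumption made here for illustration; it is one common fusion strategy and is not stated to be the mechanism used in [4] or on Trinity [7]. The function name fuse_channels, the nested-list tensor layout, and the toy values are all hypothetical.

```python
from typing import List

# An image-like tensor is modeled as nested lists with shape (h, w, c):
# tensor[row][col] is the list of channel values for one pixel (one zoom-24 tile).
Tensor = List[List[List[float]]]


def fuse_channels(modalities: List[Tensor]) -> Tensor:
    """Concatenate several (h, w, c_i) tensors along the channel dimension."""
    if not modalities:
        raise ValueError("need at least one modality")
    h = len(modalities[0])
    w = len(modalities[0][0]) if h else 0
    for m in modalities:
        # Assumption: every modality is rasterized on the same tile grid.
        if len(m) != h or any(len(row) != w for row in m):
            raise ValueError("all modalities must share the same (h, w) grid")
    fused: Tensor = []
    for r in range(h):
        fused_row = []
        for c in range(w):
            pixel: List[float] = []
            for m in modalities:
                pixel.extend(m[r][c])  # append this modality's channels
            fused_row.append(pixel)
        fused.append(fused_row)
    return fused


# Toy 2x2 grid: a hypothetical 2-channel embedding and 3-channel RGB imagery.
embedding = [[[0.1, 0.2], [0.3, 0.4]], [[0.5, 0.6], [0.7, 0.8]]]
rgb = [[[255, 0, 0], [0, 255, 0]], [[0, 0, 255], [9, 9, 9]]]
fused = fuse_channels([embedding, rgb])
```

In this toy example every pixel of fused carries five channels: the two embedding values followed by the three RGB values for the same tile. A downstream segmentation model would then consume the fused tensor as a single input.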
2 The Reachability Embeddings Algorithm Proposed in [4]
A GPS trajectory encodes the spatiotemporal movement of an object as a chronologically ordered sequence of GPS records (each a tuple of a timestamp and the location's zoom-24 tile [1]). Let $T$ represent the set of all available GPS trajectories during the time interval $[t_0, t_0 + \Delta t]$ such that all GPS records in each trajectory are associated with the same motion modality (e.g., driving, walking, biking). The Earth Surface Graph (ESG), $G_{ES}(V_{ES}, E_{ES})$, is defined as the inferred graph obtained from a raster representation of the earth's surface based on zoom-24 tiles, with these tiles as the nodes, $V_{ES}$. GPS trajectories are modeled as allowed Markovian paths on the ESG, thereby defining the edges of the ESG, $E_{ES}$, as the observed transitions between nodes in $T$. In summary, the following equivalences are defined: (i) zoom-24 tile $\equiv$ Markovian state
Corresponding author. Alternative EMail: swetava@cs.stanford.edu
Abstract accepted for poster presentation at BayLearn 2022.
arXiv:2210.03289v1 [cs.CV] 7 Oct 2022
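The Earth Surface Graph construction described in Section 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation from [4]: tiles are represented by arbitrary hashable identifiers, only tiles that appear in some trajectory become nodes, consecutive records in the same tile are not counted as transitions, and the per-edge transition counts are an addition made here for illustration. The name build_earth_surface_graph and the toy trajectories are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Set, Tuple

Tile = Hashable                  # stand-in for a zoom-24 tile identifier
GpsRecord = Tuple[float, Tile]   # (timestamp, tile)
Trajectory = List[GpsRecord]


def build_earth_surface_graph(
    trajectories: List[Trajectory],
) -> Tuple[Set[Tile], Dict[Tuple[Tile, Tile], int]]:
    """Return (nodes, edge_counts) inferred from observed tile-to-tile transitions."""
    nodes: Set[Tile] = set()
    edge_counts: Counter = Counter()
    for trajectory in trajectories:
        # A trajectory is chronologically ordered; sort defensively by timestamp.
        ordered = sorted(trajectory, key=lambda record: record[0])
        tiles = [tile for _, tile in ordered]
        nodes.update(tiles)
        # Each pair of consecutive records is one observed transition.
        for src, dst in zip(tiles, tiles[1:]):
            if src != dst:  # assumption: staying within a tile is not a transition
                edge_counts[(src, dst)] += 1
    return nodes, dict(edge_counts)


# Toy trajectories over four hypothetical tiles "a" through "d".
trajectories = [
    [(0.0, "a"), (1.0, "b"), (2.0, "c")],
    [(0.0, "a"), (1.0, "b"), (2.0, "b"), (3.0, "d")],
]
nodes, edges = build_earth_surface_graph(trajectories)
```

Here the directed edge ("a", "b") is observed twice, while ("b", "c") and ("b", "d") are each observed once. Keeping the edges directed preserves the order of movement along each trajectory.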