
Learning Location from Shared Elevation Profiles in Fitness Apps: A Privacy Perspective

Ulku Meteriz-Yildiran, Necip Fazil Yildiran, Joongheon Kim, and David Mohaisen

Abstract—The extensive use of smartphones and wearable devices has facilitated many useful applications. For example, with Global

Positioning System (GPS)-equipped smart and wearable devices, many applications can gather, process, and share rich metadata,

such as geolocation, trajectories, elevation, and time. For example, ﬁtness applications, such as Runkeeper and Strava, utilize the

information for activity tracking and have recently witnessed a boom in popularity. Those ﬁtness tracker applications have their own web

platforms and allow users to share activities on such platforms or even with other social network platforms. To preserve the privacy of

users while allowing sharing, several of those platforms may allow users to disclose partial information, such as the elevation proﬁle for

an activity, which supposedly would not leak the location of the users. In this work, and as a cautionary tale, we create a proof of

concept where we examine the extent to which elevation proﬁles can be used to predict the location of users. To tackle this problem, we

devise three plausible threat settings under which the city or borough of the targets can be predicted. Those threat settings deﬁne the

amount of information available to the adversary to launch the prediction attacks. Establishing that simple features of elevation proﬁles,

e.g., spectral features, are insufficient, we devise both a natural language processing (NLP)-inspired text-like representation and

a computer vision-inspired image-like representation of elevation profiles, and we convert the problem at hand into text and image

classification problems. We use both traditional machine learning- and deep learning-based techniques and achieve a prediction

success rate ranging from 59.59% to 99.80%. The ﬁndings are alarming, highlighting that sharing elevation information may have

signiﬁcant location privacy risks.

Index Terms—location privacy, privacy breach, privacy in social media, ﬁtness applications, natural language processing, applied

machine learning

1 INTRODUCTION

From smartphones to wearables, an increasing number of

Internet of Things (IoT) devices are equipped with Global

Positioning System (GPS), accelerometers, and gyroscopes

to allow applications to function or to present a better user

experience using geodata, such as location and elevation

information. More recently, ﬁtness applications that run on

smartphones and smartwatches used these components to

collect spatial, temporal, and activity-speciﬁc information

to analyze, summarize, and visualize users’ activities. By

analyzing each activity, many of those applications deliver

personalized motivations and challenges for users to meet

their goals. Using social media support of these applications

for sharing updates about users’ activities, including train-

ing routes and elevation proﬁles for the routes taken for

an activity (e.g., walking, running, climbing, cycling), users

can have positive behavioral changes through a more active

lifestyle motivated by competitions with acquaintances [2].

Despite the broad set of advantages that geodata offers,
geodata usage and uncontrolled sharing can pose a significant
privacy risk that can be further exploited in multiple
attacks, including stalking [3] and cybercasing [4]. For example,
with a large amount of geotagged data, including
text, images, and videos, cybercasing provides criminals and
maliciously motivated individuals with a significant attack
vector. Geo-tagged photos that are frequently posted on
image-sharing websites, such as Flickr, or second-hand sale
websites, such as Craigslist, may put owners of those images
at risk. For example, geo-tagged images posted on sales
websites may reveal the location of the advertised product,
leading to trespassing or even theft.

[Figure 1: horizontal stacked bar chart of activity start and finish locations (Home, School, Work, Others), x-axis from 0% to 100%; bar labels visible in the source: 76%, 51%, 15%, 36%, 10%.]

Fig. 1. Survey results for understanding users’ behavior with starting
point statistics and finishing point statistics. While 90% of the 60
participants indicated their start of activity is either home, school, or work,
an overwhelming 98% of the participants indicated those to be the finish
(end) point of their activities.

• U. Meteriz-Yildiran is with Meta and N. Yildiran is with Google; the work
of both authors was done while they were at the University of Central
Florida. D. Mohaisen is with the Department of Computer Science,
University of Central Florida, Orlando, FL 32816, USA. J. Kim is with the
Department of Electrical Engineering at Korea University, Republic of Korea.
D. Mohaisen is the corresponding author (e-mail: mohaisen@ucf.edu).
An earlier version of this work has appeared in IEEE ICDCS 2020 [1]. This
work was supported in part by NRF under grant 2016K1A1A2912757
and CyberFlorida Seed Grant (2021/2022). J. Kim was supported by NRF
under grant 2022R1A2C20048690. J. Kim (joongheon@korea.ac.kr) and
D. Mohaisen (mohaisen@ucf.edu) are the corresponding authors.

While geodata recorded by ﬁtness applications is indeed

important and valuable for the operation of those applica-

tions, this data can also be used for launching attacks on

users by breaching their privacy since sensitive information

of users, such as home or workplace location, can be easily

inferred from such data. Even worse, a large number of

users, when sharing such information, would be unaware

of the ramiﬁcations of sharing and the potential risk of

inferring such contextual information, such as home, work

location, etc., from such shared location data. To support this

argument, we conducted an online survey with 60 partici-

pants who regularly use fitness applications outdoors. The

results of the survey, summarized in Figure 1, reveal that

51% of the participants start their training from their homes,

36% start from their school, and 3% start from their work-

place, while 76% of the participants finish their training at

their homes. Moreover, for the same set of users (results are

not shown in Figure 1), 42% of those users have indicated

that not sharing location information implies privacy protec-

tion, while 30% of the respondents were uncertain, and 28%

were certain that not sharing would not necessarily mean

their privacy is protected. The mixed responses highlight

the gap between reality and expectations of privacy when

sharing location information online and call for further

investigation.

arXiv:2210.15529v1 [cs.CR] 27 Oct 2022

Although it is possible to hide the location trajectory

by removing the activity map in the ﬁtness applications,

users still want to share elevation proﬁles or certain statistics

of the activity to show the roughness, technicality, and

difﬁculty of the routes they took as a measure of their

workout. For example, up until recently, users have been de-

manding those ﬁtness applications to allow for ﬁne-grained

and customized access control by allowing them to share

the elevation proﬁle of an activity while masking the map

that highlights the actual trajectory, which is deemed of high

privacy value to them [5]–[8].

In the same survey we conducted earlier, we asked our

60 subjects “while sharing an outdoor workout record, do

you think hiding the map and sharing only the statistics

of your training (such as speed and elevation changes)

is enough for protecting your privacy?”. The results were

overwhelmingly positive, with 25 of them indicating “yes”,

18 indicating “maybe” (together accounting for more than

71%), and only 17 indicating “no”.

Is sharing the elevation profile of an activity enough to

maintain the privacy of users? In this paper, we argue

that an approximate location, extracted from the contexts of

activities and at different levels of location granularity, could

still be revealed from the elevation proﬁle information. We

examine this problem comprehensively and develop tech-

niques that can be used to accurately associate an elevation

proﬁle with contextual information, such as the location.

Contributions. In this paper, we contribute the following:

• we translate the problem of location privacy inference
from elevation profiles into text classification and image
classification problems by encoding the elevation signals
as strings and visualizing the elevation signals as images
to employ various common approaches for solving image
and text classification problems,

• we investigate the possible attack surface for the problem
by exploring three different threat models, which we
later use to evaluate the success of our approaches by
simulating our methods considering each threat model,

• we demonstrate that location information can be
predicted from elevation profiles using different
machine/deep learning methods with accuracy in the range of
80.25% to 99.80% at different resolutions.

We note that examining the effect of the attack using
a large-scale in-the-wild case study is impractical as service
providers prevent the use of their data for tracking by a third
party. However, to motivate the effect of the attack, we consider
the scenario of an informed adversary who knows the
city where a victim with the exposed elevation profiles for
associated activities lives. As such, the adversary proceeds
by profiling the city and collecting elevation profiles for
different segments within the city. One can see how easily
such an adversary will be able to contextualize the elevation
profiles of the victim’s activity further by narrowing it
down to a few candidate precomputed elevation profiles.
Given the adversary’s awareness of the mapping between
the location and the profiles, the adversary will be able
to easily infer valuable information about the habits of the
victim by associating, for instance, end, start, and stopping
points on the elevation profile, with points of interest (cafes,
workplace, etc.).

TABLE 1
Popular fitness applications and their features. ET: Exercise tracking.
SS: Ability to share to social media. SNS: Social networking capabilities
in the service. PR: Private records. BU: User blocking capability.
(•: available, ◦: not available)

Service          ET   SS   SNS  PR   BU
Strava           •    •    •    •    •
Runtastic        •    •    •    ◦    •
Runkeeper        •    •    •    •    ◦
Nike+ Running    •    •    •    •    ◦
MapMyRun         •    •    •    •    ◦

Organization. We present the background in section 2,

the threat model in section 3, a high-level overview of

our approach in section 4, the implementation details in

section 5, the evaluation results in section 6,

further discussions in section 7, the related work in section 8,

and concluding remarks in section 9.

2 BACKGROUND

In this section, we provide some background information

highlighting the signiﬁcance of elevation proﬁles for ath-

letes, the use cases, some properties of the ﬁtness applica-

tions on the market today, and some reported privacy breach

incidents involving fitness applications to further contextualize the

work presented in the rest of this paper.

2.1 Importance of Elevation Profiles for Athletes

Athletes who keep track of their activity records measure

various modalities and attributes associated with the activ-

ities, including the distance, speed, overall time, and heart

rate over the course of the activity. Based on these attributes,

they adjust their training strategies to reach their goals.

Elevation changes, often reported in the form of elevation

gain, are one of the most signiﬁcant attributes measuring

the performance of a cyclist/runner and often depict how

hard the run or ride is. For example, riding a bike for a 20-

mile ride while climbing 1000 feet in total is signiﬁcantly

more challenging than biking on a ﬂat terrain [9]. Therefore,

when recording or sharing a ride/run, athletes care about

the changes in elevation, and thus about elevation profiles.
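As a rough illustration of the elevation-gain attribute discussed above, the following sketch sums the positive changes between consecutive elevation samples. The function name and sample values are hypothetical, and real applications usually smooth noisy sensor readings first, which is omitted here.

```python
def elevation_gain(profile):
    """Total climb: sum of the positive differences between consecutive samples."""
    return sum(max(b - a, 0.0) for a, b in zip(profile, profile[1:]))


# Two made-up profiles (same units, e.g. feet or metres).
flat = [100.0, 100.0, 100.0, 100.0]
hilly = [100.0, 130.0, 110.0, 150.0]

print(elevation_gain(flat))   # 0.0
print(elevation_gain(hilly))  # 70.0 (30 up, 20 down ignored, 40 up)
```

Descents are ignored on purpose: gain measures only the climbing effort, which is why two routes with the same start and end elevation can still differ widely in difficulty.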

2.2 Fitness Applications & Privacy Breach Incidents

Fitness applications allow users to track their workout his-

tory and provide them with statistics. Moreover, some ﬁt-

ness applications have social network capabilities, as shown

in Table 1, and allow users to share workout summaries that

are known to motivate users and their social network con-

nections to achieve their goals [2]. Some ﬁtness applications

also inherit user-blocking features and capabilities from

social network platforms, including user privacy options

such as private records–the activity records that are only

visible to the user.

Although ﬁtness applications have conﬁgurable privacy

options, there have been a lot of privacy incidents concern-

ing location data obtained from those ﬁtness applications.

We review some of those privacy breaches in the following

to contextualize our work in the broader privacy literature.

Revealing Secret U.S. Military Bases. Strava, which is

one of the most popular fitness tracking applications in the

market today, collects users’ public data and publishes a

heatmap of the aggregates to highlight routes frequented

by users [10]. Although the aggregates in the heatmap do

not explicitly contain any identity information, activities in

desolate places revealed the location of many U.S. military

bases, which is considered sensitive information [11], [12].

Deanonymization Through Strava Segments. In Strava, the

heatmap feature was used to show “heat” made by the

aggregated and public activities of Strava users over the

past year. It has been shown, however, that a dedicated adversary

can deanonymize the heatmap to find users who ran on a

specified route [13]. For example, by selecting a route from

the heatmap, a registered user can manually create a GPS

eXchange (GPX) track ﬁle and create a segment using it on

Strava. A segment is a portion of a road or a trail where

athletes compare their ﬁnishing times. Consequently, once

this segment is created, the users who previously ran that

route are shown on the leaderboard grouped by gender and

age. This feature is then leveraged to identify individuals

who ran in that particular place.

Tracking and Bicycle Theft. Users of ﬁtness applications

can share information related to the equipment used for

the activity, including bicycles, tracking devices, shoes, etc.,

along with the routes frequented. The combined shared

information makes them a target for robbery, and several

such incidents of bicycle theft are reported [14]–[17].

Attack on Privacy Zone. To cope with the increasing privacy

risks, Strava features privacy zones, a technique to obfuscate

the exact start and end points of a route. A recent study [18]

has demonstrated that it is possible to reveal the exact start

and end point of a route that utilizes the privacy zone

feature. The same study also claimed that around 95% of

the users are at risk of revealing their location information.

Live Activity Breach. In Runtastic, one of the popular

activity-tracking applications, users can share their live

activities. In theory, users should be able to conﬁgure the

privacy settings for their activities such that only privileged

users, such as connections on the application platform, can

track the shared live activity session. However, it has been

demonstrated [19] that the selected privacy settings are not

correctly applied to a live session. As a result, everyone can

go through live sessions and track Runtastic users in real

time, even though the associated privacy options should

have prevented this type of breach. Based on this incident, it

would be easy to stalk and locate a user, e.g., a lone runner

or cyclist with expensive equipment, in real time.

3 THREAT MODELS

We outline the potential threat models under which this

study is conducted. We describe three models under which

location privacy is breached only from associated elevation

proﬁles. We note that the following threat models are only

hypothetical: no attacks were actually launched on any

users. As mentioned earlier, this study in its entirety is

motivated by the aforementioned demands of users to have

more flexibility over sharing partial data, such as elevation

proﬁles, and examines the ramiﬁcations of such sharing in

a hypothetical setting. We note, however, that those settings

are also plausible if such sharing is enabled.

Our study utilizes three threat models: TM-1, TM-2, and

TM-3, which we outline below with their justiﬁcations. The

adversarial capabilities in TM-1 are greater than in TM-2 and

TM-3, making it a more restrictive (powerful) model.

1) TM-1. In TM-1, we assume an adversary with workout

history records of a target user, and the goal of the adversary

is to identify the last workout location of the target user from

the recently shared elevation proﬁles. TM-1 is justiﬁed by

multiple plausible scenarios in practice. For example, such

an adversary might have been a previous social network

connection of the target user that was later blocked. In

such a scenario, the adversary may have previous workout

records of the target from which the adversary may attempt

to de-anonymize the target’s activities. Another example

might include group activities, where two individuals (i.e.,

the adversary and target) may have shared the same route at

some point. In either case, by knowing the target’s previous

ﬁtness activity records, the main goal of the adversary in

this model is to identify recent whereabouts only from

publicly shared elevation proﬁles in workout summaries,

thus breaching the target’s location privacy.

2) TM-2. In TM-2, we assume an adversary with access to

limited information, such as the city where the target lives.

Such information is easily accessible from public proﬁle

summaries, athlinks.com, public records, etc. The adver-

sary’s goal in TM-2 is to ﬁnd out which region or part of a

given city the target’s activities are associated with. The TM-2

use scenario may include a targeted user sharing private

activities in which the route is hidden while the elevation

proﬁle is shown. The adversary, knowing the city where

the target lives, would want to identify the region (e.g., a

borough in the city) associated with the user’s activity.

3) TM-3. In TM-3, we assume an adversary trying to identify

the target user’s city using only publicly shared elevation

proﬁles without any prior information. We assume, how-

ever, the adversary has the ability to proﬁle the elevation of

cities with information that is easily obtained from public

sources (e.g., Google Maps, OpenStreetMap). The use sce-

nario of TM-3 may be used as a stepping stone towards

launching the attack scenario in TM-2 upon narrowing

down the search space to a city.

4 APPROACH: HIGH-LEVEL OVERVIEW

In this section, we give a brief overview of our pipeline,

which consists of the data collection, preprocessing, feature

extraction, and classiﬁcation as illustrated in Figure 2. Each

phase of the pipeline is detailed in section 5.

Data Collection. We collected three datasets with varying

and rich characteristics, namely (i) user-speciﬁc activity data

collected from an athlete, (ii) mined training route seg-

ments grouped at city-level, and (iii) mined training route

segments grouped at borough-level. For the user-speciﬁc

dataset, we collected physical activity records of athletes

and converted those activities to an intermediate format,

the GPS Exchange Format (GPX). Then, we parsed the GPX

ﬁles and manually labeled them according to the latitude

and longitude information included within each ﬁle. For the

second dataset, we mined training route segments from a

popular ﬁtness tracking website by specifying the location

boundaries, i.e., the class label of the mined data, and

augmented each segment with the corresponding elevation

proﬁles obtained from Google Maps Elevation API. Finally,

we constructed the borough-level dataset in the same way as the

city-level dataset.
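The GPX parsing step described above can be sketched with Python's standard library. This is a minimal illustration under stated assumptions, not the authors' tooling: it assumes GPX 1.1 input (namespace `http://www.topografix.com/GPX/1/1`), reads only track points, and the `parse_gpx` helper and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by GPX 1.1 documents.
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}


def parse_gpx(text):
    """Return a list of (lat, lon, elevation) tuples from GPX track points.

    Elevation is None when a track point has no <ele> child.
    """
    root = ET.fromstring(text)
    points = []
    for pt in root.iterfind(".//gpx:trkpt", GPX_NS):
        ele = pt.find("gpx:ele", GPX_NS)
        points.append((
            float(pt.get("lat")),
            float(pt.get("lon")),
            float(ele.text) if ele is not None else None,
        ))
    return points


# Made-up two-point track used only for illustration.
SAMPLE = """<gpx version="1.1" creator="example" xmlns="http://www.topografix.com/GPX/1/1">
  <trk><trkseg>
    <trkpt lat="28.5400" lon="-81.3800"><ele>30.5</ele></trkpt>
    <trkpt lat="28.5410" lon="-81.3790"><ele>31.0</ele></trkpt>
  </trkseg></trk>
</gpx>"""

for lat, lon, ele in parse_gpx(SAMPLE):
    print(lat, lon, ele)
```

The latitude and longitude values are what the manual labeling step relies on, while the elevation column forms the profile that is later converted to text-like or image-like form.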

Preprocessing. We employ Natural Language Processing

(NLP) and computer vision techniques to convert the prob-

lem to text classiﬁcation and image classiﬁcation problems,

respectively. To this end, we prepare the data accordingly

in the preprocessing phase. Preprocessing consists of two

parts: (i) text-like and (ii) image-like representations.

For text-like representation, we discretize the elevation

signals and compute the minimum required word size. We

then create a mapping between each unique discrete value

and a string. By mapping the string correspondents to the

unique discrete values, we encode the elevation proﬁles in

text. We then form a vocabulary from the text sequences of

each dataset using the n-grams.
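The text-like encoding can be sketched as follows. This is one possible reading of the steps above, not the authors' implementation: the bin width (`step`), the lowercase alphabet, and all function names are assumptions made for illustration.

```python
import string
from itertools import product


def discretize(profile, step=1.0):
    """Map each elevation sample to an integer bin of width `step`."""
    return [int(round(e / step)) for e in profile]


def build_codebook(values, alphabet=string.ascii_lowercase):
    """Assign each unique discrete value a fixed-length word over `alphabet`.

    The word size is the smallest length L with len(alphabet)**L >= number
    of unique values, i.e. the minimum size that can name every value.
    """
    unique = sorted(set(values))
    word_size = 1
    while len(alphabet) ** word_size < len(unique):
        word_size += 1
    words = ("".join(chars) for chars in product(alphabet, repeat=word_size))
    return dict(zip(unique, words)), word_size


def encode(values, codebook):
    """Replace each discrete value with its word."""
    return [codebook[v] for v in values]


def ngrams(tokens, n):
    """Contiguous n-grams of a token sequence, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


profile = [12.2, 12.9, 14.1, 12.8, 12.1]   # made-up elevation samples
values = discretize(profile)               # [12, 13, 14, 13, 12]
codebook, word_size = build_codebook(values)
tokens = encode(values, codebook)          # ['a', 'b', 'c', 'b', 'a']
print(word_size, tokens, ngrams(tokens, 2))
```

The vocabulary of a dataset would then be the set of n-grams observed across all of its encoded profiles.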

To obtain image-like representations, we convert the

elevation proﬁles to a ﬁxed-sized line graph where the x-

axis stands for time and the y-axis stands for the elevation

values. We also color the lines in the graphs to represent the

elevation interval in which the elevation proﬁles range.
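To make the image-like representation concrete, the sketch below rasterizes a profile into a fixed-size grid in pure Python, standing in for a plotting library. Several details are assumptions rather than the paper's settings: the grid size, the per-profile scaling of the y-axis, and the choice of "colour" as the index of the `interval`-wide band that contains the profile's mean elevation (non-negative elevations are assumed).

```python
def rasterize(profile, width=32, height=16, interval=100.0):
    """Draw an elevation profile as a height x width grid of integers.

    0 is background. Line pixels hold a positive "colour" value: the
    (assumed) 1-based index of the elevation band containing the mean.
    """
    if len(profile) < 2 or width < 2 or height < 2:
        raise ValueError("need at least two samples and a 2x2 grid")
    lo, hi = min(profile), max(profile)
    span = (hi - lo) or 1.0  # flat profiles would otherwise divide by zero
    colour = int((sum(profile) / len(profile)) // interval) + 1

    # x-axis = time: nearest-sample resampling to exactly `width` columns.
    last = len(profile) - 1
    cols = [profile[round(x * last / (width - 1))] for x in range(width)]
    # y-axis = elevation: row 0 is the top of the image.
    rows = [round((hi - e) / span * (height - 1)) for e in cols]

    grid = [[0] * width for _ in range(height)]
    prev = rows[0]
    for x, r in enumerate(rows):
        # Fill the vertical run to the previous column so the line stays connected.
        for y in range(min(prev, r), max(prev, r) + 1):
            grid[y][x] = colour
        prev = r
    return grid


# Made-up profile: a climb followed by a descent.
grid = rasterize([210.0, 220.0, 260.0, 300.0, 280.0, 230.0], width=24, height=8)
for row in grid:
    print("".join("#" if v else "." for v in row))
```

Because the y-axis is rescaled per profile, the shape alone loses the absolute altitude; the colour channel is what carries that information back into the image.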

Feature Extraction. The classiﬁcation algorithms operate on

high-quality and discriminative features obtained from the

representations of elevation proﬁles. For feature extraction,

we utilize NLP and computer vision approaches.

To employ NLP approaches using the vocabulary obtained

in the preprocessing phase, we represent each elevation

proﬁle as either a feature vector based on the vocabulary

frequency in the text-like representation (bag-of-words vec-

tor) or as a term frequency-inverse document frequency

(tf-idf) vector. To employ computer vision approaches, we

utilize Convolutional Neural Networks (CNN) over image-

like representations. The optimal features of an image-like

representation are efﬁciently extracted by the convolutional

and pooling layers in the CNN architecture.
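The two text feature vectors mentioned above can be sketched in a few lines. The tf-idf weighting shown is the plain textbook form (relative term frequency times log(N/df)); the exact variant used in the paper is not stated, and library implementations often add smoothing, so treat this as an illustrative assumption. The documents are made-up lists of bigrams.

```python
import math
from collections import Counter


def bag_of_words(doc, vocab):
    """Count how often each vocabulary term occurs in one document."""
    counts = Counter(doc)
    return [counts[t] for t in vocab]


def tfidf(docs, vocab):
    """Textbook tf-idf: (count / doc length) * log(N / document frequency)."""
    n = len(docs)
    df = {t: sum(1 for d in docs if t in d) for t in vocab}
    idf = {t: math.log(n / df[t]) if df[t] else 0.0 for t in vocab}
    vectors = []
    for d in docs:
        counts = Counter(d)
        total = len(d) or 1  # guard against empty documents
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vectors


# Each "document" is one elevation profile already encoded as n-grams.
docs = [["a b", "b c", "c b"], ["a b", "a b", "b a"]]
vocab = sorted({term for d in docs for term in d})  # ['a b', 'b a', 'b c', 'c b']

print(bag_of_words(docs[0], vocab))  # [1, 0, 1, 1]
print(tfidf(docs, vocab)[1])
```

A term such as "a b" that appears in every profile gets a tf-idf weight of zero, which is the point of the weighting: n-grams shared by all locations carry no discriminative signal.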

Multi-Class Classiﬁcation. We utilize various machine

learning and deep learning models for classification, including

Support Vector Machine (SVM), Random Forest

Classification (RF), Multi-Layer Perceptron (MLP), Long

Short-Term Memory (LSTM), 1D Convolutional Neural Net-

work (C1D), and 2D Convolutional Neural Network (CNN).

5 IMPLEMENTATION DETAILS

The implementation details of data collection, preprocess-

ing, feature extraction, and multi-class classiﬁcation are

addressed in the following subsections.

5.1 Data Collection

In this study, we compiled three datasets: the user-speciﬁc

dataset, the city-level dataset, and the borough-level dataset.

The user-specific dataset was obtained from a volunteer athlete

who frequently records activities through fitness applications.

It offers dense and thorough coverage of regions

frequented by the user; those regions are used as class labels.

The city-level and borough-level datasets are created from

scratch by collecting location trajectories that are created and

frequented by the athletes. Both city-level and borough-level

datasets provide sparse coverage of cities and boroughs.

5.1.1 User-Speciﬁc Dataset

For the user-speciﬁc dataset, we collected activity data,

including each activity’s location trajectory and the corresponding

elevation profile, from a volunteer athlete who

records activities frequently through ﬁtness applications.

First, the location trajectories included in the user-speciﬁc

dataset are converted to GPX format to avoid confusion

caused by different formats and settings across the activity

records. Then, to label the samples, the maximum and

minimum coordinates of each location trajectory are fetched.

Each sample location trajectory is encapsulated with a tight

rectangle whose top right (North East) and bottom left

(South West) corners are computed from the maximum

and minimum coordinates of the trajectory as illustrated in

Figure 4. To classify the samples, each rectangle encapsulat-

ing the trajectory is compared with the previously created

regions. If the Euclidean distance between the center of

the rectangle and the center of the existing region does

not exceed a predetermined threshold, the rectangle and its

corresponding sample are labeled with the unique identifier

of that region. If no existing region matches the trajectory,

a new region is created. We then annotated the regions with names, such

as Orlando and Washington DC, based on manual

inspection of the map. The final sample size distribution of

the user-speciﬁc dataset is shown in Table 2.
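The labeling procedure above can be sketched as follows. The threshold value, the use of raw degrees as distance units, the first-match rule, and all names are assumptions for illustration; the text only specifies a tight rectangle, a Euclidean distance between centers, and a predetermined threshold.

```python
import math


def bounding_box(points):
    """Tight rectangle around (lat, lon) points as (south, west, north, east)."""
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    return min(lats), min(lons), max(lats), max(lons)


def center(box):
    """Center of a (south, west, north, east) rectangle as (lat, lon)."""
    south, west, north, east = box
    return (south + north) / 2, (west + east) / 2


def assign_region(box, regions, threshold):
    """Return the id of the first region whose center is within `threshold`.

    If none matches, the box itself becomes a new region. Region ids are
    list indices; human-readable names would be attached manually later.
    """
    c = center(box)
    for idx, region in enumerate(regions):
        if math.dist(c, center(region)) <= threshold:
            return idx
    regions.append(box)
    return len(regions) - 1


# Made-up trajectories: two near Orlando, one near Washington DC.
trajectories = [
    [(28.53, -81.39), (28.55, -81.37)],
    [(28.56, -81.40), (28.58, -81.36)],
    [(38.89, -77.05), (38.91, -77.03)],
]
regions = []
labels = [assign_region(bounding_box(t), regions, threshold=0.5) for t in trajectories]
print(labels)  # [0, 0, 1]
```

Euclidean distance on degrees is only a rough proxy for ground distance, which is acceptable here because the regions being separated are cities far apart from each other.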

The user-speciﬁc dataset is prone to have similar loca-

tion trajectory portions across its samples since the user

may frequent the same set of places in their everyday

activities, such as the location trace they follow while leav-

ing/arriving home or their favorite routes. Therefore, we

calculated the average overlap ratio of the routes included

in the user-speciﬁc dataset by comparing each sample with

the other samples with the same class label. For each sam-

ple pair comparison, the overlap ratio is calculated as the

intersection-over-union of the tight rectangles encapsulating

the sample routes. The average overlap ratio of the user-

speciﬁc dataset is calculated as 35%.
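A minimal sketch of the overlap computation described above, assuming rectangles are stored as (south, west, north, east) tuples and that the average is taken over all unordered pairs within one class. The helper names and example boxes are hypothetical.

```python
from itertools import combinations


def iou(a, b):
    """Intersection-over-union of two (south, west, north, east) rectangles."""
    inter_h = min(a[2], b[2]) - max(a[0], b[0])
    inter_w = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(inter_h, 0.0) * max(inter_w, 0.0)  # zero when disjoint
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_overlap(boxes):
    """Mean IoU over all unordered pairs of boxes sharing a class label."""
    pairs = list(combinations(boxes, 2))
    if not pairs:
        return 0.0
    return sum(iou(a, b) for a, b in pairs) / len(pairs)


# Made-up rectangles for three routes in the same class.
same_class = [(0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0), (0.0, 0.0, 2.0, 2.0)]
print(average_overlap(same_class))
```

For a dataset with several classes, the same function would be applied per class and the per-pair values pooled before averaging.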

5.1.2 City-Level Dataset

For the city-level dataset, we mined publicly available

training route segments in a popular ﬁtness tracking ap-

plication using its EXPLORESEGMENTS() functionality. We

note that our experiments do not put any users at risk and
