Learning Location from Shared Elevation
Profiles in Fitness Apps: A Privacy Perspective
Ulku Meteriz-Yildiran, Necip Fazil Yildiran, Joongheon Kim, and David Mohaisen
Abstract—The extensive use of smartphones and wearable devices has facilitated many useful applications. For example, with Global
Positioning System (GPS)-equipped smart and wearable devices, many applications can gather, process, and share rich metadata,
such as geolocation, trajectories, elevation, and time. In particular, fitness applications, such as Runkeeper and Strava, utilize this
information for activity tracking and have recently witnessed a boom in popularity. Those fitness tracker applications have their own web
platforms and allow users to share activities on such platforms or even with other social network platforms. To preserve the privacy of
users while allowing sharing, several of those platforms may allow users to disclose partial information, such as the elevation profile for
an activity, which supposedly would not leak the location of the users. In this work, and as a cautionary tale, we create a proof of
concept where we examine the extent to which elevation profiles can be used to predict the location of users. To tackle this problem, we
devise three plausible threat settings under which the city or borough of the targets can be predicted. Those threat settings define the
amount of information available to the adversary to launch the prediction attacks. Establishing that simple features of elevation profiles,
e.g., spectral features, are insufficient, we devise both natural language processing (NLP)-inspired text-like representations and
computer vision-inspired image-like representations of elevation profiles, and we convert the problem at hand into text and image
classification problems. We use both traditional machine learning- and deep learning-based techniques and achieve a prediction
success rate ranging from 59.59% to 99.80%. The findings are alarming, highlighting that sharing elevation information may have
significant location privacy risks.
Index Terms—location privacy, privacy breach, privacy in social media, fitness applications, natural language processing, applied
machine learning
1 INTRODUCTION
From smartphones to wearables, an increasing number of
Internet of Things (IoT) devices are equipped with Global
Positioning System (GPS), accelerometers, and gyroscopes
to allow applications to function or to present a better user
experience using geodata, such as location and elevation
information. More recently, fitness applications that run on
smartphones and smartwatches have used these components to
collect spatial, temporal, and activity-specific information
to analyze, summarize, and visualize users’ activities. By
analyzing each activity, many of those applications deliver
personalized motivations and challenges that help users meet
their goals. These applications also integrate with social
media, letting users share updates about their activities,
including training routes and elevation profiles for an
activity (e.g., walking, running, climbing, cycling). Such
sharing can lead to positive behavioral changes through a more
active lifestyle motivated by competition with acquaintances [2].
Despite the broad set of advantages that geodata offers,
geodata usage and uncontrolled sharing can pose a significant
privacy risk that can be further exploited in multiple
attacks, including stalking [3] and cybercasing [4]. For example,
with a large amount of geotagged data, including
text, images, and videos, cybercasing provides criminals and
maliciously motivated individuals with a significant attack
vector. Geotagged photos that are frequently posted on
image-sharing websites, such as Flickr, or second-hand sale
websites, such as Craigslist, may put the owners of those images
at risk. For instance, geotagged images posted on sales
websites may reveal the location of the advertised product,
leading to trespassing or even theft.
Fig. 1. Survey results for understanding users’ behavior with starting
point statistics and finishing point statistics.
Start: Home 51%, School 36%, Work 3%, Others 10%.
Finish: Home 76%, School 15%, Work 7%, Others 2%.
While 90% of the 60 participants indicated that their activities start at
home, school, or work, an overwhelming 98% of the participants indicated
that their activities finish at one of those places.
U. Meteriz-Yildiran is with Meta and N. Yildiran is with Google; the work
of both authors was done while they were at the University of Central
Florida. D. Mohaisen is with the Department of Computer Science,
University of Central Florida, Orlando, FL 32816, USA. J. Kim is with the
Department of Electrical Engineering at Korea University, Republic of Korea.
An earlier version of this work appeared in IEEE ICDCS 2020 [1]. This
work was supported in part by NRF under grant 2016K1A1A2912757
and a CyberFlorida Seed Grant (2021/2022). J. Kim was supported by NRF
under grant 2022R1A2C20048690. J. Kim (joongheon@korea.ac.kr) and
D. Mohaisen (mohaisen@ucf.edu) are the corresponding authors.
While geodata recorded by fitness applications is indeed
important and valuable for the operation of those applications,
this data can also be used to attack users by breaching
their privacy, since sensitive information, such as home or
workplace location, can be easily inferred from it. Even worse,
many users who share such information are unaware
of the ramifications of sharing and of the risk that contextual
information, such as home or work location, can be inferred
from the shared location data. To support this
argument, we conducted an online survey with 60 participants
who regularly use fitness applications outdoors. The
results of the survey, summarized in Figure 1, reveal that
51% of the participants start their training from their homes,
36% start from their school, and 3% start from their workplace,
while 76% of the participants finish their training at
their homes. Moreover, for the same set of users (results are
not shown in Figure 1), 42% indicated
that not sharing location information implies privacy protection,
30% were uncertain, and 28%
were certain that not sharing would not necessarily mean
their privacy is protected. The mixed responses highlight
the gap between reality and expectations of privacy when
sharing location information online and call for further
investigation.
arXiv:2210.15529v1 [cs.CR] 27 Oct 2022
Although it is possible to hide the location trajectory
by removing the activity map in the fitness applications,
users still want to share elevation profiles or certain statistics
of the activity to show the roughness, technicality, and
difficulty of the routes they took as a measure of their
workout. For example, up until recently, users have been asking
fitness applications to provide fine-grained
and customized access control that lets them share
the elevation profile of an activity while masking the map
that shows the actual trajectory, which they deem to be of high
privacy value [5]–[8].
In the same survey we conducted earlier, we asked our
60 subjects “while sharing an outdoor workout record, do
you think hiding the map and sharing only the statistics
of your training (such as speed and elevation changes)
is enough for protecting your privacy?”. The results were
overwhelmingly positive, with 25 of them indicating “yes”,
18 indicating “maybe” (together accounting for more than
71%), and only 17 indicating “no”.
Is sharing the elevation profile of an activity enough to
maintain the privacy of users? In this paper, we argue
that an approximate location, extracted from the contexts of
activities and at different levels of location granularity, could
still be revealed from the elevation profile information. We
examine this problem comprehensively and develop tech-
niques that can be used to accurately associate an elevation
profile with contextual information, such as the location.
Contributions. In this paper, we contribute the following:
• We translate the problem of location privacy inference
from elevation profiles into text classification and image
classification problems by encoding the elevation signals
as strings and visualizing the elevation signals as images,
which lets us employ various common approaches for solving
text and image classification problems.
• We investigate the possible attack surface for the problem
by exploring three different threat models, which we
later use to evaluate the success of our approaches by
simulating our methods under each threat model.
• We demonstrate that location information can be
predicted from elevation profiles using different machine/deep
learning methods with accuracy in the range of
80.25% to 99.80% at different resolutions.
TABLE 1
Popular fitness applications and their features. ET: Exercise tracking.
SS: Ability to share to social media. SNS: Social networking capabilities
in the service. PR: Private records. BU: User blocking capability.
Service ET SS SNS PR BU
Strava • •
Runtastic • •
Runkeeper • •
Nike+ Running • •
MapMyRun • •
We note that examining the effect of the attack using
a large-scale in-the-wild case study is impractical, as service
providers prevent the use of their data for tracking by a third
party. However, to motivate the effect of the attack, we con-
sider the scenario of an informed adversary who knows the
city where a victim with the exposed elevation profiles for
associated activities lives. As such, the adversary proceeds
by profiling the city and collecting elevation profiles for
different segments within the city. One can see how easily
such an adversary will be able to contextualize the elevation
profiles of the victim’s activity further by narrowing it
down to a few candidate precomputed elevation profiles.
Given the adversary’s awareness of the mapping between
the location and the profiles, the adversary will be able
to easily infer valuable information about the habits of the
victim by associating, for instance, end, start, and stopping
points on the elevation profile, with points of interest (cafes,
workplace, etc.).
Organization. We present the background in section 2,
the threat models in section 3, a high-level overview of
our approach in section 4, the implementation details in
section 5, the evaluation results in section 6,
further discussion in section 7, the related work in section 8,
and concluding remarks in section 9.
2 BACKGROUND
In this section, we provide some background information
highlighting the significance of elevation profiles for ath-
letes, the use cases, some properties of the fitness applica-
tions on the market today, and some reported privacy breach
incidences of fitness applications to contextualize further the
work presented in the rest of this paper.
2.1 Elevation Profiles Importance for Athletes
Athletes who keep track of their activity records measure
various modalities and attributes associated with the activ-
ities, including the distance, speed, overall time, and heart
rate over the course of the activity. Based on these attributes,
they adjust their training strategies to reach their goals.
Elevation changes, often reported in the form of elevation
gain, are one of the most significant attributes measuring
the performance of a cyclist/runner and often depict how
hard the run or ride is. For example, riding a bike for a 20-
mile ride while climbing 1000 feet in total is significantly
more challenging than biking on flat terrain [9]. Therefore,
when recording or sharing a ride/run, athletes care about
the changes in elevation and, thus, about elevation profiles.
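To make the notion of elevation gain concrete, the following minimal sketch sums the positive changes between consecutive samples of an elevation profile. The function name, the sample profiles, and the use of meters are illustrative assumptions; real fitness applications typically smooth the signal before accumulating gain, which this sketch does not do.

```python
def elevation_gain(profile_m):
    """Sum of positive changes between consecutive elevation samples (meters).

    Assumption: the profile is a plain list of elevations; no smoothing
    or noise thresholding is applied in this sketch.
    """
    return sum(max(b - a, 0.0) for a, b in zip(profile_m, profile_m[1:]))


# Hypothetical profiles, for illustration only.
flat = [10.0, 10.0, 10.0, 10.0]
hilly = [10.0, 25.0, 15.0, 40.0]

print(elevation_gain(flat))   # 0.0
print(elevation_gain(hilly))  # 40.0 (15 m climb, descent ignored, 25 m climb)
```

Two routes of equal length can therefore differ sharply in gain, which is why athletes want to share this attribute.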
2.2 Fitness Applications & Privacy Breach Incidents
Fitness applications allow users to track their workout his-
tory and provide them with statistics. Moreover, some fit-
ness applications have social network capabilities, as shown
in Table 1, and allow users to share workout summaries that
are known to motivate users and their social network con-
nections to achieve their goals [2]. Some fitness applications
also inherit user-blocking features and capabilities from
social network platforms, including user privacy options
such as private records–the activity records that are only
visible to the user.
Although fitness applications have configurable privacy
options, there have been many privacy incidents concerning
location data obtained from those fitness applications.
We review some of those privacy breaches in the following
to contextualize our work in the broader privacy literature.
Revealing Secret U.S. Military Bases. Strava, which is
one of the most popular fitness tracking applications on the
market today, collects users’ public data and publishes a
heatmap of the aggregates to highlight routes frequented
by users [10]. Although the aggregates in the heatmap do
not explicitly contain any identity information, activities in
desolate places revealed the location of many U.S. military
bases, which is considered sensitive information [11], [12].
Deanonymization Through Strava Segments. In Strava, the
heatmap feature was used to show the “heat” made by the
aggregated public activities of Strava users over the
past year. It has been shown, however, that a dedicated adversary
can deanonymize the heatmap to find users who ran along a
specified route [13]. For example, by selecting a route from
the heatmap, a registered user can manually create a GPS
eXchange (GPX) track file and use it to create a segment on
Strava. A segment is a portion of a road or a trail where
athletes compare their finishing times. Once
this segment is created, the users who previously ran that
route are shown on the leaderboard, grouped by gender and
age. This feature can then be leveraged to identify individuals
who ran in that particular place.
Tracking and Bicycle Theft. Users of fitness applications
can share information related to the equipment used for
the activity, including bicycles, tracking devices, shoes, etc.,
along with the routes frequented. The combined shared
information makes them a target for robbery, and several
such incidents of bicycle theft have been reported [14]–[17].
Attack on Privacy Zone. To cope with the increasing privacy
risks, Strava features privacy zones, a technique to obfuscate
the exact start and end points of a route. A recent study [18]
has demonstrated that it is possible to reveal the exact start
and end point of a route that utilizes the privacy zone
feature. The same study also claimed that around 95% of
the users are at risk of revealing their location information.
Live Activity Breach. In Runtastic, one of the popular
activity-tracking applications, users can share their live
activities. In theory, users should be able to configure the
privacy settings for their activities such that only privileged
users, such as connections on the application platform, can
track the shared live activity session. However, it has been
demonstrated [19] that the selected privacy settings are not
correctly applied to a live session. As a result, everyone can
go through live sessions and track Runtastic users in real
time, even though the associated privacy options should
have prevented this type of breach. Based on this incident, it
would be easy to stalk and locate a user, e.g., a lone runner
or cyclist with expensive equipment, in real time.
3 THREAT MODELS
We outline the potential threat models under which this
study is conducted. We describe three models under which
location privacy is breached only from associated elevation
profiles. We note that the following threat models are only
hypothetical: no attacks were actually launched on any
users. As mentioned earlier, this study in its entirety is
motivated by the aforementioned demands of users to have
more flexibility over-sharing partial data, such as elevation
profiles, and examines the ramifications of such sharing in
a hypothetical setting. We note, however, that those settings
are also plausible if such sharing is enabled.
Our study utilizes three threat models: TM-1,TM-2, and
TM-3, which we outline below with their justifications. The
adversarial capabilities in TM-1 are greater than in TM-2 and
TM-3, making it a more restrictive (powerful) model.
1) TM-1. In TM-1, we assume an adversary with workout
history records of a target user, and the goal of the adversary
is to identify the last workout location of the target user from
the recently shared elevation profiles. TM-1 is justified by
multiple plausible scenarios in practice. For example, such
an adversary might have been a previous social network
connection of the target user that was later blocked. In
such a scenario, the adversary may have previous workout
records of the target from which the adversary may attempt
to de-anonymize the target’s activities. Another example
might include group activities, where two individuals (i.e.,
the adversary and target) may have shared the same route at
some point. In either case, by knowing the target’s previous
fitness activity records, the main goal of the adversary in
this model is to identify recent whereabouts only from
publicly shared elevation profiles in workout summaries,
thus breaching the target’s location privacy.
2) TM-2. In TM-2, we assume an adversary with access to
limited information, such as the city where the target lives.
Such information is easily accessible from public profile
summaries, athlinks.com, public records, etc. The adversary’s
goal in TM-2 is to find out which region or part of a
given city the target’s activities are associated with. A TM-2
use scenario may include a targeted user sharing private
activities in which the route is hidden while the elevation
profile is shown. The adversary, knowing the city where
the target lives, would want to identify the region (e.g., a
borough in the city) associated with the user’s activity.
3) TM-3. In TM-3, we assume an adversary trying to identify
the target user’s city using only publicly shared elevation
profiles without any prior information. We assume, how-
ever, the adversary has the ability to profile the elevation of
cities with information that is easily obtained from public
sources (e.g., Google Maps, OpenStreetMap). The use sce-
nario of TM-3 may be used as a stepping stone towards
launching the attack scenario in TM-2 upon narrowing
down the search space to a city.
4 APPROACH: HIGH-LEVEL OVERVIEW
In this section, we give a brief overview of our pipeline,
which consists of the data collection, preprocessing, feature
extraction, and classification as illustrated in Figure 2. Each
phase of the pipeline is detailed in section 5.
Data Collection. We collected three datasets with varying
and rich characteristics, namely (i) user-specific activity data
collected from an athlete, (ii) mined training route seg-
ments grouped at city-level, and (iii) mined training route
segments grouped at borough-level. For the user-specific
dataset, we collected physical activity records of athletes
and converted those activities to an intermediate format,
the GPS Exchange Format (GPX). Then, we parsed the GPX
files and manually labeled them according to the latitude
and longitude information included within each file. For the
second dataset, we mined training route segments from a
popular fitness tracking website by specifying the location
boundaries, i.e., the class label of the mined data, and
augmented each segment with the corresponding elevation
profiles obtained from the Google Maps Elevation API. Finally,
we constructed the borough-level dataset in the same way as
the city-level dataset.
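As a rough illustration of the GPX parsing step described above, the sketch below reads track points from a GPX 1.1 document with Python's standard library and separates the location trajectory from the elevation profile. The sample track, its coordinates, and the function name are hypothetical; the parsing code actually used in this work is not shown in the text.

```python
import xml.etree.ElementTree as ET

# GPX 1.1 namespace; track points are <trkpt> elements with lat/lon
# attributes and an <ele> child holding the elevation in meters.
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

# Hypothetical three-point track, for illustration only.
SAMPLE_GPX = """<gpx version="1.1" creator="example"
     xmlns="http://www.topografix.com/GPX/1/1">
  <trk><trkseg>
    <trkpt lat="28.6024" lon="-81.2001"><ele>24.0</ele></trkpt>
    <trkpt lat="28.6030" lon="-81.1995"><ele>25.5</ele></trkpt>
    <trkpt lat="28.6041" lon="-81.1987"><ele>23.8</ele></trkpt>
  </trkseg></trk>
</gpx>"""


def parse_gpx(text):
    """Return a list of (lat, lon, elevation) tuples from GPX text."""
    root = ET.fromstring(text)
    points = []
    for pt in root.iterfind(".//gpx:trkpt", GPX_NS):
        ele = pt.find("gpx:ele", GPX_NS)
        if ele is None or ele.text is None:
            continue  # skip points without elevation
        points.append((float(pt.get("lat")), float(pt.get("lon")), float(ele.text)))
    return points


points = parse_gpx(SAMPLE_GPX)
trajectory = [(lat, lon) for lat, lon, _ in points]   # used only for labeling
elevation_profile = [ele for _, _, ele in points]     # the shared signal
print(elevation_profile)
```

Keeping the trajectory and the elevation profile separate mirrors the setting studied here: the coordinates serve only to derive class labels, while the classifier sees the elevation signal alone.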
Preprocessing. We employ Natural Language Processing
(NLP) and computer vision techniques to convert the prob-
lem to text classification and image classification problems,
respectively. To this end, we prepare the data accordingly
in the preprocessing phase. Preprocessing consists of two
parts: (i) text-like and (ii) image-like representations.
For text-like representation, we discretize the elevation
signals and compute the minimum required word size. We
then create a mapping between each unique discrete value
and a string. By replacing each discrete value with its
corresponding string, we encode the elevation profiles as
text. We then form a vocabulary from the text sequences of
each dataset using n-grams.
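The text-like encoding described above can be sketched as follows. This is a minimal illustration under stated assumptions: elevations are discretized by rounding to a fixed step, words are drawn from a lowercase alphabet, and the minimum word size is taken to be the smallest length k such that the alphabet can produce at least as many distinct words as there are unique discrete values. All function names and parameters are hypothetical.

```python
import string
from itertools import product


def discretize(profile, step=1.0):
    """Round each elevation to the nearest multiple of `step` (assumed scheme)."""
    return [int(round(v / step)) for v in profile]


def min_word_size(n_unique, alphabet_size):
    """Smallest k such that alphabet_size ** k >= n_unique."""
    size = 1
    while alphabet_size ** size < n_unique:
        size += 1
    return size


def build_codebook(profiles, alphabet=string.ascii_lowercase):
    """Map every unique discrete value to a fixed-length word."""
    values = sorted({v for p in profiles for v in p})
    k = min_word_size(len(values), len(alphabet))
    words = ("".join(t) for t in product(alphabet, repeat=k))
    return dict(zip(values, words))


def encode(profile, codebook):
    return [codebook[v] for v in profile]


def ngrams(tokens, n):
    """Contiguous n-grams, joined with spaces, used as vocabulary terms."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical profiles, for illustration only.
profiles = [discretize([10.2, 10.9, 12.1]), discretize([11.0, 12.4])]
codebook = build_codebook(profiles)
tokens = encode(profiles[0], codebook)
print(tokens)             # ['a', 'b', 'c']
print(ngrams(tokens, 2))  # ['a b', 'b c']
```

With a two-letter alphabet the same three values would need words of length two (`aa`, `ab`, `ba`), which is what the word-size computation captures.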
To obtain image-like representations, we convert the
elevation profiles to fixed-size line graphs where the x-axis
stands for time and the y-axis stands for the elevation
values. We also color the lines in the graphs to represent the
elevation interval within which each elevation profile ranges.
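One way to picture the image-like representation is the small rasterizer below, which draws a profile onto a fixed-size grid and uses a single integer "color" to encode the elevation band of the profile. The grid size, the nearest-sample resampling, the 100 m band width, and the choice to derive the color from the profile's lowest point are all assumptions made for this sketch; the text does not specify the plotting details.

```python
def render_profile(profile, width=8, height=6, band_m=100.0):
    """Rasterize an elevation profile into a height x width grid.

    Cell value 0 is background; a positive value marks a line pixel and
    encodes the elevation band of the profile (assumed coloring scheme).
    """
    lo, hi = min(profile), max(profile)
    span = (hi - lo) or 1.0  # avoid division by zero for flat profiles
    color = max(int(lo // band_m), 0) + 1
    grid = [[0] * width for _ in range(height)]
    prev_row = None
    for x in range(width):
        # Nearest-sample resampling onto a fixed number of columns.
        i = round(x * (len(profile) - 1) / (width - 1)) if width > 1 else 0
        # Row 0 is the top of the image, so higher elevation means smaller row.
        row = height - 1 - round((profile[i] - lo) / span * (height - 1))
        # Fill the vertical gap to the previous column so the line is connected.
        start, end = (row, row) if prev_row is None else sorted((prev_row, row))
        for r in range(start, end + 1):
            grid[r][x] = color
        prev_row = row
    return grid


# Hypothetical profile between 100 m and 200 m, for illustration only.
grid = render_profile([100.0, 150.0, 200.0], width=3, height=3)
for line in grid:
    print(line)
```

Because the y-axis is scaled per profile, the shape alone loses absolute elevation; the color channel is what restores that information in this sketch.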
Feature Extraction. The classification algorithms operate on
high-quality and discriminative features obtained from the
representations of elevation profiles. For feature extraction,
we utilize NLP and computer vision approaches.
To employ NLP approaches using the vocabulary obtained
in the preprocessing phase, we represent each elevation
profile as either a feature vector based on the vocabulary
frequency in the text-like representation (bag-of-words vec-
tor) or as a term frequency-inverse document frequency
(tf-idf) vector. To employ computer vision approaches, we
utilize Convolutional Neural Networks (CNN) over image-
like representations. The optimal features of an image-like
representation are efficiently extracted by the convolutional
and pooling layers in the CNN architecture.
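The two text-based feature vectors mentioned above can be sketched in a few lines. The version below uses the textbook weighting, term frequency normalized by document length and idf = log(N / df), which is an assumption; the exact tf-idf variant used in this work is not stated, and library implementations often add smoothing. The documents are hypothetical lists of n-gram terms.

```python
import math
from collections import Counter


def bow_vector(doc_terms, vocab):
    """Bag-of-words: raw count of each vocabulary term in the document."""
    counts = Counter(doc_terms)
    return [counts[t] for t in vocab]


def tfidf_vectors(docs, vocab):
    """Textbook tf-idf: (count / doc length) * log(N / document frequency)."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log(n / df[t]) if df[t] else 0.0 for t in vocab}
    vectors = []
    for d in docs:
        counts = Counter(d)
        total = len(d) or 1
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vectors


# Hypothetical encoded profiles expressed as 2-gram terms.
docs = [["a b", "b c", "a b"], ["b c", "c d"]]
vocab = sorted({t for d in docs for t in d})

print(vocab)                       # ['a b', 'b c', 'c d']
print(bow_vector(docs[0], vocab))  # [2, 1, 0]
print(tfidf_vectors(docs, vocab)[0])
```

Note how the term `b c`, which occurs in every document, receives zero weight under tf-idf while still counting in the bag-of-words vector; this is the sense in which tf-idf emphasizes discriminative n-grams.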
Multi-Class Classification. We utilize various machine
learning and deep learning models for classification, including
Support Vector Machine (SVM), Random Forest
Classification (RF), Multi-Layer Perceptron (MLP), Long
Short-Term Memory (LSTM), 1D Convolutional Neural Network
(C1D), and 2D Convolutional Neural Network (CNN).
5 IMPLEMENTATION DETAILS
The implementation details of data collection, preprocess-
ing, feature extraction, and multi-class classification are
addressed in the following subsections.
5.1 Data Collection
In this study, we compiled three datasets: the user-specific
dataset, the city-level dataset, and the borough-level dataset.
The user-specific dataset is retrieved from a volunteer athlete
who frequently records activities through fitness applications.
It offers dense and thorough coverage of the regions
frequented by the user; those regions are used as class labels.
The city-level and borough-level datasets are created from
scratch by collecting location trajectories that are created and
frequented by athletes. Both the city-level and borough-level
datasets provide sparse coverage of cities and boroughs.
5.1.1 User-Specific Dataset
For the user-specific dataset, we collected activity data,
including each activity’s location trajectory and the corresponding
elevation profile, from a volunteer athlete who
records activities frequently through fitness applications.
First, the location trajectories included in the user-specific
dataset are converted to GPX format to avoid confusion
caused by different formats and settings across the activity
records. Then, to label the samples, the maximum and
minimum coordinates of each location trajectory are fetched.
Each sample location trajectory is enclosed in a tight
rectangle whose top-right (North East) and bottom-left
(South West) corners are computed from the maximum
and minimum coordinates of the trajectory, as illustrated in
Figure 4. To classify the samples, each rectangle enclosing
a trajectory is compared with the previously created
regions. If the Euclidean distance between the center of
the rectangle and the center of an existing region does
not exceed a predetermined threshold, the rectangle and its
corresponding sample are labeled with the unique identifier
of that region. If no region includes the trajectory,
a new region is created. We then annotated the regions with
labels, such as Orlando and Washington, D.C., based on
manual observation on the map. The final sample size distribution of
the user-specific dataset is shown in Table 2.
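The region-labeling procedure above can be sketched as a simple greedy assignment. The sketch assumes that the Euclidean distance is taken directly over latitude/longitude degrees, that a region's center is fixed to the center of the first rectangle assigned to it, and that the threshold is 0.5 degrees; none of these specifics are given in the text, and all names and coordinates are hypothetical.

```python
import math


def bounding_box(track):
    """Tight rectangle around a track: ((south, west), (north, east))."""
    lats = [lat for lat, _ in track]
    lons = [lon for _, lon in track]
    return (min(lats), min(lons)), (max(lats), max(lons))


def center(box):
    (south, west), (north, east) = box
    return ((south + north) / 2, (west + east) / 2)


def assign_region(track, regions, threshold_deg=0.5):
    """Return the id of the first region whose center is within the threshold.

    `regions` is a list of region centers and is extended in place when no
    existing region is close enough (assumed behavior for this sketch).
    """
    c = center(bounding_box(track))
    for region_id, region_center in enumerate(regions):
        if math.dist(c, region_center) <= threshold_deg:
            return region_id
    regions.append(c)
    return len(regions) - 1


# Hypothetical tracks, for illustration only.
regions = []
track_a = [(28.60, -81.20), (28.61, -81.19)]  # around Orlando
track_b = [(38.90, -77.04), (38.91, -77.03)]  # around Washington, D.C.
track_c = [(28.55, -81.30), (28.58, -81.25)]  # another Orlando-area route

labels = [assign_region(t, regions) for t in (track_a, track_b, track_c)]
print(labels)  # [0, 1, 0]
```

The numeric identifiers produced this way correspond to the unique region identities, which are then given human-readable names by inspecting the map.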
The user-specific dataset is likely to contain similar location
trajectory portions across its samples, since the user
may frequent the same set of places in their everyday
activities, such as the path they follow when leaving or
arriving home or their favorite routes. Therefore, we
calculated the average overlap ratio of the routes included
in the user-specific dataset by comparing each sample with
the other samples that have the same class label. For each
sample pair, the overlap ratio is calculated as the
intersection-over-union of the tight rectangles enclosing
the sample routes. The average overlap ratio of the
user-specific dataset is 35%.
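For reference, the intersection-over-union of two axis-aligned rectangles and its average over all sample pairs of one class can be computed as in the sketch below. The rectangle format ((south, west), (north, east)) and the function names are assumptions for illustration, and the example boxes use arbitrary units rather than real coordinates.

```python
from itertools import combinations


def iou(box_a, box_b):
    """Intersection-over-union of two rectangles ((south, west), (north, east))."""
    (s1, w1), (n1, e1) = box_a
    (s2, w2), (n2, e2) = box_b
    inter_h = min(n1, n2) - max(s1, s2)
    inter_w = min(e1, e2) - max(w1, w2)
    inter = max(inter_h, 0.0) * max(inter_w, 0.0)  # zero when boxes are disjoint
    area_a = (n1 - s1) * (e1 - w1)
    area_b = (n2 - s2) * (e2 - w2)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_overlap(boxes):
    """Mean IoU over all unordered pairs of boxes sharing one class label."""
    pairs = list(combinations(boxes, 2))
    if not pairs:
        return 0.0
    return sum(iou(a, b) for a, b in pairs) / len(pairs)


# Hypothetical rectangles in arbitrary units.
a = ((0, 0), (2, 2))
b = ((1, 1), (3, 3))
c = ((5, 5), (6, 6))

print(iou(a, b))  # 1/7: overlap area 1, union area 4 + 4 - 1
print(iou(a, c))  # 0.0: disjoint
print(average_overlap([a, b, c]))
```

An average of 35% thus indicates that routes within a class share a substantial part of their bounding rectangles without being near-duplicates.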
5.1.2 City-Level Dataset
For the city-level dataset, we mined publicly available
training route segments in a popular fitness tracking ap-
plication using its EXPLORESEGMENTS() functionality. We
note that our experiments do not put any users at risk and