by college graduates. This can be particularly problematic in
the DSE market, as recent statistics showed that a considerable
percentage of DSE users either do not possess a college-level
education or are still college students [37], [5]. These observations
emphasize the need for a supporting mechanism to help
DSE users better assess the privacy risks of disclosing their
personal information to DSE apps. In fact, such a mechanism
can also help policymakers and regulators to better evaluate
the privacy practices of DSE apps, and consequently, devise
legislation to protect workers’ and consumers’ rights in one of
the fastest-growing, most diverse, yet under-regulated software
ecosystems in the world [38], [39].
Motivated by these observations, in this paper, we propose
a novel approach for annotating privacy policies in the DSE
market. Our approach automatically classifies data claims in
these policies and maps them to the quality features of the
app (e.g., security, safety, customizability, etc.). The main
assumption is that these generic categories of abstract system
features are more easily comprehensible to the average
user. In particular, our research questions are:
• RQ1: Can the privacy policies of DSE apps be automatically
annotated? Under this research question, we
investigate the effectiveness of several automated classification
techniques in annotating data collection claims in
the privacy policies of DSE apps.
• RQ2: Are annotated privacy policies more comprehensible
to the average DSE user? Under this research
question, we explore whether our annotated policies can
actually help average DSE users to better understand the
privacy practices of their DSE apps.
III. DATA AND MANUAL ANNOTATION
Our approach can be divided into four main steps: policy
collection, manual annotation, automated classification, and
policy presentation. A summary of the approach is presented
in Fig. 2. In what follows, we describe our data collection and
manual annotation steps.
A. Policy Collection
Recent statistics estimate that there are hundreds of active
DSE platforms listed on popular mobile app marketplaces [22].
In our analysis, we consider apps that operate in large geo-
graphical areas and have massive user bases. Privacy concerns
are more likely to manifest over these apps than over
smaller ones, which often have less heterogeneous user bases.
Specifically, for a DSE app to be included in our analysis, it
has to meet the following criteria:
1) The app must facilitate some sort of peer-to-peer (P2P)
connection that involves sharing a resource, such
as a tangible asset (e.g., an apartment or a car) or a soft
skill (e.g., plumbing or hair styling).
2) The app must be available on Google Play or the Apple
App Store so that we can extract its meta-data.
3) The app must be located and/or have a substantial pres-
ence in the United States. By focusing on the U.S. market,
we ensure that our apps’ privacy policies are available in
English and that these apps offer services that are familiar
to the average U.S. user.
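As an illustrative sketch, the three inclusion criteria above can be expressed as a simple filter over app metadata. The field names below are hypothetical and not part of any actual dataset schema:

```python
# Illustrative sketch of the app inclusion criteria; all field names
# are hypothetical, not taken from our dataset.
def meets_criteria(app: dict) -> bool:
    """Return True if the app satisfies all three inclusion criteria."""
    is_p2p_sharing = app["facilitates_p2p"] and app["shares_resource"]  # criterion 1
    on_major_store = app["on_google_play"] or app["on_app_store"]       # criterion 2
    us_presence = app["operates_in_us"]                                 # criterion 3
    return is_p2p_sharing and on_major_store and us_presence

apps = [
    {"name": "ExampleRide", "facilitates_p2p": True, "shares_resource": True,
     "on_google_play": True, "on_app_store": True, "operates_in_us": True},
    {"name": "LocalOnly", "facilitates_p2p": True, "shares_resource": True,
     "on_google_play": False, "on_app_store": False, "operates_in_us": True},
]
selected = [a["name"] for a in apps if meets_criteria(a)]
print(selected)  # ['ExampleRide']
```

In practice, criteria 1 and 3 were judged manually; the sketch only makes the conjunction of the three conditions explicit.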
With these criteria in place, we selected the five most popular
apps from each of five major application domains of DSE [22].
In general, five categories of DSE apps can be identified: ride-sharing
(e.g., Uber or Lyft), lodging (e.g., Airbnb), delivery
(e.g., DoorDash or UberEats), asset-sharing (e.g., GetMy-
Boat), and freelancing (e.g., TaskRabbit) [22], [40]. The top
five apps in each application domain were then identified based
on their installation and rating statistics as of January 2021.
Table I shows the selected apps along with their popularity,
measured as the number of ratings and the average rating on
the Apple App Store as well as the average number of installs
from Google Play. We also extracted each app’s privacy policy,
which is typically posted on the app’s official website.
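The per-domain selection step can be sketched as ranking candidate apps by popularity and keeping the top five. The popularity proxy used below (ratings count times average rating) is an assumption for illustration, not the exact statistic used in our ranking:

```python
# Illustrative sketch: pick the five most popular apps per DSE domain.
# The popularity proxy (num_ratings * avg_rating) is an assumption,
# not the exact installation/rating statistic used in the paper.
from collections import defaultdict

def top_five_per_domain(apps):
    by_domain = defaultdict(list)
    for app in apps:
        by_domain[app["domain"]].append(app)
    selection = {}
    for domain, candidates in by_domain.items():
        candidates.sort(key=lambda a: a["num_ratings"] * a["avg_rating"],
                        reverse=True)
        selection[domain] = [a["name"] for a in candidates[:5]]
    return selection

# Hypothetical candidates for a single domain.
apps = [
    {"name": f"App{i}", "domain": "ride-sharing",
     "num_ratings": 1000 * i, "avg_rating": 4.5}
    for i in range(1, 8)
]
print(top_five_per_domain(apps)["ride-sharing"])
# ['App7', 'App6', 'App5', 'App4', 'App3']
```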
B. Manual Annotation
We begin by qualitatively analyzing the content of the
privacy policies of the apps in our dataset. The objective is
to identify the data collection claims in these policies along
with their justifications (i.e., establish our ground truth). We
define a justification as the rationale provided by the app for
collecting users’ sensitive information. In our approach, we
map this rationale onto a set of high-level system quality
features. An app ostensibly collects data to enhance its quality
attributes, and thus, its users’ experience. These
quality attributes are often described as the non-functional
requirements (NFRs) of the system. NFRs can be thought of as
abstract behaviors of the system that can be enforced through
bundles of the system’s functional features. For instance, the
security NFR refers to the behavior that is enforced by the
functional features that are used to implement security in the
system, such as user authentication and data encryption.
Around 250 different kinds of software quality attributes
are defined in the literature [41]. These NFRs extend over a
broad range of categories and sub-categories. To simplify our
manual analysis, we limit our annotation to the most popular
types of NFRs that commonly appear in the literature: Security,
Performance, Accessibility, Accuracy, Usability, Safety, Legal,
and Maintainability [42], [43], [44], [45]. These NFRs are
defined in Table II.
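A minimal sketch of how the eight NFR categories can serve as a multi-label annotation scheme is shown below; the example policy statement and its labels are illustrative, not drawn from our ground truth:

```python
# Minimal sketch of multi-label NFR annotation; the example statement
# and its labels are illustrative, not taken from our ground truth.
NFR_CATEGORIES = {
    "Security", "Performance", "Accessibility", "Accuracy",
    "Usability", "Safety", "Legal", "Maintainability",
}

def annotate(statement: str, labels: set) -> dict:
    """Attach one or more NFR labels to a policy statement."""
    unknown = labels - NFR_CATEGORIES
    if unknown:
        raise ValueError(f"Unknown NFR categories: {unknown}")
    return {"statement": statement, "labels": labels}

claim = annotate(
    "We collect your location to verify your identity and keep you safe.",
    {"Security", "Safety"},  # a single statement can carry multiple labels
)
print(sorted(claim["labels"]))  # ['Safety', 'Security']
```

Allowing a statement to carry several labels mirrors the annotation protocol described next, where a single policy statement may raise more than one quality-related issue.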
To annotate the privacy claims in our set of policies, we
follow a grounded theory approach [46]. In particular, three
judges manually extracted any policy statements related to
collecting, using, sharing, or storing personal information and
mapped these statements to one or more of the NFR categories
defined earlier. If no suitable category was found, the judges
were free to come up with new categories. An example of this
process is shown in Fig. 3. A statement can be labeled under
multiple categories if it raises more than one functionality-
related issue. This step was necessary to maintain the accuracy
of our annotations as NFRs are inherently vague—a single
statement can express multiple issues at the same time [44],
[41], [47]. After each round of annotation, the three judges
met to discuss any discrepancies and add/merge labels. This