Methods To Ensure Privacy Regarding Medical Data
Including an examination of the differential privacy algorithm
RAPPOR and its implementation in “CrypTool 2”
Christina Wölk
University of Siegen
Student (Bachelor’s Degree in Computer Science)
christina.woelk@student.uni-siegen.de
Abstract—This document examines several methods that can be applied to ensure the privacy of data gathered in the health care sector. To establish a common understanding of the topic, the introduction uses an example to explain the need for anonymization methods. Next, the reasons for data collection are presented together with the need to protect this data, as well as the privacy laws currently in force to enforce this protection. The question of what kind of privacy we are talking about, and which conditions have to be fulfilled, is dealt with in the subsequent chapter “Differential Privacy”. Having established this, common anonymization methods are explained and reviewed for their use in the healthcare sector.
The RAPPOR algorithm and its differential privacy are examined in more detail before coming to a conclusion.
I. INTRODUCTION
Privacy is valued by the majority of German citizens. With the growing number of ways to use technology and collect data, safety measures have to be strengthened as well. So-called “anonymity” (in this case meaning removing one’s name from a dataset) alone is no longer sufficient to ensure one’s privacy. We can often be identified by a very limited amount of data (e.g., date of birth, hometown and gender). Since personal data is often collected in various contexts and from several sources, records (e.g., medical records) can be attributed to a specific person and exposed. An example from the 1990s:
The “Massachusetts Group Insurance Commission” released anonymized data on state employees’ hospital visits to build a larger database for researchers. During the anonymization, name, address and social security number were removed. Latanya Sweeney (at that time a graduate student in computer science) was able to identify the then Governor of Massachusetts simply by buying the voter rolls of the city of Cambridge (where she knew the governor resided) for approximately 20 US dollars [2]; these rolls included the name, address, ZIP code, birth date and sex of every voter. By matching his ZIP code, birth date and sex against the released data, she was able to identify his medical records and thereby learn every diagnosis and every prescription [2].
This example shows the limitations of anonymization. In this case, the student merely sent the medical records to the governor’s office [2]. However, so-called “linkage attacks” (in which sensitive information is attributed to the person it belongs to by combining it with other data sources) remain a risk that should not be ignored.
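To illustrate the mechanism of such a linkage attack, the following Python sketch joins a hypothetical “anonymized” release with a hypothetical public list on the quasi-identifiers ZIP code, birth date and sex. All records, names and the function `linkage_attack` are invented for illustration; the sketch only assumes that both sources share these three attributes, as in the Massachusetts example.

```python
# Hypothetical toy data; all values are invented for illustration.
# "Anonymized" release: name, address and SSN removed, quasi-identifiers kept.
hospital_release = [
    {"zip": "11111", "birth_date": "1950-01-01", "sex": "M", "diagnosis": "D1"},
    {"zip": "11111", "birth_date": "1962-05-17", "sex": "F", "diagnosis": "D2"},
    {"zip": "22222", "birth_date": "1950-01-01", "sex": "M", "diagnosis": "D3"},
]

# Public list (like a voter roll): names plus the same quasi-identifiers.
voter_roll = [
    {"name": "Person A", "zip": "11111", "birth_date": "1950-01-01", "sex": "M"},
    {"name": "Person B", "zip": "11111", "birth_date": "1962-05-17", "sex": "F"},
    {"name": "Person C", "zip": "33333", "birth_date": "1971-09-30", "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def linkage_attack(released, public, keys=QUASI_IDENTIFIERS):
    """Link released records to names when their quasi-identifiers match exactly one person."""
    names_by_key = {}
    for person in public:
        key = tuple(person[k] for k in keys)
        names_by_key.setdefault(key, []).append(person["name"])

    linked = {}
    for record in released:
        names = names_by_key.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:  # unique match: the record is re-identified
            linked[names[0]] = record["diagnosis"]
    return linked


print(linkage_attack(hospital_release, voter_roll))
```

Here the first two hospital records are linked to Person A and Person B; the third stays unlinked because no entry in the public list shares its combination of ZIP code, birth date and sex.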
Therefore, this paper examines methods to ensure privacy
regarding medical data, especially focusing on the RAPPOR
algorithm provided by Google, as described in the abstract.
II. DATA PROTECTION IN THE HEALTHCARE SECTOR
Collecting information for research purposes is not new. However, in medical research certain problems arise to a greater extent than elsewhere. The most frequent one is the size of the study. When researching how to improve health, the need for study participants suffering from certain medical conditions is almost inevitable. Even something as small as a new medicine that alleviates headaches can, for obvious reasons, only be tested on people who have headaches. For rare diseases, this can become a serious problem, since the results would not be statistically significant. So the group of potential participants is often very limited and cannot be enlarged (also for obvious ethical reasons). This complicates anonymization, because the larger the group, the harder it is to identify an individual.
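The effect of group size can be made concrete with a deliberately simplified model, assumed here for illustration only: each participant independently has one of m equally likely quasi-identifier combinations (e.g., age band, sex and region). The probability that a given participant’s combination is shared by nobody else in a group of n, and is therefore unique and easy to link, is then (1 - 1/m)^(n-1). The function name and the value of m below are hypothetical.

```python
def probability_unique(group_size: int, combinations: int) -> float:
    """Chance that one participant's quasi-identifier combination is shared by no one else.

    Simplifying assumption: every participant independently draws one of
    `combinations` equally likely combinations.
    """
    return (1 - 1 / combinations) ** (group_size - 1)


COMBINATIONS = 1_000  # hypothetical number of possible combinations
for n in (10, 100, 1_000, 10_000):
    print(f"{n:>6} participants: P(unique) = {probability_unique(n, COMBINATIONS):.4f}")
```

Under this model, a participant in a group of 10 is almost certainly unique, while in a group of 10,000 uniqueness becomes very unlikely. Real data is not uniformly distributed, so actual re-identification risks differ.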
A. What kind of data is gathered? (And for what reasons?)
Not all data is alike. Especially in the health care sector, patients’ data is often very sensitive. The following sections review the types of data that are frequently dealt with.
1) Data concerning age and sex: Effects and side effects of medical treatments can differ depending on the sex of the patient. This is related to differences in height and weight, which influence the appropriate dosage, as well as to hormonal differences that interact with specific medications [19]. Therefore, this information is needed to ensure proper medication. It is relevant not only for treatment but also for the prevention of statistically probable illnesses or diseases. A good example is the general practitioner’s invitation to a breast cancer checkup when a woman reaches the age of 30, or to a prostate cancer checkup when a man reaches the age of 45 [20]. Even when the causes of the increased occurrence are unknown, taking precautions is recommended, because statistics alone do, in fact, save lives.
arXiv:2210.09963v1 [cs.CR] 18 Oct 2022
All data regarding treatment and diagnosis is protected by medical confidentiality and by data protection provisions to which the patient has to consent.
2) Socioeconomic factors: Socioeconomic factors such as income, spoken languages, place of residence, occupation and family status are rarely relevant for ordinary treatment. Nevertheless, they can be of utmost importance for research, especially for long-term studies. The development and emergence of many diseases depend not on one factor but on a large number of them. For instance, it is known that many factors increase the risk of developing cancer at a younger age (although a lot of research still remains to be done). Therefore, this kind of data mining is necessary to gain insight into these matters.
3) Data regarding lifestyle: A balanced diet, regular exercise, etc., influence health positively; conversely, a lack of exercise and an unhealthy diet increase the risk of diseases such as type 2 diabetes [21]. This data is also important for long-term studies, since lifestyle affects almost every aspect of our everyday life. It is hard to measure and is almost solely reported by the patient rather than observed by the doctor. This leads to inaccuracy, which is why a lot of data has to be gathered before it can be deemed useful.
B. What are the consequences of insufficient protection of medical data?
1) Patient’s point of view: As can be seen in the paragraphs above, medical data comprises a variety of sensitive data concerning our private life. Many of these factors (e.g., income) indicate or influence a certain social status. Revealing sensitive data can lead to social stigmatization and, therefore, mental stress. A doctor or researcher will not judge you for having a mental illness or consuming drugs, but your social environment or your employer might. It does not even have to be something that carries a lot of prejudice: an employer’s concern that you, as a migraine patient, will be on sick leave more often might be enough for you not to get the job.
Insurance companies are another party potentially interested in your medical data. They want to know the risk of having to pay for you as their customer, or to find a reason to make your insurance more expensive. Additionally, it has to be considered that information regarding one’s medical data can also affect other people, as some diseases are hereditary.
2) Company’s point of view - The General Data Protection Regulation: Since 2018, the General Data Protection Regulation, a regulation of the European Union, has been in force in Germany [4]. Its main goal is to ensure informational self-determination as well as other fundamental freedoms. In Germany, the person to whom the data relates is regarded as the owner of the data. Therefore, processing personal data is prohibited unless it is explicitly permitted (e.g., by law or by the consent of the person concerned). Data protection aims for data integrity, data confidentiality and (especially important for this topic) data resilience, meaning resilience against, e.g., hackers [4].
Violations of the General Data Protection Regulation can lead to fines of up to 20 million euros or 4% of the company’s worldwide annual turnover, whichever is higher [4].
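The “whichever is higher” rule is simple arithmetic. The sketch below computes only the upper limit stated above; the function name is hypothetical, and it does not model how supervisory authorities actually determine a fine.

```python
def gdpr_fine_cap_eur(worldwide_annual_turnover_eur: int) -> int:
    """Upper limit of the fine named in the text: EUR 20 million or 4 % of worldwide
    annual turnover, whichever is higher. Not a prediction of an actual fine."""
    fixed_cap = 20_000_000
    turnover_cap = worldwide_annual_turnover_eur * 4 // 100  # 4 %, rounded down to whole euros
    return max(fixed_cap, turnover_cap)


for turnover in (100_000_000, 500_000_000, 2_000_000_000):
    print(f"turnover {turnover:>13,} EUR -> cap {gdpr_fine_cap_eur(turnover):>11,} EUR")
```

The two limits are equal at a turnover of 500 million euros (4% of 500 million is 20 million), so the percentage-based limit only matters for companies with a larger worldwide annual turnover.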
III. DIFFERENTIAL PRIVACY
Differential privacy pursues the goal of obtaining responses (e.g., from surveys or user behaviour) that are as accurate as possible, while making it as difficult as possible to identify a person by his or her answers. The parameter ε is used to “measure” the extent of the privacy provided. A small ε represents a strong privacy guarantee, as follows from the definition of differential privacy:
“A randomized function κ gives ε-differential privacy if for all data sets D1 and D2 differing on at most one element, and all S ⊆ Range(κ):
$$\Pr[\kappa(D_1) \in S] \le e^{\varepsilon} \times \Pr[\kappa(D_2) \in S] \tag{1}$$
” [5]. Range(κ) is the set of every possible outcome of the function κ, which could, for example, be the set of all whole numbers. The left side of the inequality describes the probability (Pr) that the output of κ applied to the full database D1 lies in the subset S. The right side describes the same probability for D2, which differs from D1 in at most one entry (e.g., one person’s record has been removed), multiplied (×) by e^ε. Because the definition holds for all such pairs, the roles of D1 and D2 can be swapped, so the two probabilities differ by at most a factor of e^ε in either direction. Note that for ε = 0, the factor is e^0 = 1, which gives the highest possible privacy: the output distribution is then exactly the same whether or not a person is included in the database, and for small ε it changes only slightly. When using differential privacy methods, the real responses are not necessarily sent to the server. Instead, with a certain probability, the given answer is a random one. This protects the user’s data even if a response is intercepted, because the real response cannot be determined with certainty. (If the same question were answered repeatedly with fresh randomization, an observer could average out the noise; RAPPOR therefore memoizes a “permanent” randomized answer.) Differential privacy is not focused on the method, but on the result: how well is the privacy of the user protected?
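The randomized-response idea described above can be sketched for a single yes/no answer. In this hypothetical Python example, each user replaces the true bit with a fair coin flip with probability f and otherwise reports it truthfully. The parameter name f follows the notation of RAPPOR’s “permanent randomized response”, but the sketch is not RAPPOR itself, which additionally encodes values in Bloom filters and applies a second, instantaneous randomization. For one bit, P[report 1 | true 1] = 1 - f/2 and P[report 1 | true 0] = f/2 (and symmetrically for report 0), so inequality (1) holds with ε = ln((1 - f/2)/(f/2)). The population and all function names are assumptions made for illustration.

```python
import math
import random


def randomized_response(true_bit: int, f: float, rng: random.Random) -> int:
    """With probability f, report a fair coin flip; otherwise report the true bit."""
    if rng.random() < f:
        return rng.randint(0, 1)  # random answer, independent of the truth
    return true_bit


def epsilon(f: float) -> float:
    """Epsilon of one randomized bit (requires 0 < f <= 1).

    P[report 1 | true 1] = 1 - f/2 and P[report 1 | true 0] = f/2; the ratio for
    report 0 is the same, so this bounds both directions of inequality (1).
    """
    return math.log((1 - f / 2) / (f / 2))


def estimate_true_fraction(reports: list[int], f: float) -> float:
    """Unbiased estimate of the share of true 1s (requires 0 <= f < 1)."""
    observed = sum(reports) / len(reports)
    # E[observed] = (1 - f) * true_share + f / 2, solved for true_share.
    return (observed - f / 2) / (1 - f)


rng = random.Random(42)  # fixed seed so the run is reproducible
f = 0.5
truth = [1] * 3_000 + [0] * 7_000  # hypothetical population: 30 % true "yes"
reports = [randomized_response(bit, f, rng) for bit in truth]
print(f"epsilon = {epsilon(f):.3f}")  # ln(0.75 / 0.25) = ln 3
print(f"estimated share of 'yes' = {estimate_true_fraction(reports, f):.3f}")
```

Setting f = 1 gives ε = ln 1 = 0: every report is a pure coin flip, which yields the highest possible privacy but no information about the population. Smaller values of f give more accurate estimates at the cost of a larger ε.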
IV. OVERVIEW OF PRIVACY PRESERVING METHODS
Data mining in the health care sector can improve, e.g., the detection of diseases, but it requires ensuring the patients