general practitioner to get a checkup for breast cancer when
a female reaches the age of 30 or the checkup for prostate
cancer when a male reaches the age of 45 [20]. Even if the
causes of the increased incidence are unknown, taking
precautions is recommended because statistics alone do, in
fact, save lives.
All data regarding treatment, as well as diagnosis, are
protected by medical confidentiality and by data protection
regulations to which the patient has to consent.
2) Socioeconomic factors: Socioeconomic factors such
as income, spoken languages, place of residence, job and
family status are rarely relevant for ordinary treatment.
Nevertheless, they can be of utmost importance for research,
especially long-term studies. The development and emergence
of many diseases depend not on a single factor but on a large
number of them. For instance, it is known that many factors
increase the risk of getting cancer at a younger age (although
there is still a lot of research to be done). Therefore, mining
this kind of data is necessary to gain insight into these
matters.
3) Data regarding lifestyle: A balanced diet, regular
exercise, etc., influence health positively; conversely, a lack
of exercise and a lot of unhealthy food increase the risk of
diseases like type 2 diabetes [21]. This data is also important
for long-term studies, as it affects almost every aspect of
everyday life. It is hard to measure and is almost solely
reported by the patient rather than observed by the doctor.
This leads to inaccuracy, which is why a lot of data has to be
gathered before it can be deemed useful.
B. What are the consequences of insufficient protection of
medical data?
1) Patient’s point of view: As can be seen in the paragraphs
above, medical data consists of a variety of sensitive data
concerning our private life. Many of these factors (e.g.,
income) indicate or influence a certain social status.
Revealing sensitive data can lead to social stigmatization and
therefore to mental stress. A doctor or researcher will not
judge you for having a mental illness or consuming drugs, but
your social environment or your employer might. It does not
even have to be something with strong prejudices attached to
it: an employer's concern that you, as a migraine patient,
might be on sick leave more often could be enough to cost you
the job.
Other parties potentially interested in your medical data are
insurance companies. They want to know the risk of having to
pay for you as their customer, or they look for a reason to
make your insurance more expensive. Additionally, it has to be
considered that information regarding one person's medical
data can also affect other people, as some diseases are
genetic.
2) Company’s point of view - The General Data Protection
Regulation: Since 2018, the General Data Protection
Regulation, a regulation of the European Union, has been in
force in Germany [4]. Its main goal is to ensure informational
self-determination as well as other fundamental freedoms. In
Germany, the person to whom the data is attributable is the
owner of the data. Therefore, it is prohibited to collect
personal data unless stated otherwise. Data protection aims
for data integrity, data confidentiality and (especially
important for this topic) data resilience, meaning resilience
towards, e.g., hackers [4].
Violations of the General Data Protection Regulation can lead
to fines of up to 20 million euros or 4% of the company's
worldwide annual turnover, whichever amount is higher [4].
III. DIFFERENTIAL PRIVACY
Differential privacy pursues the goal of obtaining responses
(e.g., from surveys or user behaviour) that are as accurate as
possible while making it as difficult as possible to identify
a person by his or her given answers. The parameter ε is used
to ”measure” the extent of the privacy given. A small ε
represents a high privacy guarantee, as a consequence of the
definition of differential privacy:
”A randomized function κ gives ε-differential privacy if for
all data sets D1 and D2 differing on at most one element, and
all S ⊆ Range(κ):

Pr[κ(D1) ∈ S] ≤ e^ε × Pr[κ(D2) ∈ S].   (1)

”[5]. Range(κ) is the set of every possible outcome of the
function κ, which could, for example, be the set of all whole
numbers.
The left side of the inequation describes the probability (Pr)
that the output of the full database (D1), randomized by the
function κ, falls into the subset S. The right side does the
same for the database with one entry removed (D2), and the
term is multiplied (×) by e^ε. Note that for ε = 0 the term is
multiplied by one, giving the highest possible privacy.
This means that the privacy for each user is about the same
and it does not matter whether a person is included in the
database or not. When using differential privacy methods, the
real responses aren't necessarily sent to the server. Instead,
with a certain probability, the given answer is a random one.
This protects the user's data even if the user's responses are
intercepted multiple times, because the real response is
harder to reconstruct. Differential privacy is not focused on
the method but on the result: how well is the privacy of the
user protected?
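This randomized-response idea can be sketched in a small simulation. The function names, the choice of answering truthfully with probability 1/2, and the 30% example rate are illustrative assumptions, not details taken from [5]; the sketch only shows how random answers yield privacy (here ε = ln 3) while the true proportion can still be estimated from many noisy reports.

```python
import math
import random

def randomized_response(true_answer: bool, p_truth: float = 0.5) -> bool:
    """With probability p_truth report the true answer; otherwise
    report a uniformly random answer (classic randomized response)."""
    if random.random() < p_truth:
        return true_answer
    return random.random() < 0.5

def epsilon(p_truth: float) -> float:
    """Privacy parameter of this mechanism:
    P(report = b | truth = b)  = p_truth + (1 - p_truth)/2
    P(report = b | truth != b) = (1 - p_truth)/2
    and eps is the log of their ratio, per definition (1)."""
    p_same = p_truth + (1 - p_truth) / 2
    p_diff = (1 - p_truth) / 2
    return math.log(p_same / p_diff)

def estimate_proportion(reports, p_truth: float = 0.5) -> float:
    """Unbiased estimate of the true 'yes' rate from noisy reports,
    since E[report] = p_truth * p_true + (1 - p_truth)/2."""
    noise = (1 - p_truth) / 2
    return (sum(reports) / len(reports) - noise) / p_truth

random.seed(0)
true_answers = [random.random() < 0.3 for _ in range(100_000)]  # 30% "yes"
reports = [randomized_response(a) for a in true_answers]

print(round(epsilon(0.5), 3))             # 1.099, i.e. ln(3)
print(estimate_proportion(reports))       # close to the true rate of 0.3
```

Note the trade-off the section describes: a smaller p_truth (more random answers) lowers ε and strengthens privacy, but more responses must be gathered before the estimate becomes accurate.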
IV. OVERVIEW OF PRIVACY PRESERVING METHODS
Data mining in the health care sector can improve, e.g., the
detection of diseases, but requires ensuring the patients