Privacy-Preserving Link Prediction Didem Demirag1 Mina Namazi2 Erman Ayday3 and Jeremy Clark1 1Concordia University

2025-05-02 0 0 1.66MB 18 页 10玖币
侵权投诉
Privacy-Preserving Link Prediction
Didem Demirag1, Mina Namazi2, Erman Ayday3, and Jeremy Clark1
1Concordia University
Montreal, QC, Canada
d demira@encs.concordia.ca, j.clark@concordia.ca
2Open University of Catalonia
Barcelona, Spain
mnamaziesfanjani@uoc.edu
3Case Western Reserve University,
Cleveland, OH, USA
exa208@case.edu
Abstract. Consider two data holders, ABC and XYZ, with graph data
(e.g., social networks, e-commerce, telecommunication, and bio-informatics).
ABC can see that node A is linked to node B, and XYZ can see node B
is linked to node C. Node B is the common neighbour of A and C but
neither network can discover this fact on their own. In this paper, we pro-
vide a two party computation that ABC and XYZ can run to discover
the common neighbours in the union of their graph data, however neither
party has to reveal their plaintext graph to the other. Based on private
set intersection, we implement our solution, provide measurements, and
quantify partial leaks of privacy. We also propose a heavyweight solution
that leaks zero information based on additively homomorphic encryption.
Keywords: link prediction ·common neighbour ·privacy preserving graph min-
ing ·private set intersection ·social network graphs
1 Introduction
Link prediction discovers important linkages between nodes in a graph. Based
on the analysis of these linkages, it helps the data holder to forecast what fu-
ture connections might emerge between the nodes, and to predict if there are
missing links in the data. Some common applications include: (i) in social net-
works, to recommend links between users; (ii) in e-commerce or personalized
advertisement, to recommend products to users; (iii) in telecommunication, to
build optimal phone usage plans between the users; and (iv) in bioinformatics,
to predict associations between diseases and attributes of patients or to discover
associations between genes (or proteins) and different functions.
Link prediction is typically done on the local graph of a data holder or ser-
vice provider. For instance, a social network, analyzing the common neighbours
between its users decides whether to recommend links between the users. How-
ever, link prediction will be more accurate and correct by considering more
arXiv:2210.01297v1 [cs.CR] 4 Oct 2022
2
information about the graph nodes. This can be achieved by merging two or
more graph databases that include similar information, leading to “distributed
link prediction” between two or more graph databases. For instance, two social
networks may utilize the connections in their combined graph to provide more
accurate link prediction for their users. Furthermore, distributed link prediction
will enable different uses of link prediction, such as building connections between
users and products based on the tastes of other similar users (e.g., friends of a
user). Such an application may be possible between graph databases of a social
network and an e-commerce service provider. In some cases, collaboration is mu-
tually beneficial to both parties. In others, one party can pay the other party to
participate—one party gets better data and the other gets to monetize its data.
Distributed link prediction, although is a promising approach for more ac-
curate and richer link prediction applications, also results in privacy concerns
since it implies combining two or more different graph databases. In this sce-
nario, threats against privacy can be categorized into three groups [22]: identity
disclosure, link disclosure, and attribute disclosure. All these threats should be
considered in a distributed link prediction algorithm, since it involves privacy-
sensitive databases from multiple parties.
One promising solution for this privacy concern is cryptography to achieve
distributed link prediction in a privacy-preserving way. Thus, in this paper, our
goal is to develop a cryptographic solution for privacy-preserving distributed
link prediction between multiple graph databases. We propose a solution based
on private set intersection (PSI) to tackle this problem by considering both the
efficiency of the solution and its privacy. Via evaluations, we show that this
solution provides good efficiency. For example, it can run in under 1 second
(ignoring communication latency) for graphs based on a Flickr dataset with
40K nodes. The proposed protocol does not provide perfect privacy (it leaks
some intermediary values) and so we quantify this leakage to better understand
if it is consequential enough to move to a fully private solution (which we also
sketch).
1.1 Use Cases
Privacy-preserving distributed link prediction can be utilized in different set-
tings. Here, we explain some of the possible applications.
Social networks. In this setting, there are two social networks, Graph 1 and
Graph 2. Graph 1 aims to understand whether there will be a link formed be-
tween nodes xand yby also utilizing the similarity of xand yin Graph 2, as
distributed link prediction provides better accuracy compared to performing this
operation locally.
E-commerce. Another application can be between a social network and an e-
commerce service. In the previous use case, the link between two users is the main
concern of the protocol. Unlike the previous case, here the links between a user
and products are determined at the end of the protocol. In the e-commerce graph,
3
there are links between the users and the products that they have bought. The
aim here is to provide better advertising to users. The network will recommend
product nto the user xif this user’s friends also purchased the same product.
For this purpose, the e-commerce network has to know the friends of user xin
the social network. Unlike the previous use case, here link prediction cannot be
done locally on the e-commerce graph, as the knowledge of the social network’s
structure should be utilized in order to do the recommendation.
Telecommunication. In this use case, an advertising company wants to propagate
an advertisement in the telecom network. If user xis a target for that advertise-
ment, the company would like to know which nodes are likely to form links with
user x, in order to decide which nodes it will send the advertisement. The aim is
to maximize the number of nodes that learn about the advertisement. Another
application involves a social network graph and a phone operator graph. The
phone operator wants to find out friends of user xin the social network, so that
it offers the special services (e.g., discounts) to the users that are similar to user
x.
Bioinformatics. Here, the first graph consists of patients and diseases and the
aim is to predict the link between the patient iand the disease j. In the second
graph, there are similar patients to patient i. Using these similar patients, and
their connection to disease j, the link between patient iand disease jcan be
inferred.
1.2 Related Work
There is a rich literature on link prediction algorithms (without consideration
of privacy) in a variety of network structures: multiple partially aligned social
networks [25]; coupled networks [11]; and heterogenous networks [21]. Other
works consider node similarity when two nodes in the graph do not share com-
mon neighbours [17]; unbalanced, sparse data across multiple heterogeneous net-
works [10]; missing link prediction using local random walk [19]; and the inter-
section of link prediction and transfer learning [24].
Other research considers the use of cryptography for collaborating on graph-
based data between two parties with privacy protections. However such works
consider problems other than link prediction: merging and query performed on
knowledge graphs owned by different parties [6]; whether one graph is a subgraph
of the other graph [23]; single-source shortest distance and all-pairs shortest dis-
tance both in sparse and dense graphs [1]; all pairs shortest distance and single
source shortest distance [4]; and transitive closure [14]; anonymous invitation-
based system [2] and its extension to malicious adversarial model [3]. While it
may be possible to transform some of these into finding common neighbours with
a black-box approach, we provide a purpose-built protocol for common neigh-
bour. Later in Section 2.3, we review potential cryptographic building blocks in
the literature.
4
Similarity metric Definition
Common neighbours |Γ(x)Γ(y)|
Jaccard’s coefficient |Γ(x)Γ(y)|
|Γ(x)Γ(y)|
Adamic/Adar Pz|Γ(x)Γ(y)|
1
log(|Γ(z)|)
Katzβ
P
l=1 βl.|pathhli
x,y |
where pathhli
x,y := {paths of length exactly lfrom xto y}
weighted: pathhli
x,y := weight of the edge between xand y
unweighted: pathhli
x,y := 1 iff xand yare 1-hop neighbours
The weight is determined by the constant value β.
Table 1: Different similarity metrics in a graph.
2 Proposed Solution
2.1 Building Blocks from Data Mining
Link prediction. Given a snapshot of a graph at time t, link prediction algorithms
aim to accurately predict the edges that will be added to the graph during the
interval from time tto a given future time t0[18].
Similarity metrics. In Table 1, different metrics for calculating proximity are
given. Common neighbours, Jaccard coefficient and Adamic-Adar index are re-
garded as the node-dependent indices and they only require the information
about node degree and the nearest neighbourhood, whereas the Katz index is
defined as path-dependent index that consider the global knowledge of the net-
work topology [19]. While there are also other metrics that are used (some of
which are shown in Table 1), we choose common neighbours, as it is one of the
widely-used methods for link prediction.
Common Neighbours. Common neighbours is used to predict the existence of a
link between two nodes based on the number of their common neighbours. If two
nodes share common neighbours, it is more likely that they will be connected in
the future. In a local graph, the result of the metric can directly be computed
by determining the intersection of the neighbour sets of two nodes. Based on
the cardinality of the set, the network decides whether to suggest a link between
these two nodes. The cardinality is defined as
common neighbours = |Γ(x)Γ(y)|
, where Γ(x) and Γ(y) are the set of neighbours of nodes xand yrespectively.
Adapting Common Neighbours for Two Parties. For the use case in this paper,
we look at the problem of computing common neighbours metric across two
different graphs owned by different entities. For example, this could be two
separate social networks, or a social network with an e-commerce network. Graph
1 wants to perform link prediction between the nodes xand yby using the
摘要:

Privacy-PreservingLinkPredictionDidemDemirag1,MinaNamazi2,ErmanAyday3,andJeremyClark11ConcordiaUniversityMontreal,QC,Canadaddemira@encs.concordia.ca,j.clark@concordia.ca2OpenUniversityofCataloniaBarcelona,Spainmnamaziesfanjani@uoc.edu3CaseWesternReserveUniversity,Cleveland,OH,USAexa208@case.eduAbstr...

展开>> 收起<<
Privacy-Preserving Link Prediction Didem Demirag1 Mina Namazi2 Erman Ayday3 and Jeremy Clark1 1Concordia University.pdf

共18页,预览4页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:18 页 大小:1.66MB 格式:PDF 时间:2025-05-02

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 18
客服
关注