Privacy-Preserving Link Prediction Didem Demirag1 Mina Namazi2 Erman Ayday3 and Jeremy Clark1 1Concordia University

2025-05-02 0 0 1.66MB 18 页 10玖币

侵权投诉

Privacy-Preserving Link Prediction

Didem Demirag1, Mina Namazi2, Erman Ayday3, and Jeremy Clark1

1Concordia University

Montreal, QC, Canada

d demira@encs.concordia.ca, j.clark@concordia.ca

2Open University of Catalonia

Barcelona, Spain

mnamaziesfanjani@uoc.edu

3Case Western Reserve University,

Cleveland, OH, USA

exa208@case.edu

Abstract. Consider two data holders, ABC and XYZ, with graph data

(e.g., social networks, e-commerce, telecommunication, and bio-informatics).

ABC can see that node A is linked to node B, and XYZ can see node B

is linked to node C. Node B is the common neighbour of A and C but

neither network can discover this fact on their own. In this paper, we pro-

vide a two party computation that ABC and XYZ can run to discover

the common neighbours in the union of their graph data, however neither

party has to reveal their plaintext graph to the other. Based on private

set intersection, we implement our solution, provide measurements, and

quantify partial leaks of privacy. We also propose a heavyweight solution

that leaks zero information based on additively homomorphic encryption.

Keywords: link prediction ·common neighbour ·privacy preserving graph min-

ing ·private set intersection ·social network graphs

1 Introduction

Link prediction discovers important linkages between nodes in a graph. Based

on the analysis of these linkages, it helps the data holder to forecast what fu-

ture connections might emerge between the nodes, and to predict if there are

missing links in the data. Some common applications include: (i) in social net-

works, to recommend links between users; (ii) in e-commerce or personalized

advertisement, to recommend products to users; (iii) in telecommunication, to

build optimal phone usage plans between the users; and (iv) in bioinformatics,

to predict associations between diseases and attributes of patients or to discover

associations between genes (or proteins) and diﬀerent functions.

Link prediction is typically done on the local graph of a data holder or ser-

vice provider. For instance, a social network, analyzing the common neighbours

between its users decides whether to recommend links between the users. How-

ever, link prediction will be more accurate and correct by considering more

arXiv:2210.01297v1 [cs.CR] 4 Oct 2022

information about the graph nodes. This can be achieved by merging two or

more graph databases that include similar information, leading to “distributed

link prediction” between two or more graph databases. For instance, two social

networks may utilize the connections in their combined graph to provide more

accurate link prediction for their users. Furthermore, distributed link prediction

will enable diﬀerent uses of link prediction, such as building connections between

users and products based on the tastes of other similar users (e.g., friends of a

user). Such an application may be possible between graph databases of a social

network and an e-commerce service provider. In some cases, collaboration is mu-

tually beneﬁcial to both parties. In others, one party can pay the other party to

participate—one party gets better data and the other gets to monetize its data.

Distributed link prediction, although is a promising approach for more ac-

curate and richer link prediction applications, also results in privacy concerns

since it implies combining two or more diﬀerent graph databases. In this sce-

nario, threats against privacy can be categorized into three groups [22]: identity

disclosure, link disclosure, and attribute disclosure. All these threats should be

considered in a distributed link prediction algorithm, since it involves privacy-

sensitive databases from multiple parties.

One promising solution for this privacy concern is cryptography to achieve

distributed link prediction in a privacy-preserving way. Thus, in this paper, our

goal is to develop a cryptographic solution for privacy-preserving distributed

link prediction between multiple graph databases. We propose a solution based

on private set intersection (PSI) to tackle this problem by considering both the

eﬃciency of the solution and its privacy. Via evaluations, we show that this

solution provides good eﬃciency. For example, it can run in under 1 second

(ignoring communication latency) for graphs based on a Flickr dataset with

40K nodes. The proposed protocol does not provide perfect privacy (it leaks

some intermediary values) and so we quantify this leakage to better understand

if it is consequential enough to move to a fully private solution (which we also

sketch).

1.1 Use Cases

Privacy-preserving distributed link prediction can be utilized in diﬀerent set-

tings. Here, we explain some of the possible applications.

Social networks. In this setting, there are two social networks, Graph 1 and

Graph 2. Graph 1 aims to understand whether there will be a link formed be-

tween nodes xand yby also utilizing the similarity of xand yin Graph 2, as

distributed link prediction provides better accuracy compared to performing this

operation locally.

E-commerce. Another application can be between a social network and an e-

commerce service. In the previous use case, the link between two users is the main

concern of the protocol. Unlike the previous case, here the links between a user

and products are determined at the end of the protocol. In the e-commerce graph,

there are links between the users and the products that they have bought. The

aim here is to provide better advertising to users. The network will recommend

product nto the user xif this user’s friends also purchased the same product.

For this purpose, the e-commerce network has to know the friends of user xin

the social network. Unlike the previous use case, here link prediction cannot be

done locally on the e-commerce graph, as the knowledge of the social network’s

structure should be utilized in order to do the recommendation.

Telecommunication. In this use case, an advertising company wants to propagate

an advertisement in the telecom network. If user xis a target for that advertise-

ment, the company would like to know which nodes are likely to form links with

user x, in order to decide which nodes it will send the advertisement. The aim is

to maximize the number of nodes that learn about the advertisement. Another

application involves a social network graph and a phone operator graph. The

phone operator wants to ﬁnd out friends of user xin the social network, so that

it oﬀers the special services (e.g., discounts) to the users that are similar to user

Bioinformatics. Here, the ﬁrst graph consists of patients and diseases and the

aim is to predict the link between the patient iand the disease j. In the second

graph, there are similar patients to patient i. Using these similar patients, and

their connection to disease j, the link between patient iand disease jcan be

inferred.

1.2 Related Work

There is a rich literature on link prediction algorithms (without consideration

of privacy) in a variety of network structures: multiple partially aligned social

networks [25]; coupled networks [11]; and heterogenous networks [21]. Other

works consider node similarity when two nodes in the graph do not share com-

mon neighbours [17]; unbalanced, sparse data across multiple heterogeneous net-

works [10]; missing link prediction using local random walk [19]; and the inter-

section of link prediction and transfer learning [24].

Other research considers the use of cryptography for collaborating on graph-

based data between two parties with privacy protections. However such works

consider problems other than link prediction: merging and query performed on

knowledge graphs owned by diﬀerent parties [6]; whether one graph is a subgraph

of the other graph [23]; single-source shortest distance and all-pairs shortest dis-

tance both in sparse and dense graphs [1]; all pairs shortest distance and single

source shortest distance [4]; and transitive closure [14]; anonymous invitation-

based system [2] and its extension to malicious adversarial model [3]. While it

may be possible to transform some of these into ﬁnding common neighbours with

a black-box approach, we provide a purpose-built protocol for common neigh-

bour. Later in Section 2.3, we review potential cryptographic building blocks in

the literature.

Similarity metric Deﬁnition

Common neighbours |Γ(x)∩Γ(y)|

Jaccard’s coeﬃcient |Γ(x)∩Γ(y)|

|Γ(x)∪Γ(y)|

Adamic/Adar Pz|Γ(x)∩Γ(y)|

log(|Γ(z)|)

Katzβ

P∞

l=1 βl.|pathhli

x,y |

where pathhli

x,y := {paths of length exactly lfrom xto y}

weighted: pathhli

x,y := weight of the edge between xand y

unweighted: pathhli

x,y := 1 iﬀ xand yare 1-hop neighbours

The weight is determined by the constant value β.

Table 1: Diﬀerent similarity metrics in a graph.

2 Proposed Solution

2.1 Building Blocks from Data Mining

Link prediction. Given a snapshot of a graph at time t, link prediction algorithms

aim to accurately predict the edges that will be added to the graph during the

interval from time tto a given future time t0[18].

Similarity metrics. In Table 1, diﬀerent metrics for calculating proximity are

given. Common neighbours, Jaccard coeﬃcient and Adamic-Adar index are re-

garded as the node-dependent indices and they only require the information

about node degree and the nearest neighbourhood, whereas the Katz index is

deﬁned as path-dependent index that consider the global knowledge of the net-

work topology [19]. While there are also other metrics that are used (some of

which are shown in Table 1), we choose common neighbours, as it is one of the

widely-used methods for link prediction.

Common Neighbours. Common neighbours is used to predict the existence of a

link between two nodes based on the number of their common neighbours. If two

nodes share common neighbours, it is more likely that they will be connected in

the future. In a local graph, the result of the metric can directly be computed

by determining the intersection of the neighbour sets of two nodes. Based on

the cardinality of the set, the network decides whether to suggest a link between

these two nodes. The cardinality is deﬁned as

common neighbours = |Γ(x)∩Γ(y)|

, where Γ(x) and Γ(y) are the set of neighbours of nodes xand yrespectively.

Adapting Common Neighbours for Two Parties. For the use case in this paper,

we look at the problem of computing common neighbours metric across two

diﬀerent graphs owned by diﬀerent entities. For example, this could be two

separate social networks, or a social network with an e-commerce network. Graph

1 wants to perform link prediction between the nodes xand yby using the

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

10 玖币 0人已下载

立即下载

摘要：

Privacy-PreservingLinkPredictionDidemDemirag1,MinaNamazi2,ErmanAyday3,andJeremyClark11ConcordiaUniversityMontreal,QC,Canadaddemira@encs.concordia.ca,j.clark@concordia.ca2OpenUniversityofCataloniaBarcelona,Spainmnamaziesfanjani@uoc.edu3CaseWesternReserveUniversity,Cleveland,OH,USAexa208@case.eduAbstr...

展开>> 收起<<

Privacy-Preserving Link Prediction Didem Demirag1 Mina Namazi2 Erman Ayday3 and Jeremy Clark1 1Concordia University.pdf

共18页,预览4页

还剩页未读，继续阅读

声明：本站为文档C2C交易模式，即用户上传的文档直接被用户下载，本站只是中间服务平台，本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私，请立即通知玖贝云文库，我们立即给予删除！

Privacy-Preserving Link Prediction Didem Demirag1 Mina Namazi2 Erman Ayday3 and Jeremy Clark1 1Concordia University

相关推荐

开通VIP享超值会员特权

作者详情

相关内容

热门标签

举报选择: