Path Association Rule Mining

2025-05-02 0 0 9.1MB 11 页 10玖币

侵权投诉

Yuya Sasaki

sasaki@ist.osaka-u.ac.jp

Osaka University

ABSTRACT

Graph association rule mining is a data mining technique used for

discovering regularities in graph data. In this study, we propose

a novel concept, path association rule mining, to discover the cor-

relations of path patterns that frequently appear in a given graph.

Reachability path patterns (i.e., existence of paths from a vertex

to another vertex) are applied in our concept to discover diverse

regularities. We show that the problem is NP-hard, and we develop

an ecient algorithm in which the anti-monotonic property is used

on path patterns. Subsequently, we develop approximation and

parallelization techniques to eciently and scalably discover rules.

We use real-life graphs to experimentally verify the eectiveness

of path association rules and the eciency of our algorithm.

1 INTRODUCTION

Association rule mining is data mining technique used for dis-

covering regularities between itemsets in large databases [

Association rules are represented as

𝑋⇒𝑌

, where

𝑋

and

𝑌

are

the disjoint and called antecedent and consequent, respectively.

Association rules have attracted research attention because of their

usefulness in many applications such as market analysis [

], web

mining [29], and bioinformatics [32].

Regularities between entities in large-scale graphs are a critical

topic of research. Graph association rule mining on a single large

graph, which is an extended form of general association rule mining,

exhibits considerable potential for studying regularities [

]. Graph association rules are represented as

𝐺𝑋⇒𝐺𝑌

where

𝐺𝑋

and

𝐺𝑌

are graph patterns. Since graphs are widely

used in many applications, association rules on graphs have a large

opportunity to discovery valuable insights and knowledge.

Applications on graph association rule mining.

Graph associ-

ation rule mining exhibits numerous application scenarios.

Social analysis

: Social analysis is critical for understanding and im-

proving our life. Social relationships can be modeled using graphs,

and graph association rule mining can be used to discover regular-

ities between the status of people and their relationship patterns.

For example, social relationships aect to our health [

] and hap-

piness [

]. We can nd interesting rules, for example, “if people

are happy, they are healthy and have friends.”

Recommendation

: In current recommendation systems, user be-

havior, such as viewing and clicking [

], toward particular

products are tracked. Bipartite graphs are used to model user be-

havior to products. Graph association rule mining can be used nd

potential customers by nding regularities among user behavior,

products, and proles.

Social discrimination checking

: Social discrimination in graphs should

be solved for building fair machine learning models [

]. Machine

learning models for graph data can be suered from such discrimi-

natory bias as same as transactional data. Since graph association

rules can discover regularities that include discriminatory bias, they

can be used to remove discriminatory bias in the graph data.

Motivation and challenge.

Graph association rule mining is fun-

damental for graph analysis, but only a few techniques have been

developed [

]. Existing methods cannot be applied to

the aforementioned applications due to their own semantics (see

Sec. 7 in detail).

Existing works have four obstacles to apply the aforementioned

applications. First, in all existing works, a set of vertices in conse-

quent (i.e.,

𝐺𝑌

) must be a subset of those in the antecedent (i.e.,

𝐺𝑋

Thus, they cannot discover association rules such that

𝐺𝑋

and

𝐺𝑌

involve distinct vertices. Second, since

𝐺𝑋

and

𝐺𝑌

are restrictive

(e.g., a single edge or no common edges in

𝐺𝑋

and

𝐺𝑌

), they cannot

nd the dierence of vertex attributes. Third, each rule and discov-

ery algorithm assumes a single graph type, for example, graphs

with labeled vertices and unlabeled edges, and thus does not handle

more generalized graphs than that it assumes (e.g., if discovery

algorithms assume graphs with unlabeled edges, it cannot handle

ones with labeled edges). Fourth, their antecedent and consequent

are basic graph pattern, so it does not capture the connectivity

between vertices.

Therefore, we need a novel graph association rule mining tech-

nique. We introduce three challenges that have not been addressed

in existing studies. First, simple association rules are required to

exibly capture regularities that can represent (1) no restriction of

vertices in antecedent and consequent and (2) connectivity between

vertices. Second, an ecient and scalable mining algorithm is re-

quired to accurately nd all frequent patterns in general property

graphs that include vertex attributes and edge labels. Finally, such

rules should be eective and provide insight and knowledge, and

oer conventional metrics such as support, condence, and lift.

Contribution.

We propose a novel and simple concept, path as-

sociation rule mining, and develop an ecient and scalable algo-

rithm for discovering path association rules. Path association rule

mining aims to discover regularities between path patterns; path

patterns are sequences of vertex attribute sets and edge labels.

Reachablity patterns can be applied as path patterns that repre-

sent label-constrained reachability between two vertices in directed

edges. This mining nds rules

𝑝𝑋⇒𝑝𝑌

such that more than

𝜃

vertices become sources of path patterns

𝑝𝑋

and

𝑝𝑌

, where

𝜃

a given minimum support. This rule can naturally oer standard

metrics, such as absolute/relative support, condence, and lift.

Example 1.1. Figure 1 illustrates an example of path associa-

tion rule mining. A given graph includes 12 vertices, 15 edges, 4

types of edge labels, and 8 types of vertex attributes. From this

graph, we mine path association rules with

𝜃

is one. Path pattern

⟨{Male},Follows,{Art}⟩

is frequent because vertices

𝑣8

and

𝑣9

be-

come the sources of the path patterns (we say that

𝑣8

and

𝑣9

match

with the path pattern). Similarly, path pattern

⟨{CS},BelongTo,{Uni}⟩

arXiv:2210.13136v1 [cs.DB] 24 Oct 2022

Yuya Sasaki

Follows

Art

BelongTo LocatedIn Likes

CS Chem

Art

Length 1 simple path

Museum Uni City Male Female

CS CS

Length 2 simple path Kleene path

Art CS

Art

＊

Rule

Art

Chem

CS CS

Art

CS CS CS

Art

CS CS CS

Art

p1 p2 p3 p4 p5 p6 p7p8 p9

e1e2

e5e6

e3e4

e12

e11

e10

e13 e14 e15

p10 p11 p12 p13

v1v2

v3v4

v10

v5v6

v7v8v11

v12

r1 r2 r3 r4 r5 r6

Figure 1: Path association rules

matches

𝑣8

and

𝑣9

⟨{CS},Follows,{Art}⟩ ⇒ ⟨{Male},BelongTo,

{Uni}⟩ is discovered as a path association rule.

The path association rule problem is NP-hard. Therefore, we

develop an algorithm to eciently nd the set of frequent path

patterns and rules. Our algorithm employs an Apriori approach

using anti-monotonic property that path association rules hold to

prune the candidate of frequent path patterns and rules. We can

prune path patterns in two ways, namely vertically, which stops

extending the length of path patterns, and horizontally, which stops

extending the number of attributes in path patterns. Our algorithm

can exactly compute support, condence, and other measures of

rules. Furthermore, we develop parallelization and approximation

techniques to scalably accelerate nding association rules. In our

parallelization technique, we estimate costs of vertices to compute

path patterns, and assign the set of vertices into threads according

to the costs. We reduce the computation costs in two approximate

methods; reducing candidates of sux of path patterns and sam-

pling vertices to be computed.

The contributions of the study are summarized as follows:

•Novel Concept

: We propose path association rule mining

to capture regularities between path patterns. We can apply

reachability path patterns, which diers from existing graph

association rule mining.

•Ecient and Scalable Algorithm

: We develop an ecient

and scalable algorithm that parallely nds the rules with

pruning infrequent path patterns. Our algorithm can apply

approximation techniques for more eciency.

•Discoveries in Real-World Graphs

: We demonstrate that

path association rule mining is eective in knowledge ex-

traction and bias checking.

Reproducibility

. The code and datasets used in this study are

available at github. Since codes on the existing works [

]

are not open publicly, our code is the rst open software on graph

association rule mining on a single large graph. Our supplementary

le in the github includes proofs of our theorems and lemmas, and

additional related studies and experimental results.

2 OUR CONCEPT

We propose a novel concept, path association rule mining to ef-

fectively determine the regulations between attributes of vertices

connected by edge labels. Its remarkable feature compared with

existing graph association rules is that it captures correlations of

distinct path patterns from the same vertices, which are often useful

in many applications. In addition, the path association rule mining

discovers rules on general property graphs, which widely cover

many graph types (e.g., labeled graphs).

2.1 Notations

We consider graph

G=(V,E,L,A)

where

is a nite set of

vertices,

E ⊂ V × L × V

is the set of edges with label

ℓ∈ L

a nite set of edge labels, and

is a nite set of attributes. Edge

𝑒∈ E

is represented by a triple

(𝑣, ℓ𝑒, 𝑣′)

where an edge from vertex

𝑣

to vertex

𝑣′

with edge label

ℓ𝑒

. Attribute

𝑎∈ A

is a categorical

value that represents a feature of vertex. We dene

𝐴⊆ A

as a set

of attributes and each 𝑣∈ V has a set of attributes 𝐴(𝑣).

Path is a sequence of vertices and edges

⟨𝑣0, 𝑒0, 𝑣1, . . . , 𝑒𝑛−1, 𝑣𝑛⟩

where

𝑛

is a length of the path. We dene

𝑣0

and

𝑣𝑛

as the source

and target of path, respectively.

Example 2.1. In Figure 1,

L={Follows,BelongTo,LocatedIn,

Likes}

and

A={Museum,Uni,City,Male,Female,CS,Chem,Art}

⟨𝑣8, 𝑒9, 𝑣5, 𝑒3, 𝑣2⟩

is a two length path where

𝑣8

and

𝑣2

are the source

and target of the path, respectively.

2.2 Path Association Rule

We here dene path association rules that we propose in this paper,

after dening path patterns.

Path pattern:

We dene two path patterns, namely simple and

reachability path patterns.

•

Simple path pattern is a sequence of attribute sets and edge

labels

⟨𝐴0, ℓ0, 𝐴1, . . . , ℓ𝑛−1, 𝐴𝑛⟩

where

𝐴𝑖⊆ A

and

ℓ𝑖∈ L

Given vertex

𝑣

and simple path pattern

𝑝

𝑣

matches

𝑝

𝑣

a source of a path that satises 𝐴𝑖⊆𝐴(𝑣𝑖)and ℓ𝑖=ℓ𝑒𝑖for all 𝑖.

•

Reachability path pattern is a pair of attribute sets and edge

label

⟨𝐴0, ℓ∗, 𝐴1⟩

, where

𝐴0, 𝐴1⊆ A

and

ℓ∗∈ L

. Given

Path Association Rule Mining

vertex

𝑣

and reachability path pattern

𝑝

𝑣

matches

𝑝

𝑣

is a

source of a path that satises

𝐴0⊆𝐴(𝑣0)

𝐴1⊆𝐴(𝑣𝑛)

, and

ℓ∗=ℓ𝑒𝑖for all 𝑖.

We denote by

V (𝑝)

the set of all matched vertices of path pattern

𝑝

and by

|V (𝑝)|

the number of matched vertices of

𝑝

. We call a

path pattern whose all attributes sets include a single attribute unit

path pattern.

Denition 2.2 (Dominance). Given two path patterns

𝑝=⟨𝐴0, ℓ0, 𝐴1,

. . . , ℓ𝑛−1, 𝐴𝑛⟩

and

𝑝′=⟨𝐴′

0, ℓ′

0, 𝐴′

1, . . . , ℓ ′

𝑚−1, 𝐴′

𝑚⟩

𝑝

dominates

𝑝′

𝑚≤𝑛

, and

ℓ′

𝑖=ℓ𝑖

and

𝐴′

𝑖⊆𝐴𝑖

for all

𝑖=

0to

𝑚

. We denote

𝑝′⊂𝑝

when 𝑝′is dominated by 𝑝.

Intuitively, dominating path patterns are more complex than

their dominated path patterns.

Example 2.3. In Figure 1,

𝑣𝐻

matches simple path patterns

⟨{CS,

Male},Follows,{CS,female}⟩

and reachability path patterns

⟨{CS,Male},

Follows∗,{CS,Male}⟩

V (⟨{Male},Follows,{Male}⟩)

{𝑣8, 𝑣9, 𝑣12}

Path association rule:

We dene path association rules as fol-

lows.

Denition 2.4. A path association rule

𝑟

is dened by

𝑝𝑋⇒𝑝𝑌

where

𝑝𝑋

and

𝑝𝑌

are path patterns. We refer to

𝑝𝑋

and

𝑝𝑌

as the

antecedent and consequent of rule, respectively.

The path association rule evaluates frequencies and condential

probabilities of vertices that match both

𝑝𝑋

and

𝑝𝑌

. We apply

homorphily semantics on path association rules. Thus, we allow that

a single path is shared by both 𝑝𝑋and 𝑝𝑌.

Example 2.5. Given a rule

⟨{CS},Follows,{Art}⟩ ⇒ ⟨{CS},

BelongTo,{Uni}⟩

in Figure 1,

𝑣8

and

𝑣9

are matched with this rule.

2.3 Metrics of association rules

The path association rule oers general metrics of association rules.

We can naturally dene support, condence, and lift for path asso-

ciation rules.

Support: The support of path association rules indicates how many

vertices are applied into sources of paths. We dene two support

metrics of absolute and relative supports. Most of graph association

rule mining does not oer relative support (see Sec 7) because it is

hard to compute the possible maximum number of matched graph

patterns. The absolute support is dened as follows:

ASupp(𝑝𝑋⇒𝑝𝑌)=|V (𝑝𝑋) ∩ V (𝑝𝑌)|.

Since the maximum value of

|V (𝑝)|

is the number of vertices,

the relative support is dened as follows:

RSupp(𝑝𝑋⇒𝑝𝑌)=|V (𝑝𝑋) ∩ V (𝑝𝑌)|

|V| .

Condence: The condence of path association rules indicates

the probability that vertices satises

𝑝𝑋

if they satisfy

𝑝𝑌

. The

condence is dened as follows:

Conf (𝑝𝑋⇒𝑝𝑌)=|V (𝑝𝑋) ∩ V (𝑝𝑌)|

|V (𝑝𝑋)| .

Lift: Lift is not oered by most of graph association rule mining as

well as the relative support. We can dene lift as follows:

Li(𝑝𝑋⇒𝑝𝑌)=|V (𝑝𝑋) ∩ V (𝑝𝑌)| · |V|

|V (𝑝𝑋)| · |V(𝑝𝑌)| .

Others: Similarly, we can dene other measures.

Example 2.6. We explain the metrics by using Figure 1.

ASupp(r1)

RSupp(r1)

Conf (r1)

, and

Li(r1)

are 2, 0.16, 1.0, and 6, respectively.

2.4 Problem Denition

We dene a problem that we solve in this paper.

Denition 2.7 (Path association rule mining problem). Given graph

, minimum (absolute or relative) support

𝜃

, maximum path length

𝑘

, the path association rule mining problem aims to nd all rules

with their metrics, where

𝑟

satises (1)

supp(𝑟)

is larger than

𝜃

, (2)

lengths of paths are at most

𝑘

, and (3)

𝑝𝑋

is not dominated

𝑝𝑌

and

vice versa. We say that 𝑝is frequent if |V (𝑝) | >𝜃.

Theorem 2.8. Path association rule mining problem is NP-hard.

Proof sketch: We show that the problem is NP-hard through reduc-

tion from as frequent subsequence enumeration problem, which

is NP-hard [

]. Sequences and subsequences are sub classes of

graphs and path patterns, respectively. Thus, the path association

rule mining problem is NP-hard. □

Remarks1:

The path association rule is general compared with

conventional association rule mining for itemsets. Each vertex and

its attributes can be considered as a transaction and items of the

transaction, respectively. If no edges exist in graphs, path associ-

ation rule mining is equivalent to conventional association rule

mining for itemsets.

Remarks2:

In recent real-life query logs for Wikidata [

], more

than 90 % of queries are path patterns [

]. From this fact, graph

analysis often utilizes path patterns instead of graph patterns to un-

derstand real-world relationships. Therefore, path association rule

mining can be useful in real-world analysis on many applications.

3 OUR ALGORITHM

We present an ecient algorithm for path association rule mining.

3.1 Overview

For solving the path association rule mining problem, we need to

enumerate frequent path patterns and vertices that match the path

patterns. To reduce the candidates of path patterns, our algorithm

employs an Apriori strategy which initially nds non-complex

frequent path patterns and generates candidates of frequent com-

plex path patterns form the found path patterns, based on anti-

monotonic properties.

Anti-monotonic properties.

Anti-monotonic properties are lever-

aged to reduce candidates of path patterns eectively. Path patterns

naturally satisfy anti-monotonic properties as follows:

Theorem 3.1. If 𝑝′⊂𝑝,V (𝑝′) ⊇ V (𝑝).

Proof: When

𝑝

is dominated by

𝑝′

𝑝

is not longer than

𝑝′

, all

edge labels on

𝑝

and

𝑝′

are the same order, and attribute sets on

𝑝

are subset or equal to those on

𝑝′

. All vertices that match

𝑝′

are matched with path patterns whose attributes are subset of

𝑝′

Therefore, if path pattern 𝑝⊆𝑝′,V (𝑝) ⊇ V (𝑝′).□

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

10 玖币 0人已下载

立即下载

摘要：

PathAssociationRuleMiningYuyaSasakisasaki@ist.osaka-u.ac.jpOsakaUniversityABSTRACTGraphassociationruleminingisadataminingtechniqueusedfordiscoveringregularitiesingraphdata.Inthisstudy,weproposeanovelconcept,pathassociationrulemining,todiscoverthecor-relationsofpathpatternsthatfrequentlyappearinagive...

展开>> 收起<<

Path Association Rule Mining.pdf

共11页,预览3页

还剩页未读，继续阅读

声明：本站为文档C2C交易模式，即用户上传的文档直接被用户下载，本站只是中间服务平台，本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私，请立即通知玖贝云文库，我们立即给予删除！

Path Association Rule Mining

相关推荐

开通VIP享超值会员特权

作者详情

相关内容

热门标签

举报选择: