Towards a Complete Direct Mapping From Relational Databases To Property Graphs Abdelkrim Boudaoud Houari Mahfoud Azeddine Chikh

2025-05-06 0 0 1.34MB 16 页 10玖币
侵权投诉
Towards a Complete Direct Mapping From
Relational Databases To Property Graphs
Abdelkrim Boudaoud , Houari Mahfoud , Azeddine Chikh
Abou-Bekr Belkaid University & LRIT Laboratory
Tlemcen, Algeria
{abdelkrim.boudaoud, houari.mahfoud, azeddine.chikh}@univ-tlemcen.dz
Abstract. It is increasingly common to find complex data represented
through the graph model. Contrary to relational models, graphs offer
a high capacity for executing analytical tasks on complex data. Since
a huge amount of data is still presented in terms of relational tables,
it is necessary to understand how to translate this data into graphs.
This paper proposes a complete mapping process that allows transform-
ing any relational database (schema and instance) into a property graph
database (schema and instance). Contrary to existing mappings, our so-
lution preserves the three fundamental mapping properties, namely: in-
formation preservation, semantic preservation and query preservation.
Moreover, we study mapping any SQL query into an equivalent Cypher
query, which makes our solution practical. Existing solutions are either
incomplete or based on non-practical query language. Thus, this work is
the first complete and practical solution for mapping relations to graphs.
Keywords: Direct mapping, Complete mapping, Relational database,
Graph database, SQL, Cypher
1 Introduction
Relational databases (RDs) have been widely used and studied by researchers
and practitioners for decades due to their simplicity, low data redundancy, high
data consistency, and uniform query language (SQL). Hence, the size of web
data has grown exponentially during the last two decades. The interconnections
between web data entities (e.g. interconnection between YouTube videos or peo-
ple on Facebook) are measured by billions or even trillions [5] which pushes the
relational model to quickly reach its limits as querying high interconnected web
data requires complex SQL queries which are time-consuming. To overcome this
limit, the graph database model is increasingly used on the Web due to its flex-
ibility to present data in a normal form, its efficiency to query a huge amount
of data and its analytic powerful. This suggests studying a mapping from RDs
to graph databases (GDs) to benefit from the aforementioned advantages. This
kind of mapping has not received more attention from researchers since only a
few works [3,4,12,13] have considered it. A real-life example of this mapping
has been discussed in [12]: “investigative journalists have recently found, through
arXiv:2210.00457v1 [cs.DB] 2 Oct 2022
2 A. Boudaoud et al.
graph analytics, surprising social relationships between executives of companies
within the Offshore Leaks financial social network data set, linking company of-
ficers and their companies registered in the Bahamas. The Offshore Leaks PG
was constructed as a mapping from relational database (RDB) sources”. In a
nutshell, the proposed mappings suffer from at least one of the following limits :
a) they do not study fundamental properties of mapping; b) they do not consider
a practical query language to make the approach more useful; c) they generate
an obfuscated schema.
This paper aims to provide a complete mapping (CM ) from RDs to GDs by
investigating the fundamental properties of mapping [13], namely : information
preservation (IP), query preservation (QP), and semantic preservation (SP). In
addition to data mapping, we study the mapping of SQL queries to Cypher
queries which makes our results more practical since SQL and Cypher are the
most used query languages for relational and graph data respectively.
Contributions and Road-map. Our main contributions are as follows : 1) we
formalize a CM process that maps RDs to GDs in the presence of schema; 2) we
propose definitions of schema graph and graph consistency that are necessary for
this mapping; 3) we show that our CM preserves the three fundamental mapping
properties (IP,SM, and QP); 4) in order to prove QP, we propose an algorithm
to map SQL queries into equivalent Cypher queries. To our knowledge, this work
is the first complete effort that investigates mapping relations to graphs.
Related Work. We classify previous works as follows:
Mapping RDs to RDF Data. Squeda et al. [11] studied the mapping of RDs to
RDF graph data and relational schema to OWL ontology’s. Moreover, SQL
queries are translated into SPARQL queries. They were the first to define a set
of mapping properties : information preservation, query preservation, semantic
preservation and monotonicity preservation. They proved first that their map-
ping is information preserving and query preserving, while when it comes to the
two remaining properties, preserving semantics makes the mapping not mono-
tonicity preserving.
Mapping RDs to Graph Data. De Virgilio et al. [3,4] studied mapping a) RDs
to property graph data (PG) by considering schema both in source and target;
and b) any SQL query into a set of graph traversal operations that realize the
same semantic over the resulted graph data.We remark that the proposed map-
ping obfuscates the relational schema since the resulted graph schema is difficult
to understand. Moreover, the mapping does not consider typed data. From the
practical point of view, the graph querying language considered is not really
used in practice and the proposed query mapping depends on the syntax and
semantics of this language which makes hard the application of their proposal
for another query language. In addition, they apply an aggregation process that
maps different relational tuples to the same graph vertex in order to optimize
graph traversal operations. However, this makes the mapping not information
preserving and can skew the result of some analytical tasks that one would like
to apply over the resulted data graph.
Complete Direct Mapping From Relational Databases To Property Graphs 3
Table 1: Comparative table of related works.
Type Work Mapping Preserved
properties Mapping rules
Schema Instance IP QP SP
RDs RDF
Sequeda et al. [11]X X X X X X
De vergillio et al. [3,4]X X X
Stoica et al. [12,13]X X X X X X
RDF PG Angeles et al. [2]X X X X X
RDs PG
O.Orel et al. [10]X X
S.Li et al. [8]X
Our Work X X X X X X
Stoica et al. [12,13] studied the mapping of RDs to GDs and any relational
query (formalized as an extension of relational algebra) into a G-core query.
Firstly, the choice of source and destination languages hinders the practicability
of the approach. Moreover, it is hard to see if the mapping is semantic preserving
since no definition of graph data consistency is given. Attributes, primary and
foreign keys are verbosely represented by the data graph, which makes this later
hard to understand and to query.
O.Orel et al. [10] discussed mapping relational data only into property graphs
without giving attention neither to schema nor mapping properties.
The Neo4j system provides a tool called Neo4j-ETL [1], which allows users
to import their relational data into Neo4j to be presented as property graphs.
Notice that the relational structure (both instance and schema) is not preserved
during this mapping since some tuples of the relational data (resp. relations of the
relational schema) are represented as edges for storage concerns (as done in [4]).
However, as remarked in [12], this may skew the results of some analytical tasks
(e.g. density of the generated graphs). Moreover, Neo4j-ETL does not allow the
mapping of queries. S. Li et al. [8] study an extension of Neo4j-ETL by proposing
mapping of SQL queries to Cypher queries. However, their mapping inherits the
limits of Neo4j-ETL. In addition, no detailed algorithm is given for the query
mapper which makes impossible the comparison of their proposal with other
ones. This is also the limit of [9].
Finally, Angles et al. [2] studied mappings RDF databases to property graphs
by considering both data and schema. They proved that their mapping ensures
both information and semantic preservation properties.
Table 1summarizes most important features of related works.
2 Preliminaries
This section defines the several notions that will be used throughout this paper.
Let Rbe an infinite set of relation names, Ais an infinite set of attribute
names with a special attribute tid,Tis a finite set of attribute types (String,
4 A. Boudaoud et al.
Date,Integer,Float,Boolean,Object), Dis a countably infinite domain of data
values with a special value null.
2.1 Relational Databases
Arelational schema is a tuple S= (R, A, T, Σ) where:
1. R⊆ R is a finite set of relation names;
2. Ais a function assigning a finite set of attributes for each relation rR
such that A(r)⊆ A\{tid};
3. Tis a function assigning a type for each attribute of a relation, i.e. for each
rRand each aA(r)\ {tid},T(a) T ;
4. Σis a finite set of primary (PKs) and foreign keys (FKs) defined over R
and A. A primary key over a relation rRis an expression of the form
r[a1,· · · , an] where a1inA(r). A foreign key over two relations rand sis
an expression of the form r[a1,· · · , an]s[b1,· · · , bn] where a1inA(r)
and s[b1,· · · , bn]Σ.
An instance Iof Sis an assignment to each rRof a finite set I(r) =
{t1,· · · , tn}of tuples. Each tuple ti:A(r){tid}→Dis identified by tid 6=null
and assigns a value to each attribute aA(r). We use ti(a) (resp. ti(tid)) to
refer to the value of attribute a(resp. tid) in tuple ti. Moreover, for any tuples
ti, tjI(r), ti(tid)6=tj(tid) if i6=j.
For any instance Iof a relational schema S= (R, A, T, Σ), we say that I
satisfies a primary key r[a1,· · · , an] in Σif: 1) for each tuple tI(r), t(a1in)6=
null; and 2) for any t0I(r), if t(a1in) = t0(a1in) then t=t0must hold.
Moreover, Isatisfies a foreign key r[a1,· · · , an]s[b1,· · · , bn] in Σif: 1) I
satisfies s[b1,· · · , bn]; and 2) for each tuple tI(r), either t(a1in) = null
or there exists a tuple t0I(s) where t(a1in) = t0(b1in). The instance I
satisfies all integrity constraints in Σ, denoted by IΣ, if it satisfy all primary
keys and foreign keys in Σ.
Finally, a relational database is defined with DR= (SR, IR) where SRis a
relational schema and IRis an instance of SR.
2.2 Property Graphs
Aproperty graph (PG) is a multi-graph structure composed of labeled and at-
tributed vertices and edges defined with G= (V, E, L, A) where: 1) Vis a finite
set of vertices; 2) EV×Vis a finite set of directed edges where (v, v0)E
is an edge starting at vertex vand ending at vertex v0; 3) Lis a function that
assigns a label to each vertex in V(resp. edge in E); and 4) Ais a function
assigning a nonempty set of key-value pairs to each vertex (resp. edge). For any
edge eE, we denote by e.s (resp. e.d) the starting (resp. ending) vertex of e.
摘要:

TowardsaCompleteDirectMappingFromRelationalDatabasesToPropertyGraphsAbdelkrimBoudaoud,HouariMahfoud,AzeddineChikhAbou-BekrBelkaidUniversity&LRITLaboratoryTlemcen,Algeriafabdelkrim.boudaoud,houari.mahfoud,azeddine.chikhg@univ-tlemcen.dzAbstract.Itisincreasinglycommonto ndcomplexdatarepresentedthrough...

展开>> 收起<<
Towards a Complete Direct Mapping From Relational Databases To Property Graphs Abdelkrim Boudaoud Houari Mahfoud Azeddine Chikh.pdf

共16页,预览4页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:16 页 大小:1.34MB 格式:PDF 时间:2025-05-06

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 16
客服
关注