for doing this with semantic knowledge graphs that may also integrate external
data. Pan et al. [25] present a survey of semantic data management systems and
benchmarks. The authors classify the systems using a taxonomy that includes
native RDF stores, RDBMS-based, and NoSQL-based data management systems.
Besta et al. [6] provide a classification based on the underlying database technology. Their
analysis shows that the different designs have various pros and cons. Some of the
widely used generic triple stores such as OpenLink Virtuoso [13], Apache Jena,
Blazegraph, and GraphDB excel at managing RDF data but do not scale well when
integrating non-RDF data. General-purpose property graphs like Neo4j or JanusGraph
lack an intrinsic understanding of semantic models. Multi-model databases
like ArangoDB or Redis combine a NoSQL store with a graph database, which
makes it possible to manage documents alongside the graph, but they also lack a
good understanding of semantics [30]. Entris [12] and Schmidt [31] extend this
idea and use semantic models to manage additional data in a data lake. In Section 3
we will discuss some unique requirements that create challenges in scaling
such knowledge graphs. We derive a reference architecture that separates the
semantic graph layer from the data layer to scale better to large volumes of data
and provides federated access to address the Semantic Digital Thread requirements.
As shown by our experiments, such a design seems to provide better scalability
for our use case compared to the other semantic data management approaches.
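To give a flavour of this separation, the following minimal sketch shows how a client could query the semantic graph layer and let it delegate part of the pattern to a separate data-layer endpoint via SPARQL 1.1 federation. The endpoint URLs, the ex: vocabulary, and the Python/SPARQLWrapper setup are purely illustrative assumptions, not the concrete interfaces of our architecture.

# Illustrative sketch: federated query across a hypothetical semantic graph
# layer and a hypothetical data-layer SPARQL endpoint.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://semantic-layer.example.org/sparql")  # placeholder URL
sparql.setQuery("""
    PREFIX ex: <http://example.org/twin#>
    SELECT ?asset ?sensor ?value WHERE {
        # resolved in the semantic graph layer (ontology and asset model)
        ?asset a ex:Pump ;
               ex:hasSensor ?sensor .
        # delegated to the data layer holding the sensor readings
        SERVICE <http://data-layer.example.org/sparql> {
            ?sensor ex:latestValue ?value .
        }
    }
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["asset"]["value"], row["value"]["value"])

Keeping the voluminous, frequently changing payload data behind such a federation boundary keeps the semantic layer small while still allowing integrated queries; Section 3 motivates this choice in more detail.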
Benchmarks for Semantic Data: To validate that the requirements of modelling
Digital Twins are unique and to evaluate different knowledge graph technologies, we
created a new Digital Twin Benchmark Model (DTBM). We compare it against
some established benchmarks. The Berlin SPARQL Benchmark (BSBM) [7] and the
Lehigh University Benchmark (LUBM) [14] are generic RDF benchmarks that
run a variety of queries over generated datasets. SP2Bench [33] is based on the DBLP
library dataset and reflects the social network characteristics of semantic web
data. The DBpedia SPARQL benchmark [23] uses real queries that were performed
by humans and applications against DBpedia. Additional work reflects the
requirements and characteristics of certain domains. PODiGG [35] and GTFS-
Madrid-Bench [9] are examples of benchmarks for the public transport domain,
focused on use cases and requirements for route planning over geospatial and temporal
transport data. LSLOD [15] contains datasets from the life sciences domain of
the Linked Data cloud, along with 10 simple and 10 complex queries that require query
federation. The FedBench suite [32] evaluates the efficiency and effectiveness of federated
SPARQL queries over three different datasets: cross-domain, life science, and
SP2Bench. We will use BSBM and LUBM in the evaluation in Section 6 as they
are well established, have been tested with many knowledge graph technologies, and
address different RDF characteristics. In addition, we will propose a
new benchmark focused on our use case.
3 Requirements for Semantic Digital Threads
A Digital Thread links data from the different life cycle stages of a Digital
Twin. This starts from design documents such as textual requirements, test