Scaling Knowledge Graphs for Automating AI
of Digital Twins
Joern Ploennigs1 [0000-0002-6320-8891],
Konstantinos Semertzidis1 [0000-0002-7040-6706],
Fabio Lorenzi1, and Nandana Mihindukulasooriya1 [0000-0003-1707-4842]
IBM Research Europe
Joern.Ploennigs@ie.ibm.com,
{konstantinos.semertzidis1,fabio.lorenzi1,nandana}@ibm.com
Abstract. Digital Twins are digital representations of systems in the
Internet of Things (IoT) that are often based on AI models trained on data
from those systems. Semantic models are increasingly used to link these
datasets from different stages of the IoT system's life cycle and to
automatically configure the AI modelling pipelines. This combination of
semantic models with AI pipelines running on external datasets raises
unique challenges, particularly when rolled out at scale. In this paper we
discuss the unique requirements of applying semantic graphs to automate
Digital Twins in different practical use cases. We introduce the benchmark
dataset DTBM, which reflects these characteristics, and look into the
scaling challenges of different knowledge graph technologies. Based on
these insights we propose a reference architecture that is in use in
multiple IBM products and derive lessons learned for scaling knowledge
graphs for configuring AI models for Digital Twins.
Keywords: Knowledge Graphs, Semantic Models, Scalability, Internet
of Things, Machine Learning, Digital Twins
1 Introduction
Semantic models are becoming established across industries in the Internet
of Things (IoT) to model and manage domain knowledge. Their uses range from
driving the next generation of manufacturing in Industry 4.0 [3,17,19], to
explainable transport [29], to energy savings in buildings for a sustainable
future [5, 11]. Their application culminates in the semantic integration of
various IoT sensors [28] to automate analytics of the created data [11, 37].
Digital Twins are one area of applying semantic models. A Digital Twin is a
digital representation of an IoT system that is able to continuously learn
across the system's life cycle and predict the behaviour of the IoT system
[26]. Digital Twins have multiple uses across the life cycle, from providing
recommendations in the design of the system, to automating its
manufacturing, to optimizing its operation by diagnosing anomalies or
improving controls with prediction [36]. The core of a Digital Twin is
formed by two tightly interacting concepts. First, an AI model, such as a
Machine Learning (ML) or simulation model, that is capable of continuously
learning from data and of explaining and predicting the system's behaviour.
Second, a Digital Thread that links the underlying data sources across the
system's life cycle [34]. The two interact tightly: the Thread is needed to
automate the configuration of the AI models so that their application can
scale, while results from the AI model should be injected back into the
Thread to learn and explain knowledge.
arXiv:2210.14596v1 [cs.AI] 26 Oct 2022
Semantic Knowledge Graph technologies are very well suited for implementing
this Digital Thread [1]. They promise to solve several common challenges,
from normalizing the labelling of the various data sources to being flexible
enough to be extended over the life cycle when new applications arise [8].
However, scaling knowledge graphs is challenging in its own right [24], and
in our practice we have experienced multiple issues in scaling Digital
Threads. In this paper we take a deep dive into this use case and discuss
some of the practical issues. We follow the typical industry workflow for
designing and selecting a solution, from collecting requirements and
defining a test example to deriving a reference architecture and evaluating
final realization options on some large-scale examples.
The contributions of the paper are:
Requirements for Digital Twins: We collect the requirements for the semantic
representation of Digital Twins in Section 3.
In-Use Experience for Scaling: We discuss our in-use experience in scaling
Digital Twins and propose a reference architecture.
Benchmark model for Digital Twins: We define a benchmark model for semantic
Digital Twins for a manufacturing example that tests some of the
identified requirements in Section 5.
Comparison of KG Technologies: We compare different knowledge graph
technologies for managing the semantic models for Digital Twins in Section 6,
including our own semantic property graph.
2 State of the art
Knowledge Graphs for Digital Twins: There are several examples of applying
semantic models for representing Digital Twins [1,18, 20]. Kharlamov et al. [18]
argue for the benefits of using semantic models for Digital Twins, e.g. to
simplify analytics in Industry 4.0 settings. Similarly, Kalayci et al. [17]
show how to manage industry data with semantic models. Lietaert et al. [21]
present a Digital Twin architecture for Industry 4.0. Chevallier et al. [10]
propose one for Smart Buildings. Akroyd et al. [1] review multiple approaches
to geospatial knowledge graphs for a Digital Twin of the UK. Their work
demonstrates the challenges in incorporating data from different domains
into one knowledge graph, such as the heterogeneity of data sources. These
examples represent the use of semantic models for building Digital Twins in
different industries that we also see in practice.
Semantic Data Management: A common goal of using knowledge graphs for
Digital Twins is to integrate data from various systems. Established
solutions exist for doing this with semantic knowledge graphs that may also
integrate external data. Pan et al. [25] present a survey of semantic data
management systems and benchmarks. The authors classify the systems using a
taxonomy that includes native RDF stores, RDBMS-based and NoSQL-based data
management systems. Besta et al. [6] provide a classification based on the
database technology. Their analysis shows that the different designs have
various pros and cons. Some of the widely used generic triple stores such as
OpenLink Virtuoso [13], Apache Jena, Blazegraph, and GraphDB excel at
managing RDF data but do not scale well when integrating non-RDF data.
General-purpose property graphs like Neo4j or JanusGraph lack an intrinsic
understanding of semantic models. Multi-modal databases like ArangoDB or
Redis combine a NoSQL database with a graph database, which allows documents
to be managed alongside the graph. However, they also lack a good
understanding of semantics [30]. Entris [12] and Schmidt [31] extend this
idea and use semantic models to manage additional data in a data lake. In
Section 3 we will discuss some unique requirements that create challenges in
scaling such knowledge graphs. We derive a reference architecture that
separates the semantic graph layer from the data layer in order to scale
better to large volumes of data and that provides federated access to
address the requirements of Semantic Digital Threads. As shown by our
experiments, such a design seems to provide better scalability for our use
case compared to the other semantic data management approaches.
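The idea of separating the semantic graph layer from the data layer can be illustrated with a minimal Python sketch. It is not the reference architecture of this paper: the class names (DataRef, GraphNode, DataLayer), the backend name "iot-platform", and the in-memory store are hypothetical and only show that graph nodes can hold references while the payload stays in its source system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class DataRef:
    """Pointer to data that stays in an external system (hypothetical)."""
    backend: str  # name of the source system, e.g. an IoT platform
    key: str      # identifier of the dataset inside that system


@dataclass
class GraphNode:
    """Node of the semantic graph layer: semantics only, no payload."""
    node_id: str
    semantic_type: str
    data_refs: List[DataRef] = field(default_factory=list)


class DataLayer:
    """Federated access: routes each reference to its registered backend."""

    def __init__(self) -> None:
        self._backends: Dict[str, Callable[[str], list]] = {}

    def register(self, name: str, fetch: Callable[[str], list]) -> None:
        self._backends[name] = fetch

    def resolve(self, ref: DataRef) -> list:
        if ref.backend not in self._backends:
            raise KeyError(f"no backend registered for {ref.backend!r}")
        return self._backends[ref.backend](ref.key)


# Illustrative in-memory stand-in for an external timeseries store.
_timeseries = {"robot1/temp": [21.5, 22.0, 22.4]}

data_layer = DataLayer()
data_layer.register("iot-platform", lambda key: _timeseries[key])

sensor = GraphNode(
    node_id="robot1-temp-sensor",
    semantic_type="TemperatureSensor",
    data_refs=[DataRef("iot-platform", "robot1/temp")],
)

# The graph answers "which data belongs to this sensor";
# the data layer answers "what are the values".
values = data_layer.resolve(sensor.data_refs[0])
print(values)
```

In such a layout the graph stays small because it stores only semantic metadata and references, and a new backend system is added by registering another fetch function.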
Benchmarks for Semantic Data: To validate that the requirements in modelling
Digital Twins are unique and to evaluate different knowledge graph
technologies, we created a new Digital Twin Benchmark Model (DTBM). We
compare it against some established benchmarks. The Berlin SPARQL Benchmark
(BSBM) [7] and the Lehigh University Benchmark (LUBM) [14] are generic RDF
benchmarks that run a variety of queries on generated datasets. SP2Bench [33]
is based on the DBLP library dataset and reflects the social network
characteristics of semantic web data. The DBpedia SPARQL benchmark [23] uses
real queries that were performed by humans and applications against DBpedia.
Additional work reflects the requirements and characteristics of certain
domains. PODiGG [35] and GTFS-Madrid-Bench [9] are examples of benchmarks for
the public transport domain, focused on use cases and requirements of route
planning on geospatial and temporal transport data. LSLOD [15] contains
datasets from the life sciences domain and the Linked Data cloud, together
with 10 simple and 10 complex queries that need query federation. The
FedBench suite [32] evaluates the efficiency and effectiveness of federated
SPARQL queries over three different datasets: cross-domain, life science,
and SP2Bench. We will use BSBM and LUBM in the evaluation in Section 6, as
they are very well established, have been tested on many knowledge graph
technologies, and each address different RDF characteristics. In addition,
we will propose a new benchmark focused on our use case.
3 Requirements for Semantic Digital Threads
A Digital Thread links data from the different life cycle stages of a
Digital Twin. This starts with design documents, ranging from textual
requirements and test specifications to CAD files and handbooks, which may
exist in different formats. During production, additional properties may be
attached to a Digital Twin, such as sensor data from machines, materials
used, material providers, and test results. During operation, the data
collected from the final system is also added. It is often related to asset
management data such as fault reports, maintenance work orders, and
replacement histories, as well as timeseries data collected from IoT sensors
embedded in the systems, such as temperature measurements, operational
states, or alarms. The different datasets that are collected across the life
cycle are linked together in the Digital Thread and often analyzed by
Machine Learning algorithms to discover and explain anomalies, predict the
behaviour of the system, and advise people on improving the manufacturing
and operation of the system.
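As a rough illustration of how such a thread might be represented, the following Python sketch links datasets to an asset by life cycle stage and selects those that an ML pipeline could consume. The names (Stage, Dataset, DigitalThread), the modality labels, and the sample entries are assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Stage(Enum):
    DESIGN = "design"
    PRODUCTION = "production"
    OPERATION = "operation"


@dataclass(frozen=True)
class Dataset:
    name: str
    stage: Stage
    modality: str  # e.g. "document", "record", "timeseries"


class DigitalThread:
    """Links datasets from all life cycle stages to the asset they describe."""

    def __init__(self) -> None:
        self._links: Dict[str, List[Dataset]] = {}

    def link(self, asset: str, dataset: Dataset) -> None:
        self._links.setdefault(asset, []).append(dataset)

    def datasets(self, asset: str, modality: str) -> List[Dataset]:
        # Unknown assets simply yield an empty selection.
        return [d for d in self._links.get(asset, []) if d.modality == modality]


thread = DigitalThread()
thread.link("robot1", Dataset("requirements.txt", Stage.DESIGN, "document"))
thread.link("robot1", Dataset("test-results", Stage.PRODUCTION, "record"))
thread.link("robot1", Dataset("joint-temperature", Stage.OPERATION, "timeseries"))

# Select the timeseries linked to the asset, e.g. as input for anomaly detection.
training_inputs = [d.name for d in thread.datasets("robot1", "timeseries")]
print(training_inputs)
```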
From this description we can synthesize some characteristics of a semantic
knowledge graph that can be used to implement such a Digital Thread.
C1 - Heterogeneous Semantic Types: The connected data is very heterogeneous,
representing domain-specific semantic types. A domain ontology can
contain thousands of types. For example, the BRICK ontology [5] contains
ca. 3,000 classes for modelling smart buildings datasets.
C2 - Multi-modal Representation: The data is multi-modal and represented
in different formats, from timeseries to binary files and text documents.
C3 - Federated Data: The data is stored and managed in various systems such
as complex Continuous Engineering Systems, Asset Management Systems,
or IoT platforms.
C4 - Flexible Hierarchies: Data is often structured in hierarchical models
such as location hierarchies (Country > City > Factory > Production Line)
and asset hierarchies (Robot > Arm > Joint) that are of flexible depth.
C5 - Large size: We often see graph sizes in the range of 100,000 datasets
for a mid-size Digital Twin.
C6 - Composability: Digital Twins often contain other Digital Twins. For
example, a factory twin may contain a robot twin.
C7 - Lack of semantic knowledge: We often experience that domain experts
do not have deep semantic knowledge. However, they often understand
software engineering concepts like classes and inheritance.
C8 - Dynamic: Digital Twins change over their lifetime and so does the
Digital Thread. As a consequence, the knowledge graph changes regularly,
which brings the need to represent time, states, and versioning.
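Characteristics C1, C4, and C6 can be made concrete with a small Python sketch: a hierarchy of arbitrary depth is traversed recursively, and nodes are matched by semantic type including subtypes. The Node class, the SUBCLASS_OF table, and the type and asset names are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Node:
    name: str
    semantic_type: str
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        self.children.append(child)
        return child


# Hypothetical mini type hierarchy: subtype -> supertype.
SUBCLASS_OF: Dict[str, str] = {
    "TemperatureSensor": "Sensor",
    "PowerSensor": "Sensor",
}


def is_a(semantic_type: str, ancestor: str) -> bool:
    """True if semantic_type equals ancestor or inherits from it."""
    current = semantic_type
    while current is not None:
        if current == ancestor:
            return True
        current = SUBCLASS_OF.get(current)
    return False


def descendants(node: Node) -> Iterator[Node]:
    """Depth-first walk that works for hierarchies of any depth."""
    for child in node.children:
        yield child
        yield from descendants(child)


# A factory twin that contains a robot twin (composability).
factory = Node("Factory A", "Factory")
line = factory.add(Node("Line 1", "ProductionLine"))
robot = line.add(Node("Robot 1", "Robot"))
arm = robot.add(Node("Arm", "Arm"))
joint = arm.add(Node("Joint 1", "Joint"))
joint.add(Node("Joint 1 temperature", "TemperatureSensor"))
line.add(Node("Line 1 power", "PowerSensor"))

# Find every sensor below the factory without knowing how deep it sits.
sensors = [n.name for n in descendants(factory) if is_a(n.semantic_type, "Sensor")]
print(sensors)
```

The query does not encode the depth of the hierarchy, so the same lookup also works when started at the robot node instead of the factory node.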
The goals for building the Digital Thread are:
G1 - Data Linking: The first goal of the Digital Thread is to link data from
various life cycle stages and backend systems together to create an integrated
view of the data.
G2 - Data Contextualization: The second goal of building Digital Threads is
to contextualize the data and understand spatial and functional context to
summarize and explain the data.