tinuous vector space and then assist knowledge inference
tasks such as link prediction or triple classification in a KG.
However, these techniques often suffer from knowledge
inconsistency, i.e., the same real-world entity or noun
may have different surface forms, such as “Alm” vs. “Alarm”.
Moreover, the textual knowledge and semantic information in
entity surface forms are typically discarded during training, limiting
a model's intra-domain scalability and cross-domain portability.
Textual product documents are valuable resources in the
tele-domain. Instead of simply using them as handbooks, one
approach is to pre-train a domain-specific language model
(LM). LM pre-training [9]–[12] is a good recipe for learning
implicit semantic knowledge, with self-supervised text reconstruction
as the training objective over a vast amount of language
data. However, such models struggle to exploit structured
knowledge for explicit reasoning. Additionally, our
machine data is semi-structured and multi-directional: time
extends vertically, while multiple indicators at a single
moment extend the data horizontally, as shown in Fig. 2(a).
This differs from typical log-based anomaly detection methods
[13]–[15], which target unidirectional, serial log data.
In this work, we propose to pre-train on all data that contains
tele-knowledge, including machine data, the Tele-Corpus built from
the product documents, and triples from the Tele-KG. We
expect this pre-trained model to aid downstream fault
analysis tasks in a convenient and effective manner and to boost
their performance, especially for tasks with limited data (also
known as low-resource tasks).
To achieve this, we first address the heterogeneity of multi-source,
multi-modal data (e.g., multi-directional machine
data, textual documents, and the semi-structured KG), which can
distract the model from efficient learning. To remedy this,
we draw on prompt engineering techniques [16]–[18] and provide
relevant template hints to the model to unify the modalities,
as sketched below.
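To make this concrete, the following minimal Python sketch shows one way template hints could serialize the three modalities into a unified textual form. The modality markers, template wording, and function names are illustrative assumptions, not the exact prompts used in this work.

```python
# Illustrative sketch: unifying heterogeneous tele-data into text via
# prompt templates. Markers ([KG], [LOG], [DOC]) and template wording
# are hypothetical assumptions, not the exact templates of KTeleBERT.

def serialize_triple(head: str, relation: str, tail: str) -> str:
    # A KG triple becomes a short natural-language statement.
    return f"[KG] {head} {relation} {tail}."

def serialize_machine_row(timestamp: str, indicators: dict) -> str:
    # One horizontal slice of machine data: several indicators at one moment.
    fields = ", ".join(f"{name} is {value}" for name, value in indicators.items())
    return f"[LOG] at {timestamp}, {fields}."

def serialize_document_sentence(sentence: str) -> str:
    # Plain corpus text passes through with a modality marker.
    return f"[DOC] {sentence}"

print(serialize_triple("NF destination service is unreachable",
                       "causes",
                       "initial registration requests increase abnormally"))
print(serialize_machine_row("2022-08-01 10:00",
                            {"KPI.RegSuccessRate": 0.31,
                             "KPI.InitRegCount": 5214}))
```

With such serialization, all three sources can share one tokenizer and one encoder, which is the point of the modality unification step.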
Secondly, we address the challenge of handling numerical
data, an essential component of tele-domain data that
frequently appears in machine data (e.g., KPI scores).
This format resembles tabular data, sharing three
characteristics: (i) the text portion is short; (ii) numerical
values carry different meanings and ranges under different
circumstances; (iii) the data stretches both vertically
and horizontally, forming a hierarchy. However, existing table
pre-training methods [19]–[24] mainly study the hierarchical
structure of tabular data, and numerical information is
rarely studied in depth. Furthermore, methods that target
numerical features [13]–[15] focus on learning a field
embedding for each numerical field. They suit tasks
with a limited number of fields (e.g., user attributes such as height and
weight) but fail when migrated to our tele-scenario, where the
fields (e.g., KPI names) are numerous (≥1000) and new
fields are continually generated as the enterprise develops.
Thus, we propose an adaptive numeric encoder (ANEnc)
for type-aware numeric encoding in the tele-domain.
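The following PyTorch sketch illustrates one way a type-aware numeric encoder could avoid a per-field embedding table: the scalar value is modulated by a text embedding of the field name, so unseen KPI fields can still be encoded. The FiLM-style gating and the dimensions are our assumptions for illustration, not the exact ANEnc architecture.

```python
# Minimal sketch of a type-aware numeric encoder. The field "type" is
# represented by a text embedding of its name (e.g., a [CLS] vector of
# "initial registration count"), so new KPI fields need no dedicated
# field-embedding table. Illustrative design only, not the paper's ANEnc.
import torch
import torch.nn as nn

class TypeAwareNumericEncoder(nn.Module):
    def __init__(self, type_dim: int = 768, hidden: int = 768):
        super().__init__()
        self.value_proj = nn.Linear(1, hidden)    # lift the scalar value
        self.scale = nn.Linear(type_dim, hidden)  # type-conditioned scaling
        self.shift = nn.Linear(type_dim, hidden)  # type-conditioned shift
        self.out = nn.Sequential(nn.GELU(), nn.Linear(hidden, hidden))

    def forward(self, value: torch.Tensor, type_emb: torch.Tensor) -> torch.Tensor:
        # value: (batch, 1) raw numeric values, e.g., KPI scores
        # type_emb: (batch, type_dim) text embedding of the field name
        h = self.value_proj(value)
        h = h * self.scale(type_emb) + self.shift(type_emb)  # FiLM-style modulation
        return self.out(h)

enc = TypeAwareNumericEncoder()
value = torch.tensor([[5214.0]])
type_emb = torch.randn(1, 768)  # stand-in for a field-name embedding
print(enc(value, type_emb).shape)  # torch.Size([1, 768])
```

Because the conditioning signal comes from the field name's text rather than a fixed field index, the same encoder scales to a growing, open vocabulary of KPI names.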
Thirdly, we recognize that the tele-corpus, the machine data,
and the knowledge triples call for different training targets.
We therefore adopt a multi-stage training scheme for multi-level
knowledge acquisition: (i) TeleBERT: in stage one, we follow the
ELECTRA [25] pre-training paradigm and the data augmentation
method SimCSE [26] to pre-train on the large-scale (about 20 million)
textual tele-corpus; (ii) KTeleBERT: in stage two, we
extract causal sentences that contain relevant causal
keywords and re-train TeleBERT on them together with the numeric-related
machine data, introducing a knowledge embedding training
objective and a multi-task learning method for
explicit knowledge integration.
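As a schematic of how stage two could combine its objectives, the sketch below mixes a masked-LM loss on causal sentences with a knowledge embedding loss on triples. The TransE-style scoring and the fixed loss weight are assumptions chosen for illustration; the paper's exact objective may differ.

```python
# Schematic multi-task step: text reconstruction loss plus a knowledge
# embedding (KE) loss over triples sharing the same encoder. The
# TransE-style score and the 0.5 weight are illustrative assumptions.
import torch

def ke_loss(h, r, t, margin: float = 1.0) -> torch.Tensor:
    # TransE-style score: ||h + r - t|| should be small for true triples.
    # A full setup would also contrast against corrupted (negative) triples.
    score = torch.norm(h + r - t, p=2, dim=-1)
    return torch.relu(score - margin).mean()

def multi_task_loss(mlm_loss: torch.Tensor,
                    h: torch.Tensor, r: torch.Tensor, t: torch.Tensor,
                    ke_weight: float = 0.5) -> torch.Tensor:
    # Jointly optimize implicit text knowledge and explicit KG knowledge.
    return mlm_loss + ke_weight * ke_loss(h, r, t)

h, r, t = (torch.randn(4, 768, requires_grad=True) for _ in range(3))
loss = multi_task_loss(torch.tensor(2.3), h, r, t)
loss.backward()
```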
With our pre-trained model, we apply the model-generated
service vectors to enhance three fault analysis tasks: root-cause
analysis (RCA), event association prediction (EAP),
and fault chain tracing (FCT). The experimental results show
that TeleBERT and KTeleBERT successfully improve
performance on all three tasks.
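For intuition, the snippet below sketches how a downstream task model might consume such service vectors from a frozen encoder. The checkpoint name and mean-pooling choice are placeholders, not our released model or its exact pooling strategy.

```python
# Illustrative use of a pre-trained encoder as a frozen feature extractor
# for a downstream fault analysis head. "bert-base-chinese" is a stand-in
# checkpoint (it would be KTeleBERT); mean pooling is an assumption.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-chinese")
encoder = AutoModel.from_pretrained("bert-base-chinese")
encoder.eval()

@torch.no_grad()
def service_vector(text: str) -> torch.Tensor:
    batch = tok(text, return_tensors="pt", truncation=True)
    hidden = encoder(**batch).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)         # mean-pooled service vector

vec = service_vector("NF destination service is unreachable")
print(vec.shape)  # torch.Size([768])
```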
In summary, the contributions of this work are as follows:
• We emphasize the importance of uniformly encoding
knowledge in tele-domain applications, and share our
encoding experience in real-world scenarios.
• We propose a tele-domain pre-training model, TeleBERT,
and its knowledge-enhanced version, KTeleBERT, to fuse
and encode diverse tele-knowledge in different forms.
• We demonstrate that our proposed models can serve multiple
fault analysis task models and boost their performance.
II. BACKGROUND
A. Corpus in Telecommunication
1) Machine Log Data: Machine (log) data, such as
abnormal events or normal indicator logs, is continuously
generated in both real-world tele-environments and simulation
scenes. Typically, as shown in Fig. 2(a), abnormal events
such as service interruptions have varying levels of importance
and are usually accompanied by anomalies in the relevant network
elements (NEs). Normal indicators such as numerical
KPI data, on the other hand, are cyclical and persistent in
nature and make up the majority of automatically generated
machine data. Most abnormal events self-recover after
a period of time (e.g., network congestion), and there
may be correlations or causal relationships across abnormal
events and indicators; e.g., the alarm “NF destination service
is unreachable” often leads to the abnormal KPI “the
number of initial registration requests increases abnormally”.
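A hypothetical record below illustrates the two directions of this data: timestamps stack vertically, while each timestamp carries multiple indicators horizontally, with abnormal events attached to the affected NE. All field names and values are invented for illustration.

```python
# Hypothetical layout of semi-structured machine data: a vertical time
# axis, a horizontal spread of indicators per moment, and events that
# reference the affected network element (NE). Values are invented.
machine_data = [
    {"time": "2022-08-01 10:00",
     "indicators": {"init_reg_requests": 5214, "reg_success_rate": 0.97}},
    {"time": "2022-08-01 10:05",
     "indicators": {"init_reg_requests": 19870, "reg_success_rate": 0.31},
     "events": [{"alarm": "NF destination service is unreachable",
                 "ne": "AMF-01", "severity": "critical"}]},
]
print(machine_data[1]["events"][0]["alarm"])
```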
2) Product Document: Domain engineers and experts
constantly record and update the product documentation.
In particular, each scenario may contain one or
more product documents, which are maintained by different
departments and may include nearly all relevant information in
the field, such as fault cases, solutions for occurred
or potential cases, and the event descriptions shown in Fig. 2(b).
3) Tele-product Knowledge Graph (Tele-KG): We construct
the Tele-KG to integrate massive information about events
and resources on our platform. Our goal is intuitive: we hope
that such a fine-grained Tele-KG can refine and purify the
knowledge of the tele-domain, as a semi-structured knowledge