
This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
Data-driven Approaches to Surrogate Machine Learning Model Development
H. Rhys Jones, Tingting Mu, Andrei C. Popescu, Yusuf Sulehman
University of Manchester, Oxford Rd, Manchester M13 9PL, United Kingdom
Email address: huw.jones@manchester.ac.uk
Abstract
We demonstrate the adaptation of three established methods to the field of surrogate machine learning model development: data augmentation, custom loss functions and transfer learning. Each of these methods has seen widespread use in the field of machine learning; however, here we apply them specifically to surrogate machine learning model development. The machine learning model that forms the basis of this work was intended to surrogate a traditional engineering model used in the UK nuclear industry. The performance of this model has previously been hampered by limited training data. Here, we demonstrate that through a combination of additional techniques, model performance can be significantly improved. We show that each of the aforementioned techniques has utility in its own right and in combination with the others. However, we see them best applied as part of a transfer learning operation. Five pre-trained surrogate models produced prior to this research were further trained with an augmented dataset and with our custom loss function. Through the combination of all three techniques, we see an improvement of at least 38% in performance across the five models.
Keywords:
Nuclear, Machine Learning, Graphite, Advanced Gas-cooled Reactor, Data Science, Data Analysis, Surrogate Model, Convolutional Neural Network, Regression, Supervised Learning, Data Augmentation, Transfer Learning, Loss Function
1. Introduction
A machine learning surrogate (MLS) is a model
which aims to explain natural or mathematical phenomena that can already be explained using an existing model. Using data from the original model, ma-
chine learning techniques are used to produce an op-
timised MLS model. The advantages of an MLS in-
clude increased computational efficiency when generating model outputs, with the trade-off being reduced
accuracy. Once developed and trained, machine learn-
ing models (including an MLS) can produce new data
instances almost instantly using a standard computer,
whereas generating the same information using the orig-
inal model and equivalent hardware may require hours
or days of computational effort. The reduction in ac-
curacy between an MLS and an original model must
be quantified on a case-by-case basis and assessed on
whether it is acceptable for practical use.
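To make the idea concrete, the following minimal sketch fits a cheap regressor to input/output pairs generated by an expensive model. Here `expensive_model`, the sampling ranges and the network size are illustrative assumptions, not the engineering model or surrogate used in this work.

```python
# A minimal sketch of the surrogate idea: fit a cheap regressor to
# input/output pairs generated once by a (stand-in) expensive model.
import numpy as np
from sklearn.neural_network import MLPRegressor

def expensive_model(x):
    # Stand-in for the original engineering model (in practice a physics
    # code that may take hours per evaluation); here a nonlinear toy function.
    return np.sin(3 * x[:, 0]) + 0.5 * x[:, 1] ** 2

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(2000, 2))   # sampled model inputs
y = expensive_model(X)                   # "expensive" labels, computed once

surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=1000, random_state=0)
surrogate.fit(X, y)

# Once trained, new predictions are near-instant compared to re-running
# the original model, at the cost of some approximation error.
print(surrogate.predict(X[:5]), y[:5])
```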
Previous research works have dealt with the produc-
tion of MLS in areas such as material properties pre-
diction [Nyshadham et al. (2019)] and [Asteris et al.
(2021)], with a recent work [Jones et al. (2022)] focus-
ing on seismic analysis for nuclear graphite cores. It
is the MLS model from this latest research work that
will be focused on in this paper. In the aforementioned works, a strong focus on neural networks [Gurney (1997)] is seen, including convolutional neural networks (CNNs).
Despite the motivation for the production of MLS
models being to reduce the need for expensive produc-
tion of data, a large amount of this data is required to
train such a model. A machine learning model trained
on an insufficient number of data instances may re-
sult in overfitting [Hawkins (2004)]. Some techniques
were employed in the aforementioned paper, including
randomised layer dropout [Srivastava et al. (2014)], to
counteract the effects of overfitting.
A common technique used to improve model perfor-
mance given a limited dataset is to manipulate existing
data instances in a process known as data augmentation
[Perez and Wang (2017)]. This approach is commonly
employed in machine learning applications involving
Figure 1: The Trade-off Between a Machine Learning Surrogate Model and the Original Model. Once trained on data from the original model, the production of new data is likely to be significantly more efficient in terms of computation and time. However, as the machine learning model is produced using data from the original model, there will be some inevitable reduction in accuracy.
image recognition and analysis [Hansen (2015)], with
techniques such as mirroring and rotation used to in-
crease the number of data instances in a dataset.
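The sketch below illustrates this classic image-style augmentation: mirroring and 90-degree rotations of a 2D array. Whether such transforms are label-preserving for a given surrogate problem is an assumption that must be established case by case; the array here is a toy example.

```python
# Sketch of image-style augmentation: mirroring and 90-degree rotations
# multiply the number of training instances without generating new labels
# (valid only when the task is invariant under these transforms).
import numpy as np

def augment(image):
    """Yield mirrored and rotated copies of a 2D array."""
    for flipped in (image, np.fliplr(image)):
        for k in range(4):                 # 0, 90, 180, 270 degrees
            yield np.rot90(flipped, k)

img = np.arange(9).reshape(3, 3)
augmented = list(augment(img))             # up to 8 variants per image
print(len(augmented))
```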
Another commonly encountered problem during ma-
chine learning model development is dataset bias. In
this situation, the dataset used to train the model is
weighted towards a particular region of the input and/or
output space. Alternatively, the dataset may be sparse
in a particular region of the data space, i.e., there may be only a few data examples for a part of the input or output continuum. Several methods can be employed
to counteract the problem of dataset bias, including em-
phasising underrepresented data samples to a greater de-
gree. We may instead use a loss function during model
training which is designed to correct for dataset bias.
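One possible form of such a loss is sketched below: a weighted mean-squared error whose per-instance weights are inversely proportional to the population of each label bin, so sparse regions contribute more to the gradient. The binning scheme is an illustrative assumption, not the custom loss function developed in this paper.

```python
# Sketch of a bias-correcting loss: a weighted mean-squared error that
# up-weights instances from sparsely populated regions of the label space.
import numpy as np

def weighted_mse(y_true, y_pred, weights):
    return np.mean(weights * (y_true - y_pred) ** 2)

def density_weights(y, n_bins=20):
    """Weight each instance inversely to how populated its label bin is."""
    counts, edges = np.histogram(y, bins=n_bins)
    bin_idx = np.clip(np.digitize(y, edges[:-1]) - 1, 0, n_bins - 1)
    w = 1.0 / np.maximum(counts[bin_idx], 1)
    return w / w.mean()                    # normalise to mean weight 1

y = np.random.default_rng(0).normal(size=1000)
w = density_weights(y)
print(weighted_mse(y, np.zeros_like(y), w))
```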
A third problem encountered when training neural
networks is the computational cost associated with their
development and optimisation. This is particularly
problematic when the problem space is complex, as it is in this research. Instead of starting from scratch,
we may use models produced from previous research
works as a starting point during the development of neu-
ral networks for our own research. Through a process
of transfer learning [Torrey and Shavlik (2010)] we can
adapt the model architecture, as well as the optimised
weights, generated during previous works. By using
transfer learning we may be able to make our model de-
velopment process more efficient by reducing the time
and computational resource needed to optimise a model
for our purposes.
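A minimal sketch of this workflow is given below, assuming a small 3D CNN, an 8x8x8 input grid and a hypothetical checkpoint path; none of these correspond to the actual pre-trained surrogates used in this work.

```python
# Sketch of the transfer-learning step: start from a previously trained
# network and fine-tune it on new data rather than training from scratch.
import torch
import torch.nn as nn

# Illustrative architecture; the real surrogate from prior work differs.
model = nn.Sequential(
    nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 8 * 8 * 8, 1),        # regression head for an 8x8x8 grid
)
# In the real workflow, weights from a previously trained model are loaded:
# model.load_state_dict(torch.load("pretrained_surrogate.pt"))  # hypothetical path

# Freeze the early feature-extracting layers; fine-tune only the head.
for param in model[0].parameters():
    param.requires_grad = False

optimiser = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)

x = torch.zeros(4, 1, 8, 8, 8)           # dummy batch of encoded inputs
print(model(x).shape)                     # torch.Size([4, 1])
```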
A research question to be investigated and answered
by this paper is whether data augmentation can be ap-
plied to problems such as machine learning surrogates.
To this end, a framework will be developed to apply im-
age manipulation techniques to the dataset used in the
aforementioned graphite core model. In addition, we
will investigate whether the use of custom loss functions
and transfer learning can improve model performance.
2. Background
2.1. Advanced Gas-cooled Reactors and the Parmec Model
The computational model Parmec [Koziara (2019)] is
the underlying model that was the subject of an MLS model in [Jones et al. (2022)]; the same problem and base dataset are considered in this work. Parmec
is employed to simulate the seismic response of the
graphite core within the UK’s advanced gas-cooled re-
actor (AGR). This model consists of a simplified 3-
dimensional representation of the AGR graphite core,
including the positional arrangement of the graphite
bricks and other components. Parmec can be used to
simulate a range of different seismic scenarios with the
resulting component translation, rotation etc. being cal-
culated by the model.
In addition to the seismic configuration, another in-
put to the Parmec model is the configuration of cracked fuel
bricks within the graphite core. Due to years of expo-
sure to high temperatures and irradiation, some of the
fuel bricks within the reactor are cracking, causing them
to break into two pieces. The presence and configura-
tion of these cracks has an impact on the reaction of the
core to seismic loading. It is possible that up to 40% of
the fuel bricks will eventually crack, although it is difficult to determine or predict where and when cracks will
occur.
The relationship between crack configuration and
seismic response of core components is complex, hence
the Parmec model consists of many thousands of pa-
rameters and equations. In addition, there are over 10^2500 possible permutations of crack configuration, as-
suming 40% cracking. With each configuration requir-
ing around 2 hours to compute the seismic response
via Parmec, it is clearly impractical to generate data
for even a small percentage of them. Instead, industry
practice is to generate random configurations of cracks,
passing each through Parmec in order to build up a
stochastic distribution of the seismic response.
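A sketch of this sampling loop is given below; `run_parmec` is a hypothetical wrapper (the real call takes around two hours per configuration), and the dummy scalar it returns is purely illustrative.

```python
# Sketch of the industry practice described above: sample random crack
# configurations and accumulate a distribution of seismic responses.
import numpy as np

rng = np.random.default_rng(42)
N_BRICKS, CRACK_FRACTION, N_SAMPLES = 1988, 0.40, 100

def random_crack_configuration():
    """Randomly crack 40% of the 1988 fuel bricks (1 = cracked)."""
    config = np.zeros(N_BRICKS, dtype=int)
    cracked = rng.choice(N_BRICKS, size=int(CRACK_FRACTION * N_BRICKS),
                         replace=False)
    config[cracked] = 1
    return config

def run_parmec(config):
    # Placeholder for the real Parmec call (~2 hours per configuration);
    # returns a dummy scalar "response" that varies with the configuration.
    return float((config * np.arange(config.size)).sum() % 1000)

responses = [run_parmec(random_crack_configuration()) for _ in range(N_SAMPLES)]
print(np.mean(responses), np.std(responses))   # empirical response distribution
```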
2.2. Previous Machine Learning Surrogate Model of
Parmec
In previous machine learning assessments of AGR
graphite core seismic analysis [Jones et al. (2022)], each
crack configuration is considered an individual data in-
stance, with the encoding of cracked bricks being the
input features and the response of core components to
the earthquake being the output labels. The Parmec
software generates a time-history of the earthquake re-
sponse for all of the thousands of components within
the core. For the sake of simplicity and focus, the MLS
model was trained to predict the earthquake response
for a single interstitial brick at a single time frame (see Figure 2).
To summarise the features of the MLS, each instance
has an input size of 1988 with this being the number
of fuel bricks within the AGR graphite core. This input was
arranged into a 3D tensor which retains physical posi-
tional relationships within the actual AGR graphite core
(Figure 3). Each element is either a 1, -1 or 0 repre-
senting a cracked brick, uncracked brick or ‘empty’ po-
sition. The 3-dimensional encoding of the input fea-
tures also allows the dataset to be used with a convolu-
tional neural network [Albawi et al. (2017)] which was
found to be the best performing type of machine learn-
ing model.
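The sketch below illustrates this encoding and a single 3D convolution over it; the grid dimensions, occupancy and crack probabilities are illustrative assumptions rather than the actual AGR core layout.

```python
# Sketch of the input encoding: a 3D grid where 1 = cracked brick,
# -1 = uncracked brick, 0 = empty position.
import numpy as np
import torch
import torch.nn as nn

rng = np.random.default_rng(0)
grid = np.zeros((7, 17, 17), dtype=np.float32)       # layers x rows x cols (illustrative)
occupied = rng.random(grid.shape) < 0.7              # which positions hold a brick
grid[occupied] = np.where(rng.random(occupied.sum()) < 0.4, 1.0, -1.0)

# Shape for a 3D CNN: (batch, channels, depth, height, width).
x = torch.from_numpy(grid)[None, None]
conv = nn.Conv3d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
print(conv(x).shape)                                 # torch.Size([1, 8, 7, 17, 17])
```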
For the aforementioned study, a dataset of approx-
imately 8300 instances was created using the random
crack pattern generator and the Parmec software. Out
of these instances, 6300 (75%) were used for training
with the remaining 2000 samples retained for testing.
Figure 2: A Top Down Diagram of the AGR Graphite Core Parmec
Model. Bricks are arranged into channels of two different types: fuel
(blue) and interstitial (grey). Both types of channel are the same
height, with fuel bricks being stacked seven high and the shorter in-
terstitial bricks being stacked 12 high. The cracking status of all
1988 fuel bricks is included in the input features (whether the brick
is cracked or not) of the surrogate machine learning model. For the
output labels, only the earthquake response of the uppermost inter-
stitial brick (orange) is predicted by the surrogate machine learning
model.
Figure 3: Visualisation of a 3-dimensional Feature Encoding. This
example represents a single instance with each data-point representing
a fuel brick. Yellow and black data points represent uncracked and
cracked bricks, respectively.
2.3. Data Augmentation
Data augmentation is frequently employed in classi-
fication problems within the field of machine learning
[Shorten and Khoshgoftaar (2019)], where the model
predicts a discrete category for each dataset instance.
A classic example of classification is in computer vision, where a 2D or 3D tensor representing an image is mapped to a discrete class label.