
This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
Data-driven Approaches to Surrogate Machine Learning Model Development
H. Rhys Jones, Tingting Mu, Andrei C. Popescu, Yusuf Sulehman
University of Manchester, Oxford Rd, Manchester M13 9PL, United Kingdom
Email address: huw.jones@manchester.ac.uk
Abstract
We demonstrate the adaptation of three established methods to the field of surrogate machine learning model development: data augmentation, custom loss functions and transfer learning. Each of these methods has seen widespread use in the field of machine learning; however, here we apply them specifically to surrogate machine learning model development. The machine learning model that forms the basis of this work was intended to surrogate a traditional engineering model used in the UK nuclear industry. The performance of this model has previously been hampered by limited training data. Here, we demonstrate that through a combination of additional techniques, model performance can be significantly improved. We show that each of the aforementioned techniques has utility in its own right and in combination with the others. However, we see them best applied as part of a transfer learning operation. Five pre-trained surrogate models produced prior to this research were further trained with an augmented dataset and with our custom loss function. Through the combination of all three techniques, we see an improvement of at least 38% in performance across the five models.
Keywords:
Nuclear, Machine Learning, Graphite, Advanced Gas-cooled Reactor, Data Science, Data Analysis, Surrogate Model, Convolutional Neural Network, Regression, Supervised Learning, Data Augmentation, Transfer Learning, Loss Function
1. Introduction
A machine learning surrogate (MLS) is a model
which aims to explain natural or mathematical phenomena that can already be explained using an existing model. Using data from the original model, ma-
chine learning techniques are used to produce an op-
timised MLS model. The advantages of an MLS in-
clude increased computational efficiency when generating model outputs, with the trade-off being reduced
accuracy. Once developed and trained, machine learn-
ing models (including an MLS) can produce new data
instances almost instantly using a standard computer,
whereas generating the same information using the orig-
inal model and equivalent hardware may require hours
or days of computational effort. The reduction in ac-
curacy between an MLS and an original model must
be quantified on a case-by-case basis and assessed on
whether it is acceptable for practical use.
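To make the idea concrete, the following minimal sketch fits a cheap regressor to input/output pairs generated by an expensive model. Here `expensive_model`, the sampling ranges and the network size are illustrative assumptions, not the engineering model or surrogate used in this work.

```python
# A minimal sketch of the surrogate idea: fit a cheap regressor to
# input/output pairs generated once by a (stand-in) expensive model.
import numpy as np
from sklearn.neural_network import MLPRegressor

def expensive_model(x):
    # Stand-in for the original engineering model (in practice a physics
    # code that may take hours per evaluation); here a nonlinear toy function.
    return np.sin(3 * x[:, 0]) + 0.5 * x[:, 1] ** 2

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(2000, 2))   # sampled model inputs
y = expensive_model(X)                   # "expensive" labels, computed once

surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=1000, random_state=0)
surrogate.fit(X, y)

# Once trained, new predictions are near-instant compared to re-running
# the original model, at the cost of some approximation error.
print(surrogate.predict(X[:5]), y[:5])
```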
Previous research works have dealt with the produc-
tion of MLS in areas such as material properties pre-
diction [Nyshadham et al. (2019)] and [Asteris et al.
(2021)], with a recent work [Jones et al. (2022)] focus-
ing on seismic analysis for nuclear graphite cores. It
is the MLS model from this latest research work that
will be focused on in this paper. In the aforementioned works, a strong focus on neural networks [Gurney (1997)] is seen, including convolutional neural networks (CNNs).
Despite the motivation for the production of MLS
models being to reduce the need for expensive produc-
tion of data, a large amount of this data is required to
train such a model. A machine learning model trained
on an insufficient number of data instances may re-
sult in overfitting [Hawkins (2004)]. Some techniques
were employed in the aforementioned paper, including
randomised layer dropout [Srivastava et al. (2014)], to
counteract the effects of overfitting.
A common technique used to improve model perfor-
mance given a limited dataset is to manipulate existing
data instances in a process known as data augmentation
[Perez and Wang (2017)]. This approach is commonly
employed in machine learning applications involving
Figure 1: The Trade-off Between a Machine Learning Surrogate Model and the Original Model. Once trained on data from the original model, the production of new data is likely to be significantly more efficient in terms of computation and time. However, as the machine learning model is produced using data from the original model, there will be some inevitable reduction in accuracy.
image recognition and analysis [Hansen (2015)], with
techniques such as mirroring and rotation used to in-
crease the number of data instances in a dataset.
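The sketch below illustrates this classic image-style augmentation: mirroring and 90-degree rotations of a 2D array. Whether such transforms are label-preserving for a given surrogate problem is an assumption that must be established case by case; the array here is a toy example.

```python
# Sketch of image-style augmentation: mirroring and 90-degree rotations
# multiply the number of training instances without generating new labels
# (valid only when the task is invariant under these transforms).
import numpy as np

def augment(image):
    """Yield mirrored and rotated copies of a 2D array."""
    for flipped in (image, np.fliplr(image)):
        for k in range(4):                 # 0, 90, 180, 270 degrees
            yield np.rot90(flipped, k)

img = np.arange(9).reshape(3, 3)
augmented = list(augment(img))             # up to 8 variants per image
print(len(augmented))
```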
Another commonly encountered problem during ma-
chine learning model development is dataset bias. In
this situation, the dataset used to train the model is
weighted towards a particular region of the input and/or
output space. Alternatively, the dataset may be sparse
in a particular region of the data space, i.e., there may be only a few data examples for a part of the input or output continuum. Several methods can be employed
to counteract the problem of dataset bias, including em-
phasising underrepresented data samples to a greater de-
gree. We may instead use a loss function during model
training which is designed to correct for dataset bias.
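One possible form of such a loss is sketched below: a weighted mean-squared error whose per-instance weights are inversely proportional to the population of each label bin, so sparse regions contribute more to the gradient. The binning scheme is an illustrative assumption, not the custom loss function developed in this paper.

```python
# Sketch of a bias-correcting loss: a weighted mean-squared error that
# up-weights instances from sparsely populated regions of the label space.
import numpy as np

def weighted_mse(y_true, y_pred, weights):
    return np.mean(weights * (y_true - y_pred) ** 2)

def density_weights(y, n_bins=20):
    """Weight each instance inversely to how populated its label bin is."""
    counts, edges = np.histogram(y, bins=n_bins)
    bin_idx = np.clip(np.digitize(y, edges[:-1]) - 1, 0, n_bins - 1)
    w = 1.0 / np.maximum(counts[bin_idx], 1)
    return w / w.mean()                    # normalise to mean weight 1

y = np.random.default_rng(0).normal(size=1000)
w = density_weights(y)
print(weighted_mse(y, np.zeros_like(y), w))
```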
A third problem encountered when training neural
networks is the computational cost associated with their
development and optimisation. This is particularly
problematic when the problem space is complex, as it is in this research. Instead of starting from scratch,
we may use models produced from previous research
works as a starting point during the development of neu-
ral networks for our own research. Through a process
of transfer learning [Torrey and Shavlik (2010)] we can
adapt the model architecture, as well as the optimised
weights, generated during previous works. By using
transfer learning we may be able to make our model de-
velopment process more efficient by reducing the time
and computational resource needed to optimise a model
for our purposes.
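A minimal sketch of this workflow is given below, assuming a small 3D CNN, an 8x8x8 input grid and a hypothetical checkpoint path; none of these correspond to the actual pre-trained surrogates used in this work.

```python
# Sketch of the transfer-learning step: start from a previously trained
# network and fine-tune it on new data rather than training from scratch.
import torch
import torch.nn as nn

# Illustrative architecture; the real surrogate from prior work differs.
model = nn.Sequential(
    nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 8 * 8 * 8, 1),        # regression head for an 8x8x8 grid
)
# In the real workflow, weights from a previously trained model are loaded:
# model.load_state_dict(torch.load("pretrained_surrogate.pt"))  # hypothetical path

# Freeze the early feature-extracting layers; fine-tune only the head.
for param in model[0].parameters():
    param.requires_grad = False

optimiser = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)

x = torch.zeros(4, 1, 8, 8, 8)           # dummy batch of encoded inputs
print(model(x).shape)                     # torch.Size([4, 1])
```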
A research question to be investigated and answered
by this paper is whether data augmentation can be ap-
plied to problems such as machine learning surrogates.
To this end, a framework will be developed to apply im-
age manipulation techniques to the dataset used in the
aforementioned graphite core model. In addition, we
will investigate whether the use of custom loss functions
and transfer learning can improve model performance.
2. Background
2.1. Advanced Gas-cooled Reactors and the Parmec Model
The computational model Parmec [Koziara (2019)] is
the underlying model that was the subject of an MLS model in [Jones et al. (2022)]; the same problem and base dataset are considered in this work. Parmec
is employed to simulate the seismic response of the
graphite core within the UK’s advanced gas-cooled re-
actor (AGR). This model consists of a simplified 3-
dimensional representation of the AGR graphite core,
including the positional arrangement of the graphite
bricks and other components. Parmec can be used to
simulate a range of different seismic scenarios with the
resulting component translation, rotation etc. being cal-
culated by the model.
In addition to the seismic configuration, another in-
put to the Parmec model is the configuration of cracked fuel
bricks within the graphite core. Due to years of expo-
sure to high temperatures and irradiation, some of the
fuel bricks within the reactor are cracking, causing them
to break into two pieces. The presence and configura-
tion of these cracks has an impact on the reaction of the
core to seismic loading. It is possible that up to 40% of
the fuel bricks will eventually crack, although it is difficult to determine or predict where and when cracks will
occur.
The relationship between crack configuration and
seismic response of core components is complex, hence
the Parmec model consists of many thousands of pa-
rameters and equations. In addition, there are over 10^2500 possible permutations of crack configuration, as-
suming 40% cracking. With each configuration requir-
ing around 2 hours to compute the seismic response
via Parmec, it is clearly impractical to generate data
for even a small percentage of them. Instead, industry
practice is to generate random configurations of cracks,
passing each through Parmec in order to build up a
stochastic distribution of the seismic response.
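A sketch of this sampling loop is given below; `run_parmec` is a hypothetical wrapper (the real call takes around two hours per configuration), and the dummy scalar it returns is purely illustrative.

```python
# Sketch of the industry practice described above: sample random crack
# configurations and accumulate a distribution of seismic responses.
import numpy as np

rng = np.random.default_rng(42)
N_BRICKS, CRACK_FRACTION, N_SAMPLES = 1988, 0.40, 100

def random_crack_configuration():
    """Randomly crack 40% of the 1988 fuel bricks (1 = cracked)."""
    config = np.zeros(N_BRICKS, dtype=int)
    cracked = rng.choice(N_BRICKS, size=int(CRACK_FRACTION * N_BRICKS),
                         replace=False)
    config[cracked] = 1
    return config

def run_parmec(config):
    # Placeholder for the real Parmec call (~2 hours per configuration);
    # returns a dummy scalar "response" that varies with the configuration.
    return float((config * np.arange(config.size)).sum() % 1000)

responses = [run_parmec(random_crack_configuration()) for _ in range(N_SAMPLES)]
print(np.mean(responses), np.std(responses))   # empirical response distribution
```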
2.2. Previous Machine Learning Surrogate Model of
Parmec
In previous machine learning assessments of AGR
graphite core seismic analysis [Jones et al. (2022)], each
crack configuration is considered an individual data in-
stance, with the encoding of cracked bricks being the
input features and the response of core components to
the earthquake being the output labels. The Parmec
software generates a time-history of the earthquake re-
sponse for all of the thousands of components within
the core. For the sake of simplicity and focus, the MLS
model was trained to predict the earthquake response
for a single interstitial brick at a single time frame (see Figure 2).
To summarise the features of the MLS, each instance
has an input size of 1988 with this being the number
of fuel bricks within the AGR graphite core. This input was
arranged into a 3D tensor which retains physical posi-
tional relationships within the actual AGR graphite core
(Figure 3). Each element is either a 1, -1 or 0 repre-
senting a cracked brick, uncracked brick or ‘empty’ po-
sition. The 3-dimensional encoding of the input fea-
tures also allows the dataset to be used with a convolu-
tional neural network [Albawi et al. (2017)] which was
found to be the best performing type of machine learn-
ing model.
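The sketch below illustrates this encoding and a single 3D convolution over it; the grid dimensions, occupancy and crack probabilities are illustrative assumptions rather than the actual AGR core layout.

```python
# Sketch of the input encoding: a 3D grid where 1 = cracked brick,
# -1 = uncracked brick, 0 = empty position.
import numpy as np
import torch
import torch.nn as nn

rng = np.random.default_rng(0)
grid = np.zeros((7, 17, 17), dtype=np.float32)       # layers x rows x cols (illustrative)
occupied = rng.random(grid.shape) < 0.7              # which positions hold a brick
grid[occupied] = np.where(rng.random(occupied.sum()) < 0.4, 1.0, -1.0)

# Shape for a 3D CNN: (batch, channels, depth, height, width).
x = torch.from_numpy(grid)[None, None]
conv = nn.Conv3d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
print(conv(x).shape)                                 # torch.Size([1, 8, 7, 17, 17])
```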
For the aforementioned study, a dataset of approx-
imately 8300 instances was created using the random
crack pattern generator and the Parmec software. Out
of these instances, 6300 (75%) were used for training
with the remaining 2000 samples retained for testing.
Figure 2: A Top Down Diagram of the AGR Graphite Core Parmec
Model. Bricks are arranged into channels of two different types: fuel
(blue) and interstitial (grey). Both types of channel are the same
height, with fuel bricks being stacked seven high and the shorter in-
terstitial bricks being stacked 12 high. The cracking status of all
1988 fuel bricks is included in the input features (whether the brick
is cracked or not) of the surrogate machine learning model. For the
output labels, only the earthquake response of the uppermost inter-
stitial brick (orange) is predicted by the surrogate machine learning
model.
Figure 3: Visualisation of a 3-dimensional Feature Encoding. This
example represents a single instance with each data-point representing
a fuel brick. Yellow and black data points represent uncracked and
cracked bricks, respectively.
2.3. Data Augmentation
Data augmentation is frequently employed in classi-
fication problems within the field of machine learning
[Shorten and Khoshgoftaar (2019)], where the model
predicts a discrete category for each dataset instance.
A classic example of classification is in computer vision, where a 2D or 3D tensor representing an image is mapped to a discrete class label.