Don’t Waste Data: Transfer Learning to Leverage All
Data for Machine-Learnt Climate Model Emulation
Raghul Parthipan1,2
rp542@cam.ac.uk
Damon J. Wischik1
damon.wischik@cl.cam.ac.uk
1Department of Computer Science and Technology, University of Cambridge, UK
2British Antarctic Survey, Cambridge, UK
Abstract
How can we learn from all available data when training machine-learnt climate
models, without incurring any extra cost at simulation time? Typically, the
training data comprises coarse-grained high-resolution data. But only keeping
this coarse-grained data means the rest of the high-resolution data is thrown
out. We use a transfer learning approach, which can be applied to a range
of machine learning models, to leverage all the high-resolution data. We use
three chaotic systems to show it stabilises training, gives improved generalisation performance and results in better forecasting skill. Our code is at
https://github.com/raghul-parthipan/dont_waste_data.
1 Introduction
Accurate weather and climate models are key to climate science and decision-making. Often we have
a high-resolution physics-based model which we trust, and want to use that to create a lower-cost
(lower-resolution) emulator of similar accuracy. There has been much work using machine learning
(ML) to learn such models from data [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14], due to the difficulty in manually specifying them.
The naive approach is to use coarse-grained high-resolution model data as training data. The high-
resolution data is averaged onto the lower-resolution grid and treated as source data. The goal is
to match the evolution of the coarse-grained high-resolution model using the lower-resolution one.
Such procedures have been used successfully [2, 3, 8, 12, 15, 16]. This has several benefits over using observations, including excellent spatio-temporal coverage. But it has a key downside: the averaging procedure means much high-resolution data is thrown away.
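As an illustration of this coarse-graining, the following sketch block-averages a high-resolution field onto a coarser grid; the array shapes and the block size `m` are assumed for the example and do not come from the paper.

```python
import numpy as np

def coarse_grain(y_hi, m):
    """Average consecutive blocks of m high-resolution points onto the coarse grid."""
    n_coarse = y_hi.shape[-1] // m
    trimmed = y_hi[..., : n_coarse * m]
    return trimmed.reshape(*trimmed.shape[:-1], n_coarse, m).mean(axis=-1)

y_hi = np.random.randn(1000, 256)   # (time, high-resolution grid points) -- illustrative shapes
x_lo = coarse_grain(y_hi, m=32)     # (1000, 8): the usual (naive) training data
# Training only on x_lo discards the within-block structure that remains in y_hi.
```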
Our novelty is showing that climate model emulation can be framed as a transfer learning task.
We can do better by using all of the high-resolution data as an auxiliary task to help learn the low-
resolution emulator. And we can do this without any further cost at simulation time. As far as we
know, this has not yet been reported in the climate literature. This results in improved generalization
performance and forecasting ability, and we demonstrate this on three chaotic dynamical systems.
Related Work.
Transfer learning (TL) has been successfully used for fine-tuning models, including sequence models, in various domains such as natural language processing (NLP) and image classification. There are various methods used, such as (1) fine-tuning on an auxiliary task and then the target task [17, 18, 19]; (2) multi-task learning, where fine-tuning is done on the target task and one or more auxiliary tasks simultaneously [20, 21, 22, 23, 24, 25, 26]; and mixtures of the two. Our
approach is most similar to the first one. However, our models are not pre-trained as is standard in
NLP. Despite this, we show our approach remains successful.
Climate Impact.
A major source of inaccuracies in weather and climate models arises from
‘unresolved’ processes (such as those relating to convection and clouds) [27, 28, 29, 30, 31, 32]. These occur at scales smaller than the resolution of the climate model but have key effects on the overall climate. For example, most of the variability in how much global surface temperatures increase after CO2 concentrations double is due to the representation of clouds [27, 29, 33]. There will always be processes too costly to be explicitly resolved by our current operational models.
The standard approach to deal with these unresolved processes is to model their effects as a function
of the resolved ones. This is known as ‘parameterization’ and there is much ML work on this
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]. We propose that by using all available high-resolution data, better ML parameterization schemes and therefore better climate models can be created.
2 Methods
Our approach is a two-step process: first, we train our model on the high-resolution data, and second,
we fine-tune it on the low-resolution (target) data.
We denote the low-resolution data at time $t$ as $X_t \in \mathbb{R}^d$. The goal is to create a sequence model for the evolution of $X_t$ through time, whilst only tracking $X_t$. We denote the high-resolution data at time $t$ as $Y_t \in \mathbb{R}^{dm}$. In parameterization, $X_t$ is often a temporal and/or spatial averaging of $Y_t$. We wish to use $Y_t$ to learn a better model of $X_t$.
A range of ML models for sequences may be used, but we suggest they should contain both shared
and task-specific layers.
We first model $Y_t$, training in the standard teacher-forcing way for ML sequence models. We use the framework of probability, and so train by maximising the log-likelihood of $Y_t$, $\log \Pr(y_1, y_2, \ldots, y_n)$. Informally, the likelihood measures how likely $Y_t$ is to be generated by our sequence model. Next, the weights of the shared layers are frozen and the weights of the target-specific layers are trained to model the low-resolution training data, $X_t$. Again, under the probability framework, this means maximising the log-likelihood of $X_t$, $\log \Pr(x_1, x_2, \ldots, x_n)$.
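A minimal sketch of this two-step procedure is given below (Python/PyTorch). It is not the released implementation: the model interface (`shared` module, `log_prob_y`, `log_prob_x`), the optimiser, the epoch counts and learning rates, and the assumption that paired coarse-grained/high-resolution sequences are available during pre-training are all illustrative choices; a concrete model exposing this interface is sketched in Section 2.1.

```python
import torch

def pretrain_on_high_res(model, pair_loader, epochs=10, lr=1e-3):
    # Stage 1: fit the whole model to the high-resolution sequences.
    # pair_loader yields (x, y) batches: x are the coarse-grained inputs that drive
    # the shared hidden state, y are the corresponding high-resolution sequences.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in pair_loader:
            loss = -model.log_prob_y(y, x).mean()     # maximise log Pr(y_1, ..., y_n)
            opt.zero_grad()
            loss.backward()
            opt.step()

def finetune_on_low_res(model, x_loader, epochs=10, lr=1e-3):
    # Stage 2: freeze the shared layers, then train only the target-specific parameters.
    for p in model.shared.parameters():
        p.requires_grad_(False)
    trainable = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.Adam(trainable, lr=lr)
    for _ in range(epochs):
        for x in x_loader:                            # x: (batch, time, d) low-resolution sequences
            loss = -model.log_prob_x(x).mean()        # maximise log Pr(x_1, ..., x_n)
            opt.zero_grad()
            loss.backward()
            opt.step()
```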
2.1 RNN Model
We use the recurrent neural network (RNN) to demonstrate our approach (though it is not limited to the RNN). RNNs are well-suited to parameterization tasks [4, 5, 11, 14, 34] as they only track a summary representation of the system history, reducing simulation cost. This is unlike the Transformer [35], which requires a slice of the actual history of $X_t$.
For our RNN, the hidden state is shared and its evolution is described by
$$h_{t+1} = f_\theta(h_t, X_t)$$
where $h_t \in \mathbb{R}^H$ and $f_\theta$ is a GRU cell [36]. We model the low-resolution data as
$$X_{t+1} = X_t + g_\theta(h_{t+1}) + \sigma z_t \tag{1}$$
and the high-resolution as
$$Y_{t+1} = Y_t + j_\theta(h_{t+1}) + \rho w_t \tag{2}$$
where the functions $g_\theta$ and $j_\theta$ are represented by task-specific dense layers, $z_t \sim \mathcal{N}(0, I)$ and $w_t \sim \mathcal{N}(0, I)$. The learnable parameters are the neural network weights $\theta$ and the noise terms $\sigma \in \mathbb{R}^1$ and $\rho \in \mathbb{R}^1$. Further details are in Appendix A.
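Below is a minimal PyTorch sketch of such an RNN, for illustration only: the hidden size, the zero initial hidden state, the Gaussian teacher-forced likelihood, and the choice to drive the shared GRU with the low-resolution inputs when modelling $Y_t$ are assumptions consistent with Eqs. (1) and (2) rather than details taken from the released code or Appendix A.

```python
import torch
import torch.nn as nn

class SharedGRUEmulator(nn.Module):
    def __init__(self, d, m, hidden=64):
        super().__init__()
        self.shared = nn.GRUCell(d, hidden)            # h_{t+1} = f_theta(h_t, X_t)
        self.g = nn.Linear(hidden, d)                  # task-specific head for Eq. (1)
        self.j = nn.Linear(hidden, d * m)              # task-specific head for Eq. (2)
        self.log_sigma = nn.Parameter(torch.zeros(())) # noise magnitude sigma, learned in log space
        self.log_rho = nn.Parameter(torch.zeros(()))   # noise magnitude rho

    def _log_prob(self, series, head, log_scale, cond):
        # Teacher-forced Gaussian log-likelihood of `series` (batch, time, dim),
        # with the shared hidden state driven by the low-resolution inputs `cond`.
        batch, T, _ = series.shape
        h = series.new_zeros(batch, self.shared.hidden_size)
        logp = series.new_zeros(batch)
        for t in range(T - 1):
            h = self.shared(cond[:, t], h)
            mean = series[:, t] + head(h)              # residual mean as in Eqs. (1)/(2)
            dist = torch.distributions.Normal(mean, log_scale.exp())
            logp = logp + dist.log_prob(series[:, t + 1]).sum(-1)
        return logp

    def log_prob_x(self, x):                           # x: (batch, time, d)
        return self._log_prob(x, self.g, self.log_sigma, cond=x)

    def log_prob_y(self, y, x):                        # y: (batch, time, d*m)
        return self._log_prob(y, self.j, self.log_rho, cond=x)
```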
2.2 Evaluation
We use hold-out log-likelihood to assess generalization to unseen data, a standard probabilistic approach in ML. The models were trained with 15 different random seed initializations to ensure that differences in the results were due to our approach rather than a quirk of a particular random seed; these runs are used to generate 95% confidence intervals. Likelihood is not easily interpretable, nor is it the end-goal of operational climate models. Ultimately we want to use weather and climate models to make forecasts, and it is common to measure forecast skill with error and spread [6, 37], so this is also done for evaluation.
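The sketch below illustrates these two diagnostics; the normal-approximation confidence interval and the RMSE/spread definitions are common choices, assumed here for illustration rather than taken from the paper.

```python
import numpy as np

def confidence_interval_95(values):
    """Mean and half-width of a normal-approximation 95% CI across random seeds."""
    values = np.asarray(values, dtype=float)
    half_width = 1.96 * values.std(ddof=1) / np.sqrt(len(values))
    return values.mean(), half_width

def error_and_spread(forecasts, truth):
    """forecasts: (ensemble, time, dim) array of model trajectories; truth: (time, dim)."""
    ens_mean = forecasts.mean(axis=0)
    error = np.sqrt(((ens_mean - truth) ** 2).mean(axis=-1))  # RMSE of ensemble mean, per lead time
    spread = forecasts.std(axis=0, ddof=1).mean(axis=-1)      # ensemble spread, per lead time
    return error, spread

# e.g. hold-out log-likelihoods from 15 independently seeded training runs (placeholder values)
seed_lls = [-2.1, -2.0, -2.3, -1.9, -2.2, -2.0, -2.1, -2.4, -1.8, -2.0, -2.2, -2.1, -1.9, -2.3, -2.0]
mean_ll, ci = confidence_interval_95(seed_lls)
```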