To answer RQ 1, we select datasets that (1) correspond to impactful directions in
climate change research, and (2) have existing strong human-designed baselines. For example,
we choose datasets which were recently featured in large competitions, with top solutions now
open-source. We describe the details of each dataset in Section 3.
For each of the datasets we choose, we first find open-source high-performing human-designed
models. Then we run Optuna [
1
] or SMAC3 [
30
], two of the most widely-used AutoML libraries
today, using top human-designed models as the base. We compare the resulting searched models to
top human-designed models.
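As an illustration of this HPO step, a baseline model's training loop can be wrapped in an Optuna objective. The snippet below is a minimal sketch rather than our exact configuration: the train_baseline helper is a hypothetical stand-in for a real training-and-validation run, and the search ranges are illustrative.

import optuna

def train_baseline(learning_rate, weight_decay, dropout, batch_size):
    """Hypothetical helper: train the human-designed baseline with these
    hyperparameters and return its validation error (lower is better)."""
    # A dummy score keeps the sketch self-contained; in practice this would
    # run the baseline's full training and evaluation pipeline.
    return abs(learning_rate - 1e-3) + weight_decay + 0.1 * dropout + 1e-5 * batch_size

def objective(trial):
    # Illustrative search ranges centered on the baseline's published settings.
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
    wd = trial.suggest_float("weight_decay", 1e-6, 1e-2, log=True)
    dropout = trial.suggest_float("dropout", 0.0, 0.5)
    batch_size = trial.suggest_categorical("batch_size", [64, 128, 256])
    return train_baseline(lr, wd, dropout, batch_size)

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print(study.best_params, study.best_value)

Each trial trains one configuration; the study's best parameters are then compared against the untuned human-designed baseline.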
To answer RQ 2, we check for general weaknesses in AutoML techniques applied to
CCAI tasks that could be overcome with future work. For example, we examine whether the AutoML
techniques are limited by being implicitly tailored to computer vision (CV) tasks.
3 Experiments and Discussion
In this section, we briefly describe each of three CCAI tasks, its dataset, and our AutoML
experiments. Then, in Section 3.2, we use these experiments to answer RQ 1 and RQ 2.
3.1 Experimental Setup
Atmospheric Radiative Transfer.
Numerical weather prediction models, as well as global and
regional climate models, give crucial information to policymakers and the public about the impact of
changes in the Earth’s climate. A key computational bottleneck in these models is atmospheric radiative
transfer (ART), which computes the heating rate of each layer of the atmosphere. While ART has
historically been calculated with computationally intensive physics simulations, researchers have
recently used neural networks to substantially reduce this bottleneck, enabling ART
to be run at finer resolutions and yielding better overall predictions.
We use the ClimART dataset [6] from the NeurIPS Datasets and Benchmarks Track 2021. It
consists of global snapshots of the atmosphere across a discretization of latitude, longitude,
atmospheric height, and time from 1979 to 2014. Each datapoint contains measurements of temperature,
water vapor, and aerosols. Prior work has tested MLPs, CNNs, GNNs, and GCNs as baselines [6].
We run HPO on the CNN baseline from Cachay et al. [6] using the Optuna library [1]. We choose the
CNN model because it has the lowest RMSE and second-lowest latency of the five baselines
from Cachay et al. [6]. We tune the learning rate, weight decay, dropout, and batch size. We also run NAS
using SMAC3 [30], setting a categorical hyperparameter to choose among MLP, CNN, GNN, GCN,
and L-GCN [5] while also tuning the learning rate and batch size. See Appendix A.1 for further details on
the dataset and experiments.
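To make the NAS setup concrete, the sketch below combines a categorical architecture choice with learning-rate and batch-size tuning in SMAC3. It assumes SMAC3 >= 2.0 together with ConfigSpace; the train_and_evaluate target function is a hypothetical stand-in for training the selected architecture on ClimART, and the search ranges are illustrative rather than the ones we used.

from ConfigSpace import ConfigurationSpace
from ConfigSpace.hyperparameters import (
    CategoricalHyperparameter,
    UniformFloatHyperparameter,
)
from smac import HyperparameterOptimizationFacade, Scenario

# Joint search space: a categorical choice over the five baseline families,
# plus the learning rate and batch size (ranges are illustrative).
cs = ConfigurationSpace()
cs.add_hyperparameters([
    CategoricalHyperparameter("arch", ["MLP", "CNN", "GNN", "GCN", "L-GCN"]),
    UniformFloatHyperparameter("learning_rate", 1e-5, 1e-2, log=True),
    CategoricalHyperparameter("batch_size", [64, 128, 256]),
])

def train_and_evaluate(config, seed: int = 0) -> float:
    """Hypothetical target function: build the chosen architecture, train it
    on ClimART, and return the validation RMSE. A dummy score is returned
    here so the sketch stays self-contained."""
    arch_penalty = {"MLP": 0.3, "CNN": 0.0, "GNN": 0.1, "GCN": 0.2, "L-GCN": 0.05}
    return arch_penalty[config["arch"]] + abs(config["learning_rate"] - 1e-3)

scenario = Scenario(cs, n_trials=50, deterministic=True)
smac = HyperparameterOptimizationFacade(scenario, train_and_evaluate)
incumbent = smac.optimize()
print(incumbent)

Treating the architecture itself as a categorical hyperparameter lets the same optimization loop handle both model selection and hyperparameter tuning.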
Wind Power Forecasting.
Wind power is one of the leading renewable energy sources, since it is
cheap, efficient, and harmless to the environment [2, 19, 40]. Its main downside
is its unreliability: changes in wind speed and direction make the energy gained from wind power
inconsistent. To balance energy generation and consumption on the power grid,
other energy sources must be added on short notice when wind power is down, which is not always
possible (for example, coal plants take at least 6 hours to start up) [20]. Therefore, forecasting wind
power is an important problem that must be solved to facilitate greater adoption of wind power.
We use the SDWPF (Spatial Dynamic Wind Power Forecasting) dataset, which was recently
featured in a KDD Cup 2022 competition that included 2490 participants [49]. This is by far the