Attention-Based Scattering Network
for Satellite Imagery
Jason Stock & Charles Anderson
Computer Science
Colorado State University
{stock,anderson}@colostate.edu
Abstract
Multi-channel satellite imagery, from stacked spectral bands or spatiotemporal data,
has meaningful representations for various atmospheric properties. Combining
these features in an effective manner to create a performant and trustworthy model
is of utmost importance to forecasters. Neural networks show promise, yet suffer
from unintuitive computations and opaque fusion of high-level features, and may be
limited by the quantity of available data. In this work, we leverage the scattering
transform to extract high-level features without additional trainable parameters and
introduce a separation scheme to bring attention to independent input channels.
Experiments show promising results on estimating tropical cyclone intensity and
predicting the occurrence of lightning from satellite imagery.
1 Introduction
Machine learning has received great attention in the atmospheric science community over the past
couple of years. Many satellite-based applications leverage convolutional neural networks (CNNs) for
tasks including, but not limited to, forecasting next-hour tornado occurrences [1], predicting intense
convection [2], and estimating tropical cyclone intensity [3]. These applications create input samples
with stacked channel-wise features consisting of satellite imagery at different wavelengths and train a
network to recognize certain patterns. While the approach is undoubtedly effective, it is not clear
how these input channels are combined in the earlier layers of the network. On the other hand, a
trained forecaster may intuitively look at individual channels, or the differences between multiple
channels, to identify relevant features and patterns that can easily be explained. Furthermore, these
networks are often limited by the quantity of available labeled data, which can lead to a model that
underperforms with too few parameters or overfits as complexity increases. This further motivates
the need for an architecture that is both interpretable-by-design and generalizable to small datasets.
One effective technique for modeling sparsely labeled data is the scattering transform, introduced
by Mallat [4]. This uses a cascade of wavelet transforms with a predetermined filterbank
and a non-linear modulus, akin to the structure of CNNs. Not only has the scattering transform
shown promise for applications with relatively few training samples [5–8], but it also has many
attractive properties for modeling satellite data. Specifically, the design builds in strong geometric
invariants (e.g., to translations, rotations, and scaling) and is stable to the action of diffeomorphisms,
a desirable trait given the continuous change in cloud structure over time. Studies have also shown
the scattering transform to promote sparse representations of data with a high degree of
discriminability, which can ultimately simplify downstream tasks [5, 9].
To build an architecture that more closely aligns with how forecasters visually interpret satellite
imagery, we incorporate attention into the early layers following the scattering transform.
Attention mechanisms work to identify salient regions in complex scenes, inspired by aspects of
the human visual system [10]. Recent computer vision studies have shown attention to increase
performance and interpretability, as well as improve the confidence of post hoc explainability
methods [11, 12]. Most similar to this work are the studies in [13, 8]. In [13], residual layers mix
the input channels before applying attention, and [8] applies a scattering attention module after each
step in a U-Net. However, our approach differs in that we introduce a separation scheme that applies
attention to individual input channels immediately following the scattering transform.

Tackling Climate Change with Machine Learning: workshop at NeurIPS 2022.
arXiv:2210.12185v1 [cs.CV] 21 Oct 2022

Figure 1: Network architecture illustrating the separation of attention modules on the scattering
transform. The left-most block represents the output of the scattering transform on the input. The
separate operator isolates a single channel, e.g., c0, and passes the normalized scattering coefficients,
S̃2, through channel attention and spatial attention before fusion. There are C total attention modules
in the network. Figure modified from [12].
2 Methodology
Figure 1 illustrates the primary components of our network, starting with the output of the scattering
transform and showing an attention module separated by input channel. The implementation and
design choices for each part are described in detail below.
Scattering Transform
Scattering representations yield invariant, stable (to noise and deformations), and informative signal
descriptors using a cascading wavelet decomposition with a non-linear modulus followed by spatial
averaging. Using the Kymatio package [14], we compute a 2D transform with a predetermined filter
bank of Morlet wavelets at J = 3 scales and L = 6 orientations. For each input channel, we apply a
second-order transform to obtain the scattering coefficients S2. These channels are processed
independently and combined later in the network. Additional details on the scattering transform can
be found in Appendix A.1.
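A minimal sketch of this step, assuming the Kymatio PyTorch frontend and hypothetical input
dimensions; with J = 3 and L = 6, each channel receives K = 1 + LJ + L²J(J−1)/2 = 127 coefficients:

```python
import torch
from kymatio.torch import Scattering2D

# Hypothetical batch of multi-channel satellite imagery: (B, C, H, W).
B, C, H, W = 4, 3, 64, 64
x = torch.randn(B, C, H, W)

# Second-order 2D scattering with Morlet wavelets at J=3 scales and
# L=6 orientations, as described above. No trainable parameters.
scattering = Scattering2D(J=3, shape=(H, W), L=6, max_order=2)

# Kymatio broadcasts over leading dimensions, so each of the C input
# channels gets its own K = 127 coefficients, downsampled by 2^J.
S2 = scattering(x)
print(S2.shape)  # torch.Size([4, 3, 127, 8, 8])
```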
Channel Separation
Local attention methods routinely process their input using all the channel information at once,
e.g., feature maps from RGB color channels. However, the scattering transform yields a
5-dimensional tensor, S2, where each channel, c, in the input has its own set of K scattering
coefficients. Rather than stacking the result and passing it all through the subsequent layers
together, we propose to first separate the input channels and process the coefficients individually.
This creates C new attention modules, each with independent weights, that are processed in parallel.
This separation scheme adds the benefit of localizing patterns in the input before joining high-level
features. Thus, the interpretation of attention over individual input channels is improved
significantly, especially when the channels have different meanings, e.g., temporal, visible,
infrared, or derived products.
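A minimal sketch of the separation scheme, under the shapes assumed above; nn.Identity stands in
for the attention module sketched in the next section so the example runs on its own:

```python
import torch
import torch.nn as nn

class ChannelSeparation(nn.Module):
    """Applies an independent attention module to each input channel."""

    def __init__(self, num_channels: int, make_attention=nn.Identity):
        super().__init__()
        # One module per input channel, each with its own weights.
        self.attention = nn.ModuleList(
            [make_attention() for _ in range(num_channels)]
        )

    def forward(self, S2: torch.Tensor) -> torch.Tensor:
        # S2: (B, C, K, H, W). Slice along C so each channel's K
        # scattering coefficients are processed in parallel.
        out = [attn(S2[:, c]) for c, attn in enumerate(self.attention)]
        return torch.stack(out, dim=1)  # back to (B, C, K, H, W)

sep = ChannelSeparation(num_channels=3)
print(sep(torch.randn(4, 3, 127, 8, 8)).shape)  # (4, 3, 127, 8, 8)
```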
Attention Modules
The attention modules encompass three primary components, namely: (i) channel attention, (ii)
spatial attention, and (iii) feature fusion. The channel attention features are used to inform the
spatial attention module before fusion via feature recalibration. Specifically, the network learns to
use the spatial information over the K channels to selectively emphasize the more informative
coefficients over the less useful ones. Not only does this offer a performance improvement to our
network, but it also adds an additional layer of interpretability, with channels corresponding to
particular coefficients. The spatial attention features highlight the salient features in the spatial
resolution of independent input channels. This differs from most computer vision problems with
RGB imagery, which have only one heat map for the full image. As such, our network provides a
more transparent interpretation of how the spatial information in each input channel is used to form
a prediction. Implementation details of each component can be found in Appendices A.2, A.3, and A.4.
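A hedged sketch of one attention module for a single input channel, assuming a CBAM-like design
in the spirit of [11, 12]; the reduction ratio, kernel size, and residual fusion here are assumptions,
with the paper's exact layers given in the appendices:

```python
import torch
import torch.nn as nn

class AttentionModule(nn.Module):
    """Channel attention over the K coefficients informs spatial
    attention before fusion, for one input channel (assumed design)."""

    def __init__(self, K: int = 127, reduction: int = 8):
        super().__init__()
        # Channel attention: pool spatial dims, re-weight K coefficients.
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(K, K // reduction), nn.ReLU(),
            nn.Linear(K // reduction, K), nn.Sigmoid(),
        )
        # Spatial attention: one h x w map from pooled coefficients.
        self.spatial = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3), nn.Sigmoid()
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, K, H, W) -- coefficients of a single input channel.
        a_c = self.channel_mlp(x).view(x.size(0), -1, 1, 1)
        u_c = x * a_c                                  # recalibrated
        pooled = torch.cat([u_c.mean(1, keepdim=True),
                            u_c.amax(1, keepdim=True)], dim=1)
        a_s = self.spatial(pooled)                     # (B, 1, H, W)
        return u_c * a_s + x                           # fusion (residual)

print(AttentionModule()(torch.randn(4, 127, 8, 8)).shape)  # (4, 127, 8, 8)
```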
Combining Features
Applying attention to the scattering coefficients of each input channel yields C output filters, F,
that are stacked into U_f ∈ R^(C×K×W×H). What follows could be any task-specific transformation,
e.g., additional convolutions, upsampling, residual connections, etc., but for our tasks we show how
to design a regression and classification head with relatively few trainable parameters. Specifically,
we reshape U_f to have C·K channels, which we reduce to 16 via a pointwise convolution. This
effectively combines the high-level features of each input channel. The feature maps are flattened
and input to a layer with 8 fully-connected units before a single linear output. The convolutional
and fully-connected layers are each followed by a ReLU activation for added non-linearity.
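A minimal sketch of this head under the running shape assumptions (the single output unit serves as
a wind-speed estimate or a lightning logit, depending on the task):

```python
import torch
import torch.nn as nn

C, K, H, W = 3, 127, 8, 8   # shapes carried over from earlier sketches

head = nn.Sequential(
    nn.Conv2d(C * K, 16, kernel_size=1),  # pointwise conv combines channels
    nn.ReLU(),
    nn.Flatten(),                         # (B, 16 * H * W)
    nn.Linear(16 * H * W, 8),
    nn.ReLU(),
    nn.Linear(8, 1),                      # single linear output
)

U_f = torch.randn(4, C, K, H, W)          # stacked attention outputs
y = head(U_f.reshape(4, C * K, H, W))     # reshape to C*K channels
print(y.shape)                            # torch.Size([4, 1])
```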
Table 1: Experimental results using n training samples and p parameters.

n         Scattering      ResNet18        MobileNetV3     Conv
p         (51.8K)         (11.2M)         (1.5M)          (268.2K)

TC Intensity, RMSE (R²)
1000      15.83 (0.59)    16.47 (0.56)    56.85 (-4.28)   17.51 (0.50)
5000      12.01 (0.76)    14.30 (0.67)    55.18 (-3.97)   13.34 (0.71)
10000     10.98 (0.80)    11.85 (0.77)    21.13 (0.27)    13.81 (0.69)
30000     10.35 (0.83)    10.74 (0.81)    13.07 (0.72)    11.68 (0.78)
47904     9.34 (0.86)     10.66 (0.81)    11.90 (0.77)    11.67 (0.78)

Lightning Occurrence, acc. (F1)
1000      86.04 (0.85)    73.68 (0.74)    62.46 (0.39)    78.27 (0.74)
5000      88.01 (0.87)    87.59 (0.87)    68.82 (0.55)    82.35 (0.82)
10000     88.87 (0.88)    86.33 (0.85)    81.46 (0.83)    84.37 (0.84)
50000     89.58 (0.89)    89.20 (0.88)    87.49 (0.87)    87.99 (0.87)
212604    90.46 (0.90)    90.51 (0.90)    86.87 (0.88)    89.57 (0.89)
3 Experiments
We demonstrate the performance of our network on two separate datasets, namely estimating wind
speeds from tropical storms and predicting the occurrence of lightning from previous observations.
Note that the experiments serve as an outline that could extend to other tasks leveraging
multi-channel inputs.
For each experiment we compare results to a handcrafted CNN (named Conv) inspired by [15]: three
convolutional layers with 8, 16, and 32 filters, each followed by ReLU and max pooling, before a
fully-connected layer with 32 units and a linear output unit. Further inspiration is taken from (a
subset of) recent state-of-the-art vision models, namely ResNet18 [16] and MobileNetV3 (small) [17],
to better understand how larger and more complex networks compare with our proposed method.
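A minimal sketch of the Conv baseline, assuming single-channel 64x64 inputs and 3x3 kernels (the
actual input shape, kernel sizes, and padding are task-dependent assumptions):

```python
import torch.nn as nn

def block(c_in: int, c_out: int) -> nn.Sequential:
    # Convolution followed by ReLU and 2x2 max pooling, as described above.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(2),
    )

conv_baseline = nn.Sequential(
    block(1, 8),     # 64x64 -> 32x32
    block(8, 16),    # 32x32 -> 16x16
    block(16, 32),   # 16x16 -> 8x8
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 32),
    nn.ReLU(),
    nn.Linear(32, 1),  # linear output unit
)
```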
3.1 Estimating Tropical Cyclone Intensity
Tropical cyclones are among the most devastating natural disasters, causing billions of dollars of
damage and significant loss of life every year. Predicting the track of these cyclones is well
studied, but there is still an imperative need to improve upon the forecast of intensity [18]. The
NASA Tropical Storm Wind Speed Competition [19] was released to study new automated and reliable
methods of forecasting intensity. The data are single-band infrared images (i.e., band-13 or 10.3 µm)
captured by the Geostationary Operational Environmental Satellite (GOES)-16 Advanced Baseline Imager
(ABI), with pixel values representing heat energy in the infrared spectrum, normalized to grayscale.
We leverage the temporal relationships of previous timesteps up to the point of prediction to
estimate the maximum sustained surface wind speed. Additional details can be found in Appendix B.1.
The state of the art reaches a root-mean-squared error (RMSE) of 6.256 kn with an ensemble of 51
models [20]. We omit a direct comparison as interpreting these models would be increasingly
difficult for end users. The proposed scattering network, with significantly fewer parameters,
performs best overall with a minimum RMSE of 9.342 kn when using all available data for training.
This is 12.35% lower than the closest competitor, ResNet18, and 21.44% and 19.92% lower than
MobileNetV3 and Conv, respectively (Table 1). As such, the competing networks are more prone to
overfit or lack