Figure 1: Network architecture illustrating the separation of attention modules on the scattering transform. The leftmost block represents the output of the scattering transform on the input. The separate operator isolates a single channel, e.g., $c_0$, and passes the normalized scattering coefficients, $\tilde{S}_2$, through channel attention and spatial attention before fusion. There are $C_{\text{total}}$ attention modules in the network. Figure modified from [12].
performance and interpretability as well as improve the confidence of post hoc explainability methods [11, 12]. Most similar to this work are the studies in [13, 8]. In [13], residual layers mix the input channels before applying attention, and [8] applies a scattering attention module after each step in a U-Net. However, our approach differs in that we introduce a separation scheme that applies attention to the individual input channels directly following the scattering transform.
2 Methodology
Figure 1 illustrates the primary components of our network, starting with the output of the scattering transform and showing an attention module separated by input channel. The implementation and design choices for each part are described in detail below.
Scattering Transform
Scattering representations yield signal descriptors that are informative, invariant, and stable to noise and deformations, computed by a cascade of wavelet decompositions, each followed by a non-linear modulus and spatial averaging. Using the Kymatio package [14], we compute a 2D transform with a predetermined filter bank of Morlet wavelets at $J=3$ scales and $L=6$ orientations. For each input channel, we apply a second-order transform to obtain the scattering coefficients $S_2$. These channels are processed independently and combined later in the network. Additional details on the scattering transform can be found in Appendix A.1.
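As a concrete illustration, the front end could be implemented as below with Kymatio's torch frontend; the batch size, input resolution, and channel count are placeholder values, and only $J$, $L$, and the second-order setting come from the text.

```python
# A minimal sketch of the scattering front end, assuming Kymatio's torch
# frontend; the batch size, input resolution (64x64), and channel count (C=4)
# are placeholder values -- only J, L, and max_order come from the text.
import torch
from kymatio.torch import Scattering2D

J, L = 3, 6                    # scales and orientations used in the paper
x = torch.randn(8, 4, 64, 64)  # hypothetical batch with C=4 input channels

# Second-order 2D scattering transform with a Morlet filter bank.
scattering = Scattering2D(J=J, shape=(64, 64), L=L, max_order=2)

# S2 has shape (batch, C, K, h, w) with h = w = 64 / 2**J = 8 and
# K = 1 + J*L + L**2 * J*(J - 1) / 2 = 127 coefficients per input channel.
S2 = scattering(x)
print(S2.shape)  # torch.Size([8, 4, 127, 8, 8])
```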
Channel Separation
Local attention methods routinely process their input using all the channel information at once, e.g., feature maps from RGB color channels. However, the scattering transform yields a 5-dimensional tensor, $S_2$, in which each of the $C$ input channels has its own set of $K$ scattering coefficients. Rather than stacking the result and passing it all through the subsequent layers together, we propose to first separate the input channels and process the coefficients individually. This creates $C$ new attention modules, each with independent weights, that are processed in parallel; a sketch of this scheme is given below. Following this separation scheme adds the benefit of localizing patterns in the input before joining high-level features. Thus, the interpretation of attention over individual input channels is improved significantly, especially if the channels have different meanings, e.g., temporal, visible, infrared, derived products, etc.
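The separation scheme could be realized as in the sketch below, assuming the $(B, C, K, h, w)$ layout produced by the scattering front end. `SeparatedAttention` is our hypothetical name for the wrapper, and the per-channel `AttentionModule` is left as an identity placeholder here; a fuller sketch of it follows in the next section.

```python
# Hypothetical sketch of the channel-separation scheme: S2 is split along the
# input-channel axis and each slice gets its own attention module with
# independent weights.
import torch
import torch.nn as nn

class AttentionModule(nn.Module):
    """Identity placeholder for the per-channel attention block (see next section)."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x

class SeparatedAttention(nn.Module):
    """Applies one independent attention module to each separated input channel."""
    def __init__(self, num_channels: int):
        super().__init__()
        # C attention modules with independent weights, run in parallel.
        self.per_channel = nn.ModuleList(AttentionModule() for _ in range(num_channels))

    def forward(self, S2: torch.Tensor) -> torch.Tensor:
        # S2: (batch, C, K, h, w); slice out each channel's K coefficient maps.
        attended = [attn(S2[:, c]) for c, attn in enumerate(self.per_channel)]
        # Re-join the per-channel features for the rest of the network.
        return torch.stack(attended, dim=1)  # (batch, C, K, h, w)
```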
Attention Modules
The attention modules encompass three primary components: (i) channel attention, (ii) spatial attention, and (iii) feature fusion. The channel attention features are used to inform the spatial attention module before fusion via feature recalibration. Specifically, the network learns to use the spatial information over the $K$ channels to selectively emphasize the more informative coefficients over the less useful ones. Not only does this improve the performance of our network, it also adds a further layer of interpretability, with channels corresponding to particular coefficients. The spatial attention features highlight the salient features in the spatial resolution of independent input channels. This differs from most computer vision problems with RGB imagery, which have only one heat map for the full image. As such, our network provides a more transparent interpretation of how the spatial information in each input channel is used to form a prediction. Implementation details of each component can be found in Appendices A.2, A.3, and A.4.
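The text does not pin down the exact layers, so the following is a hedged sketch of one per-channel module in a CBAM-like style: channel attention recalibrates the $K$ coefficient maps, the recalibrated features inform a spatial heat map, and a learnable weight $w_1$ fuses the two streams, mirroring the $w_1$ / $1-w_1$ labels in Figure 1. The reduction ratio and kernel size are our assumptions.

```python
# A hedged sketch of one per-channel attention module in a CBAM-like style;
# the reduction ratio, 7x7 kernel, and learnable fusion weight w1 are our
# assumptions, not details given in the text.
import torch
import torch.nn as nn

class AttentionModule(nn.Module):
    def __init__(self, k: int = 127, reduction: int = 4):
        super().__init__()
        # (i) Channel attention over the K scattering-coefficient maps.
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(k, k // reduction), nn.ReLU(),
            nn.Linear(k // reduction, k), nn.Sigmoid(),
        )
        # (ii) Spatial attention producing one heat map per input channel.
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3), nn.Sigmoid()
        )
        # (iii) Feature fusion with a learnable weight w1 (cf. Figure 1).
        self.w1 = nn.Parameter(torch.tensor(0.5))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, K, h, w) -- coefficients for one separated input channel.
        ca = self.channel_mlp(x).unsqueeze(-1).unsqueeze(-1)  # (batch, K, 1, 1)
        x_ca = x * ca  # recalibrated coefficients
        # The channel-attended features inform the spatial attention map.
        pooled = torch.cat(
            [x_ca.mean(dim=1, keepdim=True), x_ca.amax(dim=1, keepdim=True)],
            dim=1,
        )
        sa = self.spatial_conv(pooled)  # (batch, 1, h, w) heat map
        # Weighted fusion of the channel- and spatial-attended streams.
        return self.w1 * x_ca + (1 - self.w1) * (x_ca * sa)
```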