figure out how spatial heterogeneity impacts the models, and there is no "one-size-fits-all" rule for doing so. It is therefore imperative, yet challenging, to develop techniques that can learn this automatically from the data.
2) Difficulty in obtaining predictive
models for unseen locations without training data.
Due to spatial heterogeneity, the local models at different locations can differ substantially in order to capture the relationships between predictors and the target variable. When no training data is available at certain locations, the method must have the capacity to generalize to these unseen locations, which is as difficult as zero-shot learning.
To address the above challenges, we propose a generic framework for deep spatial domain generalization, which generates predictive models for arbitrary unseen spatial domains. More specifically, to address the first challenge, we propose a novel spatial interpolation graph neural network (SIGNN) that learns the spatial embedding of each location and the relationships between locations in the training set, and infers the spatial embeddings of unseen locations during the test phase. The spatial embedding of a target location is then used to directly decode a parameterized task model without any training data from that location, which addresses the second challenge.
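A minimal sketch of this decoding step is given below. It is our own illustration, not the paper's implementation: the `WeightDecoder` module, the embedding dimension, and all shapes are assumptions, and the spatial embedding is simply assumed to be produced by SIGNN.

```python
# Illustrative sketch (not the authors' code): a spatial embedding for a target
# location -- assumed to come from SIGNN -- is decoded into the parameters of a
# simple linear task model, so predictions can be made at that location without
# any local training data.
import torch
import torch.nn as nn

class WeightDecoder(nn.Module):
    """Maps a location's spatial embedding to task-model parameters."""
    def __init__(self, embed_dim: int, in_features: int):
        super().__init__()
        self.to_weight = nn.Linear(embed_dim, in_features)  # decode weight vector
        self.to_bias = nn.Linear(embed_dim, 1)               # decode bias term

    def forward(self, z: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        w = self.to_weight(z)             # (in_features,)
        b = self.to_bias(z)               # (1,)
        return x @ w.unsqueeze(-1) + b    # predictions for samples at this location

decoder = WeightDecoder(embed_dim=16, in_features=8)
z_unseen = torch.randn(16)           # embedding interpolated for an unseen location
x_query = torch.randn(5, 8)          # 5 query samples with 8 predictors each
y_hat = decoder(z_unseen, x_query)   # shape (5, 1)
```
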
Our contributions include:
• We propose a framework for spatial domain generalization. The framework does not make assumptions about the data distribution and learns the spatial embeddings of all locations in the training set in an end-to-end manner. It is also compatible with general predictive task models such as regression models and multi-layer perceptrons (MLPs).
• We develop the spatial interpolation graph neural network (SIGNN). It handles spatial data as a graph and uses edge representations in graph convolution operations to learn the spatial embedding of each node and the relationships between nodes. It also interpolates the spatial embedding at any location, so our method can generalize to unseen locations (a minimal interpolation sketch follows this list).
• We conduct extensive experiments. We validate the efficacy of our method on ten real-world datasets for classification and regression tasks; our method outperforms state-of-the-art models on most of the tasks.
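The sketch below illustrates the interpolation idea referenced in the second contribution. Inverse-distance weighting is used purely for illustration; it is an assumption, not necessarily the exact SIGNN operator.

```python
# Illustrative sketch: obtain a spatial embedding at an unseen query location
# from embeddings learned at training locations via inverse-distance weighting.
import torch

def interpolate_embedding(query_coord: torch.Tensor,
                          train_coords: torch.Tensor,
                          train_embeddings: torch.Tensor,
                          eps: float = 1e-6) -> torch.Tensor:
    """query_coord: (2,); train_coords: (N, 2); train_embeddings: (N, D)."""
    dists = torch.linalg.norm(train_coords - query_coord, dim=-1)  # (N,)
    weights = 1.0 / (dists + eps)                                  # closer -> larger weight
    weights = weights / weights.sum()
    return weights @ train_embeddings                              # (D,)

# Four seen locations with learned 16-d embeddings; one unseen query location.
coords = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
embeddings = torch.randn(4, 16)
z_unseen = interpolate_embedding(torch.tensor([0.5, 0.5]), coords, embeddings)
```
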
II. RELATED WORK
In this section, we summarize work in the fields of domain adaptation and domain generalization. Machine learning systems often assume that training and test data follow the same distribution, an assumption that usually cannot be satisfied in practice. Domain Adaptation (DA), which has received great attention from researchers in the past decade, aims to bridge source and target domains by characterizing the transformation between the data from these domains [2], [5], [6]. Under the
big umbrella of DA, continuous domain adaptation considers
the problem of adapting to target domains where the domain
index is a continuous variable (temporal DA is a special case
when the domain index is 1D). Approaches to tackling such
problems can be broadly classified into three categories: (1)
biasing the training loss towards future data via transportation
of past data [7], (2) using time-sensitive network parameters
and explicitly controlling their evolution along time [8], (3)
learning representations that are time-invariant using adversarial
methods [9]. The first category augments the training data,
the second category reparameterizes the model, and the third
category redesigns the training objective. However, data from the target domain may not be available, or adapting the base model may not be feasible, which calls for Domain Generalization (DG).
A variety of DG methods have been proposed in recent years. According to [10], existing DG methods can be categorized into the following three groups: (1) Data
manipulation: This category of methods focuses on manipulating the inputs to assist in learning general representations. There
are two kinds of popular techniques along this line: a). Data
augmentation [11], which is mainly based on augmentation,
randomization, and transformation of input data; b). Data
generation [12], which generates diverse samples to help
generalization. (2) Representation learning: This category of
methods is the most popular in domain generalization. There are
two representative techniques: a). Domain-invariant representation learning [5], which applies kernel methods, adversarial training, explicit feature alignment between domains, or invariant risk minimization to learn domain-invariant representations;
b). Feature disentanglement [13], which tries to disentangle
the features into domain-shared or domain-specific parts for
better generalization. (3) Learning strategy: This category of methods focuses on exploiting general learning strategies to promote the generalization capability.
III. METHODOLOGY
In this section, we first provide the problem formulation and its challenges, and then introduce our proposed framework and how it addresses those challenges.
A. Problem formulation
In this paper, we denote a geo-location by its 2D coordinate values $s \in \mathbb{R}^2$, and each $s$ is associated with a spatial domain $(\mathcal{X}_s \times \mathcal{Y}_s)$, from which we have a set of samples $(\mathbf{x}_s, \mathbf{y}_s) = \{(x_i, y_i) \in (\mathcal{X}_s \times \mathcal{Y}_s)\}_{i=1}^{N_s}$, where $x_i \in \mathcal{X}_s$ is the $i$-th input sample from the domain $\mathcal{X}_s$, while $y_i \in \mathcal{Y}_s$ is the $i$-th output sample from the domain $\mathcal{Y}_s$. For the classification problem, $y_i$ can be further narrowed to a binary value.
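For concreteness, the snippet below shows one way the data in this formulation could be organized, with each seen location mapping to its local samples $(\mathbf{x}_s, \mathbf{y}_s)$. It is purely illustrative: the coordinates, sample counts, and feature dimension are made up.

```python
# Purely illustrative organization of the data: each seen location s (a 2D
# coordinate) maps to its local samples (x_s, y_s).
import numpy as np

rng = np.random.default_rng(0)

spatial_domains = {
    (40.71, -74.01): {"x": rng.normal(size=(100, 8)),      # N_s = 100, 8 predictors
                      "y": rng.integers(0, 2, size=100)},  # binary labels (classification)
    (34.05, -118.24): {"x": rng.normal(size=(80, 8)),
                       "y": rng.integers(0, 2, size=80)},
}
```
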
In opposition to the assumption that the relationship $f$ between independent variables $x_i \in \mathcal{X}_s$ and dependent variables $y_i \in \mathcal{Y}_s$ remains unchanged across the space $\mathbb{R}^2$, spatial heterogeneity describes a condition in which the relationships between some sets of variables $\{x_i, y_i\}$ are heterogeneous throughout space, i.e., $f_s \neq f_{s'}$ if $s \neq s'$. A static global model cannot capture the changes in these relationships; thus, Domain Generalization (DG) models that can reflect the heterogeneous relationships within the data play a vital role in spatial analysis.
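To make this concrete, the toy example below (our own illustration with synthetic data, unrelated to the paper's datasets) fits local and global linear models when two hypothetical locations have opposite predictor-target relationships; the global fit averages the two regimes away.

```python
# Synthetic illustration of spatial heterogeneity: the same predictor relates
# to the target with opposite signs at two hypothetical locations s and s',
# i.e., f_s != f_{s'}.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
y_s = 2.0 * x[:, 0] + 0.1 * rng.normal(size=200)     # local relationship at s
y_sp = -2.0 * x[:, 0] + 0.1 * rng.normal(size=200)   # local relationship at s'

# Local least-squares fits recover the two slopes (about +2 and -2) ...
slope_s = np.linalg.lstsq(x, y_s, rcond=None)[0][0]
slope_sp = np.linalg.lstsq(x, y_sp, rcond=None)[0][0]

# ... while a single static global model fitted to the pooled data averages
# the heterogeneous relationships away (slope near 0).
slope_global = np.linalg.lstsq(np.vstack([x, x]),
                               np.concatenate([y_s, y_sp]), rcond=None)[0][0]
print(slope_s, slope_sp, slope_global)
```
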
Our goal in this paper is to build a model that proactively
captures the data concept drift across different geo-locations.
Given a set of data samples $\{(\mathbf{x}_s, \mathbf{y}_s)\}_{s \in S_0}$ from seen domains, where $S_0$ denotes the set of seen locations, we aim to learn the