
ArgoSSM A PREPRINT
2 A generative model of ocean float movement
To motivate the probabilistic framework of ArgoSSM, we start with a simple model of how Argo floats move through
the ocean. Profile data is collected at times $t_1 < t_2 < \dots < t_N$, approximately ten days apart. At each time $t_n$, let $X_n$ be
the geographic position in latitude and longitude at time $t_n$. We take into account the elapsed time $\Delta t_n = t_n - t_{n-1}$
when updating the position from $X_{n-1}$ to $X_n$. If the float’s position at $n-1$ was $X_{n-1}$, we might expect that $X_n$
will be close to $X_{n-1}$ with noise proportional to the time passed. This can be written explicitly as a two-dimensional
random walk (RW) model:
$$X_n = X_{n-1} + \varepsilon_n^X, \tag{1}$$
where $\varepsilon_n^X$ follows a multivariate Gaussian distribution with zero mean and covariance $\Delta t_n \Sigma_X$.
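As an illustration, the random walk of Equation 1 can be simulated in a few lines. This is a minimal sketch, not the paper's implementation: the function name `simulate_rw`, the starting position, and the covariance values are hypothetical choices made only for the example.

```python
import numpy as np

def simulate_rw(t, x0, sigma_x, rng):
    """Simulate Equation 1: X_n = X_{n-1} + eps_n, with eps_n ~ N(0, dt_n * sigma_x)."""
    t = np.asarray(t, dtype=float)
    x = np.empty((len(t), 2))
    x[0] = x0
    for n in range(1, len(t)):
        dt = t[n] - t[n - 1]  # elapsed time since the previous profile
        x[n] = x[n - 1] + rng.multivariate_normal(np.zeros(2), dt * sigma_x)
    return x

rng = np.random.default_rng(0)
t = np.arange(0.0, 100.0, 10.0)    # ten profiles, ten days apart
sigma_x = np.diag([0.01, 0.02])    # hypothetical covariance (degrees^2 per day)
path = simulate_rw(t, x0=[-60.0, 30.0], sigma_x=sigma_x, rng=rng)
```

Because the noise covariance scales with the elapsed time, unevenly spaced profile times need no special handling in the loop.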
For a given index $n$, the expected value of $X_n$ conditioned on $X_{n-1}$ and $X_{n+1}$ is a time-weighted average of $X_{n-1}$ and $X_{n+1}$.
Thus, the RW model is a generative model of float movement where the optimal predictor for an unseen
point is linear interpolation. Linear interpolation might work well for short gaps in time, but it breaks down for the large
gaps in time that are seen in the Southern Ocean. This is because linear interpolation ignores local information such as
momentum. If we know the current has carried the float from position $X_{n-1}$ to position $X_n$, that same current will
likely carry the float further in the same direction. More specifically, each float has a velocity $V_n$ that indicates where it
is headed next. Thus, we modify Equation 1 to take velocity into account:
$$X_n = X_{n-1} + \Delta t_n V_{n-1} + \varepsilon_n^X. \tag{2}$$
The velocity $V_n$ also changes over time according to an auto-regressive (AR) model:
$$V_n = (1 - \alpha^{\Delta t_n}) v_0 + \alpha^{\Delta t_n} V_{n-1} + \varepsilon_n^V, \tag{3}$$
where $\varepsilon_n^V$ follows a multivariate Gaussian distribution with zero mean and covariance $\Delta t_n \Sigma_V$. Two parameters govern
the velocity update: the long-run velocity of the float $v_0$ and the autoregressive term $\alpha \in [0, 1]$. The parameter $\alpha$
determines how quickly the velocity reverts to the long-run velocity $v_0$.
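The position and velocity updates can be simulated jointly. The sketch below makes one assumption worth flagging: it reads the autoregressive weight in the velocity update as $\alpha$ raised to the power $\Delta t_n$, so that $\alpha = 1$ gives a pure random walk in velocity and $\alpha = 0$ resets the velocity to $v_0$ plus noise. The function name `simulate_ar` and its arguments are hypothetical.

```python
import numpy as np

def simulate_ar(t, x0, v_init, v0, alpha, sigma_x, sigma_v, rng):
    """Simulate positions and velocities from the velocity-augmented model.

    Assumes the autoregressive weight is alpha ** dt_n (see lead-in).
    """
    t = np.asarray(t, dtype=float)
    v0 = np.asarray(v0, dtype=float)
    x = np.empty((len(t), 2))
    v = np.empty((len(t), 2))
    x[0], v[0] = x0, v_init
    for n in range(1, len(t)):
        dt = t[n] - t[n - 1]
        a = alpha ** dt  # weight on the previous velocity
        # Position update uses the previous velocity.
        x[n] = x[n - 1] + dt * v[n - 1] + rng.multivariate_normal(np.zeros(2), dt * sigma_x)
        # Velocity reverts toward the long-run velocity v0.
        v[n] = (1.0 - a) * v0 + a * v[n - 1] + rng.multivariate_normal(np.zeros(2), dt * sigma_v)
    return x, v

rng = np.random.default_rng(0)
t = np.arange(0.0, 100.0, 10.0)
x_path, v_path = simulate_ar(
    t, x0=[-60.0, 30.0], v_init=[0.02, 0.05], v0=[0.0, 0.03], alpha=0.9,
    sigma_x=1e-3 * np.eye(2), sigma_v=1e-5 * np.eye(2), rng=rng,
)
```

All numerical values here are placeholders chosen for illustration, not estimates from Argo data.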
We refer to Equations 2 and 3 collectively as the AR model. While the AR model is more realistic than the RW model,
it simplifies the true behavior of the floats. Notably, it ignores the float’s vertical movement as it rises or drops in the
ocean and intra-day movement on the ocean surface. Thus, the velocity state-variable should be interpreted as the
average direction of the float over several days rather than a local estimate of the instantaneous velocity. The AR model
is similar to the state-space model introduced in Chamberlain et al. [2018], though with key differences. The main
modeling difference is that Chamberlain et al. [2018] updates the velocity according to a random walk (corresponding to
$\alpha = 1$ in Equation 3). In Section 5, we find for many floats that the inferred autoregressive parameter $\alpha$ is significantly
less than $1$. Moreover, Chamberlain et al. [2018] fixes all parameter values, whereas we estimate them alongside the
positions and velocities.
Information about the float’s position comes from GPS measurements $Y_n$ corresponding to each $X_n$:
$$Y_n = X_n + \varepsilon_n^Y, \tag{4}$$
where $\varepsilon_n^Y$ is the measurement error, which follows a multivariate Gaussian distribution with zero mean and covariance $\Sigma_Y$. With the
Iridium satellite system, Argo GPS measurements are rated to be accurate to within eight meters [Wong et al., 2020], so
the variability of the measurement error $\varepsilon_n^Y$ in Equation 4 will typically be orders of magnitude lower than that of the transition
error $\varepsilon_n^X$ in Equation 2.
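To give a sense of scale, the sketch below generates noisy GPS observations from Equation 4. It rests on two assumptions that are not in the text: the eight-meter rating is treated as a standard deviation in each coordinate, and one degree is taken to be roughly 111 km (which ignores the shrinking of longitude degrees toward the poles). The positions and the helper `observe` are hypothetical.

```python
import numpy as np

METERS_PER_DEGREE = 111_000.0  # rough length of one degree of latitude (approximation)

def observe(x, sigma_y, rng):
    """Equation 4: Y_n = X_n + eps_n, with eps_n ~ N(0, sigma_y)."""
    x = np.asarray(x, dtype=float)
    noise = rng.multivariate_normal(np.zeros(2), sigma_y, size=len(x))
    return x + noise

# Assumption: treat the 8 m rating as a standard deviation in both coordinates.
gps_sd_deg = 8.0 / METERS_PER_DEGREE
sigma_y = gps_sd_deg ** 2 * np.eye(2)

rng = np.random.default_rng(1)
x_true = np.array([[-60.0, 30.0], [-60.2, 30.5], [-60.1, 31.0]])  # hypothetical positions
y_obs = observe(x_true, sigma_y, rng)
```

Under these assumptions the measurement standard deviation is below one ten-thousandth of a degree, far smaller than the tenths of a degree separating the hypothetical positions.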
2.1 Missing due to ice cover
While GPS measurements accurately pin down the floats’ locations, they may not always be available. To represent this
availability, let $A_n$ be an indicator variable that equals one if GPS is available at time $t_n$ and zero otherwise. In the
Southern Ocean, since the float only surfaces after three consecutive ice-free detections, $A_n$ is mostly determined by
the ice-avoidance algorithm [Klatt et al., 2007], which depends on the concentration of ice in the area.
To model the availability indicator $A_n$, we first require an estimated probability of detecting ice. We have available
daily ice concentration estimates from Fetterer et al. [2017], which uses remotely-sensed data from microwave
instruments on satellites. Let $E(x, t)$ be the concentration of ice at position $x$ and time $t$. Accounting for imperfect
ice detection due to limited resolution, the probability that the float detects ice at position $x$ and time $t$ is
$\tilde{E}(x, t) = p_{\mathrm{TPR}} E(x, t) + (1 - p_{\mathrm{TNR}}) (1 - E(x, t))$, where $p_{\mathrm{TPR}}$ is the “true positive rate” (correctly detecting ice) and $p_{\mathrm{TNR}}$ is the “true
negative rate” (correctly detecting no ice). We expect $p_{\mathrm{TPR}}$ and $p_{\mathrm{TNR}}$ to be close to $1$, but since detections are based on
the temperature of the water, we expect to see more false positives than false negatives (i.e. $p_{\mathrm{TNR}} < p_{\mathrm{TPR}}$).
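The detection probability can be computed directly from an ice concentration value. This sketch assumes the false-positive term applies to the ice-free fraction $1 - E(x, t)$, following the law of total probability; the function name and the rate values are hypothetical.

```python
import numpy as np

def detection_probability(ice_conc, p_tpr, p_tnr):
    """Probability that the float detects ice, given ice concentration in [0, 1].

    Combines true positives over the ice-covered fraction with
    false positives over the ice-free fraction (assumed form, see lead-in).
    """
    ice_conc = np.asarray(ice_conc, dtype=float)
    return p_tpr * ice_conc + (1.0 - p_tnr) * (1.0 - ice_conc)

# Hypothetical rates with p_tnr < p_tpr, i.e. more false positives than false negatives.
probs = detection_probability([0.0, 0.5, 1.0], p_tpr=0.95, p_tnr=0.90)
```

With these placeholder rates, open water still triggers a detection with probability `1 - p_tnr`, and full ice cover is detected with probability `p_tpr`.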