
Assuring Safety of Vision-Based Swarm Formation Control
Chiao Hsieh1, Yubin Koh1, Yangge Li1, and Sayan Mitra1
Abstract— Vision-based formation control systems are at-
tractive because they can use inexpensive sensors and can
work in GPS-denied environments. The safety assurance for
such systems is challenging: the vision component’s accuracy
depends on the environment in complicated ways, these errors
propagate through the system and lead to incorrect control
action, and there exists no formal specification for end-to-end
reasoning. We address this problem and propose a technique
for safety assurance of vision-based formation control: First, we
propose a scheme for constructing quantizers that are consistent
with vision-based perception. Next, we show how the conver-
gence analysis of a standard quantized consensus algorithm
can be adapted for the constructed quantizers. We use the
recently defined notion of perception contracts to create error
bounds on the actual vision-based perception pipeline using
sampled data from different ground truth states, environments,
and weather conditions. Specifically, we use a quantizer in
logarithmic polar coordinates, and we show that this quantizer
is suitable for the constructed perception contracts for
vision-based position estimation, where the error worsens with
respect to the absolute distance between agents. We build our
formation control algorithm with this nonuniform quantizer,
and we prove its convergence employing an existing result for
quantized consensus.
I. INTRODUCTION
Distributed consensus, flocking, and formation control
have been studied extensively, including in scenarios where
the participating agents only have partial state information
(see, for example [1]–[4]). With the advent of deep learn-
ing and powerful computer vision algorithms, it is now
feasible for agents to use vision-based state estimation for
formation control (See Figure 1). Such systems can be
attractive because they do not require expensive sensors and
localization systems, and also can be used in GPS-denied
environments [5]–[7]. However, deep learning and vision
algorithms are well-known to be fragile, which can break the
correctness and safety of the end-to-end formation control
system. Further, it is difficult to specify the correctness of
a vision-based state estimator, which gets in the way of
modular design and testing of the overall formation control
system [8]. In this paper, we address these challenges and
present the first end-to-end formal analysis of a vision-based
formation control system.
We present analyses for both convergence and safety
assurance of a vision-based swarm formation control system.
The computer vision pipeline (See Figure 2) uses feature
detection, feature matching, and geometry to estimate the
relative position of the participating drones. The estimated
relative poses are then used by a consensus-based formation
1The authors are with Coordinated Science Laboratory, University of Illi-
nois Urbana-Champaign, Champaign, IL, USA {chsieh16, yubink2,
li213, mitras}@illinois.edu
Fig. 1: Vision-based drone formation using downward facing
camera images in AirSim.
control algorithm. There are two key challenges in analyzing
the system: (1) The perception errors impact the behavior
of neighboring agents and propagate through the entire
swarm. (2) The magnitude of the perception error is highly
nonuniform, and depends on the ground truth values of the
relative position between neighboring agents. In general,
perception errors can get worse as the system approaches the
equilibrium (desired formation) and thus make stabilization
difficult. Environmental variations (e.g., lighting, fog) are
other factors that can make the vision-based system unstable.
In addressing the problem, our idea is to view the vision-
based formation control system as a quantized consensus
protocol [9]. We start with the assumption that the impact
of the state estimation errors arising from the vision pipeline
can be encapsulated as quantization errors in a non-uniform
quantization scheme. That is, the quantization step size
can vary non-uniformly with respect to the state so that
the quantization errors can overapproximate state dependent
perception errors. To discharge this assumption, our analysis
has to meet two requirements. First, we have to propose
a specific quantization scheme under which the formation
control system is guaranteed to converge. For this,
we develop a quantized formation controller (Equation (1))
and a logarithmic polar quantizer (Equations (2) and (3)), and
we show in Theorem 1 that indeed the resulting quantized
formation control protocol converges, using sufficient condi-
tions from [9]. Second, we have to show that a quantizer
instantiated from the quantization scheme matches the error
characteristics of the vision pipeline. For this part, we utilize
the recently developed idea of perception contracts [10],
[11]. A perception contract (PC) for a vision-based state
estimator bounds the estimation error as a function of the
ground truth state. In earlier work [11], PCs were used to
establish the safety of vision-based lane-keeping systems.
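To illustrate the kind of quantizer involved, the following sketch (all constants hypothetical and not taken from the paper's Equations (2) and (3)) quantizes a relative position in polar coordinates, with logarithmically spaced range cells so that the quantization error grows proportionally with distance, mirroring a perception error that worsens with the inter-agent distance:

```python
import math


def log_quantize(r, r_min=0.1, ratio=1.2):
    """Quantize a distance r > 0 onto logarithmically spaced cells.

    Cell boundaries sit at r_min * ratio**k, so the cell width (and hence
    the worst-case quantization error) grows proportionally with r.
    r_min and ratio are illustrative values, not the paper's constants.
    """
    if r <= r_min:
        return r_min
    k = math.floor(math.log(r / r_min, ratio))
    lo = r_min * ratio ** k        # lower boundary of the cell containing r
    hi = lo * ratio                # upper boundary of that cell
    return 0.5 * (lo + hi)         # midpoint: error <= (ratio - 1)/2 * r


def quantize_polar(dx, dy, n_sectors=64):
    """Quantize a relative position: logarithmic in range, uniform in angle.

    n_sectors (here 64) is an illustrative angular resolution.
    """
    r = math.hypot(dx, dy)
    theta = math.atan2(dy, dx)
    step = 2 * math.pi / n_sectors
    q_theta = round(theta / step) * step
    q_r = log_quantize(r)
    return q_r * math.cos(q_theta), q_r * math.sin(q_theta)
```

With ratio 1.2 the relative range error stays below 10 percent regardless of distance, which is the sense in which a non-uniform quantizer can overapproximate a state-dependent perception error bound.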
For formation control, however, the PCs are dramatically
different because the error has a highly non-uniform de-
arXiv:2210.00982v2 [cs.MA] 27 Sep 2023