and bandwidth constraints [14]. However, these have focused on the ability to make a decision about a
single global hypothesis in a distributed fashion. In contrast, this paper studies the testing of multiple
hypotheses in a decentralized manner.
To be more specific, consider a graph with one agent at each node, and suppose that each agent
wishes to test one or more binary hypotheses that are local to its node, with each hypothesis corre-
sponding to the presence or absence of some signal. Of this collection of hypotheses, an unknown
subset corresponds to true null hypotheses (absence of signal), whereas the rest are false (presence of
signal), and should be rejected. Apart from having access to the data and hypotheses at its own node,
each agent may also communicate directly with neighboring nodes on the graph. Our setup is closely
related to, yet complementary to, the works of [9, 20, 27, 19].
For concreteness, one example of such an agent could be a low-power sensor with the ability to
perform local measurements and simple computations. The graph may then represent a multitude of
such sensors deployed over a widespread area, such as a forest, a building, or an aircraft. Due to power
constraints, an agent may communicate locally with agents at neighboring nodes, but can only infrequently
communicate with a distant centralized server. A rejected hypothesis corresponds to a discovery of
some sort, such as a detected anomaly or a deviation from expected or recent measurements, and it is
only in such cases that a local node is allowed to communicate with the central server.
Each agent aims to test its local null hypotheses and to reject a subset of them, declaring those nulls
false on the basis of information from its local neighborhood. We assume that each agent can calculate a
p-value for each of its local hypotheses, which provides evidence against the respective null. This
paper is agnostic to the exact method that a node uses to calculate its local p-values; this problem has
received significant attention in past work and can be addressed in various ways. Each agent must use its
local p-values, and possibly those of its neighbors, to decide which hypotheses to reject locally. The
challenge is that these local decisions must be made with the aim of controlling a global error metric.
In this paper, the error metric is the overall false discovery rate (FDR) across the whole graph, and we
require that this FDR be controlled at a pre-defined level.
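To make the communication pattern concrete, the following is a minimal illustrative sketch, not the procedure proposed in this paper: each node pools the p-values from its closed neighborhood (itself plus its immediate neighbors) and applies some local rejection rule to its own hypotheses. The function names and the placeholder local_rule are hypothetical; the specific rule that guarantees global FDR control is developed in the remainder of the paper.

```python
# Illustrative sketch of the communication pattern only (not this paper's rule).
# Each node sees its own p-values and those of its immediate neighbors.

def gather_neighborhood_pvalues(adjacency, pvalues, node):
    """Return the p-values visible to `node`: its own plus its neighbors'.

    adjacency: dict mapping each node to an iterable of neighboring nodes.
    pvalues:   dict mapping each node to a list of its local p-values.
    """
    visible = list(pvalues[node])
    for neighbor in adjacency[node]:
        visible.extend(pvalues[neighbor])
    return visible


def decentralized_round(adjacency, pvalues, local_rule, alpha=0.05):
    """One round of purely local decisions.

    `local_rule(own_pvalues, visible_pvalues, alpha)` is a placeholder for
    whatever local decision rule is used; it should return the indices of
    the node's own hypotheses to reject.
    """
    decisions = {}
    for node in adjacency:
        visible = gather_neighborhood_pvalues(adjacency, pvalues, node)
        decisions[node] = local_rule(pvalues[node], visible, alpha)
    return decisions
```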
The false discovery rate (FDR) is an error metric that generalizes the notion of type-1 error, or
false positive error, to the setting of multiple hypotheses. It is defined as the expected ratio of the
number of false discoveries to the total number of discoveries. It was first proposed by Eklund [7],
and various heuristic methods were discussed by Eklund and Seeger [8, 21]. Its contemporary usage
was popularized by the seminal paper of Benjamini and Hochberg [2], who rigorously analyzed
a procedure now known as the Benjamini-Hochberg (BH) procedure.
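In symbols, if $V$ denotes the number of false discoveries (rejected true nulls) and $R$ the total number of discoveries, this reads
\[
\mathrm{FDR} \;=\; \mathbb{E}\!\left[\frac{V}{\max(R,\,1)}\right],
\]
where the maximum in the denominator ensures that the ratio is taken to be zero when no discoveries are made.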
Algorithms that control the FDR make sense in settings where type-1 errors are potentially more
costly than type-2 errors. Since such an asymmetry arises naturally in many situations, the FDR
has become one of the de facto standard error metrics employed across a wide range of
applied scientific fields. Its most common application is in a batched, centralized setting, where all
the p-values are pooled, and a decision is made by thresholding them at some data-dependent level,
such as the level determined by the BH procedure [2].
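As a concrete point of reference, the following is a minimal sketch of the centralized BH procedure applied to n pooled p-values (the function name and the use of NumPy are our own illustrative choices): it rejects the k smallest p-values, where k is the largest index such that the k-th smallest p-value is at most alpha * k / n.

```python
import numpy as np

def benjamini_hochberg(pvalues, alpha=0.05):
    """Centralized BH procedure: returns a boolean array of rejections.

    Rejects the k smallest p-values, where k is the largest index such
    that the k-th smallest p-value is at most alpha * k / n.
    """
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)                         # indices sorting p ascending
    thresholds = alpha * np.arange(1, n + 1) / n  # BH thresholds alpha*k/n
    below = p[order] <= thresholds                # sorted p-values clearing their threshold
    reject = np.zeros(n, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max() + 1        # largest k with p_(k) <= alpha*k/n
        reject[order[:k]] = True                  # reject the k smallest p-values
    return reject
```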
In this paper, we develop a family of algorithms for decentralized FDR control on undirected
graphs. To the best of our knowledge, both the setting and the algorithms are new. Building
on recent theoretical advances, we prove that global FDR over the whole graph is controlled under
two types of dependence assumptions between the p-values at the different nodes; the setting of
independent p-values is easiest, but a general type of “positive dependence” termed PRDS can also
be handled.
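Roughly speaking, PRDS (positive regression dependence on a subset) in its standard formulation requires that, for every coordinatewise nondecreasing set $D \subseteq [0,1]^n$ and every index $i$ corresponding to a true null, the map
\[
t \;\mapsto\; \Pr\bigl(P \in D \mid P_i = t\bigr)
\]
is nondecreasing in $t$, where $P = (P_1, \dots, P_n)$ denotes the vector of all $n$ p-values in the graph.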
A given instantiation of the QuTE algorithm is simple and scalable, and works as follows. First,