QuTE decentralized multiple testing on sensor networks with false discovery rate control Aaditya Ramdas1 Jianbo Chen2

2025-04-29 1 0 1.49MB 15 页 10玖币

侵权投诉

QuTE: decentralized multiple testing on sensor networks

with false discovery rate control*

Aaditya Ramdas1, Jianbo Chen2,

Martin J. Wainwright2,3, Michael I. Jordan2,3

aramdas@cmu.edu, jianbochen@berkeley.edu

{wainwrig,jordan}@berkeley.edu

1Department of Statistics and Data Science, Carnegie Mellon University

Departments of Statistics2and EECS3, UC Berkeley

Nov 14, 2018

Abstract

This paper designs methods for decentralized multiple hypothesis testing on graphs that are

equipped with provable guarantees on the false discovery rate (FDR). We consider the setting

where distinct agents reside on the nodes of an undirected graph, and each agent possesses p-

values corresponding to one or more hypotheses local to its node. Each agent must individually

decide whether to reject one or more of its local hypotheses by only communicating with its

neighbors, with the joint aim that the global FDR over the entire graph must be controlled at

a predeﬁned level. We propose a simple decentralized family of Query-Test-Exchange (QuTE)

algorithms and prove that they can control FDR under independence or positive dependence of the

p-values. Our algorithm reduces to the Benjamini-Hochberg (BH) algorithm when after graph-

diameter rounds of communication, and to the Bonferroni procedure when no communication

has occurred or the graph is empty. To avoid communicating real-valued p-values, we develop a

quantized BH procedure, and extend it to a quantized QuTE procedure. QuTE works seamlessly

in streaming data settings, where anytime-valid p-values may be continually updated at each

node. Last, QuTE is robust to arbitrary dropping of packets, or a graph that changes at every step,

making it particularly suitable to mobile sensor networks involving drones or other multi-agent

systems. We study the power of our procedure using a simulation suite of different levels of

connectivity and communication on a variety of graph structures, and also provide an illustrative

real-world example.

1 Introduction

Research on decentralized detection and hypothesis testing has a long history, dating back to seminal

work from the 1980s (e.g., [25, 24]), and with more recent work motivated in particular by wireless

sensor networks (e.g., [6, 1, 16, 26, 4]). A variety of issues have been addressed, including robust-

ness [11], quantization error [13], the sequential nature of data collection [10], battery/energy man-

agement of the sensors [23], tolerance of misbehaving nodes [22], nonparametric methods [12, 15],

*This paper appeared in the IEEE CDC’17 conference proceedings [18]. The last two sections were then developed in

2018, after which the paper stagnated, and it is being put on arXiv now simply to broaden access.

arXiv:2210.04334v1 [stat.ME] 9 Oct 2022

and bandwidth constraints [14]. However, these have focused on the ability to make a decision about a

single global hypothesis in a distributed fashion. In contrast, this paper studies the testing of multiple

hypotheses in a decentralized manner.

To be more speciﬁc, consider a graph with one agent at each node, and suppose that each agent

wishes to test one or more binary hypotheses that are local to its node, with each hypothesis corre-

sponding to the presence or absence of some signal. Of this collection of hypotheses, an unknown

subset corresponds to true null hypotheses (absence of signal), whereas the rest are false (presence of

signal), and should be rejected. Apart from having access to the data and hypotheses at its own node,

each agent may also directly communicate with neighboring nodes on the graph. Our setup is directly

related to, yet complementary to, the works of [9, 20, 27, 19].

For explicit context, one example of such an agent could be a low-power sensor with the ability to

perform local measurements and simple computations. Then, the graph may represent a multitude of

such sensors deployed over a widespread area, such as a forest, building or aircraft. Due to power con-

straints, the agent may communicate locally with agents at neighboring nodes, but only infrequently

communicate with a distant centralized server. A rejected hypothesis corresponds to a discovery of

some sort, such as a detected anomaly or a deviation from expected or recent measurements, and it is

only in such cases that a local node is allowed to communicate with the central server.

Each agent aims to test their local null hypotheses, and reject a subset of nulls, proclaiming them

to be false based on its local neighborhood information. We assume that each agent can calculate a

p-value for each of its local hypotheses, which provides evidence against the respective null. This

paper is agnostic to the exact methods that the node uses to calculate its local p-values; this topic has

received signiﬁcant attention in past work, and can be done in various ways. Each agent must use the

local p-values, and possibly those of its neighbors, to decide which hypotheses to reject locally. The

challenge is that these local decisions must be made with the aim of controlling a global error metric.

In this paper, the error metric is chosen as the overall false discovery rate (FDR) across the whole

graph, and we require that the FDR must be controlled at a pre-deﬁned level.

The false discovery rate (FDR) is an error metric that generalizes the notion of type-1 error, or

false positive error, to the setting of multiple hypotheses. It is deﬁned as the expected ratio of the

number of false discoveries to the total number of discoveries. It was ﬁrst proposed by Eklund [7],

and various heuristic methods were discussed by Eklund and Seeger [8, 21]. Its contemporary usage

has been popularized by the seminal paper of Benjamini and Hochberg [2] who rigorously analyzed

a procedure that is now known as the Benjamini-Hochberg (BH) procedure.

Algorithms to control the FDR make sense in settings where type-1 errors are possibly more

costly than type-2 errors. Since such an asymmetry arises naturally in many situations, the FDR

has nevertheless become one of the de-facto standard error metrics employed across a wide range of

applied scientiﬁc ﬁelds. Its most common application is in a batched, centralized setting, where all

the p-values are pooled, and a decision is made by thresholding them at some data-dependent level,

such as the level determined by the Benjamini-Hochberg (BH) procedure [2].

In this paper, we develop a novel family of algorithms for decentralized FDR control on undirected

graphs. Both the setting and the family of algorithms are new, to the best of our knowledge. Building

on recent theoretical advances, we prove that global FDR over the whole graph is controlled under

two types of dependence assumptions between the p-values at the different nodes; the setting of

independent p-values is easiest, but a general type of “positive dependence” termed PRDS can also

be handled.

A given instantiation of the QuTE algorithm is simple and scalable, and works as follows. First,

a node queries its neighbors for their p-values. Second, it conducts a local test on the local neighbor-

hood p-values at an appropriate level determined by the ratio of the size of its neighborhood to the

size of the graph. Third, if a node thinks that neighboring hypotheses should be rejected, it informs

the relevant neighbors. Lastly, each node rejects one or more of its local hypotheses as long as either

its local test suggested so, or if any of its neighbors suggested so. We run simulations on a variety

of graphs, including Erd¨

os-R´

enyi random graphs and planar grid graphs, to provide some intuition

about their achieved FDR and power. We also give an illustrative example of possible real-world

applications of our setup and algorithm using a public Intel Lab dataset involving low-power sensors

taking various kinds of measurements in a large open space.

Beyond guarantees on FDR control, our family of QuTE algorithms has another appealing prop-

erty. When the graph is either complete (contains all edges) or a star, there exists at least one node

that can access all the information in the network, in which case our algorithms reduce to the batch

Benjamini-Hochberg procedure. When the graph is empty (without edges), and each node tests only

one hypothesis without information about other nodes in the network, then our algorithms essentially

reduce to the conservative Bonferroni procedure. Hence, one can view our algorithm as bridging the

two extremes, depending on how much information is available at each node.

The rest of this paper is organized as follows. Section 2 presents the problem setup, including

deﬁnitions and assumptions. We formally describe our algorithms and their associated theoretical

guarantees in Section 3. We present the proof of our main theorem in Appendix A and Appendix B.

We run a variety of simulations in Section 4 and illustrate an application to real data in Section 5. A

discussion of our work and future directions can be found in Section ??.

2 Background and problem setup

Consider a graph G= (V,E), where each node a∈Vis associated with an agent. A given agent

amay communicate directly with any of its neighbors b, meaning nodes for which the edge (a,b)

belongs to the edge set. Associated with agent ais a collection Ha={Ha,i}na

i=1of nahypotheses to

be tested. The sum N=∑a∈Vnacorresponds to the the total number of hypotheses. Out of these,

let H0

arepresent the (unknown) subset of true null hypotheses at node a, and let H0=∪a∈VH0

represent the overall set of true null hypotheses. We assume that the agent at each node observes one

p-value corresponding to each hypothesis at that node, labeled P

a,1,...,P

a,na. It is understood that

when a null hypothesis is true, the corresponding p-value is stochastically larger than the uniform

distribution (super-uniform, for short), meaning that

for all Ha,i∈H0, we have Pr{P

a,i≤u}≤ufor all u∈[0,1].(1)

Given observations of the p-values at its node, each agent may communicate with its neighbors,

and must then decide which subset of its hypotheses to reject; each rejection is called a discovery,

and rejecting a true null hypothesis corresponds to a false discovery. Let Rbe the total number

of discoveries, and Vbe the total number of false discoveries. Even though each agent makes a

local decision with only local information, we might still desire to control a global measure of error.

However, even in the centralized setting, there is no single notion of type-1 error for multiple testing

problems; we discuss two common variants below.

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

10 玖币 0人已下载

立即下载

摘要：

QuTE:decentralizedmultipletestingonsensornetworkswithfalsediscoveryratecontrol*AadityaRamdas1,JianboChen2,MartinJ.Wainwright2;3,MichaelI.Jordan2;3aramdas@cmu.edu,jianbochen@berkeley.edufwainwrig,jordang@berkeley.edu1DepartmentofStatisticsandDataScience,CarnegieMellonUniversityDepartmentsofStatistics...

展开>> 收起<<

QuTE decentralized multiple testing on sensor networks with false discovery rate control Aaditya Ramdas1 Jianbo Chen2.pdf

共15页,预览3页

还剩页未读，继续阅读

声明：本站为文档C2C交易模式，即用户上传的文档直接被用户下载，本站只是中间服务平台，本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私，请立即通知玖贝云文库，我们立即给予删除！

QuTE decentralized multiple testing on sensor networks with false discovery rate control Aaditya Ramdas1 Jianbo Chen2

相关推荐

开通VIP享超值会员特权

作者详情

相关内容

热门标签

举报选择: