QuTE decentralized multiple testing on sensor networks with false discovery rate control Aaditya Ramdas1 Jianbo Chen2

2025-04-29 0 0 1.49MB 15 页 10玖币
侵权投诉
QuTE: decentralized multiple testing on sensor networks
with false discovery rate control*
Aaditya Ramdas1, Jianbo Chen2,
Martin J. Wainwright2,3, Michael I. Jordan2,3
aramdas@cmu.edu, jianbochen@berkeley.edu
{wainwrig,jordan}@berkeley.edu
1Department of Statistics and Data Science, Carnegie Mellon University
Departments of Statistics2and EECS3, UC Berkeley
Nov 14, 2018
Abstract
This paper designs methods for decentralized multiple hypothesis testing on graphs that are
equipped with provable guarantees on the false discovery rate (FDR). We consider the setting
where distinct agents reside on the nodes of an undirected graph, and each agent possesses p-
values corresponding to one or more hypotheses local to its node. Each agent must individually
decide whether to reject one or more of its local hypotheses by only communicating with its
neighbors, with the joint aim that the global FDR over the entire graph must be controlled at
a predefined level. We propose a simple decentralized family of Query-Test-Exchange (QuTE)
algorithms and prove that they can control FDR under independence or positive dependence of the
p-values. Our algorithm reduces to the Benjamini-Hochberg (BH) algorithm when after graph-
diameter rounds of communication, and to the Bonferroni procedure when no communication
has occurred or the graph is empty. To avoid communicating real-valued p-values, we develop a
quantized BH procedure, and extend it to a quantized QuTE procedure. QuTE works seamlessly
in streaming data settings, where anytime-valid p-values may be continually updated at each
node. Last, QuTE is robust to arbitrary dropping of packets, or a graph that changes at every step,
making it particularly suitable to mobile sensor networks involving drones or other multi-agent
systems. We study the power of our procedure using a simulation suite of different levels of
connectivity and communication on a variety of graph structures, and also provide an illustrative
real-world example.
1 Introduction
Research on decentralized detection and hypothesis testing has a long history, dating back to seminal
work from the 1980s (e.g., [25, 24]), and with more recent work motivated in particular by wireless
sensor networks (e.g., [6, 1, 16, 26, 4]). A variety of issues have been addressed, including robust-
ness [11], quantization error [13], the sequential nature of data collection [10], battery/energy man-
agement of the sensors [23], tolerance of misbehaving nodes [22], nonparametric methods [12, 15],
*This paper appeared in the IEEE CDC’17 conference proceedings [18]. The last two sections were then developed in
2018, after which the paper stagnated, and it is being put on arXiv now simply to broaden access.
arXiv:2210.04334v1 [stat.ME] 9 Oct 2022
and bandwidth constraints [14]. However, these have focused on the ability to make a decision about a
single global hypothesis in a distributed fashion. In contrast, this paper studies the testing of multiple
hypotheses in a decentralized manner.
To be more specific, consider a graph with one agent at each node, and suppose that each agent
wishes to test one or more binary hypotheses that are local to its node, with each hypothesis corre-
sponding to the presence or absence of some signal. Of this collection of hypotheses, an unknown
subset corresponds to true null hypotheses (absence of signal), whereas the rest are false (presence of
signal), and should be rejected. Apart from having access to the data and hypotheses at its own node,
each agent may also directly communicate with neighboring nodes on the graph. Our setup is directly
related to, yet complementary to, the works of [9, 20, 27, 19].
For explicit context, one example of such an agent could be a low-power sensor with the ability to
perform local measurements and simple computations. Then, the graph may represent a multitude of
such sensors deployed over a widespread area, such as a forest, building or aircraft. Due to power con-
straints, the agent may communicate locally with agents at neighboring nodes, but only infrequently
communicate with a distant centralized server. A rejected hypothesis corresponds to a discovery of
some sort, such as a detected anomaly or a deviation from expected or recent measurements, and it is
only in such cases that a local node is allowed to communicate with the central server.
Each agent aims to test their local null hypotheses, and reject a subset of nulls, proclaiming them
to be false based on its local neighborhood information. We assume that each agent can calculate a
p-value for each of its local hypotheses, which provides evidence against the respective null. This
paper is agnostic to the exact methods that the node uses to calculate its local p-values; this topic has
received significant attention in past work, and can be done in various ways. Each agent must use the
local p-values, and possibly those of its neighbors, to decide which hypotheses to reject locally. The
challenge is that these local decisions must be made with the aim of controlling a global error metric.
In this paper, the error metric is chosen as the overall false discovery rate (FDR) across the whole
graph, and we require that the FDR must be controlled at a pre-defined level.
The false discovery rate (FDR) is an error metric that generalizes the notion of type-1 error, or
false positive error, to the setting of multiple hypotheses. It is defined as the expected ratio of the
number of false discoveries to the total number of discoveries. It was first proposed by Eklund [7],
and various heuristic methods were discussed by Eklund and Seeger [8, 21]. Its contemporary usage
has been popularized by the seminal paper of Benjamini and Hochberg [2] who rigorously analyzed
a procedure that is now known as the Benjamini-Hochberg (BH) procedure.
Algorithms to control the FDR make sense in settings where type-1 errors are possibly more
costly than type-2 errors. Since such an asymmetry arises naturally in many situations, the FDR
has nevertheless become one of the de-facto standard error metrics employed across a wide range of
applied scientific fields. Its most common application is in a batched, centralized setting, where all
the p-values are pooled, and a decision is made by thresholding them at some data-dependent level,
such as the level determined by the Benjamini-Hochberg (BH) procedure [2].
In this paper, we develop a novel family of algorithms for decentralized FDR control on undirected
graphs. Both the setting and the family of algorithms are new, to the best of our knowledge. Building
on recent theoretical advances, we prove that global FDR over the whole graph is controlled under
two types of dependence assumptions between the p-values at the different nodes; the setting of
independent p-values is easiest, but a general type of “positive dependence” termed PRDS can also
be handled.
A given instantiation of the QuTE algorithm is simple and scalable, and works as follows. First,
a node queries its neighbors for their p-values. Second, it conducts a local test on the local neighbor-
hood p-values at an appropriate level determined by the ratio of the size of its neighborhood to the
size of the graph. Third, if a node thinks that neighboring hypotheses should be rejected, it informs
the relevant neighbors. Lastly, each node rejects one or more of its local hypotheses as long as either
its local test suggested so, or if any of its neighbors suggested so. We run simulations on a variety
of graphs, including Erd¨
os-R´
enyi random graphs and planar grid graphs, to provide some intuition
about their achieved FDR and power. We also give an illustrative example of possible real-world
applications of our setup and algorithm using a public Intel Lab dataset involving low-power sensors
taking various kinds of measurements in a large open space.
Beyond guarantees on FDR control, our family of QuTE algorithms has another appealing prop-
erty. When the graph is either complete (contains all edges) or a star, there exists at least one node
that can access all the information in the network, in which case our algorithms reduce to the batch
Benjamini-Hochberg procedure. When the graph is empty (without edges), and each node tests only
one hypothesis without information about other nodes in the network, then our algorithms essentially
reduce to the conservative Bonferroni procedure. Hence, one can view our algorithm as bridging the
two extremes, depending on how much information is available at each node.
The rest of this paper is organized as follows. Section 2 presents the problem setup, including
definitions and assumptions. We formally describe our algorithms and their associated theoretical
guarantees in Section 3. We present the proof of our main theorem in Appendix A and Appendix B.
We run a variety of simulations in Section 4 and illustrate an application to real data in Section 5. A
discussion of our work and future directions can be found in Section ??.
2 Background and problem setup
Consider a graph G= (V,E), where each node aVis associated with an agent. A given agent
amay communicate directly with any of its neighbors b, meaning nodes for which the edge (a,b)
belongs to the edge set. Associated with agent ais a collection Ha={Ha,i}na
i=1of nahypotheses to
be tested. The sum N=aVnacorresponds to the the total number of hypotheses. Out of these,
let H0
arepresent the (unknown) subset of true null hypotheses at node a, and let H0=aVH0
a
represent the overall set of true null hypotheses. We assume that the agent at each node observes one
p-value corresponding to each hypothesis at that node, labeled P
a,1,...,P
a,na. It is understood that
when a null hypothesis is true, the corresponding p-value is stochastically larger than the uniform
distribution (super-uniform, for short), meaning that
for all Ha,iH0, we have Pr{P
a,iu}ufor all u[0,1].(1)
Given observations of the p-values at its node, each agent may communicate with its neighbors,
and must then decide which subset of its hypotheses to reject; each rejection is called a discovery,
and rejecting a true null hypothesis corresponds to a false discovery. Let Rbe the total number
of discoveries, and Vbe the total number of false discoveries. Even though each agent makes a
local decision with only local information, we might still desire to control a global measure of error.
However, even in the centralized setting, there is no single notion of type-1 error for multiple testing
problems; we discuss two common variants below.
摘要:

QuTE:decentralizedmultipletestingonsensornetworkswithfalsediscoveryratecontrol*AadityaRamdas1,JianboChen2,MartinJ.Wainwright2;3,MichaelI.Jordan2;3aramdas@cmu.edu,jianbochen@berkeley.edufwainwrig,jordang@berkeley.edu1DepartmentofStatisticsandDataScience,CarnegieMellonUniversityDepartmentsofStatistics...

收起<<
QuTE decentralized multiple testing on sensor networks with false discovery rate control Aaditya Ramdas1 Jianbo Chen2.pdf

共15页,预览3页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:15 页 大小:1.49MB 格式:PDF 时间:2025-04-29

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 15
客服
关注