Probabilistic Model Incorporating Auxiliary Covariates to Control FDR

Lin Qiu

lin.qiu.stats@gmail.com

The Pennsylvania State University

State College, PA, USA

Nils Murrugarra-Llerena

nmurrugarrallerena@weber.edu

Weber State University

Ogden, UT, USA

Vítor Silva

vitor.silva.sousa@gmail.com

Snap Inc.

Santa Monica, CA, USA

Lin Lin

l.lin@duke.edu

Duke University

Durham, NC, USA

Vernon M. Chinchilli

vchinchi@psu.edu

The Pennsylvania State University

Hershey, PA, USA

ABSTRACT

Controlling the False Discovery Rate (FDR) while leveraging side information in multiple hypothesis testing is an emerging research topic in modern data science. Existing methods rely on test-level covariates while ignoring metrics about test-level covariates. This strategy may not be optimal for complex large-scale problems, where indirect relations often exist among test-level covariates and auxiliary metrics or covariates. We incorporate auxiliary covariates among test-level covariates in a deep black-box framework (named NeurT-FDR) which boosts statistical power and controls FDR for multiple hypothesis testing. Our method parametrizes the test-level covariates as a neural network and adjusts the auxiliary covariates through a regression framework, which enables flexible handling of high-dimensional features as well as efficient end-to-end optimization. We show that NeurT-FDR makes substantially more discoveries in three real datasets compared to competitive baselines.

CCS CONCEPTS

• Mathematics of computing → Probabilistic algorithms.

KEYWORDS

Social Media Content Understanding, Multiple Hypothesis Testing, FDR Control

ACM Reference Format:

Lin Qiu, Nils Murrugarra-Llerena, Vítor Silva, Lin Lin, and Vernon M. Chinchilli. 2022. Probabilistic Model Incorporating Auxiliary Covariates to Control FDR. In Proceedings of the 31st ACM International Conference on Information and Knowledge Management (CIKM ’22), October 17–21, 2022, Atlanta, GA, USA. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3511808.3557672

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

CIKM ’22, October 17–21, 2022, Atlanta, GA, USA

ACM ISBN 978-1-4503-9236-5/22/10. . . $15.00

https://doi.org/10.1145/3511808.3557672

1 INTRODUCTION

In modern statistics, from genetics and neuroimaging to online advertising, researchers routinely test thousands or millions of hypotheses at a time [ ] to discover unique data instances. Current approaches [ ] solve this problem via Multiple Hypothesis Testing (MHT). MHT aims to maximize the number of discoveries while controlling the False Discovery Rate (FDR). For example, in social media, we may want to identify posts that are more popular than typical ones. Similarly, in biology, we may want to discover which cancer cells respond positively to treatment with a new drug.
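To make the goal of FDR control concrete, the following is a minimal sketch of the classical Benjamini-Hochberg (BH) step-up procedure, which uses p-values only and no covariates. It is included purely as an illustration of what "maximizing discoveries at a target FDR" means and is not the method proposed in this paper; the function name `benjamini_hochberg`, the example p-values, and the level `alpha=0.05` are assumptions chosen for the example.

```python
def benjamini_hochberg(p_values, alpha=0.1):
    """Return a list of booleans marking which hypotheses are rejected
    (i.e., declared discoveries) by the BH step-up procedure at level alpha."""
    m = len(p_values)
    # Indices of the hypotheses, sorted by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= alpha * k / m.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k = rank
    # Reject the k hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


# Illustrative (made-up) p-values for ten hypotheses.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
discoveries = benjamini_hochberg(p_values, alpha=0.05)
print(sum(discoveries), "discoveries")
```

With these made-up inputs only the two smallest p-values fall below their rank-dependent thresholds, so two hypotheses are reported as discoveries.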

Existing MHT approaches [ ] only apply covariate-adaptive FDR procedures on top of test-level covariates to improve detection power while maintaining the target FDR. Test-level covariates only describe characteristics of the samples in the dataset, such as the metadata of social media posts or the genomic profile of each cell. However, depending on the domain, we may have access to complementary information beyond test-level covariates that can facilitate the work of MHT approaches. For example, as shown in Figure 1, in the social media domain the goal is to find engaging content, and a post can be represented by visual tags and metadata. Additionally, content consumption metrics, such as the number of views and content view time, are available. These metrics encapsulate information that facilitates MHT. This additional information is called auxiliary covariates and corresponds to the samples in the dataset. More specifically, content consumption metrics do not describe characteristics of the sample (i.e., the posted content) but how users interact with the platform to access this content. Typically, such auxiliary covariates are of lower dimension than the test-level covariates (e.g., visual tags) and are more structured.
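As a rough illustration of how side information can raise detection power, the sketch below implements weighted BH, one simple and well-known covariate-adaptive idea: each p-value is divided by a nonnegative weight (normalized to average one) before the BH step-up rule is applied. This is not NeurT-FDR and is not claimed to be any of the baselines used in this paper. The function name `weighted_bh`, the p-values, and the weights are all hypothetical; in particular the weights are hand-set, standing in for prior weights that would be derived from a covariate independently of the p-values.

```python
def weighted_bh(p_values, weights, alpha=0.1):
    """Weighted BH: rescale each p-value by its normalized weight, then apply
    the BH step-up rule. Weights must be chosen without looking at the p-values."""
    m = len(p_values)
    mean_w = sum(weights) / m
    # Normalize so the weights average to one.
    norm_w = [w / mean_w for w in weights]
    # A weight of zero means the hypothesis can never be rejected.
    q = [p / w if w > 0 else float("inf") for p, w in zip(p_values, norm_w)]
    order = sorted(range(m), key=lambda i: q[i])
    # Largest rank k with q_(k) <= alpha * k / m.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if q[idx] <= alpha * rank / m:
            k = rank
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


# Made-up p-values; the hypothetical covariate suggests the first four
# hypotheses are more likely to be non-null, so they get larger weights.
p_values = [0.001, 0.008, 0.012, 0.035, 0.30, 0.50, 0.70, 0.90]
uniform = [1.0] * 8
informed = [2.0, 2.0, 2.0, 2.0, 0.5, 0.5, 0.5, 0.5]

n_plain = sum(weighted_bh(p_values, uniform, alpha=0.05))
n_weighted = sum(weighted_bh(p_values, informed, alpha=0.05))
print(n_plain, n_weighted)
```

With uniform weights the procedure reduces to plain BH and rejects three hypotheses on these inputs; with the informed weights it rejects four. Learning-based approaches replace the hand-set weights with a model fitted to the covariates.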

In this paper, we present a hierarchical probabilistic black-box method, named NeurT-FDR, which incorporates test-level and auxiliary covariates to control the FDR. Our main contributions can be summarized as follows:

• We pioneer the use of both auxiliary and test-level covariates for multiple hypothesis testing problems.

• We develop a novel MHT model that jointly learns from test-level and auxiliary covariates through a neural network, which enables efficient optimization and gracefully handles high-dimensional hypothesis covariates.

arXiv:2210.03178v1 [stat.ML] 6 Oct 2022
