
they had similar issues and were ultimately able to receive data and code from about 35% of their ≈200 queries.
Even when data was provided, the authors were able to reproduce the scientific results from only ≈60%. In
cases where data was not shared, the reasons varied from institutional/ethical restrictions to outright refusal
as their “code was not written with an eye toward distributing for other people to use.” (Ref. [5], p. 2585).
This can create a slew of problems. Trisovic et al. [6] recently demonstrated that only ≈25% of code released
alongside research papers could be run without error. This reflects a view of computation held by many
scientists: it's an exercise in personal research, never intended to be used by someone else. This pulls me back to
that fateful night in preparing for my candidacy exam. Not only did I write that code without an eye towards
sharing with others, I didn’t even write it for my future self.
Recent years have seen a flurry of excellent papers outlining best practices for reproducible research, spanning
from scientific programming guidelines [7–10], to general and specialized data annotation [11,12], to instructions
for bundling entire projects as “reproducible packages” [13], and I encourage the reader to give them a look.
However, I take a different approach in this essay and give my perspective as a practicing biologist who thinks
about how to maximize reproducibility alongside designing, executing, and analyzing experiments.
Data as modern scientific currency
I view research as a journey with the generation, manipulation, visualization, and interpretation of data as
the overarching themes. Here, I take a very general definition of “data” to mean “a collection of qualitative or
quantitative facts” such that results from simulations, mathematical analysis, and bench-top experiments are
treated equivalently as data-generating processes. While we often remark that the “data speak for themselves”,
this is never truly the case. Not only do you give the data their voices, you give them the language they speak.
Reproducibility requires a Rosetta stone such that anyone can perform the translation and come to the same
results.
Consider the “typical” cycle of science as depicted in Figure 1. Beginning with hypotheses, experiments are
designed to thoroughly test and falsify them¹, resulting in the generation of new data. These data, whether
they come from tangible or computational experiments, often need to be manipulated through processing,
cleaning, and analysis pipelines before they can be truly understood. In all cases, these data must be visualized
in a way that lets the experimenter use their expertise and logical creativity to interpret the results, allowing
conclusions to be drawn and the hypothesis to be confirmed, refuted, or refined. In the modern scientific
enterprise, each of these steps requires a combination of instructions that are physical and targeted to humans
(protocols, observations, notes, etc.) and digital records that are computer-readable (code, instrument settings,
accession numbers, etc.). For this process to be reproducible, the instructions for each step must be
meticulously kept and clearly documented. With enough care, these instructions come together to
serve as your Rosetta stone.
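As a concrete illustration of what such a computer-readable record might look like, the short Python sketch below writes a small metadata file alongside the outputs of one analysis step, capturing instrument settings, accession numbers, and the version of the code that was run. The field names, file paths, and example values here are hypothetical placeholders of my own choosing, not a prescription from this essay; the point is only that such a record can be generated automatically at the moment the analysis runs.

import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path

def write_run_record(output_dir, instrument_settings, accession_numbers):
    """Save a machine-readable record of one analysis step next to its outputs.

    All field names and values are illustrative; adapt them to whatever your
    own experiment or pipeline actually needs in order to be rerun.
    """
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "instrument_settings": instrument_settings,  # e.g. exposure time, magnification
        "accession_numbers": accession_numbers,      # e.g. public data deposits used
    }
    # Record the exact version of the analysis code, if run inside a git repository.
    try:
        record["code_version"] = subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True
        ).strip()
    except (OSError, subprocess.CalledProcessError):
        record["code_version"] = "unknown (not run from a git repository)"

    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "run_record.json").write_text(json.dumps(record, indent=2))

# Hypothetical usage for a single imaging experiment:
write_run_record(
    "processed/2021-03-15_growth_curves",
    instrument_settings={"exposure_ms": 100, "magnification": "100x"},
    accession_numbers=["PRJNA000000"],
)

A record like this is cheap to produce, lives next to the data it describes, and can be read by both a human and a script, which is exactly the dual role the instructions above must play.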
Philosophical pillars for reproducibility
“Making your research reproducible” is easier to say than to do. Through my years of experience in prioritizing
reproducibility in my own work, I’ve found four key principles to be critical to performing my research in a
reproducible manner [Figure 2(A)]. While the detailed structure or the questions I pose may not be appropriate
for your particular project or experiment, the philosophy behind them will likely still apply. This allows you to build
a tailor-made reproducible workflow from the ground up in a way that others can follow.
¹ In exploratory research, experiments are designed to properly collect data from which hypotheses will be drawn. In meta-analyses,
the “experiments” may be the collection of data from previously published papers or other resources. In either case, the cycle shown in
Figure 1 still applies.