Be Prospective, Not Retrospective: A Philosophy for Advancing
Reproducibility in Modern Biological Research
Griffin Chure
Department of Biology, Stanford University, Stanford, CA, USA.
griffinchure@gmail.com
October 7, 2022
Abstract
The ubiquity of computation in modern scientific research inflicts new challenges for reproducibility. While
most journals now require code and data be made available, the standards for organization, annotation, and
validation remain lax, making the data and code often difficult to decipher or practically use. I believe that
this is due to the documentation, collation, and validation of code and data only being done in retrospect. In
this essay, I reflect on my experience contending with these challenges and present a philosophy for prioritizing
reproducibility in modern biological research where balancing computational analysis and wet-lab experiments
is commonplace. Modern tools used in scientific workflows (such as GitHub repositories) lend themselves well
to this philosophy where reproducibility begins at project inception, not completion. To that end, I present
and provide a programming-language agnostic template architecture that can be immediately copied and made
bespoke to your next paper, whether your labwork is wet, dry, or somewhere in between.
Introduction
I entered graduate school in the Fall of 2013 determined to become a biophysicist even though my undergraduate
training was almost entirely focused on qualitative molecular biology and biochemistry. While I anticipated the
long nights of catching up with concepts from physics and the various mathematical methods that they required,
I had not anticipated how difficult it would be to juggle my newfound love for computational research with
my expertise in wet-lab biology. I struggled in keeping my physical lab notebook—filled with images of gels,
marginal calculations of dilution factors, and the occasional stain of buffer—in logical sync with the code I would
use to process microscopy images or explore some aspects of my theoretical work. The code I wrote was cryptic
with sparse documentation, confusing variable names, and paths to directories that sometimes had never existed.
All of this finally came to a head in one night of misery before my PhD candidacy exam. Needing to rerun some
piece of analysis from the previous year, I opened my Python script only to find that it was hard-coded to read
data from a folder on my Desktop that had been deleted several months prior. Frustrated and sleep deprived
at the time, I saw this as a personal scientific failure. I can now look back at that night with the certainty it
transformed the way I would approach my science for the rest of my PhD and beyond. After my candidacy
exam, I vowed I would never again burn myself (nor anybody else) by having my science be unorganized and
irreproducible.
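The failure mode from that night can be made concrete. Below is a minimal sketch (the file and directory names are hypothetical, not from the original analysis) contrasting a hard-coded absolute path with paths anchored to the project itself, so the analysis survives a reorganized Desktop:

```python
from pathlib import Path

# Brittle: an absolute path breaks as soon as the folder is renamed or deleted.
BRITTLE = Path("/Users/someone/Desktop/old_analysis/data.csv")

# More durable: anchor all paths to the project root, so the whole project
# can be moved or cloned without editing any scripts. (The root falls back
# to the working directory when the script location is unavailable, e.g. in
# an interactive session.)
PROJECT_ROOT = Path(__file__).resolve().parent if "__file__" in globals() else Path.cwd()
DATA_DIR = PROJECT_ROOT / "data"

def data_path(filename: str) -> Path:
    """Resolve a raw data file inside the project, failing loudly if absent."""
    target = DATA_DIR / filename
    if not target.exists():
        raise FileNotFoundError(
            f"Expected {target}; did the project layout change?"
        )
    return target
```

The point is not the specific helper, but that a path error surfaces immediately, with a message naming the expected location, rather than a year later on the eve of an exam.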
arXiv:2210.02593v1 [q-bio.OT] 5 Oct 2022

Whether or not science is really experiencing a reproducibility crisis [1–3], accessing raw data or code from
other scientists is often a difficult endeavor. This has been recently demonstrated by Gabelica et al. [4], who
attempted to obtain data listed as “available upon reasonable request” from ∼1800 recent papers in the
biosciences. Of these articles, only ∼7% ultimately shared their data, meaning that 93% of the studies could
not pass even the first stage of reproducibility. The reasons for this low response rate are varied, but are similar
to those published in another recent meta-analysis from Stodden et al. [5]. While more tightly focused in scope,
they had similar issues and were ultimately able to receive data and code from about 35% of their ∼200 queries.
Even when data was provided, the authors were able to reproduce the scientific results from only ∼60%. In
cases where data was not shared, the reasons varied from institutional/ethical restrictions to outright refusal,
as their “code was not written with an eye toward distributing for other people to use” (Ref. [5], p. 2585).
This can create a slew of problems. Trisovic et al. [6] recently demonstrated that only ∼25% of code released
alongside research papers could be run without error. This represents a view of computation held by many
scientists; it’s an exercise in personal research, never intended to be used by someone else. This pulls me back to
that fateful night in preparing for my candidacy exam. Not only did I write that code without an eye towards
sharing with others, I didn’t even write it for my future self.
Recent years have seen a flurry of excellent papers outlining best practices for reproducible research, spanning
from scientific programming guidelines [7–10], to general and specialized data annotation [11,12], to instructions
for bundling entire projects as “reproducible packages” [13], and I encourage the reader to give them a look.
However, I take a different approach in this essay and give my perspective as a practicing biologist who thinks
about how to maximize reproducibility alongside designing, executing, and analyzing experiments.
Data as modern scientific currency
I view research as a journey with the generation, manipulation, visualization, and interpretation of data as
the overarching themes. Here, I take a very general definition of “data” to mean “a collection of qualitative or
quantitative facts” such that results from simulations, mathematical analysis, and bench-top experiments are
treated equivalently as data-generating processes. While we often remark that the “data speak for themselves”,
this is never truly the case. Not only do you give the data their voices, you give them the language they speak.
Reproducibility requires a Rosetta stone such that anyone can perform the translation and come to the same
results.
Consider the “typical” cycle of science as depicted in Figure 1. Beginning with hypotheses, experiments are
designed to thoroughly test and falsify them¹, resulting in the generation of new data. These data, whether
they come from tangible or computational experiments, often need to be manipulated through processing,
cleaning, and analysis pipelines before they can be truly understood. In all cases, these data must be visualized
in a way where the experimenter can use their expertise and logical creativity to interpret the results, allowing
conclusions to be drawn and the hypothesis to be confirmed, refuted, or refined. In the modern scientific
enterprise, each of these steps require a combination of instructions that are physical and targeted to humans
(protocols, observations, notes, etc.) and digital records which are computer-readable (code, instrument settings,
accession numbers, etc.). In order for this process to be reproducible, each of these steps must have their
instructions meticulously kept and clearly documented. With enough care, these instructions come together to
serve as your Rosetta stone.
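One way to keep the digital half of these instructions is to have the code record them itself. As a sketch (the function, field names, and threshold are illustrative, not from any published pipeline), a processing step can emit a small provenance record alongside its output, so the translation from raw data to result is documented by the very code that performed it:

```python
import json
from datetime import datetime, timezone

def clean_measurements(raw, threshold=0.05):
    """Drop readings below a detection threshold and record exactly how."""
    kept = [r for r in raw if r >= threshold]
    provenance = {
        "step": "clean_measurements",
        "threshold": threshold,
        "n_input": len(raw),
        "n_kept": len(kept),
        "run_at": datetime.now(timezone.utc).isoformat(),
    }
    return kept, provenance

readings = [0.01, 0.12, 0.33, 0.04, 0.25]
cleaned, record = clean_measurements(readings)
print(json.dumps(record, indent=2))  # both human- and machine-readable
```

Saved next to the processed data, a record like this answers the questions a future reader (including your future self) will ask: what was done, with what parameters, to how many points, and when.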
Philosophical pillars for reproducibility
“Making your research reproducible” is easier to say than to do. Through my years of experience in prioritizing
reproducibility in my own work, I’ve found four key principles to be critical to performing my research in a
reproducible manner [Figure 2(A)]. While the detailed structure or the questions I pose may not be appropriate
for your particular project or experiment, the philosophy behind it will likely still apply. This allows you to make
a tailor-made reproducible workflow from the ground up in a way that others can follow.
¹ In exploratory research, experiments are designed to properly collect data from which hypotheses will be drawn. In meta-analyses,
the “experiments” may be the collection of data from previously published papers or other resources. In either case, the cycle shown in
Figure 1 still applies.
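Making reproducibility prospective can start with laying down the project skeleton before any data exist. The directory names below are illustrative only (not the specific template architecture this paper presents); a short script can stamp out such a scaffold, seeding each directory with a README stub that states its purpose:

```python
from pathlib import Path

# Illustrative scaffold; adapt the directory names to your own project.
SCAFFOLD = {
    "data": "Raw and processed data; never edit raw files by hand.",
    "code": "Processing and analysis scripts, one per logical step.",
    "protocols": "Human-targeted wet-lab instructions.",
    "figures": "Generated figures; each should be rebuildable from code.",
}

def init_project(root):
    """Create the project directories, each with a README stating its role."""
    root = Path(root)
    for name, purpose in SCAFFOLD.items():
        directory = root / name
        directory.mkdir(parents=True, exist_ok=True)
        (directory / "README.md").write_text(f"# {name}\n\n{purpose}\n")
    return sorted(p.name for p in root.iterdir())
```

Running this at project inception costs seconds, but it means every gel image, script, and protocol written afterward already has an obvious, documented home.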