Reproducible workflows

A proposal to turn ideals into action

Research Associate | Williams Group

2022-12-13

Most science is bad

Thanks to bad incentives

“Today I wouldn’t get an academic job. It’s as simple as that. I don’t think I would be regarded as productive enough.” Peter Higgs (The Guardian, 6 Dec 2013)

“I’ve been on a number of search committees. I don’t remember anybody looking at anybody’s papers. Number and IF [impact factor] of pubs are what counts.” Terry McGlynn (realscientists; tweet, 21 October 2015, 4:12 p.m.)

So many examples

Moving beyond ideology

  • I propose we work towards building a practical workflow for embedding reproducibility throughout all projects:
    • Build it around a GitHub organisation (either at the centre or group level)
    • Employ self-documented templates for a consistent folder structure
    • Agree on standardised file naming conventions (descriptive, human- and machine-readable, with sane default ordering and search)
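A naming convention is easiest to keep when it can be checked automatically. The sketch below shows one way to do that in Python. The convention itself (ISO date, lowercase hyphenated description, optional two-digit version, extension) is only an assumption for illustration, as are the names `NAME_PATTERN`, `parse_name` and `is_valid_name`; the group would agree on its own pattern.

```python
import re
from typing import Optional

# Hypothetical convention (an assumption, not part of the proposal):
#   YYYY-MM-DD_descriptive-slug[_vNN].ext
# The ISO date prefix gives sane default ordering, the slug is human
# readable, and the fixed delimiters make the name machine readable.
NAME_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})_"
    r"(?P<slug>[a-z0-9]+(?:-[a-z0-9]+)*)"
    r"(?:_v(?P<version>\d{2}))?"
    r"\.(?P<ext>[a-z0-9]+)$"
)


def parse_name(filename: str) -> Optional[dict]:
    """Split a file name into its parts, or return None if it does not conform."""
    match = NAME_PATTERN.match(filename)
    return match.groupdict() if match else None


def is_valid_name(filename: str) -> bool:
    """Report whether a file name follows the assumed convention."""
    return parse_name(filename) is not None
```

For example, `is_valid_name("2022-12-13_western-blot-quantification_v01.csv")` is true, while `is_valid_name("Final data (2).xlsx")` is false. Because the date comes first, a plain alphabetical sort of conforming names is also a chronological sort.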

An example - you have a new hypothesis

  1. Copy the template to a new repository
  2. Begin planning: use Project boards to build timelines and assign tasks/deadlines to specific users (students/post-docs)
  3. First aim for a registered report
  4. Use Issues to track ideas, tasks and troubleshooting (protocols)
  5. Use a literate programming approach (Quarto, Jupyter Notebooks, Org-mode, etc.) for writing all documentation (notes, papers, slides, etc.)
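Step 1 can be done through GitHub itself, but the same idea works locally. The sketch below copies a template folder to start a new project, assuming the template is an ordinary directory on disk. The function name `copy_template` and the folder names in the usage note are hypothetical; the template's Git history is deliberately left behind so the new project starts with a clean one.

```python
import shutil
from pathlib import Path


def copy_template(template: Path, destination: Path) -> Path:
    """Copy a project template into a new project folder.

    Refuses to overwrite an existing project and skips the template's
    .git directory so the new repository can be initialised separately.
    """
    if destination.exists():
        raise FileExistsError(f"{destination} already exists")
    shutil.copytree(
        template,
        destination,
        ignore=shutil.ignore_patterns(".git"),
    )
    return destination
```

A call such as `copy_template(Path("project-template"), Path("new-hypothesis"))` would reproduce the template's folder structure and its README files under `new-hypothesis`, ready to be pushed to a new repository in the organisation.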

Incorporating funding

  • If funding isn’t already in place, one could use the registered report and repo to showcase budgets, timelines and lay summaries on your own terms (no need to adapt to asinine funder-specific structures)
  • Begin experiments as normal, being sure to continue using the GitHub repo to update progress (tasks completed, delays, troubleshooting, etc.)
  • Keep project contributions of any size logged on the repo (acknowledgement for technicians/RAs and others)

Write up and publish

  • Write up the paper and publish as normal (ideally avoiding for-profit journals and sticking to diamond open access)
  • Share raw data (and ideally pre-processed data) either in the GitHub repo or via links to its location in other repositories
  • Provide a container with the computational environment used for analysis (Docker/Singularity)

Thanks for listening

Any feedback or suggestions welcome!

Any questions?

  • Or scan this QR code:
