Reproducibility

A very brief overview

Research Associate | Webber Group

2024-03-14

Follow the slides online

Academic journals suck

  • Journal status and paper quality are poorly correlated1,2
  • The space is dominated by five big publishing houses - they are evil3
  • Strong bias against negative results
  • Do they spend their staggering profits checking for obvious signs of fraud, encouraging replication, or even just making science more readable and accessible? Of course not.4
  • Nice documentary on the business of scholarship here5

GUIs suck

  • Graphical user interface (GUI) tools like Excel, SPSS & GraphPad are very opaque and error-prone, as our government learnt during COVID6
  • The Excel mistake heard around the world and the lasting economic repercussions7
  • Proprietary software - many people can’t access it and therefore can’t replicate the analysis
  • No obvious history of changes made or operations performed

Stats suck

  • Frequentist statistics is used almost exclusively for all science
    • It is extremely unintuitive, prone to abuse, and rarely done correctly in practice (p-hacking)8
    • Great video on why p-values are hella variable9
  • Bayesian statistics is a fundamentally different approach: no null-hypothesis machinery means no p-values, and no p-values means no p-hacking
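The point about p-value variability can be sketched with a quick simulation (a stdlib-only Python sketch; the effect size, sample sizes, and replication counts are arbitrary choices for illustration). Running the exact same experiment repeatedly yields wildly different p-values:

```python
import random
import statistics

random.seed(42)

def permutation_p(a, b, n_perm=2000):
    """Two-sided permutation-test p-value for a difference in means."""
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = a + b
    count = 0
    for _ in range(n_perm):
        random.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:len(a)]) -
                   statistics.fmean(pooled[len(a):]))
        if diff >= observed:
            count += 1
    # Add-one correction keeps the p-value in (0, 1]
    return (count + 1) / (n_perm + 1)

# Repeat the *same* experiment 50 times: modest true effect, small samples.
pvals = []
for _ in range(50):
    treated = [random.gauss(0.5, 1.0) for _ in range(15)]
    control = [random.gauss(0.0, 1.0) for _ in range(15)]
    pvals.append(permutation_p(treated, control))

print(f"min p = {min(pvals):.3f}, max p = {max(pvals):.3f}")
print(f"fraction 'significant' (p < .05): {sum(p < 0.05 for p in pvals) / len(pvals):.2f}")
```

Identical data-generating process, identical analysis, yet the p-values sprawl across the whole range: a single study's p-value is a very noisy summary.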

Experimental designs suck

  • Positive control? pretty pls?

Most published research is false

  • Whenever people look at this, things don’t look great…8,10–12
  • Citations aren’t a good metric of quality either13
  • Small sample sizes are a big issue, especially in neuroscience14
  • That time a major paper that supposedly discovered Aβ*56, an amyloid-β oligomer species, turned out to be full of image manipulations, whoops15

An investigator cannot guarantee that the claims made in a study are correct

  • Reproducibility is important not because it ensures that the results are correct, but rather because it ensures transparency and gives us confidence in understanding exactly what was done.

Solutions

Registered reports

  • “But the big IF journals don’t do registered reports”
  • Yeah, they’re evil remember? Of course they won’t do anything to make science not suck
    • Also, Nature does do them now (they’re still evil though)

An idealised workflow - part 1

  1. Come up with neat project idea - create repository on DRI GitHub (can be private)
    • Use this to organise the project (add collaborators, make Gantt charts, etc.)
  2. pre-register study (can be embargoed) and write up registered report
    • Can do this on OSF
  3. publish registered report > get valuable feedback from wise and courteous reviewers (let me dream pls)
    • In a diamond open access journal ideally!

An idealised workflow - part 2

  1. Apply for funding if you don’t already have it
    • Some funders are finally pulling their heads out and formally incorporating this into applications
  2. Do study as you said you would (and do any additional work/exploratory stuff as it comes up!)
  3. Publish all the things!
    • Paper, data, code, container of computational environment (Can use Zenodo to generate DOIs for non-paper bits!)

Let there be code

  • Code is just a set of instructions to tell a computer to do something
  • It’s an explicit bridge between your raw experimental data (which should be sacrosanct!) and your reported stats/figures
  • Great video guide here16
  • Version control: wouldn’t it be nice to have a detailed record of all the changes made to all the files in a project when and by whom? Use Git and the DRI GitHub!
  • Use whatever, as long as it’s open-source
  • Containerisation: encapsulate your computational environment17
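As one sketch of that explicit bridge from raw data to reported numbers (a minimal Python example; the file paths and the CSV column name are hypothetical placeholders): the raw file is never edited, and every derived number can be regenerated by re-running the script.

```python
import csv
import statistics
from pathlib import Path

RAW = Path("data/raw/measurements.csv")   # sacrosanct: read-only input
OUT = Path("results/summary.txt")         # derived: safe to delete and rebuild

def summarise(rows):
    """Compute summary stats from rows with a 'value' column."""
    values = [float(r["value"]) for r in rows]
    return {"n": len(values),
            "mean": statistics.fmean(values),
            "sd": statistics.stdev(values)}

def main():
    with RAW.open(newline="") as f:
        stats = summarise(list(csv.DictReader(f)))
    OUT.parent.mkdir(parents=True, exist_ok=True)
    OUT.write_text("\n".join(f"{k}: {v}" for k, v in stats.items()) + "\n")

if __name__ == "__main__" and RAW.exists():
    main()
```

Commit the script (not the outputs) to Git, and anyone with the raw data and the containerised environment can reproduce the summary exactly.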

Particular tool suggestions/learning resources

Data sharing policy

Thanks for listening

References

1.
2. Brembs, B. Prestigious science journals struggle to reach even average reliability. Frontiers in Human Neuroscience 12, (2018).
3. Racimo, F. et al. Ethical publishing: How do we get there? Philosophy, Theory, and Practice in Biology 14, (2022).
4.
5.
6.
7. Ryssdal, K. The Excel mistake heard round the world. Marketplace (2013).
8. Head, M. L., Holman, L., Lanfear, R., Kahn, A. T. & Jennions, M. D. The extent and consequences of p-hacking in science. PLOS Biology 13, e1002106 (2015).
9.
10. Baker, M. 1,500 scientists lift the lid on reproducibility. Nature 533, 452–454 (2016).
11. Ioannidis, J. P. A. Why most published research findings are false. PLOS Medicine 2, e124 (2005).
12. Smaldino, P. E. & McElreath, R. The natural selection of bad science. Royal Society Open Science 3, 160384 (2016).
13. Yang, Y., Youyou, W. & Uzzi, B. Estimating the deep replicability of scientific findings using human and artificial intelligence. Proceedings of the National Academy of Sciences 117, 10762–10768 (2020).
14. Button, K. S. et al. Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience 14, 365–376 (2013).
15.
16. Çetinkaya-Rundel, M. Improve your workflow for reproducible science. (2020).
17. Nüst, D. et al. Ten simple rules for writing dockerfiles for reproducible data science. PLOS Computational Biology 16, e1008316 (2020).
18. Bryan, J., the STAT 545 TAs & Hester, J. Let’s Git started | Happy Git and GitHub for the useR.
19.