Reproducibility

A very brief overview

Research Associate | Webber Group

2024-03-14

Follow the slides online

Academic journals suck

  • Journal status and paper quality are poorly correlated1,2
  • The space is dominated by five big publishing houses - they are evil3
  • Strong bias against negative results
  • Do they spend their staggering profits checking for obvious signs of fraud, encouraging replication, or even just making science more readable and accessible? Of course not.4
  • Nice documentary on the business of scholarship here5

GUIs suck

  • Graphical user interface (GUI) tools like Excel, SPSS & GraphPad are very opaque and error-prone, as our government learnt during COVID6
  • The Excel mistake heard around the world and the lasting economic repercussions7
  • Proprietary software - many people can’t access it and therefore can’t replicate the analysis
  • No obvious history of changes made or operations performed

Stats suck

  • Frequentist statistics is used almost exclusively for all science
    • It is extremely unintuitive, prone to abuse, and rarely done correctly in practice (p-hacking)8
    • Great video on why p-values are hella variable9
  • Bayesian statistics is a fundamentally different approach: no null-hypothesis machinery means no p-values, and no p-values means no p-hacking
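The point about p-value variability can be sketched with a quick simulation (a stdlib-only Python sketch; the effect size, sample sizes, and replication counts are arbitrary choices for illustration). Running the exact same experiment repeatedly yields wildly different p-values:

```python
import random
import statistics

random.seed(42)

def permutation_p(a, b, n_perm=2000):
    """Two-sided permutation-test p-value for a difference in means."""
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = a + b
    count = 0
    for _ in range(n_perm):
        random.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:len(a)]) -
                   statistics.fmean(pooled[len(a):]))
        if diff >= observed:
            count += 1
    # Add-one correction keeps the p-value in (0, 1]
    return (count + 1) / (n_perm + 1)

# Repeat the *same* experiment 50 times: modest true effect, small samples.
pvals = []
for _ in range(50):
    treated = [random.gauss(0.5, 1.0) for _ in range(15)]
    control = [random.gauss(0.0, 1.0) for _ in range(15)]
    pvals.append(permutation_p(treated, control))

print(f"min p = {min(pvals):.3f}, max p = {max(pvals):.3f}")
print(f"fraction 'significant' (p < .05): {sum(p < 0.05 for p in pvals) / len(pvals):.2f}")
```

Identical data-generating process, identical analysis, yet the p-values sprawl across the whole range: a single study's p-value is a very noisy summary.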

Experimental designs suck

  • Positive control? pretty pls?

Most published research is false

  • Whenever people look at this, things don’t look great…8,10–12
  • Citations aren’t a good metric of quality either13
  • Small sample sizes are a big issue, especially in neuroscience14
  • That time a major paper that supposedly discovered Aβ*56, an amyloid-β oligomer species, turned out to be full of image manipulations, whoops15

An investigator cannot guarantee that the claims made in a study are correct

  • Reproducibility is important not because it ensures that the results are correct, but rather because it ensures transparency and gives us confidence in understanding exactly what was done.

Solutions

Registered reports

  • “But the big IF journals don’t do registered reports”
  • Yeah, they’re evil remember? Of course they won’t do anything to make science not suck
    • Also, Nature does do them now (they’re still evil though)

An idealised workflow - part 1

  1. Come up with neat project idea - create repository on DRI GitHub (can be private)
    • Use this to organise the project (add collaborators, make Gantt charts, etc.)
  2. pre-register study (can be embargoed) and write up registered report
    • Can do this on OSF
  3. publish registered report > get valuable feedback from wise and courteous reviewers (let me dream pls)
    • In a diamond open access journal ideally!

An idealised workflow - part 2

  1. Apply for funding if you don’t already have it
    • Some funders are finally pulling their heads out and formally incorporating this into applications
  2. Do study as you said you would (and do any additional work/exploratory stuff as it comes up!)
  3. Publish all the things!
    • Paper, data, code, container of computational environment (Can use Zenodo to generate DOIs for non-paper bits!)

Let there be code

  • Code is just a set of instructions to tell a computer to do something
  • It’s an explicit bridge between your raw experimental data (which should be sacrosanct!) and your reported stats/figures
  • Great video guide here16
  • Version control: wouldn’t it be nice to have a detailed record of all the changes made to all the files in a project when and by whom? Use Git and the DRI GitHub!
  • Use whatever, as long as it’s open-source
  • Containerisation: encapsulate your computational environment17
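As one sketch of that explicit bridge from raw data to reported numbers (a minimal Python example; the file paths and the CSV column name are hypothetical placeholders): the raw file is never edited, and every derived number can be regenerated by re-running the script.

```python
import csv
import statistics
from pathlib import Path

RAW = Path("data/raw/measurements.csv")   # sacrosanct: read-only input
OUT = Path("results/summary.txt")         # derived: safe to delete and rebuild

def summarise(rows):
    """Compute summary stats from rows with a 'value' column."""
    values = [float(r["value"]) for r in rows]
    return {"n": len(values),
            "mean": statistics.fmean(values),
            "sd": statistics.stdev(values)}

def main():
    with RAW.open(newline="") as f:
        stats = summarise(list(csv.DictReader(f)))
    OUT.parent.mkdir(parents=True, exist_ok=True)
    OUT.write_text("\n".join(f"{k}: {v}" for k, v in stats.items()) + "\n")

if __name__ == "__main__" and RAW.exists():
    main()
```

Commit the script (not the outputs) to Git, and anyone with the raw data and the containerised environment can reproduce the summary exactly.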

Particular tool suggestions/learning resources

Data sharing policy

Thanks for listening

References

1.
2. Brembs, B. Prestigious science journals struggle to reach even average reliability. Frontiers in Human Neuroscience 12, (2018).
3. Racimo, F. et al. Ethical publishing: How do we get there? Philosophy, Theory, and Practice in Biology 14, (2022).
4.
5.
6.
7. Ryssdal, K. The Excel mistake heard round the world. Marketplace (2013).
8. Head, M. L., Holman, L., Lanfear, R., Kahn, A. T. & Jennions, M. D. The extent and consequences of p-hacking in science. PLOS Biology 13, e1002106 (2015).
9.
10. Baker, M. 1,500 scientists lift the lid on reproducibility. Nature 533, 452–454 (2016).
11. Ioannidis, J. P. A. Why most published research findings are false. PLOS Medicine 2, e124 (2005).
12. Smaldino, P. E. & McElreath, R. The natural selection of bad science. Royal Society Open Science 3, 160384 (2016).
13. Yang, Y., Youyou, W. & Uzzi, B. Estimating the deep replicability of scientific findings using human and artificial intelligence. Proceedings of the National Academy of Sciences 117, 10762–10768 (2020).
14. Button, K. S. et al. Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience 14, 365–376 (2013).
15.
16. Çetinkaya-Rundel, M. Improve your workflow for reproducible science. (2020).
17. Nüst, D. et al. Ten simple rules for writing dockerfiles for reproducible data science. PLOS Computational Biology 16, e1008316 (2020).
18. Bryan, J., the STAT 545 TAs & Hester, J. Let’s Git started | Happy Git and GitHub for the useR.
19.