A Realistic Guide to Making Data Available Alongside Code to Improve Reproducibility.

Abstract

Data makes science possible. Sharing data improves visibility and makes the research process transparent, which increases trust in the work and allows for independent reproduction of results. However, data from a large proportion of published research is available only to the original authors. Scientists advocate for sharing data and its benefits are well known, yet most advice discusses those broader benefits rather than the practical considerations of sharing. This paper provides practical, actionable advice on how to share data alongside research. The key message is that sharing data falls on a continuum, and entering it should come with minimal barriers.

Slides available here

Working paper available here

Take home messages

You don't have to do every single thing to publish your data
Take small steps - get the data somewhere first, add more detail as you go
Try and get a DOI from a service like Zenodo or Dryad

Thanks

Karthik Ram
Miles McBain
Anna Krystalli
Daniella Lowenberg
ACEMS International Mobility Programme
Helmsley Charitable Trust
Gordon and Betty Moore Foundation
Sloan Foundation

Resources

Colophon

Slides made using xaringan
Extended with xaringanthemer
Colours taken + modified from lorikeet theme from ochRe
Header font is Josefin Sans
Body text font is Montserrat
Code font is Fira Mono

Bio

Dr. Nicholas Tierney (PhD Statistics, BPsySci (Honours)) is a Lecturer in Business Analytics and Statistics at Monash University, working with Professors Dianne Cook and Rob Hyndman. His research aims to improve data analysis workflow and make data analysis more accessible. Crucial to this work is producing high quality software to accompany each research idea. Most recently, Nick's work has focussed on exploring longitudinal data (brolgar) and improving how we share data alongside research (ddd). Other work has focussed on exploring data with the R package visdat, and on creating analysis principles and tools to simplify working with, exploring, and modelling missing data with the package naniar. Nick has experience working with decision trees (treezy), optimisation (maxcovr), Bayesian Data Analysis, and MCMC diagnostics (mmcc).

Nick is a member of the rOpenSci collective, which works to make science open using R. He was the lead organiser for the rOpenSci ozunconf events from 2016 to 2018 (2016, 2017, 2018), and co-hosts the rstats podcast "Credibly Curious" with Dr. Saskia Freytag. Outside of research, Nick likes to hike, rock climb, make coffee, bake sourdough, (eventually) knit a hat, take photos, and explore new hobbies.

