Advanced R by Hadley Wickham


How to write a reproducible example

You are most likely to get good help with your R problem if you provide a reproducible example. A reproducible example allows someone else to recreate your problem by just copying and pasting R code.

There are four things you need to include to make your example reproducible: required packages, data, code, and a description of your R environment.

  • Packages should be loaded at the top of the script, so it’s easy to see which ones the example needs.

  • The easiest way to include data in an email is to use dput() to generate the R code to recreate it. For example, to recreate the mtcars dataset in R, I’d perform the following steps:

      1. Run dput(mtcars) in R.
      2. Copy the output.
      3. In my reproducible script, type mtcars <- and then paste the output.

  • Spend a little bit of time ensuring that your code is easy for others to read:

      • Make sure you’ve used spaces and that your variable names are concise but informative.

      • Use comments to indicate where your problem lies.

      • Do your best to remove everything that is not related to the problem. The shorter your code is, the easier it is to understand.

  • Include the output of sessionInfo() as a comment. This summarises your R environment and makes it easy to check if you’re using an out-of-date package.
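
The data and environment steps above can be sketched in a few lines of base R. This is a minimal illustration rather than the only way to do it: the names small, txt, rebuilt, and info are made up for the example, and trimming mtcars to its first three rows with head() is an assumption to keep the pasted text short.

```r
# Hypothetical stand-in for your own data: the first three rows of mtcars
small <- head(mtcars, 3)

# Capture what dput() would print, as a character vector of code lines
txt <- capture.output(dput(small))

# Evaluating that text rebuilds the object, which is what a reader does
# when they paste `small <- structure(...)` into their own session
rebuilt <- eval(parse(text = txt))
identical(rebuilt, small)

# Print sessionInfo() with every line commented out, ready to paste
# at the bottom of the script
info <- paste("#", capture.output(sessionInfo()))
writeLines(info)
```

Comparing the rebuilt object with the original is a quick way to confirm that the pasted dput() text really does recreate the data before you send it.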

You can check you have actually made a reproducible example by starting up a fresh R session and pasting your script in.
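
One way to automate that check, sketched under some assumptions, is to run the script through Rscript with the --vanilla flag, which starts R without restoring a saved workspace or reading profile and environment files. The two-line script written to a temporary file below is a hypothetical placeholder for your own example, and the sketch assumes the Rscript binary sits in the directory reported by R.home("bin").

```r
# Hypothetical script contents; replace these lines with your own example
script <- tempfile(fileext = ".R")
writeLines(c(
  "x <- c(1, 2, 3)",
  "print(mean(x))"
), script)

# Rscript binary that belongs to the running R installation (assumed location)
rscript <- file.path(R.home("bin"), "Rscript")

# --vanilla skips the saved workspace and the profile and environment files,
# so the script only sees what it loads itself
status <- system2(rscript, c("--vanilla", shQuote(script)))

# system2() returns the exit status here; 0 means the script ran without error
status == 0
```

A non-zero status usually means the script depends on something that only exists in your interactive session, such as an object you created earlier or a package you forgot to load.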

Before putting all of your code in an email, consider putting it on http://gist.github.com/. It will give your code nice syntax highlighting, and you don’t have to worry about anything getting mangled by the email system.

Example

Here’s an illustration of how to create a reproducible example. First, have R print out your data in a format that can be copy-pasted:

# For this example, use the built-in BOD data set. Replace this with your data.
dput(BOD)
#> structure(list(Time = c(1, 2, 3, 4, 5, 7), demand = c(8.3, 10.3, 
#> 19, 16, 15.6, 19.8)), class = "data.frame", row.names = c(NA, 
#> -6L), reference = "A1.4, p. 270")

Then you can use that output to create a reproducible example:

library(ggplot2)

# Save the data structure in variable BOD
BOD <- structure(list(Time = c(1, 2, 3, 4, 5, 7), demand = c(8.3, 10.3,
19, 16, 15.6, 19.8)), class = "data.frame", row.names = c(NA,
-6L), reference = "A1.4, p. 270")

# Some example code that uses the data
ggplot(BOD, aes(x = Time, y = demand)) + geom_line()

Check that others can run this code by copying and pasting it into a new R session.