How to write a reproducible example
You are most likely to get good help with your R problem if you provide a reproducible example. A reproducible example allows someone else to recreate your problem by just copying and pasting R code.
There are four things you need to include to make your example reproducible: required packages, data, code, and a description of your R environment.
Packages should be loaded at the top of the script, so it’s easy to see which ones the example needs.
The easiest way to include data in an email is to use dput() to generate the R code to recreate it. For example, to recreate the mtcars dataset in R, I’d perform the following steps:
- Run dput(mtcars) in R
- Copy the output
- In my reproducible script, type mtcars <- then paste.
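As a rough sketch of the steps above, you can check that what dput() prints really does recreate the data. The names small, code and recreated are made up for this illustration, and cutting mtcars down to a few rows and columns is only an assumption to keep the pasted output short.

```r
# Hypothetical small subset of mtcars, just to keep the dput() output short
small <- head(mtcars[, c("mpg", "cyl")], 3)

# Capture what dput() would print, as a single string of R code
code <- paste(capture.output(dput(small)), collapse = "\n")

# Evaluating that code stands in for "type small <- then paste"
recreated <- eval(parse(text = code))

# Compare the recreated object with the original
all.equal(small, recreated)
```

If the comparison reports differences, the pasted code will not give other people the same data you have.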
Spend a little bit of time ensuring that your code is easy for others to read:
- Make sure you’ve used spaces and that your variable names are concise but informative.
- Use comments to indicate where your problem lies.
- Do your best to remove everything that is not related to the problem. The shorter your code is, the easier it is to understand.
Include the output of sessionInfo() as a comment. This summarises your R environment and makes it easy to check if you’re using an out-of-date package.
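One possible way to turn the sessionInfo() output into comments is sketched below. The variable names info and commented are only illustrative, and the text printed will depend on your own R installation.

```r
# Capture the printed sessionInfo() output as a character vector, one element per line
info <- capture.output(sessionInfo())

# Prefix every line with "#" so the whole block can be pasted into a script as comments
commented <- paste("#", info)

# Print the commented lines, ready to copy
writeLines(commented)
```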
You can check you have actually made a reproducible example by starting up a fresh R session and pasting your script in.
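If you would rather not paste by hand, a similar check can be sketched from within R by running the script in a separate, clean R process. This assumes the Rscript executable that ships with R is available. The two-line script written to a temporary file is a stand-in for your own reproducible script.

```r
# Stand-in for your reproducible script, written to a temporary file
script <- tempfile(fileext = ".R")
writeLines(c("x <- c(1, 2, 3)", "print(mean(x))"), script)

# Locate the Rscript executable belonging to the running R installation
rscript <- file.path(R.home("bin"), "Rscript")

# Run the script in a fresh process; --vanilla skips saved workspaces and profile files
out <- system2(rscript, c("--vanilla", shQuote(script)), stdout = TRUE, stderr = TRUE)

# A non-zero exit code is attached as a "status" attribute; NULL means the script ran cleanly
status <- attr(out, "status")
writeLines(out)
```

An error here that you do not see in your usual session suggests the script depends on something that is only present in your workspace.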
Before putting all of your code in an email, consider putting it on http://gist.github.com/. It will give your code nice syntax highlighting, and you don’t have to worry about anything getting mangled by the email system.
Example
Here’s an illustration of how to create a reproducible example. First, have R print out your data in a format that can be copy-pasted:
# For this example, use the built-in BOD data set. Replace this with your data.
dput(BOD)
#> structure(list(Time = c(1, 2, 3, 4, 5, 7), demand = c(8.3, 10.3,
#> 19, 16, 15.6, 19.8)), class = "data.frame", row.names = c(NA,
#> -6L), reference = "A1.4, p. 270")
Then you can use that output to create a reproducible example:
library(ggplot2)
# Save the data structure in variable BOD
BOD <- structure(list(Time = c(1, 2, 3, 4, 5, 7), demand = c(8.3, 10.3,
19, 16, 15.6, 19.8)), class = "data.frame", row.names = c(NA,
-6L), reference = "A1.4, p. 270")
# Some example code that uses the data
ggplot(BOD, aes(x = Time, y = demand)) + geom_line()
Check that others can run this code by simply copying and pasting it into a new R session.