Advanced R by Hadley Wickham


Reference classes

R has three object-oriented (OO) systems: S3, S4 and Reference Classes (for a while these were referred to as R5, but their official name is Reference Classes). This page describes the reference-based class system.

Reference Classes (or refclasses) are new in R 2.12. They fill a long-standing need for mutable objects that had previously been filled by non-core packages like R.oo, proto and mutatr. While the core functionality is solid, reference classes are still under active development and some details will change. The most up-to-date documentation for Reference Classes can always be found in ?ReferenceClasses.

There are two main differences between reference classes and S3 and S4:

  • Refclass objects use message-passing OO
  • Refclass objects are mutable: the usual R copy-on-modify semantics do not apply

These properties make this object system behave much more like Java and C#. Surprisingly, the implementation of reference classes is almost entirely in R code: they are a combination of S4 methods and environments. This is a testament to the flexibility of S4.

Reference classes are particularly suited to simulations where you’re modelling complex state, and to GUIs.

Note that when using reference-based classes we want to minimise side effects, and use them only where mutable state is absolutely required. The majority of functions should still be “functional” and free of side effects. This makes code easier to reason about (because you don’t need to worry about methods changing things in surprising ways), and easier for other R programmers to understand.

Limitations: methods can’t use their enclosing environment, because that’s used for the object.

Classes and instances

Creating a new reference-based class is straightforward: you use setRefClass. Unlike setClass from S4, you want to keep the results of that function around, because that’s what you use to create new objects of that type:

# Keep a reference to the generator so you can create objects later.
Person <- setRefClass("Person")
Person$new()

A reference class has three main components, given by three arguments to setRefClass:

  • contains, the classes which the class inherits from. These should be other reference class objects:

    setRefClass("Polygon")
    setRefClass("Regular")
    
    # Specify parent classes
    setRefClass("Triangle", contains = "Polygon")
    setRefClass("EquilateralTriangle", 
      contains = c("Triangle", "Regular"))
  • fields are the equivalent of slots in S4. They can be specified as a vector of field names, or a named list of field types:

    setRefClass("Polygon", fields = c("sides"))
    setRefClass("Polygon", fields = list(sides = "numeric"))

The most important property of refclass objects is that they are mutable, or equivalently they have reference semantics:

    Polygon <- setRefClass("Polygon", fields = c("sides"))
    square <- Polygon$new(sides = 4)
    
    triangle <- square
    triangle$sides <- 3
    
    square$sides  # 3: triangle and square refer to the same object
  • methods are functions that operate within the context of the object and can modify its fields. These can also be added after the class is created, as described below.

    setRefClass("Dist")
    setRefClass("DistUniform", fields = c("a", "b"), contains = "Dist",
      methods = list(
        # Methods are named list elements, so use =, not <-
        mean = function() {
          (a + b) / 2
        }
      ))
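
The three components can be seen working together in a small sketch. It reuses the class names from the snippets above, but the typed fields, the object u and the variable res are assumptions made for illustration:

```r
library(methods)

# Illustrative only: typed fields are an assumption, not part of the text above.
Dist <- setRefClass("Dist")
DistUniform <- setRefClass("DistUniform",
  fields = list(a = "numeric", b = "numeric"),
  contains = "Dist",
  methods = list(
    mean = function() {
      (a + b) / 2
    }
  )
)

u <- DistUniform$new(a = 0, b = 10)
u$mean()        # 5
is(u, "Dist")   # TRUE, because DistUniform inherits from Dist

# A typed field rejects a value of the wrong class with an error
res <- tryCatch({ u$a <- "zero"; "assigned" }, error = function(e) "error")
res             # "error"
```

Declaring field types costs little and turns a wrong assignment into an immediate error rather than a failure later on.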

You can also add methods after creation:

# Instead of creating a class all at once:
Person <- setRefClass("Person", methods = list(
  say_hello = function() message("Hi!")
))

# You can build it up piece-by-piece
Person <- setRefClass("Person")
Person$methods(say_hello = function() message("Hi!"))

It’s not currently possible to modify fields after the class has been created, because adding fields would invalidate existing objects that didn’t have those fields.

The object returned by setRefClass (or retrieved later by getRefClass) is called a generator object. It has methods:

  • new for creating new objects of that class. The new method takes named arguments specifying initial values for the fields

  • methods for modifying existing or adding new methods

  • help for getting help about methods

  • fields to get a list of fields defined for class

  • lock locks the named fields so that their value can only be set once

  • accessors a convenience method that automatically sets up accessors of the form getXXX and setXXX.
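
A minimal sketch of some of these generator methods in use. The class name Account, its fields and the deposit method are hypothetical names chosen only for illustration:

```r
library(methods)

# "Account" and its fields are hypothetical, used only for illustration.
Account <- setRefClass("Account",
  fields = list(owner = "character", balance = "numeric"))

Account$fields()          # named character vector of field classes

# Add a method after the class has been created
Account$methods(deposit = function(x) {
  balance <<- balance + x
  invisible(.self)
})

# Lock a field before creating objects: it can then be set only once
Account$lock("owner")

acct <- Account$new(owner = "Ann", balance = 100)
acct$deposit(50)
acct$balance              # 150

# A second assignment to the locked field is an error
locked <- tryCatch({ acct$owner <- "Bob"; FALSE }, error = function(e) TRUE)
locked                    # TRUE
```

Here methods and lock are called before any objects are created, which seems the safest order because objects that already exist do not pick up later changes to the class.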

Methods

Refclass methods are associated with objects, not with functions, and are called using the special syntax obj$method(arg1, arg2, ...). (You might recall we’ve seen this construction before when we called functions stored in a named list.) Methods are also special because they can modify fields. This is different from ordinary R functions, which work on copies of their arguments and so cannot change them.

We’ve also seen this construct before, when we used closures to create mutable state. Reference classes work in a similar manner but give us some extra functionality:

  • inheritance
  • a way of documenting methods
  • a way of specifying fields and their types

Inside a method, modify fields with <<-. If a field is defined by an accessor function, the assignment will call that function.
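
To make the comparison with closures concrete, here is a sketch of the same counter written both ways. The names make_counter, Counter, next_value and cnt are hypothetical and used only for illustration:

```r
library(methods)

# Closure version: the state lives in the function's enclosing environment.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1
    count
  }
}

# Refclass version: the state lives in a field, and the method
# updates it with <<- (a plain <- would only create a local variable).
Counter <- setRefClass("Counter",
  fields = list(count = "numeric"),
  methods = list(
    increment = function() {
      count <<- count + 1
      count
    }
  )
)

next_value <- make_counter()
next_value()
next_value()        # 2

cnt <- Counter$new(count = 0)
cnt$increment()
cnt$increment()     # 2
cnt$count           # 2: the field is also visible from outside
```

Both versions keep mutable state, but only the refclass version exposes the state as a named, typed field.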

Special fields: .self, which refers to the object itself. (Don’t use fields with names starting with . as these may be used for special purposes in future versions.)
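
One common use of .self is to return the object from a method so that calls can be chained. A minimal sketch, assuming a hypothetical Stack class:

```r
library(methods)

# "Stack" is a hypothetical class, used only for illustration.
Stack <- setRefClass("Stack",
  fields = list(items = "list"),
  methods = list(
    push = function(x) {
      items <<- c(items, list(x))
      invisible(.self)    # return the object itself so calls can be chained
    },
    size = function() {
      length(items)
    }
  )
)

s <- Stack$new()
s$push(1)$push(2)$push(3)
s$size()    # 3
```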

initialize

Common methods

Because all refclass classes inherit from the same superclass, envRefClass, they have a common set of methods:

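  • obj$callSuper: calls the method of the same name in the parent class. It is used from inside a method that overrides an inherited one.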

  • obj$copy: creates a copy of the current object. This is necessary because reference class objects don’t behave like most R objects, which are copied on modification.

  • obj$field: named access to fields. Equivalent to slot access in S4. obj$field("xxx") is the same as obj$xxx, and obj$field("xxx", 5) is the same as obj$xxx <- 5.

  • obj$import(x) coerces x into this object, and obj$export(Class) coerces a copy of obj into that class. In both cases the other class should be a superclass.

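  • obj$initFields: initialises the fields of the object from the supplied named arguments. It is usually called from inside an initialize method.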
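
The sketch below exercises several of these methods together. The classes Person and Student, their fields and the describe method are hypothetical names chosen only for illustration:

```r
library(methods)

# "Person" and "Student" are hypothetical classes, used only for illustration.
Person <- setRefClass("Person",
  fields = list(name = "character"),
  methods = list(
    initialize = function(...) {
      initFields(...)                          # fill fields from named arguments
      if (length(name) == 0) name <<- "Unknown"
    },
    describe = function() {
      paste("Person:", name)
    }
  )
)

Student <- setRefClass("Student",
  contains = "Person",
  fields = list(school = "character"),
  methods = list(
    describe = function() {
      # callSuper() runs Person's describe() on this object
      paste(callSuper(), "at", school)
    }
  )
)

s1 <- Student$new(name = "Ann", school = "MIT")
s2 <- s1$copy()            # an independent object
s3 <- s1                   # a second name for the same object

s1$field("name", "Bea")    # same as s1$name <- "Bea"
s1$describe()              # "Person: Bea at MIT"
s2$name                    # "Ann": the copy is unaffected
s3$name                    # "Bea": s3 and s1 are the same object
```

Note that the initialize method is written so that it also works when called with no arguments, since it is run whenever a new object of the class is created.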

Documentation

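Methods can be documented with Python-style docstrings: a string placed as the first expression in the body of the method. The generator’s help method displays them.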
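
A minimal sketch of a documented method, assuming a hypothetical Greeter class with a greet method:

```r
library(methods)

# "Greeter" is a hypothetical class, used only for illustration.
Greeter <- setRefClass("Greeter",
  fields = list(greeting = "character"),
  methods = list(
    greet = function(who) {
      "Return a greeting addressed to who."
      paste0(greeting, ", ", who, "!")
    }
  )
)

Greeter$help("greet")     # shows the call and the docstring

g <- Greeter$new(greeting = "Hello")
g$greet("world")          # "Hello, world!"
```

The docstring is an ordinary string that is evaluated and discarded when the method runs, so it does not affect the return value.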

In packages

Note: collation

Note: namespaces and exporting

In the wild

Rook package. Scales package?