forked from hadley/adv-r
-
Notifications
You must be signed in to change notification settings - Fork 0
/
S4.Rmd
672 lines (472 loc) · 28.2 KB
/
S4.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
# S4
```{r setup, include = FALSE}
source("common.R")
source("emoji.R")
# Hide annoying output
setMethod <- function(...) invisible(methods::setMethod(...))
setGeneric <- function(...) invisible(methods::setGeneric(...))
setValidity <- function(...) invisible(methods::setValidity(...))
code <- function(...) paste0("`", ..., "`")
```
## Introduction
S4 provides a formal approach to functional OOP. The underlying ideas are similar to S3 (the topic of Chapter \@ref(s3)), but implementation is much stricter and makes use of specialised functions for creating classes (`setClass()`), generics (`setGeneric()`), and methods (`setMethod()`). Additionally, S4 provides both multiple inheritance (i.e. a class can have multiple parents) and multiple dispatch (i.e. method dispatch can use the class of multiple arguments).
An important new component of S4 is the __slot__, a named component of the object that is accessed using the specialised subsetting operator `@` (pronounced at). The set of slots, and their classes, forms an important part of the definition of an S4 class.
### Outline {-}
* Section \@ref(s4-basics) gives a quick overview of the main components of S4:
classes, generics, and methods.
* Section \@ref(s4-classes) dives into the details of S4 classes, including
prototypes, constructors, helpers, and validators.
* Section \@ref(s4-generics) shows you how to create new S4 generics, and
how to supply those generics with methods. You'll also learn about
accessor functions which are designed to allow users to safely inspect and
modify object slots.
* Section \@ref(s4-dispatch) dives into the full details of method dispatch
in S4. The basic idea is simple, the it rapidly gets more complex once
multiple inheritance and multiple dispatch are combined.
* Section \@ref(s4-s3) discusses the interaction between S4 and S3, showing
you how to use them together.
### Learning more {-}
Like the other OO chapters, the focus here will be on S4 works, not how to deploy it most effectively. If you do want to use it in practice, there are two main challenges:
* There is no one reference that will answer all your questions about S4.
* R's built-in documentation sometimes clashes with community best practices.
As you move towards more advanced usage, you will need to piece together needed information by carefully reading the documentation, asking questions on StackOverflow, and performing experiments. Some recommendations:
* The Bioconductor community is a long-term user S4 and have produced much of
best material about its effective use. Start with [S4 classes and
methods][bioc-s4-class] taught by Martin Morgan and Hervé Pagès, or
check for a newer version at [Bioconductor course materials][bioc-courses].
Martin Morgan is a member of R-core and the project lead of Bioconductor.
He's a world expert on the practical use of S4, and I recommend reading
anything he has written about it, starting with the questions he has
answered on [stackoverflow][SO-Morgan].
* John Chambers is the author of the S4 system, and provides an overview
of its motivation and historical context in "Object-oriented programming,
functional programming and R" [@chambers-2014]. For a fuller exploration
of S4, see his book "Software for Data Analysis" [@s4da].
### Prerequisites {-}
All functions related to S4 live in the methods package. This package is always available when you're running R interactively, but may not be available when running R in batch mode, i.e. from `Rscript`[^Rscript]. For this reason, it's a good idea to call `library(methods)` whenever you use S4. This also signals to the reader that you'll be using the S4 object system.
[^Rscript]: This is a historical quirk introduced because the methods package used to take a long time to load and `Rscript` is optimised for fast command line invocation.
```{r}
library(methods)
```
## Basics {#s4-basics}
We'll start with a quick overview of the main components of S4. You define a S4 class by calling `setClass()` with the class name and a definition of its slots, the names and classes of the class data:
```{r}
setClass("Person",
slots = c(
name = "character",
age = "numeric"
)
)
```
Once the class is defined, you can construct new objects from it by calling `new()` with the name of the class and a value for each slot:
```{r}
john <- new("Person", name = "John Smith", age = NA_real_)
```
\indexc{"@}
\indexc{slot()}
\index{subsetting!S4}
\index{S4!subsetting}
Given an S4 object you can see it's class with `is()` and access slots with `@` (equivalent to `$`) and `slot()` (equivalent to `[[`):
```{r}
is(john)
john@name
slot(john, "age")
```
Generally, you should only use `@` in your methods. If you're working with someone else's class, look for __accessor__ functions that allow you to safely set and get slot values. As the developer of a class, you should also provide your own accessor functions. Accessors are typically S4 generics allowing multiple classes to share the same external interface.
Here we'll create a setter and getter for the `age` slot by first creating generics with `setGeneric()`:
```{r}
setGeneric("age", function(x) standardGeneric("age"))
setGeneric("age<-", function(x, value) standardGeneric("age<-"))
```
And then defining methods with `setMethod()`:
```{r}
setMethod("age", "Person", function(x) x@age)
setMethod("age<-", "Person", function(x, value) {
x@age <- value
x
})
age(john) <- 50
age(john)
```
If you're using an S4 class defined in a package, you can get help on it with `class?Person`. To get help for a method, put `?` in front of a call (e.g. `?age(john)`) and `?` will use the class of the arguments to figure out which help file you need.
Finally, you can use sloop functions to identify S4 objects and generics found in the wild:
```{r}
sloop::otype(john)
sloop::ftype(age)
```
### Exercises
1. `lubridate::period()` returns an S4 class. What slots does it have?
What class is each slot? What accessors does it provide?
1. What other ways can you find help for a method? Read `?"?"` and
summarise the details.
## Classes {#s4-classes}
\index{classes!S4}
\index{S4!classes}
\indexc{setClass()}
To define an S4 class, call `setClass()` with three arguments:
* The class __name__. By convention, S4 class names use `UpperCamelCase`.
* A named character vector that describes the names and classes of the
__slots__ (fields). For example, a person might be represented by a character
name and a numeric age: `c(name = "character", age = "numeric")`. The
pseudo-class `ANY` allows a slot to accept objects of any type.
* A __prototype__, a list of default values for each slot. Technically,
the prototype is optional[^prototype-docs], but you should always provide it.
[^prototype-docs]: `?setClass` recommends that you avoid the `prototype` argument, but this is generally considered to be bad advice.
The code below illustrates the three arguments by creating a `Person` class with character `name` and numeric `age` slots.
```{r, cache = FALSE}
setClass("Person",
slots = c(
name = "character",
age = "numeric"
),
prototype = list(
name = NA_character_,
age = NA_real_
)
)
me <- new("Person", name = "Hadley")
str(me)
```
### Inheritance
\index{S4!inheritance}
\index{inheritance!S4}
There is one other important argument to `setClass()`: `contains`. This specifies a class (or classes) to inherit slots and behaviour from. For example, we can create an `Employee` class that inherits from the `Person` class, adding an extra slot describes their `boss`.
```{r}
setClass("Employee",
contains = "Person",
slots = c(
boss = "Person"
),
prototype = list(
boss = new("Person")
)
)
str(new("Employee"))
```
`setClass()` has 9 other arguments but they are either deprecated or not recommended.
### Introspection
\index{S4!introspection}
\indexc{is()}
To determine what classes an object inherits from, use `is()`:
```{r}
is(new("Person"))
is(new("Employee"))
```
To test if an object inherits from a specific class, use the second argument of `is()`:
```{r}
is(john, "person")
```
### Redefinition
In most programming languages, class definition occurs at compile-time and object construction occurs later, at run-time. In R, however, both definition and construction occur at run time. When you call `setClass()`, you are registering a class definition in a (hidden) global variable. As with all state-modifying functions you need to use `setClass()` with care. It's possible to create invalid objects if you redefine a class after already having instantiated an object:
```{r, error = TRUE}
setClass("A", slots = c(x = "numeric"))
a <- new("A", x = 10)
setClass("A", slots = c(a_different_slot = "numeric"))
a
```
This can cause confusion during interactive creation of new classes. (R6 classes have the same problem, as described in Section \@ref(r6-important-methods).)
### Helper
\index{helpers!S4}
\index{constructors!S4}
\index{S4!helpers}
\indexc{new()}
`new()` is a low-level constructor suitable for use by you, the developer. User-facing classes should always be paired with a user-friendly helper. A helper should always:
* Have the same name as the class, e.g. `myclass()`.
* Have a thoughtfully crafted user interface with carefully chosen default
values and useful conversions.
* Create carefully crafted error messages tailored towards an end-user.
* Finish by calling by calling `methods::new()`.
The `Person` class is so simple so a helper is almost superfluous, but we can use it to clearly define the contract: `age` is optional but `name` is required. We'll also coerce age to a double so the helper also works when passed an integer.
```{r}
Person <- function(name, age = NA) {
age <- as.double(age)
new("Person", name = name, age = age)
}
Person("Hadley")
```
### Validator
\index{validators!S4}
\index{S4!validators}
\indexc{setValidity()}
The constructor automatically checks that the slots have correct classes:
```{r, error = TRUE}
Person(mtcars)
```
You will need to implement more complicated checks (i.e. checks that involve lengths, or multiple slots) yourself. For example, we might want to make it clear that the Person class is a vector class, and can store data about multiple people. That's not currently clear because `@name` and `@age` can be different lengths:
```{r}
Person("Hadley", age = c(30, 37))
```
To enforce these additional constraints we write a validator with `setValidity()`. It takes class and a function that returns `TRUE` if the input is valid, and otherwise returns a character vector describing the problem(s):
```{r}
setValidity("Person", function(object) {
if (length(object@name) != length(object@age)) {
"@name and @age must be same length"
} else {
TRUE
}
})
```
Now we can no longer create an invalid object:
```{r, error = TRUE}
Person("Hadley", age = c(30, 37))
```
NB: The validity method is only called automatically by `new()`, so you can still create an invalid object by modifying it:
```{r}
alex <- Person("Alex", age = 30)
alex@age <- 1:10
```
\indexc{validObject()}
You can explicitly check the validity yourself by calling `validObject()`:
```{r, error = TRUE}
validObject(alex)
```
In Section \@ref(accessors), we'll use `validObject()` to create accessors can not create invalid objects.
### Exercises
1. Extend the Person class with fields to match `utils::person()`.
Think about what slots you will need, what class each slot should have,
and what you'll need to check in your validity method.
1. What happens if you define a new S4 class that doesn't have any slots?
(Hint: read about virtual classes in `?setClass`.)
1. Imagine you were going to reimplement factors, dates, and data frames in
S4. Sketch out the `setClass()` calls that you would use to define the
classes. Think about appropriate `slots` and `prototype`.
## Generics and methods {#s4-generics}
\index{S4!generics}
\index{generics!S4}
\indexc{setGeneric()}
\indexc{standardGeneric()}
The job of a generic is to perform method dispatch, i.e. find the specific implementation for the combination of classes passed to the generic. Here you'll learn how to define S4 generics and methods, then in the next section we'll explore precisely how S4 method dispatch works.
To create a new S4 generic, call `setGeneric()` with a function that calls `standardGeneric()`:
```{r}
setGeneric("myGeneric", function(x) standardGeneric("myGeneric"))
```
By convention, new S4 generics should use `lowerCamelCase`.
It is bad practice to use `{}` in the generic as it triggers a special case that is more expensive, and generally best avoided.
```{r}
# Don't do this!
setGeneric("myGeneric", function(x) {
standardGeneric("myGeneric")
})
```
### Signature
\index{signature}
Like `setClass()`, `setGeneric()` has many other arguments. There is only one that you need to know about: `signature`. This allows you to control the arguments that are used for method dispatch. If `signature` is not supplied, all arguments (apart from `...`) are used. It is occasionally useful to remove arguments from dispatch. This allows you to require that methods provide arguments like `verbose = TRUE` or `quiet = FALSE`, but they don't take part in dispatch.
```{r}
setGeneric("myGeneric",
function(x, ..., verbose = TRUE) standardGeneric("myGeneric"),
signature = "x"
)
```
### Methods
\indexc{setMethod()}
\index{methods!S4}
\index{S4!methods}
A generic isn't useful without some methods, and in S4 you define methods with `setMethod()`. There are three important arguments: the name of the generic, the name of the class, and the method itself.
```{r}
setMethod("myGeneric", "Person", function(x) {
# method implementation
})
```
More formally, the second argument to `setMethod()` is called the __signature__. In S4, unlike S3, the signature can include multiple arguments. This makes method dispatch in S4 substantially more complicated, but avoids having to implement double-dispatch as specially. We'll talk more about multiple dispatch in the next section. `setMethod()` has other arguments, but you should never use them.
To list all the methods that belong to a generic, or that are associated with a class, use `methods("generic")` or `methods(class = "class")`; to find the implementation of a specific method, use `selectMethod("generic", "class")`.
### Show method
\indexc{show()}
\index{S4!show()@\texttt{show()}}
The most commonly defined S4 method controls printing is `show()`, which controls how the object appears when it is printed. To define a method for an existing generic, you must first determine the arguments. You can get those from the documentation or by looking at the `args()` of the generic:
```{r}
args(getGeneric("show"))
```
Our show method needs to have a single argument `object`:
```{r}
setMethod("show", "Person", function(object) {
cat(is(object)[[1]], "\n",
" Name: ", object@name, "\n",
" Age: ", object@age, "\n",
sep = ""
)
})
john
```
### Accessors
\index{S4!accessors}
Slots should be considered an internal implementation detail: they can change without warning and user code should avoid accessing them directly. Instead, all user-accessible slots should be accompanied by a pair of __accessors__. If the slot is unique to the class, this can just be a function:
```{r}
person_name <- function(x) x@name
```
Typically, however, you'll define a generic so that multiple classes can use the same interface:
```{r}
setGeneric("name", function(x) standardGeneric("name"))
setMethod("name", "Person", function(x) x@name)
name(john)
```
If the slot is also writeable, you should provide a setter function. You should always `validObject()` in the setter to prevent the user from creating invalid objects.
```{r, error = TRUE}
setGeneric("name<-", function(x, value) standardGeneric("name<-"))
setMethod("name<-", "Person", function(x, value) {
x@name <- value
validObject(x)
x
})
name(john) <- "Jon Smythe"
name(john)
name(john) <- letters
```
(If the `name<-` notation is unfamiliar, review Section \@ref(function-forms).)
### Exercises
1. Add `age()` accessors for the `Person` class.
1. In the definition of the generic, why is it necessary to repeat the
name of the generic twice?
1. Why does the `show()` method defined in Section \@ref(show-method) use
`is(object)[[1]]`. (Hint: try printing the employee subclass.)
1. What happens if you define a method with different argument names to
the generic?
## Method dispatch {#s4-dispatch}
\index{S4!method dispatch}
\index{method dispatch!S4}
S4 dispatch is complicated because S4 has two important features:
* Multiple inheritance, i.e. a class can have multiple parents,
* Multiple dispatch, i.e. a generic can use multiple arguments to pick a method.
These features make S4 very powerful, but can also make it hard to understand which method will get selected for a given combination of inputs. In practice, keep method dispatch as simple as possible by avoiding multiple inheritance, and reserving multiple dispatch only for where it is absolutely necessary.
But it's important to describe the full details, so here we'll start simple with single inheritance and single dispatch, and work our way up to the more complicated cases. To illustrate the ideas without getting bogged down in the details, we'll use an imaginary __class graph__ based on emoji:
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/s4/emoji.png")
```
Emoji give us very compact class names that evoke the relationships between the classes. It should be straightforward to remember that `r emoji("stuck_out_tongue_winking_eye")` inherits from `r emoji("wink")` which inherits from `r emoji("no_mouth")`, and that `r emoji("sunglasses")` inherits from both `r emoji("dark_sunglasses")` and `r emoji("slightly_smiling_face")`.
### Single dispatch
\index{S4!single dispatch}
Let's start with the simplest case: a generic function that dispatches on a single class with a single parent. The method dispatch here is simple so it's a good place to define the graphical conventions we'll use for the more complex cases.
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/s4/single.png")
```
There are two parts to this diagram:
* The top part, `f(...)`, defines the scope of the diagram. Here we have a
generic with one argument, that has a class hierarchy that is three levels
deep.
* The bottom part is the __method graph__ and displays all the possible methods
that could be defined. Methods that exist, i.e. that have been defined with
`setMethod()`, have a grey background.
To find the method that gets called, you start with the most specific class of the actual arguments, then follow the arrows until you find a method that exists. For example, if you called the function with an object of class `r emoji("wink")` you would follow the arrow right to find the method defined for the more general `r emoji("no_mouth")` class. If no method is found, method dispatch has failed and an error is thrown. In practice, this means that you should alway define methods defined for the terminal nodes, i.e. those on the far right.
\index{S4!pseudo-classes}
\indexc{ANY}
There are two __pseudo-classes__ that you can define methods for. These are called pseudo-classes because they don't actually exist, but allow you to define useful behaviours. The first pseudo-class is `ANY` which matches any class[^s3-default]. For technical reasons that we'll get to later, the link to the `ANY` method is longer than the links between the other classes:
[^s3-default]: The S4 `ANY` pseudoclass plays the same same role as the S3 `default` pseudo-class.
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/s4/single-any.png")
```
\indexc{MISSING}
The second pseudo-class is `MISSING`. If you define a method for this pseudo-class, it will match whenever the argument is missing. It's not useful for single dispatch, but is important for functions like `+` and `-` that use double dispatch and behave differently depending on whether they have one or two arguments.
### Multiple inheritance
\index{S4!multiple inheritance}
\index{multiple inheritance}
Things get more complicated when the class has multiple parents.
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/s4/multiple.png")
```
The basic process remains the same: you start from the actual class supplied to the generic, then follow the arrows until you find a defined method. The wrinkle is that now there are multiple arrows to follow, so you might find multiple methods. If that happens, you pick the method that is closest, i.e. requires travelling the fewest arrows.
NB: while the method graph is a powerful metaphor for understanding method dispatch, implementing it in this way would be rather inefficient, so the actual approach that S4 uses is somewhat different. You can read the details in `?Methods_Details`
What happens if methods are the same distance? For example, imagine we've defined methods for `r emoji("dark_sunglasses")` and `r emoji("slightly_smiling_face")`, and we call the generic with `r emoji("sunglasses")`. Note that no method can be found for the `r emoji("no_mouth")` class, which I'll highlight with a red double outline.
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/s4/multiple-ambig.png")
```
This is called an __ambiguous__ method, and in diagrams I'll illustrate it with a thick dotted border. When this happens in R, you'll get a warning, and the method for the class that comes earlier in the alphabet will be picked (this is effectively random and should not be relied upon). When you discover ambiguity you should always resolve it by providing a more precise method:
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/s4/multiple-ambig-2.png")
```
The fallback `ANY` method still exists but the rules are little more complex. As indicated by the wavy dotted lines, the `ANY` method is always considered further away than a method for a real class. This means that it will never contribute to ambiguity.
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/s4/multiple-any.png")
```
With multiple inheritances it is hard to simultaneously prevent ambiguity, ensure that every terminal method has an implementation, and minimise the number of defined methods (in order to benefit from OOP). For example, of the six ways to define only two methods for this call, only one is free from problems. For this reason, I recommend using multiple inheritance with extreme care: you will need to carefully think about the method graph and plan accordingly.
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/s4/multiple-all.png")
```
### Multiple dispatch
\index{S4!multiple dispatch}
\index{multiple dispatch}
Once you understand multiple inheritance, understanding multiple dispatch is straightforward. You follow multiple arrows in the same way as previously, but now each method is specified by two classes (separated by a comma).
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/s4/single-single.png")
```
I'm not going to show examples of dispatching on more than two arguments, but you can follow the basic principles to generate your own method graphs.
The main difference between multiple inheritance and multiple dispatch is that there are many more arrows to follow. The following diagram shows four defined methods which produce two ambiguous cases:
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/s4/single-single-ambig.png")
```
Multiple dispatch tends to be less tricky to work with than multiple inheritance because there are usually fewer terminal class combinations. In this example, there's only one. That means, at a minimum, you can define a single method and have default behaviour for all inputs.
### Multiple dispatch and multiple inheritance
Of course you can combine multiple dispatch with multiple inheritance:
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/s4/single-multiple.png")
```
A still more complicated case dispatches on two classes, both of which have multiple inheritance:
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/s4/multiple-multiple.png")
```
As the method graph gets more and more complicated it gets harder and harder to predict which method will get called given a combination of inputs, and it gets harder and harder to make sure that you haven't introduced ambiguity. If you have to draw diagrams to figure out what method is actually going to be called, it's a strong indication that you should go back and simplify your design.
### Exercises
1. Draw the method graph for
`r paste0(code("f("), emoji("sweat_smile"), ", ", emoji("kissing_cat"), code(")"))`.
1. Draw the method graph for
`r paste0(code("f("), emoji("smiley"), ", ", emoji("wink"), ", ", emoji("kissing_smiling_eyes"), code(")"))`.
1. Take the last example which shows multiple dispatch over two classes that
use multiple inheritance. What happens if you define a method for all
terminal classes? Why does method dispatch not save us much work here?
## S4 and S3 {#s4-s3}
\index{S4!working with S3}
\index{S3!working with S4}
When writing S4 code, you'll often need to interact with existing S3 classes and generics. This section describes how S4 classes, methods, and generics interact with existing code.
### Classes
\indexc{setOldClass()}
In `slots` and `contains` you can use S4 classes, S3 classes, or the implicit class (Section \@ref(implicit-class)) of a base type. To use an S3 class, you must first register it with `setOldClass()`. You call this function once for each S3 class, giving it the class attribute. For example, the following definitions are already provided by base R:
```{r, eval = FALSE}
setOldClass("data.frame")
setOldClass(c("ordered", "factor"))
setOldClass(c("glm", "lm"))
```
However, it's generally better to be more specific and provide a full S4 definition with `slots` and a `prototype`:
```{r, eval = FALSE}
setClass("factor",
contains = "integer",
slots = c(
levels = "character"
),
prototype = structure(
integer(),
levels = character()
)
)
setOldClass("factor", S4Class = "factor")
```
Generally, these definitions should be provided by the creator of the S3 class. If you're trying to build an S4 class on top of an S3 class provided by a package, you should request that the package maintainer add this call to their package, rather than adding it to your own code.
If an S4 object inherits from an S3 class or a base type, it will have a special virtual slot called `.Data`. This contains the underlying base type or S3 object: \indexc{.Data}
```{r}
RangedNumeric <- setClass(
"RangedNumeric",
contains = "numeric",
slots = c(min = "numeric", max = "numeric"),
prototype = structure(numeric(), min = NA_real_, max = NA_real_)
)
rn <- RangedNumeric(1:10, min = 1, max = 10)
rn@min
rn@.Data
```
It is possible to define S3 methods for S4 generics, and S4 methods for S3 generics (provided you've called `setOldClass()`). However, it's more complicated than it might appear at first glance, so make sure you thoroughly read `?Methods_for_S3`.
### Generics
\indexc{setGeneric()}
As well as creating a new generic from scratch, it's also possible to convert an existing S3 generic to an S4 generic:
```{r}
setGeneric("mean")
```
In this case, the existing function becomes the default (`ANY`) method:
```{r}
selectMethod("mean", "ANY")
```
NB: `setMethod()` will automatically call `setGeneric()` if the first argument isn't already a generic, enabling you to turn any existing function into an S4 generic. It is ok to convert an existing S3 generic to S4, but you should avoid converting regular functions to S4 generics in packages because requires careful coordination if done by multiple packages.
### Exercises
1. What would a full `setOldClass()` definition look like for an ordered
factor? (i.e. add `slots` and `prototype` the definition above)
1. Define a `length` method for the `Person` class.
[SO-Morgan]: http://stackoverflow.com/search?tab=votes&q=user%3a547331%20%5bs4%5d%20is%3aanswe
[bioc-courses]: https://bioconductor.org/help/course-materials/
[bioc-s4-class]: https://bioconductor.org/help/course-materials/2017/Zurich/S4-classes-and-methods.html
[bioc-s4-overview]: https://bioconductor.org/packages/devel/bioc/vignettes/S4Vectors/inst/doc/S4QuickOverview.pdf