Rcpp is not using `R_PreserveObject`/R_ReleaseObject` as intended

This is based on an issue found by @bhive01 and discussed with him
and @clauswilke in wilkelab/gridtext#9.

Running the code included at the end in R versions up through 4.0.0
with the C stack set to 1M will segfault with a C stack overflow.  The
same happens (for me on Ubuntu at least) with the default 8M stack if
you change the value of N to 1000.

Running this code as is (i.e. N == 500) and with an 8M stack will not
fail, but takes 14 minutes due to several multi-minute-long pauses in
the second loop.

The problem is that this code, via calls to `R_PreserveObject` in
Rcpp, puts around 300K objects into the preserved object list. This is
a simple linked list, so has to be searched linearly to remove objects
pushed on early. That is the reason for the long pauses. The segfault
happens because the code for `R_ReleaseObject` was using recursion to
work down the list. R-devel and R-patched now use iteration and no
longer segfault, but that doesn't help anyone running R 4.0.0 or
earlier.

The _R Extensions Manual_ says about the
`R_PreserveObject`/`R_ReleaseObject` mechanism:

> It is less efficient than the normal protection mechanism, and
> should be used sparingly.

The linked list data structure, and even the recursive delete, are
reasonable for this usage but are not suited for handling 300K objects
that may need to be removed in random or first in/first-out order.

Assuming you need to keep alive R objects, a better way to do this is
to create an appropriate data structure that you control, and protect
only that data structure with `R_PreserveObject` (and maybe release it
on unload with `R_ReleseObject`). That way you can use a more
appropriate data structure, such as a hash table. Using a hash table
the code that currently takes 14 minutes would only take about 1.5
minutes. The improvement is even larger for N == 1000.

Even if you don't want to go to a hash table now it would be a good
idea to switch to using your own list so users of R 4.0.0 or earlier
don't run the risk of a segfault.

For using your own list you need something like this 

```C
static SEXP Rcpp_precious = NULL;
	
void R_init_Rcpp(DllInfo* dllinfo) {
    Rcpp_precious = CONS(R_NilValue, R_NilValue);
	R_PreserveObject(Rcpp_precious);
   ...
}
```

Then replace `R_PreserveObject` and `R_ReleaseObject` with

```c
void Rcpp_PreserveObject(SEXP object)
{
    SETCDR(Rcpp_precious, CONS(object, CDR(Rcpp_precious)));
}

void Rcpp_ReleaseObject(SEXP object)
{
    SETCDR(Rcpp_precious, DeleteFromList(object, CDR(Rcpp_precious)));
}
```

where (this is now in R_devel/R_patched)

```c
static SEXP DeleteFromList(SEXP object, SEXP list)
{
    if (CAR(list) == object)
		return CDR(list);
    else {
        SEXP last = list;
	    for (SEXP head = CDR(list); head != R_NilValue; head = CDR(head)) {
	        if (CAR(head) == object) {
                SETCDR(last, CDR(head));
                return list;
	        }
	        else last = head;
	    }
	    return list;
    }
}
```

This maintains your own list, which is protected by being in the `CDR`
field of the protected `Rcpp_precious` list cell.  This will give you
the same reliability and performance across all R versions.  [Code is
not tested, so there may be typos, but that is design I recommend].

If you want better performance for a use pattern as exhibited in the
example here you could look at using a hash table stored in a `VECSXP`
placed in one of the cells of `Rcpp_precious`. In a quick and dirty
experiment a simple hash table did much better, but the code is of
course more complicated.

All that said, I have a feeling that it should be possible to do
better by using a combination of the protected fields in external
pointer objects and weak references. If you have a document describing
the design of your memory management I could take a look and see if I
can suggest something.

Here is the code to run, from @clauswilke in wilkelab/gridtext#9.
 
```r
library(ggplot2)
library(ggtext)
library(grid)

plot_grob <- function(df) {
  p <- ggplot(data.frame(x = 1, y = 1)) +
    geom_point(aes(x, y)) +
    labs(caption = "<br>Pink dots represent outliers and are removed from downstream analyses.<br>Error bars represent the standard error of the mean.<br>ANOVA model significance *p* < 0.05.<br>Treatments with the same letter are not significantly different at *α* = 0.05 according to Tukey's HSD. ") + 
    theme(plot.caption =  element_textbox_simple())
  ggplotGrob(p)
}

N <- 500

l <- list()
for (i in 1:N) {
  cat(i, " ")
  g <- plot_grob()
  grid.newpage()
  grid.draw(g)
  l[[i]] <- g
}

l <- NULL
l <- list()
for (i in 1:N) {
  cat(i, " ")
  g <- plot_grob()
  grid.newpage()
  grid.draw(g)
  l[[i]] <- g
}
```


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Rcpp is not using `R_PreserveObject`/R_ReleaseObject` as intended #1081

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Rcpp is not using R_PreserveObject/R_ReleaseObject` as intended #1081

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Rcpp is not using `R_PreserveObject`/R_ReleaseObject` as intended #1081