Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid materializing ALTREP vectors by subsetting them first #499

Closed
wants to merge 1 commit into from

Conversation

dfalbel
Copy link
Contributor

@dfalbel dfalbel commented Sep 3, 2024

Addresses posit-dev/positron#4537

When calling STRING_ELT in an ALTREP vector, it might try to allocate the entire vector. With this PR, we first subset the vector, which will call Extract_subset ALTREP method before trying to allocate the entire vector.

Also added some unit tests for variables display values.

… some unit tests for variables display values.
Comment on lines +279 to +282
// For ALTREP objects we first subset the first `MAX_DISPLAY_VALUE_LENGTH` values
// and only then try to format. This is due to a possible bug in the implementation of
// deferred string, for which once you try to acess a single element, it tries to allocate
// the full vector.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure its not a bug

When they hit the ALTREP Elt method the first time, they allocate a STRSXP of the final size but with all CHARSXP elements set to NULL (the C pointer value). Then they just coerce and update the 1 Elt value they need to extract.

The reason they do it this way is because someone has to PROTECT() the CHARSXP element that gets generated. It ends up being the STRSXP they allocate the first time Elt is called.

https://github.com/wch/r-source/blob/eba90abe96c8693f6d2c1fc77f44b61fa65747a8/src/main/altclasses.c#L685-L695

It's different for Extract_subset because they decide that when they extract a subset, they actually extract the subset from the underlying object they are delaying the conversion to string on, and then they wrap that result in another deferred string. It's up to the caller to protect that new deferred string

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, the Error: vector memory limit of 32.0 Gb reached, see mem.maxVSize() made me think it was materializing the vector, but they are actually just allocating an empty STRSXP, which shouldn't be that problematic.

@dfalbel dfalbel closed this Sep 4, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Sep 4, 2024
@DavisVaughan DavisVaughan deleted the bugfix/altrep-materialization branch October 14, 2024 12:42
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants