Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid materializing ALTREP vectors by subsetting them first #499

Closed
wants to merge 1 commit into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
87 changes: 87 additions & 0 deletions crates/ark/src/variables/variable.rs
Original file line number Diff line number Diff line change
Expand Up @@ -80,6 +80,7 @@ impl WorkspaceVariableDisplayValue {
CLOSXP => Self::from_closure(value),
ENVSXP => Self::from_env(value),
_ if r_is_matrix(value) => Self::from_matrix(value),
_ if r_is_altrep(value) => Self::from_altrep(value),
_ => Self::from_default(value),
}
}
Expand Down Expand Up @@ -274,6 +275,22 @@ impl WorkspaceVariableDisplayValue {
Self::new(display_value, is_truncated)
}

fn from_altrep(value: SEXP) -> Self {
// For ALTREP objects we first subset the first `MAX_DISPLAY_VALUE_LENGTH` values
// and only then try to format. This is due to a possible bug in the implementation of
// deferred string, for which once you try to acess a single element, it tries to allocate
// the full vector.
Comment on lines +279 to +282
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure its not a bug

When they hit the ALTREP Elt method the first time, they allocate a STRSXP of the final size but with all CHARSXP elements set to NULL (the C pointer value). Then they just coerce and update the 1 Elt value they need to extract.

The reason they do it this way is because someone has to PROTECT() the CHARSXP element that gets generated. It ends up being the STRSXP they allocate the first time Elt is called.

https://github.com/wch/r-source/blob/eba90abe96c8693f6d2c1fc77f44b61fa65747a8/src/main/altclasses.c#L685-L695

It's different for Extract_subset because they decide that when they extract a subset, they actually extract the subset from the underlying object they are delaying the conversion to string on, and then they wrap that result in another deferred string. It's up to the caller to protect that new deferred string

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, the Error: vector memory limit of 32.0 Gb reached, see mem.maxVSize() made me think it was materializing the vector, but they are actually just allocating an empty STRSXP, which shouldn't be that problematic.

let subset = RFunction::new("utils", "head")
.add(value)
.add(MAX_DISPLAY_VALUE_LENGTH as i32)
.call();

match subset {
Ok(value) => Self::from_default(value.sexp),
Err(err) => Self::from_error(err),
}
}

fn from_default(value: SEXP) -> Self {
let formatted = unwrap!(FormattedVector::new(value), Err(err) => {
return Self::from_error(err);
Expand Down Expand Up @@ -1264,3 +1281,73 @@ pub fn plain_binding_force_with_rollback(binding: &Binding) -> anyhow::Result<RO
_ => Err(anyhow!("Unexpected binding type")),
}
}

#[cfg(test)]
mod tests {
use regex::Regex;

use crate::test::r_test;
use crate::variables::variable::WorkspaceVariableDisplayValue;

fn get_display_value(code: &str) -> String {
WorkspaceVariableDisplayValue::from(harp::parse_eval_base(code).unwrap().sexp).display_value
}

fn expect_display_value(code: &str, expected: &str) {
let display = get_display_value(code);
assert_eq!(display, expected.to_string());
}

#[test]
fn test_simple_display_values() {
r_test(|| {
expect_display_value("1", "1");
expect_display_value("1L", "1");
expect_display_value("'a'", "\"a\"");
expect_display_value("NULL", "NULL");
expect_display_value("TRUE", "TRUE");
expect_display_value("FALSE", "FALSE");
expect_display_value("1i", "0+1i");
})
}

#[test]
fn test_data_frame_display_value() {
r_test(|| {
expect_display_value("datasets::mtcars", "[32 rows x 11 columns] <data.frame>");
expect_display_value("matrix(1:4, ncol=2)", "[[1 2], [3 4]]");
})
}

#[test]
fn test_list_display_value() {
r_test(|| {
expect_display_value("list(x=1:4)", "[x = 1 2 3 4]");
expect_display_value("list(1:4)", "[1 2 3 4]");
})
}

#[test]
fn test_functions_display_value() {
r_test(|| {
expect_display_value("function() NULL", "function () ");
expect_display_value("function(a) NULL", "function (a) ");
expect_display_value("function(a, b) NULL", "function (a, b) ");
expect_display_value("function(a = 1, b) NULL", "function (a = 1, b) ");
})
}

#[test]
fn test_altrep_is_not_materialized() {
r_test(|| {
// This will panic! if the variables pane tries to materialize the full altrep vector.
let display = get_display_value("1:1e10");
assert!(Regex::new(r"^1 2 3.*").unwrap().is_match(display.as_str()));

let display = get_display_value("local({x = 1:1e10; names(x) = x; names(x)})");
assert!(Regex::new(r#"^"1" "2" "3""#)
.unwrap()
.is_match(display.as_str()));
})
}
}
Loading