feat(r): Optimize conversion from sfc to ArrowArray #76
r/geoarrow/R/sf-compat.R
    return(NextMethod())
  }

  if (class(x)[1] %in% c("sfc_POINT",
Maybe use wk::wk_vector_meta() here?
Suggested change:
- if (class(x)[1] %in% c("sfc_POINT",
+ # or use the labels instead
+ if (wk::wk_vector_meta(x)$geometry_type %in% 1:6) {
Good call!
r/geoarrow/R/sf-compat.R
  # Let the default method handle M values (the optimized path doesn't
  # handle mixed XYZ/XYZM/XYM but can deal with mixed XY and XYZ)
  if (!is.null(attr(x, "m_range"))) {
Maybe use wk::wk_vector_meta() here?
Suggested change:
- if (!is.null(attr(x, "m_range"))) {
+ if (wk::wk_vector_meta(x)$has_m) {
Good call!
This PR implements an optimized path for converting sf objects to ArrowArray when we know in advance that all elements of the sfc have the same type and dimension (very common). This is ~2.3-3x faster than the visitor-based approach and leaves very few datasets large enough for anybody to notice (~100ms). This version can also drop dimensions (e.g., for when you send a 3D geometry to S2, where the Z values will be ignored anyway).

I spent a reasonable amount of time attempting a two-pass conversion (one pass to count elements to perfectly preallocate, one pass to fill the buffers). This is slower than overallocating because (1) we have a pretty good idea of how many elements we have in advance (most multilinestrings have at least one element and at least one coordinate) and (2) the cost of reallocating is on par with the cost of calling the R API for every element in the sfc twice. The version here is more readable, too (e.g., it can use GeoArrowBuilderOffsetAppend() instead of builder->view.buffers[i + 1].data.as_int32...).

In all cases this is faster than generating WKB, although that might just be limitations of R or Rcpp that could be optimized in sf:
Created on 2023-11-20 with reprex v2.0.2
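
For readers unfamiliar with the trade-off described above, here is a minimal sketch of the single-pass, overallocate-and-grow idea in plain C. It is not the code in this PR and does not use the geoarrow-c API: `OffsetBuffer`, `offset_buffer_append()` and `build_part_offsets()` are hypothetical names made up for illustration, and the "at least one part per feature" starting estimate is an assumption modelled on the reasoning in the description.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable buffer of int32 offsets (NOT the geoarrow-c API). */
typedef struct {
  int32_t* data;
  int64_t size;      /* number of offsets written so far */
  int64_t capacity;  /* number of offsets allocated */
  int64_t n_realloc; /* how many times the estimate turned out too small */
} OffsetBuffer;

/* Allocate from an up-front estimate instead of an exact counting pass. */
static int offset_buffer_init(OffsetBuffer* buf, int64_t estimated_size) {
  assert(buf != NULL);
  if (estimated_size < 1) estimated_size = 1;
  buf->data = (int32_t*)malloc((size_t)estimated_size * sizeof(int32_t));
  if (buf->data == NULL) return -1;
  buf->size = 0;
  buf->capacity = estimated_size;
  buf->n_realloc = 0;
  return 0;
}

/* Append one offset, doubling the capacity when the buffer is full. */
static int offset_buffer_append(OffsetBuffer* buf, int32_t value) {
  if (buf->size == buf->capacity) {
    int64_t new_capacity = buf->capacity * 2;
    int32_t* new_data =
        (int32_t*)realloc(buf->data, (size_t)new_capacity * sizeof(int32_t));
    if (new_data == NULL) return -1;
    buf->data = new_data;
    buf->capacity = new_capacity;
    buf->n_realloc++;
  }
  buf->data[buf->size++] = value;
  return 0;
}

static void offset_buffer_release(OffsetBuffer* buf) {
  free(buf->data);
  buf->data = NULL;
  buf->size = 0;
  buf->capacity = 0;
}

/* Single pass over the features: the total number of parts is not known in
 * advance, so start from "one part per feature" (plus the leading zero) and
 * let the buffer grow if that guess is too small. Returns 0 on success;
 * on failure the buffer is released. */
static int build_part_offsets(const int32_t* parts_per_feature,
                              const int32_t* coords_per_part,
                              int64_t n_features, OffsetBuffer* out) {
  if (offset_buffer_init(out, n_features + 1) != 0) return -1;

  int32_t coord_offset = 0;
  int64_t part = 0;
  if (offset_buffer_append(out, coord_offset) != 0) goto fail;

  for (int64_t i = 0; i < n_features; i++) {
    for (int32_t j = 0; j < parts_per_feature[i]; j++) {
      coord_offset += coords_per_part[part++];
      if (offset_buffer_append(out, coord_offset) != 0) goto fail;
    }
  }
  return 0;

fail:
  offset_buffer_release(out);
  return -1;
}
```

With three features holding 1, 2 and 1 parts, the initial guess of four offsets is one short, so the buffer grows once. A two-pass version would avoid that single `realloc()` at the cost of visiting every feature twice, which is the trade-off the description weighs.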