-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parallel support improvements. #124
Conversation
I just added a commit with a bunch of documentation stuff for #95 here as well. I can move these commits / contributions around if we decide not to go with this approach to parallelism. |
@keller-mark -- are you ok merging this in? I'm back working with this stuff again and would be stoked to get it moved toward CRAN. |
@@ -16,71 +16,70 @@ ZarrArray <- R6::R6Class("ZarrArray", | |||
# store Array store, already initialized. | |||
#' @keywords internal | |||
store = NULL, | |||
#' chunk_store Separate storage for chunks. If not provided, `store` will be used for storage of both chunks and metadata. | |||
# chunk_store Separate storage for chunks. If not provided, `store` will be used for storage of both chunks and metadata. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are the #'
to #
changes needed? I am not an expert in Roxygen comments but I see the apostrophe format at https://roxygen2.r-lib.org/articles/rd-other.html#r6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember precisely, but for internal methods, this was causing documentation to get polluted with stuff that didn't make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
R/zarr-array.R
Outdated
|
||
parts <- indexer$iter() | ||
part1_results <- apply_func(parts, function(proj, cl = NA) { | ||
part1_results <- ps$FUN(parts, function(proj, cl = NA) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
part1_results <- ps$FUN(parts, function(proj, cl = NA) { | |
part1_results <- ps$apply_func(parts, function(proj, cl = NA) { |
Keeping apply
somewhere in the naming of this function would make it clear that it is doing an lapply
-type operation over parts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... I'll work that up.
R/zarr-array.R
Outdated
|
||
parts <- indexer$iter() | ||
part1_results <- apply_func(parts, function(proj, cl = NA) { | ||
part1_results <- ps$FUN(parts, function(proj, cl = NA) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
part1_results <- ps$FUN(parts, function(proj, cl = NA) { | |
part1_results <- ps$FUN(parts, function(proj) { |
It seems like since cl = cl
has been changed to cl = ps$cl
below, this cl = NA
parameter is never used, correct? Do we still need to keep the cl = NA
in the function signature (here, in the writing case, and within get_parallel_settings
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll double check this and implement in a follow up commit.
@@ -1317,3 +1262,82 @@ ZarrArray <- R6::R6Class("ZarrArray", | |||
as.array.ZarrArray = function(x, ...) { | |||
x$as.array() | |||
} | |||
|
|||
get_parallel_settings <- function(on_windows = (.Platform$OS.type == "windows"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_parallel_settings <- function(on_windows = (.Platform$OS.type == "windows"), | |
get_parallel_settings <- function(on_windows = (.Platform$OS.type == "windows"), |
Would you be able to add documentation comments to document the parameters and return value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For sure.
|
||
get_parallel_settings <- function(on_windows = (.Platform$OS.type == "windows"), | ||
parallel_option = getOption("pizzarr.parallel_read_enabled", FALSE), | ||
progress = getOption("pizzarr.progress_bar", FALSE)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since pizzarr.progress_bar
is a new option being introduced here, can it also be added to https://github.com/keller-mark/pizzarr/blob/main/R/options.R for initialization and to support controlling via environment variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done now.
|
||
get_parallel_settings <- function(on_windows = (.Platform$OS.type == "windows"), | ||
parallel_option = getOption("pizzarr.parallel_read_enabled", FALSE), | ||
progress = getOption("pizzarr.progress_bar", FALSE)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there are now the 3 different parallel backends (pbapply, future.apply, and parallel), would it make more sense for this option to be a categorical/string instead of a binary flag? The naming of the option suggests that it will control whether the user sees a progress bar during processing (as opposed to something to do with parallelism or the pbapply package), so can we name it something like parallel_backend
with values "pbapply"
, "future.apply"
, and "parallel"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to be minimally invasive so didn't think to do something like this. I'll look at what it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so we have a mess of options / considerations here. I think I like where it stands.
There are actually only 2 parallel back ends and the wrapper pbapply
. pbapply
works with either parallel back end and we call it if we want a progress bar.
parallel_option
accepts TRUE/FLASE or an integer, or a cluster object, or the string "future". "future" works on any platform but parallel has limited functionality on Windows.
I'll document these and make sure things are clear in options.R.
Merge branch 'main' of https://github.com/keller-mark/pizzarr # Conflicts: # tests/testthat/test-01-parallel.R
Merge branch 'parallel' of https://github.com/dblodgett-usgs/pizzarr # Conflicts: # tests/testthat/test-01-parallel.R
@@ -1,21 +1,29 @@ | |||
# Adapted from https://github.com/IRkernel/IRkernel/blob/master/R/options.r | |||
|
|||
#' pizzarr_option_defaults |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@keller-mark -- see additions here. "parallel_backend" is now the master control on parallelization and there is a separate flag for whether to use it for writing.
Thanks! |
This PR includes the commits in #122 and is related to issue #123
I've been working through lingering edge cases related to #105 and ended up with what's here.
I feel like this is actually fairly robust but also am really not sure I like how much complexity I had to bring in to support what should be pretty straight forward. Thoughts?