Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallel support improvements. #124

Merged
merged 13 commits into from
Feb 20, 2025
Merged

Conversation

dblodgett-usgs
Copy link
Collaborator

This PR includes the commits in #122 and is related to issue #123

I've been working through lingering edge cases related to #105 and ended up with what's here.

I feel like this is actually fairly robust but also am really not sure I like how much complexity I had to bring in to support what should be pretty straight forward. Thoughts?

@dblodgett-usgs
Copy link
Collaborator Author

I just added a commit with a bunch of documentation stuff for #95 here as well. I can move these commits / contributions around if we decide not to go with this approach to parallelism.

@dblodgett-usgs dblodgett-usgs mentioned this pull request Dec 20, 2024
7 tasks
@dblodgett-usgs
Copy link
Collaborator Author

@keller-mark -- are you ok merging this in? I'm back working with this stuff again and would be stoked to get it moved toward CRAN.

@@ -16,71 +16,70 @@ ZarrArray <- R6::R6Class("ZarrArray",
# store Array store, already initialized.
#' @keywords internal
store = NULL,
#' chunk_store Separate storage for chunks. If not provided, `store` will be used for storage of both chunks and metadata.
# chunk_store Separate storage for chunks. If not provided, `store` will be used for storage of both chunks and metadata.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are the #' to # changes needed? I am not an expert in Roxygen comments but I see the apostrophe format at https://roxygen2.r-lib.org/articles/rd-other.html#r6

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember precisely, but for internal methods, this was causing documentation to get polluted with stuff that didn't make sense.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... so with the #' on private methods, it does strange stuff.

image

R/zarr-array.R Outdated

parts <- indexer$iter()
part1_results <- apply_func(parts, function(proj, cl = NA) {
part1_results <- ps$FUN(parts, function(proj, cl = NA) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
part1_results <- ps$FUN(parts, function(proj, cl = NA) {
part1_results <- ps$apply_func(parts, function(proj, cl = NA) {

Keeping apply somewhere in the naming of this function would make it clear that it is doing an lapply-type operation over parts

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... I'll work that up.

R/zarr-array.R Outdated

parts <- indexer$iter()
part1_results <- apply_func(parts, function(proj, cl = NA) {
part1_results <- ps$FUN(parts, function(proj, cl = NA) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
part1_results <- ps$FUN(parts, function(proj, cl = NA) {
part1_results <- ps$FUN(parts, function(proj) {

It seems like since cl = cl has been changed to cl = ps$cl below, this cl = NA parameter is never used, correct? Do we still need to keep the cl = NA in the function signature (here, in the writing case, and within get_parallel_settings)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll double check this and implement in a follow up commit.

@@ -1317,3 +1262,82 @@ ZarrArray <- R6::R6Class("ZarrArray",
as.array.ZarrArray = function(x, ...) {
x$as.array()
}

get_parallel_settings <- function(on_windows = (.Platform$OS.type == "windows"),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
get_parallel_settings <- function(on_windows = (.Platform$OS.type == "windows"),
get_parallel_settings <- function(on_windows = (.Platform$OS.type == "windows"),

Would you be able to add documentation comments to document the parameters and return value?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For sure.


get_parallel_settings <- function(on_windows = (.Platform$OS.type == "windows"),
parallel_option = getOption("pizzarr.parallel_read_enabled", FALSE),
progress = getOption("pizzarr.progress_bar", FALSE)) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since pizzarr.progress_bar is a new option being introduced here, can it also be added to https://github.com/keller-mark/pizzarr/blob/main/R/options.R for initialization and to support controlling via environment variable?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done now.


get_parallel_settings <- function(on_windows = (.Platform$OS.type == "windows"),
parallel_option = getOption("pizzarr.parallel_read_enabled", FALSE),
progress = getOption("pizzarr.progress_bar", FALSE)) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there are now the 3 different parallel backends (pbapply, future.apply, and parallel), would it make more sense for this option to be a categorical/string instead of a binary flag? The naming of the option suggests that it will control whether the user sees a progress bar during processing (as opposed to something to do with parallelism or the pbapply package), so can we name it something like parallel_backend with values "pbapply", "future.apply", and "parallel"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to be minimally invasive so didn't think to do something like this. I'll look at what it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so we have a mess of options / considerations here. I think I like where it stands.

There are actually only 2 parallel back ends and the wrapper pbapply. pbapply works with either parallel back end and we call it if we want a progress bar.

parallel_option accepts TRUE/FLASE or an integer, or a cluster object, or the string "future". "future" works on any platform but parallel has limited functionality on Windows.

I'll document these and make sure things are clear in options.R.

Merge branch 'main' of https://github.com/keller-mark/pizzarr

# Conflicts:
#	tests/testthat/test-01-parallel.R
Merge branch 'parallel' of https://github.com/dblodgett-usgs/pizzarr

# Conflicts:
#	tests/testthat/test-01-parallel.R
@@ -1,21 +1,29 @@
# Adapted from https://github.com/IRkernel/IRkernel/blob/master/R/options.r

#' pizzarr_option_defaults
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@keller-mark -- see additions here. "parallel_backend" is now the master control on parallelization and there is a separate flag for whether to use it for writing.

@keller-mark
Copy link
Owner

Thanks!

@keller-mark keller-mark merged commit 2a8ea4d into keller-mark:main Feb 20, 2025
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants