Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Futures: output and conditions are now relayed, but with glitch #59

Open
HenrikBengtsson opened this issue Dec 8, 2022 · 5 comments
Open

Comments

@HenrikBengtsson
Copy link

With the support for futures, you now get automatic support for relaying of the standard output, messages, warnings, and other conditions, e.g. with

library(pbapply)
future::plan("multisession")

my_sqrt <- function(x) {
  if (x %% 10 == 0) message("x = ", x)
  sqrt(x)
}

we get:

> pboptions(type = "none")
> y <- pbsapply(1:100, FUN = my_sqrt, cl = "future")
x = 10
x = 20
x = 30
x = 40
x = 50
x = 60
x = 70
x = 80
x = 90
x = 100
> str(y)
 num [1:100] 1 1.41 1.73 2 2.24 ...

However, it looks like your progress bar is not prepared for having output generated in between the 0% step and the completion, e.g.

> pboptions(type = "txt")
> y <- pbsapply(1:100, FUN = my_sqrt, cl = "future")
  |++++                                              |   8%x = 10
  |++++++++                                          |  15%x = 20
  |++++++++++++                                      |  23%x = 30
  |+++++++++++++++                                   |  31%x = 40
  |+++++++++++++++++++++++                           |  46%x = 50
  |+++++++++++++++++++++++++++                       |  54%x = 60
  |+++++++++++++++++++++++++++++++                   |  62%x = 70
  |+++++++++++++++++++++++++++++++++++               |  69%x = 80
  |++++++++++++++++++++++++++++++++++++++++++        |  85%x = 90
  |++++++++++++++++++++++++++++++++++++++++++++++    |  92%x = 100
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100%
> 

The quick fix to avoid this, is to disable relaying of stdout and conditions in futures, e.g.

rval[i] <- list(future.apply::future_lapply(X[Split[[i]]], FUN, ..., future.stdout = FALSE, future.conditions = character(0L)))

That is how all other parallel frameworks work, i.e. they silently swallow any output.

Now, to actually relaying them, which is more useful, you have to buffer standard output and all conditions in:

pbapply/R/pblapply.R

Lines 73 to 76 in 53aa541

for (i in seq_len(B)) {
rval[i] <- list(future.apply::future_lapply(X[Split[[i]]], FUN, ...))
setpb(pb, i)
}

I'm thinking something like:

captureStdoutAndConditions <- function(expr, envir = parent.frame()) {
  expr <- substitute(expr)
  conditions <- list()
  withCallingHandlers({
    stdout <- utils::capture.output({
      value <- eval(expr, envir = envir)
    })
  }, condition = function(cond) {
    conditions <<- c(conditions, list(cond))
  })
  list(value = value, stdout = stdout, conditions = conditions)
}

relayStdoutAndConditions <- function(res) {
  cat(res$stdout)
  for (condition in res$conditions) {
    if (inherits(condition, "warning")) {
        warning(condition)
    } else if (inherits(condition, "message")) {
        message(condition)
    } else if (inherits(condition, "condition")) {
        signalCondition(condition)
    }
  }
  invisible(res)
}

and then use:

for (i in seq_len(B)) {
    res <- captureStdoutAndConditions({
      future.apply::future_lapply(X[Split[[i]]], FUN, ...)
    })
    rval[i] <- list(res$value)
    hide_pb(pb)
    relayStdoutAndConditions(res)
    unhide_pb(pb)
    setpb(pb, i)
}

PS. I've got future.mapreduce on the roadmap. One goal is to provide an API for others to build on and to avoid having to do manually do the above.

@psolymos
Copy link
Owner

Thanks @HenrikBengtsson for the outline, this is pretty cool. The progress bar is written to the main process's STDOUT (which is buffered, that's why we need to flush). Thus collecting output from the future processes makes a lot of sense, otherwise it can be quite hard to troubleshoot.

I'll add the silencer as a quick fix and play around with the other option.

@psolymos
Copy link
Owner

This looks like a more general issue, not just with future. Future prints the warning (STDERR), however, parallel::mclapply prints the messages to STDOUT, which is not much different regarding the output, but might require a different mechanism to capture that. However, clusters do not send info back to the main process:

Screenshot 2022-12-09 at 9 20 17 PM

@psolymos
Copy link
Owner

It looks like that parallel::mclapply write conditions to STDOUT of the main process, only the STDOUT of the forked processes can be silenced by setting mc.silent = TRUE.

To make the two options (future and mclapply) behave similarly, I only set future.stdout = FALSE but I did not modify how conditions are handled.

psolymos added a commit that referenced this issue Dec 10, 2022
Signed-off-by: Peter Solymos <psolymos@gmail.com>
psolymos added a commit that referenced this issue Dec 10, 2022
Signed-off-by: Peter Solymos <psolymos@gmail.com>
@HenrikBengtsson
Copy link
Author

It looks like that parallel::mclapply write conditions to STDOUT of the main process

Actually not. See mclapply example in https://henrikbengtsson.github.io/future-tutorial-user2022/appendix.html#appendix-standard-output-by-other-parallel-map-reduce-apis

@psolymos
Copy link
Owner

Hmm. This is an eye opening read. Basically, if you want consistency, the "recommended" tools (what is in parallel) are the worst choice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants