Within-target parallelism section #69

Closed
pat-s opened this issue Mar 10, 2019 · 5 comments

@pat-s
Contributor

pat-s commented Mar 10, 2019

Questions

  • In mclapply(), set the mc.set.seed argument to FALSE. If your computations require pseudo-random numbers (rnorm(), runif(), etc.) you will need to manually set a different seed for each parallel process, e.g.

Why? I use it in my code and haven't faced any problems yet.

  • In make(), set the lock_envir argument to FALSE. This approach deactivates important reproducibility guardrails, so use with caution.

I also do not do this. What is the reason behind this?

Suggestions

Devote a dedicated section to "within-target parallelism", with subsections for:

  • General notes
  • Execution on a single machine
  • Execution on an HPC cluster
@wlandau
Member

wlandau commented Mar 13, 2019

Why? I use it in my code and haven't faced any problems yet.

From the "Random numbers" section of the help file of parallel::mcparallel():

If ‘mc.set.seed = FALSE’, the child process has the same initial
random number generator (RNG) state as the current R session. If
the RNG has been used (or ‘.Random.seed’ was restored from a saved
workspace), the child will start drawing random numbers at the
same point as the current session.

rnorm(1)
#> [1] 0.4506939
parallel::mclapply(c(1, 1), rnorm, mc.set.seed = FALSE, mc.cores = 2)
#> [[1]]
#> [1] -0.3738903
#> 
#> [[2]]
#> [1] -0.3738903

Created on 2019-03-12 by the reprex package (v0.2.1)
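
To illustrate the manual seeding that the quoted bullet refers to, here is a minimal sketch. The helper `draw_with_seed()` and the seed values are hypothetical, chosen only for this example, and the approach assumes that setting one explicit seed per task is acceptable for your use case. Each child calls `set.seed()` itself, so the draws differ across tasks but repeat across runs.

```r
library(parallel)

# Hypothetical helper: seed the process explicitly, then draw one value.
draw_with_seed <- function(seed) {
  set.seed(seed)
  rnorm(1)
}

seeds <- c(101, 102)  # arbitrary, distinct seeds for illustration

# mclapply() does not support mc.cores > 1 on Windows, so fall back to 1 there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

out <- mclapply(seeds, draw_with_seed, mc.set.seed = FALSE, mc.cores = cores)
out2 <- mclapply(seeds, draw_with_seed, mc.set.seed = FALSE, mc.cores = cores)
```

Here `out[[1]]` and `out[[2]]` come from different seeds, while `out` and `out2` should be identical because every task reseeds before drawing.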

I also do not do this. What is the reason behind this?

It is only meant as a quick workaround for people who run into ropensci/drake#675.
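
For reference, a rough sketch of what that workaround could look like, assuming drake is installed and the code runs on a Unix-like system (`mclapply()` cannot fork on Windows). The target name `squares` and its command are made up for illustration.

```r
library(drake)

# Hypothetical plan with one target that forks inside its command.
plan <- drake_plan(
  squares = parallel::mclapply(1:4, function(i) i^2, mc.cores = 2)
)

# lock_envir = FALSE switches off environment locking during make().
# This removes a reproducibility guardrail, so use it sparingly.
make(plan, lock_envir = FALSE)

result <- unlist(readd(squares))
```

This only shows where the argument goes. Whether you need it depends on whether your targets actually hit the locked-environment error.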

@pat-s
Contributor Author

pat-s commented Mar 13, 2019

From the "Random numbers" section of the help file of parallel::mcparallel():

Uff, either I always interpreted that wrong or something changed recently. I remember explicitly setting this in order to HAVE the same RNG across all processes.
That's a bummer, of course. But according to the documentation I was wrong. Hmm.

@wlandau
Member

wlandau commented Mar 15, 2019

I just added new writing based on ropensci/drake#777 (comment).

@pat-s
Contributor Author

pat-s commented Mar 15, 2019

No worries. I would suggest making one part of the persistent worker section extra clear:

The number of workers chosen in prework applies to all future_*() functions in the whole plan. That is, the user needs to choose the lowest value that works for all instances in the plan.

@wlandau
Member

wlandau commented Mar 17, 2019

Yeah, I think something like that is worth a mention. But for the defaults, I agree with @HenrikBengtsson in ropensci/drake#777 (comment). I think the first examples should rely on future::availableCores(). See 9cb1527.
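
A small sketch of the "one worker count for everything" point, using only the future package and no drake plan. The cap of 2 workers is an arbitrary assumption to keep the example light. In a real workflow the `plan()` call would live in the prework, and `workers = availableCores()` would be the default to reach for.

```r
library(future)

# One worker count for the session; every later future-based call shares it.
# The cap of 2 is arbitrary and only keeps this example small.
n_workers <- min(2L, as.integer(availableCores()))
plan(multisession, workers = n_workers)

w <- nbrOfWorkers()  # the pool size all subsequent futures will see

# Two unrelated pieces of work, both drawing on the same pool.
a <- value(future(sum(1:10)))
b <- value(future(prod(1:5)))

plan(sequential)  # shut down the background workers
```

Because the worker count is set once, a value that is too high for one call cannot be lowered for that call alone, which is why the smallest workable number has to win.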
