Default seeding and replicates issues #94
@gaow I thought we agreed that every module instance would be run (by default) with a unique seed. This would take care of one of the issues you have raised. I think it is okay (and still important!) to attempt to provide a default method for setting the seed, even if it is not guaranteed to work all the time. R is a simpler case because the standard method is to use
@stephens999 Sorry, I think what we have agreed on has a problem. This theme, i.e.,
Then module B, the method module, will get a different seed at each replicate, which is not good because we do not want a method to vary across replicates. So it goes back to the setting where we only allow replicates with different seeds at the first module, the beginning of the pipeline, and other modules get their “hash-based” seed. E.g.
Then my initial post on this ticket raises a case where replicates are needed in the 2nd module ... Of course we can still proceed with the above theme (2nd code block), but we will have to explain all the caveats in a dedicated wiki document. Is my understanding correct? Is there a different proposal than my 2nd code block above? @pcarbo to fill you in: Matthew and I have discussed this, and we agree that although various caveats exist, we would still prefer to offer limited built-in support for replicates. We will dedicate a documentation page to explaining exactly what we do for the currently supported languages, along with cautions.
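The theme described above (distinct replicate seeds only for the first module of a pipeline, and a deterministic "hash-based" seed for every other module) can be sketched in Python. This is a minimal illustration, not DSC's actual implementation: the function names, the use of SHA-256, and hashing the module name together with a string of its parameters are all assumptions.

```python
import hashlib

# R integers cannot exceed 2**31 - 1, so seeds are reduced modulo that
# value before being passed to set.seed().
MAX_SEED = 2**31 - 1


def hash_seed(*parts: str) -> int:
    """Map string parts to a deterministic seed in [0, MAX_SEED).

    hashlib is used instead of the built-in hash(), whose output for
    strings changes between Python processes.
    """
    key = "\x1f".join(parts).encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return int(digest, 16) % MAX_SEED


def module_seed(module: str, params: str, replicate: int, first_module: bool) -> int:
    """Hypothetical scheme: only the first module's seed depends on the replicate."""
    if first_module:
        return hash_seed(module, params, f"replicate={replicate}")
    # Later modules (e.g. a method module) get the same seed in every replicate.
    return hash_seed(module, params)


if __name__ == "__main__":
    for rep in (1, 2):
        a = module_seed("A", "n=100", rep, first_module=True)
        b = module_seed("B", "lambda=0.1", rep, first_module=False)
        print(f"replicate {rep}: set.seed({a}) for A, set.seed({b}) for B")
```

Because module B's seed ignores the replicate index, B behaves identically across replicates; the flip side is that nothing in this scheme varies a 2nd module that genuinely needs replicates.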
@gaow @stephens999 My understanding was that the default seed would work like this:
# replicate 1
set.seed(seed1)
module(A)
set.seed(seed2)
module(B)
# replicate 2
set.seed(seed3)
module(A)
set.seed(seed4)
module(B)
The point of a "default" is not to solve all the cases; it is only to provide a reasonable behaviour that will work in many cases. I still think that providing a default is much better than not providing a default (for reproducibility).
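The default described in the comment above gives every module instance its own seed (seed1, seed2, ... in order of execution). A minimal Python sketch, assuming seeds are simply consecutive integers starting from a hypothetical base_seed and that modules run in the listed order within each replicate; it renders the same pseudocode layout as the comment.

```python
from itertools import count


def assign_seeds(modules, n_replicates, base_seed=1):
    """Give every (replicate, module) instance its own consecutive seed."""
    next_seed = count(base_seed)
    seeds = {}
    for replicate in range(1, n_replicates + 1):
        for module in modules:
            seeds[(replicate, module)] = next(next_seed)
    return seeds


def r_preamble(seeds):
    """Render the seeding lines in the order they would run before each module."""
    lines = []
    current = None
    for (replicate, module), seed in seeds.items():  # dicts keep insertion order
        if replicate != current:
            lines.append(f"# replicate {replicate}")
            current = replicate
        lines.append(f"set.seed({seed})")
        lines.append(f"module({module})")
    return "\n".join(lines)


if __name__ == "__main__":
    print(r_preamble(assign_seeds(["A", "B"], n_replicates=2)))
```

Under this scheme module B receives seed 2 in replicate 1 and seed 4 in replicate 2, so a stochastic method module will vary across replicates unless it fixes its own seed.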
I don't see a problem. If someone really wants their module to always behave the same, they should be responsible for that by setting a seed at the start of the module, e.g. set.seed(123).
Matthew
(In reply to Peter Carbonetto's comment of Mon, Mar 5, 2018 at 9:02 PM.)
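Matthew's suggestion translates directly to Python modules that use the standard random module: the module reseeds itself on entry, so whatever seed the pipeline set beforehand no longer affects its output. A minimal sketch; method_module, its noise model, and the fixed seed 123 are illustrative assumptions.

```python
import random


def method_module(data, seed=123):
    """Hypothetical method module that always behaves the same.

    Reseeding the global generator on entry plays the role of
    set.seed(123) at the top of an R module.
    """
    random.seed(seed)
    return [x + random.gauss(0.0, 1.0) for x in data]


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]
    random.seed(1)  # seed set by the pipeline for replicate 1
    out1 = method_module(data)
    random.seed(2)  # a different seed for replicate 2
    out2 = method_module(data)
    print(out1 == out2)  # True: the module ignores the pipeline's seed
```

Reseeding the global generator also changes the random stream seen by any code that runs after the module; creating a private random.Random(seed) instance inside the module avoids that side effect.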
Great! I have implemented it; a seed for a module is now:
In #70 @stephens999 has proposed that
The main focus here is reproducibility, and an important feature is that users should not have to think or worry about setting the seed.
I was, in fact, still against the idea that DSC should take care of seeding:
1. It is difficult to set a default seed for every module simply because we cannot do it properly for all languages. In R it is most likely configured via set.seed(). For Python and Shell programs the procedure is not unique (Python via numpy, random, or other packages? For Shell, e.g. admixture ... --seed, I would have no idea that a program accepts such a flag!). For those languages users will have to take care of this anyway. That is, behavior is going to be inconsistent between languages.
2. Even if we can work out 1, how do we handle replicates? We cannot assume that only the first module of a pipeline needs seeds. For example, what if the first module is just:
and then the 2nd module needs replicates:
In that case, setting default, global seeds dedicated to replicates is not enough because we need to know when to apply them.
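The point that Python has no single equivalent of R's set.seed() shows up even within the standard library: random.seed() controls the module-level functions, but a separate random.Random() instance created without a seed draws its state from the operating system and ignores it. NumPy adds further generators (numpy.random.seed() for its legacy global state, numpy.random.default_rng() for new-style generators) that random.seed() does not touch either. A minimal stdlib-only sketch; the helper names are illustrative.

```python
import random


def draw_global(n=3):
    """Draws from the module-level generator, which random.seed() controls."""
    return [random.random() for _ in range(n)]


def draw_private(n=3):
    """Draws from a fresh generator seeded from OS entropy, not from random.seed()."""
    rng = random.Random()
    return [rng.random() for _ in range(n)]


if __name__ == "__main__":
    random.seed(42)
    a_global, a_private = draw_global(), draw_private()
    random.seed(42)
    b_global, b_private = draw_global(), draw_private()
    print(a_global == b_global)    # True: reproduced by random.seed(42)
    print(a_private == b_private)  # almost surely False: not reproduced
```

A framework that only calls random.seed() before a Python module therefore makes the module reproducible only if the module happens to use the module-level generator.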
For these reasons, my proposal in #70 in fact still essentially relies on users to set seeds. In the DSC Road Map I proposed spending an entire tutorial on it, with some tips, e.g.:
That said, I believe I can fully appreciate what @stephens999 has in mind for R users. I hate this from an engineering perspective, but how about something like:
so that we automatically set the first line of R code to set.seed(999)?
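The proposal to make set.seed(999) the first line of each R module could look roughly like the Python helper below. It is a sketch under assumptions: inject_r_seed is a hypothetical name, and skipping modules that already call set.seed() (so a seed the module author chose is not overridden) is a design choice not stated in the thread. The regular expression is a heuristic and would also match set.seed inside comments or strings.

```python
import re

# Matches a call such as "set.seed(1)" or "set.seed (1)" anywhere in the code.
SET_SEED_CALL = re.compile(r"\bset\.seed\s*\(")


def inject_r_seed(r_code: str, seed: int = 999) -> str:
    """Prepend set.seed(<seed>) to R code unless it already sets a seed."""
    if not isinstance(seed, int):
        raise TypeError("seed must be an integer")
    if SET_SEED_CALL.search(r_code):
        return r_code  # respect a seed the module author chose
    return f"set.seed({seed})\n{r_code}"


if __name__ == "__main__":
    print(inject_r_seed("x <- rnorm(5)"))
    print(inject_r_seed("set.seed(123)\nx <- rnorm(5)"))
```

Leaving modules that already seed themselves untouched keeps this default compatible with the advice that users who need a fixed behaviour should set the seed at the start of their module.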