Seed issue: module seed or pipeline seed or both? #70
It is hard to think of useful situations in which you would want to define multiple seed values for a given pipeline (i.e., for a given sequence of block evaluations). So I think a "global" seed is okay, and is probably "best practice". But I suppose there may be situations in which a user would want to do this; e.g., to simulate multiple data sets and take all combinations of data sets with different seeds. Having block-specific seeds certainly allows for more flexibility. |
I think that to ensure reproducibility we have to set a seed before
executing any module.
So a module instance will want to include a record of the seed value that
was set (for reproducibility).
This seed value may well be the same for all module instances in a pipeline
instance. But I think
it will make sense conceptually (and be more flexible in future)
to store the value for each module instance as part of the value
of that module instance.
Matthew
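A minimal sketch of the suggestion above, in Python: set the seed immediately before a module runs and keep that seed in the module instance's own record. The names `run_module_instance` and `simulate` are hypothetical and are not part of DSC.

```python
import random

def run_module_instance(module, seed, **params):
    """Seed the RNG, run the module, and record the seed with the result."""
    random.seed(seed)  # set the seed immediately before execution
    output = module(**params)
    # The seed is stored as part of the module instance record.
    return {"module": module.__name__, "seed": seed,
            "params": params, "output": output}

def simulate(n):
    # Hypothetical module: draw n uniform random numbers.
    return [random.random() for _ in range(n)]

first = run_module_instance(simulate, seed=1, n=3)
# Re-running with the recorded seed reproduces the same output.
again = run_module_instance(simulate, seed=first["seed"], n=3)
```

Because the seed lives in each record, all module instances in a pipeline can share one value today without ruling out per-module seeds later.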
|
The question of how to store and access the seeds seems slightly independent of this discussion. The more general question is whether we store values of variables used in every module instance (i.e., all the local variables and inputs), or do we only record the environment in which the module instance was evaluated. The latter is potentially more efficient, especially if the variables are associated with complex data structures, because it means that values do not have to be stored more than once, but maybe less convenient. |
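One possible reading of the second option, sketched in Python under assumptions: values are stored once in a shared store and each module instance record keeps only a reference. The `store`, `put` and `get` names are hypothetical and do not describe how DSC stores anything.

```python
import hashlib
import pickle

store = {}  # hypothetical value store: digest -> serialized value

def put(value):
    """Store a value once and return a reference (its digest)."""
    blob = pickle.dumps(value)
    key = hashlib.sha256(blob).hexdigest()
    store.setdefault(key, blob)  # no second copy if already stored
    return key

def get(key):
    """Look up a stored value by reference."""
    return pickle.loads(store[key])

big = list(range(100000))
# Two module instances that share an input record only a reference to it.
instance_a = {"module": "fit", "inputs": {"x": put(big)}}
instance_b = {"module": "score", "inputs": {"x": put(big)}}
```

The trade-off mentioned above shows up directly: the large value is stored once, but reading it back requires an extra lookup through `get`.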
Does "local variables" here mean "parameters" in my document?
|
I'm not sure which document you are referring to, but my general impression is that "parameter" has meant slightly different things in our discussions. Variable and parameter can be used interchangeably, although parameter is often used to specifically refer to the function/module inputs. Consider this function in R:
f <- function (x, y) {
  e <- 0.01
  a <- fit.model(x,y,e)
  return(a)
}
|
@pcarbo: this document,
https://github.com/stephenslab/dsc-wiki/blob/master/development/initial_thoughts_on_terminology_and_extraction.md
(i have updated to note your comment about no side effects and also
clarify that parameters are local to a module)
On Wed, Mar 8, 2017 at 8:22 AM, Peter Carbonetto wrote:
I'm not sure which document you are referring to, but my general
impression is that "parameter" has meant slightly different things in our
discussions. Variable and parameter can be used interchangeably, although
parameter is often used to specifically refer to the function/module inputs.
Consider this function in R:
f <- function (x, y) {
  e <- 0.01
  a <- fit.model(x,y,e)
  return(a)
}
- x, y, e, a are all variables. Specifically, they are "local variables"
  in that they have no meaning outside function f.
- x, y are also *input parameters*; they are associated with values when
  the function is evaluated in a given environment.
- variables x, y and a are determined by the environment in which the
  function is evaluated (in DSC, this is what we are calling "dependencies"
  since the environment depends on evaluation of other modules).
- e is a variable that does not depend on the evaluation environment. We
  could call this a "free variable".
- a is also an output; that is, it is the only variable whose value is
  accessible outside the function.
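The distinction in the list above between input parameters and other local variables can be illustrated with a Python analogue of the R function. This is only a sketch: `fit_model` is a hypothetical stand-in for `fit.model`, and the introspection relies on CPython's code-object attributes.

```python
def fit_model(x, y, e):
    # Hypothetical stand-in for fit.model in the R example.
    return (x + y) * e

def f(x, y):
    e = 0.01
    a = fit_model(x, y, e)
    return a

code = f.__code__
# All local variable names, with the input parameters listed first.
all_locals = code.co_varnames
# Input parameters: bound to values when the function is called.
params = code.co_varnames[:code.co_argcount]
# Remaining locals: defined inside the function body.
other_locals = code.co_varnames[code.co_argcount:]
```

Here `params` holds x and y, while `other_locals` holds e and a; only the value of a is visible outside `f`, through the return value.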
|
I think what @gaow is doing now is to store the parameter values
for each module instance explicitly (in the "table" for that module).
See for example the datamaker table here:
#71
where min_pi0, n, etc. are parameter values.
In contrast, the pipeline output variable values are stored separately in a
file, and linked to the module instance record via the "returns" column.
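A rough sketch of the layout described above, in Python: parameter values sit directly in the module's table row, while the output is written to a file that the row points to through a "returns" column. The function `record_instance`, the file naming, and the parameter values are all hypothetical; only the column names `min_pi0`, `n` and `returns` come from the discussion.

```python
import os
import pickle
import tempfile

outdir = tempfile.mkdtemp()  # scratch directory for output files

def record_instance(table, name, params, output):
    """Append a row with parameter values and a link to the output file."""
    path = os.path.join(outdir, f"{name}_{len(table)}.pkl")
    with open(path, "wb") as fh:
        pickle.dump(output, fh)
    # Parameters are stored explicitly; the output is only referenced.
    table.append({**params, "returns": path})
    return table[-1]

datamaker = []  # hypothetical in-memory "table" for the datamaker module
row = record_instance(datamaker, "datamaker",
                      {"min_pi0": 0.5, "n": 100},  # made-up values
                      {"data": [1, 2, 3]})

# Follow the "returns" link to recover the stored output.
with open(row["returns"], "rb") as fh:
    restored = pickle.load(fh)
```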
|
Here I propose an interface to set pipeline seed:
So we have |
i'm inclined to think the modules should not have access to the seeds. Here is how I propose the seeds be dealt with: |
Seed serves 2 goals: ensuring reproducibility and generating replicates. What if I have this example, in current syntax where seed is explicit, to make my point:
data:
  input:
    seed: $(seed)
  ...
mcmc:
  input:
    seed: $(seed)
DSC:
  variable:
    seed: R(1:5)
Then we will run 5 data-sets and 5 MCMC rounds, generating 25 different outputs? Instead one might simply want:
data:
  input:
    seed: $(seed)
  ...
mcmc:
  input:
    seed: 999
DSC:
  variable:
    seed: R(1:5)
i.e., setting a single, yet fixed seed for
Notice that previously I made
data:
  seed: R(1:5)
  input:
    ... |
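The difference between the two configurations above can be counted with a short Python sketch. It assumes, as the question suggests, that in the first case the data and MCMC seeds vary independently; the dictionaries are illustrative and are not DSC's internal representation.

```python
from itertools import product

seeds = range(1, 6)  # corresponds to seed: R(1:5)

# First reading: data and mcmc seeds vary independently.
crossed = [{"data": d, "mcmc": m} for d, m in product(seeds, seeds)]

# Second reading: the data seed varies, the mcmc seed is fixed at 999.
fixed = [{"data": d, "mcmc": 999} for d in seeds]

print(len(crossed), len(fixed))  # 25 5
```

With a fixed MCMC seed, the five runs differ only in the simulated data, which is usually what "replicates" is meant to express.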
We have finally reached an agreement on this issue. We will stick to this thread. The ticket is closed for now until the design changes otherwise -- implementation request has been added to project TODO list. |
See this post:
https://github.com/stephenslab/dsc-wiki/blob/master/development/initial_thoughts_on_terminology_and_extraction.md
I suggest we not have a pipeline seed and only use module seeds. A module may require its own seed. The idea of a pipeline seed is already reflected by the
dsc -x .. --seeds
option in the dsc command interface, which resets all seeds in a pipeline to the same specified values for a particular execution (overriding the defaults).
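As a sketch of the behaviour described for the seeds option, in Python: each module keeps its own default seeds, and an optional override replaces all of them for one execution. The function `resolve_seeds` and the data layout are hypothetical and do not reflect the actual dsc implementation.

```python
def resolve_seeds(module_seeds, override=None):
    """Return the seeds to use for one execution.

    module_seeds: mapping of module name -> list of default seeds.
    override: optional list of seeds applied to every module.
    """
    if override is None:
        # No override given: every module keeps its own defaults.
        return {name: list(s) for name, s in module_seeds.items()}
    # Override given: all modules are reset to the same seeds.
    return {name: list(override) for name in module_seeds}

defaults = {"data": [1, 2, 3, 4, 5], "mcmc": [999]}
normal_run = resolve_seeds(defaults)
overridden_run = resolve_seeds(defaults, override=[42])
```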