Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initialization of seed for Params virtual class #132

Closed
wenjie2wang opened this issue Jan 21, 2022 · 3 comments
Closed

Initialization of seed for Params virtual class #132

wenjie2wang opened this issue Jan 21, 2022 · 3 comments

Comments

@wenjie2wang
Copy link
Contributor

Thanks for developing this useful package.

I have a quick question: is it expected or by design that two Params child class objects always have the same seed by default? For example, the obj1 and obj2 are equivalent in the following code chunk. However, I was expecting them to be different due to the preceding set.seed().

set.seed(1)
obj1 <- splatter::newSplatPopParams()
splatter::getParam(obj1, "seed")
#> [1] 602245

set.seed(2)
obj2 <- splatter::newSplatPopParams()
splatter::getParam(obj2, "seed")
#> [1] 602245

all.equal(obj1, obj2)
#> [1] TRUE

More generally, I find that the behavior may depend on the class initialization method. Two simple examples are given below.

## example of using `prototype` argument
set.seed(3)
sample(seq_len(1e6), 1)
#> [1] 315066

set.seed(3)
setClass("Foo",
         contains = "VIRTUAL",
         slots = c(seed = "numeric"),
         prototype = prototype(seed = sample(seq_len(1e6), 1)))

setClass("Bar", contains = "Foo")

(bar_obj1 <- new("Bar"))
#> An object of class "Bar"
#> Slot "seed":
#> [1] 315066
(bar_obj2 <- new("Bar"))
#> An object of class "Bar"
#> Slot "seed":
#> [1] 315066

## example of using `initialize` method
setClass("Alpha",
         contains = "VIRTUAL",
         slots = c(seed = "numeric"))
setMethod("initialize", "Alpha", function(.Object, ...) {
    .Object@seed <- sample(seq_len(1e6), 1)
    .Object
})
setClass("Beta", contains = "Alpha")

set.seed(4)
(tmp_alpha1 <- new("Beta"))
#> An object of class "Beta"
#> Slot "seed":
#> [1] 459339
(tmp_alpha2 <- new("Beta"))
#> An object of class "Beta"
#> Slot "seed":
#> [1] 149251

set.seed(4)
(tmp_alpha3 <- new("Beta"))
#> An object of class "Beta"
#> Slot "seed":
#> [1] 459339

In my opinion, using the initialize() method gives a better behavior. If one wants to repeat the simulation multiple times, he/she has to specify the seed argument when initializing the parameter objects because the current behavior does not respect the random number seed specified by set.seed().

@lazappi
Copy link
Collaborator

lazappi commented Jan 24, 2022

Hi @wenjie2wang

Thanks for asking the question! You are correct that the intention is for the user to set the seed manually, either when creating the object (params <- newSplatParams(seed = 2)) or by modifying the existing object (params <- setParam(params, seed = 3)). That two objects created one after the other get the same default seed is something I have noticed before but never really looked into. Thank you for showing the example with initialize() 🎉 ! This is something I wasn't aware of and should probably incorporate into {splatter}.

(There are also issues with how the seeds are used during the simulation which I am aware of but haven't got around to addressing yet. This is a good reminder to look into those as well.)

@wenjie2wang
Copy link
Contributor Author

Hi @lazappi

Thanks for your positive feedback!

This is something I wasn't aware of and should probably incorporate into {splatter}.

I created a simple PR to resolve this issue.

(There are also issues with how the seeds are used during the simulation which I am aware of but haven't got around to addressing yet. This is a good reminder to look into those as well.)

I think how the random number seed is used internally is out of the scope of the package. It is probably not necessary to set the seed internally. Users should (be able to) set the seed on their own by a preceding set.seed() call. If all the internal functions that generate the random numbers respect the set.seed() call, users will get identical results from the same seed. I am not clear about the other issues related to the seed. However, is it possible to avoid those issues by not setting the seed internally if seed = NULL?

@lazappi
Copy link
Collaborator

lazappi commented Feb 2, 2022

Thanks for the PR! I have made a couple of quick comments on that but looks pretty good.

Part of the design philosophy for {splatter} is that you should be able to recreate the simulation from the parameters object so that in theory you could just share that rather than the whole simulated dataset. For this reason it seemed like a good idea to store the seed and set it during the simulation. Obviously that causes some issues though. I would like to keep this behaviour and I think I know how to do that but I haven't got around to working on it yet. If that is something you would be interested in helping out with let me know and I could explain what I was planning to do.

lazappi added a commit that referenced this issue Mar 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants