-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Added "seed" parameter to subsample-ids
#317
base: dev
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@VinzentRisch, I thought I left a comment on this earlier today, but I see to have lost it so I'm thinking maybe it didn't save and I navigated away from the page. But, apologies if this feedback is showing up twice.
I completely agree that we need to be storing random seeds in provenance, but the approach implemented here will be problematic. The vast majority of users won't notice or override this parameter setting, in which case the result will be that the same seed is used all the time.
We have support for passing random seeds on the command line in q2-boots - see how we handle this here:
caporaso-lab/q2-boots@9195f9c
Could you update this PR to use this approach?
I created a framework issue to develop an approach for better supporting this:
qiime2/qiime2#823
Oh, I see - my previous comment was on #321. This feedback is relevant to both of the PRs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@VinzentRisch, could you update this PR to mirror the approaches we've taken to this functionality and the tests in #321?
|
||
|
||
def subsample_ids(table: biom.Table, subsampling_depth: int, | ||
axis: str) -> biom.Table: | ||
axis: str, seed: Union[int, str] = 1) -> biom.Table: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
axis: str, seed: Union[int, str] = 1) -> biom.Table: | |
axis: str, seed: int = None) -> biom.Table: |
# Generate a random seed if seed is None | ||
if seed == "random": | ||
rng = np.random.default_rng() | ||
seed = rng.integers(0, 2 ** 32 - 1) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should all be able to go away since this happens in Table.subsample
- right?
# Generate a random seed if seed is None | |
if seed == "random": | |
rng = np.random.default_rng() | |
seed = rng.integers(0, 2 ** 32 - 1) |
@@ -61,7 +61,9 @@ | |||
function=q2_feature_table.subsample_ids, | |||
inputs={'table': FeatureTable[Frequency]}, | |||
parameters={'subsampling_depth': Int % Range(1, None), | |||
'axis': Str % Choices(['sample', 'feature'])}, | |||
'axis': Str % Choices(['sample', 'feature']), | |||
'seed': Int % Range(0, 2**32) | Str % Choices(["random"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'seed': Int % Range(0, 2**32) | Str % Choices(["random"]) | |
'seed': Int % Range(0, None) |
'seed': Int % Range(0, 2**32) | Str % Choices(["random"]) | |
'seed': Int % Range(0, 2**32) | Str % Choices(["random"]) |
'seed': ('Set the seed for the subsampling. Using the same seed with ' | ||
'the same table will always lead to the same result. Using ' | ||
'"random", sets the seed to a random number. The random ' | ||
'seed will not be logged in provenance.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Edits to the text. I'd leave the bit about provenance out, since we'll ideally be fixing that very shortly and it's common to all actions that take seeds right now.
'seed': ('Set the seed for the subsampling. Using the same seed with ' | |
'the same table will always lead to the same result. Using ' | |
'"random", sets the seed to a random number. The random ' | |
'seed will not be logged in provenance.') | |
'seed': ('Set the seed for the subsampling. Using the same seed with ' | |
'the same table will always lead to the same result. By ' | |
'default a random seed will be used.') |
@@ -21,7 +21,7 @@ def test_subsample_samples(self): | |||
t = Table(np.array([[0, 1, 3], [1, 1, 2]]), | |||
['O1', 'O2'], | |||
['S1', 'S2', 'S3']) | |||
a = subsample_ids(t, 2, 'sample') | |||
a = subsample_ids(t, 2, 'sample', 'random') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you adapt these tests to be similar to the ones we put together for #321?
solves #320
subsample-ids
function.