ENH: Added "seed" parameter to `subsample-ids` #317

VinzentRisch · 2024-10-23T09:46:35Z

solves #320

Adds a seed parameter to the subsample-ids function.
The seed defaults to 1. It can also be set to "random", which generates a random 32-bit integer to use as the seed.
The default of 1 was chosen to ensure that the seed is logged in provenance, which improves reproducibility.
At this point it is not possible to record in provenance a random seed that is created inside the function.

q2_feature_table/_subsample_ids.py

gregcaporaso

@VinzentRisch, I thought I left a comment on this earlier today, but I seem to have lost it, so I'm thinking maybe it didn't save and I navigated away from the page. Apologies if this feedback is showing up twice.

I completely agree that we need to be storing random seeds in provenance, but the approach implemented here will be problematic. The vast majority of users won't notice or override this parameter setting, in which case the result will be that the same seed is used all the time.
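To illustrate the concern, here is a minimal sketch that uses only Python's standard library rather than biom. The function `subsample_ids_fixed_default` and the sample IDs are hypothetical stand-ins, not the plugin's code. With a fixed default seed, every caller who leaves the parameter alone draws the identical subsample.

```python
import random
from typing import List


def subsample_ids_fixed_default(ids: List[str], depth: int,
                                seed: int = 1) -> List[str]:
    """Hypothetical stand-in: pick `depth` IDs without replacement."""
    # A fresh generator seeded with the same value always produces
    # the same sequence of draws.
    rng = random.Random(seed)
    return sorted(rng.sample(ids, depth))


ids = [f"S{i}" for i in range(1, 21)]

# Two independent "users" who never touch the seed parameter.
user_a = subsample_ids_fixed_default(ids, 5)
user_b = subsample_ids_fixed_default(ids, 5)

print(user_a == user_b)  # True: both got the same five IDs
```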

We have support for passing random seeds on the command line in q2-boots - see how we handle this here:
caporaso-lab/q2-boots@9195f9c

Could you update this PR to use this approach?

I created a framework issue to develop an approach for better supporting this:
qiime2/qiime2#823

q2_feature_table/_subsample_ids.py

gregcaporaso · 2024-12-16T18:07:42Z

Oh, I see - my previous comment was on #321. This feedback is relevant to both of the PRs.

gregcaporaso

@VinzentRisch, could you update this PR to mirror the approaches we've taken to this functionality and the tests in #321?

gregcaporaso · 2025-01-23T17:04:42Z

q2_feature_table/_subsample_ids.py



 def subsample_ids(table: biom.Table, subsampling_depth: int,
-                  axis: str) -> biom.Table:
+                  axis: str, seed: Union[int, str] = 1) -> biom.Table:


Suggested change

-                  axis: str, seed: Union[int, str] = 1) -> biom.Table:
+                  axis: str, seed: int = None) -> biom.Table:

gregcaporaso · 2025-01-23T17:05:31Z

q2_feature_table/_subsample_ids.py

+    # Generate a random seed if seed is None
+    if seed == "random":
+        rng = np.random.default_rng()
+        seed = rng.integers(0, 2 ** 32 - 1)
+


This should all be able to go away since this happens in Table.subsample - right?

Suggested change

-    # Generate a random seed if seed is None
-    if seed == "random":
-        rng = np.random.default_rng()
-        seed = rng.integers(0, 2 ** 32 - 1)
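As a rough sketch of why the special case can be dropped: if the seed defaults to `None` and is handed straight to the random number generator, the generator seeds itself from OS entropy. The example below uses the standard library's `random.Random` as a stand-in for whatever `Table.subsample` does internally, which is assumed here from the comment above and not verified. `subsample_ids_optional_seed` is a hypothetical name. NumPy's `np.random.default_rng()` behaves analogously when called without a seed.

```python
import random
from typing import List, Optional


def subsample_ids_optional_seed(ids: List[str], depth: int,
                                seed: Optional[int] = None) -> List[str]:
    """Hypothetical stand-in: pick `depth` IDs without replacement."""
    # random.Random(None) seeds from OS entropy, so no "random"
    # sentinel value or manual seed generation is needed.
    rng = random.Random(seed)
    return sorted(rng.sample(ids, depth))


ids = [f"S{i}" for i in range(1, 21)]

# An explicit seed is reproducible.
print(subsample_ids_optional_seed(ids, 5, seed=42)
      == subsample_ids_optional_seed(ids, 5, seed=42))  # True

# No seed: a fresh draw on each call, still of the requested size.
print(len(subsample_ids_optional_seed(ids, 5)))  # 5
```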

gregcaporaso · 2025-01-23T17:06:12Z

q2_feature_table/plugin_setup.py

@@ -61,7 +61,9 @@
    function=q2_feature_table.subsample_ids,
    inputs={'table': FeatureTable[Frequency]},
    parameters={'subsampling_depth': Int % Range(1, None),
-                'axis': Str % Choices(['sample', 'feature'])},
+                'axis': Str % Choices(['sample', 'feature']),
+                'seed': Int % Range(0, 2**32) | Str % Choices(["random"])


Suggested change

-                'seed': Int % Range(0, 2**32) | Str % Choices(["random"])
+                'seed': Int % Range(0, None)

gregcaporaso · 2025-01-23T17:07:44Z

q2_feature_table/plugin_setup.py

+        'seed': ('Set the seed for the subsampling. Using the same seed with '
+                 'the same table will always lead to the same result. Using '
+                 '"random", sets the seed to a random number. The random '
+                 'seed will not be logged in provenance.')


Edits to the text. I'd leave the bit about provenance out, since we'll ideally be fixing that very shortly and it's common to all actions that take seeds right now.

Suggested change

-        'seed': ('Set the seed for the subsampling. Using the same seed with '
-                 'the same table will always lead to the same result. Using '
-                 '"random", sets the seed to a random number. The random '
-                 'seed will not be logged in provenance.')
+        'seed': ('Set the seed for the subsampling. Using the same seed with '
+                 'the same table will always lead to the same result. By '
+                 'default a random seed will be used.')

gregcaporaso · 2025-01-23T17:08:14Z

q2_feature_table/tests/test_subsample.py

@@ -21,7 +21,7 @@ def test_subsample_samples(self):
        t = Table(np.array([[0, 1, 3], [1, 1, 2]]),
                  ['O1', 'O2'],
                  ['S1', 'S2', 'S3'])
-        a = subsample_ids(t, 2, 'sample')
+        a = subsample_ids(t, 2, 'sample', 'random')


Can you adapt these tests to be similar to the ones we put together for #321?
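The #321 tests are not reproduced in this thread, so the following is only one plausible shape for seed-aware tests, written against a standard-library stand-in rather than `subsample_ids` itself. `subsample_ids_sketch`, the test class, and `run_tests` are all hypothetical names. The idea is to assert reproducibility for an explicit seed, and only structural properties (size, membership) when no seed is given.

```python
import random
import unittest
from typing import List, Optional


def subsample_ids_sketch(ids: List[str], depth: int,
                         seed: Optional[int] = None) -> List[str]:
    """Hypothetical stand-in for the function under test."""
    rng = random.Random(seed)
    return sorted(rng.sample(ids, depth))


class SeededSubsampleTests(unittest.TestCase):
    def setUp(self):
        self.ids = [f"S{i}" for i in range(1, 51)]

    def test_same_seed_same_result(self):
        # An explicit seed must give identical output on repeated calls.
        a = subsample_ids_sketch(self.ids, 10, seed=42)
        b = subsample_ids_sketch(self.ids, 10, seed=42)
        self.assertEqual(a, b)

    def test_no_seed_returns_requested_depth(self):
        # Without a seed the exact IDs are unpredictable, so only
        # check the size and that every ID came from the input.
        a = subsample_ids_sketch(self.ids, 10)
        self.assertEqual(len(a), 10)
        self.assertTrue(set(a) <= set(self.ids))


def run_tests() -> unittest.TestResult:
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(SeededSubsampleTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```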

VinzentRisch added 2 commits October 23, 2024 11:40

added seed parameter with tests

7a1e3bd

lint

8e75633

gregcaporaso self-assigned this Nov 14, 2024

ebolyen linked an issue Dec 12, 2024 that may be closed by this pull request

Add "seed" parameter to subsample-ids #320

Open

gregcaporaso mentioned this pull request Dec 16, 2024

develop approach for storing random seeds in proveance qiime2/qiime2#823

Open

gregcaporaso reviewed Dec 16, 2024

q2_feature_table/_subsample_ids.py (resolved)

gregcaporaso requested changes Dec 16, 2024

gregcaporaso reviewed Dec 16, 2024

q2_feature_table/_subsample_ids.py (resolved)

gregcaporaso assigned VinzentRisch Dec 16, 2024

gregcaporaso mentioned this pull request Dec 16, 2024

ENH: Adds seed parameter to rarefy #321

Merged

gregcaporaso removed their assignment Dec 17, 2024

gregcaporaso requested changes Jan 23, 2025
