fix umi-tools dependency on python hash function values #470
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The umi-tools algorithm has an unfortunate dependency
on the actual hash values python generates on strings
With the anti-DOS mitigations from python 3.3,
these change with every run.
This means, unless the user set's PYTHON_HASHSEED=0,
in addition to --random-seed, umi-tools output is non deterministic.
That is surprising, and also, undocumented except in github issues.
This PR forces the set to use the equivalent of PYTHON_HASHSEED=0
in the one place that matters to the test cases.
I have not performed performance benchmarks - couldn't find such a facility
in the repo.
There is one remaining test case failing 'group_adjacency',
but it also fails with PYTHON_HASHSEED=0 on a clean checkout
of either master or 1.1.1 - there is something else going on there.