Remove dependence upon PYTHONHASHSEED #541

Closed
mbauman opened this issue Apr 12, 2017 · 8 comments

@mbauman
Contributor

mbauman commented Apr 12, 2017

The result of the predicate search depends on the initial order of the set of predicates. Because the predicates are stored in a set, that order depends on the value of PYTHONHASHSEED. I haven't thoroughly looked for other places where this causes a dependency, but it'd be nice to support fully reproducible runs with just random.seed and numpy.random.seed.
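
To make the dependency concrete, here is a minimal sketch that is independent of this project's code. It iterates over a small set of strings in fresh interpreters started with different PYTHONHASHSEED values. The helper name `set_order_under_seed` and the sample strings are made up for illustration.

```python
import os
import subprocess
import sys

# One-liner executed in a child interpreter: print a set's iteration order.
SNIPPET = "print(list({'alpha', 'beta', 'gamma', 'delta', 'epsilon'}))"


def set_order_under_seed(seed):
    """Return the printed iteration order of a set of strings in a fresh
    interpreter started with the given PYTHONHASHSEED (hypothetical helper)."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # The same seed reproduces the same order; different seeds usually do not.
    for seed in (1, 2, 3):
        print(seed, set_order_under_seed(seed))
```

The hash seed is fixed when the interpreter starts, so calling random.seed or numpy.random.seed inside a running process has no effect on this ordering.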

@fgregg
Contributor

fgregg commented Apr 12, 2017

what would that look like?

@mbauman
Contributor Author

mbauman commented Apr 12, 2017

Something like orderedset would be sufficient; it just uses insertion order. Of course the predicate selection algorithm still shuffles it, but the initial state would then be deterministic. It's not a big deal, but it'd make reproducible runs a little easier, since they wouldn't require starting a new Python instance (PYTHONHASHSEED is only read at interpreter startup).
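
As a rough sketch of the idea, using a plain dict rather than the orderedset package: deduplicate in insertion order, then shuffle with a seeded generator. This assumes Python 3.7+, where dicts are guaranteed to preserve insertion order. The function names and the sample predicate strings are hypothetical and not taken from the project.

```python
import random


def ordered_unique(items):
    """Deduplicate while keeping first-seen order.

    Relies on dicts preserving insertion order (guaranteed in Python 3.7+).
    """
    return list(dict.fromkeys(items))


def shuffled_candidates(items, seed):
    """Shuffle the deduplicated items with a seeded generator, so the result
    depends only on the input order and the seed, not on PYTHONHASHSEED."""
    candidates = ordered_unique(items)
    rng = random.Random(seed)
    rng.shuffle(candidates)
    return candidates


if __name__ == "__main__":
    predicates = ["sameThreeCharStart", "wholeField", "sameThreeCharStart", "tokenField"]
    print(ordered_unique(predicates))
    print(shuffled_candidates(predicates, seed=0))
```

Since the starting order is now a function of the input alone, the shuffle produces the same permutation whenever the seed is the same.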

@fgregg
Contributor

fgregg commented Apr 12, 2017

what's the performance cost for that?

@mbauman
Contributor Author

mbauman commented Apr 12, 2017

I'm not sure; I've not evaluated any potential solutions yet. In theory it may not have a major cost. In fact, Python's builtin dictionaries recently moved to an insertion-ordered implementation (in CPython 3.6), since they found its performance to be advantageous.

@fgregg
Contributor

fgregg commented Apr 12, 2017 via email

@mbauman
Contributor Author

mbauman commented Apr 12, 2017

Do you have a benchmark suite? Or any micro-benchmarks? Or would you just be interested in the overall run time for a complete deduplication run?
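
For reference, a micro-benchmark could look something like the sketch below. It only times building a `set` against building insertion-ordered dict keys from the same strings, which is an assumption about where the cost would show up and not a measurement of this project's code path. The function name, item count, and repeat counts are arbitrary.

```python
import timeit


def time_construction(n_items=10_000, repeats=5, number=20):
    """Time building a set vs. insertion-ordered dict keys from the same items.

    Returns the best (minimum) total time in seconds over `repeats` runs,
    each consisting of `number` constructions.
    """
    items = [f"predicate_{i}" for i in range(n_items)]
    builders = {
        "set": lambda: set(items),
        "ordered_dict_keys": lambda: dict.fromkeys(items),
    }
    return {
        name: min(timeit.repeat(build, repeat=repeats, number=number))
        for name, build in builders.items()
    }


if __name__ == "__main__":
    for name, seconds in time_construction().items():
        print(f"{name}: {seconds:.4f}s")
```

The overall run time of a complete deduplication run would still be the more meaningful number, since container construction may be a small fraction of the total.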

@fgregg
Contributor

fgregg commented Apr 12, 2017 via email

@fgregg
Contributor

fgregg commented Dec 28, 2017

no response on this, so I'm closing

@fgregg fgregg closed this as completed Dec 28, 2017
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 8, 2022