feat(DataQuality): Reproducible randomness in all engines #16

Merged: 5 commits, Sep 9, 2021

jfsantos-ds

Base classes and all engines now accept an additional argument, random_state.
It defaults to 42, which makes all results reproducible.
Setting this argument to None restores true (irreproducible) randomness.
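
The behaviour described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code from this PR: `SamplingEngine` and its `sample` method are hypothetical names, and only the `random_state` argument and its default of 42 come from the description.

```python
import random
from typing import List, Optional


class SamplingEngine:
    """Hypothetical engine for illustration; not part of ydata_quality."""

    def __init__(self, random_state: Optional[int] = 42):
        # A private generator keeps the engine independent of the global
        # random module state. random.Random(None) seeds itself from the
        # operating system or the clock, so results differ between runs.
        self.random_state = random_state
        self._rng = random.Random(random_state)

    def sample(self, values: List[int], k: int) -> List[int]:
        """Draw k values without replacement."""
        return self._rng.sample(values, k)


data = list(range(100))
first = SamplingEngine().sample(data, 5)
second = SamplingEngine().sample(data, 5)
same_by_default = first == second  # both engines were seeded with 42

# Passing None opts out of reproducibility.
unseeded = SamplingEngine(random_state=None).sample(data, 5)
```

Keeping the generator on the instance, rather than seeding the global generator, means two engines never interfere with each other's random streams.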

@jfsantos-ds self-assigned this on Sep 8, 2021
@UrbanoFonseca added the labels "feature" (A new feature) and "stability" (Changes to improve robustness) on Sep 8, 2021
@UrbanoFonseca

Nice work! This will help a lot with regression testing and the overall robustness of the engines.

Key takeaways:

  • We should default random_state to None in the code and to a fixed value in the tutorials
  • We either fix the random_state in the tutorials that need it (all tests with deterministic results) or add this task to the backlog
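
The convention proposed in these takeaways might look like the sketch below: library code defaults to `None`, and a tutorial or test passes a fixed seed explicitly. `shuffle_rows` and `TUTORIAL_SEED` are hypothetical names chosen for illustration, not identifiers from the repository.

```python
import random
from typing import Optional

TUTORIAL_SEED = 42  # hypothetical constant a tutorial notebook might define


def shuffle_rows(rows: list, random_state: Optional[int] = None) -> list:
    """Return a shuffled copy of rows; hypothetical helper for illustration."""
    rng = random.Random(random_state)  # None means an unpredictable seed
    shuffled = list(rows)              # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    return shuffled


rows = list(range(10))

# Tutorial or regression test: the seed is fixed, so repeated runs agree.
run_a = shuffle_rows(rows, random_state=TUTORIAL_SEED)
run_b = shuffle_rows(rows, random_state=TUTORIAL_SEED)

# Library default: no seed is given, so the order varies between runs.
library_default = shuffle_rows(rows)
```

With this split, users get genuinely random behaviour unless they ask otherwise, while tutorials and tests stay deterministic.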

@UrbanoFonseca

btw, why the change of convention in the branch naming?

@UrbanoFonseca UrbanoFonseca left a comment

Final minor remarks. Feel free to merge afterwards 🚀

@jfsantos-ds merged commit a949853 into master on Sep 9, 2021
@jfsantos-ds deleted the feat/ReproducibleRandomness branch on September 9, 2021, 15:53