-
Notifications
You must be signed in to change notification settings - Fork 169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added seed library variable #271
Conversation
dataprofiler/profilers/utils.py
Outdated
seed_value = int(settings._seed) | ||
rng = np.random.default_rng(seed_value) | ||
except ValueError as e: | ||
warnings.warn("Seed should be an integer", RuntimeWarning) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should reference that it should be >= 0
dataprofiler/profilers/utils.py
Outdated
if settings._seed is not None: | ||
try: | ||
seed_value = int(settings._seed) | ||
rng = np.random.default_rng(seed_value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this case, probably don't need to set an intermediate variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seed_value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, we should assume that settings._seed is in its correct form. either None
or int >= 0
@@ -3,6 +3,7 @@ | |||
import random | |||
|
|||
import numpy as np | |||
from dataprofiler import settings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as above with the imports, should be a space between these imports since numpy is a 3rd party lib
np.random.seed(seed) | ||
random.seed(seed) | ||
settings._seed = seed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should also set the seed for np.random and random.seed in addition to the settings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or maybe we should remove this and the set_seed func in the init should set these as well.
In this case, we would likely move the set_seed into its own file and import set_seed from the file so that it could be accessed directly via dp.set_seed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@granteden @ChrisWallace2020 thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e.g. a utils
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also utils shouldn't directly alter the _seed
but utilize the dp.set_seed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be a solution in one place, rather than top level in init and here. Because if you are just setting _seed
here then what is the use case for the one in __init__
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, now that i think about it, i think we leave it out of __init__
reason being, the user can choose to specify those externally. If we alter it globally, we could be altering it for someone else.
dataprofiler/__init__.py
Outdated
@@ -22,3 +23,6 @@ | |||
'\tsudo apt-get -y install libsnappy-dev`\n', | |||
ImportWarning | |||
) | |||
|
|||
def set_seed(seed=None): | |||
settings._seed = seed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need a blank new line at the end of the file
dataprofiler/settings.py
Outdated
@@ -0,0 +1 @@ | |||
_seed = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need an empty new line at the end of the file
Head branch was pushed to by a user without write access
Head branch was pushed to by a user without write access
dataprofiler/__init__.py
Outdated
|
||
def set_seed(seed=None): | ||
# also check it's an integer | ||
if seed is not None and (seed < 0 or int(seed)!=float(seed)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if seed were a {}
it would not have a <
operator. hence i think we need to alter this to check specifically that it is an int
e.g. if seed is not None and (not isinstance(seed, int) or seed < 0):
dataprofiler/__init__.py
Outdated
if seed is not None and (seed < 0 or int(seed)!=float(seed)): | ||
raise ValueError("Seed should be a non-negative integer.") | ||
settings._seed = seed | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this line might have whitespaces in it
dataprofiler/profilers/utils.py
Outdated
|
||
rng = np.random.default_rng() | ||
|
||
rng = np.random.default_rng(int(settings._seed)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we already check in dp.set_seed
, remove the int
from here
Head branch was pushed to by a user without write access
@@ -7,6 +7,7 @@ | |||
UnstructuredDataLabeler | |||
from .validators.base_validators import Validator | |||
from .version import __version__ | |||
from . import settings | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
two spaces between imports and code
dataprofiler/tests/labelers/utils.py
Outdated
|
||
def set_seed(seed=None): | ||
""" | ||
Sets the see for all possible random state libraries |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo, I'm guessing this is supposed to be "sets the seed"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its also replicated in the other set_seed
function
@@ -13,6 +15,7 @@ def set_seed(seed=None): | |||
""" | |||
np.random.seed(seed) | |||
random.seed(seed) | |||
dp.set_seed(seed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this written for profilers and labelers when its the same function? Is there a reason this isn't more centrally located?
rng = np.random.default_rng() | ||
if 'DATAPROFILER_SEED' in os.environ: | ||
|
||
rng = np.random.default_rng(settings._seed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we want to make this so that there is only want rng call instead of potentially 2?
dataprofiler/tests/labelers/utils.py
Outdated
np.random.seed(seed) | ||
random.seed(seed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason these things aren't also set in the top level __init__
function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then these test utils just call that one? I understand this was a whole discussion previously, I am just unaware of the conclusion and the structure that was decided upon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't want to globally set the seed for other libraries.
Head branch was pushed to by a user without write access
* Added seed library var * Checking testing environment * Added tests and removed unnecessary seed calls * If user sets seed don't look for one in the environment * Added file changes * Made small changes
No description provided.