Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added seed library variable #271

Merged
merged 8 commits into from
Jun 21, 2021
Merged

Added seed library variable #271

merged 8 commits into from
Jun 21, 2021

Conversation

gautomdas
Copy link
Contributor

No description provided.

@CLAassistant
Copy link

CLAassistant commented Jun 16, 2021

CLA assistant check
All committers have signed the CLA.

seed_value = int(settings._seed)
rng = np.random.default_rng(seed_value)
except ValueError as e:
warnings.warn("Seed should be an integer", RuntimeWarning)
Copy link
Contributor

@JGSweets JGSweets Jun 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should reference that it should be >= 0

if settings._seed is not None:
try:
seed_value = int(settings._seed)
rng = np.random.default_rng(seed_value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in this case, probably don't need to set an intermediate variable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seed_value

Copy link
Contributor

@JGSweets JGSweets Jun 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, we should assume that settings._seed is in its correct form. either None or int >= 0

@@ -3,6 +3,7 @@
import random

import numpy as np
from dataprofiler import settings
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as above with the imports, should be a space between these imports since numpy is a 3rd party lib

Comment on lines 14 to 15
np.random.seed(seed)
random.seed(seed)
settings._seed = seed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should also set the seed for np.random and random.seed in addition to the settings

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or maybe we should remove this and the set_seed func in the init should set these as well.

In this case, we would likely move the set_seed into its own file and import set_seed from the file so that it could be accessed directly via dp.set_seed

Copy link
Contributor

@JGSweets JGSweets Jun 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g. a utils

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also utils shouldn't directly alter the _seed but utilize the dp.set_seed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be a solution in one place, rather than top level in init and here. Because if you are just setting _seed here then what is the use case for the one in __init__?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, now that i think about it, i think we leave it out of __init__ reason being, the user can choose to specify those externally. If we alter it globally, we could be altering it for someone else.

@JGSweets JGSweets enabled auto-merge (squash) June 16, 2021 17:07
@@ -22,3 +23,6 @@
'\tsudo apt-get -y install libsnappy-dev`\n',
ImportWarning
)

def set_seed(seed=None):
settings._seed = seed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need a blank new line at the end of the file

@@ -0,0 +1 @@
_seed = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need an empty new line at the end of the file

auto-merge was automatically disabled June 17, 2021 16:41

Head branch was pushed to by a user without write access

@JGSweets JGSweets enabled auto-merge (squash) June 17, 2021 19:04
auto-merge was automatically disabled June 17, 2021 19:27

Head branch was pushed to by a user without write access

@JGSweets JGSweets enabled auto-merge (squash) June 17, 2021 19:31

def set_seed(seed=None):
# also check it's an integer
if seed is not None and (seed < 0 or int(seed)!=float(seed)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if seed were a {} it would not have a < operator. hence i think we need to alter this to check specifically that it is an int
e.g. if seed is not None and (not isinstance(seed, int) or seed < 0):

if seed is not None and (seed < 0 or int(seed)!=float(seed)):
raise ValueError("Seed should be a non-negative integer.")
settings._seed = seed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line might have whitespaces in it


rng = np.random.default_rng()

rng = np.random.default_rng(int(settings._seed))
Copy link
Contributor

@JGSweets JGSweets Jun 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we already check in dp.set_seed, remove the int from here

auto-merge was automatically disabled June 17, 2021 19:45

Head branch was pushed to by a user without write access

@@ -7,6 +7,7 @@
UnstructuredDataLabeler
from .validators.base_validators import Validator
from .version import __version__
from . import settings

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two spaces between imports and code

@JGSweets JGSweets added High Priority Dramatic improvement, inaccurate calculation(s) or bug / feature making the library unusable New Feature A feature addition not currently in the library labels Jun 17, 2021
@JGSweets JGSweets enabled auto-merge (squash) June 21, 2021 15:27
JGSweets
JGSweets previously approved these changes Jun 21, 2021

def set_seed(seed=None):
"""
Sets the see for all possible random state libraries
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo, I'm guessing this is supposed to be "sets the seed"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its also replicated in the other set_seed function

@@ -13,6 +15,7 @@ def set_seed(seed=None):
"""
np.random.seed(seed)
random.seed(seed)
dp.set_seed(seed)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this written for profilers and labelers when its the same function? Is there a reason this isn't more centrally located?

rng = np.random.default_rng()
if 'DATAPROFILER_SEED' in os.environ:

rng = np.random.default_rng(settings._seed)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we want to make this so that there is only want rng call instead of potentially 2?

Comment on lines 16 to 17
np.random.seed(seed)
random.seed(seed)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason these things aren't also set in the top level __init__ function?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then these test utils just call that one? I understand this was a whole discussion previously, I am just unaware of the conclusion and the structure that was decided upon.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to globally set the seed for other libraries.

auto-merge was automatically disabled June 21, 2021 16:02

Head branch was pushed to by a user without write access

@JGSweets JGSweets enabled auto-merge (squash) June 21, 2021 16:05
@JGSweets JGSweets merged commit 5e0fb30 into capitalone:main Jun 21, 2021
stevensecreti pushed a commit to stevensecreti/DataProfiler that referenced this pull request Jun 15, 2022
* Added seed library var

* Checking testing environment

* Added tests and removed unnecessary seed calls

* If user sets seed don't look for one in the environment

* Added file changes

* Made small changes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
High Priority Dramatic improvement, inaccurate calculation(s) or bug / feature making the library unusable New Feature A feature addition not currently in the library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants