Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add random_seed parameter #28

Merged
merged 3 commits into from
May 12, 2022
Merged

Add random_seed parameter #28

merged 3 commits into from
May 12, 2022

Conversation

ArnoldYSYeung
Copy link
Contributor

Allows user to input a random_seed for repeating runs. If None, does not use random_seed.
Code has been fixed to apply to every instance of np.random and rd.random. (Seed must be defined prior to every use of np.random and rd.random.)

Code has been tested on smogn/examples/smogn_example_1_beg.ipynb . Using the same random_seed value always output the same stats.

housing_smogn = smogn.smoter(
    
    data = housing,  ## pandas dataframe
    y = 'SalePrice',  ## string ('header name')
    random_seed = 1
)
smogn.box_plot_stats(housing_smogn['SalePrice'])['stats']

Allows user to input a random_seed for repeating runs. If None, does not use random_seed.
@ArnoldYSYeung
Copy link
Contributor Author

@nickkunz The modifications here have been tested with the smogn/examples/smogn_example_1_beg.ipynb notebook. Please test with more complex cases as well. Thanks.

@Muti23
Copy link

Muti23 commented Mar 22, 2022

@ArnoldYSYeung Thanks for your sharing. In fact I also used the same strategy to deal with my problem. I try to implant random seed for several rows in smoter/over_sampling files. Unfortunately it doesn't solve the problem completely, and only increase the reproducibility of test results. I don't know if you have the same situation? And I will use your modifications to do the same test.

@polinamamo
Copy link

@Muti23 my understanding after reading the code, that you need to fix seed for both np.random.seed(seed)
rd.seed(seed) at the same time, as both methods are used for random sampling.

Copy link
Owner

@nickkunz nickkunz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ArnoldYSYeung can you remove the locally generated pycache files?

smogn/over_sampling.py Outdated Show resolved Hide resolved
@nickkunz
Copy link
Owner

@ArnoldYSYeung @Muti23 @polinamamo thank you all for your input. This is very helpful for everyone using the package.

@ArnoldYSYeung
Copy link
Contributor Author

ArnoldYSYeung commented Apr 12, 2022

@Muti23 my understanding after reading the code, that you need to fix seed for both np.random.seed(seed) rd.seed(seed) at the same time, as both methods are used for random sampling.

@polinamamo Are you suggesting we should include rd.seed(seed) wherever we have np.random.seed(seed) and vice versa?

Right now, my PR only includes one when the following lines use that specific random seed. (E.g., only rd.seed(seed) is included before rd.choices(...) and only np.random.seed(seed) is included before np.random.choice(...)) From my test case, it seems this is sufficient for reproducing the results, but please let me know if that's not the case, and I can include both together whenever using either random seed 🙏🙏

- Rename `random_seed` variable to `seed`
- Remove pycache files
@ArnoldYSYeung
Copy link
Contributor Author

@nickkunz Renamed random_seed to seed and removed pycache files. Please take a look and let me know of any other changes 🙏

Copy link
Owner

@nickkunz nickkunz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ArnoldYSYeung Can we merge this to the dev branch before we merging to master?

@ArnoldYSYeung ArnoldYSYeung changed the base branch from master to dev May 12, 2022 20:40
@ArnoldYSYeung
Copy link
Contributor Author

@nickkunz Changed base branch from master to dev

@nickkunz
Copy link
Owner

@ArnoldYSYeung Thank you. I really appreciate your contribution. This needed to be done.

@nickkunz nickkunz merged commit 76c9b4c into nickkunz:dev May 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants