generating different synthetic data while training the model multiple times. #299
Comments
csala added the question (General question about the software) label and removed the feature request (Request for a new feature) and pending review labels on Jan 21, 2021
Interesting question, @Amanhelloworld. In most cases, you can ensure reproducibility by fixing the numpy and torch seeds, as follows:

np.random.seed(SEED_VALUE)
torch.manual_seed(SEED_VALUE)

Here's an example:

In [1]: import numpy as np
...: import torch
...: from sdv.demo import load_tabular_demo
...: from sdv.tabular import CTGAN
...:
...: data = load_tabular_demo('student_placements')
In [2]: torch.manual_seed(0)
...: np.random.seed(0)
...: model = CTGAN(epochs=10)
...: model.fit(data)
...: model.sample(5)
Out[2]:
student_id gender second_perc high_perc high_spec degree_perc degree_type work_experience experience_years employability_perc mba_spec mba_perc salary placed start_date end_date duration
0 17433 M 69.588847 55.257127 Commerce 68.440825 Comm&Mgmt False 0 56.604584 Mkt&HR 51.130539 37699.395301 False 2020-09-10 2020-12-09 NaN
1 17395 M 46.089210 61.477286 Commerce 75.891097 Others False 1 61.801707 Mkt&HR 58.038007 32413.742727 True NaT 2020-08-22 3.0
2 17301 F 72.407853 58.146130 Arts 85.528594 Comm&Mgmt True 0 48.795626 Mkt&Fin 67.373889 NaN True 2020-02-19 2020-06-11 3.0
3 17323 M 70.313107 45.468931 Commerce 57.623638 Comm&Mgmt True 0 41.398895 Mkt&HR 55.129773 28015.573652 True 2020-03-07 2019-12-12 NaN
4 17483 M 56.702416 91.571410 Science 76.770451 Comm&Mgmt False 1 73.093578 Mkt&HR 59.265596 42264.083767 True 2020-02-12 2020-08-02 6.0
In [3]: torch.manual_seed(0)
...: np.random.seed(0)
...: model = CTGAN(epochs=10)
...: model.fit(data)
...: model.sample(5)
Out[3]:
student_id gender second_perc high_perc high_spec degree_perc degree_type work_experience experience_years employability_perc mba_spec mba_perc salary placed start_date end_date duration
0 17433 M 69.588847 55.257127 Commerce 68.440825 Comm&Mgmt False 0 56.604584 Mkt&HR 51.130539 37699.395301 False 2020-09-10 2020-12-09 NaN
1 17395 M 46.089210 61.477286 Commerce 75.891097 Others False 1 61.801707 Mkt&HR 58.038007 32413.742727 True NaT 2020-08-22 3.0
2 17301 F 72.407853 58.146130 Arts 85.528594 Comm&Mgmt True 0 48.795626 Mkt&Fin 67.373889 NaN True 2020-02-19 2020-06-11 3.0
3 17323 M 70.313107 45.468931 Commerce 57.623638 Comm&Mgmt True 0 41.398895 Mkt&HR 55.129773 28015.573652 True 2020-03-07 2019-12-12 NaN
4 17483 M 56.702416 91.571410 Science 76.770451 Comm&Mgmt False 1 73.093578 Mkt&HR 59.265596 42264.083767 True 2020-02-12 2020-08-02 6.0
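
The seeding step above can be wrapped in a small helper so that it is called the same way before every `fit`. This is only a sketch: `seed_everything` is a hypothetical name, not part of SDV, and it also seeds Python's built-in `random` module as an extra precaution that the example above does not use. numpy and torch are imported lazily so the snippet still runs where one of them is missing. Note that, per the PyTorch reproducibility notes, identical seeds are not guaranteed to give identical results across different platforms, PyTorch releases, or CPU versus GPU runs.

```python
import random


def seed_everything(seed: int) -> None:
    """Seed the random number generators that are available (hypothetical helper)."""
    # Python's built-in generator (an extra precaution, not in the original example).
    random.seed(seed)

    # numpy's global generator, as in the example above.
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # numpy not installed; nothing to seed

    # torch's global generator, as in the example above.
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass  # torch not installed; nothing to seed


# Reseeding with the same value reproduces the same draw.
seed_everything(0)
first = random.random()
seed_everything(0)
second = random.random()
print(first == second)  # prints True
```

Calling `seed_everything(0)` immediately before `model = CTGAN(epochs=10)` and `model.fit(data)` would play the same role as the two seed calls in the session above.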
Closing this, as the question was answered long ago.
This was referenced Sep 9, 2021
Hi team,
Thank you very much for a great package like this. I am using it in one of my projects, where I report results on synthetic data.
I have some issues while training the model.
1. I want to train a model, for example CTGAN, to generate synthetic data samples. The quality of the data is different every time I retrain the model.
2. If I retrain the model, there is a huge difference in the generated synthetic data, and when I use this synthetic data for other tasks, there are large performance gaps in the results.
So, is there any way to make the model consistent across multiple runs? I could use a seed or save the model so that my model won't change much, but the problem is that if someone else wants to run the same experiment on their machine, they will get different results that won't match mine.
If you could tell me how this can be solved, it would be very helpful for my project.