Synthetic data jorge pedroza #9

Open: wants to merge 2 commits into `main` from …


@RenHook (Collaborator) commented Nov 12, 2024


Synthetic data generation from `ENB2012_data.csv`. The script:

1. Loads the original ENB2012 dataset.
2. Generates synthetic data that preserves:
   - the statistical distribution of each feature
   - correlations between features
   - domain constraints (e.g., positive values for areas)


It also includes:
- Validation that measures how closely the synthetic data matches the original distributions
- CSV output ready for your MLOps pipeline
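
The PR does not name its validation metrics. Two common choices are the two-sample Kolmogorov-Smirnov statistic per column (the largest gap between the original and synthetic empirical CDFs; 0 means identical) and the largest absolute difference between the two Pearson correlation matrices. A standard-library sketch of both, assuming rows are lists of floats in the same column order; `ks_statistic`, `pearson`, and `validation_report` are hypothetical names:

```python
import bisect
import math


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    # Both ECDFs are step functions that only jump at observed values,
    # so the supremum of their difference is reached at one of those values.
    for x in set(a) | set(b):
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for a constant column; treated as uncorrelated here
    return sxy / math.sqrt(sxx * syy)


def validation_report(original, synthetic, names):
    """Per-column KS statistics and the largest absolute correlation-matrix difference."""
    d = len(names)
    orig_cols = [[r[j] for r in original] for j in range(d)]
    syn_cols = [[r[j] for r in synthetic] for j in range(d)]
    ks = {names[j]: ks_statistic(orig_cols[j], syn_cols[j]) for j in range(d)}
    corr_gap = max(
        (
            abs(pearson(orig_cols[i], orig_cols[j]) - pearson(syn_cols[i], syn_cols[j]))
            for i in range(d)
            for j in range(i + 1, d)
        ),
        default=0.0,
    )
    return ks, corr_gap
```

Lower is better for both numbers. A KS statistic near 0 means a column's synthetic distribution tracks the original, and a small `corr_gap` means no pair of features lost (or gained) much linear correlation.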

To use this script:

1. Place `ENB2012_data.csv` in the same directory as the script.
2. Run the script; it writes `synthetic_ENB2012_data.csv`.
3. Review the validation metrics the script prints, which show how closely the synthetic data matches the original.
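
The file handling in steps 1 and 2 can be sketched with the standard `csv` module. This assumes the CSV has one header row followed by purely numeric rows; `read_numeric_csv`, `write_csv`, and `run` are hypothetical names, and `generate(rows, n)` stands in for whatever generator the script actually uses:

```python
import csv
from pathlib import Path


def read_numeric_csv(path):
    """Return (header, rows) for a CSV whose first row is a header and whose other cells are numbers."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = [[float(cell) for cell in row] for row in reader if row]  # skip blank lines
    return header, rows


def write_csv(path, header, rows):
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def run(generate, n_samples, directory, src="ENB2012_data.csv", dst="synthetic_ENB2012_data.csv"):
    """Read `src` from `directory`, call `generate(rows, n)`, and write `dst` next to it.

    When `n_samples` is None, as many rows are generated as the input has.
    """
    directory = Path(directory)
    header, rows = read_numeric_csv(directory / src)
    n = len(rows) if n_samples is None else n_samples
    write_csv(directory / dst, header, generate(rows, n))
    return directory / dst
```

Inside the script, `run(generate, None, Path(__file__).resolve().parent)` would read the input from the script's own directory and write the output beside it, regardless of the working directory the script is launched from.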

The synthetic data preserves important properties like:

* Relative Compactness staying between 0 and 1
* Orientation restricted to the four coded values used in ENB2012 (2 to 5, i.e., 90-degree steps)
* Positive values for areas and loads
* Discrete values for Glazing Area Distribution
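
These properties could be enforced as a post-processing pass over each generated row. A sketch, assuming rows are dicts keyed by the UCI ENB2012 column codes (X1 Relative Compactness, X2 to X4 surface/wall/roof area, X6 Orientation, X8 Glazing Area Distribution, Y1/Y2 heating/cooling load); `snap` and `enforce_constraints` are hypothetical helpers, and discrete columns are snapped to the values observed in the original file rather than to a hard-coded grid:

```python
# Hypothetical column groups, assuming the UCI ENB2012 codes are used as CSV headers.
POSITIVE_COLUMNS = ("X2", "X3", "X4", "Y1", "Y2")  # surface/wall/roof area, heating/cooling load
DISCRETE_COLUMNS = ("X6", "X8")  # Orientation, Glazing Area Distribution


def snap(value, allowed):
    """Return the allowed value closest to `value`; ties go to the smaller value."""
    return min(sorted(allowed), key=lambda a: abs(a - value))


def enforce_constraints(row, discrete_values, eps=1e-6):
    """Return a copy of `row` (a dict of column -> value) that satisfies the listed constraints.

    `discrete_values` maps each discrete column to the set of values observed in the original data.
    """
    fixed = dict(row)  # never mutate the caller's row
    # Relative Compactness (X1) is clipped into (0, 1].
    fixed["X1"] = min(max(fixed["X1"], eps), 1.0)
    # Areas and loads must be strictly positive.
    for col in POSITIVE_COLUMNS:
        fixed[col] = max(fixed[col], eps)
    # Discrete columns are snapped to the nearest value seen in the original file.
    for col in DISCRETE_COLUMNS:
        fixed[col] = snap(fixed[col], discrete_values[col])
    return fixed
```

Reading the allowed values from the original data (for example `{row["X6"] for row in original_rows}`) keeps the post-processing correct even if the coding of a discrete column differs from what one expects.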