
text/string generators become one + removing to_csv from dataset_generator #317

Merged
JGSweets merged 6 commits into capitalone:feature/simple-tabular-generator on Jul 28, 2023


lizlouise1335
Contributor

  • text and string generators are now just one overarching string generator
  • all files affected by this change have been updated accordingly
  • to_csv has been removed from dataset_generator
  • testing files have been updated to reflect all above changes
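The bullets above describe folding the separate text and string generators into one. As a minimal sketch of what a single character-based generator could look like, assuming a hypothetical `random_text(rng, num_rows, chars, str_len_min, str_len_max)` signature (not necessarily the one merged in this PR) and using the standard-library `random.Random` in place of the NumPy generator the project uses:

```python
import random
import string
from typing import List, Optional


def random_text(
    rng: random.Random,
    num_rows: int,
    chars: Optional[str] = None,
    str_len_min: int = 1,
    str_len_max: int = 256,
) -> List[str]:
    """Hypothetical unified generator: the length bounds alone decide whether
    the output looks like short "strings" or long "text"."""
    if chars is None:
        # Assumed default alphabet: ASCII letters and digits.
        chars = string.ascii_letters + string.digits
    if str_len_min < 0 or str_len_min > str_len_max:
        raise ValueError("require 0 <= str_len_min <= str_len_max")
    rows = []
    for _ in range(num_rows):
        # randint is inclusive on both ends.
        length = rng.randint(str_len_min, str_len_max)
        rows.append("".join(rng.choice(chars) for _ in range(length)))
    return rows


# Short "string"-style values and long "text"-style values from one function.
rng = random.Random(0)
short_values = random_text(rng, num_rows=5, chars="01", str_len_min=2, str_len_max=4)
long_values = random_text(rng, num_rows=2, str_len_min=200, str_len_max=300)
```

The design point is that one function with length and alphabet parameters covers both former use cases, so callers need only one import.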

- from synthetic_data.distinct_generators.text_generator import random_string, random_text
+ from synthetic_data.distinct_generators.text_generator import random_string
Contributor comment:

nice



def convert_data_to_df(
np_data: np.array,
path: Optional[str] = None,
Contributor comment:

nice
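Assuming the `path` argument shown above is among the lines this PR removes along with `to_csv`, persisting generated data becomes the caller's job. If the data is already a pandas DataFrame, `df.to_csv(path, index=False)` does this directly. A dependency-free sketch, assuming the generated columns arrive as equal-length lists keyed by column name (`write_columns_csv` is a hypothetical helper, not part of the project):

```python
import csv
import io
from typing import Dict, List, TextIO


def write_columns_csv(columns: Dict[str, List[object]], out: TextIO) -> None:
    """Hypothetical helper: write column-oriented data as CSV to a text stream."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same number of rows")
    writer = csv.writer(out)
    writer.writerow(columns.keys())
    # zip(*...) turns the per-column lists into per-row tuples.
    writer.writerows(zip(*columns.values()))


buffer = io.StringIO()
write_columns_csv({"id": [1, 2], "text": ["ab", "cde"]}, buffer)
```

When writing to a real file instead of a `StringIO`, open it with `newline=""`, as the `csv` module documentation recommends.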

Comment on lines -93 to -102
"010100011010001000001100000001111101111111110110001011101100001010000100101100001010101000100010101010101011000110010110110000011101001011101110011010000011111010001100000000110001100100000000000111001110110110101010110010010000101101110001000101010110101101001011101011110110100110100000111101010101",
"000111111101111101100010110011100000110010011100001110011100001010011001100010111000101101001010101010101111000010110111110111001000110100010100001111011111011110000100111000000100100011010110111001010011001011100110110010100000000001011111110110001001100001010100100011001110011000100101000011011111",
"111100000010101010101001100110110011000010100100101110111110001001010001000010000100110011000101000000010011011001101011011100101000100001001011011110111010010100001110001101101110110011010110111010000110000000011000100011111101001110010011110000001011100100100111010001011000101110011100110111001001",
"101010000111110001001111100000101111000100000111001100001001001110101111111010111011011100010101100001001010111010010110110100101010010101000100001001110000001010111100010100101110011100011100111000110110011110111110001000111011110010000111100000110010001110101100101110111111000011001100111111000011",
"011110101101010111010100100001101000000101000001000100110011100011000100011111101100100101111000111000101111101101000010100011110010010111110011011010000000001111111101101111110001110110110100010111111001000000101101101101000000001000001001101000010001011111011001111011101011011000100010001001010111",
"111100110001011101000101011000110001001100101100101100000110010011001011010110001010111010010111111111100011101110010011001101101000000011000100101001110010110101010001011111001110110111001111111110001000110110101000010001111001100111100110110101101100110100010011011011100110010110100001010100001000",
"111000010000010000001100001101000011001001010110100100111000101100101110110111000010011101010110101011111101011011110110111100111011100110011011111011111001110001101000101001100000101010000010111100101110100001000011011011000001101010000010000001110111010010001101011100100101101110001111101001000111",
"010001100011001001101011000010000111011010011000110000111110000000000000101101111011101000011001010111110100000010100000000100110001001000110010010011110001111011101111101001011111000000000011000110100000011010111001000001000110111011111000011111010011011000111100000001111100011000011111000000001000",
"000011011010101010010011011001001111001000000001111110111101010000011101101000000110111001000101110001011100101110001000100001110101001011110110101000110101000100100010011011000010000111101001111000000011011000100011010100001111111111110010011101110010101010010010110011110011001010100000111111110001",
"101111001110001101001010001110100100010010001011110100110000100001000100010100110001110010000111100100010010011010011111000101001110111001111000011011011011111100101010110111100110000111010000100001100111111111010001011100010010100111101010010100011011110110101110111111111000000100110001110011010000",
Contributor comment:

so beautiful ❤️
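Hard-coded expectations like the binary strings removed above only work because the generators are driven by a seeded random number generator, so a given seed always yields the same output. A small sketch of that property, using `random.Random` as a stand-in for the project's NumPy generator and a hypothetical `binary_strings` helper:

```python
import random
from typing import List


def binary_strings(seed: int, num_rows: int, length: int) -> List[str]:
    """Hypothetical helper: fixed-length '0'/'1' strings from a seeded RNG."""
    # A fresh generator per call makes the output depend only on the seed.
    rng = random.Random(seed)
    return ["".join(rng.choice("01") for _ in range(length)) for _ in range(num_rows)]


first = binary_strings(seed=42, num_rows=3, length=16)
second = binary_strings(seed=42, num_rows=3, length=16)
```

The exact strings also depend on how the generator consumes random numbers, which is why changing a generator's implementation, as this PR does, invalidates stored expected values even when the seed is unchanged.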

tests/test_dataset_generator.py (outdated, resolved)
auto-merge was automatically disabled July 28, 2023 17:39

Head branch was pushed to by a user without write access

taylorfturner enabled auto-merge July 28, 2023 17:40
auto-merge was automatically disabled July 28, 2023 17:46

Head branch was pushed to by a user without write access

taylorfturner enabled auto-merge July 28, 2023 17:59
@@ -89,52 +81,18 @@ def test_generate_custom_dataset(self):
]
),
np.array(
[
"010100011010001000001100000001111101111111110110001011101100001010000100101100001010101000100010101010101011000110010110110000011101001011101110011010000011111010001100000000110001100100000000000111001110110110101010110010010000101101110001000101010110101101001011101011110110100110100000111101010101",
Contributor comment:

love this removal

),
np.array(
- ["01", "010", "110", "0111", "1001", "001", "0100", "0111", "11", "101"]
+ ["10", "0001", "0100", "10", "000", "100", "00", "01", "1110", "1111"]
Contributor comment:

changing bc string isn't as long
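Because the merged generator produces strings of different lengths, the expected arrays had to be regenerated. An alternative that survives such changes is to assert properties of the output rather than exact values. A sketch, where `generate` is a hypothetical stand-in for whichever string generator is under test:

```python
import random
import unittest
from typing import List


def generate(rng: random.Random, num_rows: int, str_len_min: int, str_len_max: int) -> List[str]:
    """Hypothetical stand-in for the generator under test."""
    return [
        "".join(rng.choice("01") for _ in range(rng.randint(str_len_min, str_len_max)))
        for _ in range(num_rows)
    ]


class TestGeneratorProperties(unittest.TestCase):
    def test_length_and_alphabet(self):
        values = generate(random.Random(0), num_rows=10, str_len_min=2, str_len_max=4)
        self.assertEqual(len(values), 10)
        for value in values:
            # Only the bounds and the alphabet are checked, not exact strings.
            self.assertTrue(2 <= len(value) <= 4)
            self.assertLessEqual(set(value), {"0", "1"})
```

The trade-off: property checks catch fewer regressions in exact output, but they do not need rewriting whenever the generator's internals change.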

@@ -1,49 +0,0 @@
import unittest
Contributor comment:

would suggest doing text over string bc it matches DP

JGSweets disabled auto-merge July 28, 2023 18:11
JGSweets merged commit d3a115d into capitalone:feature/simple-tabular-generator Jul 28, 2023
taylorfturner pushed a commit to taylorfturner/synthetic-data that referenced this pull request Aug 10, 2023
…rator (capitalone#317)

* Getting rid of csv stuff

* merged string/txt generators and fixed all files affected by that merge

* docstring update and test loop deleted

* renamed file

* forgot this

* more renaming
taylorfturner added a commit that referenced this pull request Aug 23, 2023
* Datetime generator and tests

* mock added

* clean up comments

* fix: add feature to test workflow

* git ignore and rm DSstore (#291)

* git ignore and rm DSstore

* Update .gitignore

Co-authored-by: Taylor Turner <taylorfturner@gmail.com>

---------

Co-authored-by: Taylor Turner <taylorfturner@gmail.com>

* pre-made list

* start and end type specification

* removing unneeded space

* better name for a function

* better format catch from michael

* better format catch from michael

* changed to equate better across languages

* docstring fix and update

* default values at declaration

* del space

* testing for format usage

* testing for format usage

* Float generator + tests (#292)

* Float generator

* extra line

* another line

* Update tests/test_float_generator.py

Co-authored-by: Michael Davis <36012613+micdavis@users.noreply.github.com>

* another line

* readability per michael's request

* clean up

* assertGreaterEqual

* better test_sig_figs

* sig_fig protection

* clearer assert

---------

Co-authored-by: Michael Davis <36012613+micdavis@users.noreply.github.com>

* categorical test + gen

* space and ensuring num_rows does not exceed nbr of categories

* space fix

* second row test

* fixed merge conflicts

* random int generator and test

* refactored implementation of int generator and tests

* added space between class and imports. added more test cases

* add eof

* new eof line

* one new line eof

* distinct generator test folder

* add module to distinct_generators

* rename folder

* text generator and test (#295)

* text generator and test

* fixed style and 256 to 255 error

* fixed eof

* new eof line

* one line eof

* updated num_rows test with multiple row counts

* removed whitespace in eof

* moved text_generator test into distinct_gen test folder

* pre-commit BLACK (#297)

* pre-commit BLACK

* added pre-commit to makefile and README and requirement-test

* added eof

* passed pre-commit checks for black hook

* file move

* just moving files that have already been merged into right folders

* directory change in files

* variable name change

* Update tests/distinct_generators/test_float_generator.py

Co-authored-by: Taylor Turner <taylorfturner@gmail.com>

* fixes

* black

* added isort

* fixed overriding issues with iSort and black hooks

* added eof whitespace

* changed exclude

* removed skip in isort setup.cfg

* included venv skip for isort

* fixed the skip folder name

* precommit stuff

* docs

* docs

* docs

* initial fixes for flake8 errors

* flake8 fixes

* Fixes to some docstrings and the bare except

* fixed docstrings

* removed noqa and a few other fixes

* check-manifest
Fixed conflicts by rebasing changes with flake8 changes.

* removed setup and added eof line

* added setup.cfg

* removed setup

* Categorical Generator w Probabilities (#308)

* Now handles probabilities

* Update tests/distinct_generators/test_categorical_generator.py

Co-authored-by: Taylor Turner <taylorfturner@gmail.com>

* no prints, new tests, and yeah

* regex

* pre commit stuff and line

* type change and doc update

* docs

* pre commit

---------

Co-authored-by: Taylor Turner <taylorfturner@gmail.com>

* whitespace hook added
fixed merge conflict with previous merge of check-manifest

* Revert "whitespace hook added"

This reverts commit 842feeb.

* added examples to exclude for general fixers hook

* added pyupgrade hook

* added autoflake hook (#312)

* added autoflake hook

* removed passes

* lots of stuff

* formatting

* isort

* removing ordered stuff until richard is done

* regex

* pr requests

* Update synthetic_data/dataset_generator.py

Co-authored-by: Taylor Turner <taylorfturner@gmail.com>

* len assert

* new test for path

* start/end date generator none type test

* start/end date generator none type test

* updates

* doc string

* whoops

* no more successful value error

* last commit? plz?

* generator parameter order updates plus DS generator update

* whoops

* error change

* datetime test update

* revert

* Empty columns_to_generate triggers warning, tests for **col_

* trying to fix github

* ahhh

* update

* whoops

* Name option for columns

* WIP tests

* fixed tests

* Finished tests

* ok now the tests are actually done

* test update per taylor's request

* small change to test

* last commit? plz?

* last commit? ;__;

* ok now it's last one

* text/string generators become one + removing to_csv from dataset_generator (#317)

* Getting rid of csv stuff

* merged string/txt generators and fixed all files affected by that merge

* docstring update and test loop deleted

* renamed file

* forgot this

* more renaming

* update string --> text (#321)

* update string --> text

* variable

* update var name

* connected tabular generator with dataset_generator

* tests

* empty

* updated tests and synthetic_data

* added get_ordered_column

* renamed vars, added more datetime formats in tests, added integration test between dataset_generator and get_ordered_column

* renamed data to actual_data

* added log error for not correct sorting option. added docstrings to get_ordered_col

* refactored log

* edge case for when sort is none, don't log error

* Tests for logging

* removed pass

* fixed typo

* removed redundant test

* renamed variables and removed redundant test case

* renamed variables again

* major datetime_test overhaul

* pre-commit failed

* minor fixes

* refactored tests

* changed assert and removed passes

* empty

* added get_ordered_column

* renamed vars, added more datetime formats in tests, added integration test between dataset_generator and get_ordered_column

* renamed data to actual_data

* added log error for not correct sorting option. added docstrings to get_ordered_col

* refactored log

* edge case for when sort is none, don't log error

* Tests for logging

* removed pass

* fixed typo

* removed redundant test

* renamed variables and removed redundant test case

* renamed variables again

* major datetime_test overhaul

* pre-commit failed

* minor fixes

* refactored tests

* changed assert and removed passes

* pre-commits

* connected dataset_generator to tabular_generator. Small fix to int generator

* pre-commits

* empty

* check what's happening with float

* added float

* added test cases

* refactored tests, fixed edge cases, and refactored synthesize method

* fixed issue with generate_columns, made tests DRYer, and edited test case

* removed unnecessary data and renamed var

* pre-commit

* updated test and col_data var. added tests to generators for edgecase

* changed tests for text and int gen and changed var name in test_generators

* readded test

* major refactor to tabular generator

* fixed pre-commits

* fixed distinct generator tests

* fixed edge case in distinct gens, docs, edge case for none generator, refactored uncorrelated_synthesize function, and implemented parameters test

* fixed a few test cases, removed default param values, and made uncorrelated_synthesize private

* Revert "fixed a few test cases, removed default param values, and made uncorrelated_synthesize private"

This reverts commit caabdbf.

* added fixes from prev reverted commit

* removed prints

* broken test updates:

* categorical fix

* int string error

* tests for get_ordered_column_integration, uncorrelated_synthesize outputs, and params_build

* remove print statements

* fixed precision edge case of int

* reintegrated outdated tests

* added test case for None columns

* removed print

* changed test to f string

* fixed docstrings for datetime generator

* empty commit

---------

Co-authored-by: lizlouise1335 <liz.smith@richmond.edu>
Co-authored-by: Jeremy Goodsitt <jeremy.goodsitt@gmail.com>
Co-authored-by: Richard Bann <87214439+drahc1R@users.noreply.github.com>
Co-authored-by: Michael Davis <36012613+micdavis@users.noreply.github.com>
Co-authored-by: TCH323 <richard@bann.com>
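Several commits in the squashed history above add a `get_ordered_column` helper that sorts a generated column, logs an error for an unrecognized sorting option, and skips that error when no ordering is requested. The following only sketches that behavior as described in the commit messages; the signature, the option names `"ascending"`/`"descending"`, and returning the data unsorted on a bad option are assumptions, not the project's code:

```python
import logging
from typing import List, Optional

logger = logging.getLogger(__name__)


def get_ordered_column(data: List[float], order: Optional[str] = "ascending") -> List[float]:
    """Sketch: return a copy of ``data`` sorted according to ``order``."""
    if order is None:
        # No ordering requested: not an error, so nothing is logged.
        return list(data)
    if order == "ascending":
        return sorted(data)
    if order == "descending":
        return sorted(data, reverse=True)
    # Unrecognized option: log an error and fall back to the original order.
    logger.error("Unsupported sorting option %r; returning data unsorted.", order)
    return list(data)
```

Usage: `get_ordered_column([3, 1, 2], "descending")` returns `[3, 2, 1]`.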