update string --> text #321
rename whole file
@@ -12,7 +12,7 @@
 from synthetic_data.distinct_generators.datetime_generator import random_datetimes
 from synthetic_data.distinct_generators.float_generator import random_floats
 from synthetic_data.distinct_generators.int_generator import random_integers
-from synthetic_data.distinct_generators.string_generator import random_string
+from synthetic_data.distinct_generators.text_generator import random_text
fix imports
-    "string": random_string,
+    "text": random_text,
fix mapping
rename whole file to test_text_generator.py
below
from synthetic_data.distinct_generators.text_generator import random_text


class TestTextGeneratorFunctions(unittest.TestCase):
rename class name
import numpy as np
import pandas as pd

from synthetic_data.distinct_generators.text_generator import random_text
rename import
        self.rng = np.random.default_rng(12345)

    def test_return_type(self):
        text_arr = random_text(self.rng)
variable array naming from str_arr to text_arr
-    "generator": "string",
-    "name": "str",
+    "generator": "text",
+    "name": "txt",
rename for consistency
 expected_df = pd.DataFrame.from_dict(
-    dict(zip(["int", "dat", "str", "cat", "flo"], expected_data))
+    dict(zip(["int", "dat", "txt", "cat", "flo"], expected_data))
rename so text passes
LGTM
* update string --> text
* variable
* update var name
* Datetime generator and tests * mock added * clean up comments * fix: add feature to test workflow * git ignore and rm DSstore (#291) * git ignore and rm DSstore * Update .gitignore Co-authored-by: Taylor Turner <taylorfturner@gmail.com> --------- Co-authored-by: Taylor Turner <taylorfturner@gmail.com> * pre-made list * start and end type specification * removing unneeded space * better name for a function * better format catch from michael * better format catch from michael * changed to equate better across languages * docstring fix and update * default values at declaration * del space * testing for format usage * testing for format usage * Float generator + tests (#292) * fFloat generator * extra line * another line * Update tests/test_float_generator.py Co-authored-by: Michael Davis <36012613+micdavis@users.noreply.github.com> * another line * readability per michael's request * clean up * assertGreaterEqual * better test_sig_figs * sig_fig protection * clearer assert --------- Co-authored-by: Michael Davis <36012613+micdavis@users.noreply.github.com> * categorical test + gen * space and ensuring num_rows does not exceed nbr of categories * space fix * second row test * fixed merge conflicts * random int generator and test * refactored implementation of int generator and tests * added space between class and imports. 
added more test cases * add eof * new eof line * one new line eof * distinct generator test folder * add module to distinct_generators * rename folder * text generator and test (#295) * text generator and test * fixed style and 256 to 255 error * fixed eof * new eof line * one line eof * updated num_rows test with multiple row counts * removed whitespace in eof * moved text_generator test into distinct_gen test folder * pre-commit BLACK (#297) * pre-commit BLACK * added pre-commit to makefile and README and requirement-test * added eof * passed pre-commit checks for black hook * file move * just moving files that have already been merged into right folders * directory change in files * variable name change * Update tests/distinct_generators/test_float_generator.py Co-authored-by: Taylor Turner <taylorfturner@gmail.com> * fixes * black * added isort * fixed overriding issues with iSort and black hooks * added eof whitespace * changed exclude * removed skip in isort setup.cfg * included venv skip for isort * fixed the skip folder name * precommit stuff * docs * docs * docs * initial fixes for flake8 errors * flake8 fixes * Fixes to some docstrings and the bare except * fixed docstrings * removed noqa and a few other fixes * check-manifest Fixed conflicts by rebasing changes with flake8 changes. * removed setup and added eof line * added setup.cfg * removed setup * Categorical Generator w Probabilities (#308) * Now handles probabilities * Update tests/distinct_generators/test_categorical_generator.py Co-authored-by: Taylor Turner <taylorfturner@gmail.com> * no prints, new tests, and yeah * regex * pre commit stuff and line * type change and doc update * docs * pre commit --------- Co-authored-by: Taylor Turner <taylorfturner@gmail.com> * whitespace hook added fixed merge conflict with previous merge of check-manifest * Revert "whitespace hook added" This reverts commit 842feeb. 
* added examples to exclude for general fixers hook * added pyupgrade hook * added autoflake hook (#312) * added autoflake hook * removed passes * lots of stuff * formatting * isort * removing ordered stuff until richard is done * regex * pr requests * Update synthetic_data/dataset_generator.py Co-authored-by: Taylor Turner <taylorfturner@gmail.com> * len assert * new test for path * start/end date generator none type test * start/end date generator none type test * updates * doc string * whoops * no more successful value error * last commit? plz? * generator parameter order updates plus DS generator update * whoops * error change * datetime test update * revert * Empty colums_to_generate triggers warning, tests for **col_ * trying to fix github * ahhh * update * whoops * Name option for columns * WIP tests * fixed tests * Finished tests * ok now the tests are actually done * test update per taylor's request * small change to test * last commit? plz? * last commit? ;__; * ok now it's last one * text/string generators become one + removing to_csv from dataset_generator (#317) * Getting rid of csv stuff * merged string/txt generators and fixed all files affected by that merge * docstring update and test loop deleted * renamed file * forgot this * more renaming * update string --> text (#321) * update string --> text * variable * update var name * connected tabular generator with dataset_generator * tests * empty * updated tests and synthetic_data * added get_ordered_column * renamed vars, added more datetime formats in tests, added integration test between dataset_generator and get_ordered_column * renamed data to actual_data * added log error for not correct sorting option. 
added docstrings to get_ordered_col * refactored log * edge case for when sort is none, don't log error * Tests for logging * removed pass * fixed typo * removed redundant test * renamed variables and removed redundant test case * renamed variables again * major datetime_test overhaul * pre-commit failed * minor fixes * refactored tests * changed assert and removed passes * empty * added get_ordered_column * renamed vars, added more datetime formats in tests, added integration test between dataset_generator and get_ordered_column * renamed data to actual_data * added log error for not correct sorting option. added docstrings to get_ordered_col * refactored log * edge case for when sort is none, don't log error * Tests for logging * removed pass * fixed typo * removed redundant test * renamed variables and removed redundant test case * renamed variables again * major datetime_test overhaul * pre-commit failed * minor fixes * refactored tests * changed assert and removed passes * pre-commits * connected dataset_generator to tabular_generator. Small fix to int generator * pre-commits * empty * check what's happenignwith float * added float * added test cases * refactored tests, fixed edge cases, and refactored synthesize method * fixed issue with generate_columns, made tests DRYer, and edited test case * removed unnecessary data and renamed var * pre-commit * updated test and col_data var. 
added tests to generators for edgecase * changed tests for text and int gen and changed var name in test_generators * readded test * major refactor to tabular generator * fixed pre-commits * fixed distinct generator tests * fixed edge case in distinct gens, docs, edge case for none generator, refactored uncorrelated_synthesize function, and implemented parameters test * fixed a few test cases, removed default param values, and made uncorrelated_synthesize private * Revert "fixed a few test cases, removed default param values, and made uncorrelated_synthesize private" This reverts commit caabdbf. * added fixes from prev reverted commit * removed prints * broken test updates: * categorical fix * int string error * tests for get_ordered_column_integration, uncorrelated_synthesize outputs, and params_build * remove print statements * fixed precision edge case of int * reintegrated outdated tests * added test case for None columns * removed print * changed test to f string * fixed docstrings for datetime generator * empty commit --------- Co-authored-by: lizlouise1335 <liz.smith@richmond.edu> Co-authored-by: Jeremy Goodsitt <jeremy.goodsitt@gmail.com> Co-authored-by: Richard Bann <87214439+drahc1R@users.noreply.github.com> Co-authored-by: Michael Davis <36012613+micdavis@users.noreply.github.com> Co-authored-by: TCH323 <richard@bann.com>
Simply changing from `string` and `random_string` to `text` and `random_text` for naming consistency between the generators and the data profiler column profile (i.e. `text_column_profile`).
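For readers skimming the thread, here is a minimal sketch of what the rename means for callers: the lookup key for the generator becomes `"text"`, and a column spec that still says `"string"` no longer resolves. This is not the repository's code. The `random_text` below is a hypothetical stand-in built on the standard library (the real function takes a NumPy `Generator`, and its full signature is not shown in this PR), and `GENERATORS`, `generate_column`, `num_rows`, and `str_len` are names assumed only for illustration.

```python
import random
import string


def random_text(rng, num_rows=3, str_len=5):
    """Hypothetical stand-in for synthetic_data's random_text (assumed signature)."""
    chars = string.ascii_lowercase
    return ["".join(rng.choice(chars) for _ in range(str_len)) for _ in range(num_rows)]


# After the rename, the mapping key is "text" rather than "string".
GENERATORS = {"text": random_text}


def generate_column(column_spec, rng):
    """Look up the generator named in the column spec and call it (illustrative helper)."""
    name = column_spec["generator"]
    if name not in GENERATORS:
        raise ValueError(f"unknown generator: {name!r}")
    return GENERATORS[name](rng)


rng = random.Random(12345)
# Column spec uses the renamed values from the updated test: "text" / "txt".
txt_column = generate_column({"generator": "text", "name": "txt"}, rng)
```

Under these assumptions, a spec that still uses `"generator": "string"` raises `ValueError`, which is why the tests and column names in this PR were updated together with the mapping.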