
update string --> text #321


@taylorfturner (Contributor) commented Jul 28, 2023

Simply changing string and random_string to text and random_text for naming consistency between the generators and the Data Profiler column profile (i.e., text_column_profile).

@taylorfturner added the enhancement (New feature or request) label Jul 28, 2023
@taylorfturner self-assigned this Jul 28, 2023
Contributor Author

rename the whole file (string_generator.py to text_generator.py)

@@ -12,7 +12,7 @@
from synthetic_data.distinct_generators.datetime_generator import random_datetimes
from synthetic_data.distinct_generators.float_generator import random_floats
from synthetic_data.distinct_generators.int_generator import random_integers
-from synthetic_data.distinct_generators.string_generator import random_string
+from synthetic_data.distinct_generators.text_generator import random_text
Contributor Author

fix imports

-"string": random_string,
+"text": random_text,
Contributor Author

fix mapping
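The mapping change means the configuration key is what selects the generator function. A minimal sketch of that kind of name-to-function dispatch, assuming a plain dict lookup; GENERATOR_MAP, generate_column, the "integer" key, and both generator bodies are hypothetical stand-ins, not the synthetic_data implementation:

```python
import numpy as np


# HYPOTHETICAL stand-ins; the real generators live in
# synthetic_data.distinct_generators and may have different signatures.
def random_integers(rng: np.random.Generator, num_rows: int = 1) -> np.ndarray:
    """Return num_rows random integers in [0, 100)."""
    return rng.integers(0, 100, size=num_rows)


def random_text(rng: np.random.Generator, num_rows: int = 1) -> np.ndarray:
    """Return num_rows random lowercase strings of length 5."""
    letters = np.array(list("abcdefghijklmnopqrstuvwxyz"))
    return np.array(["".join(rng.choice(letters, size=5)) for _ in range(num_rows)])


# Generator name -> function; after this PR the key is "text", not "string".
GENERATOR_MAP = {
    "integer": random_integers,  # hypothetical key
    "text": random_text,
}


def generate_column(name: str, rng: np.random.Generator, num_rows: int) -> np.ndarray:
    """Look up a generator by name and produce one column of values."""
    try:
        generator = GENERATOR_MAP[name]
    except KeyError:
        raise ValueError(f"generator {name!r} is not supported") from None
    return generator(rng, num_rows=num_rows)


rng = np.random.default_rng(12345)
print(generate_column("text", rng, 3))  # three 5-letter strings
```

Under this sketch, a config still using the old "string" key raises ValueError, so configs and tests that reference the generator name have to be renamed together with the mapping key.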

Contributor Author

rename the whole test file to test_text_generator.py (shown below)

from synthetic_data.distinct_generators.text_generator import random_text


class TestTextGeneratorFunctions(unittest.TestCase):
Contributor Author

rename the test class

import numpy as np
import pandas as pd

from synthetic_data.distinct_generators.text_generator import random_text
Contributor Author

rename import

self.rng = np.random.default_rng(12345)

def test_return_type(self):
text_arr = random_text(self.rng)
Contributor Author

rename the array variable from str_arr to text_arr
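The renamed test follows the usual unittest pattern: a seeded NumPy Generator in setUp and a return-type check on random_text. A self-contained sketch of that pattern, assuming random_text(rng, num_rows=...) returns a NumPy array of str; the random_text body here is a hypothetical stand-in so the example runs without the package, and test_reproducible is an illustrative extra, not a test from this PR:

```python
import unittest

import numpy as np


# HYPOTHETICAL stand-in for synthetic_data.distinct_generators.text_generator.random_text.
def random_text(rng: np.random.Generator, num_rows: int = 1) -> np.ndarray:
    letters = np.array(list("abcdefghijklmnopqrstuvwxyz"))
    lengths = rng.integers(1, 11, size=num_rows)  # string lengths in [1, 10]
    return np.array(["".join(rng.choice(letters, size=n)) for n in lengths])


class TestTextGeneratorFunctions(unittest.TestCase):
    def setUp(self):
        # Fixed seed so every test sees the same random stream.
        self.rng = np.random.default_rng(12345)

    def test_return_type(self):
        text_arr = random_text(self.rng)
        self.assertIsInstance(text_arr, np.ndarray)
        for text in text_arr:
            self.assertIsInstance(text, str)

    def test_reproducible(self):
        # Two generators with the same seed must yield identical columns.
        first = random_text(np.random.default_rng(12345), num_rows=4)
        second = random_text(np.random.default_rng(12345), num_rows=4)
        np.testing.assert_array_equal(first, second)
```

Saved as a test module, this runs under python -m unittest or pytest.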

Comment on lines -25 to +26
-"generator": "string",
-"name": "str",
+"generator": "text",
+"name": "txt",
Contributor Author

rename for consistency

Comment on lines 91 to +92
expected_df = pd.DataFrame.from_dict(
-dict(zip(["int", "dat", "str", "cat", "flo"], expected_data))
+dict(zip(["int", "dat", "txt", "cat", "flo"], expected_data))
Contributor Author

rename so the test passes
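The expected DataFrame's column labels have to match the "name" fields in the column configs, so "str" became "txt" in both places. A minimal sketch of that relationship, assuming each config is a dict with "generator" and "name" keys as in the diff; only the "text"/"txt" pair comes from this PR, while the other generator keys and all sample values are hypothetical:

```python
import pandas as pd

# Column configs in the shape shown in the diff. Only {"generator": "text",
# "name": "txt"} is from this PR; the other generator keys are HYPOTHETICAL.
columns_to_generate = [
    {"generator": "integer", "name": "int"},
    {"generator": "datetime", "name": "dat"},
    {"generator": "text", "name": "txt"},
    {"generator": "categorical", "name": "cat"},
    {"generator": "float", "name": "flo"},
]

# HYPOTHETICAL sample values, one list per column, in config order.
expected_data = [
    [1, 2],
    ["2023-07-28", "2023-07-29"],
    ["abc", "de"],
    ["A", "B"],
    [0.5, 1.5],
]

# Column labels come from the configs, so renaming "str" -> "txt" in the
# config requires the same rename in the expected DataFrame.
names = [config["name"] for config in columns_to_generate]
expected_df = pd.DataFrame.from_dict(dict(zip(names, expected_data)))

print(expected_df.columns.tolist())  # ['int', 'dat', 'txt', 'cat', 'flo']
```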

@tazitoo (Contributor) left a comment

LGTM

@danielbarcklow merged commit a660e67 into capitalone:feature/simple-tabular-generator Jul 28, 2023
taylorfturner added a commit to taylorfturner/synthetic-data that referenced this pull request Aug 10, 2023
* update string --> text

* variable

* update var name
taylorfturner added a commit that referenced this pull request Aug 23, 2023
* Datetime generator and tests

* mock added

* clean up comments

* fix: add feature to test workflow

* git ignore and rm DSstore (#291)

* git ignore and rm DSstore

* Update .gitignore

Co-authored-by: Taylor Turner <taylorfturner@gmail.com>

---------

Co-authored-by: Taylor Turner <taylorfturner@gmail.com>

* pre-made list

* start and end type specification

* removing unneeded space

* better name for a function

* better format catch from michael

* better format catch from michael

* changed to equate better across languages

* docstring fix and update

* default values at declaration

* del space

* testing for format usage

* testing for format usage

* Float generator + tests (#292)

* Float generator

* extra line

* another line

* Update tests/test_float_generator.py

Co-authored-by: Michael Davis <36012613+micdavis@users.noreply.github.com>

* another line

* readability per michael's request

* clean up

* assertGreaterEqual

* better test_sig_figs

* sig_fig protection

* clearer assert

---------

Co-authored-by: Michael Davis <36012613+micdavis@users.noreply.github.com>

* categorical test + gen

* space and ensuring num_rows does not exceed nbr of categories

* space fix

* second row test

* fixed merge conflicts

* random int generator and test

* refactored implementation of int generator and tests

* added space between class and imports. added more test cases

* add eof

* new eof line

* one new line eof

* distinct generator test folder

* add module to distinct_generators

* rename folder

* text generator and test (#295)

* text generator and test

* fixed style and 256 to 255 error

* fixed eof

* new eof line

* one line eof

* updated num_rows test with multiple row counts

* removed whitespace in eof

* moved text_generator test into distinct_gen test folder

* pre-commit BLACK (#297)

* pre-commit BLACK

* added pre-commit to makefile and README and requirement-test

* added eof

* passed pre-commit checks for black hook

* file move

* just moving files that have already been merged into right folders

* directory change in files

* variable name change

* Update tests/distinct_generators/test_float_generator.py

Co-authored-by: Taylor Turner <taylorfturner@gmail.com>

* fixes

* black

* added isort

* fixed overriding issues with iSort and black hooks

* added eof whitespace

* changed exclude

* removed skip in isort setup.cfg

* included venv skip for isort

* fixed the skip folder name

* precommit stuff

* docs

* docs

* docs

* initial fixes for flake8 errors

* flake8 fixes

* Fixes to some docstrings and the bare except

* fixed docstrings

* removed noqa and a few other fixes

* check-manifest
Fixed conflicts by rebasing changes with flake8 changes.

* removed setup and added eof line

* added setup.cfg

* removed setup

* Categorical Generator w Probabilities (#308)

* Now handles probabilities

* Update tests/distinct_generators/test_categorical_generator.py

Co-authored-by: Taylor Turner <taylorfturner@gmail.com>

* no prints, new tests, and yeah

* regex

* pre commit stuff and line

* type change and doc update

* docs

* pre commit

---------

Co-authored-by: Taylor Turner <taylorfturner@gmail.com>

* whitespace hook added
fixed merge conflict with previous merge of check-manifest

* Revert "whitespace hook added"

This reverts commit 842feeb.

* added examples to exclude for general fixers hook

* added pyupgrade hook

* added autoflake hook (#312)

* added autoflake hook

* removed passes

* lots of stuff

* formatting

* isort

* removing ordered stuff until richard is done

* regex

* pr requests

* Update synthetic_data/dataset_generator.py

Co-authored-by: Taylor Turner <taylorfturner@gmail.com>

* len assert

* new test for path

* start/end date generator none type test

* start/end date generator none type test

* updates

* doc string

* whoops

* no more successful value error

* last commit? plz?

* generator parameter order updates plus DS generator update

* whoops

* error change

* datetime test update

* revert

* Empty columns_to_generate triggers warning, tests for **col_

* trying to fix github

* ahhh

* update

* whoops

* Name option for columns

* WIP tests

* fixed tests

* Finished tests

* ok now the tests are actually done

* test update per taylor's request

* small change to test

* last commit? plz?

* last commit? ;__;

* ok now it's last one

* text/string generators become one + removing to_csv from dataset_generator (#317)

* Getting rid of csv stuff

* merged string/txt generators and fixed all files affected by that merge

* docstring update and test loop deleted

* renamed file

* forgot this

* more renaming

* update string --> text (#321)

* update string --> text

* variable

* update var name

* connected tabular generator with dataset_generator

* tests

* empty

* updated tests and synthetic_data

* added get_ordered_column

* renamed vars, added more datetime formats in tests, added integration test between dataset_generator and get_ordered_column

* renamed data to actual_data

* added log error for not correct sorting option. added docstrings to get_ordered_col

* refactored log

* edge case for when sort is none, don't log error

* Tests for logging

* removed pass

* fixed typo

* removed redundant test

* renamed variables and removed redundant test case

* renamed variables again

* major datetime_test overhaul

* pre-commit failed

* minor fixes

* refactored tests

* changed assert and removed passes

* empty

* added get_ordered_column

* renamed vars, added more datetime formats in tests, added integration test between dataset_generator and get_ordered_column

* renamed data to actual_data

* added log error for not correct sorting option. added docstrings to get_ordered_col

* refactored log

* edge case for when sort is none, don't log error

* Tests for logging

* removed pass

* fixed typo

* removed redundant test

* renamed variables and removed redundant test case

* renamed variables again

* major datetime_test overhaul

* pre-commit failed

* minor fixes

* refactored tests

* changed assert and removed passes

* pre-commits

* connected dataset_generator to tabular_generator. Small fix to int generator

* pre-commits

* empty

* check what's happening with float

* added float

* added test cases

* refactored tests, fixed edge cases, and refactored synthesize method

* fixed issue with generate_columns, made tests DRYer, and edited test case

* removed unnecessary data and renamed var

* pre-commit

* updated test and col_data var. added tests to generators for edgecase

* changed tests for text and int gen and changed var name in test_generators

* readded test

* major refactor to tabular generator

* fixed pre-commits

* fixed distinct generator tests

* fixed edge case in distinct gens, docs, edge case for none generator, refactored uncorrelated_synthesize function, and implemented parameters test

* fixed a few test cases, removed default param values, and made uncorrelated_synthesize private

* Revert "fixed a few test cases, removed default param values, and made uncorrelated_synthesize private"

This reverts commit caabdbf.

* added fixes from prev reverted commit

* removed prints

* broken test updates:

* categorical fix

* int string error

* tests for get_ordered_column_integration, uncorrelated_synthesize outputs, and params_build

* remove print statements

* fixed precision edge case of int

* reintegrated outdated tests

* added test case for None columns

* removed print

* changed test to f string

* fixed docstrings for datetime generator

* empty commit

---------

Co-authored-by: lizlouise1335 <liz.smith@richmond.edu>
Co-authored-by: Jeremy Goodsitt <jeremy.goodsitt@gmail.com>
Co-authored-by: Richard Bann <87214439+drahc1R@users.noreply.github.com>
Co-authored-by: Michael Davis <36012613+micdavis@users.noreply.github.com>
Co-authored-by: TCH323 <richard@bann.com>
Labels: enhancement (New feature or request)

4 participants