Releases: gretelai/gretel-synthetics

RC3

04 Aug 17:34
f7cf83b
Pre-release
Bugfix on generation, RC2 prep (#41)

* Bugfix on generation, RC2 prep

* Use SCM for install

Co-authored-by: John Myers <john@gretel.ai>

RC2

04 Aug 17:25
deb22ec
Pre-release
Support parallel synthetic text generation using multiprocessing (#39)

* Support parallel synthetic text generation using multiprocessing

* add cloudpickle to test reqs

* review comments

* set CUDA_VISIBLE_DEVICES to -1 in workers

* decode symbols one by one

* remove un-used var, bump version for RC

Co-authored-by: Malte Isberner <malte@gretel.ai>
Co-authored-by: John Myers <john@gretel.ai>

0.11.0 RC1

04 Aug 14:58
deb22ec
Pre-release

RC1. Adds a "read only" mode for Batches and default generation via CPUs with maximum parallelization.

Bugfix

19 Jun 23:47
57e6222

🐛 Corrects calculation of batch_size when using the DataFrameBatch interface

Bugfix

19 Jun 21:41

🐛 When generating new lines via Batch mode, the passed max_invalid param is now used instead of the module default

Bugfix for 0.10.x

18 Jun 20:14
f4f6279

🐞 Fixes conversion of synthetic Batches back to a DataFrame when a custom field delimiter is used

DataFrame support and more!

16 Jun 00:14

Major changes to Gretel Synthetics including native support for DataFrames and batched column training!

⚙️ Introduce a batch module that allows a DataFrame to be ingested and split into batches of smaller DataFrames where each batch has a subset of the columns of the source DataFrame. This allows training of datasets with several columns while still allowing the preservation of correlations and statistical data. See our Medium Blog for details and our example dataframe_batch Notebook located in the examples directory.
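
The column-batching idea can be illustrated with a small sketch. It deliberately uses plain Python dicts instead of pandas DataFrames and does not use the library's batch module; `split_columns`, `batch_rows`, and `rejoin` are hypothetical names chosen for this example only.

```python
def split_columns(columns, batch_size):
    """Group column names into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [columns[i:i + batch_size] for i in range(0, len(columns), batch_size)]


def batch_rows(rows, columns, batch_size):
    """Split each row (a dict) into narrower rows, one list of rows per column batch."""
    batches = []
    for cols in split_columns(columns, batch_size):
        batches.append([{c: row[c] for c in cols} for row in rows])
    return batches


def rejoin(batches):
    """Merge the per-batch rows back into full-width rows, matching by position."""
    merged = []
    for parts in zip(*batches):
        row = {}
        for part in parts:
            row.update(part)
        merged.append(row)
    return merged


columns = ["a", "b", "c", "d", "e"]
rows = [
    {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5},
    {"a": 6, "b": 7, "c": 8, "d": 9, "e": 10},
]
batches = batch_rows(rows, columns, batch_size=2)
restored = rejoin(batches)
```

Each batch holds every row but only a subset of the columns, so a model can be trained per batch and the generated rows recombined afterwards.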

📖 Massive updates to docstrings for the config module. Details for each config parameter.

🤖 Update to generation functionality. If a validator is provided, the gen_lines config option will be used only to count valid lines that are generated. In order to stop runaway generation, a max_invalid parameter exists that specifies the maximum number of invalid lines that can be generated. If this number of invalid lines is exceeded, a RuntimeError will be raised and generation will be halted.
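
The counting behavior described above can be sketched as a simple loop. This is an illustration of the logic only, not the library's code: `generate_valid_lines` and its `candidates` argument are hypothetical, while `gen_lines`, `max_invalid`, and the validator concept come from the note above.

```python
def generate_valid_lines(candidates, validator, gen_lines, max_invalid):
    """Collect gen_lines valid lines, giving up once max_invalid is exceeded.

    candidates is any iterable of generated lines (hypothetical stand-in
    for the model's output); validator returns True for a valid line.
    """
    valid = []
    invalid_count = 0
    if gen_lines <= 0:
        return valid
    for line in candidates:
        if validator(line):
            valid.append(line)
            # Only valid lines count toward gen_lines.
            if len(valid) >= gen_lines:
                break
        else:
            invalid_count += 1
            # Halt runaway generation once the invalid budget is exceeded.
            if invalid_count > max_invalid:
                raise RuntimeError(
                    f"Exceeded max_invalid ({max_invalid}) invalid lines"
                )
    return valid


lines = generate_valid_lines(
    ["1", "x", "2", "y", "3"], str.isdigit, gen_lines=2, max_invalid=5
)
```

Here invalid lines are skipped without counting toward `gen_lines`, and the loop stops as soon as enough valid lines have been collected.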

Sentence Piece Updates

03 Jun 21:52
508276d

⬆️ Upgraded to the latest SentencePiece and added a max_line_len param to the Config options. This allows you to override the default SentencePiece line limit and set a custom one. During our testing, we found that we had to set the limit a few thousand characters higher than the actual line length. For example, for a line that was 49500 chars long, we had to set the limit to about 53000.

PyPI Bug Fix

26 May 15:09
5edeff8

🐛 Fixes an issue where setup.py would fail on installation from pip.

📓 Updates to UCI Notebook

Python 3.6 Support

22 May 00:34

This update removes the use of the annotations module for type checks. We also provide Python 3.6 support via the [3.6] extras option. By default, the package will work on Colab, since Colab already installs a backport of dataclasses, so installing with the extras on Colab is not necessary.