
Releases: gretelai/gretel-synthetics

Validation loss splitting

30 Apr 16:02
b7d4cb6
Aw/core 107 validate (#93)

* Added a validation_split parameter that holds out a portion of the training data for computing validation loss
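
A minimal sketch of how the new parameter could be wired in, assuming the post-refactor TensorFlowConfig / train_rnn entry points described in the 0.15 notes below; whether validation_split is a boolean switch or a fraction is not spelled out here, so treat the value as illustrative.

```python
from gretel_synthetics.config import TensorFlowConfig
from gretel_synthetics.train import train_rnn

# Hold out part of the training data so a validation loss is reported
# alongside the training loss each epoch.
config = TensorFlowConfig(
    input_data_path="training-data.csv",  # illustrative path
    checkpoint_dir="checkpoints",         # illustrative path
    validation_split=True,                # assumption: a boolean switch; check the config docstring
)
train_rnn(config)
```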

Auto-select Tokenizer

20 Apr 20:33
a15f0f8

Automatically select character-based tokenization over SentencePiece if vocab_size is set to zero.
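
For illustration, a config sketch of the new behavior: with vocab_size set to 0, the character tokenizer should be selected instead of SentencePiece. Paths are illustrative.

```python
from gretel_synthetics.config import TensorFlowConfig
from gretel_synthetics.train import train_rnn

# vocab_size=0 means no SentencePiece vocabulary is built, so character-based
# tokenization is chosen automatically.
config = TensorFlowConfig(
    input_data_path="training-data.csv",  # illustrative path
    checkpoint_dir="checkpoints",         # illustrative path
    vocab_size=0,
)
train_rnn(config)
```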

Misc updates

06 Apr 22:10
  • Added a new Record Generator object to DF mode that generates entire records with custom validation
  • Added a custom RuntimeError that is raised when not enough training data is ingested
  • Added the ability for custom callbacks to capture per-epoch training details (see the sketch below)
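
A rough sketch of the callback hook, assuming the config accepts an epoch_callback callable that receives an object describing each epoch; the parameter name and the shape of that object are assumptions rather than something these notes confirm.

```python
from gretel_synthetics.config import TensorFlowConfig
from gretel_synthetics.train import train_rnn

def log_epoch(epoch_state) -> None:
    # Whatever per-epoch detail object the library passes in; printing it is
    # enough to see the captured training details.
    print(epoch_state)

config = TensorFlowConfig(
    input_data_path="training-data.csv",  # illustrative path
    checkpoint_dir="checkpoints",         # illustrative path
    epoch_callback=log_epoch,             # assumed parameter name
)
train_rnn(config)
```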

Batch DF Updates

29 Jan 20:30
e119a26

Updated the data generation routines to return summary objects that carry more detail about each generation run.
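
A hedged sketch of inspecting the richer return values in DataFrame batch mode; DataFrameBatch is the library's batch interface, but the exact attributes on the returned summary objects are not listed here, so the loop simply prints them.

```python
import pandas as pd
from gretel_synthetics.batch import DataFrameBatch

source_df = pd.read_csv("training-data.csv")                       # illustrative path
config_template = {"checkpoint_dir": "checkpoints", "epochs": 30}  # illustrative values

batcher = DataFrameBatch(df=source_df, config=config_template)
batcher.create_training_data()
batcher.train_all_batches()

# generate_all_batch_lines() now returns summary objects keyed by batch index
# rather than bare pass/fail flags.
summaries = batcher.generate_all_batch_lines()
for batch_idx, summary in summaries.items():
    print(batch_idx, summary)
```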

Smart seeding bugfix

08 Dec 19:56
fd11045

Bugfix to ensure model weights are reset when a list of seed values is provided to the generator.

Seeding and DP updates

25 Nov 21:42
2ccadcb

⚙️ Smart seeding now supports a list of seeds, which yields a 1:1 mapping of seeds to generated lines. This is useful for synthesizing partial data tables (see the sketch after these notes).

⚙️ When using DataFrame Batch mode, we now will write out the original Training DF header order to the model directory. When a model is loaded from disk, the resulting generated DataFrame will have the columns ordered the way they were in the training data.

🐛 When using DP mode, we (temporarily) patch TensorFlow 2.4.x to utilize the new Keras LSTM code paths. The patch is applied globally to Keras within the running Python interpreter and provides a drastic speedup when training a DP model.

📖 Doc updates for new seeding features.
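
A minimal sketch of the list-of-seeds behavior in DataFrame batch mode, assuming the seeding parameter on generate_all_batch_lines is named seed_fields and accepts a list of dicts (one dict per line to generate); the column names and values are illustrative.

```python
import pandas as pd
from gretel_synthetics.batch import DataFrameBatch

# Seeds for a partial table: the leading columns are fixed and the remaining
# columns are synthesized. One line is generated per seed (a 1:1 mapping).
seeds = [
    {"state": "CA", "age": 34},
    {"state": "NY", "age": 51},
    {"state": "TX", "age": 27},
]

source_df = pd.read_csv("training-data.csv")                                       # illustrative path
batcher = DataFrameBatch(df=source_df, config={"checkpoint_dir": "checkpoints"})   # illustrative template
batcher.create_training_data()
batcher.train_all_batches()

batcher.generate_all_batch_lines(seed_fields=seeds)  # assumed parameter name
synthetic_df = batcher.batches_to_df()               # columns follow the training DF header order
print(synthetic_df.head())
```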

Modular refactor, tokenizers, and differential privacy, oh my!

17 Nov 19:01
e15479e

Major changes:

  • Totally refactored modules and package structure. This will enable future contributions to utilize other underlying ML libraries as the core engine. Configurations are now specific to the underlying engine. LocalConfig can be replaced with TensorFlowConfig, although the former is still supported for backwards compatibility.

  • With TensorFlow 2.4.x, TensorFlow Privacy can be used to provide differentially private training with modified Keras DP optimizers.

  • Added a new tokenizer module that can be used independently from the underlying model training. By default, we continue to use SentencePiece as the tokenizer. We have also added a char-by-char tokenizer that can be useful when using differential privacy (see the sketch after this list).

  • Misc bug fixes and optimizations

  • Changes in this release are backwards compatible with previous versions.
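
A rough sketch of the new pieces together, assuming the differential-privacy switch on TensorFlowConfig is named dp, the character tokenizer trainer is CharTokenizerTrainer in gretel_synthetics.tokenizers, and train() accepts the trainer alongside the config; treat those names as assumptions and defer to the updated README referenced below.

```python
from gretel_synthetics.config import TensorFlowConfig
from gretel_synthetics.tokenizers import CharTokenizerTrainer
from gretel_synthetics.train import train
from gretel_synthetics.generate import generate_text

# TensorFlowConfig replaces LocalConfig (which remains as a compatible alias).
config = TensorFlowConfig(
    input_data_path="training-data.csv",  # illustrative path
    checkpoint_dir="checkpoints",         # illustrative path
    gen_lines=10,
    dp=True,                              # assumed name of the DP switch
)

# Tokenization is now a separate module; char-by-char tokenization pairs well
# with differentially private training.
tokenizer = CharTokenizerTrainer(config=config)
train(config, tokenizer)  # assumption: the trainer is passed alongside the config

for line in generate_text(config):
    print(line.text)
```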

Please see our updated README and examples directory.

RC0 0.15.0

13 Nov 15:43
bd18546
Pre-release
v0.15.0.rc0

Update README.md

Smart seeding

20 Oct 11:25
0d26b6f

Enable "Smart Seeding" which allows a prefix to be provided during line generation. The generator will complete the line based on the provided seed. When training on structured data (DataFrames) this enables the first N column values to be pre-provided and then remaining columns will be generated based on the initial values.

v0.14.0: Jm/syn 21 (#58)

05 Oct 19:11
53c3df2
  • Introduce Keras Early Stopping and Save Best Model features, and set the default number of epochs to 100, which should allow most training runs to stop automatically without over-fitting (see the sketch after this list).

  • Provide better tracking of which epoch's model was used as the best one in the model history table

  • Temporarily disable DP mode
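
A config sketch of the new defaults, assuming the early-stopping behavior is toggled by a config flag named early_stopping with an accompanying patience setting; those two names are assumptions, while epochs is the existing parameter whose default moves to 100.

```python
from gretel_synthetics.config import LocalConfig
from gretel_synthetics.train import train_rnn

config = LocalConfig(
    input_data_path="training-data.csv",  # illustrative path
    checkpoint_dir="checkpoints",         # illustrative path
    epochs=100,                           # new default; most runs stop earlier
    early_stopping=True,                  # assumed flag name for the Keras EarlyStopping hook
    early_stopping_patience=5,            # assumed name and value
)
train_rnn(config)
```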