
Releases: bghira/SimpleTuner

v0.9.3.1 - follow-up improvements for fp16/fp32 removal

07 Apr 14:25
0f8c00d

What's Changed

  • pesky vae cache bugs by @bghira in #342
  • NaN guards for VAE
  • Remove fp16 fully
  • Remove fp32 training options
  • Remove upcast logic
  • Add dreambooth guide
  • Fix 'instanceprompt' caption strategy
  • Disable multiprocessing by default to save memory by @bghira in #344

Full Changelog: v0.9.3...v0.9.3.1

v0.9.3 - no more autocast

03 Apr 04:04
b917ac4
  • option renamed: vae_cache_behaviour -> vae_cache_scan_behaviour
  • add --encode_during_training to skip pre-processing of VAE embeds (VAE embeds only, for now)
  • the VAE embed management code has been heavily reworked, so there may be some issues at first
  • debug.log in the project directory root now contains the DEBUG-level output
  • add --adam_bfloat16 for SDXL and SD 2.x; it is essentially mandatory now
  • precision level improvements and fixes for SD 2.x and SDXL training
    • fp16 is no longer supported for SD 2.x or SDXL; use bf16.
    • the SD 2.x VAE can still run in fp16 mode, but it's not recommended; it defaults to bf16 now.
  • PyTorch 2.3 is now required on Apple MPS for access to bf16
  • mps: fixes for loading torch files from disk and moving items into the correct dtypes
  • mps: fix multiprocessing, enable it by default while preserving --disable_multiprocessing as a workaround
    • Apple's Python uses the "spawn" multiprocessing strategy while Linux uses "fork"; "fork" is now the default (see the sketch after this list)
  • mps: enable unet attention slicing on SD 2.x to avoid NDArray crash in MPS
  • (for large datasets) preserve_data_backend_cache now accepts the string values "image" and "text" in addition to a bool, to preserve just one split of the cache.
  • skip certain OS directories on macOS and in Jupyter notebooks
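The "spawn" vs "fork" difference mentioned above comes down to how worker processes are created. Here is a minimal sketch of the idea using a plain multiprocessing pool; it is an illustration, not SimpleTuner's actual worker code:

```python
import multiprocessing as mp

def square(x):
    return x * x

if __name__ == "__main__":
    # macOS Python defaults to the "spawn" start method, which re-imports the
    # parent module in every child process; Linux defaults to the cheaper "fork".
    mp.set_start_method("fork", force=True)
    with mp.Pool(processes=2) as pool:
        print(pool.map(square, range(4)))  # [0, 1, 4, 9]
```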

pull requests

  • sd2x: num steps remaining fix
  • vaecache: exit with problematic data backend id by @bghira in #332
  • Feature/on demand vae cache by @bghira in #336
  • stochastic bf16 impl by uptightmoose by @bghira in #337
  • next by @bghira in #339

Full Changelog: v0.9.2...v0.9.3

v0.9.2 - an apple a day

22 Mar 01:44
5f3dda7

What's Changed

Train LoRAs for SD 1.x/2.x/XL models on Apple hardware now.

  • metadatabackend: add parquet support for metadata by @bghira in #321
    • A much quicker metadata experience for extremely large datasets.
  • remove --crop as a global argument
    • Use crop, crop_style, and crop_aspect via multidatabackend.json (see the sketch after this list)
  • added --disable_multiprocessing for certain situations where it may help performance by @bghira in #328
  • apple mps: various bugfixes for LoRA training, SDXL, SD 2.x
  • sd2x: various bugfixes for EMA and the validation noise scheduler config
  • metadata: abstract logic into pluggable backends
  • metadata: support for parquet backend, pull data directly from Pandas dataframes
  • vaecache: improve and fix logic for scan_for_errors=true
  • aspect bucketing: make it more robust for extremely diverse datasets by @bghira in #323
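As a rough illustration of the per-dataset crop settings mentioned above, the sketch below writes a single-entry multidatabackend.json. Only crop, crop_style, and crop_aspect come from these notes; the "id" key and the "center" value are placeholders, so consult the dataloader documentation for the real schema:

```python
import json

# Hypothetical single-dataset entry; "my-dataset" and "center" are assumed
# placeholder values, not confirmed schema defaults.
backends = [
    {
        "id": "my-dataset",
        "crop": True,
        "crop_style": "center",
        "crop_aspect": "preserve",
    }
]

with open("multidatabackend.json", "w") as f:
    json.dump(backends, f, indent=2)
```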

Full Changelog: v0.9.1...v0.9.2

v0.9.1 - DoRA the explorah

28 Feb 14:41
396ff92

This release has some breaking changes for users who:

  • Use RESOLUTION_TYPE=area (resolution_type=area for multidatabackend config)
  • Use crop=false
  • Use crop=true and crop_aspect=preserve

as the precision level for aspect buckets has changed.

Updating to this release is recommended, but if you're in the middle of a training run, don't update yet.

What's Changed

  • prompt handler: check for index error before requesting caption from parquet backend
  • vaecache: more robust handling of batched encoding
  • vaecache: fix for a trailing slash in the cache_dir property resulting in excessive scans at startup
  • bucket manager: cheaply remove duplicate images from the dataset during sorting
  • aspect bucketing: bucket categories now use 3 decimal places instead of 2; this might require re-caching aspect buckets and VAE outputs for users whose configuration matches the description above (see the example below)
  • sd2.x: LoRA training fixes, still not quite right, but better
  • DoRA: initial support via PEFT, integrated for SDXL and legacy models.

by @bghira in #320
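To see why the precision change can invalidate existing caches, consider two aspect ratios that share a bucket at 2 decimal places but split apart at 3; the bucket keys, and therefore anything cached under them, change. This is only an illustration of the effect, not SimpleTuner's actual bucketing code:

```python
# Two near-identical aspect ratios: 1536x1024 vs 1535x1024.
aspects = [1536 / 1024, 1535 / 1024]   # 1.5 and ~1.499

old_buckets = {round(a, 2) for a in aspects}   # {1.5}        -> one bucket
new_buckets = {round(a, 3) for a in aspects}   # {1.5, 1.499} -> two buckets

print(old_buckets, new_buckets)
```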

Full Changelog: v0.9.0...v0.9.1

v0.9.0

26 Feb 02:35
22e3cb5

note: these changes include all of the v0.9 improvements from every release candidate since v0.8.2.

SimpleTuner v0.9.0 Release Notes

I'm excited to announce the release of SimpleTuner v0.9.0! This release includes numerous features, improvements, and bug fixes that enhance the functionality, stability, and performance of SimpleTuner.

An experimental multi-node captioning script is included; it was used to create photo-concept-bucket, a free dataset produced by my group that contains roughly 568k CogVLM image-caption pairs, including other metadata such as dominant colours and aspect ratio.

Below is a summary of the key changes for v0.9.0.

New Features

  • Multi-Dataset Sampler: Enhanced support for training with multiple datasets, enabling more flexible and varied data feeding strategies.
  • Caption Filter Lists for Dataloaders: Ability to filter captions directly in the text embed layer, improving data quality for training.
  • Sine Learning Rate Scheduler: Introduced a sine scheduler to optimize learning rate adjustments, starting training at lr_end instead of learning_rate (a brief sketch follows this list).
  • LoRA Trainer Support: Integration of LoRA (Low-Rank Adaptation) for efficient model training for SDXL and SD 1.5/2.x.
  • Advanced Caption Controls: Introduction of the parquet caption strategy, offering more efficient control over caption processing, especially for datasets with millions of images.
  • Distributed VAE Encoding/Captioning: Support for distributed VAE encoding and captioning scripts, enhancing performance for large-scale datasets.
  • Multi-Stage Resize for Very-Large Images: Improvements in handling very large images through multi-stage resizing, potentially reducing artifacts.
  • CSV to S3 Direct Upload: Functionality to upload data directly to S3 from CSV without saving locally first, streamlining data preparation.
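For intuition, a sine-shaped schedule that begins at lr_end and rises toward the configured learning_rate could look like the sketch below; the actual scheduler's shape and arguments may differ:

```python
import math

def sine_lr(step, total_steps, lr_end=1e-6, lr_max=1e-4):
    # Quarter sine wave: starts at lr_end when step == 0 and reaches lr_max
    # at the final step. Illustrative only; not SimpleTuner's exact curve.
    progress = min(step / max(total_steps, 1), 1.0)
    return lr_end + (lr_max - lr_end) * math.sin(0.5 * math.pi * progress)

print(sine_lr(0, 1000))      # 1e-06  (lr_end)
print(sine_lr(1000, 1000))   # 0.0001 (lr_max)
```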

Improvements

  • VAE Cache: Fixes and enhancements in VAE cache handling, including rebuilds in case of errors or every epoch for 'freshness'.
    • Dataset repeats are now implemented, such that an embed can be seen n times before it is considered exhausted.
  • Text Embedding Cache: Optimizations in text embedding cache generation, writing, and processing for improved performance. Fixes to the threading behaviour.
  • CogVLM and Diffusers: Updates including default 4bit inference for CogVLM and bump to Diffusers 0.26.0.
  • AWS S3 and SD 2.x Fixes: Various fixes and enhancements for S3 data backends and Stable Diffusion 2.x support, including multi-GPU training and LoRA support fixes.
  • Logging Reduction: Major reduction in debug noise for cleaner and more informative logs.

Bug Fixes

  • Caption Processing: Fixes for issues related to prepend_instance_prompt doubling up prompt contents and handling of captions from parquet databases.
  • Optimizer Adjustments: Various fixes and adjustments for optimizers, including Adafactor and AdamW.
  • Training State Handling: Fixes for save/load state issues, ensuring correct handling of global steps, epochs, and more.

Breaking Changes

  • instance_data_dir is no longer in use - you must configure a data backend loader. See DATALOADER for more information.
  • CogVLM Filename Cleaning: Disabled filename cleaning by default. Projects relying on automatic filename cleaning will need to adjust their preprocessing accordingly.
  • Configuration Values: Configuration names and values have changed. Ensure to review your configuration.

Documentation and Miscellaneous

  • Documentation Updates: Comprehensive updates to installation guides, tutorials, and README to reflect new features and changes.
  • Kohya Config Conversion Script: Provided a script to convert Kohya basic parameters into SimpleTuner command-line arguments, facilitating easier migration and setup.

Full Changelog from v0.8.2: View the complete list of changes

We thank all contributors who have helped shape this release. Your contributions, bug reports, and feedback have been invaluable. Happy tuning!

v0.9.0-rc10

17 Feb 15:10
aa24608
Pre-release

What's Changed

  • feature: sine scheduler so that training begins at lr_end by @bghira in #311
  • bugfix: prepend_instance_prompt was simply doubling up prompt contents by @bghira in #312
  • a script for converting kohya basic params into simpletuner cmdline args by @bghira in #313

Full Changelog: v0.9.0-rc9...v0.9.0-rc10

v0.9.0-rc9

14 Feb 02:56
4260e1e
Pre-release


The work-in-progress Terminus checkpoint, trained with this release.

What's Changed

  • sd2.x: fix multi-gpu training with wandb
  • sd2.x: adafactor fixes by @bghira in #307
  • remove test.img folder writes debug code by @bghira in #308
  • slight fix-ups with batching and progress bars by @bghira in #309

Full Changelog: v0.9.0-rc8...v0.9.0-rc9

v0.9.0-rc8

11 Feb 05:22
2e88a1b
Pre-release

What's Changed

Documentation fixes, and an improvement to training quality by removing DDIM from the setup.

Fixes for optimizers and unusual configuration value combinations.

Fixes for slow text embed cache writes being overlooked, resulting in training not finding the embeds.

  • fix documentation by @bghira in #296
  • Fix for zero snr_gamma
  • Fix subfolder support
  • Add null check for subfolders by @bghira in #300
  • prodigy optimizer added
  • wandb fix for multigpu training
  • adafactor: fix bugs, make it work like AdamW. added --adafactor_relative_step for the truly adventurous
  • fix xformers / other deps
  • zsnr: do not use betas from ddim sampler, instead use ddpm directly
  • adamw: fix non-8bit optimizer settings for LR not being passed in
  • text embed cache: add write thread progress bar, slow write warning and delay for encoding when we hit the buffer size. by @bghira in #305

Full Changelog: v0.9.0-rc7...v0.9.0-rc8

v0.9.0-rc7 - bugfix release

04 Feb 23:21
f7136f3
Pre-release

What's Changed

  • regressions and fixes for filtering captions by @bghira in #295

Full Changelog: v0.9.0-rc6...v0.9.0-rc7

v0.9.0-rc6 - getting closer!

01 Feb 17:45
07b427f
Pre-release

What's Changed

  • feature: caption filter lists for dataloaders
  • feature: print aggregated statistics after caching image embeds, listing the reasons images were skipped and their counts
  • bugfix: sd 2.x feature parity to sdxl, lora training fixes, save/load state fixes
  • bugfix: --validation_resolution can now be specified in megapixels; it will auto-convert
  • S3DataBackend: fix the ability to use S3 prefixes for data read/write/discovery (vae_cache_prefix etc)
    by @bghira in #292

Full Changelog: v0.9.0-rc5...v0.9.0-rc6