Fix Flux schedule shift and add resolution-dependent schedule shift #905

mhirki · 2024-08-31T12:55:18Z

This is based on the discussion after PR #892. It fixes --flux_schedule_shift by applying the shift to the sigmas, which makes it equivalent to Kohya's code.
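For readers unfamiliar with the shift, here is a minimal sketch of what "applying the shift to the sigmas" means. It assumes the common flow-matching shift formula `shift * sigma / (1 + (shift - 1) * sigma)`; the function name `shift_sigma` is made up for illustration and is not the code in this PR.

```python
def shift_sigma(sigma: float, shift: float) -> float:
    """Warp a sigma in [0, 1]; shift > 1 pushes values toward the noisy end.

    Assumed formula (not copied from this PR):
        sigma' = shift * sigma / (1 + (shift - 1) * sigma)
    """
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


# Illustrative values only: five evenly spaced sigmas, shifted by 3.0.
sigmas = [i / 4 for i in range(5)]
shifted = [shift_sigma(s, 3.0) for s in sigmas]
```

The endpoints 0 and 1 stay fixed and a shift of 1.0 leaves every sigma unchanged, so only the distribution of sigmas in between is altered.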

This pull request also adds a new option, --flux_schedule_auto_shift, for resolution-dependent shifting based on the official Flux inference code:
https://github.com/black-forest-labs/flux/blob/b4f689aaccd40de93429865793e84a734f4a6254/src/flux/sampling.py#L66
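As a rough sketch of the resolution-dependent shift in the linked inference code: mu is interpolated linearly between 0.5 at 256 image tokens and 1.15 at 4096 image tokens, and with the exponent fixed at 1.0 the time shift reduces to `s * t / (1 + (s - 1) * t)` with `s = exp(mu)`. The helper name `auto_shift` and the assumption that the token count is `(width / 16) * (height / 16)` are mine; how this PR wires it into training may differ.

```python
import math


def auto_shift(width: int, height: int,
               base_shift: float = 0.5, max_shift: float = 1.15) -> float:
    """Return the effective shift factor exp(mu) for a given resolution.

    Assumption: one image token covers a 16x16 pixel area
    (8x VAE downsampling followed by 2x2 patch packing).
    """
    image_seq_len = (width // 16) * (height // 16)
    # Line through (256 tokens, base_shift) and (4096 tokens, max_shift).
    m = (max_shift - base_shift) / (4096 - 256)
    b = base_shift - m * 256
    mu = m * image_seq_len + b
    return math.exp(mu)


shift_1024 = auto_shift(1024, 1024)  # mu = 1.15 at 4096 tokens
shift_512 = auto_shift(512, 512)     # smaller shift at lower resolution
```

Under these assumptions a 1024x1024 image gives exp(1.15), roughly 3.158, which is presumably where a fixed value such as --flux_schedule_shift=3.1582 comes from. Lower resolutions get a smaller shift.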

People on Discord already tried this yesterday, and the results have been positive. The previous version of the code is here (before I rebased it due to the refactored train.py):
mhirki@7cafb86

I tested the code using my previous configuration from this run:
https://huggingface.co/mikaelh/flux-sanna-marin-v0.4-fp8-adan2

One observation from my own testing is that the learning rate needed to be reduced. I first tried --flux_schedule_shift=3.1582 with the learning rate lowered by 2x, but that run overfit, so I ended up using the checkpoint from 2000 steps. I then tried --flux_schedule_auto_shift with a 5x lower learning rate, and that turned out well.