v0.30.0: Advanced optimizer support, MoE DeepSpeed support, add upcasting for FSDP, and more
Core
- We've simplified the `tqdm` wrapper to be fully passthrough: there's no longer any need for `tqdm(main_process_only, *args)`; it is now just `tqdm(*args)`, and you can pass `main_process_only` as a kwarg (see the sketch after this list).
- We've added support for advanced optimizer usage:
  - Schedule-free optimizer introduced by Meta by @muellerzr in #2631 (usage sketch after this list)
  - LOMO optimizer introduced by OpenLMLab by @younesbelkada in #2695
- Enable BF16 autocast to everything during FP8 and enable FSDP by @muellerzr in #2655
- Support non-blocking dataloader `send_to_device` calls by @drhead in #2685 (see the configuration sketch after this list)
- Allow `gather_for_metrics` to be more flexible by @SunMarc in #2710
- Add `cann` version info to `accelerate env` for NPU by @statelesshz in #2689
- Add MLU rng state setter by @ArthurinRUC in #2664
- Device-agnostic testing for hooks, utils, and big_modeling by @statelesshz in #2602
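As a quick illustration of the new `tqdm` passthrough behavior, here is a minimal sketch. It assumes the gating keyword is `main_process_only` (the keyword-only argument on `accelerate.utils.tqdm`) and uses a plain `range` as a stand-in for any iterable:

```python
from accelerate.utils import tqdm

data = range(100)  # stand-in for any iterable or DataLoader

# Old form (now deprecated): tqdm(True, data, desc="train") -- the gating flag came first.
# New form: positional args pass straight through to tqdm, and per-process
# gating is an ordinary keyword argument.
for _ in tqdm(data, desc="train", main_process_only=True):
    pass
```

The advanced optimizers can be used by preparing them as usual. A sketch for the schedule-free optimizer, assuming Meta's external `schedulefree` package is installed; the model and hyperparameters here are purely illustrative:

```python
import torch
import schedulefree  # Meta's external package providing schedule-free optimizers
from accelerate import Accelerator

accelerator = Accelerator()
model = torch.nn.Linear(8, 2)
optimizer = schedulefree.AdamWScheduleFree(model.parameters(), lr=1e-3)

model, optimizer = accelerator.prepare(model, optimizer)

# Schedule-free optimizers must be toggled between train/eval modes;
# the prepared (wrapped) optimizer forwards these calls to the inner optimizer.
optimizer.train()   # call before training steps
# ... training loop ...
optimizer.eval()    # call before evaluation
```

For the non-blocking `send_to_device` calls, a sketch assuming the option is exposed as the `non_blocking` field of `DataLoaderConfiguration` (pin your dataloader's memory so the non-blocking host-to-device copies can actually overlap):

```python
from accelerate import Accelerator
from accelerate.utils import DataLoaderConfiguration

# Opt in to non-blocking host-to-device transfers for prepared dataloaders.
dataloader_config = DataLoaderConfiguration(non_blocking=True)
accelerator = Accelerator(dataloader_config=dataloader_config)
```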
Documentation
- Through a collaboration between @fabianlim (lead contributor), @stas00, @pacman100, and @muellerzr, we have a new concept guide for FSDP and DeepSpeed, explicitly detailing how the two interoperate and explaining fully and clearly how each of them works. This was a monumental effort by @fabianlim to ensure everything is as accurate as possible for users. I highly recommend visiting this new documentation, available here
- New distributed inference examples have been added thanks to @SunMarc in #2672
- Fixed some docs for using internal trackers by @brentyi in #2650
DeepSpeed
- Accelerate can now handle MoE models when using DeepSpeed, thanks to @pacman100 in #2662
- Allow "auto" for gradient clipping in YAML by @regisss in #2649
- Introduce a `deepspeed`-specific Docker image by @muellerzr in #2707. To use, pull the `gpu-deepspeed` tag: `docker pull huggingface/accelerate:cuda-deepspeed-nightly`
Megatron
- Megatron plugin can support NPU by @zhangsheng377 in #2667
Big Modeling
- Add a strict arg to `load_checkpoint_and_dispatch` by @SunMarc in #2641
Bug Fixes
- Fix up state with xla + performance regression by @muellerzr in #2634
- Fix parenthesis on `xpu_available` by @muellerzr in #2639
- Fix `is_train_batch_min` type in DeepSpeedPlugin by @yhna940 in #2646
- Fix backend check by @jiqing-feng in #2652
- Fix the rng states of the sampler's generator so they are synchronized, ensuring correct sharding of the dataset across GPUs, by @pacman100 in #2694
- Block AMP for MPS device by @SunMarc in #2699
- Fixed an issue when doing multi-GPU training with bnb where the first GPU is not used by @SunMarc in #2714
- Fixup `free_memory` to deal with garbage collection by @muellerzr in #2716
- Fix sampler serialization failing by @SunMarc in #2723
- Fix deepspeed offload device type in the arguments to be more accurate by @yhna940 in #2717
Full Changelog
- Schedule free optimizer support by @muellerzr in #2631
- Fix up state with xla + performance regression by @muellerzr in #2634
- Parenthesis on xpu_available by @muellerzr in #2639
- add third-party device prefix to `execution_device` by @faaany in #2612
- add strict arg to `load_checkpoint_and_dispatch` by @SunMarc in #2641
- device agnostic testing for hooks&utils&big_modeling by @statelesshz in #2602
- Docs fix for using internal trackers by @brentyi in #2650
- Allow "auto" for gradient clipping in YAML by @regisss in #2649
- Fix `is_train_batch_min` type in DeepSpeedPlugin by @yhna940 in #2646
- Don't use deprecated `Repository` anymore by @Wauplin in #2658
- Fix test_from_pretrained_low_cpu_mem_usage_measured failure by @yuanwu2017 in #2644
- Add MLU rng state setter by @ArthurinRUC in #2664
- fix backend check by @jiqing-feng in #2652
- Megatron plugin can support NPU by @zhangsheng377 in #2667
- Revert "fix backend check" by @muellerzr in #2669
- tqdm: `*args` should come ahead of `main_process_only` by @rb-synth in #2654
- Handle MoE models with DeepSpeed by @pacman100 in #2662
- Fix deepspeed moe test with version check by @pacman100 in #2677
- Pin DS...again.. by @muellerzr in #2679
- fix backend check by @jiqing-feng in #2670
- Deprecate tqdm args + slight logic tweaks by @muellerzr in #2673
- Enable BF16 autocast to everything during FP8 + some tweaks to enable FSDP by @muellerzr in #2655
- Fix the rng states of sampler's generator to be synchronized for correct sharding of dataset across GPUs by @pacman100 in #2694
- Simplify test logic by @pacman100 in #2697
- Add source code for DataLoader Animation by @muellerzr in #2696
- Block AMP for MPS device by @SunMarc in #2699
- Do a pip freeze during workflows by @muellerzr in #2704
- add cann version info to command accelerate env by @statelesshz in #2689
- Add version checks for the import of DeepSpeed moe utils by @pacman100 in #2705
- Change dataloader send_to_device calls to non-blocking by @drhead in #2685
- add distributed examples by @SunMarc in #2672
- Add diffusers to req by @muellerzr in #2711
- fix bnb multi gpu training by @SunMarc in #2714
- allow gather_for_metrics to be more flexible by @SunMarc in #2710
- Add Upcasting for FSDP in Mixed Precision. Add Concept Guide for FSDP and DeepSpeed by @fabianlim in #2674
- Segment out a deepspeed docker image by @muellerzr in #2707
- Fixup `free_memory` to deal with garbage collection by @muellerzr in #2716
- fix sampler serialization by @SunMarc in #2723
- Fix sampler failing test by @SunMarc in #2728
- Docs: Fix build main documentation by @SunMarc in #2729
- Fix Documentation in FSDP and DeepSpeed Concept Guide by @fabianlim in #2725
- Fix deepspeed offload device type by @yhna940 in #2717
- FEAT: Add LOMO optimizer by @younesbelkada in #2695
- Fix tests on main by @muellerzr in #2739
New Contributors
- @brentyi made their first contribution in #2650
- @regisss made their first contribution in #2649
- @yhna940 made their first contribution in #2646
- @Wauplin made their first contribution in #2658
- @ArthurinRUC made their first contribution in #2664
- @jiqing-feng made their first contribution in #2652
- @zhangsheng377 made their first contribution in #2667
- @rb-synth made their first contribution in #2654
- @drhead made their first contribution in #2685
Full Changelog: v0.29.3...v0.30.0