Release v0.29.0: NUMA affinity control, MLU Support, and DeepSpeed Improvements · huggingface/accelerate

Core

Accelerate can now optimize NUMA affinity, which can help increase throughput on NVIDIA multi-GPU systems. To enable it, either follow the prompt during accelerate config, set the ACCELERATE_CPU_AFFINITY=1 environment variable, or set it manually as follows:

from accelerate.utils import set_numa_affinity

# For GPU 0
set_numa_affinity(0)

Big thanks to @stas00 for the recommendation, request, and feedback during development.
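
For readers unfamiliar with what "setting NUMA affinity" means in practice, here is a minimal, Linux-only sketch of the underlying idea: restrict a process to the CPU cores of a single NUMA node so that it runs close to the memory and devices attached to that node. It does not reproduce Accelerate's implementation. The helper names parse_cpulist and pin_to_numa_node are hypothetical, and the step of working out which node a given GPU is attached to is not shown (the node index is simply an input here).

```python
from __future__ import annotations

import os


def parse_cpulist(cpulist: str) -> set[int]:
    """Parse a Linux cpulist string such as "0-3,8-11" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in cpulist.strip().split(","):
        if not part:
            continue  # tolerate an empty string or a stray comma
        if "-" in part:
            start, end = part.split("-")
            cpus.update(range(int(start), int(end) + 1))  # ranges are inclusive
        else:
            cpus.add(int(part))
    return cpus


def pin_to_numa_node(node: int) -> set[int]:
    """Restrict the calling process to the CPUs of one NUMA node (Linux only).

    Hypothetical helper for illustration; this is not an Accelerate API.
    """
    path = f"/sys/devices/system/node/node{node}/cpulist"
    with open(path) as f:
        cpus = parse_cpulist(f.read())
    os.sched_setaffinity(0, cpus)  # pid 0 means "the calling process"
    return cpus


if __name__ == "__main__":
    print(sorted(parse_cpulist("0-3,8-11")))  # [0, 1, 2, 3, 8, 9, 10, 11]
```

In normal use, the set_numa_affinity call or the ACCELERATE_CPU_AFFINITY=1 variable described above should be preferred, since they handle the GPU-to-node lookup for you.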

Allow for setting deterministic algorithms in set_seed by @muellerzr in #2569
Fixed the test script for TPU v2/v3 by @vanbasten23 in #2542
Cambricon MLU device support introduced by @huismiling in #2552
PartialState and AcceleratorState were heavily refactored to make them easier to maintain going forward and to simplify adding new devices, by @muellerzr in #2576
Fixed a reproducibility issue in distributed environments with Dataloader shuffling when using BatchSamplerShard by @universuen in #2584
notebook_launcher can use multiple GPUs in Google Colab if using a custom instance that supports multiple GPUs by @StefanTodoran in #2561

Big Model Inference

Add a log message for RTX 4000 series GPUs warning that multi-GPU inference with device_map can hang, by @SunMarc in #2557
Fix load_checkpoint_in_model behavior when unexpected keys are in the checkpoint by @fxmarty in #2588

DeepSpeed

Fix issue with the mapping of main_process_ip and master_addr when not using standard as deepspeed launcher by @asdfry in #2495
Improve deepspeed env gen by checking for bad keys, by @muellerzr and @ricklamers in #2565
Custom DeepSpeed env files are now supported. As with plain DeepSpeed, point to the file with the DS_ENV_FILE environment variable, by @muellerzr in #2566
Resolve ZeRO-3 Initialization Failure in already-started distributed environments by @sword865 in #2578
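
As a rough illustration of the custom env file support mentioned above, the sketch below writes a file in the KEY=VALUE, one-entry-per-line format that DeepSpeed reads, then points DS_ENV_FILE at it before launching. The helper write_deepspeed_env, the file location, and the example variables are all assumptions made for illustration. The key check is a simple stand-in and is not necessarily the validation Accelerate performs.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def write_deepspeed_env(path: Path, variables: dict[str, str]) -> Path:
    """Write one KEY=VALUE line per variable to `path`.

    Hypothetical helper for illustration; this is not an Accelerate API.
    """
    for key in variables:
        # Illustrative sanity check: reject empty keys and keys with "=" or whitespace.
        if not key or "=" in key or any(c.isspace() for c in key):
            raise ValueError(f"invalid environment variable name: {key!r}")
    lines = [f"{key}={value}" for key, value in variables.items()]
    path.write_text("\n".join(lines) + "\n")
    return path


if __name__ == "__main__":
    # Hypothetical location and variables; adjust to your cluster.
    env_file = Path(tempfile.gettempdir()) / "my_deepspeed_env"
    write_deepspeed_env(env_file, {"NCCL_DEBUG": "INFO", "NCCL_SOCKET_IFNAME": "eth0"})

    # Must be set in the environment of the process that launches training.
    os.environ["DS_ENV_FILE"] = str(env_file)
    print(env_file.read_text())
```

Setting the variable in the shell before running accelerate launch (for example with export) achieves the same thing without any Python.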

What's Changed

Fix test_script.py on TPU v2/v3 by @vanbasten23 in #2542
Add mapping main_process_ip and master_addr when not using standard as deepspeed launcher by @asdfry in #2495
split_between_processes for Dataset by @geronimi73 in #2433
Include working driver check by @muellerzr in #2558
🚨🚨🚨Move to using tags rather than latest for docker images and consolidate image repos 🚨 🚨🚨 by @muellerzr in #2554
Add Cambricon MLU accelerator support by @huismiling in #2552
Add NUMA affinity control for NVIDIA GPUs by @muellerzr in #2535
Add log message for RTX 4000 series when performing multi-gpu inference with device_map by @SunMarc in #2557
Improve deepspeed env gen by @muellerzr in #2565
Allow for setting deterministic algorithms by @muellerzr in #2569
Unpin deepspeed by @muellerzr in #2570
Rm uv install by @muellerzr in #2577
Allow for custom deepspeed env files by @muellerzr in #2566
[docs] Missing functions from API by @stevhliu in #2580
Update data_loader.py to Ensure Reproducibility in Multi-Process Environments with Dataloader Shuffle by @universuen in #2584
Refactor affinity and make it stateful by @muellerzr in #2579
Refactor and improve model estimator tool by @muellerzr in #2581
Fix load_checkpoint_in_model behavior when unexpected keys are in the checkpoint by @fxmarty in #2588
Guard stateful objects by @muellerzr in #2572
Expound PartialState docstring by @muellerzr in #2589
[docs] Fix kwarg docstring by @stevhliu in #2590
Allow notebook_launcher to launch to multiple GPUs from Colab by @StefanTodoran in #2561
Fix warning log for unused checkpoint keys by @fxmarty in #2594
Resolve ZeRO-3 Initialization Failure in Pre-Set Torch Distributed Environments (huggingface/transformers#28803) by @sword865 in #2578
Refactor PartialState and AcceleratorState by @muellerzr in #2576
Allow for force unwrapping by @muellerzr in #2595
Pin hub for tests by @muellerzr in #2608
Default false for trust_remote_code by @muellerzr in #2607
fix llama example for pippy by @SunMarc in #2616
Fix links in Quick Tour by @muellerzr in #2617
Link to bash in env reporting by @muellerzr in #2623
Unpin hub by @muellerzr in #2625

New Contributors

@asdfry made their first contribution in #2495
@geronimi73 made their first contribution in #2433
@huismiling made their first contribution in #2552
@universuen made their first contribution in #2584
@StefanTodoran made their first contribution in #2561
@sword865 made their first contribution in #2578

Full Changelog: v0.28.0...v0.29.0
