v0.27.0: PyTorch 2.2.0 Support, PyTorch-Native Pipeline Parallelism, DeepSpeed XPU support, and Bug Fixes
PyTorch 2.2.0 Support
With the latest release of PyTorch 2.2.0, we've confirmed that there are no breaking changes when using it with Accelerate.
PyTorch-Native Pipeline Parallel Inference
With this release we are excited to announce support for pipeline-parallel inference by integrating PyTorch's PiPPy framework (so there is no need to use Megatron or DeepSpeed)! It automatically splits the model weights across devices, using an API similar to `device_map="auto"`. This is still under heavy development; however, the inference side is stable enough that we are ready for a release. Read more about it in our docs and check out the example zoo.
Requires `pippy` version 0.2.0 or later (`pip install torchpippy -U`).
Example usage (combined with `accelerate launch` or `torchrun`):
```python
import torch
from transformers import AutoModelForSequenceClassification
from accelerate import PartialState, prepare_pippy

# Example input, used both to trace the model for splitting and for inference
input = torch.randint(0, 1000, (1, 16), dtype=torch.int64)

model = AutoModelForSequenceClassification.from_pretrained("gpt2")
model = prepare_pippy(model, split_points="auto", example_args=(input,))

# Move the input onto the first device before running inference
input = input.to("cuda:0")
with torch.no_grad():
    output = model(input)

# The outputs are only on the final process by default
# You can pass `gather_output=True` to `prepare_pippy` to
# make them available on all processes
if PartialState().is_last_process:
    output = torch.stack(tuple(output[0]))
    print(output.shape)
```
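As a usage note, assuming the snippet above is saved to a file such as `pippy_inference.py` (a placeholder name, not from the release): it can then be launched on two GPUs with `accelerate launch --num_processes 2 pippy_inference.py`, or equivalently with `torchrun --nproc_per_node 2 pippy_inference.py`.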
DeepSpeed
This release provides support for utilizing DeepSpeed on XPU devices, thanks to @faaany.
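As a rough sketch of what this enables (the model, data, and hyperparameters below are illustrative placeholders, not from the release): existing Accelerate + DeepSpeed training code does not need to change, and device placement, now including multi-XPU setups, follows whatever was chosen via `accelerate config` / `accelerate launch`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator, DeepSpeedPlugin

# Illustrative, minimal setup: the DeepSpeed plugin API is the same
# regardless of whether the launch targets CUDA or XPU devices.
deepspeed_plugin = DeepSpeedPlugin(zero_stage=2, gradient_accumulation_steps=1)
accelerator = Accelerator(deepspeed_plugin=deepspeed_plugin)

# Toy model and data, purely as placeholders
model = torch.nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,)))
dataloader = DataLoader(dataset, batch_size=8)

model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, labels in dataloader:
    loss = torch.nn.functional.cross_entropy(model(inputs), labels)
    accelerator.backward(loss)
    optimizer.step()
    optimizer.zero_grad()
```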
What's Changed
- Convert model.hf_device_map back to Dict by @SunMarc in #2326
- Fix model memory issue by @muellerzr in #2327
- Fixed typos in readme files of docs folder. by @rishit5 in #2329
- Disable P2P in just the 4000 series by @muellerzr in #2332
- Avoid duplicating memory for tied weights in `dispatch_model`, and in forward with offloading by @fxmarty in #2330
- Show DeepSpeed option when multi-XPU is selected in `accelerate config` by @faaany in #2346
- FIX: add oneCCL environment variable for non-MPI launcher (accelerate launch) by @faaany in #2339
- device agnostic test_accelerator/test_multigpu by @wangshuai09 in #2343
- Fix mpi4py/failing deepspeed test issues by @muellerzr in #2353
- Fix `block_size` picking in `megatron_lm_gpt_pretraining` example. by @nilq in #2342
- Fix dispatch_model with tied weights test on T4 by @fxmarty in #2354
- bugfix to allow usage of TE or MSAMP in `FP8RecipeKwargs` by @sudhakarsingh27 in #2355
- Pin DeepSpeed until patch by @muellerzr in #2366
- Remove init_hook_kwargs by @fxmarty in #2365
- device agnostic optimizer testing by @statelesshz in #2363
- `add_hook_to_module` and `remove_hook_from_module` compatibility with fx.GraphModule by @fxmarty in #2369
- Adding `requires_grad` to `kwargs` when registering empty parameters. by @BlackSamorez in #2376
- Add `adapter_only` option to `save_fsdp_model` and `load_fsdp_model` to only save/load PEFT weights by @AjayP13 in #2321
- device agnostic cli/data_loader/grad_sync/kwargs_handlers/memory_utils testing by @wangshuai09 in #2356
- Fix batch_size sanity check logic for `split_batches` by @izhx in #2344
- Pin Torch version to <2.2.0 by @Rocketknight1 in #2394
- Address PEP-632 deprecation of distutils by @AieatAssam in #2388
- [don't merge yet] unpin torch by @ydshieh in #2406
- Revert "[don't merge yet] unpin torch" by @muellerzr in #2407
- Fix CI due to pytest by @muellerzr in #2408
- Added activateEnviroment.sh to readme by @TJ-Solergibert in #2409
- Fix XPU inference by @notsyncing in #2383
- Fix the size of int and bool type when computing module size by @notsyncing in #2411
- Adding Local SGD support for NPU by @statelesshz in #2415
- Unpin torch by @muellerzr in #2418
- Use Ruff for formatting too by @akx in #2400
- torch-native pipeline parallelism for big models by @muellerzr in #2345
- Update FSDP docs by @pacman100 in #2430
- Make output end up on all GPUs at the end by @muellerzr in #2423
- Migrate pippy examples over and run tests by @muellerzr in #2424
- [FIX] fix the wrong `nproc_per_node` in the multi gpu test by @faaany in #2422
- Fix fp8 things by @muellerzr in #2403
- [FIX] allow `Accelerator` to prepare models in eval mode for XPU&CPU by @faaany in #2426
- [Fix] make all tests pass on XPU by @faaany in #2427
New Contributors
- @rishit5 made their first contribution in #2329
- @faaany made their first contribution in #2346
- @wangshuai09 made their first contribution in #2343
- @nilq made their first contribution in #2342
- @BlackSamorez made their first contribution in #2376
- @AjayP13 made their first contribution in #2321
- @Rocketknight1 made their first contribution in #2394
- @AieatAssam made their first contribution in #2388
- @ydshieh made their first contribution in #2406
- @notsyncing made their first contribution in #2383
- @akx made their first contribution in #2400
Full Changelog: v0.26.1...v0.27.0