Support aten::split_with_sizes_copy.out / aten::_chunk_cat / aten::_chunk_cat.out to align with CUDA as a fast path
#1213
🚀 The feature, motivation and pitch
Motivation
torch.split_with_sizes_copy.out is called in PyTorch FSDP2: https://github.com/pytorch/pytorch/blob/main/torch/distributed/fsdp/_fully_shard/_fsdp_collectives.py#L105. CUDA has a fast-path implementation for this aten op: https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/native_functions.yaml#L14757. To achieve the same performance as CUDA, please enable a similar fast path for the XPU device.
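For reference, a minimal sketch of the op's semantics, runnable on CPU with a recent PyTorch build (the tensor sizes here are illustrative, not taken from FSDP2): the `.out` variant copies each split into preallocated output tensors, and the CUDA fast path fuses those copies into a single kernel instead of launching one copy per split.

```python
import torch

# Illustrative flat source buffer and per-split sizes (assumed values,
# mirroring the FSDP2 pattern of splitting an all-gather buffer into
# preallocated per-parameter outputs).
src = torch.arange(10.0)
split_sizes = [3, 3, 4]
outs = [torch.empty(s) for s in split_sizes]

# One call copies every split into its preallocated output tensor;
# the CUDA fast path does all of these copies in one fused kernel,
# which is what this issue asks to replicate for XPU.
torch.split_with_sizes_copy(src, split_sizes, dim=0, out=outs)

print([t.tolist() for t in outs])
# [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0, 9.0]]
```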
Other CUDA-specific aten ops used in FSDP2:
aten::_chunk_cat
https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/native_functions.yaml#L5720
aten::_chunk_cat.out
https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/native_functions.yaml#L5725
Customer impact
FSDP2 performance on the XPU device.
Alternatives
No response
Additional context
No response